Email validation in PHP

Thursday 5 January 2006

This is almost 19 years old. Be careful.

Naturally, fixing a bug in my comment system and mentioning it here yesterday merely encouraged the quiet malcontents to bring up their petty annoyances! I’m kidding: keep the bug reports coming. Platypus pointed out that the email validation didn’t deal properly with his domain name.

So I’ve improved the email validation code. I know what a lot of you are thinking: why validate the email at all? I do it because it’s a good way to prevent spam. Maybe someday I’ll get rid of it, but for now it stays.

The problem in Platypus’s case is that I look up MX records for the domain, and pl.atyp.us doesn’t have one, but atyp.us does. This is the code I’m now using:

function IsValidEmail($email)
{
    // I got this originally from
    // http://www.developer.com/lang/php/article.php/10941_3290141_2
    // Create the syntactical validation regular expression
    $atom_re = "[a-z0-9!#$%&'*+\\/=?^_`{|}~-]+";
    $regexp = "/^(" . $atom_re . ")(\\." . $atom_re .
                ")*@([a-z0-9-]+)(\\.[a-z0-9-]+)*(\\.[a-z]{2,4})$/i";

    // Presume that the email is invalid
    $valid = 0;

    // Validate the syntax
    if (preg_match($regexp, $email)) {
        if (function_exists("getmxrr")) {
            list($username, $domaintld) = explode("@", $email, 2);
            while (substr_count($domaintld, ".") > 0) {
                // Validate the domain
                if (getmxrr($domaintld, $mxrecords)) {
                    $valid = 1;
                    break;
                }

                // Didn't find an MX record.
                // If we have a subdomain, move up the hierarchy.
                list($dummy, $domaintld) = explode(".", $domaintld, 2);
            }
        }
        else {
            // Couldn't check the domain with getmxrr, assume the best.
            $valid = 1;
        }
    }

    return $valid;
}
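
To make the walk up the hierarchy concrete, here is a minimal sketch that only lists the domains an MX lookup would try, in order, without touching DNS. The name mxCandidateDomains is hypothetical and not part of the code above; it mirrors the loop condition in IsValidEmail, so a bare TLD is never tried.

```php
<?php
// Hypothetical helper (an assumption, not from the original code):
// list the domains that the MX lookup loop would try, most specific first.
function mxCandidateDomains(string $domain): array
{
    $candidates = [];
    // Stop once no dot is left, so a bare TLD such as "us" is never tried.
    while (substr_count($domain, ".") > 0) {
        $candidates[] = $domain;
        // Drop the leftmost label: "pl.atyp.us" becomes "atyp.us".
        $domain = explode(".", $domain, 2)[1];
    }
    return $candidates;
}
```

For Platypus's domain, mxCandidateDomains("pl.atyp.us") gives pl.atyp.us followed by atyp.us, which is the order in which getmxrr would be asked.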

Anyone else have a complaint?

Updated: I’ve since improved the code to deal with some of the issues in the comment thread here: Email validation again.

Comments

[gravatar]
I knew you weren't referring to me, Ned; nobody has ever accused me of being a quiet malcontent. ;)

Thanks for the fix.
[gravatar]
> Anyone else have a complaint?

Must...resist urge...to make...bad joke
[gravatar]
It doesn't accept "Rik Hemsley" (that's me) @rikkus.info, which would be a valid email address if I had set up such an alias on my mail server.

Hasn't anyone written an RFC822 compliant email address validator for PHP? Perhaps one could be based on the Perl version: http://mythic-beasts.com/~pdw/cgi-bin/emailvalidate
[gravatar]
Please, if you think you can meaningfully "validate" email addresses with a regex, do us all a favour and read what RFC 3696 has to say on the topic:

http://www.apps.ietf.org/rfc/rfc3696.html#sec-3

Note that, because quoting and escaping can be used, *any* printable ASCII character is valid in an email address. If you exclude any of them, you're excluding valid email addresses.

Please don't second-guess the email system. Ask the DNS to look up the domain and its MX; you'll get a DNS error response if it fails. If it succeeds, just use the email address as is to deliver messages; you'll get an SMTP error response if it fails.

Anything else is just over-complication, and asking for trouble.
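
A minimal sketch of the permissive approach described in this comment, assuming the only syntax check is that the address has a non-empty local part and a non-empty domain around its last "@". The name splitAddress is hypothetical; whether the address really works is left to DNS and SMTP.

```php
<?php
// Hypothetical helper (an assumption, not from the post): split an address
// at its last "@" without judging which characters the local part may use.
function splitAddress(string $email): ?array
{
    $at = strrpos($email, "@");
    // Reject a missing "@", an empty local part, or an empty domain.
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return null;
    }
    // Return [local part, domain]; the domain can then be looked up in DNS.
    return [substr($email, 0, $at), substr($email, $at + 1)];
}
```

With this, an address with a quoted local part such as "Rik Hemsley"@rikkus.info splits into its two parts instead of being rejected outright.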
[gravatar]
I understand that email addresses can technically contain all sorts of characters that people don't think they can. Rik's example of a space in the email address is just one example.

But this limitation is not realistically going to bite anyone (even Rik admits that he doesn't have an alias set up to use his example address). For example, Outlook, gmail, and Horde won't send to that address, while Thunderbird and Yahoo will.

I know the value of following a standard to the letter, and how it improves interoperability, and so on. I also know the value of spending time on the things that will truly make a difference. When people have complained here about actual email addresses that didn't work, I fixed the validation. I'm a little interested in supporting esoteric forms, but not much.
[gravatar]
Technically, the rule is that if something doesn't have an MX record you look for an A record for the host. Of course this doesn't give you any information about whether the host actually accepts email.

(The whole email spam problem has made really verifying email addresses all but impossible; the only truly reliable way to tell if an address is deliverable is to send it real email and see what happens.)
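
The MX-then-A rule above could be sketched like this. The name domainMayAcceptMail and its $lookup parameter are hypothetical; the parameter exists only so the logic can be exercised without network access, and it defaults to PHP's checkdnsrr(). As the comment says, a positive answer still tells you nothing about whether the host accepts mail.

```php
<?php
// Hypothetical helper (an assumption, not from the post): true if the
// domain has an MX record or, failing that, an A record.
// $lookup takes (domain, record type) and returns a boolean.
function domainMayAcceptMail(string $domain, ?callable $lookup = null): bool
{
    // Default to the built-in DNS check when no lookup is injected.
    $lookup = $lookup ?? 'checkdnsrr';
    if ($lookup($domain, 'MX')) {
        return true;
    }
    // No MX record: fall back to an address record for the host itself.
    return (bool) $lookup($domain, 'A');
}
```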

[gravatar]
I was being pedantic, but I think it's worth doing such things properly. I find that my real email address (rik@rikkus.info) is rejected by some major web sites which say that it's incorrect. Their 'checking' is so pathetic it rejects anything that doesn't match some out-of-date fixed list of TLDs!
[gravatar]
Rik, as it happens, I agree with you more than I thought: I improved the validation: http://www.nedbatchelder.com/blog/20060107T093734.html
[gravatar]
This is simple explanation..
I've latest posting..
You done a good job..

I expert your posting:-)

Thanks a lot..!
