Valid email addresses?

Sunday 14 August 2005

This is more than 19 years old. Be careful.

A commenter on my Madlibs story (Blake Winton) complained that his perfectly valid email address was being refused by my email validator. His email address has a plus sign in it, and my validator didn’t like it. I was using this regex:

^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$
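To see the failure concretely, here is a minimal Python sketch that runs this exact pattern against a plain address and a plus-sign address. The sample addresses and the helper name `old_is_valid` are made up for illustration, and lowercasing the input first is an assumption, since the pattern only covers lowercase letters.

```python
import re

# The pattern from the post, unchanged.
OLD_EMAIL_RE = re.compile(
    r"^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)


def old_is_valid(address):
    """Return True if the address matches the original pattern."""
    # Assumption: input is lowercased before matching.
    return OLD_EMAIL_RE.match(address.lower()) is not None


# Hypothetical sample addresses, not real ones.
for sample in ["someone@example.com", "someone+blog@example.com"]:
    print(sample, old_is_valid(sample))
```

The plus sign is not in the `[_a-z0-9-]` character class, so the second address is rejected before the pattern ever reaches the `@`.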

Recognizing the limitations of this validator, I thought I would look up the real rules for email addresses. They are spelled out in RFC 2822, but after digging through there, nothing was clear at all. Not only are many characters allowed that I have never seen in email addresses (~ # ? { *), but there are also a handful of different quoting mechanisms!

Helpfully, Paul Warren has written a Perl regular expression to deal with all of the complexity. It is a 6200-character behemoth, and I don’t intend to adapt it for my use!

Of course, some of this complexity is for full addresses that won’t be entered into a comment form, but where to draw the line? For now, I’ve added the other special characters as allowed. If I get more complaints, I’ll revise further.
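As a rough sketch of what "adding the other special characters" could look like, the Python below widens the local part to the unquoted characters RFC 2822 allows in an atom, and leaves the domain part as it was. This is an illustration under those assumptions, not necessarily the pattern this site ended up with; the names `RELAXED_EMAIL_RE` and `is_plausible_email` are hypothetical, and the quoting mechanisms are deliberately ignored.

```python
import re

# Assumption: the local part accepts the RFC 2822 "atext" characters
# (letters, digits, and !#$%&'*+-/=?^_`{|}~), with single dots between runs.
# Quoted strings and comments are deliberately not handled.
ATEXT = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+"

# The domain part is kept exactly as in the original pattern.
DOMAIN = r"([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})"

RELAXED_EMAIL_RE = re.compile(
    r"^" + ATEXT + r"(\." + ATEXT + r")*@" + DOMAIN + r"$"
)


def is_plausible_email(address):
    """Return True if the address passes the relaxed check."""
    # Assumption: surrounding whitespace is trimmed and case is ignored.
    return RELAXED_EMAIL_RE.match(address.strip().lower()) is not None


# Hypothetical sample addresses, not real ones.
for sample in ["someone+blog@example.com", "a..b@example.com"]:
    print(sample, is_plausible_email(sample))
```

A pattern like this still says nothing about whether the address is deliverable; it only stops rejecting addresses that use the less common characters.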

Comments

[gravatar]
That regex hurts me. I sometimes use [^"() \s]+@, though that might be a little lax.

But at a minimum [a-zA-Z0-9+_%-], I guess -- + is common, and I think I've seen an apostrophe before (once), and I think %, like +, is sometimes used as an extension (so x%foo goes to x), but I can't remember.
[gravatar]
Here's an address that would work for me that your validator won't allow: yes+this-will*get#to$me%benjiyork.com

I doubt you're trying to validate to keep people from entering fake addresses (because it's virtually impossible to verify an email address), therefore you must be doing it to help people who mis-type their address. If so, I would suggest warning the user when their address doesn't match your regex, but letting them continue anyway if they so choose.
[gravatar]
Actually, I do it for two reasons: to discourage anonymous posting (which is impossible to prevent), and to help keep out spam.
[gravatar]
I'd like '+' to be allowed. In gmail, you can use a custom "tag" for email addresses: (myname+comments@gmail.com). That way you can keep track of whether an email is "leaked" to spammers and take appropriate action -- filter, ignore, whatever.
[gravatar]
If it makes you feel any better, most web-forms reject my email address. I even found one a while ago that didn't convert the plus sign on the way in, so when the form came back to me, it had my (failed) address displaying as "bwinton blog@latte.ca". Ugh. At least my name can be spelt in low-ascii. I shudder to think about what Frédéric Crozat (for instance) must go through.
[gravatar]
You know, I've religiously used the "synthesize a per-site email address" trick for years and only ever had one escape and be used maliciously. I'm beginning to wonder whether it's worth the effort.
[gravatar]
Given that a valid email address may still be completely undeliverable, why are you checking validity? What problem is solved by having a check at all?
[gravatar]
The complainants and pseudo experts all seem to take great pains to quote RFC 822. Well, newsflash, RFC 822 is obsoleted. At least point to the correct series of RFC documents. They don't even point at the correct section.subsection. They just say "RFC 822".

As a matter of practicality, most of the more esoteric characters can be avoided. They certainly are not in common use by any sane administrator. It is usually part of some homebrew filtering system. That is good and fine, but please don't expect everyone to change to suit one whack job.

BTW, the single quote might be allowed, but anyone sane who stores it in a db is going to avoid that one like the plague.

ps. nice touch having a second chance at a post before final commit.
