Sunday 14 August 2005
A commenter on my Madlibs story (Blake Winton) complained that his perfectly valid email address was being refused by my email validator. His email address has a plus sign in it, and my validator didn’t like it. I was using this regex:
^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$
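To see the failure concretely, here is a minimal Python sketch that runs the pattern above, copied verbatim, against a plus-addressed address. The sample addresses and the `old_is_valid` helper are made up for illustration.

```python
import re

# The original validator pattern, copied verbatim.
OLD_PATTERN = re.compile(
    r"^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)

def old_is_valid(address: str) -> bool:
    """Return True if the original pattern accepts the address."""
    return OLD_PATTERN.match(address) is not None

print(old_is_valid("blake@example.com"))          # True
print(old_is_valid("blake+madlibs@example.com"))  # False: '+' is not in [_a-z0-9-]
```

The local part only allows underscores, lowercase letters, digits, and hyphens, so the `+` stops the match before it ever reaches the `@`.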
Recognizing the limitations of this validator, I thought I would look up the real rules for email addresses. They are spelled out in RFC 2822, but after digging through it, I found nothing clear at all. Not only are many characters allowed that I have never seen in email addresses (~ # ? { *), but there are also a handful of different quoting mechanisms!
Helpfully, Paul Warren has written a Perl regular expression to deal with all of the complexity. It is a 6200-character behemoth, and I don’t intend to adapt it for my use!
Of course, some of this complexity is for full addresses that won’t be entered into a comment form, but where to draw the line? For now, I’ve added the other special characters as allowed. If I get more complaints, I’ll revise further.
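The post doesn't show the revised pattern, so what follows is only a hypothetical sketch of what "adding the other special characters" might look like. It widens the local part to RFC 2822's `dot-atom` form (runs of `atext` characters separated by single dots) and deliberately leaves out quoted strings, drawing the line at what people plausibly type into a comment form. The domain half is copied unchanged from the original pattern; the names `ATEXT`, `LOCAL`, `DOMAIN`, and `new_is_valid` are illustrative.

```python
import re

# RFC 2822 "atext": letters, digits, and ! # $ % & ' * + / = ? ^ _ ` { | } ~ -
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom local part: atext runs separated by single dots
# (no leading, trailing, or doubled dots). Quoted strings are not handled.
LOCAL = ATEXT + r"+(?:\." + ATEXT + r"+)*"

# Domain half copied unchanged from the original validator.
DOMAIN = r"([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})"

NEW_PATTERN = re.compile("^" + LOCAL + "@" + DOMAIN + "$")

def new_is_valid(address: str) -> bool:
    """Return True if the widened (hypothetical) pattern accepts the address."""
    return NEW_PATTERN.match(address) is not None

print(new_is_valid("blake+madlibs@example.com"))  # True
print(new_is_valid("o'brien@example.com"))        # True
print(new_is_valid("double..dot@example.com"))    # False
print(new_is_valid('"john doe"@example.com'))     # False: quoted local parts are out of scope
```

Stopping at `dot-atom` accepts every unquoted character the RFC allows while sidestepping the quoting mechanisms entirely, which is one reasonable place to draw the line.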
Comments
But at a minimum allow [a-zA-Z0-9+_%-], I guess. The + is common, and I think I've seen an apostrophe before (once). I also think %, like +, is sometimes used as an extension (so x%foo goes to x), but I can't remember.
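For comparison, here is a minimal sketch of that narrower suggestion: the original pattern's structure, with only the local-part character class swapped for [a-zA-Z0-9+_%-]. The `min_is_valid` helper and sample addresses are hypothetical.

```python
import re

# Original structure, with the suggested minimum character set in the local part.
MIN_PATTERN = re.compile(
    r"^([a-zA-Z0-9+_%-]+)(\.[a-zA-Z0-9+_%-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)

def min_is_valid(address: str) -> bool:
    """Return True if the minimum-set pattern accepts the address."""
    return MIN_PATTERN.match(address) is not None

print(min_is_valid("blake+madlibs@example.com"))  # True
print(min_is_valid("x%foo@example.com"))          # True
print(min_is_valid("o'brien@example.com"))        # False: apostrophe not in the set
```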
I doubt you're trying to validate to keep people from entering fake addresses (because it's virtually impossible to verify an email address), therefore you must be doing it to help people who mis-type their address. If so, I would suggest warning the user when their address doesn't match your regex, but letting them continue anyway if they so choose.
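A minimal sketch of the warn-but-allow idea, assuming a form flow where a suspicious address is bounced back once with a warning and accepted if the user resubmits with a hypothetical `confirmed` flag set. The loose `LOOKS_OK` heuristic and the `review_address` name are illustrative choices, not part of any real comment system.

```python
import re
from typing import Optional, Tuple

# A deliberately loose heuristic (hypothetical): something, an @, something, a dot, something.
LOOKS_OK = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def review_address(address: str, confirmed: bool) -> Tuple[bool, Optional[str]]:
    """Return (accept, warning).

    A suspicious address is never rejected outright: the first submission
    gets a warning, and a resubmission with confirmed=True is accepted as typed.
    """
    address = address.strip()
    if LOOKS_OK.match(address) or confirmed:
        return True, None
    return False, f"{address!r} doesn't look like an email address. Submit again to use it anyway."
```

Because the user always has the final say, the pattern only needs to catch obvious typos, so it can stay loose without turning anyone away.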
As a matter of practicality, most of the more esoteric characters can be avoided. They certainly are not in common use by any sane administrator. It is usually part of some homebrew filtering system. That is good and fine, but please don't expect everyone to change to suit one whack job.
BTW, the single quote might be allowed, but anyone sane who stores it in a db is going to avoid that one like the plague.
ps. nice touch having a second chance at a post before final commit.