Saturday 22 August 2009 — This is more than 15 years old. Be careful.
My recent work on a consumer-facing product brought up the old problem: how to validate an email address before using it? There’s a classic tension here between those developers that want to prevent typos from floundering around in the system, giving users feedback as soon as possible that it seems like they’ve made a mistake; and those developers that want to be sure that any valid email address can be used.
The usual advice on this matter is to not bother with validation because it’s a fool’s errand: instead, simply send an email to the address with a confirmation link. If the user clicks the link, then the address must have been valid.
I don’t like this advice because the vast, vast majority of email addresses do validate with a simple regex, and the vast, vast majority of failures against the regex represent real mistakes, not obscure but valid email addresses. Catching user mistakes early is a good thing. Having the user wait for an email that will never come, then go back to enter their email address again is a pain.
This is the regex I used:
/^[^@ ]+@[^@ ]+\.[^@ ]+$/
In other words, an email address has to have stuff, at-sign, stuff, dot, stuff. The stuff can have dots in it, but can’t have at-signs or spaces. And by the way, before matching against the regex, trim whitespace from the ends of the address.
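As a minimal sketch of the check described above, here it is in Python. The choice of Python and the names `SIMPLE_EMAIL` and `looks_like_email` are only for illustration; the post itself specifies nothing beyond the regex and the trimming step.

```python
import re

# The regex from the post: stuff, at-sign, stuff, dot, stuff,
# where "stuff" contains no at-signs and no spaces.
SIMPLE_EMAIL = re.compile(r"^[^@ ]+@[^@ ]+\.[^@ ]+$")


def looks_like_email(raw):
    """Trim whitespace from the ends, then test against the simple regex."""
    return SIMPLE_EMAIL.match(raw.strip()) is not None


if __name__ == "__main__":
    # Sample inputs are made up for illustration.
    for candidate in ["  ned@example.com ", "ned@example", "ned smith@example.com"]:
        print(repr(candidate), looks_like_email(candidate))
```

Note that an address such as root@localhost fails this check because it has no dot after the at-sign, which is exactly the kind of case the escape hatch below is meant for.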
As a gesture of reconciliation with the purists, I propose this: check the user-entered email address against this regex. If it matches, it’s valid. If it doesn’t match, show the user an “invalid email address” error box that has two buttons: “Fix mistake” which lets the user re-enter an email address, and “Use it anyway” which takes the email address as-is even though it failed the match.
This is the best of both worlds, since the common case of a catchable typo in an email address will force the user to double-check their entry, but any address can be used if the user knows what they are doing. Most users will never see the error box, since they’ll enter their address correctly.
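The flow could be sketched roughly like this in Python. The function name `accept_email` and the `use_anyway` flag are hypothetical, standing in for whatever the real form handler and the “Use it anyway” button would be called.

```python
import re

# Same simple regex as in the post.
SIMPLE_EMAIL = re.compile(r"^[^@ ]+@[^@ ]+\.[^@ ]+$")


def accept_email(raw, use_anyway=False):
    """Return (accepted, address).

    accepted is True when the trimmed address matches the regex, or when
    the user has explicitly chosen "Use it anyway" (use_anyway=True).
    """
    address = raw.strip()
    if SIMPLE_EMAIL.match(address) or use_anyway:
        return True, address
    # Not accepted: the caller should show the two-button error box.
    return False, address
```

The first submission calls `accept_email(raw)`. If that returns `False`, the error box appears; “Fix mistake” sends the user back to the field, while “Use it anyway” resubmits with `use_anyway=True`, so no address is ever permanently refused.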
I’ve never seen a workflow like this, but it seems like a really simple solution to the problem. Is there something I’m overlooking? Is it too geeky?
BTW, I made that image with the command-line dialog tool:
dialog \
--title 'Email address' \
--yes-label 'Let me fix it' \
--no-label 'Use it anyway' \
--yesno 'The email address "root@localhost" seems invalid.' 7 40
Comments
1. Thwarts user error when entering email
2. Reduces (re)development when email address syntax changes (if it ever does)
I know I've seen a number of email validation regexes which restrict the TLD to between 2 and 4 characters. While most TLDs are in this range (.uk, .com, .name), some are not (.museum or .travel). It is naive of us to think that these "limits" are constant (ICANN says TLDs can be anything with 2 or more chars). Your solution doesn't care about limits. Like you said:
stuff = anything but at and space
stuff + at + stuff
/^[^@[:space:]]+@[^@ ]+\.[^@[:space:]]+$/
Hope this helps!
Spaces in the local-part don't appear to be allowed unless the local-part is quoted. (see RFC5322; local-part is either a dot-atom or quoted-string)
For example, on Facebook, where a user enters their email address, how often does a space appear before the @ as a valid address, compared to how often a space appears as a typo? I'm certain the typo wins out at least 100-to-1, probably more like 1000-to-1.
I once worked on a website where a large number of very "basic" users could create an account.
I used a fairly complicated validator AND the receive-an-email-and-click-the-link process.
I've lost count of how often people typed well-formed e-mail addresses that contained typos. They never received the e-mail and barked that the site wasn't working. Had they not been expecting a confirmation e-mail, they wouldn't even have known there was a problem, and the whole user-website relationship would have been nil.
Anyway, I'll try this solution in a real website to see it in a real case.
In this case: just display some text next to the email entry saying that the email address looks invalid.
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$
me@jordan.name
or
you@eu.travel
Expanding on that, it's a simple fix. Just remove the three from the last limit: ^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$
But of course, this would fail to validate if ICANN started to allow numbers, underscores (_), or any other special non-word character in the TLD.
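To make the comparison concrete, here is a small Python sketch that runs the two stricter regexes quoted in this thread and the post's simple one against a few addresses. The constant names and the `verdicts` helper are made up for illustration, and the plus-sign address in the usage note is an extra example, not one from the comments.

```python
import re

# The stricter regex with a 2-3 letter TLD limit.
STRICT_2_3 = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)
# The same regex with the upper limit on the TLD removed.
STRICT_2_UP = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$"
)
# The simple regex from the post.
SIMPLE = re.compile(r"^[^@ ]+@[^@ ]+\.[^@ ]+$")


def verdicts(address):
    """Return (strict 2-3, strict 2+, simple) match results as booleans."""
    return tuple(
        bool(pattern.match(address))
        for pattern in (STRICT_2_3, STRICT_2_UP, SIMPLE)
    )
```

With these definitions, `verdicts("me@jordan.name")` and `verdicts("you@eu.travel")` both give `(False, True, True)`, while an address with a plus sign in the local part, such as `ned+blog@example.com`, gives `(False, False, True)`: every character class you enumerate is another place for a valid address to be rejected.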
The whole point of this post is that we can neatly side-step the problem by using a simple regex that accounts for 99.9% of the real emails in use, and give the remaining .1% a button that says, "I know better than your stupid regex does, just let me use the email address".