Ned Batchelder
Email validation again
January 2006
OK, so I'm a liar.
On Thursday, I updated my email validation code in response to a problem a reader was having with it. In the comments, as I expected, I was chastised for excluding some valid but unlikely addresses. I explained why I was not going to support those addresses. Then I went ahead and supported them.
Here's what I said to defend not supporting addresses with spaces in them:
As it turns out, I am interested enough in supporting esoteric forms that I went ahead and did it. I guess once the idea of doing a better job was planted by Rik and Ben, I couldn't resist. It was like an Everest to climb because it was there. So now quoted email addresses and escaped characters are accepted. Here's the current code:
One of the difficulties in writing code like this is just wading through the dense RFCs that define the syntax. A document pointed to in the comments by Ben Finney was very helpful: RFC 3696 summarizes the rules in English.
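As a rough illustration of the rules RFC 3696 summarizes (this is not the code from this post), here is a minimal Python sketch. The names `ATEXT`, `UNIT`, `EMAIL_RE`, and `split_email` are hypothetical, and requiring at least one dot in the domain is an assumption of the sketch rather than something the RFC demands.

```python
import re

# Characters RFC 3696 allows unquoted in the local part.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# One unquoted unit: an allowed character, or a backslash-escaped character.
UNIT = r"(?:" + ATEXT + r"|\\[^\r\n])"
# Dot-separated runs of units: no leading, trailing, or doubled dots.
DOT_ATOM = UNIT + r"+(?:\." + UNIT + r"+)*"
# Quoted form: anything except a bare quote or backslash, plus escapes.
QUOTED = r'"(?:[^"\\\r\n]|\\[^\r\n])*"'
# Domain label: letters, digits, hyphens; no hyphen at either end.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# Assumption: at least two labels, so "joe@example" is rejected.
DOMAIN = LABEL + r"(?:\." + LABEL + r")+"

EMAIL_RE = re.compile(
    r"(?P<local>" + DOT_ATOM + r"|" + QUOTED + r")@(?P<domain>" + DOMAIN + r")"
)


def split_email(address):
    """Return (local, domain) if the address fits the sketch, else None."""
    m = EMAIL_RE.fullmatch(address)
    if m is None:
        return None
    local, domain = m.group("local"), m.group("domain")
    # RFC 3696 limits: 64 octets for the local part, 255 for the domain.
    # Counting characters here is an approximation of counting octets.
    if len(local) > 64 or len(domain) > 255:
        return None
    return local, domain
```

With this sketch, `split_email('"joe smith"@example.com')` returns the quoted local part and the domain as a pair, while `split_email("joe smith@example.com")` returns `None` because the space is neither quoted nor escaped.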
Invaluable while making changes to impenetrable regular expressions are unit tests which both prove that the code works properly, and prove that the code still works properly. That is, they serve both as functional tests and regression tests. I wrote some of those too, so I really think this code works:
OK: joe@example.com matches: local is joe, domain is example.com
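A table of addresses and expected outcomes is one simple way to get that kind of functional-plus-regression coverage. The sketch below is only an illustration: `EMAIL_RE` is a compact, hypothetical stand-in for the validator under test, and `CASES` and `report` are made-up names. The output format imitates the result line above.

```python
import re

# Hypothetical stand-in for the validator under test: a quoted or
# dot-separated local part (backslash escapes allowed) and a dotted domain.
_UNIT = r"(?:[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]|\\[^\r\n])"
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
EMAIL_RE = re.compile(
    r'(?P<local>' + _UNIT + r'+(?:\.' + _UNIT + r'+)*|"(?:[^"\\\r\n]|\\[^\r\n])*")'
    r'@(?P<domain>' + _LABEL + r'(?:\.' + _LABEL + r')+)'
)

# (address, should_match) pairs. Each bug report becomes a new row,
# so old fixes keep being checked after every change to the pattern.
CASES = [
    ("joe@example.com", True),
    ('"joe smith"@example.com', True),
    ("joe\\ smith@example.com", True),
    ("joe smith@example.com", False),
    ("joe..smith@example.com", False),
    ("joe@example", False),
]


def report(address, should_match):
    """Return one result line for a single test case."""
    m = EMAIL_RE.fullmatch(address)
    status = "OK" if (m is not None) == should_match else "FAIL"
    if m is None:
        return f"{status}: {address} doesn't match"
    return (
        f"{status}: {address} matches: "
        f"local is {m.group('local')}, domain is {m.group('domain')}"
    )


if __name__ == "__main__":
    for address, should_match in CASES:
        print(report(address, should_match))
```

Keeping the cases in a plain list makes the regression half cheap: a failing address reported by a reader goes into `CASES` once and is rechecked on every run.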
By the way: I am not validating these email addresses so that I can be sure mail will be delivered. Unless you ask for email notifications, I never send email to these addresses. I validate them to prevent spam and discourage anonymous comments. And yes, I know lame-o validation is a weak defense.
Any more complaints?