Wednesday 2 September 2009 — This is more than 15 years old. Be careful.
Trent Mick wrote to me some time ago asking for a feature on this blog: could I make it so that email notifications of blog comments would thread together nicely?
The email subject lines from my notifications look like this:
A comment on “Weird URL data encoding” from Richard Schwartz
I use Thunderbird for email, and don’t thread my inbox, so I never considered threading. Trent sent along information from a friend which said that “References:” headers were the key that would make a set of emails into a single thread.
I hacked for a little while, and could not get them to thread. I created a fake message id from the blog post and gave every comment notification a References header containing that id. No threading. I added unique Message-ID headers to each comment, then made subsequent comments list all previous message ids in a References header. No threading.
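For concreteness, here is a minimal sketch of the header scheme described above, using Python's standard email package. The function names, the id format, and the example.com domain are all made up for illustration; this is not the blog's actual code, and as noted, this arrangement did not produce threading in my tests.

```python
from email.message import EmailMessage
from email.utils import make_msgid

DOMAIN = "example.com"  # hypothetical domain, stands in for the blog's own


def post_message_id(post_slug):
    # A stable, fake Message-ID derived from the blog post (assumed format).
    return f"<post.{post_slug}@{DOMAIN}>"


def comment_notification(post_slug, post_title, author, body, previous_ids):
    """Build one notification; previous_ids are the Message-IDs already sent."""
    msg = EmailMessage()
    msg["Subject"] = f'A comment on "{post_title}" from {author}'
    # Every notification gets its own unique Message-ID.
    msg["Message-ID"] = make_msgid(domain=DOMAIN)
    # References lists the fake post id, then every earlier notification.
    msg["References"] = " ".join([post_message_id(post_slug), *previous_ids])
    msg.set_content(body)
    return msg


first = comment_notification(
    "weird-url", "Weird URL data encoding", "Richard Schwartz", "First!", []
)
second = comment_notification(
    "weird-url", "Weird URL data encoding", "Another Reader", "Second.",
    [str(first["Message-ID"])],
)
```

The second message's References header carries both the fake post id and the first notification's Message-ID, which is the chain that header-based threading is supposed to follow.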
I tried the same in Gmail, and nothing seemed to thread the messages together. Googling around, it seemed others had come to the conclusion that only the subject line matters. Apparently if two messages have the same subject (plus or minus some “Re:” prefixes), then they are in the same thread.
But what is the actual algorithm? I know that there can be differences in the subject lines (“Re:” and all). What are these mail clients doing to decide that two messages are in a thread?
I like having the author name in the subject line because it makes the Inbox listing richer. But it’s also what’s keeping these messages from threading. Is there a way to get the best of both worlds?
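To see why the author name gets in the way, here is a rough guess at what subject-only grouping might look like. It is not any mail client's real algorithm; the prefix list and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict

# Leading "Re:", "Fw:" and "Fwd:" prefixes, possibly repeated (an assumed list).
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    # Strip reply/forward prefixes, then ignore case and outer whitespace.
    return _PREFIX.sub("", subject).strip().lower()


def group_by_subject(subjects):
    # Messages whose normalized subjects match land in the same "thread".
    threads = defaultdict(list)
    for subject in subjects:
        threads[normalize_subject(subject)].append(subject)
    return dict(threads)


threads = group_by_subject([
    "Lunch on Friday?",
    "Re: Lunch on Friday?",
    "RE: re: lunch on friday?",
    "A comment on “Weird URL data encoding” from Richard Schwartz",
    "A comment on “Weird URL data encoding” from Someone Else",
])
```

Under this scheme the three lunch messages collapse into one thread, but the two comment notifications stay apart, because the author name makes their subjects differ.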
I know I’ve seen threads in Thunderbird where the subject line changes completely mid-thread. Is that because they have In-Reply-To headers? Comment notifications aren’t replies to each other, but maybe that’s a way to force threading?
Comments
Since it's only for display, you don't even need to change the routing part of the email address (foo@bar) if that matters, just the display part: "Blog Daemon" <blogdaemon@nedbatchelder.com> becomes "Richard Schwartz" <blogdaemon@nedbatchelder.com>.
Also, some MUAs can change their threading heuristics. Mutt, for example, uses the standard threading headers, but can also try to gather threads together by looking at Subject: lines. That's only used as a last resort, though. Not sure about other MUAs.
https://wiki.mozilla.org/MailNews:Message_Threading
The bottom line is that Thunderbird 3.0 (currently nearing beta 4 release) honors the References/In-Reply-To headers a lot. This includes both single-folder views and cross-folder views. (Previously, threading was not possible in cross-folder saved searches.)
Thunderbird 2.0 was crazy for subject threading, although it was capable of doing references threading acceptably if so configured. (But not as well as 3.0 is/will be.)
For example, I can't count the number of friends/family I know who use their Inbox as an address book... ask them to invite "Fred" to lunch and the first thing they do is find an email Fred sent to them - any one, it doesn't matter - and hit "Reply". They then delete the subject and body, and type their "new" message. Never mind that it likely includes headers that link it to Fred's original message.
If you're writing an email client like, oh, Gmail, what do you do in this case? Should you honor the headers in the email, even though the subject and body have *nothing* to do with the original Fred message? Talk about confusing! You'll have millions of users bitching at you for burying messages in threads that, to the user, are completely unrelated.
Given this imperfect user behavior, wouldn't you be better off ignoring the mail headers or, at most, using them as "hints" and instead look to the Subject line as the definitive measure of threading?
Anyhow... Cory's suggestion is the one that first came to mind for me. It's also how other forum products, like Google Groups and Google Code, maintain threading with their email comment notifications.
But if you do spoof the From header, be sure to also set the Sender header to your own address or to a stand-in address for your blog. See RFC 5322, section 3.6.2: http://tools.ietf.org/html/rfc5322
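As a rough illustration of the two From-header suggestions in these comments, here is a small sketch using Python's standard email package. The function name, the subject wording, and the commenter address are hypothetical; only the blogdaemon address and the "Blog Daemon" display name come from the comment above.

```python
from email.message import EmailMessage
from email.utils import formataddr

BLOG_ADDRESS = "blogdaemon@nedbatchelder.com"


def notification(post_title, commenter_name, commenter_email=None):
    msg = EmailMessage()
    # The subject no longer names the commenter (assumed wording), so every
    # notification for a post shares one subject line.
    msg["Subject"] = f"A comment on {post_title}"
    if commenter_email is None:
        # Display-name-only variant: the address itself stays the blog's.
        msg["From"] = formataddr((commenter_name, BLOG_ADDRESS))
    else:
        # Spoofed From: name the real transmitter in Sender, as RFC 5322
        # section 3.6.2 describes.
        msg["From"] = formataddr((commenter_name, commenter_email))
        msg["Sender"] = formataddr(("Blog Daemon", BLOG_ADDRESS))
    return msg


display_only = notification("Weird URL data encoding", "Richard Schwartz")
spoofed = notification(
    "Weird URL data encoding", "Richard Schwartz", "richard@example.com"
)
```

In the display-name-only variant no Sender header is added, since the From address is already the blog's own.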