A few weeks ago, we had a baffling problem with our web application: some JSON responses were being gzipped incorrectly. I asked about it on Server Fault: Incorrect gzipping of http requests, can’t find who’s doing it.
The final resolution was that Akamai was gzipping the response, and adding a “Content-Encoding: gzip” header. But we’d already put a “Content-Encoding: identity” header on the response; the browser saw both, attended only to the first (identity), and couldn’t interpret the gzipped gibberish it saw in the content.
It turns out we aren’t supposed to use “Content-Encoding: identity” on responses at all (the HTTP spec only allows “identity” in Accept-Encoding headers), and removing it from our JSON code solved the problem.
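For concreteness, here’s a minimal sketch of the shape of the fix in a WSGI-style handler. This is illustrative only, not our actual application code:

```python
import json

def json_response(data):
    """Build a JSON response. A hypothetical sketch, not our real code."""
    body = json.dumps(data).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # The fix: we no longer send the header below. "identity" is only
        # legal in Accept-Encoding, and once Akamai appended its own
        # "Content-Encoding: gzip", browsers honored ours and choked on
        # the gzipped body.
        # ("Content-Encoding", "identity"),
    ]
    return "200 OK", headers, body
```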
But there was a mystery remaining: Akamai also adds an “X-N: S” header to the response. What the heck is that?
A friend has friends at Akamai, and sent them the question. Back came the answer:
A long time ago, when there was a browser called “Netscape”, :-) there was a bug that prevented embedded images from rendering if the HTTP headers were exactly some length. (If the terminating \r\n begins on character 256, 257, or 258.) So if the header size is in this range the Akamai server adds that header...
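I don’t have Akamai’s code, of course, but from that description the alignment check might look something like this sketch (positions are 1-based, matching the quote; the function name and structure are my own guesses):

```python
def align_head(head: bytes) -> bytes:
    """Pad an HTTP response head so its terminating CRLF doesn't begin
    at character 256, 257, or 258. A reconstruction from the description
    above; not Akamai's actual code."""
    assert head.endswith(b"\r\n\r\n")
    # 1-based position at which the final (blank-line) CRLF begins:
    pos = len(head) - 1
    if pos in (256, 257, 258):
        # Inserting "X-N: S\r\n" (8 bytes) before the blank line pushes
        # the terminator out to position 264-266, clear of the bug.
        return head[:-2] + b"X-N: S\r\n\r\n"
    return head
```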
Wow, talk about bug workarounds encased in amber. That’s a really old bug, and code is still trying to sidestep it. A quick Google search shows that other web intermediaries have added headers to fix it too: Apache used to send X-Pad, and WebSTAR sent X-BrowserAlignment.
I doubt the affected browser is even out there in the wild any more, but Akamai is still adding this header to responses, plugging away a decade later. It’s astounding to think of the labyrinth of special checks and bug adaptations in software like this, the extra cycles expended in the name of obsolete components that are no longer even listening on the other end.
The problem of course is that once you’ve added code like this, how can you be sure it’s safe to remove? Who’s even checking over the code to consider that it might be safe? Accommodations like this get in the code and generally never come out, though Apache removed theirs.
One last micro-mystery: what do the N and S mean in “X-N: S”? I’m betting on “Netscape Sucks”!
Comments
The only thing more fun than completing a feature you have been working on for 3 years: being able to finally rip it out of the code base. No, this is not a joke; it is actually quite cathartic.
While neighbourly, I'd argue that it's bad engineering. A hard fail (Internet Protocol test suites, anyone? :-) results in bugs getting fixed. If a system 'seems to work', it often doesn't. It's just human (and corporate) nature.
An example of the waste this sort of thing engenders is the time my company spends working around Jurassic bugs in the browsers that run our markup and code. We do web applications, and a substantial part of the budget on each project is spent on 'backwards compatibility'. Libraries and experience reduce the cost, but it's still time that would be better spent 'adding value'. I shudder to think how much time and energy is wasted globally...
@Leon, yes, we'd have a more efficient streamlined ecosystem if the web had adopted an XML-like strictness from the beginning, but would we have as large an ecosystem? What would the adoption of the internet have looked like if many pages displayed error messages instead of a partial rendering? I think the loose connection between browsers and servers has allowed growth that wouldn't have happened under a different strategy. This is especially true when you consider the extension of the protocols. Would we ever have started putting images in web pages if the first ones displayed error messages in the older browsers that didn't implement the img tag? Graceful (or not) fallback has allowed us to build out HTML and other capabilities.
Two reasons why our researchers get mad, both of which boil down to 'I want to do research, not make any modifications for your unimportant code management reasons. Code management is your problem, not mine.' Yes, that is a direct quote.
1. When we remove an old file format which they no longer need and which is a problem for code support or for implementing new features, the old files need to be upgraded. This is agreed upon by Research management, who are all also researchers. Anyone with an old experiment using an old version of the file needs to use the existing version of the research interface to upgrade their files before moving to the new release. We warn them well in advance, have deprecation warnings, and tell them to upgrade their files; but they never do. They get mad when that deprecation warning becomes an error and they now need to go back to the previous release and run an upgrade. It is much worse if the upgrade requires making decisions; that's very rare, but it can happen. Their job is to do research, and all these engineering hoops piss them off to no end. Why not just support all the old formats going back in time? It would save them the effort of running an upgrade, or even of making a 'mrec.SaveUser()' function call that saves with an auto-upgrade.
2. Researchers love to copy the parameters that other people's experiments use, without bothering to determine whether a parameter is even needed for their own experiment. They do not bother looking at the deprecation warnings. So when everyone agrees to remove a feature based on a parameter, and we remove the parameter after it has been deprecated for 10 releases, they get an error saying that it was removed on XXX in release YYY, with either a set of instructions on what to do or a URL to the release notes (a sketch of that deprecate-then-error pattern is below). But what the researcher sees is that their experiments are failing and it's our fault.
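Roughly like this; a hypothetical sketch, with made-up parameter names and a placeholder URL standing in for the XXX/YYY placeholders above:

```python
import warnings

# Hypothetical registries; the names, dates, releases, and URL are made up.
DEPRECATED = {"old.param": "new.param"}
REMOVED = {"older.param": ("XXX", "release YYY", "https://example.com/notes#YYY")}

def check_param(name):
    if name in REMOVED:
        date, release, url = REMOVED[name]
        # After enough releases of warnings, the parameter becomes a
        # hard error that points at the release notes.
        raise ValueError(f"{name!r} was removed on {date} in {release}; see {url}")
    if name in DEPRECATED:
        warnings.warn(
            f"{name!r} is deprecated; use {DEPRECATED[name]!r} instead",
            DeprecationWarning,
            stacklevel=2,
        )
```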
It is one thing to agree that the old way of doing the best pel calculation is wrong and that we need to switch to the new system based on a log value, not a score/prob value; it is another for every person in a 600+ person research organization to understand the long-tail impacts that a minor change can have. There is a cognitive dissonance between the research mindset, cultivated from freshman year in college all the way through a doctorate, and the software 'engineering' mindset, cultivated through years of code and release management.
It is a social engineering problem which I have seen at every large company with a significant R:D ratio that I have been involved with. Talking with the folks over at IBM, we are in pretty good shape overall.
Bonus: IT upgrades the version of something like SciPy, which the researchers have been asking for because some new feature or bug fix is holding them back and costing them time, and then it is our fault that they have to change their scripts when an API in that package has changed. It's the core speech recognition engine team's fault that IT did what they asked; I love that one. A few people do not understand that 'python' the language and tools is different from the 'python' engine interface, since it is all just python. This problem is very rare, and stems from confusion over which things are really 'engine' features and which are 'python' features. The same thing is seen in the Django project, where there will be issues with improper use of module X, which is not part of Django; but because the person's only experience with python is 'Django', they do not realize the difference.
The biggest problem is really #2. We have fairly good data package management now, and it is only researchers who do not follow the research guidelines who run into problems there. The parameter problem is much bigger, because while we have parameter set files, it is very easy and convenient to just set a parameter in a script somewhere, and many scripts are not properly version controlled because they are 'just to try something out'. Even when they are in version control, it is a DVCS with tons of branches, and you never know which are actually still active. The research mentality is to keep everything, as it might be useful at some point in the future.
Thanks for the chance to rant ;-) it was quite cathartic.
However in the case at hand, I think competition and pragmatism are responsible rather than Postel's Law. Netscape's browser was released, people were using it, and to users and customers it would appear that the server was broken. If one vendor's server worked around the problem and others didn't, it would be a point in favor of the ones that did.
@Ned, defining a protocol strictly (and enforcing adherence) doesn't prevent extensibility, rather it increases the freedom of the designer to extend because the rules are well-understood.