It’s funny how things happen. I was at the Boston Python Meetup the other night, and one of the things we got talking about was the intricacies of Unicode, including the Byte Order Mark (BOM). The BOM is a “character” in the Unicode standard, U+FEFF. It doesn’t render as anything (it is considered a Zero-Width Non-Breaking Space, or ZWNBSP). Its purpose in life is to be a telltale indicator of the endianness of a UTF-16 text file.
If you read the first two bytes of a Unicode file, and they are 0xFF 0xFE, then you know that it is UTF-16, little-endian (low-order byte first). If they are 0xFE 0xFF, then you know it is UTF-16, big-endian.
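If you want to do that sniffing in code rather than by eyeballing a hexdump, here is a minimal Python sketch using the BOM constants from the standard `codecs` module. The function name `sniff_bom` is my own, and it is only a heuristic: a BOM tells you how a file was probably written, not what it must be. One wrinkle it handles is that the UTF-32 little-endian BOM (0xFF 0xFE 0x00 0x00) starts with the same two bytes as the UTF-16 little-endian one, so the longer BOMs are checked first.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so testing UTF-16 first would misidentify it.
# (Heuristic caveat: UTF-16 LE text whose first character is U+0000 would
# also match the UTF-32 LE pattern.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # bytes FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # bytes FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


if __name__ == "__main__":
    data = "\ufeffhello".encode("utf-16-le")  # BOM, then little-endian text
    result = sniff_bom(data)
    print(result)  # ('utf-16-le', 2)
    encoding, skip = result
    print(data[skip:].decode(encoding))  # hello
```

Once you know the encoding and the BOM length, you skip those bytes and decode the rest with an explicit codec.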
So we were talking about this on Thursday night, and about how funky things can happen, and how, to diagnose them correctly, you have to grok all this BOM stuff (not to mention other things like UTF-8, UTF-16, and so on).
OK, so the next day, I’m setting up a Remote Desktop Connection on my Windows box, and the video settings offer me 1280×1024 or full-screen. But I want 1400×1050. I figure I’ll create the .rdp file at 1280×1024, then open it up, find the display resolution and change it. I open the .rdp file, and what do you know, it’s text! I edit the text, save the file, double-click the .rdp file, and nothing happens. Huh?
Then the previous night’s conversation comes back to me. Sure enough, when I hexdump the edited .rdp file and an unedited one, the original has a BOM, and the edited one does not. Both are legit UTF-16, both are little-endian, but the one with the BOM works, and the one without does not. Now that is not proper Unicode support, but at least I understand what went wrong. I open the file again in TextPad, use the document properties to instruct it to write the BOM, save the file again, and everything works gloriously.
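I fixed it by hand in TextPad, but the same edit could be scripted. Below is a hedged Python sketch, not what I actually did: it assumes the file is UTF-16 little-endian (with or without a BOM), and it assumes the resolution lives in lines named `desktopwidth:i:` and `desktopheight:i:`, so check your own .rdp file before trusting those names. The helper `set_rdp_resolution` is hypothetical. The important part is the last line, which writes the BOM explicitly instead of relying on an editor to remember it.

```python
import codecs
import re
from pathlib import Path


def set_rdp_resolution(path, width, height):
    """Hypothetical helper: rewrite an .rdp file's resolution, keeping a BOM."""
    raw = Path(path).read_bytes()
    # Assumption: the file is UTF-16 LE. Drop its BOM if present so the
    # decoded text doesn't start with a stray U+FEFF.
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]
    text = raw.decode("utf-16-le")
    # Assumed field names; (?m) lets ^ match at the start of every line.
    # \d+ stops before the \r of a CRLF line ending, so endings survive.
    text = re.sub(r"(?m)^desktopwidth:i:\d+", f"desktopwidth:i:{width}", text)
    text = re.sub(r"(?m)^desktopheight:i:\d+", f"desktopheight:i:{height}", text)
    # Write the BOM (FF FE) explicitly, followed by the little-endian body.
    Path(path).write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

Working on bytes rather than opening the file in text mode keeps the CRLF line endings exactly as they were, so the only differences in the output are the two numbers and a guaranteed BOM.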
Ah, Unicode. It makes life so much simpler, doesn’t it?