Joel recently wrote about Unicode: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), and I have posted pointers about this before. Here’s my own tip, which I learned from the internationalization (i18n) guy at Iris.
Create some test data to use when testing your software for Unicode goodness. If you don’t speak a language that uses interesting characters, create some text that uses those characters but still looks like English. For example, at work I used these strings (the first uses only single-byte characters, but from the upper half of the code page; the second uses two-byte characters):
Obviously, these don’t “say” Kubi in any real way. They just look to English speakers as if they do. This solves a problem with testing: if your developers or testers don’t know what the right result looks like, how will they notice a wrong one? If you use strings like these (or better, ones related to your own work), it will be obvious when the output goes even a little bit wrong.
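If you want to generate strings like these rather than type them by hand, one approach is to swap ASCII letters for accented look-alikes. Below is a minimal Python sketch of that idea, assuming two hypothetical mapping tables of my own choosing: SINGLE_BYTE uses only Latin-1 characters (so the result fits in a single-byte code page such as cp1252), and WIDE uses characters outside that range, standing in for the "two-byte" case. The names pseudo_localize and survives are illustrative, not from any library. It also shows the kind of garbling these strings make easy to spot: UTF-8 bytes decoded as cp1252.

```python
# Hypothetical look-alike tables; the specific letter choices are illustrative only.
# SINGLE_BYTE: Latin-1 characters (U+00A0..U+00FF), all representable in cp1252.
SINGLE_BYTE = {
    "A": "Å", "C": "Ç", "E": "É", "I": "Ï", "N": "Ñ", "O": "Ö", "U": "Ü", "Y": "Ý",
    "a": "å", "c": "ç", "e": "é", "i": "ï", "n": "ñ", "o": "ö", "u": "ü", "y": "ý",
}

# WIDE: characters above U+00FF that are NOT in cp1252, so they need a wider encoding.
WIDE = {
    "K": "Ķ", "a": "ā", "b": "ƀ", "c": "ć", "e": "ē",
    "i": "ĩ", "n": "ń", "o": "ō", "s": "ş", "u": "ū",
}


def pseudo_localize(text: str, table: dict[str, str]) -> str:
    """Replace each mapped letter with its look-alike; leave other characters alone."""
    return text.translate(str.maketrans(table))


def survives(text: str, encoding: str) -> bool:
    """True if text round-trips through the given encoding without loss."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


if __name__ == "__main__":
    narrow = pseudo_localize("Kubi", SINGLE_BYTE)  # "Kübï"
    wide = pseudo_localize("Kubi", WIDE)           # "Ķūƀĩ"
    for s in (narrow, wide):
        print(s, "cp1252:", survives(s, "cp1252"), "utf-8:", survives(s, "utf-8"))
    # A classic bug: UTF-8 bytes decoded with the wrong code page.
    print(narrow.encode("utf-8").decode("cp1252"))  # "KÃ¼bÃ¯", obviously wrong
```

Because every tester knows the string should read "Kubi", the garbled "KÃ¼bÃ¯" stands out immediately, which is the whole point of the technique.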