Unicode test strings

Friday 17 October 2003

Joel recently wrote about Unicode: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), and I have posted pointers about this before. Here’s my own tip, which I learned from the internationalization (i18n) guy at Iris.

Create some test data to use when testing your software for Unicode goodness. If you don’t speak a language that uses interesting characters, create some text that uses those characters but looks like it’s in English. For example, at work I used these strings (the first uses single-byte characters from the upper half of the code page; the second uses two-byte characters):

«küßî»

“ЌύБЇ”
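Here is a minimal Python 3 sketch that checks the byte widths by encoding both strings and counting bytes. It is my own illustration, not part of the original tip. The helper name `encoded_lengths` is hypothetical, and the escapes assume the ύ is U+03CD GREEK SMALL LETTER UPSILON WITH TONOS.

```python
# The two test strings, spelled with escapes so the exact code points are unambiguous.
# Assumption: the "ύ" is U+03CD (GREEK SMALL LETTER UPSILON WITH TONOS).
LATIN_TEST = "\u00abk\u00fc\u00df\u00ee\u00bb"        # «küßî»
MIXED_TEST = "\u201c\u040c\u03cd\u0411\u0407\u201d"    # “ЌύБЇ”


def encoded_lengths(text, encodings=("cp1252", "utf-8", "utf-16-le")):
    """Map each encoding name to the byte length of text, or None if it can't be encoded."""
    lengths = {}
    for enc in encodings:
        try:
            lengths[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            lengths[enc] = None  # the code page has no slot for some character
    return lengths
```

`encoded_lengths(LATIN_TEST)` returns `{'cp1252': 6, 'utf-8': 11, 'utf-16-le': 12}`: every character fits in one Windows-1252 byte. `encoded_lengths(MIXED_TEST)` returns `{'cp1252': None, 'utf-8': 14, 'utf-16-le': 12}`: Windows-1252 cannot represent the Cyrillic and Greek letters at all, and each of those letters takes two bytes in both UTF-8 and UTF-16.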

Obviously, these don’t “say” Kubi in any real way. They just look to English speakers like they do. This solves a problem with testing: if your developers or testers aren’t familiar with what the right result is, how will they notice the wrong result? If you use strings like these (or better, ones tailored to your own work), it will be obvious when the output goes even slightly wrong.
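The same strings can also drive an automated check. Here is a minimal Python 3 sketch, with hypothetical helper names `survives_round_trip` and `mangle` (an illustration, not code from the original post). It simulates one layer writing text in one encoding and another layer reading it back in a different one, a classic source of garbled output:

```python
# Test strings spelled with escapes; assumes the "ύ" is U+03CD.
TEST_STRINGS = [
    "\u00abk\u00fc\u00df\u00ee\u00bb",      # «küßî»
    "\u201c\u040c\u03cd\u0411\u0407\u201d",  # “ЌύБЇ”
]


def survives_round_trip(text, write_encoding, read_encoding):
    """Encode text as the writer would, decode it as the reader would,
    and report whether it came back unchanged."""
    try:
        return text.encode(write_encoding).decode(read_encoding) == text
    except UnicodeError:
        return False  # an encode or decode failure is just as much a bug


def mangle(text):
    """Simulate a common bug: UTF-8 bytes read back as Windows-1252."""
    # errors="replace" substitutes U+FFFD for bytes cp1252 leaves undefined.
    return text.encode("utf-8").decode("cp1252", errors="replace")
```

Both strings survive a UTF-8 to UTF-8 round trip, but `mangle("«küßî»")` comes back as `Â«kÃ¼ÃŸÃ®Â»`, which anyone expecting to see "Kubi" will flag at a glance.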
