Faked translations: poxx.py
Ned Batchelder, December 2010
Internationalizing an application consists of two broad areas of work: marking all the human-readable text for translation, and then localizing (translating) it into whatever languages you want to support. The first phase is tricky because a string you've marked for translation still looks the same: there isn't a translation for it yet. So you start with an English application, do a bunch of work to find and mark all the strings, and, if you've done it right, end up with something that looks and behaves exactly the same.
This makes it difficult to know whether you've marked all the strings: the end result is precisely as if you had done nothing at all. If you miss a string, you won't find out until you get a translation back and try it out. And then you're looking at your application in a foreign language, which, if you are like me, means you don't understand what you're reading.
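To solve these problems on a recent project, I wrote this little script. Before describing it, let's review the mechanics of localizing an application, in this case a Django application:
1. Mark the strings in your code and templates for translation.
2. Run makemessages to extract the marked strings into a .po file for each language.
3. Translate the strings in each .po file.
4. Run compilemessages to compile the .po files into .mo files.
5. Run your application in the target language.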
I can do all of these steps myself except number 3. Step 3 is also the time-consuming one, and it will likely be done by someone far away from you: it's the difficult part. poxx.py replaces step 3. It munges a .po file, creating synthetic "translations" for your strings. With it, you can see your application in a pseudo-translated state that confirms you've properly marked strings for translation, and shows you where you haven't. I use poxx.py to create a translation for the language "xx" (hence the name poxx.py), then set my application to use language "xx".
What poxx.py does is create a "translation" by swapping the case of all the vowels. So where your English site shows "Please log in to comment," your poxx'ed site will show "PlEAsE lOg In tO cOmmEnt." You can still read the text, but translated and untranslated strings are easy to tell apart, all without needing an actual speaker of another language.
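The core transformation is tiny. Here is a minimal sketch of the vowel case-swap, assuming only the ASCII letters a, e, i, o, u count as vowels; the function name swap_vowel_case is hypothetical and not taken from poxx.py:

```python
def swap_vowel_case(text):
    """Swap the case of every ASCII vowel, leaving all other characters alone."""
    return "".join(ch.swapcase() if ch in "aeiouAEIOU" else ch for ch in text)


print(swap_vowel_case("Please log in to comment."))
# prints: PlEAsE lOg In tO cOmmEnt.
```

Because consonants, digits, and punctuation pass through untouched, the munged text keeps its length and stays readable.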
Most of the complexity in poxx.py arises because not all the text in a .po file is human-readable: HTML tags and data replacement tokens must be left alone. So it uses a simple HTML parser to find the pieces that will be displayed, and munges only those.
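One way to skip the non-human-readable parts is sketched below with the standard library's html.parser. It assumes the replacement tokens to protect are Python %-style placeholders such as %(name)s, %s, and %% (as found in Django .po files). The names TOKEN_RE, munge_text, and munge_html are hypothetical; this is an illustration of the idea, not poxx.py's actual code.

```python
import re
from html.parser import HTMLParser

# Assumption: tokens to protect are %-style placeholders like %(name)s, %s, %d, %%.
TOKEN_RE = re.compile(r"%(?:\([^)]*\))?[-#0 +]*\d*(?:\.\d+)?[sdifrx%]")


def swap_vowel_case(text):
    """Swap the case of every ASCII vowel."""
    return "".join(ch.swapcase() if ch in "aeiouAEIOU" else ch for ch in text)


def munge_text(text):
    """Munge plain text, but copy %-tokens through unchanged."""
    pieces, pos = [], 0
    for m in TOKEN_RE.finditer(text):
        pieces.append(swap_vowel_case(text[pos:m.start()]))
        pieces.append(m.group())  # the token itself is left alone
        pos = m.end()
    pieces.append(swap_vowel_case(text[pos:]))
    return "".join(pieces)


class _Munger(HTMLParser):
    """Rebuild the input, munging only the text between tags."""

    def __init__(self):
        # Report entities like &amp; as separate events so they are not munged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")  # simplification: tag name is lowercased

    def handle_data(self, data):
        self.out.append(munge_text(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def munge_html(text):
    parser = _Munger()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(munge_html('Hello <a href="%(url)s">%(name)s</a>!'))
# prints: HEllO <a href="%(url)s">%(name)s</a>!
```

The attribute value and both placeholders survive intact while the visible word is munged. Content inside <script> or <style> would still be munged, which is acceptable only because such markup rarely appears in translatable strings.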
It works great for me; I hope you find it useful too. You'll need polib installed as a prerequisite.