
Faked translations: poxx.py

Friday 31 December 2010

Internationalizing an application consists of two broad areas of work: marking all the human-readable text for translation, and then localizing (translating) it into whatever languages you want to support. The first phase is tricky because after you've marked a string for translation, it still looks the same, since there isn't yet a translation for it. So you start with an English application, do a bunch of work to find and mark all the strings, and what you end up with looks and behaves exactly the same, if you've done it right.

This makes it difficult to know that you've marked all the strings. The end result is precisely as if you had done nothing at all. If you miss a string, you won't find out until you get back a translation, and try it out. And then you're looking at your application in a foreign language, which if you are like me, means you don't understand what you're reading.

To solve these problems on a recent project, I wrote this little script. Before describing it, let's review the mechanics of localizing an application, in this case, a Django application:

  1. Edit all your source to mark strings for translation, with trans tags and gettext() function calls.
  2. Run the makemessages script to extract all the marked strings into .po files, one for each language.
  3. Have someone edit each .po file, entering translations for every string.
  4. Compile the edited .po files into binary .mo files that will be used during execution.
  5. Set the language for the app, and run it to see your awesome translated application!
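To see why step 1 is invisible on its own, here is a minimal sketch using Python's standard-library gettext module rather than Django's wrappers (an assumption made to keep the example self-contained; the sample strings are made up). With no catalog loaded, a marked string comes back exactly as it went in:

```python
import gettext

# NullTranslations is roughly what you have before any .mo file exists:
# it returns every message unchanged.
no_catalog = gettext.NullTranslations()
_ = no_catalog.gettext

marked = _("Please log in to comment")   # marked for translation
unmarked = "Post comment"                # forgotten: never passed through _()

# Both strings render as plain English, so nothing shows which one was missed.
print(marked)
print(unmarked)
```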

I can do all of these steps myself except number 3. Step 3 is also the time-consuming one, and it will likely be done by someone far away from you: it's the difficult part. This poxx.py replaces step 3. It munges a .po file, creating synthetic "translations" for your strings. With it, you can see your application in a pseudo-translated state that lets you know that you've properly marked strings for translation, and shows you where you haven't yet marked them. I use poxx.py to create a translation for the language "xx" (hence the name poxx.py), then set my application to use language "xx".

What poxx.py does is create a "translation" by swapping the case of all the vowels. So where your English site shows "Please log in to comment," your poxx'ed site will show "PlEAsE lOg In tO cOmmEnt." You can still read the text, but the translated and the un-translated stand out from one another, all without need for an actual speaker of another language.
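The vowel swap itself is tiny. Here is a sketch in Python 3 (the function name swap_vowel_case is made up for illustration and is not part of poxx.py):

```python
import re

def swap_vowel_case(text):
    """Flip the case of every ASCII vowel, leaving other characters alone."""
    return re.sub("[aeiouAEIOU]", lambda m: m.group(0).swapcase(), text)

print(swap_vowel_case("Please log in to comment"))
# prints: PlEAsE lOg In tO cOmmEnt
```

Because the swap is its own inverse, applying it twice gives back the original English.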

Most of the complexity in poxx.py arises from the fact that the text in a .po file is not all human readable: HTML tags and data replacement tokens should be left alone. So it uses a simple HTML parser to find the pieces that will be displayed, and only munges them.
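As a rough illustration of that idea, here is a simplified sketch for Python 3, where the parser lives in html.parser. The names TextOnlyMunger and munge are hypothetical, and unlike the full script this version copies tags through verbatim and never touches attribute values:

```python
import re
from html.parser import HTMLParser

PLACEHOLDER = re.compile(r"(%\(\w+\)s)")   # tokens such as %(name)s

def swap_vowels(text):
    return re.sub("[aeiouAEIOU]", lambda m: m.group(0).swapcase(), text)

class TextOnlyMunger(HTMLParser):
    def __init__(self):
        # convert_charrefs=False keeps entities like &amp; as separate events
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        # Splitting on a capturing group keeps the placeholders at odd indexes.
        for i, tok in enumerate(PLACEHOLDER.split(data)):
            self.out.append(tok if i % 2 else swap_vowels(tok))

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")

def munge(msgid):
    parser = TextOnlyMunger()
    parser.feed(msgid)
    parser.close()
    return "".join(parser.out)

print(munge("Hello <b>%(name)s</b>, welcome &amp; enjoy"))
# prints: HEllO <b>%(name)s</b>, wElcOmE &amp; EnjOy
```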

It works great for me, and I hope you find it useful too. You'll need polib as a prerequisite.

#!/usr/bin/env python
"""Munge a .po file so we English-bound can see what strings aren't marked 
for translation yet.

Run this with a .po file as an argument.  It will set the translated strings 
to be the same as the English, but with vowels in the wrong case:

    ./poxx.py locale/xx/LC_MESSAGES/django.po    

Then set LANGUAGE_CODE='xx' in settings.py, and you'll see wacky case for
translated strings, and normal case for strings that still need translating.

This code is in the public domain.

"""

import re, sys
import polib    # from http://bitbucket.org/izi/polib
import HTMLParser

class HtmlAwareMessageMunger(HTMLParser.HTMLParser):
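    def __init__(self):
        # The base class must be initialized before feed() can be used.
        HTMLParser.HTMLParser.__init__(self)
        self.s = ""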

    def result(self):
        return self.s

    def xform(self, s):
        return re.sub("[aeiouAEIOU]", self.munge_vowel, s)

    def munge_vowel(self, v):
        v = v.group(0)
        if v.isupper():
            return v.lower()
        else:
            return v.upper()

    def handle_starttag(self, tag, attrs, closed=False):
        self.s += "<" + tag
        for name, val in attrs:
            self.s += " "
            self.s += name
            self.s += '="'
            if name in ['alt', 'title']:
                # These attributes hold human-readable text, so munge them.
                self.s += self.xform(val)
            else:
                self.s += val
            self.s += '"'
        if closed:
            self.s += " /"
        self.s += ">"

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs, closed=True)

    def handle_endtag(self, tag):
        self.s += "</" + tag + ">"

    def handle_data(self, data):
        # We don't want to munge placeholders, so split on them, keeping them
        # in the list, then xform every other token.
        toks = re.split(r"(%\(\w+\)s)", data)
        for i, tok in enumerate(toks):
            if i % 2:
                # Odd tokens are the placeholders: copy them untouched.
                self.s += tok
            else:
                self.s += self.xform(tok)

    def handle_charref(self, name):
        self.s += "&#" + name + ";"

    def handle_entityref(self, name):
        self.s += "&" + name + ";"

def munge_one_file(fname):
    po = polib.pofile(fname)
    count = 0
    for entry in po:
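        hamm = HtmlAwareMessageMunger()
        hamm.feed(entry.msgid)  # parse the English source string
        entry.msgstr = hamm.result()
        if 'fuzzy' in entry.flags:
            entry.flags.remove('fuzzy') # clear the fuzzy flag
        count += 1
    print "Munged %d messages in %s" % (count, fname)
    po.save()   # write the munged entries back to the .po file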

if __name__ == "__main__":
    for fname in sys.argv[1:]:
        munge_one_file(fname)

Splitting Planet Python

Thursday 30 December 2010

For more than eight years, this blog has had a single RSS feed, with everything mixed together. It was simpler that way, but it meant that Planet Python aggregated all of my posts, regardless of their interest to a Python audience.

In keeping with Planet Python policy, I've at long last created a special feed for them to pull. So as not to completely banish serendipity from Pythonistas' lives, I haven't built the feed only around the python tag, but from a list of tags I think Pythonistas might reasonably be interested in. So I've included math, but not art, for example. JavaScript is in, PHP is out. No longer will you have to wade through cakes or politics just because you are interested in Django.

Of course, if you want the full range of topics, the original full feed is still here.

Made by Joel

Monday 20 December 2010

Years ago I wrote a page about making business card cubes, and I still get someone linking to it every few months. Usually, it's a blog post along the lines of, "I have all these left over business cards, so I made a cube." The most recent reference by Saaleha Idrees Bamjee said the same thing, but she made such beautiful colored puzzle cubes, I had to read the whole post.

In it, she linked off to Made by Joel, a blog by Joel Henriques, a toy designer from Oregon. Joel's got a great, simple, home-made esthetic, but still manages to rise above the pedestrian, giving his projects a lovely spirit. For example, look at his marionette made from wooden scraps, and then the simplified version for toddlers.

I love the fact that small children can be entertained with simple things, and that playfulness can be given expression with what's lying around, without recourse to a brand name toy.

To top it off, I see that Joel has a book in the works, with the same publisher and imprint that produced my wife Susan's books!

On my own

Sunday 19 December 2010

Friday was my last day at Hewlett-Packard. I'm now a freelancer.

It's been interesting to reflect on my time at HP. Generally, I don't like the large company dynamic. A big difference between Tabblo as a start-up and Tabblo as a technology group inside HP was the extent to which our attention was focused on HP itself rather than on the truly important external concerns. A lot of that was due to the shift in our work, but a lot of it was also because of how large organizations work. It's very easy to become preoccupied with internal issues that simply don't matter to customers.

This doesn't seem to be peculiar to HP; it's just a natural side-effect of trying to get tens or hundreds of thousands of people to act like a cohesive unit. There are some things you can only do once you've gotten that many people together, but I am not driven to do those kinds of things, so I choose not to work in that large a group.

But HP's particular blight of the last few years was obsessive cost-cutting, something that I hope was due to Mark Hurd, and that is now a thing of the past. Back in March, I wrote a blog post about a particularly bone-headed policy change. The worst part about these measures is the seduction of cutting visible costs at the expense of invisible ones:

These [cost-cutting] policies do nothing to improve the mood among HP employees, and they do nothing to make HP products better. Every one of them is a trade-off of the visible against the invisible, and the invisible that suffers is everything you want in a company: productivity, morale, loyalty, and innovation. It's hard enough to build great products, I don't need my employer, a giant profitable tech company, nickel-and-diming me to make it even harder.

That pressure inside the company made it very difficult to feel like HP and we were together in trying to build something great. More often, it felt like we shipped something in spite of the corporation.

I can't say I would have stayed at HP even if it had been run perfectly. I still will be glad not to be commuting, and I like the flexibility that freelancing gives me to be involved in more than one thing at a time. I'm also hoping that freelancing will allow me to stay focused on technical rather than organizational work.

So I'll be on my own for now. Of course, to mark the occasion of my last day, my family made a cake: it's me on a horse, riding into the sunset:

Cake: riding into the sunset

Shop-Vac allegory

Friday 3 December 2010

I'm fascinated by new technology being used in old ways, or old things being done in new ways, or something like that. A kinetic typography video brought strange associations to mind today.

Kinetic typography is a recent innovation, and one that may pass quickly. A passage of text is animated, with the type itself the subject of the animation. Many of them are flashy, but shallow. The words zip and zoom, but in the end, it's just a different way to display text.

Or so I thought. Today, I saw Jarrett Heather's adaptation of Jonathan Coulton's song Shop-Vac:

I loved this animation because of the numerous references sprinkled through it. It reminds me of traditional art, where symbols in a painting or a play were obvious to audiences of the day, but now must be explained to modern viewers. When we look at Shop-Vac, we see all sorts of visual connections that deepen the message of the song and draw us into its world. The word "mall" is clearly a reference to Macy's, the gourmet food store is Whole Foods, the paired Starbucks reference a Google Map navigated world of uniformly branded unfamiliar intersections, and so on.

100 years from now, this video would need a dozen footnotes, as Shakespeare plays do today. I don't think kinetic typography is going to become a serious art form, but you never know. In any case, Jarrett has created art with it here.

BTW: Jarrett is also a fabulous piano player: here he is playing a 14-minute medley of JoCo songs.
