
Spore creature creator and steganography

Wednesday 25 June 2008

Spore is the wildly anticipated new game from Will Wright, and Creature Creator is the first part of it to be released for us to try. It allows you to build creatures, Mr-Potato-Head-style, which will eventually be usable in the full game:

Three Spore creatures

It's fun to put arms and legs and body parts together to make creatures, but the more impressive part of the technology is that once you make your creature, it's fully animated already, with a repertoire of moves like walking, sitting, dancing and greeting. This is no small feat considering you aren't constrained to building a humanoid creature. For example, my tricyclotops has three legs, and that front centered leg participates in the animations in a way that seems very natural, considering I've never seen a creature with legs in that formation.

The developers behind this animation have written up the technology: Real-time Motion Retargeting to Highly Varied User-Created Morphologies. One of the authors, John DeWeese, has a handful of riotously varied creatures on his Spore page.

If you look at the sidebar on the Spore creature pages, you'll see instructions that you can save those PNG files, and drag them into Creator, and you'll have the creature. That interested me: one of the things I did with Aptus was to save the coordinate info for a picture as a tEXt record in the PNG file. Aptus can open a PNG file it saved, and instead of dealing with pixel data, can read the coordinates and recreate the Mandelbrot view directly, allowing you to continue exploring from there.
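The tEXt approach can be sketched with nothing but the standard library. This is a rough illustration of the PNG chunk layout (length, type, data, CRC), not Aptus's actual code: the keyword "aptus-view", the coordinate string, and the helper names are all made up for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, data):
    """Build one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def iter_chunks(png_bytes):
    """Yield (type, data) for each chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield ctype, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length   # 4 length + 4 type + data + 4 CRC

def text_records(png_bytes):
    """Collect tEXt chunks as a {keyword: text} dict (both are Latin-1)."""
    records = {}
    for ctype, data in iter_chunks(png_bytes):
        if ctype == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            records[keyword.decode("latin-1")] = text.decode("latin-1")
    return records

# A 1x1 RGBA image carrying a hypothetical "aptus-view" record.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0))
    + make_chunk(b"tEXt", b"aptus-view\x00center=-0.5,0.0 diam=3.0")
    + make_chunk(b"IDAT", zlib.compress(b"\x00" + b"\x00\x00\x00\x00"))
    + make_chunk(b"IEND", b"")
)

print([ctype for ctype, _ in iter_chunks(sample)])
print(text_records(sample))
```

Listing the chunk types this way is also a quick test of whether a PNG carries anything besides image data.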

Looking at the PNGs on the Spore page, though, I see they have not done this. There is no data other than the image. But dragging the PNG into Creator does indeed give you the creature as structured data. Renaming the PNG doesn't affect the data transfer, but any sort of editing of the image does. They're using steganography, hiding one message inside another. In this case, they seem to be using the least significant bits in all the pixels.

Some quick Python shows details. Using PIL, we can examine the numeric values of the pixels:

# Open an image, and show the RGBA data for the first twenty pixels.
import sys
from PIL import Image

im = Image.open(sys.argv[1])
for pix in list(im.getdata())[:20]:
    print(pix)

produces:

(0, 1, 0, 1)
(0, 1, 1, 1)
(0, 0, 1, 0)
(1, 1, 0, 0)
(1, 0, 0, 0)
(0, 0, 1, 0)
(1, 1, 0, 1)
(0, 0, 1, 0)
(1, 1, 1, 0)
(0, 1, 1, 0)
(1, 0, 1, 1)
(1, 0, 1, 0)
(0, 0, 0, 1)
(0, 1, 1, 1)
(0, 1, 0, 1)
(1, 0, 1, 1)
(0, 1, 1, 1)
(1, 0, 1, 1)
(0, 1, 0, 0)
(1, 1, 0, 1)

These pixels are part of the black transparent edge of the image, except it isn't truly black and it isn't truly transparent. There's one bit of information being encoded in each channel, or four bits per pixel.

We can go further and yank out the full 8-bit data:

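# Open an image file, read the low bits as 8-bit data,
# and write it out to a .out file next to the image.
import sys
from PIL import Image

def stegdata(imfile):
    """ Read the low bits of pixels as 8-bit data. """
    im = Image.open(imfile)
    data = bytearray()
    hi = 0
    for ipix, (r,g,b,a) in enumerate(im.getdata()):
        # One bit per channel gives a four-bit nybble per pixel.
        nyb = (r%2)*8 + (g%2)*4 + (b%2)*2 + (a%2)
        if ipix % 2:
            # Odd pixel: the low nybble, which completes the byte.
            data.append(hi + nyb)
            hi = 0
        else:
            # Even pixel: the high nybble of the next byte.
            hi = nyb*16
    return bytes(data)

fname = sys.argv[1]
data = stegdata(fname)
# Binary mode, so the bytes are written unchanged on every platform.
with open(fname+'.out', 'wb') as f:
    f.write(data)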

I had to guess here how to put the bits back together into a byte. The results are full-spectrum 8-bit data, but I don't know how to interpret it:

000000: 57 2c 82 d2 e6 ba 17 5b  7b 4d 20 32 76 eb f4 8b  W,.....[{M 2v...
000010: 4a b7 54 8b b6 9c a7 ba  d5 6a 5f a4 54 15 f1 1f  J.T......j_.T...
000020: c6 90 df 98 54 72 6d 62  58 71 69 f6 63 fe 7e 23  ....TrmbXqi.c.~#
000030: 49 97 de 81 d7 08 ec 5a  1a 63 57 e6 8e 27 16 03  I......Z.cW..'..
000040: 80 5c 56 a1 34 6b d8 fb  49 46 f9 d6 7b 32 ce 6b  .\V.4k..IF..{2.k
000050: a3 2c 35 4e f7 e7 52 1a  62 1f ce 8e 47 5f e7 ba  .,5N..R.b...G_..
000060: 14 ea 74 58 39 ac eb 53  ee c1 c8 3b cc 38 11 d6  ..tX9..S...;.8..
000070: fd 3f dd 41 ff 35 03 a3  67 c4 a6 43 1c 82 24 41  .?.A.5..g..C..$A
000080: b7 1d ce 66 5a 32 b3 f0  34 6b f3 0f 73 f9 ee f6  ...fZ2..4k..s...
000090: 05 41 56 7b 27 19 40 25  bc e7 b1 02 c9 43 e7 7d  .AV{'.@%.....C.}
0000a0: fd b4 11 82 52 1f c8 d0  3c ad 92 ee 1e 57 6d e7  ....R...<....Wm.
0000b0: ad fd 72 53 b3 fd 1a 9b  10 52 57 01 86 11 42 7c  ..rS.....RW...B|
0000c0: a2 74 ed f6 1b 28 33 cf  7a 19 79 fa b0 6c 04 a7  .t...(3.z.y..l..
0000d0: 89 36 c5 08 d8 ee e8 de  5a a3 b8 48 3d 94 62 0d  .6......Z..H=.b.
0000e0: 0a 38 4c 21 5d 15 b8 54  e1 ea d7 0b 12 bf 8a a0  .8L!]..T........
0000f0: a3 e0 96 1a a8 79 c3 44  62 9e de 02 ea a0 31 8d  .....y.Db.....1.
000100: 96 12 e0 7c ad e0 a5 9f  fe 89 54 a6 54 f2 9d 6c  ...|......T.T..l
000110: 42 c1 f0 14 8d 15 49 a5  d3 80 2c b1 26 ca af 80  B.....I...,.&...
000120: a8 cf a8 a4 77 02 60 ea  c0 d8 4d 2c d9 18 1e 67  ....w.`...M,...g
000130: 8f 9a 29 67 30 92 b5 62  da 1d c1 30 21 f8 eb 21  ..)g0..b...0!..!
000140: fe d8 c2 a6 64 cf 52 dc  58 d1 0c ef d0 60 fb 9b  ....d.R.X....`..
000150: 02 7a e9 d1 d6 a7 3c 01  79 7b da a7 9b 0b ef 3f  .z....<.y{.....?
000160: 80 a3 d1 87 d2 81 50 d1  a2 59 c0 65 c3 8b c5 7b  ......P..Y.e...{
000170: 8c e5 56 50 bf c2 6e 50  82 26 23 9a 76 2b e7 3b  ..VP..nP.&#.v+.;
000180: 4f 5a ec f3 87 aa 27 fe  33 74 40 48 ba db 4f 25  OZ....'.3t@H..O%
... ...
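As a check on the guessed bit order, the twenty pixels printed earlier can be packed by hand. This sketch repeats the nybble logic on a hard-coded list, so it needs no image file and no PIL; the helper name pack_low_bits is made up for the example.

```python
# The twenty RGBA pixels shown in the first listing's output.
PIXELS = [
    (0, 1, 0, 1), (0, 1, 1, 1), (0, 0, 1, 0), (1, 1, 0, 0),
    (1, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 1), (0, 0, 1, 0),
    (1, 1, 1, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 0, 1, 0),
    (0, 0, 0, 1), (0, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 1),
    (0, 1, 1, 1), (1, 0, 1, 1), (0, 1, 0, 0), (1, 1, 0, 1),
]

def pack_low_bits(pixels):
    """Pack the low bit of each channel: two pixels make one byte."""
    out = bytearray()
    hi = 0
    for i, (r, g, b, a) in enumerate(pixels):
        nyb = ((r & 1) << 3) | ((g & 1) << 2) | ((b & 1) << 1) | (a & 1)
        if i % 2:
            out.append(hi | nyb)   # odd pixel supplies the low nybble
        else:
            hi = nyb << 4          # even pixel supplies the high nybble
    return bytes(out)

packed = pack_low_bits(PIXELS)
print(" ".join("%02x" % byte for byte in packed))
```

The ten bytes it produces are 57 2c 82 d2 e6 ba 17 5b 7b 4d, the same as the first ten bytes of the dump, so the dump and the pixel listing agree on this high-nybble-first packing. Whether that is the order Spore itself uses is still a guess.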

Nothing jumps out at me here. As an exercise in code-breaking, this one is probably possible, since we have a way to generate as many cases as we need. I looked at other images, and there was no clear pattern.

It's interesting that Spore chose steganography here, since it's usually described as a way to hide a message so that its very existence is a secret. But there's no sensitive data here, and they tipped us off to its presence with their instructions. Perhaps they wanted to save space by using those unneeded low bits? Perhaps they didn't have the tools for manipulating tEXt records?

In any case, Spore is already a fertile breeding ground, both for wild new life forms, and geek interest in its technology.

Subversion's biggest hole

Saturday 21 June 2008

Subversion 1.5 has just been released, but it does nothing to fix what I consider to be Subversion's biggest hole. I know people deride svn for not being distributed, or for doing a bad job of merging, and both of those will be solved when everyone finally switches over to git, as I'm sure we all will eventually.

But I don't miss those things. What I miss in Subversion is a repository setting for globally ignored files. You can set a property on a directory to ignore (for example) *.pyc files. And you can set your client to always ignore *.pyc files in any working tree on your machine. But there is no way to set a repository to ignore *.pyc files anywhere they appear in the tree.

This is such a common need in software development, and is truly a property of the repository, not the client. Why are we still either touching every directory in the tree, or touching every client on the team?

And probably this will also be solved when we all switch over to git...

Reporting server reliability

Thursday 19 June 2008

The next time someone inquires about how reliable your system is, say this:

We're almost at five 9's: we're at five 8's!

If you don't know what I'm talking about, the Wikipedia article on uptime explains it pretty well: "Five nines" refers to a system being available 99.999% of the time, and is considered really good. "Five eights" would be 88.888% of the time, which would be horrible, and is in no way considered "almost" five nines.
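To put numbers on the gap, here is a small back-of-the-envelope calculation. It assumes a 365-day year, and the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years

def downtime_minutes_per_year(availability_percent):
    """Minutes per year a system is down at the given availability."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR

five_nines = downtime_minutes_per_year(99.999)
five_eights = downtime_minutes_per_year(88.888)

print("five nines:  %.2f minutes of downtime per year" % five_nines)
print("five eights: %.1f days of downtime per year" % (five_eights / (24 * 60)))
```

Five nines allows a little over five minutes of downtime a year; five eights allows roughly forty days.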

Python import helper

Wednesday 18 June 2008

Aptus has dependencies on three large packages: wxPython, NumPy, and PIL. The simple thing to do would be to import the modules and use the methods I need. But if the module is missing, an unhelpful ImportError message is all you get. And if the module is present, but isn't recent enough, then the method call may fail with a missing name.

To solve these problems, I use this helper function instead:

def importer(name):
    """ Import modules in a helpful way, raising detailed exceptions
        if the module can't be found or isn't the proper version.
    """
    if name == 'wx':
        url = "http://wxpython.org/"
        try:
            import wx
        except ImportError:
            raise Exception("Need wxPython, from " + url)
        if not hasattr(wx, 'BitmapFromBuffer'):
            raise Exception("Need wxPython 2.8 or greater, from " + url)
        return wx
    
    elif name == 'numpy':
        url = "http://numpy.scipy.org/"
        try:
            import numpy
        except ImportError:
            raise Exception("Need numpy, from " + url)
        return numpy
    
    elif name == 'Image':
        url = "http://pythonware.com/products/pil/"
        try:
            import Image
        except ImportError:
            raise Exception("Need PIL, from " + url)
        if not hasattr(Image, 'fromarray'):
            raise Exception("Need PIL 1.1.6 or greater, from " + url)
        return Image

Then, instead of "import wx", I use:

wx = importer("wx")

and if anything goes wrong, the exception includes helpful details.

This technique still suffers from the problem of detecting that the module is actually missing. Because of Python's impoverished exceptions, catching ImportError doesn't necessarily mean that the module was missing, although that's the most likely reason.
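On Python 3.6 and later, the missing-module case can be told apart, because the import machinery raises ModuleNotFoundError (a subclass of ImportError) and records the name it could not find. The sketch below is one way to use that in a table-free version of the helper; the function name require and its parameters are hypothetical, and it only handles top-level module names.

```python
import importlib

def require(name, attr=None, hint=""):
    """Import `name`, raising a descriptive error if it is absent or too old.

    `attr` is an attribute that only sufficiently recent versions provide;
    `hint` is free text (say, a download URL) appended to the message.
    """
    try:
        module = importlib.import_module(name)
    except ModuleNotFoundError as exc:
        if exc.name != name:
            # The module itself was found, but something it imports is
            # missing. That is a different problem, so let it through.
            raise
        raise RuntimeError("Need %s. %s" % (name, hint)) from exc
    if attr is not None and not hasattr(module, attr):
        raise RuntimeError("Need a newer %s (no %r). %s" % (name, attr, hint))
    return module
```

With this, the wx case would read wx = require("wx", attr="BitmapFromBuffer", hint="See http://wxpython.org/"). Other ImportErrors, such as a shared library failing to load, are not caught and keep their original message.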

Iron Man videos

Monday 16 June 2008

I saw Iron Man on Sunday, and enjoyed it. Here are two funny videos related to the film that have been making the rounds:

Bad web typography: full justify

Sunday 15 June 2008

Full justification on the web is usually a bad idea.

Typographers use full justification to get an elegant-looking block of type. The straight right edge is a strong visual element on the page, and can add to the controlled overall look. But typographers care about more than just the outline of the rectangle. They care about the evenness of the type within the rectangle, something they call "color". The goal is to get an evenly filled area, with no large changes in density.

Because full justification involves stretching word spaces, if a line has to be stretched too much, the spaces become wide enough to be noticeable white blobs on the page. The line of text is then "too loose", and interrupts the flow of reading.

In traditional typography, hyphenation is used to reduce the need to make loose lines. By breaking words into smaller chunks, the lines can be filled more naturally, and they don't have to be stretched too far.

But web browsers don't hyphenate. As a result, paragraphs often suffer. Here are some examples from the OpenID news for February 2008:

A full-justified paragraph

(I've blurred it a bit to emphasize the color.) This paragraph is OK, with just two problem lines: the fourth ("some of the top ...") and fourth from the bottom ("to support the community ..."). These lines are loose enough that I stumble when reading them, as if they were typed "some .. of .. the .. top .."

But then we come to the other problem with full justification on the web:

A full-justified paragraph with a URL in it

Occasionally URLs appear in paragraphs, and these are very large "words" that completely screw up the line before them. Technical writing is especially prone to this as other non-word content appears in running text, such as function names.
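The effect is easy to reproduce in a monospaced toy model. The sketch below is not how any browser lays out text; it greedily fills lines without hyphenation, then pads the word gaps to reach the target width. The sample sentence and URL are invented.

```python
def justify(text, width):
    """Greedy word-wrap without hyphenation, then pad gaps to fill `width`."""
    lines, current = [], []
    for word in text.split():
        # Start a new line if this word (plus single spaces) would overflow.
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)

    out = []
    for i, words in enumerate(lines):
        gaps = len(words) - 1
        if gaps == 0 or i == len(lines) - 1:
            # A lone word can't be stretched; the last line is left ragged.
            out.append(" ".join(words))
            continue
        extra = width - sum(len(w) for w in words)   # spaces to distribute
        base, remainder = divmod(extra, gaps)
        line = ""
        for j, word in enumerate(words[:-1]):
            line += word + " " * (base + (1 if j < remainder else 0))
        out.append(line + words[-1])
    return out

sample = "see the spec at http://example.com/a/very/long/path/to/doc for details"
for line in justify(sample, 30):
    print("|" + line + "|")
```

Because the URL can't share a line with anything, the four short words before it are spread across the full width with six spaces in each gap, which is exactly the loose line described above.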

I think full justification is one of those technology hold-overs: the new technology trying to mimic the old. Books and newspapers use full justification, so we try to do it on the web also. But content on the web rarely appears in a constrained rectangle. Full justification in print is appealing partly because the justified right edge of the text is a good echo of the right edge of the paper, or of the left edge of the next column in a newspaper. In a single column of text in the middle of a browser window, full justification isn't gaining you much, and brings you pain in the form of loose lines.

Except in specialized cases, or where you know very clearly what type of content will appear, you shouldn't use full justification on the web. The lack of hyphenation is a killer.

As it happens, there are browser-side hyphenation solutions, but they also have their drawbacks: code size and execution time.

BTW: it isn't just the web that suffers from hyphen-less justification. Amazon's Kindle has the same problem, something I noticed right away when I first tried one out. I'm not sure why they wouldn't have built hyphenation into a reading device. And I'm reading a Salman Rushdie book published by Penguin which uses no hyphenation. Why would a traditionally-published book forgo the tried and true technology of good-looking pages?

Pylint

Saturday 14 June 2008

Being a developer, I'm a sucker for rules to follow to improve my code, and for tools that can help me to follow them. Being a Python developer, I don't have a static type checking compiler to help me. Pylint aims to fill some of those gaps.

It examines your Python source code and reports on all sorts of things it doesn't like. Like most tools of this sort (its name is a reference to the classic lint tool for C), it can be annoyingly picky. Since its job is to flag things that might be a problem, it errs on the alarmist, noisy side.

Pylint tries to apply light type checking to methods and variables, so it will complain about constructs simply because they interfere with that goal:

W0142: 26:MyClass.my_method: Used * or ** magic

Excuse me, but ** is not magic, it's a powerful language feature. Reading pylint's warnings, you get the feeling that it won't be completely happy until you are coding within the intersection of Python and Java.

But pylint's best feature is its configurability. In the settings file, you can disable individual messages:

# Disable the message(s) with the given id(s).
disable-msg=C0103,R0903,W0142,C0324,R0201,C0111,W0232,W0702,W0703,W0201

and also configure all sorts of other settings. This is important because pylint also natters on about style issues: valid names, line length, number of statements per method, and so on. Pylint also lets you disable a particular message in specific files, classes, or methods, which is extremely useful for overriding warnings about tricky cases, or simple misdiagnoses.
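As a sketch of the per-class form, here is a local disable written as a trailing comment. It uses the disable= spelling and symbolic message name that recent pylint releases accept; the version discussed in this post spells the option disable-msg, as in the settings excerpt, and may want the numeric id instead. The class itself is invented for the example.

```python
class Size:  # pylint: disable=too-few-public-methods
    """A plain data holder; R0903 is silenced for this class only."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
```

The rest of the file is still checked for R0903, which is the point: the exception is recorded next to the code that needs it.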

As with every other lint-like tool I've ever used, the first order of business is deciding what you really want pylint to tell you about. Initially, its reporting will be about things that just don't matter, and you'll disable a ton of messages. Then you'll get to the things that you'll agree are minor issues and you'll want to clean up, like unused imports.

The next rung of messages is helpful because it gets you to think about the way you've written your code. For example, this code:

def my_method(self, arg1, extras=[]):
    # blah blah...

will get you this warning:

W0102:247:MyClass.my_method: Dangerous default value [] as argument

Pylint warns about this because you could append to extras in the body of the method, and that would modify the single list object that is used as the default value for every invocation of the method, something you almost certainly didn't intend. Changing the code to this avoids the possibility and the warning:

def my_method(self, arg1, extras=None):
    extras = extras or []
    # blah blah...

Whether you want to adopt this idiom uniformly, or stick with the more common extras=[] is something you'll have to decide. Pylint did you the favor of bringing it to your attention so that you could think through the issue and decide. In this case, you may be able to simply leave extras as None and use it as is in the body of the function, but you get the point.
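A short demonstration of what the warning is guarding against; the function names are made up for the example.

```python
def add_shared(item, extras=[]):
    # The [] is built once, when the def statement runs, and reused after that.
    extras.append(item)
    return extras

def add_fresh(item, extras=None):
    if extras is None:
        extras = []
    extras.append(item)
    return extras

first = add_shared("a")
second = add_shared("b")
# Both names refer to the one default list, which now holds both items.
print(first, second, first is second)

# Each call without an argument gets its own new list.
print(add_fresh("a"), add_fresh("b"))
```

This version tests for None rather than using extras or [], so that an empty list passed in by the caller is still the list that gets appended to.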

Occasionally, you'll get unambiguous value from the pylint output. I ran pylint on a large actively developed code base, and it reported on an instance of an undefined variable. I looked, and sure enough, that code shouldn't work. Digging further, I looked at who called that code, and once I was done pulling on all of the threads I discovered, I had a couple hundred lines of code that were not used any more, and I could delete them.

I don't know whether I'll stick with pylint. It's a tricky balance to get it set properly so that it warns about things I genuinely believe to be issues.

The other minor downside to pylint is that you have to install three separate packages to get it to work. Logilab would do well to provide a single installer for pylint and its dependencies.

BTW: There are other tools for static checking Python code, but I haven't used them recently: PyFlakes and PyChecker.

One of the worst decisions in history

Saturday 14 June 2008

Conservatives are wailing and wringing their hands over the Supreme Court's ruling that Guantanamo detainees are entitled to challenge their detention. John McCain hyperbolically called it one of the worst decisions in history.

I'm not sure why people (even conservatives) think this is so horrible. If, as John McCain asserts, these are "bad people", then judges will agree that they can be detained. It will take time to hear those cases, and the government will have to defend their detention. But if these men are as bad as we say, then judges will agree to hold them. We are not in danger here of some kind of Get Out of Jail Free card. Mention has been made a few times of their food being an issue in these cases. Oh, please. Let's give federal judges some credit here.

McCain said it was "terribly unfortunate because we need to go ahead and adjudicate these cases". We've been holding some of these people for six years. As near as I can tell, the only thing keeping us from adjudicating their cases is the extensive process the Bush Administration has undertaken to invent a parallel legal system outside the U.S. Constitution.

Perhaps it is no surprise, but I agree with what Obama said about the decision:

This is an important step toward re-establishing our credibility as a nation committed to the rule of law and rejecting a false choice between fighting terrorism and respecting habeas corpus.

Wordle

Friday 13 June 2008

Jonathan Feinberg has made a cool application: Wordle. You give it a corpus of text, and it creates a word cloud with size based on frequency. Here are the titles of all my blog posts over the years:

Wordle

Wordle automatically nestles all the words together, and when you create one, you can set parameters like font and layout algorithm. Very cool.
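The frequency-to-size step is simple enough to sketch. Wordle's real sizing and layout rules aren't described here, so the linear scaling, the point sizes, and the sample titles below are all assumptions.

```python
import re
from collections import Counter

def word_sizes(titles, min_pt=10, max_pt=48):
    """Map each word to a font size scaled linearly by how often it occurs."""
    words = re.findall(r"[a-z']+", " ".join(titles).lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    span = max(top - 1, 1)   # avoid dividing by zero when every count is 1
    return {
        word: min_pt + (max_pt - min_pt) * (n - 1) / span
        for word, n in counts.items()
    }

titles = ["Python import helper", "Pylint", "Python and Spore"]
for word, size in sorted(word_sizes(titles).items()):
    print("%-8s %.0fpt" % (word, size))
```

A real word cloud would also drop common words like "and" before sizing.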

The thing that surprised me about my title cloud is that it looks like a tag cloud, but these are actually words pulled from the titles, not the names of tags. Maybe all that manual tagging isn't worth it after all?

Classic photos in Lego

Thursday 12 June 2008

Photographer Mike Balakov has recreated a number of classic photographs in Lego.

Behind the Gare Saint-Lazare

(Here's the original.) The Lego constructions themselves are quite simple, but the photo setup is often complex, to get the right lighting and lens characteristics:

Set up of the shot

This is reminiscent of the Brick Testament, an extensive (4000 shots!) recreation of Bible scenes in Lego. My ten-year-old Ben is fascinated by these, and has been working on his own scenes from Greek mythology in Lego. It's a fun way to study the material, exercise your Lego creativity, and learn some more about photography at the same time.

How hot was it?

Tuesday 10 June 2008

The weather so far this week in Boston has been extraordinarily hot: 95 degrees or so for the last three days. As graphic evidence, here is a CVS receipt I had left in a bag in my car yesterday:

A receipt printed on thermal paper

This receipt is printed thermally: the paper is impregnated with a dye which turns dark with sufficient heat, and the print head in the cash register uses a tiny heating element to darken individual pixels to produce the image.

As you can see, the paper got hot enough simply from being in the car to darken substantially! I haven't found a reference for the temperature at which thermal paper darkens, but I'm guessing it wasn't very comfortable in there...

Fake web addresses

Tuesday 10 June 2008

I saw Sex and the City over the weekend (it was just OK if you liked the series), and noticed the web addresses they tossed around. Carrie's homepage is carriebradshaw.com, which is really live, but is the perfect movie prop: it looks good on screen, but there's no real content. It's just enough to let an actor click around a little.

A significant email address in the movie was at jjpny.com, which is registered to New Line Productions, but has no web page. I wonder if email sent to john@jjpny.com will go anywhere?

This reminds me of the classic debate over invalid IP addresses appearing in movies. Are they just mistakes by ignorant propmasters, or, like phone numbers starting with 555, purposefully invalid so that fans won't bother a hapless civilian?

With domain names, it's a little easier, since you can actually register the fake names you want to use...

Photoshop blend mode math

Saturday 7 June 2008

I've used Photoshop and Gimp to do simple image manipulation, and have always been fascinated and baffled by the blend modes, those mystifying choices for how to combine two layers in an image. Dodge? Burn? I could choose them and see what happens, but what I really wanted was to understand them.

Nathan Moinvaziri has concisely summed up the blend modes in the form of C macros: Photoshop Blend Mode Math. I don't know if it really helps to see Dodge defined as:

#define Blend_ColorDodge(A,B) ((uint8)((A == 255) ? A:((B << 8) / (255 - A) > 255) ? 255:((B << 8) / (255 - A))))

but it at least gives me another way to look at the whole concept.
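For comparison, here is the same arithmetic transcribed into Python. It mirrors the macro quoted above term for term and makes no claim about what Photoshop does internally; the function name is my own.

```python
def color_dodge(a, b):
    """Mirror of the Blend_ColorDodge macro for 8-bit channel values."""
    if a == 255:
        return a
    # (B << 8) / (255 - A), clamped to 255, using integer division as C does.
    return min(255, (b << 8) // (255 - a))

for a, b in [(0, 100), (128, 64), (200, 100), (255, 10)]:
    print(a, b, color_dodge(a, b))
```

Read this way, the shape is easier to see: the closer a gets to 255, the smaller the divisor, so b is brightened more and more until it saturates.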

One of Nathan's sources for the basic information was Paul Dunn's Insight into Photoshop 7.0 Blending Modes, which has visual aids and a more traditional math notation.

280slides

Saturday 7 June 2008

280 Slides is an interesting application, not so much because of what it does, but because of how it does it. It's a nicely made presentation tool, but pretty basic as far as these things go (no animations or transitions, for example). But the whole thing runs in your browser, with no Flash, and has a nice snappy feel. The most startling thing though, is the language it is written in: something called Objective-J, an Objective-C clone implemented in 13Kb of JavaScript, executing in the browser.

The rest of the application is delivered in .j files:

import <SlideKit/SKPresentation.j>
import "EditorController.j"

//...

@implementation Document : CPDocument

//...

- (CPData)dataOfType:(CPString)aType error:({CPError})anError
{
    var dictionary = [CPDictionary dictionary],
        data = [CPData dataWithString:@""],
        archiver = [[CPKeyedArchiver alloc] initForWritingWithMutableData:data];

    [archiver encodeObject:_presentation forKey:DocumentPresentationKey];
    [archiver finishEncoding];
    
    if (aType == DocumentExportType)
        return data;
    
    [data setString:"documentName=" + encodeURIComponent(_documentName) +
        "&numSlides=" + [[_presentation slides] count] + "\n" + [data string]];
    
    return data;
}

//...

@end

Objective-J is one of those ideas which seems crazy at first, but then turns out to be not impossible, and even do-able. I wonder if it will be adopted by other Mac developers as a way to on-ramp their skills to the web.

Ajaxian has an interview with the developers (audio, unfortunately), and there's a lot of info in the comments there. They'll be distributing the language at objective-j.org.

Dealing with experts

Tuesday 3 June 2008

Driving to work today, I saw a truck for a disaster recovery firm. On the back, to announce their capabilities, it read,

FIRE | FLOOD | MOLD | EXPERTS

The next time my house is overrun with experts, I'll know who to call!

Michigan and Florida flap

Sunday 1 June 2008

I've been amazed and disgusted watching the Democrats crawl toward the finish line. Yesterday's decision about Michigan and Florida may have made it possible to end the Obama/Clinton mudfest, but Clinton is still reserving the right to appeal the decision. There are so many disgusting parts of the whole mess:

  • Clinton proposing that the votes be taken at face value, even though Obama wasn't on the ballot in Michigan because he did what they both agreed to do: not run in those states,
  • Clinton supporting that proposal by comparing this mess to Zimbabwe,
  • The crazy primary scheduling process that caused the problem in the first place,
  • Michigan and Florida residents blaming the DNC when the action was taken by their own state parties, and consequences were made very clear from the beginning.

And the funniest part of all? Clinton supporters claiming they'll vote for McCain if the delegates aren't reinstated. One voter was quoted as saying she'd vote for McCain because she couldn't stick with a party that would do this to Michigan and Florida. Guess what? The Republican party made the exact same decision, for the exact same reasons; they just didn't get the news coverage because it didn't affect the outcome. BTW: I'd give a link to a press release about the RNC decision, but their website seems to be all Obama all the time, so there's no mention of it there. The party rules seem pretty clear, though:

16.a.1: If a state or state party violates the Rules of the Republican Party relating to the timing of the selection process resulting in the election of delegates or alternate delegates to the national convention before the call to the national convention is issued, then the number of delegates to the national convention from that state shall be reduced by fifty percent (50%), and the corresponding alternate delegates shall also be reduced.

and the Democrats' rules are just as unambiguous. In fact, after reading these two clauses, I don't understand why the original threat was to seat no delegates, or why there had to be any "agreement" about what to do at all. After all the turmoil, the parties did just what they had said they would do over a year ago.

Clinton voters switching to McCain is either stupidity or spite, neither of which is a good reason to vote for president. Actually, come to think of it, maybe Clinton supporters will fit in with the GOP. What with saying anything to win, whining about the slant of the media, and exploiting electoral confusion to get ahead, they might feel right at home in Bush's party.

I just hope that Clinton will read the writing on the wall and do the right thing soon. She has a chance here to save her reputation by doing what is good for the party, though if she clings to the rock face by her fingernails much longer, it's going to be very difficult for her to come out of this looking good.
