Spore creature creator and steganography

Wednesday 25 June 2008

Spore is the wildly anticipated new game from Will Wright, and Creature Creator is the first part of it to be released for us to try. It allows you to build creatures, Mr-Potato-Head-style, which will eventually be usable in the full game:

[Images: three Spore creatures]

It’s fun to put arms and legs and body parts together to make creatures, but the more impressive part of the technology is that once you make your creature, it’s fully animated already, with a repertoire of moves like walking, sitting, dancing and greeting. This is no small feat considering you aren’t constrained to building a humanoid creature. For example, my tricyclotops has three legs, and that front centered leg participates in the animations in a way that seems very natural, considering I’ve never seen a creature with legs in that formation.

The developers behind this animation have written up the technology: Real-time Motion Retargeting to Highly Varied User-Created Morphologies. One of the authors, John DeWeese, has a handful of riotously varied creatures on his Spore page.

If you look at the sidebar on the Spore creature pages, you’ll see instructions saying that you can save those PNG files and drag them into Creator, and you’ll have the creature. That interested me: one of the things I did with Aptus was to save the coordinate info for a picture as a tEXt record in the PNG file. Aptus can open a PNG file it saved, and instead of dealing with pixel data, can read the coordinates and recreate the Mandelbrot view directly, allowing you to continue exploring from there.
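Here is a minimal sketch of that tEXt-record approach using Pillow's PngImagePlugin.PngInfo. The key name "aptus-coords" and the coordinate string are hypothetical placeholders, not the format Aptus actually uses.

```python
# Sketch: store structured data in a PNG tEXt record, then read it back.
# The key "aptus-coords" and its value are hypothetical examples.
import io
from PIL import Image, PngImagePlugin

def save_with_text(im, fp, key, value):
    """Save im as PNG with an extra tEXt record holding key=value."""
    info = PngImagePlugin.PngInfo()
    info.add_text(key, value)
    im.save(fp, format="PNG", pnginfo=info)

def read_text(fp, key):
    """Return the tEXt value for key, or None; pixel data is not involved."""
    with Image.open(fp) as im:
        return im.info.get(key)

buf = io.BytesIO()
save_with_text(Image.new("RGB", (16, 16)), buf, "aptus-coords", "-0.75,0.1,0.01")
buf.seek(0)
print(read_text(buf, "aptus-coords"))   # -0.75,0.1,0.01
```

Because the text lives in its own chunk, the pixels are untouched, and a program that ignores tEXt records still sees an ordinary image.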

Looking at the PNGs on the Spore page though, they have not done this. There is no data other than the image. But dragging the PNG into Creator does indeed give you the creature as structured data. Renaming the PNG doesn’t affect the data transfer, but any sort of editing of the image does. They’re using steganography, hiding one message inside another. In this case, they seem to be using the least significant bits in all the pixels.
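For illustration, here is a minimal sketch of generic LSB steganography over RGBA pixels: each channel's lowest bit carries one payload bit, so no channel value changes by more than 1. This is the textbook technique, not Spore's scheme; the MSB-first bit order is an assumption, and any encryption or checksum Spore adds is unknown.

```python
# Sketch of generic least-significant-bit (LSB) steganography on RGBA pixels.
# The MSB-first bit order is an assumption, not Spore's known format.

def embed(pixels, payload):
    """Return new pixels whose channel LSBs spell out payload, MSB first."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    channels = [c for pix in pixels for c in pix]
    if len(bits) > len(channels):
        raise ValueError("payload too large for this image")
    for i, bit in enumerate(bits):
        channels[i] = (channels[i] & ~1) | bit   # replace only the lowest bit
    # Regroup the flat channel list into 4-tuples again.
    return [tuple(channels[i:i + 4]) for i in range(0, len(channels), 4)]

def extract(pixels, nbytes):
    """Read nbytes back out of the channel LSBs, MSB first."""
    bits = [c & 1 for pix in pixels for c in pix][:nbytes * 8]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

cover = [(200, 120, 64, 255)] * 8        # 8 pixels = 32 channels = 4 bytes
stego = embed(cover, b"Hi!")
print(extract(stego, 3))                 # b'Hi!'
```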

Some quick Python shows the details. Using PIL (today, its maintained fork Pillow), we can examine the numeric values of the pixels:

# Open an image, and show the RGBA data for the first twenty pixels.
import sys
from PIL import Image

im = Image.open(sys.argv[1]).convert("RGBA")
for pix in list(im.getdata())[:20]:
    print(pix)

produces:

(0, 1, 0, 1)
(0, 1, 1, 1)
(0, 0, 1, 0)
(1, 1, 0, 0)
(1, 0, 0, 0)
(0, 0, 1, 0)
(1, 1, 0, 1)
(0, 0, 1, 0)
(1, 1, 1, 0)
(0, 1, 1, 0)
(1, 0, 1, 1)
(1, 0, 1, 0)
(0, 0, 0, 1)
(0, 1, 1, 1)
(0, 1, 0, 1)
(1, 0, 1, 1)
(0, 1, 1, 1)
(1, 0, 1, 1)
(0, 1, 0, 0)
(1, 1, 0, 1)

These pixels are part of the black transparent edge of the image, except it isn’t truly black and it isn’t truly transparent. There’s one bit of information being encoded in each channel, or four bits per pixel.
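The arithmetic is simple: at one bit per channel, each RGBA pixel carries a 4-bit nybble, so two pixels make a byte. This sketch assumes a 128x128 thumbnail (an assumption, though it is consistent with the 8 KB figure that comes up in the comments). With red as the high bit, the first four pixels listed above give 0x57 and 0x2c, which match the first two bytes of the hex dump further down.

```python
# Arithmetic for one-bit-per-channel LSB encoding in an RGBA image.
# The 128x128 size is an assumed thumbnail size, for illustration only.

def lsb_capacity_bytes(width, height, channels=4):
    """Bytes that fit when every channel carries one payload bit."""
    return width * height * channels // 8

def pixel_nybble(pix):
    """Low bits of (r, g, b, a) as a 4-bit value, red as the high bit."""
    r, g, b, a = pix
    return (r & 1) << 3 | (g & 1) << 2 | (b & 1) << 1 | (a & 1)

first = [(0, 1, 0, 1), (0, 1, 1, 1), (0, 0, 1, 0), (1, 1, 0, 0)]
nybs = [pixel_nybble(p) for p in first]
# Even-indexed pixels supply the high nybble, odd-indexed the low nybble.
pairs = [hi << 4 | lo for hi, lo in zip(nybs[::2], nybs[1::2])]
print(lsb_capacity_bytes(128, 128))           # 8192
print(" ".join(f"{b:02x}" for b in pairs))    # 57 2c
```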

We can go further and yank out the full 8-bit data:

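# Open an image file, read the low bits as 8-bit data,
# and write it out to a .out file next to the image.
import sys
from PIL import Image

def stegdata(imfile):
    """ Read the low bits of pixels as 8-bit data. """
    im = Image.open(imfile).convert("RGBA")
    data = bytearray()
    hi = 0
    for ipix, (r, g, b, a) in enumerate(im.getdata()):
        # Pack one low bit per channel into a nybble, red as the high bit.
        nyb = (r % 2) * 8 + (g % 2) * 4 + (b % 2) * 2 + (a % 2)
        if ipix % 2:
            # Odd pixel: supplies the low nybble and completes the byte.
            data.append(hi + nyb)
            hi = 0
        else:
            # Even pixel: supplies the high nybble.
            hi = nyb * 16
    return bytes(data)

fname = sys.argv[1]
data = stegdata(fname)
# Binary mode, so the bytes are written out unchanged.
with open(fname + '.out', 'wb') as f:
    f.write(data)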

I had to guess here how to put the bits back together into a byte. The results are full-spectrum 8-bit data, but I don’t know how to interpret it:

000000: 57 2c 82 d2 e6 ba 17 5b  7b 4d 20 32 76 eb f4 8b  W,.....[{M 2v...
000010: 4a b7 54 8b b6 9c a7 ba  d5 6a 5f a4 54 15 f1 1f  J.T......j_.T...
000020: c6 90 df 98 54 72 6d 62  58 71 69 f6 63 fe 7e 23  ....TrmbXqi.c.~#
000030: 49 97 de 81 d7 08 ec 5a  1a 63 57 e6 8e 27 16 03  I......Z.cW..'..
000040: 80 5c 56 a1 34 6b d8 fb  49 46 f9 d6 7b 32 ce 6b  .\V.4k..IF..{2.k
000050: a3 2c 35 4e f7 e7 52 1a  62 1f ce 8e 47 5f e7 ba  .,5N..R.b...G_..
000060: 14 ea 74 58 39 ac eb 53  ee c1 c8 3b cc 38 11 d6  ..tX9..S...;.8..
000070: fd 3f dd 41 ff 35 03 a3  67 c4 a6 43 1c 82 24 41  .?.A.5..g..C..$A
000080: b7 1d ce 66 5a 32 b3 f0  34 6b f3 0f 73 f9 ee f6  ...fZ2..4k..s...
000090: 05 41 56 7b 27 19 40 25  bc e7 b1 02 c9 43 e7 7d  .AV{'.@%.....C.}
0000a0: fd b4 11 82 52 1f c8 d0  3c ad 92 ee 1e 57 6d e7  ....R...<....Wm.
0000b0: ad fd 72 53 b3 fd 1a 9b  10 52 57 01 86 11 42 7c  ..rS.....RW...B|
0000c0: a2 74 ed f6 1b 28 33 cf  7a 19 79 fa b0 6c 04 a7  .t...(3.z.y..l..
0000d0: 89 36 c5 08 d8 ee e8 de  5a a3 b8 48 3d 94 62 0d  .6......Z..H=.b.
0000e0: 0a 38 4c 21 5d 15 b8 54  e1 ea d7 0b 12 bf 8a a0  .8L!]..T........
0000f0: a3 e0 96 1a a8 79 c3 44  62 9e de 02 ea a0 31 8d  .....y.Db.....1.
000100: 96 12 e0 7c ad e0 a5 9f  fe 89 54 a6 54 f2 9d 6c  ...|......T.T..l
000110: 42 c1 f0 14 8d 15 49 a5  d3 80 2c b1 26 ca af 80  B.....I...,.&...
000120: a8 cf a8 a4 77 02 60 ea  c0 d8 4d 2c d9 18 1e 67  ....w.`...M,...g
000130: 8f 9a 29 67 30 92 b5 62  da 1d c1 30 21 f8 eb 21  ..)g0..b...0!..!
000140: fe d8 c2 a6 64 cf 52 dc  58 d1 0c ef d0 60 fb 9b  ....d.R.X....`..
000150: 02 7a e9 d1 d6 a7 3c 01  79 7b da a7 9b 0b ef 3f  .z....<.y{.....?
000160: 80 a3 d1 87 d2 81 50 d1  a2 59 c0 65 c3 8b c5 7b  ......P..Y.e...{
000170: 8c e5 56 50 bf c2 6e 50  82 26 23 9a 76 2b e7 3b  ..VP..nP.&#.v+.;
000180: 4f 5a ec f3 87 aa 27 fe  33 74 40 48 ba db 4f 25  OZ....'.3t@H..O%
... ...

Nothing jumps out at me here. As an exercise in code-breaking, this one is probably possible, since we have a way to generate as many cases as we need. I looked at other images, and there was no clear pattern.

It’s interesting that Spore chose steganography here, since it’s usually described as a way to hide a message so that its very existence is a secret. But there’s no sensitive data here, and they tipped us off to its presence with their instructions. Perhaps they wanted to save space by using those unneeded low bits? Perhaps they didn’t have the tools for manipulating tEXt records?

In any case, Spore is already a fertile breeding ground, both for wild new life forms, and geek interest in its technology.

Comments

[gravatar]
They're not really "saving space" because by doing this they're killing the compression of the image.

Those long stretches of transparent black would compress to just about nothing, but now they're full of high entropy data.
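This point can be checked roughly with zlib alone. The sketch compresses 8192 pixels of transparent black, then the same pixels with a random 0 or 1 in every channel as a stand-in for embedded data. Real PNGs also apply per-row filters before deflate, so actual file sizes will differ; only the gap between the two numbers matters here.

```python
# Sketch: how LSB noise hurts deflate compression of a "blank" image region.
# Random low bits stand in for the embedded data; sizes are illustrative.
import random
import zlib

def compressed_sizes(npixels=8192, seed=0):
    """Return (clean, noisy) deflate sizes for npixels of RGBA data."""
    rng = random.Random(seed)
    clean = bytes(4 * npixels)                      # all zero: transparent black
    noisy = bytes(rng.getrandbits(1) for _ in range(4 * npixels))  # 0 or 1 per channel
    return len(zlib.compress(clean, 9)), len(zlib.compress(noisy, 9))

clean_size, noisy_size = compressed_sizes()
print(clean_size, noisy_size)
```

Even at one bit of entropy per byte, the noisy version cannot shrink below about 4 KB, while the clean one collapses to a few dozen bytes.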


They really should have used their own chunks, but you're probably right about not having the tools. I've seen many APIs that just give you a loadImage call, and all format-specific data, like PNG chunks, is not loaded.

(Or they just didn't know PNG supports user-specified chunks)
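Adding a private chunk takes only the standard library. A PNG chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type plus data. This sketch inserts one just before IEND; the chunk type "spRe" is a hypothetical name (per the PNG spec, a lowercase first letter marks a chunk ancillary and a lowercase second letter marks it private).

```python
# Sketch: insert a private ancillary chunk into PNG bytes, just before IEND.
# The chunk type b"spRe" is a hypothetical name chosen for illustration.
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, data):
    """Length, type, data, then CRC-32 computed over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def add_private_chunk(png, data, ctype=b"spRe"):
    """Return png bytes with a new chunk placed before the IEND chunk."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    iend = png.rfind(b"IEND") - 4        # back up over IEND's length field
    if iend < len(PNG_SIGNATURE):
        raise ValueError("no IEND chunk found")
    return png[:iend] + make_chunk(ctype, data) + png[iend:]

# Usage with a toy chunk stream (enough structure to show the splice).
fake = PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND", b"")
out = add_private_chunk(fake, b"creature data")
```

Decoders that don't recognize an ancillary chunk are expected to skip it, so the image still displays normally.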

[gravatar]
Maybe the reason for using steganography wasn't as technical as that. I think this is a great way to distribute virtual resources like game creatures and whatnot, e.g., a web page that displays "if you like this creature, just drag the picture into your game window" or something like that.

It's like those scammy "you are bidding on a JPEG image of $THING" ebay auctions, but the image really IS the thing :)
[gravatar]
Maybe you're reaching too far when discussing whether they didn't have the tools or whether it was to save space. My own guess is that they did it this way because they could.

Myself, I find it awesome, in a probably incredibly geeky way.
[gravatar]
Since the low bits seem to cover the 0x00 to 0xff range evenly (based on a cursory glance), I would guess that it is compressed before embedding into the PNG.
[gravatar]
@Foone: you are probably right that the compression is compromised, nullifying the space savings. Keeping the data in 8-bit form would have allowed it to be compressed directly also.

@Sean: the point of the tEXt records in PNG files is that it would also let you simply drag a PNG file, and the program can extract the structured information. It's a side channel of information that doesn't interfere with the pixels, but is still part of the PNG file.

@Justin: I made the same assumption, but there's no clue how it might have been compressed. zlib didn't like it!
[gravatar]
I put the 400 example values you posted through Ruby (irb FTW!) and they look pretty random. Average value is 130. Standard deviation is 74.

I think it's encrypted.
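Those numbers are what uniformly random bytes would give: for values uniform on 0..255 the mean is 127.5 and the standard deviation is sqrt((256^2 - 1)/12), about 73.9. A sketch that computes the same statistics plus Shannon entropy (8 bits per byte is the maximum) for any byte string. Note that uniform-looking statistics fit encryption and compression equally well, so they can't distinguish the two.

```python
# Sketch: summary statistics for a byte string, e.g. the extracted LSB data.
import math
from collections import Counter

def byte_stats(data):
    """Return (mean, population std dev, Shannon entropy in bits per byte)."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((b - mean) ** 2 for b in data) / n)
    counts = Counter(data)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return mean, std, entropy

# Every byte value exactly once: the ideal "uniform" case.
mean, std, entropy = byte_stats(bytes(range(256)))
print(round(mean, 1), round(std, 1), round(entropy, 1))   # 127.5 73.9 8.0
```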
[gravatar]
As part of the rather silly reddit thread about this, I computed how much it affects the compression. Using one of your example thumbnails above, it's 29k for the embedded data, and 27k if the data was "properly" stored in a PNG chunk.
[gravatar]
I didn't think that reduced compression vs. additional tEXt data would be significant, thanks Foone for doing the experiment to show that.

One reason might be to make it difficult for people to create images that do not contain what they appear to contain.

So far I'm going with the "Because it is freakin' awesome" reason as the most plausible :)
[gravatar]
My computer is dead at the moment (hooray for Internet at work), so I can't run these tests myself, but the obvious next step is to do things like this:

Create a creature, and save it as a reference.

Modify one bit of the image above the LSB (change a 0x01 red byte to 0x81 perhaps), and see if that image will load. This will verify the existence or lack of a CRC on the non-data bits.

Open the creature from the image file, and save it with no changes. Compare the data layers on the new image. This will show whether timestamps or other transient data is saved with the creature (making every save of the creature different, even if the creature itself is identical).

Assuming transient data is not saved with the creature, open the creature and make a tiny change... such as altering the color of a single layer by a shade or two. Save the creature, and see if the entire data layer changes (implying compression and/or encryption) or if only a small part changes (implying no compression/encryption).

Assuming compression is used (very likely), save the bitstream using various methods of converting the bits to bytes. For example, each pixel could be a nibble (4 bits), probably in RGBA or ARGB bit order. Or, each layer could be a separate bitstream, one bit per pixel with the 2KB chunks concatenated (again, probably in RGBA or ARGB order). A script could create dozens of 8KB interpretations of the data pretty quickly, and you could try standard decompression routines on each one to see if any are valid.
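That brute-force idea is easy to sketch: build the byte stream under several guessed bit orders, then see whether any of them inflates. The four orderings below (RGBA or ARGB nybbles per pixel, or one bit-plane per channel concatenated) are the guesses from this comment, and zlib is just the obvious candidate decompressor; the wbits values 15, -15 and 31 try zlib-wrapped, raw deflate and gzip framing.

```python
# Sketch: pack pixel LSBs into bytes under several guessed bit orders,
# then test each candidate with zlib in three framings.
import zlib

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def candidate_streams(pixels):
    """Map an ordering name to the bytes it produces (orderings are guesses)."""
    lsb = [tuple(c & 1 for c in pix) for pix in pixels]   # (r, g, b, a) low bits
    argb = [(a, r, g, b) for r, g, b, a in lsb]
    return {
        "rgba-interleaved": pack_bits([bit for p in lsb for bit in p]),
        "argb-interleaved": pack_bits([bit for p in argb for bit in p]),
        "rgba-planes": pack_bits([p[ch] for ch in range(4) for p in lsb]),
        "argb-planes": pack_bits([p[ch] for ch in range(4) for p in argb]),
    }

def try_decompress(streams):
    """Return (name, wbits, output) for every stream/framing that inflates."""
    hits = []
    for name, data in streams.items():
        for wbits in (15, -15, 31):        # zlib, raw deflate, gzip
            try:
                hits.append((name, wbits, zlib.decompress(data, wbits)))
            except zlib.error:
                pass
    return hits
```

Raw deflate (-15) carries no checksum, so a hit there on random-looking data deserves suspicion; a zlib or gzip hit is much stronger evidence.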

Alternately, someone good at decompilation could find the file import routine, and bypass all this baseless speculation. :-)
[gravatar]
@Myrddin: Most of your tests don't actually work, unfortunately. The program does not let you re-save a creature unless you change it. The program is not fluid enough to let you do something like change the color of one layer. Instead it has complex shading and texturing patterns based around a single color and design that the user chooses.

I suppose there's also the possibility that, instead of storing the data this way "because it's cool" (though it is :D) they stored the data this way purposefully to make it that much harder for us users to crack. They don't want people figuring out their data storage algorithms and writing programs to create tEXt blocks that imitate their data but actually do something different - possibly destructive.
[gravatar]
You guys are overthinking this by a mile. The reason they don't use custom PNG chunks is simple. By using the alpha channel, any program that supports 32 bit image copies from the clipboard properly can be used as a transfer mechanism for the data. This isn't the case with custom PNG chunks or other side-channels than just the raw pixel data.
[gravatar]
I had the same thought as George. The "just copy over the image" mode of interaction is more robust if only the pixel data is important.
[gravatar]
To sort of illustrate my clipboard reasoning in the post above -- note that you don't actually have to save the creature image in your browser before dragging it into the creature creator. You can simply mouse over one of the images Ned posted of Spore creatures above and drag and drop right into the creature creator (works with both IE and Firefox) without saving because all of the needed data is preserved across the standard clipboard drag and drop action.

I don't view this as steganography at all inasmuch as that implies they are trying to hide something from the user. I think it was just a (very good) practical design decision that some clever programmer thought up to allow maximum flexibility for clipboard-related image interchange.
[gravatar]
@Ned - what I meant was, maybe the inner workings of PNG were secondary to the idea of "hey, wouldn't it be cool if we could distribute working creatures as images, almost like trading cards?" .. to which the answer was "hey we could do that with steganography + PNG"
[gravatar]
Posted by 'slurpme' on Reddit a while ago, not sure if he's correct or not, just reposting as it might be useful:

---------------------------------------------------

You might be interested to know...

Someone else pointed out that the alpha channel is very "noisy" in the image...

And this seems to be true: when you examine the image, the "transparent" parts actually contain a LOT of information. When you look at the parts of the image that have an alpha value of either 0 or 1, most of the RGB values actually evaluate to binary (that is, they have a value of either 0 or 1)!

When you then convert those to ascii (using the ARGB values as binary) you get:

AUTU@DUPP@EPPED@P@QPPPAAQT@EUUUEATTD@@DDQDEUDPEATAEDT@ETPUDUPEUUDP@AAATDAPUQQPTUQ@DTE@QAQDQUPUAUQPTATUAQA@@@QDETPEPEQAAA@DUTQPADETATQTUPEPPTEQADU@TPAD (rest omitted for brevity)

So there is definitely some information being encoded in there... As to what it means... I'd have to run comparisons against other images for that...
[gravatar]
oh wait.. I get it now :/
[gravatar]
Nicely spotted by the article writer, and an interesting topic.

I've often imagined how steganography could enter the mainstream in a commercial service. Spore have shown some commercial genius in so doing!

As George McBay hinted, commercial factors appear to be the main motive for choosing steganography.

Besides the one he points out - the easy 'portability' of the graphic and encrypted creature, there is a more obvious motive.

The closed source nature of the steganographic content restricts the editing of Spore creatures to Spore's tools.

If the creature files were, for example, using XML instead, any kid could open the file and edit the content between tags, and anyone capable of writing a viewer/player app, if not an editor, could translate the edited file into an alien version of Pinocchio.

This is what VRML/X3D (xml ver of former)/what-ever-it's-called-these-days is.

Hard to say if such an open source model could work for a game and its parent company. There are however many games that have open file formats, which allow the creation of mods.

A question I have is, what would a more open format look like, and would it be possible within similar file sizes?

I'll be looking for other uses of this commercial play of steganography, and clones of Spore spawning, with the possibility of a competitor using a more open format.

Would it be too absurd for Sketchup to offer something similar?

The possibilities are really cool!
[gravatar]
Are you sure the creature ID isn't just encoded in the filename?
[gravatar]
Err, sorry for the noise, I just read "renaming the PNG doesn't...". Note to self, read article carefully before commenting.
[gravatar]
The data seems to be encrypted or something. I have looked at the LSB bits in many, many orders and so far have come up with nothing. It's definitely not ASCII or Unicode. Each creature has different data starting at the first byte. So there is some form of seeded encryption or possibly simple XOR masking.
[gravatar]
Well, I had to try to solve it. Couldn't do so, but maybe the following would help someone else:

1. All the data is in the PNG file. At first I suspected that the PNG encodes only a URL from which the full creature could be loaded. Disconnecting from the internet didn't prevent Spore from loading a (not loaded before, not cached) creature.

2. The data is not encoded only in the LSBs. Changed the MSBs of the red channel and the creature failed to load. This also proves that the data is not only coded in the alpha channel (as some reddit users assumed).

3. Is there some kind of error correction? I changed quite a few pixels, and that didn't prevent the creature from being loaded.

4. Took a creature, named it aaaa, and saved. Renamed it baaa and saved again. Most pixels changed; only 5% of pixels remained the same, with no apparent pattern to those which stayed the same.
[gravatar]
@George McBay: If I'm not mistaken drag & drop of an image will only copy the url to the (temporary) image to the clipboard, not the pixel data.
[gravatar]
@rouli: your #2 and #3 seem to contradict each other. I doubt very much that data is encoded in the MSBs of the red channel, since those bits would have to serve two masters: the structured data and the look of the thumbnail. The advantage of using the LSBs is that visually you can't tell the difference between adjacent colors, so you can set them to whatever you want without affecting the look of the thumbnail.

Your point #2 might simply mean that the entire file is checksummed to prevent mismatching the thumbnail and the data.
[gravatar]
@Ned:

I don't think the data is encoded solely in the red channel's MSB, and indeed that wouldn't make much sense. There may be a general checksum, or Spore may even check whether the resulting creature looks like the thumbnail provided, but I still don't think the data is encoded only in the LSBs. As further evidence, when I saved the same creature, once named "aaaa" and once named "baaa", not only the LSBs changed.

I know this is not a complete proof, but on the other hand, it seems to be a valid option ...
[gravatar]
Has anyone tried replacing one creature's LSB with those from another? If it is strictly the LSB, then the Creature Creator should display one image, while loading the other.

Also, regarding the Reddit comment, that data has a very biased distribution. I see '@' (0x40) to 'U' (0x55) represented, with very few of the characters in that range present. I can't imagine they'd actually waste that many bits to produce just a very limited range of ASCII values. Can anyone validate the original author's experiment?
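This observation can be checked with bitwise AND and OR over the character codes: a bit set in the AND is constant across the string, a bit missing from the OR is always zero, and only the bits in AND XOR OR actually vary. A sketch, run on the first few dozen characters of the quoted output:

```python
# Sketch: which bits actually vary across the characters of the quoted string?
import operator
from functools import reduce

def varying_bits(text):
    """Return (AND of all codes, OR of all codes, mask of bits that vary)."""
    codes = [ord(ch) for ch in text]
    always = reduce(operator.and_, codes)
    ever = reduce(operator.or_, codes)
    return always, ever, always ^ ever

sample = "AUTU@DUPP@EPPED@P@QPPPAAQT@EUUUEATTD"   # start of the quoted output
always, ever, varying = varying_bits(sample)
print(hex(always), hex(ever), hex(varying))        # 0x40 0x55 0x15
```

Only three bits (0x01, 0x04 and 0x10) ever change, so at most eight distinct characters (@ A D E P Q T U) can appear and each carries at most three bits of information, which fits the narrow range noticed above.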
[gravatar]
@Anm (sorry Ned for hijacking your post's comments :)

Tried replacing one creature's lsb with another. It didn't work. Though, it can still mean that there's a checksum for the whole "data" (by whole I mean, the picture included).
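For anyone repeating this experiment, the transplant itself is one expression per channel: keep the upper seven bits of the target and take the low bit from the donor. A pure-Python sketch on RGBA tuples (loading and saving the images, for instance with Pillow's getdata() and putdata(), is left out):

```python
# Sketch: give the target image the donor image's low bits, channel by channel.

def transplant_lsbs(target, donor):
    """Return target pixels with each channel's LSB copied from donor."""
    if len(target) != len(donor):
        raise ValueError("images must have the same number of pixels")
    return [
        tuple((t & ~1) | (d & 1) for t, d in zip(tpix, dpix))
        for tpix, dpix in zip(target, donor)
    ]

target = [(200, 100, 50, 255), (10, 11, 12, 13)]
donor = [(1, 0, 1, 0), (0, 1, 0, 1)]
print(transplant_lsbs(target, donor))
# [(201, 100, 51, 254), (10, 11, 12, 13)]
```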
[gravatar]
I'm curious how you went about changing the LSB for the image. The PNG standard defines a CRC for each chunk of data (http://www.w3.org/TR/PNG-Structure.html) and if you didn't make the changes using a library that would recalculate these, then that would be a major source of error.
[gravatar]
I used PIL (as Ned did for getting the LSBs). Since PIL repacks the image as PNG, I believe the correct CRCs are created as well.
PIL creates PNGs that are accepted by Spore. As I mentioned before, I changed a number of bits, saved the image using PIL, and the creature was loaded into Spore without a problem.
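Whether a tool wrote the CRCs correctly is easy to check: walk the chunks and recompute each CRC-32 over the type and data. A minimal standard-library sketch (it assumes a well-formed file and does not handle truncation gracefully):

```python
# Sketch: list PNG chunks and check each stored CRC-32 against a recomputed one.
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def check_chunks(png):
    """Yield (chunk type, data length, CRC ok?) for each chunk in png bytes."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (stored,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        computed = zlib.crc32(ctype + data) & 0xFFFFFFFF
        yield ctype.decode("latin-1"), length, stored == computed
        pos += 12 + length
        if ctype == b"IEND":
            break

# Usage (the file name is hypothetical):
# with open("creature.png", "rb") as f:
#     for ctype, length, ok in check_chunks(f.read()):
#         print(ctype, length, "ok" if ok else "BAD CRC")
```

Listing the chunk types this way also shows directly whether a file carries anything besides the usual IHDR, IDAT and IEND.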
[gravatar]
Ugh... I'd like to host a repository of code and examples, but the EULA claims ownership of all creatures and derivative works, and explicitly says you can't modify creatures with other tools or software.
[gravatar]
I'd like to point out they probably don't use compression. They have to fit a very precise number of bits, and every compression algorithm has a worst case that grows the data.

Also, resaving the file (copy original, then do NOOP edits in the name field) seems to induce changes throughout the stegdata.
[gravatar]
That doesn't mean you can't reverse engineer it, it just means that a person who uses a tool that modifies the creatures is breaking the EULA. At this point a way to read the data (or even determine exactly what qualifies as creature data/dna) hasn't even been found.
[gravatar]
More random brainstorming...

The size of the data would make using a shared encryption key, exactly 8k to match the data length, a very real possibility. They could even salt it, randomly choosing one of several keys. My experiments with resaving the file (only a time-stamp change?) almost hint at this.

Of course, they would want to hide the encryption key in the application. In theory, pushing the "everything is procedural" theme, they could generate a key using the salt hint to seed a pseudo-random algorithm. But that's just speculation.

But I think I may be getting ahead of myself. I need to go look at my resaves to determine exactly what parts of the image have changed. In theory, the image portion (everything but the LSB) should be identical.
[gravatar]
Has anyone tried any of the steganography detection tools? There are a number listed on the wikipedia page for steganography. A lot of them appear to be windows only, so I'm not able to run them :(
[gravatar]
@Anm:
I'll save you the time - even when resaving the same creature with the same name, other bits, not just the LSBs, are changed.
Another possibility (other than encryption) is that they do use compression, and that the timestamp is one of the first things encoded in the data (and thus, would change the appearance of the rest of the data). However, I lean to the encryption theory for now.

Some experiments whose meaning I don't understand:
Saved the same creature after doing NOOP changes to its name. Then copied whole pixels from the first image to the second, and tried opening the "merged" image.
Successfully transferred the first 6 pixels from one to the other. More than that fails. Just changing the 6 pixels to (0,0,0,0) fails. Copying 6 pixels in the middle of the image fails (at least sometimes), though copying 1 or 2 pixels seems to work.
Any ideas what that means?
[gravatar]
In the course of exploring (Mac Trial edition), I copied my Creatures directory to a new location, to make room for a directory of experiments. After copying the directory back, most of my creatures are not listed as "my creatures", and several no longer show at all.

I've noticed missing creatures from downloads before, but I thought it was due to download method (e.g., accidentally grabbing *_lrg.png from the web Sporepedia).

This may hint at an index of locally created creatures, probably stored outside the Creatures directory. It also implies that the SCC's ability to load a PNG into the catalog is an unreliable test, with several false negatives.

Non-LSB pixel changes could be non-deterministic rendering, possibly from changes in background, camera angle, or animation pose. In fact, careful comparison of the RGB channels in Photoshop shows exactly that.

Knowing this, I'm going to proceed with the LSB theory.
[gravatar]
What do you mean by comparison of the RGB channels in Photoshop? What do you find there?

Actually, for now I accept the LSB theory - I've checked the values of the RGB channels when the transparency is set to either 0 or 1 (that is - in transparent pixels), and only the LSB is set.
Another clue: I've successfully resaved a creature in Spore without changing any pixel (even the LSBs were not changed). So it seems that the data is not newly encoded every time with a newly created key. Maybe the creature data is encoded relative to the non-data (that is, the image data)?
[gravatar]
I loaded several saves of the same creature into Photoshop. After merging each with a white background, I added them as layers in a new image. Then I incrementally pushed each to the top layer and set its composition method to "difference". This highlights the differences between the top layer and the next layer below it. But even just flipping through the images I was able to see minor perspective differences once they were all lined up together.
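The difference-layer comparison can also be done numerically, which separates changes confined to the low bit from real visual changes. A sketch on two equal-length lists of RGBA tuples (for example, from Pillow's getdata()):

```python
# Sketch: classify per-pixel differences between two saves of a creature.

def classify_differences(a, b):
    """Count pixels that are identical, differ only in LSBs, or differ above them."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    counts = {"identical": 0, "lsb_only": 0, "higher_bits": 0}
    for pa, pb in zip(a, b):
        if pa == pb:
            counts["identical"] += 1
        elif all((x ^ y) <= 1 for x, y in zip(pa, pb)):
            counts["lsb_only"] += 1      # every channel differs at most in bit 0
        else:
            counts["higher_bits"] += 1
    return counts

save1 = [(10, 20, 30, 1), (10, 20, 30, 0), (100, 100, 100, 255)]
save2 = [(10, 20, 30, 1), (11, 20, 31, 0), (120, 100, 100, 255)]
print(classify_differences(save1, save2))
# {'identical': 1, 'lsb_only': 1, 'higher_bits': 1}
```

If two resaves of an unchanged creature show zero "higher_bits" pixels, the rendering is stable and every difference lives in the data layer.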

About how many images did you go through to get a matching resave? It may hint at a low number of encryption salts. Except...

I think we're expecting a change: we know the systems stores a timestamp of when the creature was created. Even downloaded creatures have creation dates earlier than the file date. And when you first create a creature, the time stamp is accurate to the second. Are you sure you did not accidentally duplicate a file?
[gravatar]
I may have duplicated the image by mistake, it does seem the more reasonable option. I moved the creature from the creatures directory, and saved again, but according to your last post, it may not be enough.

It was my first attempt, which makes it much more suspicious.
[gravatar]
Apologies in advance for ramblings, I'm typing this as I go and probably won't edit much.

Re: missing creatures

I have two creatures I made (one was only a recolor of an existing one, so I have a new copy) that do not show up in my catalog (the SCC keeps BSoD-ing my desktop). I also have an install at my parents' house, and the first three creatures I made there (including a successful exit of the program) also don't show up when it's started up again.

Just managed to grab one of the parent's place critters, but can't grab one created on this machine. The other one I just did did note me as the creator, doesn't have a lock symbol on it, unlike another user's creation, and is noted as being shared, however it is not under "my creations".

Ah ha. Deleting the "bad" file from the My Docs\My Creations folder allowed me to load it up again. Fantastic. Files are identical to the bit, what this means, I don't know. Moved the pics from my desktop into the \My Creations folder and they load up just fine too, they just don't show under My Creations. Meh, I can live with that. Attempt to edit...BSOD again. Bah, this computer hasn't been stable since it fried its CPU...four years ago? Something like that, nearly every damn piece has been replaced at this point.

Trying again. BSOD, but a little farther into the process almost had the full screen view of the thing. Geeze! Again, BSOD on splash screen....and corrupted text on reboot. This is not good. Not good at all...

Hard power down seems to have de-corrupted whatever it was that was giving me graphics problems. USB keyboard was not initially recognized, but was when Windows booted. One more time, one more time....

Yay! Edits R Us. Added a spin bit and removed it (essentially leaving the creature the same) and saved, old and new files were different, didn't look too closely.

....if the creatures are the same, but the files different, would it be possible to subtract one from the other and possibly get useful data out of it? I know that some encryption schemes can be partially deciphered that way (as you're subtracting out the plain text and are left with only an A-B ciphertext).
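The idea is sound for some schemes: if two saves were encrypted by XOR with the same keystream (a big assumption here), XORing the two ciphertexts cancels the key and leaves the XOR of the two plaintexts. A sketch with a made-up key and made-up plaintexts:

```python
# Sketch: XOR of two ciphertexts made with the same XOR keystream
# equals the XOR of the plaintexts. Key and plaintexts are made up.
from itertools import cycle

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("lengths differ")
    return bytes(x ^ y for x, y in zip(a, b))

def xor_encrypt(plain, key):
    """Repeating-key XOR; encryption and decryption are the same operation."""
    return bytes(p ^ k for p, k in zip(plain, cycle(key)))

key = b"\x5a\xc3\x17"
p1, p2 = b"name=aaaa;legs=3", b"name=baaa;legs=3"
c1, c2 = xor_encrypt(p1, key), xor_encrypt(p2, key)
diff = xor_bytes(c1, c2)
print([i for i, d in enumerate(diff) if d])   # [5]
```

Under such a scheme a one-character rename changes one byte. The earlier report that renaming changed most pixels suggests something that spreads changes, such as compression, chaining, or a per-save key.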
[gravatar]
I have been looking at this program from essentially the other end, and that is the data files the Creature Creator makes and uses after you drop the picture on the editor.

The data files are a new variant of EA's old DBPF format. Most of the data is compressed using the zlib deflate() routine. There is a prefix of 5 bytes in the game files prefixed to the compressed data:

10 FB XX XX XX or 50 FB XX XX XX

The fields XX XX XX above are the uncompressed size, chopped to 24 bits, in a high-to-low byte order.
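Reading that prefix takes a couple of lines. This sketch assumes exactly the layout described here: two marker bytes (10 FB or 50 FB) followed by a 24-bit big-endian uncompressed size. The function name is hypothetical.

```python
# Sketch: parse the 5-byte chunk prefix described above.
# Layout per the comment: marker (0x10FB or 0x50FB) + 24-bit big-endian size.

MARKERS = {b"\x10\xfb", b"\x50\xfb"}

def parse_prefix(chunk):
    """Return (marker as hex, uncompressed size) or raise ValueError."""
    if len(chunk) < 5 or chunk[:2] not in MARKERS:
        raise ValueError("no 10 FB / 50 FB prefix")
    return chunk[:2].hex(), int.from_bytes(chunk[2:5], "big")

print(parse_prefix(bytes([0x10, 0xFB, 0x01, 0x02, 0x03])))   # ('10fb', 66051)
```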

In the game, at least the large part of the creature data is in XML format, and is highly compressible, for one test file I made I compressed one chunk about 10:1 with WinRar.

There are likely at least two chunks to the data, one with the creator and creature names and comment in it, and another with the XML creature data.

So, perhaps the 0x10FB or 0x50FB will help someone solve the puzzle.
[gravatar]
@Wes,
There is a Sim2 wiki page that goes into detail on that format at http://www.sims2wiki.info/wiki.php?title=DatabasePackedFile and includes reference to 0x10FB "Compression ID (0x10FB) QFS Compression"
[gravatar]
@Wes
I understand you have an XML representation of a creature. Is that correct? Can you upload it (with the creature's PNG representation) or explain in more detail how to get it?

thanks,
rouli

btw, I've had some minor progress, will post about it on the weekend.
[gravatar]
I can decompress any of the creature creator .package data files. The package format is different, but the chunks use the same compression format as The Sims 2, but with only a 5 byte header, and a (sometimes) alternate type word (0xFB50) that as best I can tell decompresses in exactly the same manner as the older one.

It is also very obvious that zlib 1.2.3 is used for the compression: the zlib license is copied into the EULA, and can be seen in the executable as plain text in WordPad (not that I would disassemble the executable in violation of the EULA).

My point was just that if someone is looking for a Rosetta Stone to determine the pattern used in encoding the bits, the five bytes I mentioned will likely turn up when the bits are ordered properly. The decryption code is available at the link you mentioned, and at other locations, but what was used in The Sims 2 used a 9-byte header that included the compressed size. In the CC game files this information is available in the index, and is not repeated in the data chunks, while the compressed size is available in both locations, albeit chopped to 24 bits and in reversed byte order.
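Following that Rosetta Stone suggestion, each candidate byte stream can be scanned for the two marker pairs. A sketch:

```python
# Sketch: find every offset where a 10 FB or 50 FB marker appears.

def find_markers(data, markers=(b"\x10\xfb", b"\x50\xfb")):
    """Return sorted (offset, marker hex) pairs for all marker occurrences."""
    hits = []
    for marker in markers:
        start = data.find(marker)
        while start != -1:
            hits.append((start, marker.hex()))
            start = data.find(marker, start + 1)
    return sorted(hits)

print(find_markers(b"\x00\x10\xfb\x00\x00\x07\x50\xfb"))   # [(1, '10fb'), (6, '50fb')]
```

In uniformly random bytes a given 2-byte pattern appears at any position with probability 1/65536, so with two markers a chance hit turns up roughly every 32 KB. A single match in an 8 KB stream is suggestive, not conclusive; a match at offset 0 with a plausible size field would be much stronger.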
[gravatar]
@rouli (and anyone else interested)

Per your request, I have posted a creature and a .rar archive of the EXTRACTED files, including the .xml file. I put these on the CustomSims3 {dot} com website (well, Spore is out, Sims 3 is not, what else can I say), in the Meshing and Modding forum, in a thread titled "Chickenopolis".

Membership is not required to read the messages and download the attachments, but I do not have the wonder spam filters that Ned has here, so if you want to comment there, you have to sign up.
[gravatar]
I've been trying to crack the file format myself, but didn't get any further than what has already been discussed here. So I'm adding this comment here just to subscribe to any further comments :)
[gravatar]
When you change the name and resave the creature, is the .png image and the creature's pose exactly the same each time? The creature moves about randomly while it's on the screen. If you take a snapshot of the creature using the camera tool, successive images will be slightly different even if taken seconds apart on the same creature with no modifications. If the creature's pose is slightly different between re-saves perhaps that could account for the pixel differences outside of the LSB?
[gravatar]
I don't know if that explains all of the differences outside of the LSB, but as the only skeptic of the LSB theory, I can say that I'm now convinced that the creature's data is in the LSBs, and I'm sorry that my previous posts may cause doubt.
Though, the LSBs do contain some information about the other bits (which may be a simple checksum) that prevents one from copying the LSBs of one creature to another.

@Wes - Thanks!
[gravatar]
I agree with rouli on all points. There is simply no other place for the data to be without visible artifacts. I wrote a quick little PNG parser, and there is nothing else there.

A checksum of the image is a good point, and would explain some of our tests.
[gravatar]
The CC uses the procedural data to make a conventional mesh, using parts that are in the data files that ship with the program. Animations are then applied to these meshes. The components seem to be in a file called CSA_Graphics.package, while the composited meshes can be found in GraphicsCache.package.

The meshes are not in the creature creator files, just references to the components and their placement. I would not believe that the idle animations onscreen would have any effect on the construction details.
[gravatar]
I'm not suggesting the idle animations would have an effect on the construction details of the 3D model; I'm suggesting the idle animations might have an effect on the actual 2D rendering in the .png, and that those variations might account for the changes in image data outside the LSB when you've made no changes other than renaming the creature.

I don't know if there's any basis to this hypothesis or not. I'm not opening these PNGs in binary editors like some of the people posting here; it was just a hypothesis that popped into my head while I was scanning through this thread.

My point is just that if you save the same creature twice, and the 2D rendering in the .png is not identical each time because of slight variations in the creature's pose, then you would need to take that into account when comparing the two image files bit by bit.
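Comparing two saves bit by bit, as suggested here, can be done by XOR-ing corresponding channel values. The minimal Python sketch below (the name `classify_differences` is hypothetical, and it assumes both images have already been decoded to equal-length sequences of 8-bit channel values) separates values that differ only in the lowest bit from values whose higher bits changed, which is exactly the distinction at issue.

```python
def classify_differences(a: bytes, b: bytes) -> dict:
    """Count channel values that are identical, differ only in the LSB, or differ in higher bits."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of channel values")
    counts = {"identical": 0, "lsb_only": 0, "higher_bits": 0}
    for x, y in zip(a, b):
        diff = x ^ y  # set bits mark positions where the two values disagree
        if diff == 0:
            counts["identical"] += 1
        elif diff == 1:
            counts["lsb_only"] += 1
        else:
            counts["higher_bits"] += 1
    return counts
```

A nonzero `higher_bits` count between two saves of the same creature would mean the visible image itself changed, not just the hidden data.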
[gravatar]
Hmmm... For what it's worth, I've gotten around to taking a few minutes to check my hypothesis, and it does look like I was wrong. If you save the same creature multiple times with different names, the .png image is identical each time, as far as I can tell just by looking at it. The pose is exactly the same each time the creature is saved: the idle animations and the direction the creature is looking when the save button is hit do not appear to have any effect on the image in the .png.

So whatever variations there are in the pixel data from one save to the next have no noticeable effect on the image's appearance, which would imply that the data variations all have to do with the steganographic encoding of the model.
[gravatar]
OK, another clue.
Each character has an ID composed of two ulongs. In the uncompressed XML creature files, this is represented as
0x40626000, 0xd05c53a3
although that data would be compressed in the PNG file. In the file I took that line from, there are 15 instances of that sequence, one for each block, though the count seems to vary with creature complexity.

Each creature has a unique pair of ID numbers, at least one of which is generated by hashing the name, although I do not know the hashing algorithm. That pair of ID numbers is used to tie different parts of the creature definition together.

The line I showed above may be the single most significant variation in the otherwise identical creatures with different names.
[gravatar]
Sorry, in my previous post, there were tags blockid and /blockid, enclosed in angle brackets, on each end of the two hex numbers, but the tags got swallowed when I posted the message (or else they are there and the browser is swallowing them).
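Counting those ID pairs by hand gets tedious. This Python sketch assumes, based only on the two comments above, that each pair appears in the uncompressed XML as `<blockid>0x40626000, 0xd05c53a3</blockid>`; the real files may format it differently. A regular expression is used instead of an XML parser so it also works on fragments, and `count_block_ids` is a hypothetical helper.

```python
import re
from collections import Counter

# ASSUMED layout, reconstructed from the comments above:
#   <blockid>0x40626000, 0xd05c53a3</blockid>
BLOCKID = re.compile(r"<blockid>\s*(0x[0-9a-fA-F]+)\s*,\s*(0x[0-9a-fA-F]+)\s*</blockid>")

def count_block_ids(xml_text: str) -> Counter:
    """Count how often each (first_id, second_id) pair appears in a creature XML file."""
    return Counter((int(first, 16), int(second, 16)) for first, second in BLOCKID.findall(xml_text))
```

Running it on two creatures that differ only in name should show whether that pair is the main thing that changes, as suggested above.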
[gravatar]
Remember, the description and tags get encoded too, so for the person wanting to do controlled testing with minor changes, simply change a single character in the description.

Because this text has to fit into the image along with the data describing the creature, it means there's an enforced upper limit on the length of the description and tags which is probably pretty low.
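The ceiling is easy to estimate. With one hidden bit per 8-bit channel value, an image can carry width × height × channels / 8 bytes, shared between the compressed creature data, the description, the tags and whatever checksum exists. The Python helper below is a back-of-the-envelope sketch; `lsb_capacity_bytes` is a hypothetical name, and the 128 x 128 RGBA size used as an example is an assumption for illustration, not a confirmed Spore image size.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 4, bits_per_channel: int = 1) -> int:
    """Upper bound on hidden bytes when each channel value carries `bits_per_channel` bits."""
    return width * height * channels * bits_per_channel // 8
```

For a 128 x 128 RGBA image, `lsb_capacity_bytes(128, 128)` is 8192 bytes, or 6144 bytes if the alpha channel carries no data; everything, including the description and tags, has to fit in that budget after compression.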
[gravatar]
A short update:
I've had little time to explore the file format further, but you may find the following creature interesting:
http://bp3.blogger.com/_hDVLcOyq0vk/SG-MwdO5wsI/AAAAAAAAAC8/DsqJcibOPsA/s1600-h/x0.png

As you can see (and if you can't, go to my defunct blog; I have only one post, and it's about Spore), that image is blank, but surprisingly, when loaded into Spore, it turns out to contain a basic worm-like creature. I'm not sure exactly how I did it; I guess it is due to some weird program bug (no foul play on my part).

Anyway, this may help us find that elusive "checksum", because there is no picture data to interfere with the creature's data. On the other hand, if anyone could create another creature file with no real image present, comparing the two creatures may shed light on the encoding process.
[gravatar]
@rouli

At the same site you obtained the package I made for you is a Spore mesh viewer tool. It's weighty (built on the Ogre library), but it has an archive splitter function. You could use that to extract the character data yourself from characters you made (it is stored in the EditorSave.package and the GraphicsCache.package files).
[gravatar]
Blank creature files and chimeras:

Could the checksum be copied from one creature (the intended base image) to the second (the chimera) in order to get the creature to load properly?

Other possibilities include making minor changes to the creature data (the XML file) so that you make an asymmetrical creature (possible?), or changing things very slightly (for example, nudging a single rotation value) so that in essence you create a new creature by changing the LSBs in tiny amounts, loading, saving (generating a new checksum!) and then tweaking some more. It's a grueling process, but if you can change the LSB data to a significant degree without copying an entirely different creature's data onto a picture, then you could make things like fiddler crabs with ease (just remove that pesky tag that insists that a claw is on both sides!). Interestingly, which side carries a fiddler crab's large claw isn't genetic: if it loses its large claw, the other one increases in size at the next molt. Odd, certainly, but currently impossible in the CC.
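Mechanically, changing the LSB data is the inverse of reading it: overwrite the lowest bit of successive channel values and leave every higher bit alone, so the picture looks the same. This is a minimal Python sketch of generic LSB embedding, assuming a flat sequence of 8-bit channel values and most-significant-bit-first order (neither confirmed for Spore); `embed_lsb_bytes` is a hypothetical name. Per this thread, raw LSBs written this way would presumably still fail Spore's check unless its encoding and checksum were reproduced, which is why the load, save and tweak loop described above is needed.

```python
def embed_lsb_bytes(channels: bytes, payload: bytes) -> bytes:
    """Overwrite the LSB of successive channel values with payload bits, most significant first.

    ASSUMPTIONS: one bit per channel value, raster order, MSB-first packing.
    """
    if len(payload) * 8 > len(channels):
        raise ValueError("payload too large for this image")
    out = bytearray(channels)
    for i, byte in enumerate(payload):
        for bit in range(8):
            idx = i * 8 + bit
            # Clear the old LSB, then set it to the next payload bit.
            out[idx] = (out[idx] & 0xFE) | ((byte >> (7 - bit)) & 1)
    return bytes(out)
```

Each channel value changes by at most 1, which is why this kind of edit is invisible to the eye but easy to detect with a bit-level diff.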
[gravatar]
To those still tracking this thread -
I finally got around to writing about why you shouldn't post any creature to sporepedia: http://www.rouli.net/2008/07/sporepedia-leaks-email-addresses.html.
In short, sporepedia exposes your email address, so watch out.

On the other hand, sporepedia also exposes some of the fields one may find in the encoded creature:
s5.assetId=500006865544;
 s5.attackRating=4.0;
 s5.author=s29;
 s5.baseGear=0.0; s5.bite=2.0; s5.boneCount=45;
 s5.carnivoreRating=1.0; s5.charge=2.0; s5.created=new Date(1217013619767);
 s5.cuteness=53.332813; s5.dance=0.0; s5.description=null;
 s5.featured=null; s5.footCount=4; s5['function']=s0;
 s5.gesture=2.0; s5.glide=0.0; s5.grasperCount=0;
 s5.health=2.0; s5.height=1.444589; s5.herbivoreRating=1.0;
 s5.id=500006865544; s5.maxAttack=2.0; s5.maxSocial=2.0;
 s5.meanness=13.0; s5.name="Rana"; s5.parentId=null;
 s5.posture=0.0; s5.quality=null; s5.rating=-1.0;
 s5.sing=1.0; s5.social=3.0; s5.spit=0.0;
 s5.sprint=0.0; s5.status=s1; s5.stealth=0.0;
 s5.strike=0.0; s5.tags=null; s5.thumbnailSize=27585;
 s5.totalEvoPoints=665; s5.type='CREATURE';
But I guess no one is watching this thread and I'm posting this in vain.
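The sporepedia listing above is serialized JavaScript, one `s5.field=value;` assignment per field, so it is easy to turn into a dictionary for comparing creatures. A rough Python sketch: `parse_fields` and `to_python` are hypothetical names, and the conversion rules (quoted strings unquoted, `null` to `None`, numbers to `float`, anything else such as `s29` or `new Date(...)` kept as raw text) are choices made for this sketch, not anything sporepedia documents.

```python
import re

# Matches assignments such as  s5.name="Rana";  or  s5['function']=s0;
FIELD = re.compile(r"s5(?:\.(\w+)|\['(\w+)'\])=([^;]*);")

def to_python(raw: str):
    """Convert a raw JavaScript literal into a rough Python equivalent."""
    if raw == "null":
        return None
    if raw[:1] in ("'", '"'):
        return raw[1:-1]
    try:
        return float(raw)
    except ValueError:
        return raw  # object references and constructors stay as text

def parse_fields(dump: str) -> dict:
    """Turn the s5.field=value; listing into a dict of field name -> value."""
    return {dotted or bracketed: to_python(value.strip())
            for dotted, bracketed, value in FIELD.findall(dump)}
```

Fields such as `boneCount`, `footCount` and `height` then become numbers that can be compared directly against what is decoded from the PNG.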
[gravatar]
Well, seeing that some traffic to my site came from this thread, I guess people are still subscribed to it, and I'm not writing comments in vain :)
Anyhow, I found out that some guys at Something Awful have reverse-engineered the Spore Creature Creator and figured out how creatures are encoded within the .png files (well, most of it, at least).
It turns out that the non-LSB bits in the picture are used as an encryption key for the creature's (compressed) model. Moreover, the bits are permuted, just to make things a lot more fun.
To those inclined, I've posted a simple walkthrough on the encoding procedure with some of the open problems still ahead.
I would very much like to continue the discussion!
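To make that description concrete, here is a toy Python sketch of the same kind of scheme: compress the model, XOR it with a keystream derived only from the non-LSB bits of the image, then permute the resulting bits, with an inverse that undoes each step. It is explicitly not Spore's algorithm. The seed (CRC-32 of the image with LSBs cleared), the keystream and the shuffle are all stand-ins built on Python's `random` module, which is not cryptographically secure, and every name here is hypothetical; rouli's walkthrough is the place for the actual details.

```python
import random
import zlib

def _rng_for(channels: bytes) -> random.Random:
    # HYPOTHETICAL key: seed a PRNG from the image with every LSB cleared,
    # so writing the payload into the LSBs does not change the key.
    return random.Random(zlib.crc32(bytes(v & 0xFE for v in channels)))

def _to_bits(data: bytes) -> list:
    return [(b >> (7 - i)) & 1 for b in data for i in range(8)]

def _from_bits(bits: list) -> bytes:
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))

def scramble(model: bytes, channels: bytes) -> list:
    """Compress the model, XOR it with an image-derived keystream, then permute the bits."""
    packed = zlib.compress(model)
    rng = _rng_for(channels)
    key = bytes(rng.randrange(256) for _ in packed)
    bits = _to_bits(bytes(p ^ k for p, k in zip(packed, key)))
    order = list(range(len(bits)))
    rng.shuffle(order)
    # Output position j holds input bit order[j].
    return [bits[i] for i in order]

def unscramble(bits: list, channels: bytes) -> bytes:
    """Invert scramble(): rebuild the same key and order, undo the permutation, XOR, decompress."""
    rng = _rng_for(channels)
    key = bytes(rng.randrange(256) for _ in range(len(bits) // 8))
    order = list(range(len(bits)))
    rng.shuffle(order)
    plain = [0] * len(bits)
    for j, i in enumerate(order):
        plain[i] = bits[j]
    return zlib.decompress(bytes(c ^ k for c, k in zip(_from_bits(plain), key)))
```

Because the key ignores the LSBs, the scrambled bits can be written into the LSBs and still be read back with the same image. It also illustrates why copying one creature's LSBs onto another picture fails: the second picture's high bits produce a different key.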
[gravatar]
Huh, this is a fascinating read! Curious whether any further progress has been made!
[gravatar]
Anything cool happen since 2009?
