Wednesday 25 June 2008 — This is over 16 years old. Be careful.
Spore is the wildly anticipated new game from Will Wright, and Creature Creator is the first part of it to be released for us to try. It allows you to build creatures, Mr-Potato-Head-style, which will eventually be usable in the full game.
It’s fun to put arms and legs and body parts together to make creatures, but the more impressive part of the technology is that once you make your creature, it’s fully animated already, with a repertoire of moves like walking, sitting, dancing and greeting. This is no small feat considering you aren’t constrained to building a humanoid creature. For example, my tricyclotops has three legs, and that front centered leg participates in the animations in a way that seems very natural, considering I’ve never seen a creature with legs in that formation.
The developers behind this animation have written up the technology: Real-time Motion Retargeting to Highly Varied User-Created Morphologies. One of the authors, John DeWeese, has a handful of riotously varied creatures on his Spore page.
If you look at the sidebar on the Spore creature pages, you’ll see instructions that you can save those PNG files, and drag them into Creator, and you’ll have the creature. That interested me: one of the things I did with Aptus was to save the coordinate info for a picture as a tEXt record in the PNG file. Aptus can open a PNG file it saved, and instead of dealing with pixel data, can read the coordinates and recreate the Mandelbrot view directly, allowing you to continue exploring from there.
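For readers who have not met tEXt records, here is a minimal sketch of the idea using only the standard library. It is not Aptus's actual code: the keyword `AptusCoords`, the coordinate string, and the helper names are all made up for illustration. It builds a tiny PNG in memory with a text record beside the pixel data, then reads the record back without touching the pixels.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, body):
    """Build one PNG chunk: length, type, body, CRC over type+body."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

def iter_chunks(png_bytes):
    """Yield (type, body) for each chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        ctype = png_bytes[pos + 4:pos + 8]
        body = png_bytes[pos + 8:pos + 8 + length]
        yield ctype, body
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC

def text_records(png_bytes):
    """Return {keyword: text} for every tEXt chunk."""
    records = {}
    for ctype, body in iter_chunks(png_bytes):
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            records[keyword.decode("latin-1")] = text.decode("latin-1")
    return records

# A 1x1 grayscale PNG carrying a made-up coordinate record.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
png = (PNG_SIGNATURE
       + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"tEXt", b"AptusCoords\x00center=-0.5,0.0 diam=3.0")
       + make_chunk(b"IDAT", idat)
       + make_chunk(b"IEND", b""))

print(text_records(png))
```

Running `iter_chunks` over a Spore PNG is one way to confirm that it carries no chunks beyond the image itself.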
Looking at the PNGs on the Spore page though, they have not done this. There is no data other than the image. But dragging the PNG into Creator does indeed give you the creature as structured data. Renaming the PNG doesn’t affect the data transfer, but any sort of editing of the image does. They’re using steganography, hiding one message inside another. In this case, they seem to be using the least significant bits in all the pixels.
Some quick Python shows details. Using PIL, we can examine the numeric values of the pixels:
# Open an image, and show the RGBA data for the first twenty pixels.
import sys
from PIL import Image
im = Image.open(sys.argv[1])
for pix in list(im.getdata())[:20]:
    print(pix)
produces:
(0, 1, 0, 1)
(0, 1, 1, 1)
(0, 0, 1, 0)
(1, 1, 0, 0)
(1, 0, 0, 0)
(0, 0, 1, 0)
(1, 1, 0, 1)
(0, 0, 1, 0)
(1, 1, 1, 0)
(0, 1, 1, 0)
(1, 0, 1, 1)
(1, 0, 1, 0)
(0, 0, 0, 1)
(0, 1, 1, 1)
(0, 1, 0, 1)
(1, 0, 1, 1)
(0, 1, 1, 1)
(1, 0, 1, 1)
(0, 1, 0, 0)
(1, 1, 0, 1)
These pixels are part of the black transparent edge of the image, except it isn’t truly black and it isn’t truly transparent. There’s one bit of information being encoded in each channel, or four bits per pixel.
We can go further and yank out the full 8-bit data:
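# Open an image file, read the low bits as 8-bit data,
# and write it out to a .out file next to the image.
import sys
from PIL import Image

def stegdata(imfile):
    """ Read the low bits of pixels as 8-bit data. """
    im = Image.open(imfile)
    data = bytearray()
    hi = 0
    for ipix, (r, g, b, a) in enumerate(im.getdata()):
        # One low bit per channel makes a 4-bit nibble per pixel.
        nyb = (r%2)*8 + (g%2)*4 + (b%2)*2 + (a%2)
        if ipix % 2:
            # Odd pixel: the low nibble, which completes a byte.
            data.append(hi + nyb)
            hi = 0
        else:
            # Even pixel: the high nibble.
            hi = nyb*16
    return bytes(data)

fname = sys.argv[1]
data = stegdata(fname)
# Binary mode, so the bytes are written exactly as extracted.
with open(fname+'.out', 'wb') as out:
    out.write(data)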
I had to guess here how to put the bits back together into a byte. The results are full-spectrum 8-bit data, but I don’t know how to interpret it:
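Since the packing is a guess, it is cheap to make the guess a parameter. The sketch below works on a plain list of (r, g, b, a) tuples, such as `list(im.getdata())`, and lets you vary the channel order and which pixel of each pair supplies the high nibble. The function and parameter names are invented here, and none of these layouts is known to be the one Spore uses.

```python
def pack_low_bits(pixels, order="rgba", high_first=True):
    """Pack the low bit of each channel into bytes, two pixels per byte.

    order       channel letters, most significant bit of the nibble first
    high_first  True if the first pixel of each pair is the high nibble
    """
    index = {"r": 0, "g": 1, "b": 2, "a": 3}
    out = bytearray()
    pending = None
    for pix in pixels:
        nyb = 0
        for ch in order:
            nyb = (nyb << 1) | (pix[index[ch]] & 1)
        if pending is None:
            pending = nyb
        else:
            first, second = pending, nyb
            if not high_first:
                first, second = second, first
            out.append((first << 4) | second)
            pending = None
    return bytes(out)

# The first two pixels from the listing above.
sample = [(0, 1, 0, 1), (0, 1, 1, 1)]
for order in ("rgba", "argb"):
    for high_first in (True, False):
        print(order, high_first, pack_low_bits(sample, order, high_first).hex())
```

With the defaults this reproduces the 0x57 that opens the dump; the other three combinations give 0x75, 0xab and 0xba for the same two pixels.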
000000: 57 2c 82 d2 e6 ba 17 5b 7b 4d 20 32 76 eb f4 8b W,.....[{M 2v...
000010: 4a b7 54 8b b6 9c a7 ba d5 6a 5f a4 54 15 f1 1f J.T......j_.T...
000020: c6 90 df 98 54 72 6d 62 58 71 69 f6 63 fe 7e 23 ....TrmbXqi.c.~#
000030: 49 97 de 81 d7 08 ec 5a 1a 63 57 e6 8e 27 16 03 I......Z.cW..'..
000040: 80 5c 56 a1 34 6b d8 fb 49 46 f9 d6 7b 32 ce 6b .\V.4k..IF..{2.k
000050: a3 2c 35 4e f7 e7 52 1a 62 1f ce 8e 47 5f e7 ba .,5N..R.b...G_..
000060: 14 ea 74 58 39 ac eb 53 ee c1 c8 3b cc 38 11 d6 ..tX9..S...;.8..
000070: fd 3f dd 41 ff 35 03 a3 67 c4 a6 43 1c 82 24 41 .?.A.5..g..C..$A
000080: b7 1d ce 66 5a 32 b3 f0 34 6b f3 0f 73 f9 ee f6 ...fZ2..4k..s...
000090: 05 41 56 7b 27 19 40 25 bc e7 b1 02 c9 43 e7 7d .AV{'.@%.....C.}
0000a0: fd b4 11 82 52 1f c8 d0 3c ad 92 ee 1e 57 6d e7 ....R...<....Wm.
0000b0: ad fd 72 53 b3 fd 1a 9b 10 52 57 01 86 11 42 7c ..rS.....RW...B|
0000c0: a2 74 ed f6 1b 28 33 cf 7a 19 79 fa b0 6c 04 a7 .t...(3.z.y..l..
0000d0: 89 36 c5 08 d8 ee e8 de 5a a3 b8 48 3d 94 62 0d .6......Z..H=.b.
0000e0: 0a 38 4c 21 5d 15 b8 54 e1 ea d7 0b 12 bf 8a a0 .8L!]..T........
0000f0: a3 e0 96 1a a8 79 c3 44 62 9e de 02 ea a0 31 8d .....y.Db.....1.
000100: 96 12 e0 7c ad e0 a5 9f fe 89 54 a6 54 f2 9d 6c ...|......T.T..l
000110: 42 c1 f0 14 8d 15 49 a5 d3 80 2c b1 26 ca af 80 B.....I...,.&...
000120: a8 cf a8 a4 77 02 60 ea c0 d8 4d 2c d9 18 1e 67 ....w.`...M,...g
000130: 8f 9a 29 67 30 92 b5 62 da 1d c1 30 21 f8 eb 21 ..)g0..b...0!..!
000140: fe d8 c2 a6 64 cf 52 dc 58 d1 0c ef d0 60 fb 9b ....d.R.X....`..
000150: 02 7a e9 d1 d6 a7 3c 01 79 7b da a7 9b 0b ef 3f .z....<.y{.....?
000160: 80 a3 d1 87 d2 81 50 d1 a2 59 c0 65 c3 8b c5 7b ......P..Y.e...{
000170: 8c e5 56 50 bf c2 6e 50 82 26 23 9a 76 2b e7 3b ..VP..nP.&#.v+.;
000180: 4f 5a ec f3 87 aa 27 fe 33 74 40 48 ba db 4f 25 OZ....'.3t@H..O%
... ...
Nothing jumps out at me here. As an exercise in code-breaking, this one is probably possible, since we have a way to generate as many cases as we need. I looked at other images, and there was no clear pattern.
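One quick way to put a number on "nothing jumps out" is the Shannon entropy of the byte histogram. This is a rough sketch using only the standard library; the helper name is made up, and the sample inputs are synthetic rather than taken from a Spore file. A score near 8 bits per byte suggests the data is compressed or encrypted, but it cannot tell you which.

```python
import math
import os
from collections import Counter

def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(n / total * math.log2(total / n) for n in counts.values())

print(byte_entropy(b"\x00" * 4096))            # constant data
print(byte_entropy(bytes(range(256)) * 16))    # every value equally often
print(byte_entropy(os.urandom(8192)))          # random bytes, close to 8
print(byte_entropy(b"<creature><leg/><leg/></creature>" * 100))  # markup, far lower
```

Feeding it the contents of the `.out` file written by the script above would show where the extracted data falls on that scale.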
It’s interesting that Spore chose steganography here, since it’s usually described as a way to hide a message so that its very existence is a secret. But there’s no sensitive data here, and they tipped us off to its presence with their instructions. Perhaps they wanted to save space by using those unneeded low bits? Perhaps they didn’t have the tools for manipulating tEXt records?
In any case, Spore is already a fertile breeding ground, both for wild new life forms, and geek interest in its technology.
Comments
Those long stretches of transparent black would compress to just about nothing, but now they're full of high entropy data.
They really should have used their own chunks, but you're probably right about them not having the tools. I've seen many APIs that just give you a loadImage call, and all format-specific data like PNG chunks is not loaded.
(Or they just didn't know PNG supports user-specified chunks)
It's like those scammy "you are bidding on a JPEG image of $THING" ebay auctions, but the image really IS the thing :)
Myself, I find it awesome, in a probably incredibly geeky way.
@Sean: the point of the tEXt records in PNG files is that it would also let you simply drag a PNG file, and the program can extract the structured information. It's a side channel of information that doesn't interfere with the pixels, but is still part of the PNG file.
@Justin: I made the same assumption, but there's no clue how it might have been compressed. zlib didn't like it!
I think it's encrypted.
One reason might be to make it difficult for people to create images that do not contain what they appear to contain.
So far I'm going with the "Because it is freakin' awesome" reason as the most plausible :)
Create a creature, and save it as a reference.
Modify one bit of the image above the LSB (change a 0x01 red byte to 0x81 perhaps), and see if that image will load. This will verify the existence or lack of a CRC on the non-data bits.
Open the creature from the image file, and save it with no changes. Compare the data layers on the new image. This will show whether timestamps or other transient data is saved with the creature (making every save of the creature different, even if the creature itself is identical).
Assuming transient data is not saved with the creature, open the creature and make a tiny change... such as altering the color of a single layer by a shade or two. Save the creature, and see if the entire data layer changes (implying compression and/or encryption) or if only a small part changes (implying no compression/encryption).
Assuming compression is used (very likely), save the bitstream using various methods of converting the bits to bytes. For example, each pixel could be a nibble (4 bits), probably in RGBA or ARGB bit order. Or, each layer could be a separate bitstream, one bit per pixel with the 2KB chunks concatenated (again, probably in RGBA or ARGB order). A script could create dozens of 8KB interpretations of the data pretty quickly, and you could try standard decompression routines on each one to see if any are valid.
Alternately, someone good at decompilation could find the file import routine, and bypass all this baseless speculation. :-)
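The brute-force step in that plan can be sketched in a few lines. The code below generates 48 interpretations of the low bits (nibble-per-pixel and plane-per-channel, in every channel order) and hands each to zlib. The layouts and helper names are assumptions made for illustration, and the self-test hides a zlib stream in synthetic pixels rather than using a real creature. A miss on a real file would only show that the data is not a bare zlib stream in one of these layouts.

```python
import itertools
import zlib

def pack_nibbles(pixels, order):
    """Two pixels per byte; `order` lists channel indexes, most significant bit first."""
    nybs = [sum((pix[ch] & 1) << (3 - i) for i, ch in enumerate(order))
            for pix in pixels]
    return bytes((hi << 4) | lo for hi, lo in zip(nybs[0::2], nybs[1::2]))

def pack_planes(pixels, order):
    """One bitstream per channel, eight pixels per byte, planes concatenated."""
    out = bytearray()
    for ch in order:
        bits = [pix[ch] & 1 for pix in pixels]
        for i in range(0, len(bits) - 7, 8):
            byte = 0
            for bit in bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)

def candidates(pixels):
    """Yield (label, bytes) for each layout and channel order."""
    for order in itertools.permutations(range(4)):
        name = "".join("rgba"[ch] for ch in order)
        yield "nibbles-" + name, pack_nibbles(pixels, order)
        yield "planes-" + name, pack_planes(pixels, order)

def try_zlib(pixels):
    """Return [(label, decompressed)] for every candidate zlib accepts."""
    hits = []
    for label, data in candidates(pixels):
        try:
            # decompressobj tolerates trailing junk after the stream ends.
            hits.append((label, zlib.decompressobj().decompress(data)))
        except zlib.error:
            pass
    return hits

# Self-test on synthetic pixels: hide a zlib stream as ARGB nibbles, then find it.
secret = zlib.compress(b"<creature legs='3'/>" * 20)
pixels = []
for byte in secret:
    for nyb in (byte >> 4, byte & 15):
        a, r, g, b = (nyb >> 3) & 1, (nyb >> 2) & 1, (nyb >> 1) & 1, nyb & 1
        pixels.append((r, g, b, a))
for label, plain in try_zlib(pixels):
    print(label, plain[:20])
```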
I suppose there's also the possibility that, instead of storing the data this way "because it's cool" (though it is :D) they stored the data this way purposefully to make it that much harder for us users to crack. They don't want people figuring out their data storage algorithms and writing programs to create tEXt blocks that imitate their data but actually do something different - possibly destructive.
I don't view this as steganography at all insomuch as that implies they are trying to hide something from the user. I think it was just a (very good) practical design decision that some clever programmer thought up to allow maximum flexibility for clipboard-related image interchange.
---------------------------------------------------
You might be interested to know...
Someone else pointed out that the alpha channel is very "noisy" in the image...
And this seems to be true: when you examine the image, the "transparent" parts actually contain a LOT of information. In the parts of the image that have an alpha value of either 0 or 1, most of the RGB values are also binary (that is, they have a value of either 0 or 1)!
When you then convert those to ascii (using the ARGB values as binary) you get:
AUTU@DUPP@EPPED@P@QPPPAAQT@EUUUEATTD@@DDQDEUDPEATAEDT@ETPUDUPEUUDP@AAATDAPUQQPTUQ@DTE@QAQDQUPUAUQPTATUAQA@@@QDETPEPEQAAA@DUTQPADETATQTUPEPPTEQADU@TPAD (rest omitted for brevity)
So there is definitely some information being encoded in there... As to what it means... I'd have to run comparisons against other images for that...
I've often imagined how steganography could enter the mainstream in a commercial service. Spore have shown some commercial genius in so doing!
As George McBay hinted, commercial factors appear to be the main motive for choosing steganography.
Besides the one he points out - the easy 'portability' of the graphic and encrypted creature, there is a more obvious motive.
The closed source nature of the steganographic content restricts the editing of Spore creatures to Spore's tools.
If the creature files were, for example, using xml instead, any kid could open the file and edit the content inside a tag, and anyone capable of writing a viewer/player app, if not an editor, could translate the edited file into an alien version of Pinocchio.
This is what VRML/X3D (xml ver of former)/what-ever-it's-called-these-days is.
Hard to say if such an open source model could work for a game and its parent company. There are however many games that have open file formats, which allow the creation of mods.
A question I have is, what would a more open format look like, and would it be possible within similar file sizes?
I'll be looking for other uses of this commercial play of steganography, and clones of Spore spawning, with the possibility of a competitor using a more open format.
Would it be too absurd for Sketchup to offer something similar?
The possibilities are really cool!
1. The whole data is in the png file. At first I suspected that the png encodes only a url from which the full creature could be loaded. Disconnecting from the internet didn't prevent spore from loading a (not loaded before, not cached) creature.
2. The data is not encoded only in the LSBs. Changed the MSBs of the red channel and the creature failed to load. This also proves that the data is not only coded in the alpha channel (as some reddit users assumed).
3. There is some kind of error correction? I changed quite a few pixels, and that didn't prevent the creature from being loaded.
4. Took a creature, named it aaaa and saved. Renamed it baaa and saved again. Most pixels have changed, only 5% of pixels remained the same, with no apparent pattern to those which stayed the same.
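Experiments like 2 and 4 are easier to reason about if the low bits and the upper bits are compared separately. Here is a small sketch of that comparison on plain lists of (r, g, b, a) tuples. The two four-pixel "saves" are made-up data and the helper names are invented for this example.

```python
def plane_diff(pixels_a, pixels_b):
    """Fraction of channel values whose low bit differs, and whose upper 7 bits differ."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    low = high = total = 0
    for pa, pb in zip(pixels_a, pixels_b):
        for va, vb in zip(pa, pb):
            total += 1
            low += (va ^ vb) & 1
            high += (va >> 1) != (vb >> 1)
    return low / total, high / total

def unchanged_pixels(pixels_a, pixels_b):
    """Fraction of pixels that are identical in all four channels."""
    same = sum(pa == pb for pa, pb in zip(pixels_a, pixels_b))
    return same / len(pixels_a)

# Two made-up 4-pixel "saves": only low bits differ, and only in the first three pixels.
save1 = [(10, 20, 30, 255), (0, 1, 0, 1), (200, 100, 50, 254), (7, 7, 7, 7)]
save2 = [(11, 20, 31, 255), (1, 1, 0, 0), (200, 101, 50, 255), (7, 7, 7, 7)]
print(plane_diff(save1, save2))        # (0.375, 0.0)
print(unchanged_pixels(save1, save2))  # 0.25
```

For real files, the pixel lists could come from Pillow, for example `list(Image.open(path).convert("RGBA").getdata())`. A nonzero second number would mean the two saves differ above the least significant bit.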
Your point #2 might simply mean that the entire file is checksummed to prevent mismatching the thumbnail and the data.
I don't think the data is encoded solely in the red channel's msb, and indeed it won't make much sense. There may be a general checksum, or even that spore checks if the resulting creature looks like the thumbnail provided, but I still don't think the data is encoded only in the LSBs. As further evidence, when I saved the same creature, once named "aaaa", and once named "baaa", not only the LSBs were changed.
I know this is not a complete proof, but on the other hand, it seems to be a valid option ...
Also, regarding the Reddit comment, that data has a very biased distribution. I see '@' (0x40) to 'U' (0x55) represented, with very few of the characters in that range present. I can't imagine they'd actually waste that many bits just to produce a very limited range of ASCII values. Can anyone validate the original author's experiment?
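One guess that would account for the narrow range: if each of the four 0-or-1 channel values was written as a two-bit field, alpha first, and alpha was 1 for the pixels sampled, only eight characters are possible. The sketch below checks that guess against the start of the quoted string. The packing is an assumption about how the conversion was done, not something stated in the thread, and the function name is made up.

```python
def two_bit_fields(a, r, g, b):
    """Pack four 0/1 channel values as four 2-bit fields, alpha first."""
    return chr((a << 6) | (r << 4) | (g << 2) | b)

# With alpha fixed at 1, the three colour bits give only eight characters.
alphabet = sorted(two_bit_fields(1, r, g, b)
                  for r in (0, 1) for g in (0, 1) for b in (0, 1))
print("".join(alphabet))              # @ADEPQTU

# The start of the string quoted above uses nothing outside that set.
quoted = "AUTU@DUPP@EPPED@P@QPPPAAQT@EUUUEATTD@@DDQDEUDPEATAEDT"
print(set(quoted) <= set(alphabet))   # True
```

If the guess is right, each character carries only three bits of real information, which would be an artifact of the conversion rather than of Spore's encoding.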
Tried replacing one creature's lsb with another. It didn't work. Though, it can still mean that there's a checksum for the whole "data" (by whole I mean, the picture included).
PIL creates PNGs that are accepted by Spore. As I mentioned before, I changed a number of bits, saved the image using PIL, and the creature was loaded into Spore without a problem.
Also, resaving the file (copy original, then do NOOP edits in the name field) seems to induce changes throughout the stegdata.
The size of the data would make using a shared encryption key, exactly 8k to match the data length, a very real possibility. They could even salt it, randomly choosing one of several keys. My experiments with resaving the file (only a time-stamp change?) almost hint at this.
Of course, they would want to hide the encryption key in the application. In theory, pushing the "everything is procedural" theme, they could generate a key using the salt hint to seed a pseudo-random algorithm. But that's just speculation.
But I think I may be getting ahead of myself. I need to go look at my resaves to determine exactly what parts of the image have changed. In theory, the image portion (everything but the LSB) should be identical.
I'll save you the time - even when resaving the same creature with the same name, other bits, not just the LSBs, are changed.
Another possibility (other than encryption) is that they do use compression, and that the timestamp is one of the first things encoded in the data (and thus, would change the appearance of the rest of the data). However, I lean to the encryption theory for now.
Some experiments whose meaning I don't understand:
Saved the same creature after doing NOOP changes to its name. Then copied whole pixels from the first image to the second, and tried opening the "merged" image.
Successfully transferred the first 6 pixels from one to the other. More than that fails. Just changing the 6 pixels to (0,0,0,0) fails. Copying 6 pixels in the middle of the image fails (at least sometimes), though copying 1 or 2 pixels seems to work.
Any ideas what that means?
I've noticed missing creatures from downloads before, but I thought it was due to download method (e.g., accidentally grabbing *_lrg.png from the web Sporepedia).
This may hint at an index of locally created creatures, probably stored outside the Creatures directory. It also implies that the SCC's ability to load a PNG into the catalog is an unreliable test, with several false negatives.
Non-LSB pixel changes could be non-deterministic rendering, possibly from changes in background, camera angle, or animation pose. In fact, careful comparison of the RGB channels in Photoshop shows exactly that.
Knowing this, I'm going to proceed with the LSB theory.
Actually, for now I accept the LSB theory - I've checked the values of the RGB channels when the transparency is set to either 0 or 1 (that is - in transparent pixels), and only the LSB is set.
Another clue - I've successfully resaved a creature in spore without changing any pixel (even the LSBs were not changed). So it seems that the data is not newly encoded every time, with a newly created key. Maybe the creature-data is encoded relative to the non-data (=image data)?
About how many images did you go through to get a matching resave? It may hint at a low number of encryption salts. Except...
I think we're expecting a change: we know the system stores a timestamp of when the creature was created. Even downloaded creatures have creation dates earlier than the file date. And when you first create a creature, the time stamp is accurate to the second. Are you sure you did not accidentally duplicate a file?
It was my first attempt, which makes it much more suspicious.
Re: missing creatures
Two creatures I made (one was only a recolor of an existing one, so I have a new copy) do not show up in my catalog (the SCC keeps BSoD-ing my desktop). I also have an install at my parents' house, and the first three creatures I made there (including a successful exit of the program) also don't show up when it is started up again.
Just managed to grab one of the parent's place critters, but can't grab one created on this machine. The other one I just did did note me as the creator, doesn't have a lock symbol on it, unlike another user's creation, and is noted as being shared, however it is not under "my creations".
Ah ha. Deleting the "bad" file from the My Docs\My Creations folder allowed me to load it up again. Fantastic. Files are identical to the bit, what this means, I don't know. Moved the pics from my desktop into the \My Creations folder and they load up just fine too, they just don't show under My Creations. Meh, I can live with that. Attempt to edit...BSOD again. Bah, this computer hasn't been stable since it fried its CPU...four years ago? Something like that, nearly every damn piece has been replaced at this point.
Trying again. BSOD, but a little farther into the process almost had the full screen view of the thing. Geeze! Again, BSOD on splash screen....and corrupted text on reboot. This is not good. Not good at all...
Hard power down seems to have de-corrupted whatever it was that was giving me graphics problems. USB keyboard was not initially recognized, but was when Windows booted. One more time, one more time....
Yay! Edits R Us. Added a spin bit and removed it (essentially leaving the creature the same) and saved, old and new files were different, didn't look too closely.
....if the creatures are the same, but the files different, would it be possible to subtract one from the other and possibly get useful data out of it? I know that some encryption schemes can be partially deciphered that way (as you're subtracting out the plain text and are left with only an A-B ciphertext).
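The cancellation being described is easiest to see with XOR. In the toy below, two nearly identical plaintexts are encrypted with the same keystream, and combining the two ciphertexts removes the key entirely. Everything here is invented for illustration (the keystream, the XML-like strings, the helper name), and the trick only applies if Spore reuses one keystream with a simple XOR-style cipher, which nothing in this thread establishes.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("lengths differ")
    return bytes(x ^ y for x, y in zip(a, b))

key = bytes((i * 73 + 41) % 256 for i in range(32))   # stand-in keystream
plain1 = b"<creature name='aaaa' legs='3'/>"
plain2 = b"<creature name='baaa' legs='3'/>"
cipher1 = xor_bytes(plain1, key)
cipher2 = xor_bytes(plain2, key)

# The key cancels: what remains is plain1 XOR plain2.
delta = xor_bytes(cipher1, cipher2)
print([i for i, d in enumerate(delta) if d])   # [16], where the names differ
```

The earlier observation that most pixels change after a one-letter rename argues against this simple picture, since here only one byte of the difference is nonzero. Compression before encryption, or a key that changes per save, would both break the cancellation.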
The data files are a new variant of EA's old DBPF format. Most of the data is compressed using the zlib deflate() routine. In the game files, a 5-byte prefix precedes the compressed data:
10 FB XX XX XX or 50 FB XX XX XX
The fields XX XX XX above are the uncompressed size, chopped to 24 bits, in a high-to-low byte order.
In the game, at least the large part of the creature data is in XML format, and is highly compressible, for one test file I made I compressed one chunk about 10:1 with WinRar.
There are likely at least two chunks to the data, one with the creator and creature names and comment in it, and another with the XML creature data.
So, perhaps the 0x10FB or 0x50FB will help someone solve the puzzle.
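A scan for that prefix is simple to sketch. The function below looks for `10 FB` or `50 FB` at every offset of a candidate byte string and decodes the following three bytes as a high-to-low (big-endian) 24-bit size, as described above. The function name and the sample bytes are made up for the example.

```python
def find_headers(data):
    """Return [(offset, flag, size)] for each 10 FB / 50 FB prefix candidate."""
    hits = []
    for i in range(len(data) - 4):
        if data[i] in (0x10, 0x50) and data[i + 1] == 0xFB:
            size = int.from_bytes(data[i + 2:i + 5], "big")  # high-to-low byte order
            hits.append((i, data[i], size))
    return hits

sample = bytes.fromhex("00 11 10 fb 00 12 34 aa bb 50 fb 01 00 00 cc")
for offset, flag, size in find_headers(sample):
    print(offset, hex(flag), size)
```

A two-byte signature turns up by chance about once in every 32 KB of random data, so a hit in an 8 KB candidate is suggestive rather than conclusive; checking that the decoded size is plausible helps weed out accidents.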
There is a Sims 2 wiki page that goes into detail on that format at http://www.sims2wiki.info/wiki.php?title=DatabasePackedFile and includes reference to 0x10FB "Compression ID (0x10FB) QFS Compression"
I understand you have an xml representation of a creature. Is that correct? can you upload it (with the creature's png representation) or explain in more detail how to get it?
thanks,
rouli
btw, I've had some minor progress, will post about it on the weekend.
It is also very obvious that zlib 1.2.3 is used for the compression: the zlib license is copied into the EULA, and can be seen in the executable as plain text in WordPad (not that I would disassemble the executable in violation of the EULA).
My point was just that if someone is looking for a Rosetta Stone to determine the pattern used in encoding the bits, the five bytes I mentioned will likely turn up when the bits are ordered properly. The decryption code is available at the link you mentioned, and at other locations, but what was used in The Sims 2 used a 9-byte header that included the compressed size. In the CC game files the compressed size is available in the index and is not repeated in the data chunks, while the uncompressed size is available in both locations, albeit chopped to 24 bits and in reversed byte order.
Per your request, I have posted a creature and a .rar archive of the EXTRACTED files, including the .xml file. I put these on the CustomSims3 {dot} com website (well, Spore is out, Sims 3 is not, what else can I say), in the Meshing and Modding forum, in a thread titled "Chickenopolis".
Membership is not required to read the messages and download the attachments, but I do not have the wonder spam filters that Ned has here, so if you want to comment there, you have to sign up.
Though, the LSBs do contain some information about the other bits (which may be a simple checksum) that prevents one from copying the LSBs of one creature to another.
@Wes - Thanks!
The checksum of the image is a good point, and would explain some of our tests.
The meshes are not in the creature creator files, just references to the components and their placement. I would not believe that the idle animations onscreen would have any effect on the construction details.
I don't know if there's any basis to this hypothesis or not. I'm not opening these PNGs in binary editors like some of the people posting here; it was just a hypothesis that popped into my head while I was scanning through this thread.
My point is just that if you save the same creature twice, and the 2d rendering in the .png is not identical each time because of slight variations in the pose of the creature, then you would need to take that into account when trying to compare the differences between the two image files bit by bit.
So whatever variations there are in the pixel data from one save to the next do not have a noticeable effect on the image appearance, which would imply that the data variations all have to do with the steganographic model encoding.
Each character has an ID composed of two ulongs. In the uncompressed XML creature files, this is represented as
0x40626000, 0xd05c53a3
although that data would be compressed in the PNG file. In the file I took that line from, there are 15 instances of that sequence, one for each block, although that seems to vary by creature complexity.
Each creature has a unique pair of ID numbers, at least one of which is generated by hashing the name, although I do not know the hashing algorithm. That pair of ID numbers is used to tie different parts of the creature definition together.
The line I showed above may be the single most significant variation in the otherwise identical creatures with different names.
Because this text has to fit into the image along with the data describing the creature, it means there's an enforced upper limit on the length of the description and tags which is probably pretty low.
Had little time to further explore the file format, but you may find the following creature interesting -
http://bp3.blogger.com/_hDVLcOyq0vk/SG-MwdO5wsI/AAAAAAAAAC8/DsqJcibOPsA/s1600-h/x0.png
As you can see (and if you can't, go to my defunct blog, I have only one post, and it's about spore), that image is blank, but surprisingly, when one loads it in spore, it turns out to contain a basic worm-like creature. I'm not sure how exactly I did it, I guess it is due to some weird program bug (no foul play on my part).
Anyway, this may help us find that elusive "checksum", because there is no picture data to interfere with the creature's data. On the other hand, if anyone would create another creature file with no real image present, comparing both creatures may shed light on the encoding process.
At the same site you obtained the package I made for you is a Spore mesh viewer tool. It's weighty (built on the Ogre library), but it has an archive splitter function. You could use that to extract the character data yourself from characters you made (it is stored in the EditorSave.package and the GraphicsCache.package files).
Could the checksum be copied from one creature (the intended base image) to the second (the chimera) in order to get the creature to load properly?
Other possibilities include making minor changes to the creature data (the XML file) such that you make an asymmetrical creature (possible?) or change things very minorly (a rotation of to rotation ) so that in essence you create a new creature by changing the LSB in tiny amounts, loading, saving (generating a new checksum!) and then tweaking some more. Grueling process, but if you can change the LSB data to significant degrees, without copying an entirely different creature's data onto a picture, then you could make things like Fiddler Crabs with ease (just remove that pesky tag that insists that a claw is on both sides!). Interestingly, Fiddler Crab's large claws aren't genetic, if it loses its large claw, the other increases in size on the next shell-molting. Odd, certainly, but currently impossible in the CC.
I finally got to write why you shouldn't post any creature to sporepedia, http://www.rouli.net/2008/07/sporepedia-leaks-email-addresses.html.
In short, sporepedia exposes your email address, so watch out.
On the other hand, sporepedia also exposes some of the fields one may find in the encoded creature: But I guess no one is watching this thread and I'm posting this in vain.
Anyhow, I found out that some guys at Something Awful have reverse engineered Spore creature creator and found out how creatures are coded within the .png files (well, at least most of it).
It turns out that the non-LSB bits in the picture are used as an encryption key for the (compressed) creature's model. Moreover, the bits are permuted, just to make things a lot more fun.
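A toy model shows why that design would explain the failed experiments earlier in the thread, where low bits copied from one creature onto another would not load. In the sketch below the key is derived from the upper seven bits of every channel and the payload is XORed with it before being written into the low bits. The SHA-256 key derivation, the function names, and the sample images are all invented for illustration; Spore's real derivation and its bit permutation are not reproduced here.

```python
import hashlib

def keystream(pixels, nbytes):
    """Toy key: hash the upper 7 bits of every channel, stretched with a counter.
    (Invented for illustration; not Spore's actual scheme.)"""
    seed = bytes(v >> 1 for pix in pixels for v in pix)
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:nbytes]

def embed(pixels, payload):
    """XOR payload with the image-derived key and write it into the low bits."""
    cipher = bytes(p ^ k for p, k in zip(payload, keystream(pixels, len(payload))))
    bits = [(byte >> (7 - i)) & 1 for byte in cipher for i in range(8)]
    flat = [v for pix in pixels for v in pix]
    if len(bits) > len(flat):
        raise ValueError("payload too large")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit
    return [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]

def extract(pixels, nbytes):
    """Read nbytes from the low bits and undo the XOR."""
    flat = [v for pix in pixels for v in pix]
    cipher = bytearray()
    for i in range(nbytes):
        byte = 0
        for value in flat[8 * i:8 * i + 8]:
            byte = (byte << 1) | (value & 1)
        cipher.append(byte)
    return bytes(c ^ k for c, k in zip(cipher, keystream(pixels, nbytes)))

image_a = [(i % 256, (i * 3) % 256, (i * 7) % 256, 255) for i in range(64)]
image_b = [((i * 5) % 256, (i * 11) % 256, i % 256, 255) for i in range(64)]
payload = b"three-legged creature"

stego_a = embed(image_a, payload)
print(extract(stego_a, len(payload)))    # round-trips to the payload

# Transplant A's low bits onto B's upper bits: the key no longer matches.
chimera = [tuple((vb & ~1) | (va & 1) for va, vb in zip(pa, pb))
           for pa, pb in zip(stego_a, image_b)]
print(extract(chimera, len(payload)))    # garbage
```

In this model, editing any upper bit changes the key for the whole payload, which is consistent with the reports that touching the visible image breaks loading.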
To those inclined, I've posted a simple walkthrough on the encoding procedure with some of the open problems still ahead.
I would very much like to continue the discussion!