Reverse Engineering the Godot File Format

I’ve been messing around with the Godot game engine recently. After writing some examples that load the assets and map data from files, I exported it and noticed that Godot bundled all the resources into a single .pck file. It was packing all the game resources and providing them during runtime as some sort of virtual file system.

Of course; after I was finished for the day with learning gamedev, I was curious about that file. I decided to give myself a small challenge and parse that file using only hexdump and Python. I opened the pack file with my hex editor and I was met with a mixture of binary and ASCII data. Here’s the beginning of the file from hexdump -C game.pck.

00000000  47 44 50 43 01 00 00 00  03 00 00 00 01 00 00 00  |GDPC............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000050  00 00 00 00 1e 00 00 00  40 00 00 00 72 65 73 3a  |........@...res:|
00000060  2f 2f 2e 69 6d 70 6f 72  74 2f 61 6d 67 31 5f 66  |//.import/amg1_f|
00000070  72 31 2e 70 6e 67 2d 32  32 66 30 32 33 32 34 65  |r1.png-22f02324e|
00000080  63 62 39 34 32 34 39 37  62 61 61 31 38 30 37 33  |cb942497baa18073|
00000090  37 36 62 37 31 64 30 2e  73 74 65 78 00 3d 00 00  |76b71d0.stex.=..|
000000a0  00 00 00 00 28 04 00 00  00 00 00 00 de a0 23 30  |....(.........#0|
000000b0  cf 7c 59 5c fb 73 5d f6  a7 f8 12 a7 40 00 00 00  |.|Y\.s].....@...|
000000c0  72 65 73 3a 2f 2f 2e 69  6d 70 6f 72 74 2f 61 6d  |res://.import/am|
000000d0  67 31 5f 6c 66 31 2e 70  6e 67 2d 62 33 35 35 35  |g1_lf1.png-b3555|
000000e0  34 66 64 31 39 36 37 64  65 31 65 62 32 63 64 31  |4fd1967de1eb2cd1|
000000f0  32 33 32 65 32 31 38 33  33 30 32 2e 73 74 65 78  |232e2183302.stex|

Immediately we can see that the file starts with some magic bytes, and before our ASCII filename begins there’s a lot of zeros. They might be keeping that space as a padding or for future extensions, or maybe it can even contain data depending on your settings, but for our immediate goal it doesn’t matter.

What looks interesting here is the two integers right before our path. 1e 00 00 00 and 40 00 00 00, probably lengths or counts since they are before real data. Saying these two are little-endian unsigned integers would be a good assumption, because otherwise they would be huge numbers that have no business being the length of anything.

The first number is 30 and the second one is 64. Now, what’s the name of that file? res://.import/amg1_fr1.png-22f02324ecb942497baa1807376b71d0.stex. Exactly 64 bytes. That means we now know that the paths are prefixed by their length.

If we look at the next path, we can see that a similar pattern of being length prefixed still applies. The first integer we found, 30, is most likely the number of files we have. And a rough eyeballing of the file contents reveals that to be the case.

Let’s get a little Python here and try to read the file with our knowledge so far. We’ll read the first integer and loop through all the files, trying to print their names.

pack = open('game.pck', 'rb')
pack.read(0x54) # Skip the empty padding

file_count = struct.unpack('<I', pack.read(4))[0]

name_len = struct.unpack('<I', pack.read(4))[0]
name = pack.read(name_len).decode('utf-8')

print(f'The first file is {name}')

Running this code produces the following output, success!

The first file is res://.import/amg1_fr1.png-22f02324ecb942497baa1807376b71d0.stex

Now let’s try to loop file_count times and see if our results are good. One thing we should notice is the data following the ASCII text, before the new one begins. If we miss that, we will read the rest of the data wrong and end up with garbage. Let’s go back to our hexdump and count how many bytes we need to skip until the next length. Looks like we have 32 extra bytes. Let’s account for those and print everything.

for i in range(file_count):
  name_len = struct.unpack('<I', pack.read(4))[0]
  name = pack.read(name_len)
  pack.read(32)
  print(name)
...
b'res://MapLoader.gd.remap'
b'res://MapLoader.gdc\x00'
b'res://MapLoader.tscn'
b'res://Player.gd.remap\x00\x00\x00'
b'res://Player.gdc'
...

Much better, the only issue is the trailing null bytes on some file names. This shouldn’t be a huge issue though, these are probably random padding and they don’t even matter if we consider the strings null-terminated. Let’s just get rid of trailing null bytes.

name = pack.read(name_len).rstrip(b'\x00').decode('utf-8')

After this change, we can get a list of all the resource files in a Godot pack file.

Getting the file contents

Sure, getting the list of files contained in the file is useful. But it’s not super helpful if our tool can’t get the file contents as well.

The file contents are stored separately from the file names. The thing we parsed so far is only the file index, like a table of contents. It’s useful to have it that way so when the game needs a resource at the end of the file, the game engine won’t have to scan the whole file to get there.

But how can we get find where the contents are without going through the whole thing? With offsets of course. Every entry we read from the index, along with the file name, contains the offset and the size of the file. It’s not the easiest thing to explain how you discover something after the fact, but it’s a combination of being familiar with other file formats and a bunch of guesswork. Now I’d like to direct your attention to the 32 bytes we skipped earlier.

Since we already have the file names, we can make assumptions like text files being smaller than other resources like images and sprites. This can be made even easier by putting files with known lengths there, but just for the sake of a challenge let’s pretend that we can’t create these files.

After each guess, we can easily verify this by checking with hexdump or plugging the new logic in to our Python script.

The 4 byte integer that follows the file name is our offset. If we go to the beginning of the file and count that many bytes, we should end up where our file contents begin.

offset = struct.unpack('<I', pack.read(4))[0]

This is followed by 4 empty bytes and then another integer which is our size. Those 4 bytes might be used for other purposes, but again they are irrelevant for our goal.

pack.read(4)
size = struct.unpack('<I', pack.read(4))[0]

The offset is where the file contents began, and the size is how many bytes it is. So the range between offset and offset + size is the file contents. And because we ended up reading 12 bytes from the file, we should make our padding 20 instead of 32.

Reading a File

To finish up, let’s read the map file I put in my game. It’s a plaintext file with JSON contents, so it should be easy to see if everything looks complete.

pack = open('game.pck', 'rb')
pack.read(0x54) # Skip the empty padding

file_count = struct.unpack('<I', pack.read(4))[0]

for i in range(file_count):
  name_len = struct.unpack('<I', pack.read(4))[0]
  name = pack.read(name_len).rstrip(b'\x00').decode('utf-8')

  offset = struct.unpack('<I', pack.read(4))[0]
  pack.read(4)
  size = struct.unpack('<I', pack.read(4))[0]
  pack.read(20)

  print(name)

  if name == 'res://maps/map01.tres':
    pack.seek(offset)
    content = pack.read(size).decode('utf-8')
    print(content)
    break
...
res://Player.gd.remap
res://Player.gdc
res://Player.tscn
res://block.gd.remap
res://block.gdc
res://block.tscn
res://default_env.tres
res://icon.png
res://icon.png.import
res://maps/map01.tres
{
    "spawn_point": {"x": 5, "y": 3},
    "blocks": [
        {"x": 5, "y": 5, "msg": "Welcome to Mr Jumpy Man", "jumpLimit": 0},
        {"x": 7, "y": 5, "msg": "We're still waiting on the trademark"},
        {"x": 9, "y": 5, "texture": "platformIndustrial_001.png"},
        {"x": 11, "y": 6, "msg": "If you fall down,\nyou will be teleported to the start point"},
...

Everything looks good! Using this code, we should be able to extract the resources of Godot games.

Finishing words

Now; before the angry comments section gets all disappointed, let me explain. I am fully aware that Godot is open source. Yes, I could’ve looked at the code to know exactly how it works. No, that wouldn’t be as fun. Kthxbye.



Comments BETA





Page built on Fri 14 Jun 2019 12:36:03 AM BST