I was working on a proof-of-concept implementation of a file format that uses DEFLATE compression. Since it was supposed to be a self-contained example, I didn’t want to bring in a fully-featured compressor like zlib. I skimmed the DEFLATE RFC and noticed that it supports raw / uncompressed data blocks. I wrote a simple encoder that stores uncompressed blocks to solve my problem, and wanted to document it on my blog for future reference.
Structure of a DEFLATE stream
A DEFLATE stream is made up of blocks. Each block has a 3-bit header. The first bit signifies the last block of the stream, and the other two make up the block type.
The block types are
00- no compression
01- compressed with fixed Huffman codes
10- compressed with dynamic Huffman codes
11- reserved (error)
In this post, we’re only interested in the first one (00).
Structure of an uncompressed block
In the uncompressed block, the 3-bit header is contained in a byte. A good property of the uncompressed block type being
00 is the ease of constructing the header.
- If the block is the final one, the header is
- If the block is not the final one, the header is
After the header byte, there is the length and negated length. These are both encoded as little endian uint16_t’s.
- length is the number of data bytes in the block
- negated length is one’s complement of length,
~len & 0xFFFF
After the header, length and the negated length, length bytes of data follow. If there are no more blocks after this one, the final bit is set.
Here’s a simple DEFLATE implementation in Python. The output should be a valid DEFLATE stream for real decoders.
def faux_deflate(reader, writer, bufsize=2048): while True: chunk = reader.read(bufsize) header = 0 if not chunk: header = 1 _len = len(chunk) nlen = ~_len & 0xFFFF writer.write(struct.pack("<BHH", header, _len, nlen)) writer.write(chunk) if not chunk: break
There’s also a decoder that can only decode uncompressed blocks.
def faux_inflate(reader, writer): while header := reader.read(5): header, _len, nlen = struct.unpack("<BHH", header) assert header in [0, 1] assert nlen == ~_len & 0xFFFF writer.write(reader.read(_len))