Encryption is tricky to get right. Because some data and communications might be very sensitive or even life-critical for people, beginners are often, and quite rudely, shooed away from playing around with how it works. But if people don’t learn by making bad ciphers, they will have difficulty understanding why the good ones are good.

Software is usually one of the most welcoming fields for beginners, but encryption seems to be the one thing that can’t be experimented with.

Let’s ignore all that and have some fun. After all, encryption is the perfect project for recreational programming. The result might end up being very weak or decently strong; in either case, I’ll learn something new from the constructive criticism that follows.

Here’s a description of my first-ever cipher. I hope you have fun breaking it.

Meet XOR

XOR is an operation that combines two bitstreams in a way that is reversible: XORing the result with the same key a second time gives back the original data. This means the same key can be used both to encrypt and to decrypt. Here’s a short example of how it works.

data = [1, 2, 3] # Secret data
key  = [7, 8, 4] # Completely arbitrary

# ^ is the Python operator for XOR
encrypted = [data[i] ^ key[i] for i in range(3)]

# Decode the encrypted bytes with the same key
decrypted = [encrypted[i] ^ key[i] for i in range(3)]

assert data == decrypted

This will be a useful building block of our cipher, so let’s turn it into a function. The function also handles data and key material of different lengths by only processing as many bytes as the shorter of the two. This lets us support files whose size isn’t a multiple of the key size.

def xor_vectors(v1, v2):
    size = min(len(v1), len(v2))
    return [v1[x] ^ v2[x] for x in range(size)]
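
To make the length handling concrete, here’s a small sketch that calls xor_vectors with a key that is longer than the data. The message and key values are made up for illustration.

```python
def xor_vectors(v1, v2):
    size = min(len(v1), len(v2))
    return [v1[x] ^ v2[x] for x in range(size)]


message = b"hi"             # 2 bytes of data (arbitrary example)
key = bytes([7, 8, 4, 9])   # 4 bytes of key material (arbitrary example)

# Only the first 2 key bytes are used, so the output is 2 bytes long.
encrypted = xor_vectors(message, key)

# XORing with the same key again restores the message.
decrypted = xor_vectors(encrypted, key)

print(len(encrypted))    # 2
print(bytes(decrypted))  # b'hi'
```

Note that the result is a list of integers, which is why it gets wrapped in bytes() before printing.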

If you and the person you’re trying to communicate with shared a key that was the same size as the message you’re trying to send, and you only used that key once, you could achieve perfect encryption that would be impossible to break without the key. This is called a One-time Pad.
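
Here’s a minimal sketch of a One-time Pad, assuming the pad comes from Python’s secrets module. The helper name xor_bytes and the messages are my own for this example. The second half shows why the pad must only be used once: XORing two ciphertexts made with the same pad cancels the pad out and leaves the XOR of the two messages.

```python
import secrets


def xor_bytes(a, b):
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))  # as long as the message, used once

ciphertext = xor_bytes(message, pad)
recovered = xor_bytes(ciphertext, pad)
assert recovered == message

# Reusing the pad for a second message of the same length leaks information.
other = b"retreat at six"
reused = xor_bytes(other, pad)
leak = xor_bytes(ciphertext, reused)
assert leak == xor_bytes(message, other)  # the pad has dropped out entirely
```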

The problem is getting that large amount of key material to them securely. If both sides can instead reliably generate the same random-looking bitstream, we can use it as our XOR key in order to encrypt and decrypt the file. We will address this in the next section.

Hash chains

In order to generate a random-looking bitstream, we will use the SHA512 hash function. It takes any number of bytes and turns them into a 64-byte output. By repeatedly feeding the hash output back into the hash function, we can get a stream of pseudorandom bytes that can be used to encrypt our file.

This function implements the hashing for us.

import hashlib

def rng_bitstream(val):
    # bytes() lets us pass either a bytes object or a list of ints
    return hashlib.sha512(bytes(val)).digest()
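
To show the “feed the output back in” idea on its own, here’s a small sketch of a plain hash chain. The generator name hash_chain and the seed are made up for this example; the cipher below wires the chaining up a little differently.

```python
import hashlib
from itertools import islice


def hash_chain(seed):
    """Yield an endless series of 64-byte blocks, each the hash of the last."""
    last = bytes(seed)
    while True:
        last = hashlib.sha512(last).digest()
        yield last


# Take the first three blocks of the chain.
blocks = list(islice(hash_chain(b"example seed"), 3))

for block in blocks:
    print(len(block), block.hex()[:16])
```

The same seed always produces the same blocks, which is exactly what lets the receiving side regenerate the XOR key.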

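Here’s a function that will read chunks from STDIN and yield 64-byte chunks of encrypted data. Each chunk is XOR’ed with the hash of the previous encrypted chunk. The stdin_chunks helper is included in the full code at the end.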

def encrypt():
    last = get_entropy(64)
    yield last

    for chunk in stdin_chunks():
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = xored
        yield xored

When we’re encrypting the files, we’re going to make the first 64 bytes completely random. This is what the line with get_entropy does. This will make it so that encrypting the same file with the same password twice will produce different results. It also covers for any lack of entropy from the file since the first chunk will always be completely random.

def get_entropy(length):
    """Read length bytes of randomness from /dev/random."""
    return os.getrandom(length, os.GRND_RANDOM)

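os.getrandom is only available on Linux. With the GRND_RANDOM flag it draws from the same source as /dev/random, so it is roughly equivalent to

def get_entropy(length):
    with open('/dev/random', 'rb') as f:
        return f.read(length)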

Decrypting the chunks

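Decrypting the chunks is similar. The only difference is which value gets fed to the hash function for the next round: when encrypting it is the XOR’ed result, and when decrypting it is the chunk as it was received. Both are the same encrypted chunk, since on the decrypting side the XOR result is the plaintext.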

def decrypt(chunks):
    last = next(chunks)
    for chunk in chunks:
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = chunk
        yield xored
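
Here’s a rough round-trip sketch of this layer that works on in-memory data instead of STDIN. The names split_chunks and encrypt_chunks are made up for this example, and os.urandom is used as a stand-in for get_entropy so that it also runs outside Linux.

```python
import hashlib
import os


def xor_vectors(v1, v2):
    size = min(len(v1), len(v2))
    return [v1[x] ^ v2[x] for x in range(size)]


def rng_bitstream(val):
    return hashlib.sha512(bytes(val)).digest()


def split_chunks(data, size=64):
    """Hypothetical helper: cut a bytes object into 64-byte chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def encrypt_chunks(chunks):
    """Same logic as encrypt(), but reading from an iterable."""
    last = os.urandom(64)  # assumption: stand-in for get_entropy(64)
    yield last
    for chunk in chunks:
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = xored
        yield xored


def decrypt(chunks):
    last = next(chunks)
    for chunk in chunks:
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = chunk
        yield xored


plaintext = b"A" * 100  # two chunks: 64 bytes and 36 bytes
encrypted = list(encrypt_chunks(split_chunks(plaintext)))

# decrypt() calls next(), so it needs an iterator rather than a list.
decrypted = b"".join(bytes(c) for c in decrypt(iter(encrypted)))
assert decrypted == plaintext
```

The encrypted list has three entries: the random 64-byte prefix followed by the two data chunks.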

Encrypting with a password

So far, the cipher we wrote only encrypts our data with a random key prepended to the file. While that obfuscates the whole file and essentially turns it into random-looking data, it is still possible to decrypt it easily. We’re basically creating a secret document with the key written on top of it.

In order to fix this, we will XOR all the chunks again with a hash chain. But this time instead of using entropy, we will begin the hash chain with a password.

def passkey_stream(key, stream):
    last = key.encode("utf-8")
    for chunk in stream:
        last = rng_bitstream(last)
        yield xor_vectors(chunk, last)

Regardless of the password length, the XOR key starts out as a 64-byte hash. A weak password is still easy to brute force, but the hashing operation at least makes the data look random. The intent is that, short of brute forcing, the output should not reveal anything about the password.
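
Because this layer is just XOR with a stream that depends only on the password, running it a second time with the same password undoes it. Here’s a small sketch of that property; the passwords and chunk contents are made up for illustration.

```python
import hashlib


def xor_vectors(v1, v2):
    size = min(len(v1), len(v2))
    return [v1[x] ^ v2[x] for x in range(size)]


def rng_bitstream(val):
    return hashlib.sha512(bytes(val)).digest()


def passkey_stream(key, stream):
    last = key.encode("utf-8")
    for chunk in stream:
        last = rng_bitstream(last)
        yield xor_vectors(chunk, last)


chunks = [b"first chunk", b"second chunk"]

locked = list(passkey_stream("hunter2", chunks))
unlocked = list(passkey_stream("hunter2", locked))
assert [bytes(c) for c in unlocked] == chunks

# The first XOR key is simply the SHA512 hash of the password.
first_key = hashlib.sha512(b"hunter2").digest()
assert locked[0] == xor_vectors(chunks[0], first_key)

# A different password produces a different stream, so the result is garbage.
wrong = list(passkey_stream("hunter3", locked))
assert [bytes(c) for c in wrong] != chunks
```

This is why the same passkey_stream function can be used for both encrypting and decrypting.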

Putting it all together

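The snippet below parses the command line arguments in order to encrypt/decrypt files. The first argument selects the mode, the second one is the password. Data is read from STDIN and written to STDOUT.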

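if __name__ == "__main__":
    # argv[1] is the mode, argv[2] is the password
    key = sys.argv[2]
    chunks = None
    if sys.argv[1] == 'encrypt':
        # Chain the chunks first, then apply the password layer on top
        chunks = passkey_stream(key, encrypt())
    else:
        # Any other mode decrypts: undo the password layer, then the chaining
        chunks = decrypt(passkey_stream(key, stdin_chunks()))
    for chunk in chunks:
        # File descriptor 1 is STDOUT
        os.write(1, bytes(chunk))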
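
To see how the two layers fit together without touching STDIN or the command line, here’s a rough in-memory sketch of the whole pipeline. The helpers split_chunks, encrypt_chunks, encrypt_bytes and decrypt_bytes are made up for this example, and os.urandom again stands in for get_entropy.

```python
import hashlib
import os


def xor_vectors(v1, v2):
    return [v1[x] ^ v2[x] for x in range(min(len(v1), len(v2)))]


def rng_bitstream(val):
    return hashlib.sha512(bytes(val)).digest()


def split_chunks(data, size=64):
    """Hypothetical stand-in for stdin_chunks(), working on bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def encrypt_chunks(chunks):
    last = os.urandom(64)  # assumption: stand-in for get_entropy(64)
    yield last
    for chunk in chunks:
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = xored
        yield xored


def decrypt(chunks):
    last = next(chunks)
    for chunk in chunks:
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = chunk
        yield xored


def passkey_stream(key, stream):
    last = key.encode("utf-8")
    for chunk in stream:
        last = rng_bitstream(last)
        yield xor_vectors(chunk, last)


def encrypt_bytes(key, data):
    # Same order as the 'encrypt' branch: chaining first, password on top.
    out = passkey_stream(key, encrypt_chunks(split_chunks(data)))
    return b"".join(bytes(c) for c in out)


def decrypt_bytes(key, blob):
    # Same order as the decrypt branch: password layer off, then chaining.
    out = decrypt(passkey_stream(key, split_chunks(blob)))
    return b"".join(bytes(c) for c in out)


data = b"some secret text" * 10  # 160 bytes
blob = encrypt_bytes("hunter2", data)

assert len(blob) == 64 + len(data)  # random prefix plus the data
assert decrypt_bytes("hunter2", blob) == data
assert decrypt_bytes("wrong password", blob) != data
```

The output is always 64 bytes longer than the input because of the random prefix.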

The Challenge

Here’s a small challenge: I’m linking to an encrypted file. Try to get some information from this file, or break the cipher in any way. Comment, email and blog about how you went about it. Let’s have some fun discussions about homebrew DIY crypto and why it might be a good or bad idea.

Full code

You can find the full code here:

#!/usr/bin/env python3
import hashlib
import sys
import os


def get_entropy(length):
    """Read length bytes of randomness from /dev/random."""
    return os.getrandom(length, os.GRND_RANDOM)


def xor_vectors(v1, v2):
    return [v1[x] ^ v2[x] for x in range(min(len(v1), len(v2)))]


def rng_bitstream(val):
    return hashlib.sha512(bytes(val)).digest()


def stdin_chunks():
    with open("/dev/stdin", "rb") as f:
        while True:
            b = f.read(64)
            if b:
                yield b
            else:
                break


def encrypt():
    last = get_entropy(64)
    yield last

    for chunk in stdin_chunks():
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = xored
        yield xored


def decrypt(chunks):
    last = next(chunks)
    for chunk in chunks:
        xored = xor_vectors(chunk, rng_bitstream(last))
        last = chunk
        yield xored


def passkey_stream(key, stream):
    last = key.encode("utf-8")
    for chunk in stream:
        last = rng_bitstream(last)
        yield xor_vectors(chunk, last)

if __name__ == "__main__":
    key = sys.argv[2]
    chunks = None
    if sys.argv[1] == 'encrypt':
        chunks = passkey_stream(key, encrypt())
    else:
        chunks = decrypt(passkey_stream(key, stdin_chunks()))
    for chunk in chunks:
        os.write(1, bytes(chunk))