Because JSON is everywhere, we reach for JSON libraries whenever a project needs to exchange data with other systems. Once something becomes that widespread and turns into “infrastructure”, it becomes a black box in people’s minds.

One reason JSON got so popular is that it’s simple. It’s not the simplest solution to the problem, not by a long shot, but it’s flexible enough to solve a lot of problems without becoming too large. In this post we’ll write a JSON serializer in Python that can serialize arbitrarily nested data structures in a few lines of code. More importantly, every part should be understandable and self-contained.

Below is the default implementation of the encoder, which fails with an “Unknown type” error. It uses functools.singledispatch, which picks an implementation based on the type of the first argument. We will implement encoding for the types shared by Python and JSON, and keep this error as a fallback in case people try to encode something weird.

In [2]:
from functools import singledispatch

@singledispatch
def encode(value):
    raise TypeError(f"Unknown type: {type(value)}")
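
To make the dispatch mechanism concrete, here is a minimal sketch of how functools.singledispatch chooses an implementation. The describe function and its return values are made up for illustration and are not part of the encoder. The point is that the choice depends on the type of the first argument, that the most specific registered type wins, and that unregistered types fall through to the default.

```python
from functools import singledispatch

# Hypothetical example function, used only to illustrate dispatch.
@singledispatch
def describe(value):
    # Fallback for any type without a registered implementation
    return "unknown"

@describe.register(int)
def describe_int(value):
    return "int"

@describe.register(bool)
def describe_bool(value):
    return "bool"

print(describe(1))     # int
print(describe(True))  # bool: the most specific type wins, even though bool subclasses int
print(describe(1.5))   # unknown: float is not registered here
```

This is why the order in which types get registered later in the post does not matter: True and False still reach the boolean encoder after int has been registered.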

Serializing None

This is probably the easiest type to encode, as it involves no logic whatsoever. JSON has a null value that represents something intentionally missing, and None is its Python equivalent. When we see a value of type NoneType, we encode it as the four-character string null.

In [3]:
@encode.register(type(None))
def encode_none(_):
    return "null"
In [4]:
print(encode(None))
Out:
null

Serializing booleans

The boolean values True and False in Python can be encoded as true and false respectively. This is also pretty straightforward.

In [5]:
@encode.register(bool)
def encode_bool(value):
    if value:
        return "true"
    return "false"
In [6]:
print(encode(True))
print(encode(False))
Out:
true
false

Serializing numbers

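Python’s default formatting of finite numbers is already valid JSON, so encoding ints and floats is easy: just call str(x) on the value. The exceptions are the special floats nan, inf and -inf, which str() prints as-is even though JSON has no representation for them.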

In [7]:
@encode.register(int)
@encode.register(float)
def encode_number(val):
    return str(val)
In [8]:
print(encode(42))
print(encode(123.45))
Out:
42
123.45
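
One caveat with str(): the special floats nan, inf and -inf come out as nan, inf and -inf, none of which is valid JSON. Below is a minimal sketch of a stricter number encoder, assuming we would rather raise an error than emit invalid output. The name encode_number_strict is hypothetical, and rejecting these values is a design choice rather than the only option (Python's own json module emits NaN and Infinity by default, which is an extension to the JSON grammar).

```python
import math

# Hypothetical stricter variant of encode_number
def encode_number_strict(val):
    # nan and the infinities have no JSON representation
    if isinstance(val, float) and not math.isfinite(val):
        raise ValueError(f"Cannot encode non-finite float: {val}")
    return str(val)

print(encode_number_strict(42))
print(encode_number_strict(123.45))
try:
    encode_number_strict(float("nan"))
except ValueError as e:
    print(e)
```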

Serializing strings

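Strings are a sequence of characters enclosed in double quotes ("). Aside from some special characters, everything else can be encoded as-is. The characters we’ll escape are backslashes, double quotes and newlines. The backslash has to be escaped because it is the escape character itself; otherwise a string like a\b would be read back as the letter a followed by a backspace.

In [9]:
@encode.register(str)
def encode_str(val):
    result = '"'
    for c in val:
        if c == '\\':
            # Escape the escape character itself
            result += r'\\'
        elif c == '"':
            result += r'\"'
        elif c == '\n':
            result += r'\n'
        else:
            result += c
    result += '"'
    return result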
In [10]:
print(encode("Hello, world!"))
print(encode('Hello "World"'))
print(encode("""Hello
world!"""))
Out:
"Hello, world!"
"Hello \"World\""
"Hello\nworld!"

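The string encoder in this post handles only a few special characters. JSON also requires every control character below U+0020 (tabs, carriage returns and so on) to be escaped. Here is a sketch of a more complete version, assuming a lookup table for the short escapes and the \uXXXX form for the rest. The names SHORT_ESCAPES and encode_str_full are hypothetical, and the standard json module is used only to check the result.

```python
import json

# Hypothetical table of short escapes; the names are illustrative only.
SHORT_ESCAPES = {
    '"': r'\"',
    '\\': r'\\',
    '\n': r'\n',
    '\r': r'\r',
    '\t': r'\t',
    '\b': r'\b',
    '\f': r'\f',
}

def encode_str_full(val):
    parts = ['"']
    for c in val:
        if c in SHORT_ESCAPES:
            parts.append(SHORT_ESCAPES[c])
        elif ord(c) < 0x20:
            # Remaining control characters use the \uXXXX form
            parts.append(f"\\u{ord(c):04x}")
        else:
            parts.append(c)
    parts.append('"')
    return ''.join(parts)

tricky = 'tab\there, backslash \\ and quote " and bell \x07'
encoded = encode_str_full(tricky)
print(encoded)
# Round-trip through the standard library parser as a sanity check
print(json.loads(encoded) == tricky)  # True
```

Collecting the pieces in a list and joining them once also avoids repeated string concatenation, though for short strings the difference is negligible.
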
Serializing lists

Lists in JSON are comma-separated values enclosed in [ and ]. An empty list becomes [], a list with one element is encoded as [42], and a list with two elements can be written as [42, 43]. Of course, you are not limited to only integers. Any JSON type, including other lists and dictionaries, can be a list member.

In [11]:
@encode.register(list)
def encode_list(l):
    vals = ','.join(map(encode, l))
    return f"[{vals}]"
In [12]:
print(encode([]))
print(encode([1]))
print(encode([1, 2, 3]))
print(encode([1, 2.0, "3"]))
print(encode([1, 2, [3, 4], 5, 6, False, True]))
Out:
[]
[1]
[1,2,3]
[1,2.0,"3"]
[1,2,[3,4],5,6,false,true]
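
Python tuples have no JSON counterpart of their own, but they are sequences too, so one option is to register them with the same implementation. This is a self-contained sketch, assuming we want tuples treated like lists (which is also what the standard json module does). The encoder is re-declared here with only an int handler to keep the example short, and encode_sequence is a hypothetical name.

```python
from functools import singledispatch

@singledispatch
def encode(value):
    raise TypeError(f"Unknown type: {type(value)}")

@encode.register(int)
def encode_int(val):
    return str(val)

# One function registered for both sequence types
@encode.register(list)
@encode.register(tuple)
def encode_sequence(seq):
    vals = ','.join(map(encode, seq))
    return f"[{vals}]"

print(encode((1, 2, 3)))        # [1,2,3]
print(encode([1, (2, 3), []]))  # [1,[2,3],[]]
```

The trade-off is that the distinction is lost on the way back: a parser will return a list, not a tuple.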

Serializing dictionaries

Dictionaries are similar to lists, but instead of a list of values, we have a list of key-value pairs. Before we get to the dictionary part, let’s start by encoding key-value pairs in the suitable format.

In JSON, key-value pairs are formatted as key:value, where the key must be a string.

In [13]:
def encode_key_value(t):
    k, v = t
    return f"{encode(k)}:{encode(v)}"
In [14]:
print(encode_key_value((1, 2)))
print(encode_key_value(("key", 1234)))
print(encode_key_value(("this is a list", [1, 2, 3])))
Out:
1:2
"key":1234
"this is a list":[1,2,3]

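The pairs with string keys look good. The first one, 1:2, shows a limitation: JSON only allows string keys, so this encoder produces valid JSON only for dictionaries whose keys are strings. To turn key-value pairs into a dictionary, we encode them the same way as the list, but instead of wrapping them in [ and ], we wrap them in { and }.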

In [15]:
@encode.register(dict)
def encode_dict(d):
    vals = ','.join(map(encode_key_value, d.items()))
    return '{' + vals + '}'
In [16]:
print(encode({"Name": "Leo", "BirthYear": 1998, "Website": "www.gkbrk.com"}))
Out:
{"Name":"Leo","BirthYear":1998,"Website":"www.gkbrk.com"}
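
JSON object keys must be strings, so a pair such as 1:2 is not valid JSON. Here is a hedged sketch of one way to handle this, assuming we reject non-string keys rather than convert them (the standard json module instead converts a key like 1 to "1"). The name encode_key_value_strict is hypothetical, and the encoder is re-declared in a simplified form so the block runs on its own. The json module is used only to check that the output parses back to the original data.

```python
import json
from functools import singledispatch

@singledispatch
def encode(value):
    raise TypeError(f"Unknown type: {type(value)}")

@encode.register(int)
def encode_int(val):
    return str(val)

@encode.register(str)
def encode_str(val):
    # Simplified: assumes the string needs no escaping
    return f'"{val}"'

def encode_key_value_strict(pair):
    k, v = pair
    if not isinstance(k, str):
        raise TypeError(f"JSON object keys must be strings, got {type(k)}")
    return f"{encode(k)}:{encode(v)}"

@encode.register(dict)
def encode_dict(d):
    vals = ','.join(map(encode_key_value_strict, d.items()))
    return '{' + vals + '}'

data = {"Name": "Leo", "BirthYear": 1998, "Nested": {"a": 1}}
encoded = encode(data)
print(encoded)
# The standard library parser should read back the same structure
print(json.loads(encoded) == data)  # True

try:
    encode({1: 2})
except TypeError as e:
    print(e)
```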