In order to judge the distribution, I’ll use the standard deviation: the more evenly the values are spread out, the lower it is.

In [2]:
```
statistics.stdev([5, 5, 6, 5, 5])
statistics.stdev([4, 4, 10, 4, 4])
```
Out [2]:
`0.4472135954999579`
Out [2]:
`2.6832815729997477`
In [3]:
```
def evaluate_hash(fn, values):
    # Hash every value into one of 256 buckets, then measure how
    # unevenly the buckets filled up: 0 means a perfectly even spread.
    counts = [0] * 256

    for val in values:
        h = fn(val)
        counts[h] += 1

    return statistics.stdev(counts)
```
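
As a quick sanity check (my own aside, not a cell from the original notebook), a hash that spreads values perfectly evenly should score 0, while one that dumps everything into a single bucket scores much higher:

```
# Two toy hash functions, purely for illustration.
evaluate_hash(lambda v: v % 256, range(256))  # one value per bucket -> 0.0
evaluate_hash(lambda v: 0, range(256))        # all 256 values in bucket 0 -> 16.0
```
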
In [4]:
```
unkeyed_hash("Test message 1")
unkeyed_hash("Test message 2")
unkeyed_hash("Test message 3")
```
Out [4]:
`168`
Out [4]:
`184`
Out [4]:
`236`

Let’s see how well `crc32` does on “normal” input.

In [5]:
```
values = []

for i in range(5000):
    values.append(f"Test {i}")

evaluate_hash(crc32, values)
```
Out [5]:
`1.4656190554459485`

Now let’s try to skew it a little by choosing the values more carefully.

In [6]:
```
values = []

i = 0
while len(values) < 5000:
    val = f"Test {i}"
    # Keep only the values that crc32 sends to bucket 5.
    if crc32(val) == 5:
        values.append(val)
    i += 1

evaluate_hash(crc32, values)
```
Out [6]:
`312.5`

Wow, any data structure or partitioning scheme that expects uniform output from its hash function would be devastated by input chosen like this.
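
To make that concrete, here is a rough sketch (my own numbers, not a cell from the notebook above) of what this skew means for a 256-bucket hash table holding those 5,000 values:

```
n_values, n_buckets = 5000, 256

# Average chain length a hash table expects per bucket under a uniform hash.
print(n_values / n_buckets)  # ~19.5 entries to scan per lookup

# With the adversarial values above, all 5,000 entries share bucket 5,
# so every lookup in that bucket scans the entire data set.
print(n_values)              # 5000 entries to scan, roughly a 256x slowdown
```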

In [7]:
```
hash1 = pearson_with_key("Quentin Coldwater")
hash2 = pearson_with_key("Josh Hoberman")

values = []

i = 0
while len(values) < 5000:
    val = f"Test {i}"
    # Keep only the values that hash1 sends to bucket 5.
    if hash1(val) == 5:
        values.append(val)
    i += 1

evaluate_hash(hash1, values)
```
Out [7]:
`312.5`

A similar result, but now let’s try a different key with the same values.

In [8]:
`evaluate_hash(hash2, values)`
Out [8]:
`4.4860349757800115`
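
For a rough point of comparison (my own back-of-the-envelope check, not part of the original notebook), a uniformly random hash would assign each of the 5,000 values to one of the 256 buckets independently, so each bucket count would be roughly Binomial(5000, 1/256), and the standard deviation of that distribution is close to what we just measured:

```
import math

n, p = 5000, 1 / 256
print(math.sqrt(n * p * (1 - p)))  # ~4.41, in line with the 4.49 above
```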