Generating Vanity Infohashes for Torrents



In the world of BitTorrent, each torrent is identified by an infohash. It is the SHA-1 hash of the part of the torrent metadata that describes the files. And people, when confronted with something that’s supposed to look random, like to control it to some degree. You can see this behaviour in lots of places online: people generate special Bitcoin wallet addresses, Tor onion services containing their nick, or 4chan tripcodes that look cool. These are all made by repeatedly generating a hash until you find a result that you like. We can do the exact same thing with torrents.

The structure of torrent files

Before we start tweaking our infohash, let’s talk about torrent files first. A torrent file is a bencoded dictionary. It contains information about the files: their names, how large they are and a hash for each piece. This is stored in the info section of the dictionary. The rest of the dictionary includes a list of trackers, the file comment, the creation date and other optional metadata. The infohash is quite literally the SHA-1 hash of the bencoded info section of the torrent. Any change inside the info section, including a change to the file contents it describes, changes the infohash, while changing the other metadata doesn’t.
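To make that last point concrete, here is a minimal sketch. It uses a small hand-written bencode encoder instead of a library, and the torrent dictionary, tracker URL and piece hash are all made up for illustration. It only shows that editing the comment leaves the infohash alone while editing the name changes it.

```python
import hashlib

def bencode(value):
    """Encode ints, bytes, lists and dicts (with bytes keys) as bencode."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are written in sorted order.
        return b"d" + b"".join(
            bencode(key) + bencode(value[key]) for key in sorted(value)
        ) + b"e"
    raise TypeError("unsupported type: {}".format(type(value).__name__))

def infohash(torrent):
    """SHA-1 of the bencoded info dictionary, as 40 hex characters."""
    return hashlib.sha1(bencode(torrent[b"info"])).hexdigest()

# A made-up single-file torrent; the piece hash is just a placeholder.
example = {
    b"announce": b"http://tracker.example/announce",
    b"comment": b"first comment",
    b"info": {
        b"length": 1024,
        b"name": b"example.bin",
        b"piece length": 16384,
        b"pieces": b"\x00" * 20,
    },
}

before = infohash(example)
example[b"comment"] = b"a different comment"  # outside the info section
after_comment = infohash(example)
example[b"info"][b"name"] = b"renamed.bin"    # inside the info section
after_rename = infohash(example)

print(before == after_comment)  # True
print(before == after_rename)   # False
```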

This gives us two ways of affecting the hash without touching the file contents. The first one is adding a separate key called vanity to the info section and changing its value. While this is really flexible and causes the least change that the user can see, it adds a non-standard key to our dictionary. Fortunately, torrent clients are supposed to be flexible and handle unknown keys gracefully.

The other thing we can do is to add a prefix to the file name. This should keep everything intact aside from a random value in front of our filename.

Parsing the torrent file

First of all, let’s read our torrent file and parse it. For this purpose, I’m using the bencoder module.

import bencoder

target = 'arch-linux.torrent'
with open(target, 'rb') as torrent_file:
    torrent = bencoder.decode(torrent_file.read())

Calculating the infohash

The infohash is the hash of the info section of the file. Let’s write a function to calculate that. We return the hash as a hex string, which is the form most clients display.

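import hashlib

def get_infohash(torrent):
    """Return the infohash as a 40-character hex string."""
    # Re-encode only the info section; the rest of the file does not count.
    encoded = bencoder.encode(torrent[b'info'])
    return hashlib.sha1(encoded).hexdigest()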
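The function above returns the hex form. Magnet links can also carry the same 20 bytes as 32 base32 characters, so here is a small sketch for converting between the two. The helper names are my own and not part of any library.

```python
import base64

def hex_to_base32(infohash_hex):
    """Convert a 40-character hex infohash to its 32-character base32 form."""
    raw = bytes.fromhex(infohash_hex)
    if len(raw) != 20:
        raise ValueError("a SHA-1 infohash is exactly 20 bytes")
    # 20 bytes is 160 bits, a multiple of 5, so there is no '=' padding.
    return base64.b32encode(raw).decode("ascii")

def base32_to_hex(infohash_b32):
    """Convert a 32-character base32 infohash back to lowercase hex."""
    return base64.b32decode(infohash_b32.upper()).hex()

print(hex_to_base32("00" * 20))  # AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
```

Keep in mind that a vanity prefix found in one encoding does not carry over to the other, since hex groups the bits in fours and base32 groups them in fives.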

Prefixing the name

Let’s try the name prefix method first. We will start from 0 and keep incrementing the name prefix until the hex infohash starts with cafe.

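original_name = torrent[b'info'][b'name'].decode('utf-8')

vanity = 0
while True:
    # Bencoded strings are byte strings, so encode the new name explicitly.
    name = '{}-{}'.format(vanity, original_name)
    torrent[b'info'][b'name'] = name.encode('utf-8')
    infohash = get_infohash(torrent)  # hash once per attempt
    if infohash.startswith('cafe'):
        print(vanity, infohash)
        break
    vanity += 1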

This code will increment our vanity number in a loop and print it and the respective infohash when it finds a suitable one.
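How long should you expect this loop to run? Here is a rough estimate, assuming SHA-1 output behaves like uniformly random bits (my assumption, but a common one). Each hex character is 4 bits, so a prefix of n characters matches once in 16^n attempts on average. The function names below are made up for this sketch.

```python
def expected_attempts(prefix):
    """Average number of tries before a hex infohash starts with prefix."""
    return 16 ** len(prefix)

def chance_within(prefix, attempts):
    """Probability of at least one match within the given number of tries."""
    miss = 1 - 1 / expected_attempts(prefix)
    return 1 - miss ** attempts

for prefix in ("ca", "cafe", "cafebabe"):
    print(prefix, expected_attempts(prefix))
# ca 256
# cafe 65536
# cafebabe 4294967296
```

So cafe is a matter of seconds, but every extra character makes the search 16 times longer.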

Adding a separate key to the info section

While the previous section works well, it still causes a change that is visible to the user. Let’s work around that by modifying the data in a bogus key called vanity.

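vanity = 0
while True:
    # Store the counter as a byte string inside the info section.
    torrent[b'info'][b'vanity'] = str(vanity).encode('ascii')
    infohash = get_infohash(torrent)  # hash once per attempt
    if infohash.startswith('cafe'):
        print(vanity, infohash)
        break
    vanity += 1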

Saving the modified torrent files

While it is possible to do the modification to the file yourself, why not go all the way and save the modified torrent file as well? Let’s write a function to save a given torrent file.

def save_torrent(torrent, name):
    # Write-only binary mode is enough; nothing is read back.
    with open(name, 'wb') as torrent_file:
        torrent_file.write(bencoder.encode(torrent))

You can use this function after finding an infohash that you like.

Cool ideas for infohash criteria

  • Release groups can prefix their infohashes with their name/something unique to them
  • Finding smaller infohashes - should slowly accumulate 0’s in the beginning
  • Infohashes with the least entropy - should make them easier to remember
  • Infohashes with the most digits
  • Infohashes with no digits
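Here is a rough sketch of how some of these criteria could be scored, so you can plug them into the search loop instead of a fixed startswith check. All of the function names are made up for illustration, and the entropy measure is just the Shannon entropy of the character frequencies, which is only one possible reading of "least entropy".

```python
import math
from collections import Counter

HEX_DIGITS = set("0123456789abcdef")

def can_be_prefix(text):
    """A hex infohash can only spell words made of 0-9 and a-f."""
    return all(ch in HEX_DIGITS for ch in text.lower())

def leading_zeros(infohash):
    """Number of zero characters at the start of the hex infohash."""
    return len(infohash) - len(infohash.lstrip("0"))

def digit_count(infohash):
    """How many characters are 0-9 rather than a-f."""
    return sum(ch.isdigit() for ch in infohash)

def char_entropy(infohash):
    """Shannon entropy of the character frequencies, in bits per character."""
    counts = Counter(infohash)
    total = len(infohash)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def pick_best(infohashes, score):
    """Return the candidate with the highest score."""
    return max(infohashes, key=score)

candidates = ["ab12", "0001", "ffff"]
print(pick_best(candidates, digit_count))               # 0001
print(pick_best(candidates, lambda h: -int(h, 16)))     # 0001 (smallest value)
print(pick_best(candidates, lambda h: -char_entropy(h)))  # ffff
```

A digit count of 0 covers the "no digits" case, and can_be_prefix is a quick way to check whether a release group name can be spelled in hex at all.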


Citation

If you find this work useful, please cite it as:
@article{yaltirakli201804generatingvanityinfohashesfortorrents,
  title   = "Generating Vanity Infohashes for Torrents",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2018",
  url     = "https://www.gkbrk.com/2018/04/generating-vanity-infohashes-for-torrents/"
}

Comments

Comment by rubdos
2018-04-06 at 22:29

@TorrentWizard You can always do this trick on the hex encode. That even halves the amount of bytes! (Also reduces the amount of possible vanity infohashes, but yeh)

Comment by TorrentWizard
2018-04-06 at 18:47

This is not a good idea. AT ALL. In the BitTorrent protocol there is an assumption that infohashes are uniformly distributed. When a peer announces its participation in a swarm to the DHT, it does that to the eight nodes with a node_id closest to the infohash. This means that vanity infohashes that have the first 24 bits (3 characters) or more in common will be announced to the same eight nodes. This creates a hot spot in the DHT. Some nodes will refuse to accept more than one announce per IP as DoS protection. It also makes it easier to do an eclipse attack on those infohashes. Some software, such as clients and trackers, may be negatively impacted by nonuniformly distributed infohashes. Also: It's pointless, as a normal user almost always sees the infohash hex-encoded and not in the raw binary form where the vanity text is exposed.

© 2024 Gokberk Yaltirakli