About the WHOIS Protocol


Tags: networking

If you are a web developer, chances are you have used whois before. WHOIS allows you to retrieve basic information about a domain such as when it was registered, when it will expire and the contact information of the owner. There are lots of websites and command line tools that allow you to query this information, but they all use the same protocol in the background.

The WHOIS protocol is a simple, plaintext-based protocol; servers listen on TCP port 43. RFC 3912 defines the protocol, but it is very short and says little about how WHOIS works in practice: it does not specify a response format or how to find the right server for a domain. I got the information in this article by running the whois command and inspecting the traffic with Wireshark.

One problematic aspect of the WHOIS protocol is that the responses are designed to be human-readable rather than machine-readable. Thankfully, the information we need to extract usually follows a Header name: Header data format. You should split each line at the first : (values such as dates and URLs can contain colons themselves) and turn the header name into lowercase when you are looking for a specific header.
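
As a minimal sketch of that parsing rule, the helper below splits one response line at its first colon and lowercases the header name. The name parse_header is made up for this example, and the sample lines are illustrative rather than real server output.

```python
def parse_header(line):
    """Return (lowercased name, value) for a "Name: value" line, else None."""
    # partition() splits at the first colon only, so colons inside the
    # value (timestamps, URLs) are left untouched.
    name, sep, value = line.partition(":")
    if not sep:
        return None  # no colon, so this is not a header line
    return name.strip().lower(), value.strip()


print(parse_header("Whois Server: whois.example.com\n"))
print(parse_header("Updated Date: 2016-08-31T10:15:00Z"))
print(parse_header("a line without any header"))
```

Lines that merely happen to contain a colon (notices, legal text) will also match, so it is safer to look up the specific header names you need than to trust every pair.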

The protocol

WHOIS requests need to be terminated with a carriage return + line feed (\r\n).

  • Connect to whois.iana.org. Send the TLD, followed by a newline. (e.g. Send “com” + “\r\n”)
  • A bunch of data about that TLD will be sent. The WHOIS server responsible for that TLD will be sent in a header called whois.
  • Connect to that server and send the full domain name followed by a newline. (e.g. Send “example.com” + “\r\n”)
    • The response data you get from this server is the WHOIS data, but there’s usually more data you can get from another server.
    • This server’s address is sent to you in a header called whois server. Connect to it and send the same domain query, then read the response just like you did with the previous server.
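
Each step above is the same exchange: open a TCP connection to port 43, send one line ending in \r\n, and read until the server closes the connection (which is how RFC 3912 marks the end of a response). Here is a rough sketch of that single exchange. The function names and the 10 second timeout are assumptions made for this example, not part of the protocol.

```python
import socket


def build_query(query):
    """Encode a WHOIS request: the query text followed by CRLF."""
    return "{}\r\n".format(query).encode("utf-8")


def read_response(sock):
    """Read until the peer closes the connection and return the text."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # empty read: the server has closed the connection
            break
        chunks.append(chunk)
    # The protocol does not declare a character set, so decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")


def whois_query(server, query, port=43, timeout=10):
    """Send one query to one WHOIS server and return the raw response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(query))
        return read_response(sock)
```

With this in place, whois_query("whois.iana.org", "com") would perform the first step of the list, and the later steps are the same call with a different server and the full domain name.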

If you are going to be doing a large number of queries, you should probably cache the responses from whois.iana.org in order to keep their traffic low. The server responsible for a TLD rarely changes, so one lookup per TLD is usually enough.
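
One simple way to do that caching, sketched here under the assumption that an in-memory cache for the lifetime of the process is good enough, is to wrap whatever function performs the TLD lookup with functools.lru_cache. The wrapper name make_cached_lookup and the stand-in lookup function are hypothetical.

```python
import functools


def make_cached_lookup(lookup, maxsize=256):
    """Wrap a TLD lookup function so each TLD is only looked up once."""

    @functools.lru_cache(maxsize=maxsize)
    def _cached(tld):
        return lookup(tld)

    def cached_lookup(tld):
        # Normalise case so "COM" and "com" share one cache entry.
        return _cached(tld.lower())

    return cached_lookup


# Stand-in for a real network lookup, used only to show the effect.
calls = []

def fake_lookup(tld):
    calls.append(tld)
    return "whois.example-" + tld

lookup = make_cached_lookup(fake_lookup)
lookup("com")
lookup("COM")
lookup("com")
print(calls)  # only the first call reached fake_lookup
```

Entries in this cache never expire, so a long-running service would want to add some form of expiry on top.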

Implementation in Python

Let’s make a function in Python to get the WHOIS server responsible for a top-level domain.

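import socket

def get_tld_server(tld="com"):
    # Ask the IANA WHOIS server which server is responsible for this TLD.
    with socket.socket() as sock:
        sock.connect(("whois.iana.org", 43))
        # Requests are terminated with CRLF; sendall() sends the whole buffer.
        sock.sendall("{}\r\n".format(tld).encode("utf-8"))
        for line in sock.makefile():
            # Split at the first colon only, so values with colons stay intact.
            parts = line.split(":", 1)
            if len(parts) > 1:
                header_name = parts[0].strip()
                header_value = parts[1].strip()
                if header_name.lower() == "whois":
                    return header_value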

The function above sends the TLD to the central WHOIS server, parses the response to find a line that looks like whois: whois.verisign-grs.com, and returns the server name from that line.

Now that we have a server, we can get the data like this.

def get_whois_data(domain):
    tld = domain.split(".")[-1]
    server = get_tld_server(tld)

    with socket.socket() as sock:
        sock.connect((server, 43))
        # Requests are terminated with CRLF.
        sock.sendall("{}\r\n".format(domain).encode("utf-8"))
        for line in sock.makefile():
            # Split at the first colon only, so values with colons stay intact.
            parts = line.split(":", 1)
            if len(parts) > 1:
                header_name = parts[0].strip()
                header_value = parts[1].strip()
                if header_name.lower() == "whois server":
                    # A second server has more data; just report it for now.
                    print(header_value)
            yield line.replace("\n", "")

This function queries the actual server and yields the lines that it receives. If there is a second server to get more data from, it prints the address to the console. Using this information, we can modify the function a little to make it query the second server automatically.

def get_whois_data(domain, server=None):
    if not server:
        tld = domain.split(".")[-1]
        server = get_tld_server(tld)

    nextserver = None

    with socket.socket() as sock:
        sock.connect((server, 43))
        # Requests are terminated with CRLF.
        sock.sendall("{}\r\n".format(domain).encode("utf-8"))
        for line in sock.makefile():
            # Split at the first colon only, so values with colons stay intact.
            parts = line.split(":", 1)
            if len(parts) > 1:
                header_name = parts[0].strip()
                header_value = parts[1].strip()
                if header_name.lower() == "whois server":
                    nextserver = header_value
            yield line.replace("\n", "")

    # Follow the referral, unless the server simply pointed back at itself.
    if nextserver and nextserver.lower() != server.lower():
        yield from get_whois_data(domain, nextserver)

We can now use our new get_whois_data() function like this. It should give essentially the same output as other WHOIS utilities.

for line in get_whois_data("gkbrk.com"):
    print(line)
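
If you would rather look up specific fields than print everything, the yielded lines can be collected into a dictionary. The sketch below is only a heuristic: the function name collect_headers is made up, skipping lines that start with % or # assumes those are comments, and the sample lines are illustrative rather than captured from a real server.

```python
def collect_headers(lines):
    """Group "Name: value" lines into a dict of lowercased name -> values."""
    headers = {}
    for line in lines:
        # Assumption: lines starting with % or # are comments.
        if line.startswith(("%", "#")):
            continue
        name, sep, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and name and value:
            # A header such as "Name Server" can appear several times.
            headers.setdefault(name, []).append(value)
    return headers


sample = [
    "% illustrative response, not real data",
    "Domain Name: EXAMPLE.COM",
    "Name Server: NS1.EXAMPLE.NET",
    "Name Server: NS2.EXAMPLE.NET",
    "Creation Date: 2016-08-31T10:15:00Z",
    "",
]
headers = collect_headers(sample)
print(headers["name server"])
```

In real use you would pass get_whois_data("gkbrk.com") in place of the sample list.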

Thanks for reading this article about the WHOIS protocol. I hope you enjoyed it. You can find related information in these sources.

Citation

If you find this work useful, please cite it as:
@article{yaltirakli201608aboutthewhoisprotocol,
  title   = "About the WHOIS Protocol",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2016",
  url     = "https://www.gkbrk.com/2016/08/about-the-whois-protocol/"
}
IEEE Citation
Gokberk Yaltirakli, "About the WHOIS Protocol", August, 2016. [Online]. Available: https://www.gkbrk.com/2016/08/about-the-whois-protocol/. [Accessed Oct. 10, 2024].
APA Style
Yaltirakli, G. (2016, August 31). About the WHOIS Protocol. https://www.gkbrk.com/2016/08/about-the-whois-protocol/
Bluebook Style
Gokberk Yaltirakli, About the WHOIS Protocol, GKBRK.COM (Aug. 31, 2016), https://www.gkbrk.com/2016/08/about-the-whois-protocol/

Comments

Comment by codebje
2016-09-01 at 07:19

Or you could use RDAP (https://datatracker.ietf.org/doc/html/rfc7482), which we made to be parseable by machine, though it looks like the domain folk are a little tardy at adoption as yet. You can get reliable number registry service this way, though.

© 2024 Gokberk Yaltirakli