The twtxt Protocol

Document ID: 2be39d96c164423883fe234f4774d067
Last Update: 2020-10-22

Abstract

    This file documents twtxt, an HTTP-based protocol and file format that can
    be used for federated social media communication.

1. Introduction

    twtxt is a simple federated microblogging protocol built on HTTP. Each user
    publishes posts in a plain text file, usually named twtxt.txt.

2. File Format

    The file used to exchange twtxt posts is a newline-delimited plain text
    file. The file MUST be encoded as UTF-8. When generating a file, an
    implementation MUST use a line feed (\n) to separate lines. When consuming
    files generated by others, an implementation MAY accept other delimiters,
    such as CRLF.

    When generating the file, lines SHOULD NOT have trailing whitespace.
    Consumers SHOULD ignore any trailing whitespace they encounter. The file
    SHOULD be generated with lines ordered from least recent to most recent.
    Consumers MUST handle files with arbitrary orderings.
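    The encoding and line-delimiting rules above can be sketched in Python as a
    writer and a reader. The names serialize_feed and read_lines are
    hypothetical, posts are assumed to carry timezone-aware datetime values,
    and skipping blank lines on input is this sketch's own choice rather than
    something the format specifies.

```python
from datetime import datetime


def serialize_feed(posts: list[tuple[datetime, str]]) -> bytes:
    """Encode (timestamp, text) pairs as a UTF-8 twtxt file body."""
    # Order lines from least recent to most recent, as recommended.
    ordered = sorted(posts, key=lambda post: post[0])
    # Each line: timestamp, TAB, text; no trailing whitespace; LF terminator.
    lines = [f"{ts.isoformat()}\t{text}".rstrip() + "\n" for ts, text in ordered]
    return "".join(lines).encode("utf-8")


def read_lines(body: bytes) -> list[str]:
    """Decode a fetched feed, tolerating CRLF and trailing whitespace."""
    lines = body.decode("utf-8").replace("\r\n", "\n").split("\n")
    # Drop trailing whitespace; skipping blank lines is this sketch's choice.
    return [line.rstrip() for line in lines if line.strip()]
```

    The reader normalizes CRLF to LF and then splits on LF instead of using
    str.splitlines(), because splitlines() also breaks on characters such as
    U+2028 that could legitimately appear inside a post.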

2.1. Comments

    If a line starts with a # sign, it is considered a comment and MUST be
    ignored when reading posts. Comment lines can be purely informational (and
    of no use to machines), or they can include human- or machine-readable
    metadata. If a client can parse that metadata, it CAN use it.

    When generating the file, comments SHOULD include one ASCII space after the
    # sign. If the comment is empty, the space can be omitted.
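    A minimal sketch of the comment rules follows. The helper names
    is_comment and make_comment are hypothetical; note that a # appearing
    later in a post line does not make that line a comment.

```python
def is_comment(line: str) -> bool:
    """A line whose first character is '#' is a comment, never a post."""
    return line.startswith("#")


def make_comment(text: str = "") -> str:
    """Format a comment with one ASCII space after '#', or a bare '#' if empty."""
    return f"# {text}" if text else "#"
```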

2.2. Posts

    A non-comment line is a post. Each post line MUST be split into two parts
    by a horizontal tab character (ASCII TAB, 0x09).

    The first part is the date and time at which the message was posted. It
    MUST be in the format specified by RFC 3339.

    The second part, from the TAB to the end of the line, is the message
    content.
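    Parsing a post line might look like the sketch below. The name parse_post
    is hypothetical, and splitting on the first TAB only (so the message may
    contain further tabs) is an assumption, since the text above only says the
    line has two parts. Python's datetime.fromisoformat() covers the common
    RFC 3339 forms but is not a complete RFC 3339 validator.

```python
from datetime import datetime


def parse_post(line: str) -> tuple[datetime, str]:
    """Split a post line into its RFC 3339 timestamp and message text."""
    # Split on the first TAB only; any later tabs stay in the message.
    stamp, sep, text = line.partition("\t")
    if not sep:
        raise ValueError("post line has no TAB separator")
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if stamp[-1:] in ("Z", "z"):
        stamp = stamp[:-1] + "+00:00"
    ts = datetime.fromisoformat(stamp)
    # RFC 3339 timestamps always carry an offset; reject naive ones.
    if ts.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    return ts, text
```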

3. Protocol

    A client implementing twtxt MUST support the HTTP protocol. It MAY choose to
    support additional file-transfer protocols (such as IPFS or Gopher), but
    those SHOULD be open specifications, ideally with multiple implementations.

3.1. HTTP

3.1.1. Server

    When serving requests, the server SHOULD NOT depend on any headers being
    present other than Host. It MUST be able to serve clients over HTTP/1.1,
    and SHOULD try to serve them over HTTP/1.0 and HTTP/2 as well.

3.1.2. Client

    The client MUST send a User-Agent header. This header MUST include the name
    of the program and SHOULD include a version number. If the user also
    publishes their own feed, the client SHOULD include the user's nickname and
    twtxt URL in the User-Agent header.

    The User-Agent header SHOULD be in the format
    `$NAME/$VERSION (+$URL; @$NICK)`. For example:

    twtxt/1.2.3 (+https://example.com/twtxt.txt; @leo)
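    A client-side sketch of these rules is shown below, using Python's
    urllib.request. The function names user_agent and fetch_feed are
    hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import urllib.request
from typing import Optional


def user_agent(name: str, version: str,
               url: Optional[str] = None, nick: Optional[str] = None) -> str:
    """Build a User-Agent value in the `$NAME/$VERSION (+$URL; @$NICK)` form."""
    ua = f"{name}/{version}"
    # Only users who publish a feed have a URL and nick to advertise.
    if url and nick:
        ua += f" (+{url}; @{nick})"
    return ua


def fetch_feed(feed_url: str, ua: str) -> bytes:
    """Fetch a feed over HTTP(S), always sending the User-Agent header."""
    request = urllib.request.Request(feed_url, headers={"User-Agent": ua})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

    For instance, user_agent("twtxt", "1.2.3", "https://example.com/twtxt.txt",
    "leo") reproduces the example header above, while a non-publishing user
    gets just "twtxt/1.2.3".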

3.2. HTTPS

3.2.1. Client

    A client SHOULD support fetching twtxt feeds over HTTPS. It MAY pick its own
    method of accepting and rejecting certificates and ciphers. A client CAN use
    system certificate stores, or it CAN choose another method such as TOFU
    (Trust On First Use).
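    For the TOFU option, one possible sketch pins the SHA-256 fingerprint of
    the first certificate seen for each host and rejects any different
    certificate later. TofuStore and peer_certificate are hypothetical names,
    and the pins live only in memory here; a real client would need to persist
    them between runs.

```python
import hashlib
import ssl


class TofuStore:
    """Trust On First Use: pin the first certificate seen for each host."""

    def __init__(self) -> None:
        self.pins: dict[str, str] = {}  # host -> SHA-256 fingerprint (hex)

    def check(self, host: str, der_cert: bytes) -> bool:
        fingerprint = hashlib.sha256(der_cert).hexdigest()
        # First contact: remember this certificate and accept it.
        pinned = self.pins.setdefault(host, fingerprint)
        # Later contacts: accept only the certificate pinned earlier.
        return pinned == fingerprint


def peer_certificate(host: str, port: int = 443) -> bytes:
    """Fetch a server's certificate without CA validation, as DER bytes."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

    Note that peer_certificate() opens its own connection; a stricter client
    would check the certificate of the same connection it then reads the feed
    from.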

3.3. Gemini

    Gemini is a new network protocol, similar to Gopher. The Gemini ecosystem
    and community are likely to overlap with the twtxt ones. A server CAN
    serve twtxt files over Gemini. The file contents MUST NOT differ from those
    served over the other transport protocols.
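    To show what fetching a feed over Gemini involves, here is a client-side
    sketch. The function names are hypothetical; the protocol details used
    (default port 1965, a request consisting of the absolute URL plus CRLF, and
    a response header of a two-digit status, a space, and a meta string) come
    from the Gemini specification. Certificate validation is disabled because
    Gemini servers commonly use self-signed certificates; a client would
    normally pair this with a TOFU check.

```python
import socket
import ssl
from urllib.parse import urlparse


def fetch_gemini(url: str) -> bytes:
    """Fetch a gemini:// URL and return the body of a success (2x) response."""
    parts = urlparse(url)
    host, port = parts.hostname, parts.port or 1965  # Gemini's default port
    context = ssl.create_default_context()
    # Self-signed certificates are common in Gemini; rely on TOFU instead.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # A request is the absolute URL followed by CRLF.
            tls.sendall(url.encode("utf-8") + b"\r\n")
            response = b""
            while chunk := tls.recv(4096):
                response += chunk
    header, _, body = response.partition(b"\r\n")
    status, meta = parse_header(header.decode("utf-8"))
    if status // 10 != 2:
        raise OSError(f"Gemini request failed: {status} {meta}")
    return body


def parse_header(line: str) -> tuple[int, str]:
    """Split a Gemini response header into its status code and meta string."""
    status, _, meta = line.partition(" ")
    return int(status), meta
```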

3.4. Other protocols

    Other file-transfer protocols that an implementation MAY support include:

    * Gopher
    * IPFS

4. Feed Metadata

    A feed CAN include metadata in the form of comments. The basic format of a
    metadata comment is:

    # key = value

    When generating the file, the server SHOULD format metadata exactly as
    above. A client MAY treat extra whitespace as insignificant and ignore it.
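    A tolerant metadata parser might look like the sketch below. The name
    parse_metadata is hypothetical, and requiring the key to be a single word
    is this sketch's own heuristic for telling metadata apart from prose
    comments that happen to contain an equals sign.

```python
from typing import Optional


def parse_metadata(line: str) -> Optional[tuple[str, str]]:
    """Return (key, value) for a `# key = value` comment, else None."""
    if not line.startswith("#"):
        return None  # not a comment, so not metadata
    # Split on the first '=' only, so values such as URLs may contain '='.
    key, sep, value = line[1:].partition("=")
    key, value = key.strip(), value.strip()
    # Heuristic (an assumption): a key must be one non-empty word, so prose
    # comments that happen to contain '=' are not mistaken for metadata.
    if not sep or not key or any(ch.isspace() for ch in key):
        return None
    return key, value
```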

    There are various pieces of common metadata found in existing twtxt feeds.
    Some of these have gained enough adoption to be considered official.

4.1. Official Metadata

    Below is a list of official metadata. It is recommended that every server
    produce them and every client consume them.

4.1.1. nick

    nick is used to show the preferred nickname of the user. Since a user
    mentioning another user can write that user's nick however they wish, this
    field gives the feed owner a way to publish the correct one.

    A twtxt feed MAY include this field. It MUST NOT include multiple
    instances; if a feed does, a client consuming it SHOULD use the last one.
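    The "use the last one" rule is easy to get wrong if a client stops at the
    first match. A sketch, assuming metadata has already been parsed into
    (key, value) pairs in file order (feed_nick is a hypothetical name):

```python
from typing import Optional


def feed_nick(metadata: list[tuple[str, str]]) -> Optional[str]:
    """Return the feed's nick, using the last one if several were published."""
    nick = None
    for key, value in metadata:
        if key == "nick":
            nick = value  # later entries overwrite earlier ones
    return nick
```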

4.1.2. url

    url is used to announce the canonical URL of a twtxt feed. If a client
    fetches a feed and finds that it declares a different URL, the client
    SHOULD update its stored URL to the canonical one.

    A twtxt feed MAY include this field. If there are multiple instances, all
    of them can be considered valid, and the client MAY choose one arbitrarily.
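    One way a client could apply these rules is sketched below, again assuming
    pre-parsed (key, value) pairs. The name canonical_url is hypothetical, and
    keeping the current URL when it already appears among several declared
    ones is this sketch's interpretation of "all of them can be considered
    valid".

```python
def canonical_url(fetched_from: str, metadata: list[tuple[str, str]]) -> str:
    """Pick the URL a client should store for a feed after fetching it."""
    declared = [value for key, value in metadata if key == "url"]
    if not declared or fetched_from in declared:
        return fetched_from  # no url field, or the current URL is listed
    # Any declared url is valid; this sketch simply takes the first one.
    return declared[0]
```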

5. Discovery Methods

    There are various methods to discover users on the twtxt network.

5.1. Mentions on posts

    When a user you follow mentions another user in a post, the post includes
    the mentioned user's nickname and feed URL. Using this, it is possible to
    extend your network and find more people to follow.
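    This document does not define a mention syntax. The sketch below assumes
    the `@<nick url>` and `@<url>` forms used by the original twtxt client;
    the pattern and the mentions function are illustrative, not normative.

```python
import re

# Assumed syntax: `@<nick url>` or `@<url>`, as used by the original twtxt
# client; this document does not itself define a mention format.
MENTION = re.compile(r"@<(?:(?P<nick>[^>\s]+) )?(?P<url>[^>\s]+)>")


def mentions(text: str) -> list[tuple[str, str]]:
    """Return (nick, url) pairs for every mention in a post; nick may be ''."""
    return [(m["nick"] or "", m["url"]) for m in MENTION.finditer(text)]
```

    Each URL found this way is a candidate feed to fetch and, if the user
    wishes, to follow.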

5.2. User-Agent strings

    If you have access to your server logs, you can find users who follow you:
    twtxt clients CAN include the nickname and feed URL of their user in the
    User-Agent header of their HTTP requests.
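    Assuming clients follow the `$NAME/$VERSION (+$URL; @$NICK)` User-Agent
    format from Section 3.1.2, a log-scanning tool could extract followers
    with a pattern like the one below. The function name is hypothetical, and
    clients that deviate from the format will simply not match.

```python
import re
from typing import Optional

# Matches the `(+$URL; @$NICK)` part of the Section 3.1.2 User-Agent format.
UA_PATTERN = re.compile(r"\(\+(?P<url>[^;\s]+); @(?P<nick>[^)\s]+)\)")


def follower_from_user_agent(ua: str) -> Optional[tuple[str, str]]:
    """Extract (nick, feed URL) from a twtxt client's User-Agent, if present."""
    match = UA_PATTERN.search(ua)
    return (match["nick"], match["url"]) if match else None
```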

5.3. Third-party registries

    Aside from the methods above, there are also third-party registries that
    can be used for discovery. While those registries can be useful, they
    usually employ anti-patterns that end up centralizing the otherwise
    decentralized twtxt network.