| 1 | +# The BitTorrent Protocol Specification |
| 2 | + |
| 3 | +**BEP:** 3 |
| 4 | +**Title:** The BitTorrent Protocol Specification |
| 5 | +**Version:** 0e08ddf84d8d3bf101cdf897fc312f2774588c9e |
| 6 | +**Last-Modified:** Sat Feb 4 12:58:40 2017 +0100 |
| 7 | +**Author:** Bram Cohen <bram@bittorrent.com> |
| 8 | +**Status:** Final |
| 9 | +**Type:** Standard |
| 10 | +**Created:** 10-Jan-2008 |
| 11 | +**Post-History:** 24-Jun-2009 (arvid@bittorrent.com), clarified the encoding of strings in torrent files. 20-Oct-2012 ( |
| 12 | +arvid@bittorrent.com), clarified that info-hash is the digest of the bencoding found in the .torrent file. Introduced some |
| 13 | +references to new BEPs and cleaned up formatting. 11-Oct-2013 (arvid@bittorrent.com), corrected the accepted and de-facto |
| 14 | +sizes for request messages. 04-Feb-2017 (the8472.bep@infinite-source.de), further info-hash clarifications, added |
| 15 | +resources for new implementors. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly |
| 20 | +with the web. Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the |
| 21 | +downloaders upload to each other, making it possible for the file source to support very large numbers of downloaders |
| 22 | +with only a modest increase in its load. |
| 23 | + |
| 24 | +## A BitTorrent file distribution consists of these entities: |
| 25 | + |
| 26 | +- An ordinary web server |
| 27 | +- A static 'metainfo' file |
| 28 | +- A BitTorrent tracker |
| 29 | +- An 'original' downloader |
| 30 | +- The end user web browsers |
| 31 | +- The end user downloaders |
| 32 | + |
| 33 | +There are ideally many end users for a single file. |
| 34 | + |
| 35 | +## To start serving, a host goes through the following steps: |
| 36 | + |
| 37 | +1. Start running a tracker (or, more likely, have one running already). |
| 38 | +2. Start running an ordinary web server, such as apache, or have one already. |
| 39 | +3. Associate the extension .torrent with mimetype application/x-bittorrent on their web server (or have done so |
| 40 | + already). |
| 41 | +4. Generate a metainfo (.torrent) file using the complete file to be served and the URL of the tracker. |
| 42 | +5. Put the metainfo file on the web server. |
| 43 | +6. Link to the metainfo (.torrent) file from some other web page. |
| 44 | +7. Start a downloader which already has the complete file (the 'origin'). |
| 45 | +## To start downloading, a user does the following: |
| 46 | +1. Install BitTorrent (or have done so already). |
| 47 | +2. Surf the web. |
| 48 | +3. Click on a link to a .torrent file. |
| 49 | +4. Select where to save the file locally, or select a partial download to resume. |
| 50 | +5. Wait for download to complete. |
| 51 | +6. Tell downloader to exit (it keeps uploading until this happens). |
| 52 | + |
| 53 | +## bencoding |
| 54 | + |
| 55 | +- Strings are length-prefixed base ten followed by a colon and the string. For example `4:spam` corresponds to 'spam'. |
| 56 | +- Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example `i3e` corresponds |
| 57 | + to 3 and `i-3e` corresponds to -3. Integers have no size limitation. `i-0e` is invalid. All encodings with a leading |
| 58 | + zero, such as `i03e`, are invalid, other than `i0e`, which of course corresponds to 0. |
| 59 | +- Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For |
| 60 | + example `l4:spam4:eggse` corresponds to `['spam', 'eggs']`. |
| 61 | +- Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by |
| 62 | + an 'e'. For example, `d3:cow3:moo4:spam4:eggse` corresponds to `{'cow': 'moo', 'spam': 'eggs'}` and `d4:spaml1:a1:bee` |
| 63 | + corresponds to `{'spam': ['a', 'b']}`. Keys must be strings and appear in sorted order (sorted as raw strings, not |
| 64 | + alphanumerics). |
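The four rules above can be sketched as a small encoder and a strict decoder. This is a minimal illustration, not a reference implementation: the function names `bencode`, `bdecode` and `_decode` are hypothetical, strings are decoded to raw `bytes`, and the decoder rejects leading zeros, `i-0e` and unsorted or duplicate dictionary keys.

```python
import re

_INT_RE = re.compile(rb"-?[1-9][0-9]*|0")  # no leading zeros, no "-0"


def bencode(value) -> bytes:
    """Encode ints, strings (str or bytes), lists and dicts."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((key, item))
        items.sort(key=lambda pair: pair[0])  # sorted as raw byte strings
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data: bytes):
    """Decode one bencoded value, rejecting invalid or trailing input."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    kind = data[i:i + 1]
    if kind == b"i":
        end = data.index(b"e", i)  # raises ValueError if 'e' is missing
        text = data[i + 1:end]
        if not _INT_RE.fullmatch(text):
            raise ValueError("invalid integer %r" % text)
        return int(text), end + 1
    if kind == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if kind == b"d":
        i += 1
        result, last_key = {}, None
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be strings")
            if last_key is not None and key <= last_key:
                raise ValueError("dictionary keys must be sorted and unique")
            last_key = key
            result[key], i = _decode(data, i)
        return result, i + 1
    if kind.isdigit():
        colon = data.index(b":", i)
        length_text = data[i:colon]
        if not length_text.isdigit():
            raise ValueError("invalid string length %r" % length_text)
        start = colon + 1
        end = start + int(length_text)
        if end > len(data):
            raise ValueError("string runs past end of input")
        return data[start:end], end
    raise ValueError("unexpected byte at offset %d" % i)
```

For example, `bencode({"spam": ["a", "b"]})` yields `b"d4:spaml1:a1:bee"`, and `bdecode(b"i03e")` raises `ValueError`.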
| 65 | + |
| 66 | +## metainfo files |
| 67 | + |
| 68 | +Metainfo files (also known as .torrent files) are bencoded dictionaries with the following keys: |
| 69 | + |
| 70 | +- announce |
| 71 | + |
| 72 | + The URL of the tracker. |
| 73 | + |
| 74 | +- info |
| 75 | + |
| 76 | + This maps to a dictionary, with keys described below. |
| 77 | + |
| 78 | +All strings in a .torrent file that contain text must be UTF-8 encoded. |
| 79 | + |
| 80 | +### info dictionary |
| 81 | + |
| 82 | +The `name` key maps to a UTF-8 encoded string which is the suggested name to save the file (or directory) as. It is |
| 83 | +purely advisory. |
| 84 | + |
| 85 | +`piece length` maps to the number of bytes in each piece the file is split into. For the purposes of transfer, files are |
| 86 | +split into fixed-size pieces which are all the same length except for possibly the last one which may be |
| 87 | +truncated. `piece length` is almost always a power of two, most commonly 2^18 = 256 K (BitTorrent prior to version 3.2 |
| 88 | +uses 2^20 = 1 M as default). |
| 89 | + |
| 90 | +`pieces` maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of |
| 91 | +which is the SHA1 hash of the piece at the corresponding index. |
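As a rough sketch of how the `pieces` string relates to the content, the snippet below builds it from in-memory data and checks a single piece against it. The helper names `piece_hashes` and `piece_is_valid` are hypothetical, and a real client would read the data from disk piece by piece rather than hold it all in memory.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA1 digest of every piece of `data`."""
    return b"".join(
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    )


def piece_is_valid(pieces: bytes, index: int, piece: bytes) -> bool:
    """Compare a downloaded piece against the digest stored at `index`."""
    if len(pieces) % 20 != 0:
        raise ValueError("pieces must be a multiple of 20 bytes long")
    expected = pieces[index * 20:(index + 1) * 20]
    return len(expected) == 20 and hashlib.sha1(piece).digest() == expected
```

The last piece is hashed at its truncated length, so its digest covers only the bytes that actually exist.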
| 92 | + |
| 93 | +There is also a key `length` or a key `files`, but not both or neither. If `length` is present then the download |
| 94 | +represents a single file, otherwise it represents a set of files which go in a directory structure. |
| 95 | + |
| 96 | +In the single file case, `length` maps to the length of the file in bytes. |
| 97 | + |
| 98 | +For the purposes of the other keys, the multi-file case is treated as only having a single file by concatenating the |
| 99 | +files in the order they appear in the files list. The files list is the value `files` maps to, and is a list of |
| 100 | +dictionaries containing the following keys: |
| 101 | + |
| 102 | +`length` - The length of the file, in bytes. |
| 103 | + |
| 104 | +`path` - A list of UTF-8 encoded strings corresponding to subdirectory names, the last of which is the actual file |
| 105 | +name (a zero length list is an error case). |
| 106 | + |
| 107 | +In the single file case, the `name` key is the name of a file; in the multiple file case, it's the name of a directory. |
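Because the multi-file case concatenates the files in list order, a piece can span several files. The sketch below maps a piece index to the file regions it covers. The function name `piece_file_spans` and the `(path, length)` tuple layout are assumptions made for illustration only.

```python
def piece_file_spans(files, piece_length, index):
    """Return [(path, offset_in_file, byte_count)] for piece `index`.

    `files` is a list of (path, length) pairs in the order they appear
    in the metainfo `files` list.
    """
    total = sum(length for _, length in files)
    start = index * piece_length
    end = min(start + piece_length, total)  # the last piece may be truncated
    spans = []
    file_start = 0
    for path, length in files:
        file_end = file_start + length
        lo, hi = max(start, file_start), min(end, file_end)
        if lo < hi:  # this file overlaps the piece
            spans.append((path, lo - file_start, hi - lo))
        file_start = file_end
    return spans
```

With files of 10, 5 and 20 bytes and a piece length of 8, piece 1 covers the last 2 bytes of the first file, all of the second, and the first byte of the third.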
| 108 | + |
| 109 | +## trackers |
| 110 | + |
| 111 | +Tracker GET requests have the following keys: |
| 112 | + |
| 113 | +- info_hash |
| 114 | + |
| 115 | + The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. This value will almost certainly |
| 116 | + have to be escaped. |
| 117 | + |
| 118 | + Note that this is a substring of the metainfo file. The info-hash must be the hash of the encoded form as found in the |
| 119 | + .torrent file, which is identical to bdecoding the metainfo file, extracting the info dictionary and encoding it if |
| 120 | + and only if the bdecoder fully validated the input (e.g. key ordering, absence of leading zeros). Conversely that |
| 121 | + means clients must either reject invalid metainfo files or extract the substring directly. They must not perform a |
| 122 | + decode-encode roundtrip on invalid data. |
| 123 | + |
| 124 | +- peer_id |
| 125 | + |
| 126 | + A string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the |
| 127 | + start of a new download. This value will also almost certainly have to be escaped. |
| 128 | + |
| 129 | +- ip |
| 130 | + |
| 131 | + An optional parameter giving the IP (or dns name) which this peer is at. Generally used for the origin if it's on the |
| 132 | + same machine as the tracker. |
| 133 | + |
| 134 | +- port |
| 135 | + |
| 136 | + The port number this peer is listening on. Common behavior is for a downloader to try to listen on port 6881 and if |
| 137 | + that port is taken try 6882, then 6883, etc. and give up after 6889. |
| 138 | + |
| 139 | +- uploaded |
| 140 | + |
| 141 | + The total amount uploaded so far, encoded in base ten ascii. |
| 142 | + |
| 143 | +- downloaded |
| 144 | + |
| 145 | + The total amount downloaded so far, encoded in base ten ascii. |
| 146 | + |
| 147 | +- left |
| 148 | + |
| 149 | + The number of bytes this peer still has to download, encoded in base ten ascii. Note that this can't be computed from |
| 150 | + downloaded and the file length since it might be a resume, and there's a chance that some of the downloaded data |
| 151 | + failed an integrity check and had to be re-downloaded. |
| 152 | + |
| 153 | +- event |
| 154 | + |
| 155 | + This is an optional key which maps to started, completed, or stopped (or empty, which is the same as not being |
| 156 | + present). If not present, this is one of the announcements done at regular intervals. An announcement using started is |
| 157 | + sent when a download first begins, and one using completed is sent when the download is complete. No completed is sent |
| 158 | + if the file was complete when started. Downloaders send an announcement using stopped when they cease downloading. |
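The request keys above end up in the query string of the announce URL. The following sketch shows one way to build it with the standard library. The function name `announce_url` and the tracker address in the usage note are hypothetical. `urlencode` percent-escapes the raw 20-byte values, which is the escaping the `info_hash` and `peer_id` descriptions refer to.

```python
from urllib.parse import urlencode


def announce_url(announce, info_hash, peer_id, port,
                 uploaded, downloaded, left, event=None):
    """Build a tracker GET URL from the keys described above."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    params = {
        "info_hash": info_hash,    # raw bytes, escaped by urlencode
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,      # integers are written in base ten
        "downloaded": downloaded,
        "left": left,
    }
    if event:                      # omitted for regular interval announces
        params["event"] = event
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params)
```

A first announce might look like `announce_url("http://tracker.example/announce", info_hash, peer_id, 6881, 0, 0, total_size, event="started")`.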
| 159 | + |
| 160 | +Tracker responses are bencoded dictionaries. If a tracker response has a key `failure reason`, then that maps to a human |
| 161 | +readable string which explains why the query failed, and no other keys are required. Otherwise, it must have two |
| 162 | +keys: `interval`, which maps to the number of seconds the downloader should wait between regular rerequests, |
| 163 | +and `peers`. `peers` maps to a list of dictionaries corresponding to peers, each of which contains the keys |
| 164 | +`peer id`, `ip`, and `port`, which map to the peer's self-selected ID, IP address or dns name as a string, and port |
| 165 | +number, respectively. Note that downloaders may rerequest on nonscheduled times if an event happens or they need more |
| 166 | +peers. |
| 167 | + |
| 168 | +More commonly, trackers return a compact representation of the peer list; |
| 169 | +see [BEP 23](https://www.bittorrent.org/beps/bep_0023.html). |
| 170 | + |
| 171 | +If you want to make any extensions to metainfo files or tracker queries, please coordinate with Bram Cohen to make sure |
| 172 | +that all extensions are done compatibly. |
| 173 | + |
| 174 | +It is common to announce over a [UDP tracker protocol](https://www.bittorrent.org/beps/bep_0015.html) as well. |
| 175 | + |
| 176 | +## peer protocol |
| 177 | + |
| 178 | +BitTorrent's peer protocol operates over TCP or [uTP](https://www.bittorrent.org/beps/bep_0029.html). |
| 179 | + |
| 180 | +Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction. |
| 181 | + |
| 182 | +The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer |
| 183 | +finishes downloading a piece and checks that the hash matches, it announces that it has that piece to all of its peers. |
| 184 | + |
| 185 | +Connections contain two bits of state on either end: choked or not, and interested or not. Choking is a notification |
| 186 | +that no data will be sent until unchoking happens. The reasoning and common techniques behind choking are explained |
| 187 | +later in this document. |
| 188 | + |
| 189 | +Data transfer takes place whenever one side is interested and the other side is not choking. Interest state must be kept |
| 190 | +up to date at all times - whenever a downloader doesn't have something they currently would ask a peer for if unchoked, |
| 191 | +they must express lack of interest, despite being choked. Implementing this properly is tricky, but makes it possible |
| 192 | +for downloaders to know which peers will start downloading immediately if unchoked. |
| 193 | + |
| 194 | +Connections start out choked and not interested. |
| 195 | + |
| 196 | +When data is being transferred, downloaders should keep several piece requests queued up at once in order to get good |
| 197 | +TCP performance (this is called 'pipelining'). On the other side, requests which can't be written out to the TCP buffer |
| 198 | +immediately should be queued up in memory rather than kept in an application-level network buffer, so they can all be |
| 199 | +thrown out when a choke happens. |
| 200 | + |
| 201 | +The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The |
| 202 | +handshake starts with character nineteen (decimal) followed by the string 'BitTorrent protocol'. The leading character is |
| 203 | +a length prefix, put there in the hope that other new protocols may do the same and thus be trivially distinguishable |
| 204 | +from each other. |
| 205 | + |
| 206 | +All later integers sent in the protocol are encoded as four bytes big-endian. |
| 207 | + |
| 208 | +After the fixed headers come eight reserved bytes, which are all zero in all current implementations. If you wish to |
| 209 | +extend the protocol using these bytes, please coordinate with Bram Cohen to make sure all extensions are done |
| 210 | +compatibly. |
| 211 | + |
| 212 | +Next comes the 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. (This is the same value |
| 213 | +which is announced as info_hash to the tracker, only here it's raw instead of quoted). If both sides don't send the |
| 214 | +same value, they sever the connection. The one possible exception is if a downloader wants to do multiple downloads over |
| 215 | +a single port, they may wait for incoming connections to give a download hash first, and respond with the same one if |
| 216 | +it's in their list. |
| 217 | + |
| 218 | +After the download hash comes the 20-byte peer id which is reported in tracker requests and contained in peer lists in |
| 219 | +tracker responses. If the receiving side's peer id doesn't match the one the initiating side expects, it severs the |
| 220 | +connection. |
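Putting the handshake fields together gives a fixed 68-byte message: 1 length byte, the 19-byte protocol string, 8 reserved bytes, the 20-byte info hash and the 20-byte peer id. A minimal sketch follows; the names `PROTOCOL`, `build_handshake` and `parse_handshake` are hypothetical.

```python
PROTOCOL = b"BitTorrent protocol"  # 19 bytes, preceded by its length


def build_handshake(info_hash: bytes, peer_id: bytes,
                    reserved: bytes = bytes(8)) -> bytes:
    """Serialize the handshake sent at the start of a connection."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("unexpected field length")
    return bytes([len(PROTOCOL)]) + PROTOCOL + reserved + info_hash + peer_id


def parse_handshake(data: bytes):
    """Return (reserved, info_hash, peer_id) from a 68-byte handshake."""
    if len(data) != 68 or data[0] != 19 or data[1:20] != PROTOCOL:
        raise ValueError("not a BitTorrent handshake")
    return data[20:28], data[28:48], data[48:68]
```

A receiver would compare the returned info hash with its own before continuing, and sever the connection on a mismatch.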
| 221 | + |
| 222 | +That's it for handshaking, next comes an alternating stream of length prefixes and messages. Messages of length zero are |
| 223 | +keepalives, and ignored. Keepalives are generally sent once every two minutes, but note that timeouts can be done much |
| 224 | +more quickly when data is expected. |
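The length-prefixed stream can be split with a small buffering helper. This sketch assumes the caller appends received bytes to a buffer and calls `split_messages` (a hypothetical name) after each read. Keepalives are dropped, and an incomplete trailing message is returned so the caller can retry once more data arrives.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a four-byte big-endian integer."""
    return struct.pack(">I", len(message)) + message


def split_messages(buffer: bytes):
    """Return (complete_messages, unconsumed_bytes) for a receive buffer."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break                      # wait for the rest of this message
        body = buffer[offset + 4:offset + 4 + length]
        offset += 4 + length
        if length == 0:
            continue                   # keepalive: ignored
        messages.append(body)
    return messages, buffer[offset:]
```

`frame(b"")` produces the four zero bytes of a keepalive.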
| 225 | + |
| 226 | +## peer messages |
| 227 | + |
| 228 | +All non-keepalive messages start with a single byte which gives their type. |
| 229 | + |
| 230 | +The possible values are: |
| 231 | + |
| 232 | +- 0 - choke |
| 233 | +- 1 - unchoke |
| 234 | +- 2 - interested |
| 235 | +- 3 - not interested |
| 236 | +- 4 - have |
| 237 | +- 5 - bitfield |
| 238 | +- 6 - request |
| 239 | +- 7 - piece |
| 240 | +- 8 - cancel |
| 241 | + |
| 242 | +'choke', 'unchoke', 'interested', and 'not interested' have no payload. |
| 243 | + |
| 244 | +'bitfield' is only ever sent as the first message. Its payload is a bitfield with each index that downloader has sent |
| 245 | +set to one and the rest set to zero. Downloaders which don't have anything yet may skip the 'bitfield' message. The |
| 246 | +first byte of the bitfield corresponds to indices 0 - 7 from high bit to low bit, respectively. The next one 8-15, etc. |
| 247 | +Spare bits at the end are set to zero. |
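The bit ordering described above (high bit first within each byte) can be illustrated as follows. The helper names `make_bitfield` and `has_piece` are hypothetical.

```python
def make_bitfield(have, num_pieces: int) -> bytes:
    """Build a bitfield payload from an iterable of completed piece indices."""
    field = bytearray((num_pieces + 7) // 8)   # spare bits stay zero
    for index in have:
        if not 0 <= index < num_pieces:
            raise ValueError("piece index out of range")
        field[index // 8] |= 0x80 >> (index % 8)   # index 0 is the high bit
    return bytes(field)


def has_piece(field: bytes, index: int) -> bool:
    """Test whether the bit for piece `index` is set."""
    return bool(field[index // 8] & (0x80 >> (index % 8)))
```

For a ten-piece torrent where pieces 0 and 9 are complete, the payload is the two bytes `0x80 0x40`.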
| 248 | + |
| 249 | +The 'have' message's payload is a single number, the index which that downloader just completed and checked the hash of. |
| 250 | + |
| 251 | +'request' messages contain an index, begin, and length. The last two are byte offsets. Length is generally a power of |
| 252 | +two unless it gets truncated by the end of the file. All current implementations use 2^14 (16 kiB), and close |
| 253 | +connections which request an amount greater than that. |
| 254 | + |
| 255 | +'cancel' messages have the same payload as request messages. They are generally only sent towards the end of a download, |
| 256 | +during what's called 'endgame mode'. When a download is almost complete, there's a tendency for the last few pieces to |
| 257 | +all be downloaded off a single hosed modem line, taking a very long time. To make sure the last few pieces come in |
| 258 | +quickly, once requests for all pieces a given downloader doesn't have yet are currently pending, it sends requests for |
| 259 | +everything to everyone it's downloading from. To keep this from becoming horribly inefficient, it sends cancels to |
| 260 | +everyone else every time a piece arrives. |
| 261 | + |
| 262 | +'piece' messages contain an index, begin, and piece. Note that they are correlated with request messages implicitly. |
| 263 | +It's possible for an unexpected piece to arrive if choke and unchoke messages are sent in quick succession and/or |
| 264 | +transfer is going very slowly. |
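As an illustration of the payload layouts for 'have', 'request', 'cancel' and 'piece', the sketch below serializes each one including its length prefix. The function names and the `BLOCK_SIZE` constant are hypothetical; all integers are four bytes big-endian as stated earlier.

```python
import struct

HAVE, REQUEST, PIECE, CANCEL = 4, 6, 7, 8
BLOCK_SIZE = 2 ** 14   # 16 kiB, the request length in common use


def have(index: int) -> bytes:
    # length prefix 5 = 1 type byte + one 4-byte integer
    return struct.pack(">IBI", 5, HAVE, index)


def request(index: int, begin: int, length: int = BLOCK_SIZE) -> bytes:
    # length prefix 13 = 1 type byte + three 4-byte integers
    return struct.pack(">IBIII", 13, REQUEST, index, begin, length)


def cancel(index: int, begin: int, length: int = BLOCK_SIZE) -> bytes:
    # same payload as request, different type byte
    return struct.pack(">IBIII", 13, CANCEL, index, begin, length)


def piece(index: int, begin: int, block: bytes) -> bytes:
    # length prefix 9 + len(block) = type byte + index + begin + data
    return struct.pack(">IBII", 9 + len(block), PIECE, index, begin) + block
```

A 'piece' reply carries no request identifier; the receiver matches it to an outstanding request by its `index` and `begin` values.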
| 265 | + |
| 266 | +Downloaders generally download pieces in random order, which does a reasonably good job of keeping them from having a |
| 267 | +strict subset or superset of the pieces of any of their peers. |
| 268 | + |
| 269 | +Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at |
| 270 | +once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate. |
| 271 | + |
| 272 | +The choking algorithm described below is the currently deployed one. It is very important that all new algorithms work |
| 273 | +well both in a network consisting entirely of themselves and in a network consisting mostly of this one. |
| 274 | + |
| 275 | +There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for |
| 276 | +good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to |
| 277 | +peers who let it download. Finally, it should try out unused connections once in a while to find out if they might be |
| 278 | +better than the currently used ones, known as optimistic unchoking. |
| 279 | + |
| 280 | +The currently deployed choking algorithm avoids fibrillation by only changing who's choked once every ten seconds. It |
| 281 | +does reciprocation and number of uploads capping by unchoking the four peers which it has the best download rates from |
| 282 | +and are interested. Peers which have a better upload rate but aren't interested get unchoked and if they become |
| 283 | +interested the worst uploader gets choked. If a downloader has a complete file, it uses its upload rate rather than its |
| 284 | +download rate to decide who to unchoke. |
| 285 | + |
| 286 | +For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of its upload rate (if |
| 287 | +interested, it counts as one of the four allowed downloaders). Which peer is optimistically unchoked rotates every 30 |
| 288 | +seconds. To give them a decent chance of getting a complete piece to upload, new connections are three times as likely |
| 289 | +to start as the current optimistic unchoke as anywhere else in the rotation. |
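One round of the unchoke decision described above might be sketched like this. It is a simplified illustration only: the `Peer` class and `choose_unchoked` function are hypothetical, `rate` stands for whichever rate the downloader ranks by, and the ten-second and thirty-second timers are assumed to be handled by the caller.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    rate: float        # rate used for ranking, in bytes per second
    interested: bool


def choose_unchoked(peers, optimistic=None, slots=4):
    """Return the set of peer names to unchoke for the next ten seconds."""
    unchoked = set()
    downloaders = 0                    # unchoked peers that are interested
    if optimistic is not None:
        unchoked.add(optimistic.name)  # unchoked regardless of its rate
        if optimistic.interested:
            downloaders += 1           # counts against the four slots
    for peer in sorted(peers, key=lambda p: p.rate, reverse=True):
        if downloaders >= slots:
            break
        if peer.name in unchoked:
            continue
        # Faster peers that are not interested are unchoked too; they only
        # start consuming a slot once they become interested.
        unchoked.add(peer.name)
        if peer.interested:
            downloaders += 1
    return unchoked
```

Re-running the function every ten seconds with fresh rates, and changing `optimistic` every thirty seconds, approximates the schedule in the text.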
| 290 | + |
| 291 | +## Resources |
| 292 | + |
| 293 | +The [BitTorrent Economics Paper](http://bittorrent.org/bittorrentecon.pdf) outlines some request and choking algorithms |
| 294 | +clients should implement for optimal performance. |
| 295 | +When developing a new implementation the Wireshark protocol analyzer and |
| 296 | +its [dissectors for bittorrent](https://wiki.wireshark.org/BitTorrent) can be useful to debug and compare with existing |
| 297 | +ones. |
| 298 | + |
| 299 | +## Copyright |
| 300 | + |
| 301 | +This document has been placed in the public domain. |