Commit a0691f3

docs: add manually copied bep_0003.md
1 parent f65ae22 commit a0691f3

1 file changed

+301
-0
lines changed

bep_0003.md

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
1+
# The BitTorrent Protocol Specification
2+
3+
**BEP:** 3
4+
**Title:** The BitTorrent Protocol Specification
5+
**Version:** 0e08ddf84d8d3bf101cdf897fc312f2774588c9e
6+
**Last-Modified:** Sat Feb 4 12:58:40 2017 +0100
7+
**Author:** Bram Cohen <bram@bittorrent.com>
8+
**Status:** Final
9+
**Type:** Standard
10+
**Created:** 10-Jan-2008
11+
**Post-History:** 24-Jun-2009 (arvid@bittorrent.com), clarified the encoding of strings in torrent files.
12+
20-Oct-2012 (arvid@bittorrent.com), clarified that info-hash is the digest of the bencoding found in the .torrent file;
13+
introduced some references to new BEPs and cleaned up formatting. 11-Oct-2013 (arvid@bittorrent.com), corrected the
14+
accepted and de facto sizes for request messages. 04-Feb-2017 (the8472.bep@infinite-source.de), further info-hash
15+
clarifications, added resources for new implementors.
16+
17+
---
18+
19+
BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly
20+
with the web. Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the
21+
downloaders upload to each other, making it possible for the file source to support very large numbers of downloaders
22+
with only a modest increase in its load.
23+
24+
## A BitTorrent file distribution consists of these entities:
25+
26+
- An ordinary web server
27+
- A static 'metainfo' file
28+
- A BitTorrent tracker
29+
- An 'original' downloader
30+
- The end user web browsers
31+
- The end user downloaders
32+
33+
There are ideally many end users for a single file.
34+
35+
## To start serving, a host goes through the following steps:
36+
37+
1. Start running a tracker (or, more likely, have one running already).
38+
2. Start running an ordinary web server, such as Apache, or have one already.
39+
3. Associate the extension .torrent with the MIME type application/x-bittorrent on their web server (or have done so
40+
already).
41+
4. Generate a metainfo (.torrent) file using the complete file to be served and the URL of the tracker.
42+
5. Put the metainfo file on the web server.
43+
6. Link to the metainfo (.torrent) file from some other web page.
44+
7. Start a downloader which already has the complete file (the 'origin').
45+
## To start downloading, a user does the following:
46+
1. Install BitTorrent (or have done so already).
47+
2. Surf the web.
48+
3. Click on a link to a .torrent file.
49+
4. Select where to save the file locally, or select a partial download to resume.
50+
5. Wait for download to complete.
51+
6. Tell downloader to exit (it keeps uploading until this happens).
52+
53+
## bencoding
54+
55+
- Strings are encoded as their length in base ten, followed by a colon and the string itself. For example `4:spam` corresponds to 'spam'.
56+
- Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example `i3e` corresponds
57+
to 3 and `i-3e` corresponds to -3. Integers have no size limitation. `i-0e` is invalid. All encodings with a leading
58+
zero, such as `i03e`, are invalid, other than `i0e`, which of course corresponds to 0.
59+
- Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For
60+
example `l4:spam4:eggse` corresponds to `['spam', 'eggs']`.
61+
- Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by
62+
an 'e'. For example, `d3:cow3:moo4:spam4:eggse` corresponds to `{'cow': 'moo', 'spam': 'eggs'}` and `d4:spaml1:a1:bee`
63+
corresponds to `{'spam': ['a', 'b']}`. Keys must be strings and appear in sorted order (sorted as raw strings, not
64+
alphanumerics).
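
The four rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `bencode` and `bdecode` are chosen here, strings are decoded to `bytes`, and the decoder validates only the integer and key-ordering rules stated above (it does not, for example, reject leading zeros in string length prefixes).

```python
def bencode(value):
    """Encode ints, str/bytes, lists and dicts into bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings and must appear sorted as raw bytes.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data, pos=0):
    """Decode one value starting at pos and return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        text = data[pos + 1:end]
        digits = text[1:] if text.startswith(b"-") else text
        # Reject empty numbers, i-0e, and leading zeros such as i03e.
        if not digits.isdigit() or text == b"-0" or (len(digits) > 1 and digits.startswith(b"0")):
            raise ValueError("invalid integer")
        return int(text), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result, last_key = {}, None
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            if not isinstance(key, bytes) or (last_key is not None and key <= last_key):
                raise ValueError("dictionary keys must be sorted, unique strings")
            result[key], pos = bdecode(data, pos)
            last_key = key
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        if end > len(data):
            raise ValueError("truncated string")
        return data[start:end], end
    raise ValueError("unexpected byte at offset %d" % pos)
```

For instance, `bencode(["spam", "eggs"])` produces the `l4:spam4:eggse` encoding from the list example, and `bdecode` returns the decoded value together with the offset just past it, which makes it easy to detect trailing garbage.
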
65+
66+
## metainfo files
67+
68+
Metainfo files (also known as .torrent files) are bencoded dictionaries with the following keys:
69+
70+
- announce
71+
72+
The URL of the tracker.
73+
74+
- info
75+
76+
This maps to a dictionary, with keys described below.
77+
78+
All strings in a .torrent file that contain text must be UTF-8 encoded.
79+
80+
### info dictionary
81+
82+
The `name` key maps to a UTF-8 encoded string which is the suggested name to save the file (or directory) as. It is
83+
purely advisory.
84+
85+
`piece length` maps to the number of bytes in each piece the file is split into. For the purposes of transfer, files are
86+
split into fixed-size pieces which are all the same length except for possibly the last one which may be
87+
truncated. `piece length` is almost always a power of two, most commonly 2^18 = 256 KiB (BitTorrent prior to version 3.2
88+
uses 2^20 = 1 MiB as default).
89+
90+
`pieces` maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of
91+
which is the SHA1 hash of the piece at the corresponding index.
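
As a rough sketch of how the `pieces` string relates to the file data, the following builds it from an in-memory byte string and checks a single downloaded piece against it. The function names `piece_hashes` and `verify_piece` are illustrative, and a real client would stream the data from disk instead of holding it in memory.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA1 digest of each fixed-size piece."""
    digests = []
    for offset in range(0, len(data), piece_length):
        # The final slice may be shorter than piece_length (truncated last piece).
        digests.append(hashlib.sha1(data[offset:offset + piece_length]).digest())
    return b"".join(digests)


def verify_piece(pieces: bytes, index: int, piece: bytes) -> bool:
    """Compare a downloaded piece with the digest stored at its index."""
    expected = pieces[index * 20:(index + 1) * 20]
    return hashlib.sha1(piece).digest() == expected
```
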
92+
93+
There is also a key `length` or a key `files`, but not both or neither. If `length` is present then the download
94+
represents a single file, otherwise it represents a set of files which go in a directory structure.
95+
96+
In the single file case, `length` maps to the length of the file in bytes.
97+
98+
For the purposes of the other keys, the multi-file case is treated as only having a single file by concatenating the
99+
files in the order they appear in the files list. The files list is the value `files` maps to, and is a list of
100+
dictionaries containing the following keys:
101+
102+
`length` - The length of the file, in bytes.
103+
104+
`path` - A list of UTF-8 encoded strings corresponding to subdirectory names, the last of which is the actual file
105+
name (a zero length list is an error case).
106+
107+
In the single file case, the name key is the name of a file; in the multiple file case, it's the name of a directory.
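
To illustrate the concatenation rule for the multi-file case, the sketch below maps each entry of a `files` list to its byte range in the virtual single file. It assumes the list has already been decoded into Python dictionaries with `str` keys and path components; `file_ranges` is a name invented for this example, and path sanitization is left out.

```python
def file_ranges(files):
    """Return (path, start, end) for each file within the concatenated data.

    start is inclusive and end is exclusive, both counted in bytes.
    """
    ranges = []
    offset = 0
    for entry in files:
        if not entry["path"]:
            raise ValueError("a zero length path list is an error")
        path = "/".join(entry["path"])
        ranges.append((path, offset, offset + entry["length"]))
        offset += entry["length"]
    return ranges
```

Because pieces are cut from the concatenated data, a single piece can span the boundary between two files.
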
108+
109+
## trackers
110+
111+
Tracker GET requests have the following keys:
112+
113+
- info_hash
114+
115+
The 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file. This value will almost certainly
116+
have to be escaped.
117+
118+
Note that this is a substring of the metainfo file. The info-hash must be the hash of the encoded form as found in the
119+
.torrent file, which is identical to bdecoding the metainfo file, extracting the info dictionary and encoding it if
120+
and only if the bdecoder fully validated the input (e.g. key ordering, absence of leading zeros). Conversely that
121+
means clients must either reject invalid metainfo files or extract the substring directly. They must not perform a
122+
decode-encode roundtrip on invalid data.
123+
124+
- peer_id
125+
126+
A string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the
127+
start of a new download. This value will also almost certainly have to be escaped.
128+
129+
- ip
130+
131+
An optional parameter giving the IP (or DNS name) which this peer is at. Generally used for the origin if it's on the
132+
same machine as the tracker.
133+
134+
- port
135+
136+
The port number this peer is listening on. Common behavior is for a downloader to try to listen on port 6881 and if
137+
that port is taken try 6882, then 6883, etc. and give up after 6889.
138+
139+
- uploaded
140+
141+
The total amount uploaded so far, encoded in base ten ASCII.
142+
143+
- downloaded
144+
145+
The total amount downloaded so far, encoded in base ten ASCII.
146+
147+
- left
148+
149+
The number of bytes this peer still has to download, encoded in base ten ASCII. Note that this can't be computed from
150+
downloaded and the file length since it might be a resume, and there's a chance that some of the downloaded data
151+
failed an integrity check and had to be re-downloaded.
152+
153+
- event
154+
155+
This is an optional key which maps to `started`, `completed`, or `stopped` (or empty, which is the same as not being
156+
present). If not present, this is one of the announcements done at regular intervals. An announcement using `started` is
157+
sent when a download first begins, and one using `completed` is sent when the download is complete. No `completed` is sent
158+
if the file was complete when started. Downloaders send an announcement using `stopped` when they cease downloading.
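
The following sketch shows one way to obtain `info_hash` by extracting the raw substring of the metainfo file, as recommended above, and then to assemble an announce URL from the keys just listed. The helper names (`_skip`, `info_hash`, `announce_url`) are made up for this example, `_skip` performs no validation, and the tracker URL is assumed not to carry a query string already.

```python
import hashlib
from urllib.parse import quote, urlencode


def _skip(data, pos):
    """Return the offset just past the bencoded value starting at pos."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        return data.index(b"e", pos) + 1
    if lead in (b"l", b"d"):
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = _skip(data, pos)
        return pos + 1
    colon = data.index(b":", pos)
    return colon + 1 + int(data[pos:colon])


def info_hash(metainfo: bytes) -> bytes:
    """SHA1 of the info value exactly as it appears in the metainfo file."""
    if metainfo[:1] != b"d":
        raise ValueError("metainfo must be a bencoded dictionary")
    pos = 1
    while metainfo[pos:pos + 1] != b"e":
        key_end = _skip(metainfo, pos)
        key = metainfo[metainfo.index(b":", pos) + 1:key_end]
        value_end = _skip(metainfo, key_end)
        if key == b"info":
            # Hash the substring directly: no decode-encode roundtrip.
            return hashlib.sha1(metainfo[key_end:value_end]).digest()
        pos = value_end
    raise ValueError("no info key found")


def announce_url(tracker, info_hash, peer_id, port, uploaded, downloaded, left, event=None):
    """Build a tracker GET URL; binary values are percent-escaped."""
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if event:
        params["event"] = event
    return tracker + "?" + urlencode(params, quote_via=quote)
```

Passing `quote_via=quote` makes spaces in the binary values come out as `%20` rather than `+`, which is a choice made for this sketch and not something the text above mandates.
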
159+
160+
Tracker responses are bencoded dictionaries. If a tracker response has a key `failure reason`, then that maps to a human
161+
readable string which explains why the query failed, and no other keys are required. Otherwise, it must have two
162+
keys: `interval`, which maps to the number of seconds the downloader should wait between regular rerequests,
163+
and `peers`. `peers` maps to a list of dictionaries corresponding to peers, each of which contains the keys
164+
`peer id`, `ip`, and `port`, which map to the peer's self-selected ID, IP address or DNS name as a string, and port
165+
number, respectively. Note that downloaders may rerequest on nonscheduled times if an event happens or they need more
166+
peers.
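
A hedged sketch of handling such a response, assuming it has already been bdecoded into a Python dictionary with `bytes` keys and string values: the function name `parse_tracker_response` is invented here, and the compact peer representation is deliberately not handled.

```python
def parse_tracker_response(response):
    """Return (interval, [(ip, port, peer_id), ...]) or raise on failure."""
    if b"failure reason" in response:
        raise RuntimeError(response[b"failure reason"].decode("utf-8", "replace"))
    peers = response[b"peers"]
    if isinstance(peers, bytes):
        # A bytes value indicates the compact form, which this sketch skips.
        raise NotImplementedError("compact peer lists are not handled here")
    parsed = [(p[b"ip"].decode("ascii"), p[b"port"], p[b"peer id"]) for p in peers]
    return response[b"interval"], parsed
```
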
167+
168+
More commonly, trackers return a compact representation of the peer list;
169+
see [BEP 23](https://www.bittorrent.org/beps/bep_0023.html).
170+
171+
If you want to make any extensions to metainfo files or tracker queries, please coordinate with Bram Cohen to make sure
172+
that all extensions are done compatibly.
173+
174+
It is common to announce over a [UDP tracker protocol](https://www.bittorrent.org/beps/bep_0015.html) as well.
175+
176+
## peer protocol
177+
178+
BitTorrent's peer protocol operates over TCP or [uTP](https://www.bittorrent.org/beps/bep_0029.html).
179+
180+
Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction.
181+
182+
The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer
183+
finishes downloading a piece and checks that the hash matches, it announces that it has that piece to all of its peers.
184+
185+
Connections contain two bits of state on either end: choked or not, and interested or not. Choking is a notification
186+
that no data will be sent until unchoking happens. The reasoning and common techniques behind choking are explained
187+
later in this document.
188+
189+
Data transfer takes place whenever one side is interested and the other side is not choking. Interest state must be kept
190+
up to date at all times - whenever a downloader doesn't have something they currently would ask a peer for if unchoked,
191+
they must express lack of interest, despite being choked. Implementing this properly is tricky, but makes it possible
192+
for downloaders to know which peers will start downloading immediately if unchoked.
193+
194+
Connections start out choked and not interested.
195+
196+
When data is being transferred, downloaders should keep several piece requests queued up at once in order to get good
197+
TCP performance (this is called 'pipelining'). On the other side, requests which can't be written out to the TCP buffer
198+
immediately should be queued up in memory rather than kept in an application-level network buffer, so they can all be
199+
thrown out when a choke happens.
200+
201+
The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The
202+
handshake starts with character nineteen (decimal) followed by the string 'BitTorrent protocol'. The leading character is
203+
a length prefix, put there in the hope that other new protocols may do the same and thus be trivially distinguishable
204+
from each other.
205+
206+
All later integers sent in the protocol are encoded as four bytes big-endian.
207+
208+
After the fixed headers come eight reserved bytes, which are all zero in all current implementations. If you wish to
209+
extend the protocol using these bytes, please coordinate with Bram Cohen to make sure all extensions are done
210+
compatibly.
211+
212+
Next comes the 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file. (This is the same value
213+
which is announced as info_hash to the tracker, only here it's raw instead of quoted). If both sides don't send the
214+
same value, they sever the connection. The one possible exception is if a downloader wants to do multiple downloads over
215+
a single port, they may wait for incoming connections to give a download hash first, and respond with the same one if
216+
it's in their list.
217+
218+
After the download hash comes the 20-byte peer id which is reported in tracker requests and contained in peer lists in
219+
tracker responses. If the receiving side's peer id doesn't match the one the initiating side expects, it severs the
220+
connection.
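
The handshake layout described above (length prefix, protocol string, eight reserved bytes, download hash, peer id) adds up to 68 bytes. A minimal sketch of building and parsing it follows; the function names are chosen for this example.

```python
PSTR = b"BitTorrent protocol"  # 19 bytes, preceded by its length


def build_handshake(info_hash: bytes, peer_id: bytes, reserved: bytes = bytes(8)) -> bytes:
    """Assemble the 68-byte handshake."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("info_hash and peer_id are 20 bytes, reserved is 8")
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data: bytes):
    """Return (reserved, info_hash, peer_id) from a received handshake."""
    if len(data) < 68 or data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return data[20:28], data[28:48], data[48:68]
```

A receiver would compare the parsed `info_hash` (and, on the initiating side, the `peer_id`) against what it expects and sever the connection on a mismatch.
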
221+
222+
That's it for handshaking. Next comes an alternating stream of length prefixes and messages. Messages of length zero are
223+
keepalives, and ignored. Keepalives are generally sent once every two minutes, but note that timeouts can be done much
224+
more quickly when data is expected.
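
The framing can be sketched as a function that splits a receive buffer into complete messages and leaves any partial message for the next read. `split_messages` is a name picked for this illustration; it drops keepalives, as the text says they are ignored.

```python
import struct


def split_messages(buffer: bytes):
    """Return (messages, remainder) for a buffer of length-prefixed messages."""
    messages = []
    pos = 0
    while len(buffer) - pos >= 4:
        # Each message starts with a four-byte big-endian length.
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break  # incomplete message: wait for more data
        body = buffer[pos + 4:pos + 4 + length]
        pos += 4 + length
        if length == 0:
            continue  # keepalive, ignored
        messages.append(body)
    return messages, buffer[pos:]
```
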
225+
226+
## peer messages
227+
228+
All non-keepalive messages start with a single byte which gives their type.
229+
230+
The possible values are:
231+
232+
- 0 - choke
233+
- 1 - unchoke
234+
- 2 - interested
235+
- 3 - not interested
236+
- 4 - have
237+
- 5 - bitfield
238+
- 6 - request
239+
- 7 - piece
240+
- 8 - cancel
241+
242+
'choke', 'unchoke', 'interested', and 'not interested' have no payload.
243+
244+
'bitfield' is only ever sent as the first message. Its payload is a bitfield with each index that downloader has sent
245+
set to one and the rest set to zero. Downloaders which don't have anything yet may skip the 'bitfield' message. The
246+
first byte of the bitfield corresponds to indices 0 - 7 from high bit to low bit, respectively. The next one 8-15, etc.
247+
Spare bits at the end are set to zero.
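
The bit ordering can be made concrete with a small sketch. The helper names are illustrative; `num_pieces` is the number of pieces in the torrent, which both sides know from the metainfo file.

```python
def encode_bitfield(have, num_pieces: int) -> bytes:
    """Pack a set of piece indices into a bitfield payload."""
    field = bytearray((num_pieces + 7) // 8)  # spare bits stay zero
    for index in have:
        # Index 0 is the high bit of the first byte.
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)


def decode_bitfield(field: bytes, num_pieces: int) -> set:
    """Recover the set of piece indices from a bitfield payload."""
    return {i for i in range(num_pieces) if field[i // 8] & (0x80 >> (i % 8))}
```
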
248+
249+
The 'have' message's payload is a single number, the index which that downloader just completed and checked the hash of.
250+
251+
'request' messages contain an index, begin, and length. The last two are byte offsets. Length is generally a power of
252+
two unless it gets truncated by the end of the file. All current implementations use 2^14 (16 KiB), and close
253+
connections which request an amount greater than that.
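
As an illustration, a 'request' message on the wire is a four-byte length prefix, the type byte 6, and three four-byte big-endian integers. The sketch below also splits one piece into 2^14-byte requests; `request_message` and `block_requests` are hypothetical helper names, and `piece_size` is the actual size of the piece in question (smaller for a truncated last piece).

```python
import struct

REQUEST = 6
BLOCK_SIZE = 2 ** 14  # the request size the text says implementations use


def request_message(index: int, begin: int, length: int) -> bytes:
    """Length prefix (13) + type byte + index, begin, length."""
    return struct.pack(">IBIII", 13, REQUEST, index, begin, length)


def block_requests(index: int, piece_size: int):
    """Yield request messages that together cover one piece."""
    for begin in range(0, piece_size, BLOCK_SIZE):
        yield request_message(index, begin, min(BLOCK_SIZE, piece_size - begin))
```
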
254+
255+
'cancel' messages have the same payload as request messages. They are generally only sent towards the end of a download,
256+
during what's called 'endgame mode'. When a download is almost complete, there's a tendency for the last few pieces to
257+
all be downloaded off a single hosed modem line, taking a very long time. To make sure the last few pieces come in
258+
quickly, once requests for all pieces a given downloader doesn't have yet are currently pending, it sends requests for
259+
everything to everyone it's downloading from. To keep this from becoming horribly inefficient, it sends cancels to
260+
everyone else every time a piece arrives.
261+
262+
'piece' messages contain an index, begin, and piece. Note that they are correlated with request messages implicitly.
263+
It's possible for an unexpected piece to arrive if choke and unchoke messages are sent in quick succession and/or
264+
transfer is going very slowly.
265+
266+
Downloaders generally download pieces in random order, which does a reasonably good job of keeping them from having a
267+
strict subset or superset of the pieces of any of their peers.
268+
269+
Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at
270+
once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.
271+
272+
The choking algorithm described below is the currently deployed one. It is very important that all new algorithms work
273+
well both in a network consisting entirely of themselves and in a network consisting mostly of this one.
274+
275+
There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for
276+
good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to
277+
peers who let it download. Finally, it should try out unused connections once in a while to find out if they might be
278+
better than the currently used ones, known as optimistic unchoking.
279+
280+
The currently deployed choking algorithm avoids fibrillation by only changing who's choked once every ten seconds. It
281+
does reciprocation and number of uploads capping by unchoking the four peers which it has the best download rates from
282+
and are interested. Peers which have a better upload rate but aren't interested get unchoked, and if they become
283+
interested, the worst uploader gets choked. If a downloader has a complete file, it uses its upload rate rather than its
284+
download rate to decide who to unchoke.
285+
286+
For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of its upload rate (if
287+
interested, it counts as one of the four allowed downloaders). Which peer is optimistically unchoked rotates every 30
288+
seconds. To give them a decent chance of getting a complete piece to upload, new connections are three times as likely
289+
to start as the current optimistic unchoke as anywhere else in the rotation.
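
One possible reading of the selection step, run once per ten-second round, is sketched below. It is an interpretation, not the deployed code: `rates` (peer to measured rate), `interested` (set of interested peers) and `optimistic` (the current optimistic unchoke, or `None`) are data structures assumed for this example, and the timers and the 30-second rotation are left out.

```python
def choose_unchoked(rates, interested, optimistic=None, slots=4):
    """Return the set of peers to unchoke for this round.

    Peers are visited from best to worst rate. Uninterested peers met
    along the way are unchoked without using a slot, so that if one
    becomes interested it simply displaces the worst uploader.
    """
    unchoked = set()
    if optimistic is not None:
        unchoked.add(optimistic)
    # An interested optimistic unchoke counts as one of the slots.
    used = 1 if optimistic in interested else 0
    for peer in sorted(rates, key=rates.get, reverse=True):
        if peer == optimistic:
            continue
        if used >= slots:
            break
        unchoked.add(peer)
        if peer in interested:
            used += 1
    return unchoked
```

A downloader with the complete file would pass its upload rates to each peer as `rates` in place of download rates, following the rule stated above.
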
290+
291+
## Resources
292+
293+
The [BitTorrent Economics Paper](http://bittorrent.org/bittorrentecon.pdf) outlines some request and choking algorithms
294+
clients should implement for optimal performance.
295+
When developing a new implementation the Wireshark protocol analyzer and
296+
its [dissectors for bittorrent](https://wiki.wireshark.org/BitTorrent) can be useful to debug and compare with existing
297+
ones.
298+
299+
## Copyright
300+
301+
This document has been placed in the public domain.
