diff --git a/docs/assets/css/custom.css b/docs/assets/css/custom.css index 130e63e..92d72e7 100644 --- a/docs/assets/css/custom.css +++ b/docs/assets/css/custom.css @@ -73,28 +73,13 @@ html.dark .hextra-card:hover { /* ── Nav ─────────────────────────────────────────────────────── */ -/* Probe Results — primary nav item */ +/* Leaderboards — primary nav item */ a[href$="/probe-results"], a[href$="/probe-results/"] { font-weight: 700 !important; font-size: 1.1em !important; } -/* Separator before Glossary */ -a[href$="/docs"], -a[href$="/docs/"] { - margin-left: 12px !important; - padding-left: 14px !important; - border-left: 1px solid rgba(128, 128, 128, 0.3) !important; -} - -/* Separator before Add a Framework */ -a[href$="/add-a-framework"], -a[href$="/add-a-framework/"] { - margin-left: 12px !important; - padding-left: 14px !important; - border-left: 1px solid rgba(128, 128, 128, 0.3) !important; -} /* ── Code blocks: fix dark-mode readability ─────────────────── */ diff --git a/docs/content/_index.md b/docs/content/_index.md index 8b2ab29..0ba785d 100644 --- a/docs/content/_index.md +++ b/docs/content/_index.md @@ -40,6 +40,7 @@ Http11Probe sends a suite of crafted HTTP requests to each server and checks whe {{< card link="compliance" title="Compliance" subtitle="RFC 9110/9112 protocol requirements — line endings, request-line format, header syntax, Host validation, Content-Length parsing." icon="check-circle" >}} {{< card link="smuggling" title="Smuggling" subtitle="CL/TE ambiguity, duplicate Content-Length, obfuscated Transfer-Encoding, pipeline injection vectors." icon="shield-exclamation" >}} {{< card link="malformed-input" title="Robustness" subtitle="Binary garbage, oversized fields, too many headers, control characters, integer overflow, incomplete requests." 
icon="lightning-bolt" >}} + {{< card link="normalization" title="Normalization" subtitle="Header normalization behavior — underscore-to-hyphen, space before colon, tab in name, case folding on Transfer-Encoding." icon="adjustments" >}} {{< /cards >}}
diff --git a/docs/content/add-a-framework/_index.md b/docs/content/add-a-framework/_index.md index d79be2b..cd4ca6f 100644 --- a/docs/content/add-a-framework/_index.md +++ b/docs/content/add-a-framework/_index.md @@ -1,13 +1,39 @@ --- title: Add a Framework -toc: false +toc: true --- Http11Probe is designed so anyone can contribute their HTTP server and get compliance results without touching the test infrastructure. +## Required Endpoints + +Your server must listen on **port 8080** and implement three endpoints: + +| Endpoint | Method | Behavior | +|----------|--------|----------| +| `/` | `GET` | Return `200 OK`. This is the baseline reachability check. | +| `/` | `POST` | Read the full request body and return it in the response. Used by body handling and smuggling tests. | +| `/echo` | `POST` | Return all received request headers in the response body, one per line as `Name: Value`. Used by normalization tests. | + +### Why `/echo`? + +Normalization tests need to see how the server internally represents headers after parsing. For example, if the test sends `Content_Length: 99`, the `/echo` endpoint reveals whether the server normalized the underscore to a hyphen, preserved it as-is, or dropped it entirely. Without this endpoint, normalization tests cannot run. + +### Response format for `/echo` + +The response body should contain one header per line in `Name: Value` format: + +``` +Host: localhost:8080 +Content-Length: 11 +Content-Type: text/plain +``` + +The order does not matter. Include all headers the server received (framework-added headers like `Connection` are fine). + ## Steps -**1. Write a minimal server** — Create a directory under `src/Servers/YourServer/` with a simple HTTP server that listens on **port 8080** and returns `200 OK` on `GET /`. Any language, any framework. +**1. Create a server directory** — Add a directory under `src/Servers/YourServer/` with your server source code implementing the three endpoints above. **2. 
Add a Dockerfile** — Build and run your server. It will run with `--network host`. @@ -17,7 +43,7 @@ Http11Probe is designed so anyone can contribute their HTTP server and get compl {"name": "Your Server"} ``` -That's it. Open a PR and the probe runs automatically. +Open a PR and the probe runs automatically. ## How It Works @@ -26,14 +52,14 @@ The CI pipeline scans `src/Servers/*/probe.json` to discover servers. For each o 1. Builds the Docker image from the Dockerfile in that directory 2. Runs the container on port 8080 with `--network host` 3. Waits for the server to become ready -4. Runs the full compliance probe suite +4. Runs the full probe suite (compliance, smuggling, malformed input, normalization) 5. Stops the container and moves to the next server No workflow edits, no port allocation, no config files. ## Example -Here's the full Flask server as a reference: +Here's the Flask server as a reference: **`src/Servers/FlaskServer/probe.json`** ```json @@ -49,4 +75,36 @@ COPY src/Servers/FlaskServer/app.py . ENTRYPOINT ["python3", "app.py", "8080"] ``` -**`src/Servers/FlaskServer/app.py`** — a minimal Flask app that reads the port from `sys.argv` and returns `200 OK` on `GET /`. 
+**`src/Servers/FlaskServer/app.py`** +```python +import sys +from flask import Flask, request +from werkzeug.routing import Rule + +app = Flask(__name__) + +@app.route('/echo', methods=['GET','POST','PUT','DELETE','PATCH','OPTIONS','HEAD']) +def echo(): + lines = [] + for name, value in request.headers: + lines.append(f"{name}: {value}") + return '\n'.join(lines) + '\n', 200, {'Content-Type': 'text/plain'} + +app.url_map.add(Rule('/', defaults={"path": ""}, endpoint='catch_all')) +app.url_map.add(Rule('/<path:path>', endpoint='catch_all')) + +@app.endpoint('catch_all') +def catch_all(path): + if request.method == 'POST': + return request.get_data(as_text=True) + return "OK" + +if __name__ == "__main__": + port = int(sys.argv[1]) if len(sys.argv) > 1 else 8080 + app.run(host="0.0.0.0", port=port) +``` + +The key parts: +- **`/echo`** — echoes all received headers back as plain text. +- **`POST /`** — reads and returns the request body (needed for body and smuggling tests). +- **`GET /`** (catch-all) — returns `"OK"` with `200`. 
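As a rough illustration of how the `/echo` output could be consumed, the sketch below parses a `Name: Value` body and reports how a probe header came back. The helper names `parse_echo_body` and `classify_header` are hypothetical and not part of Http11Probe; the real test harness may work differently.

```python
def parse_echo_body(body: str) -> list[tuple[str, str]]:
    """Split an /echo response body into (name, value) pairs, keeping names as received."""
    pairs = []
    for line in body.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        pairs.append((name, value.strip()))
    return pairs


def classify_header(pairs: list[tuple[str, str]], sent_name: str) -> str:
    """Report whether a header sent with an underscore was preserved, normalized, or dropped."""
    names = {name.lower() for name, _ in pairs}
    if sent_name.lower() in names:
        return "preserved"
    if sent_name.replace("_", "-").lower() in names:
        return "normalized"
    return "dropped"
```

For example, if the probe sends `Content_Length: 99` and the echoed body lists `Content-Length`, `classify_header` reports `"normalized"`.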
diff --git a/docs/content/docs/body/_index.md b/docs/content/docs/body/_index.md index af905fb..9ea3a81 100644 --- a/docs/content/docs/body/_index.md +++ b/docs/content/docs/body/_index.md @@ -1,7 +1,7 @@ --- title: Body Handling description: "Body Handling — Http11Probe documentation" -weight: 6 +weight: 9 sidebar: open: false --- diff --git a/docs/content/docs/content-length/_index.md b/docs/content/docs/content-length/_index.md index 95b81f7..a8bc9c7 100644 --- a/docs/content/docs/content-length/_index.md +++ b/docs/content/docs/content-length/_index.md @@ -1,7 +1,7 @@ --- title: Content-Length description: "Content-Length — Http11Probe documentation" -weight: 6 +weight: 8 sidebar: open: false --- diff --git a/docs/content/docs/headers/_index.md b/docs/content/docs/headers/_index.md index 00ca2c8..78de358 100644 --- a/docs/content/docs/headers/_index.md +++ b/docs/content/docs/headers/_index.md @@ -1,7 +1,7 @@ --- title: Header Syntax description: "Header Syntax — Http11Probe documentation" -weight: 4 +weight: 6 sidebar: open: false --- diff --git a/docs/content/docs/host-header/_index.md b/docs/content/docs/host-header/_index.md index 9ba0e91..bc08b02 100644 --- a/docs/content/docs/host-header/_index.md +++ b/docs/content/docs/host-header/_index.md @@ -1,7 +1,7 @@ --- title: Host Header description: "Host Header — Http11Probe documentation" -weight: 5 +weight: 7 sidebar: open: false --- diff --git a/docs/content/docs/http-overview/_index.md b/docs/content/docs/http-overview/_index.md new file mode 100644 index 0000000..513551f --- /dev/null +++ b/docs/content/docs/http-overview/_index.md @@ -0,0 +1,19 @@ +--- +title: Understanding HTTP +description: "What HTTP is, how HTTP/1.1 works in depth, its history from 0.9 to 3, and alternatives." +weight: 2 +sidebar: + open: false +--- + +A comprehensive guide to HTTP — what it is, why it was designed the way it was, and how HTTP/1.1 works at the wire level. Start here before diving into the individual test categories. 
+ +{{< cards >}} + {{< card link="what-is-http" title="What is HTTP?" subtitle="Application-layer request/response protocol, client-server model, stateless design, and core design goals." icon="question-mark-circle" >}} + {{< card link="message-syntax" title="Message Syntax" subtitle="Request and response structure, methods (GET, POST, PUT...), status codes (1xx–5xx), and the request-line grammar." icon="code" >}} + {{< card link="headers" title="Headers" subtitle="Header structure, common request and response headers, the Host header, and why it's the only required header." icon="document-text" >}} + {{< card link="connections" title="Connections" subtitle="Persistent connections, keep-alive, pipelining, head-of-line blocking, Upgrade, and 100 Continue." icon="switch-horizontal" >}} + {{< card link="body-and-framing" title="Body and Framing" subtitle="Content-Length, chunked transfer encoding, trailers, and why CL+TE conflicts cause request smuggling." icon="document-download" >}} + {{< card link="caching-and-negotiation" title="Caching and Negotiation" subtitle="Content negotiation with Accept headers, Cache-Control, ETags, conditional requests, and Vary." icon="refresh" >}} + {{< card link="history-and-future" title="History and Future" subtitle="HTTP/0.9 to HTTP/3, the current IETF work, alternatives to HTTP, and learning resources." icon="clock" >}} +{{< /cards >}} diff --git a/docs/content/docs/http-overview/body-and-framing.md b/docs/content/docs/http-overview/body-and-framing.md new file mode 100644 index 0000000..be44e76 --- /dev/null +++ b/docs/content/docs/http-overview/body-and-framing.md @@ -0,0 +1,179 @@ +--- +title: Body and Framing +description: "Content-Length, chunked transfer encoding, trailers, and why CL+TE conflicts cause request smuggling." +weight: 5 +--- + +HTTP/1.1 messages optionally carry a **message body** after the header section. 
The critical question for any parser is: **where does the body end?** Getting this wrong is the root cause of HTTP request smuggling. + +## When Is a Body Present? + +- **Requests** — a body is present if `Content-Length` or `Transfer-Encoding` is set. `GET`, `HEAD`, `DELETE`, and `OPTIONS` typically have no body (though the spec doesn't forbid it). +- **Responses** — all responses to `HEAD` requests and all `1xx`, `204`, and `304` responses have no body. Everything else may have a body. + +## Content-Length + +The `Content-Length` header declares the exact size of the body in bytes as a decimal integer: + +```http +POST /data HTTP/1.1 +Host: example.com +Content-Type: text/plain +Content-Length: 13 + +Hello, World! +``` + +The parser reads exactly 13 bytes after the empty line, then the next bytes are the start of the next message (on a persistent connection) or the connection ends. + +### Rules + +- The value **MUST** be a non-negative decimal integer. +- **Leading zeros** — `Content-Length: 007` matches the `1*DIGIT` grammar of RFC 9110 §8.6, but lenient parsers do not all handle it the same way, so a strict server may choose to reject it. +- **No signs** — `Content-Length: +13` or `Content-Length: -1` are invalid. +- **No whitespace** within the value — `Content-Length: 1 3` is invalid. +- If `Content-Length` **doesn't match** the actual body size, the message is malformed. The server SHOULD close the connection. +- **Multiple `Content-Length` headers** are allowed only if all values are identical. If they differ, the message is malformed and MUST be rejected. + +### Why Strictness Matters + +Lenient parsing of `Content-Length` is a common source of vulnerabilities: + +- `Content-Length: 0x0d` — if parsed as hex, this is 13 bytes. If parsed as decimal, it's invalid. A parser mismatch between front-end and back-end enables smuggling. +- `Content-Length: 13, 14` — a list of two differing values. One parser might take the first, another the last. 
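The rules above can be sketched as a small strict parser. This is a minimal illustration, not Http11Probe's or any server's actual implementation; the function names are hypothetical, and rejecting leading zeros is a deliberate strictness choice of this sketch.

```python
def parse_content_length(value: str) -> int:
    """Parse one Content-Length value strictly; raise ValueError on anything unusual."""
    # ASCII digits only: rejects signs, whitespace, hex prefixes, and comma lists.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"invalid Content-Length: {value!r}")
    # Reject leading zeros such as "007" (a strictness choice in this sketch).
    if len(value) > 1 and value.startswith("0"):
        raise ValueError(f"leading zeros in Content-Length: {value!r}")
    return int(value)


def resolve_content_length(values: list[str]) -> int:
    """Combine repeated Content-Length headers; every value must be identical."""
    if not values:
        raise ValueError("no Content-Length header present")
    lengths = {parse_content_length(v) for v in values}
    if len(lengths) != 1:
        raise ValueError(f"conflicting Content-Length values: {values!r}")
    return lengths.pop()
```

Calling Python's `int()` directly would be too lenient here, since it accepts input such as `"+13"` and `" 13 "`, which is why the digits are validated first.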
+ +## Chunked Transfer Encoding + +When the total body size is unknown at the time headers are sent (streaming, server-generated content, compression), HTTP/1.1 uses **chunked transfer encoding**. + +### Format + +``` +chunk-size (hex) CRLF +chunk-data CRLF +... +0 CRLF +[ trailer-section ] +CRLF +``` + +Each chunk starts with the chunk size in hexadecimal, followed by CRLF, then exactly that many bytes of data, followed by CRLF. A zero-length chunk signals the end of the body. + +### Full Example + +```http +HTTP/1.1 200 OK +Transfer-Encoding: chunked + +4\r\n +Wiki\r\n +7\r\n +pedia i\r\n +9\r\n +n chunks.\r\n +0\r\n +\r\n +``` + +Decoded body: `Wikipedia in chunks.` + +### Chunk Extensions + +A chunk-size may be followed by semicolon-separated extensions: + +``` +a;ext-name=ext-value\r\n +0123456789\r\n +``` + +Most servers and proxies **ignore** chunk extensions. They exist for potential use cases like per-chunk checksums or metadata, but are rarely used in practice. Some security tools test whether servers handle unexpected extensions safely. + +### Trailers + +After the final zero-length chunk, **trailer fields** may appear — headers sent after the body: + +```http +HTTP/1.1 200 OK +Transfer-Encoding: chunked +Trailer: Checksum + +4\r\n +data\r\n +0\r\n +Checksum: abc123\r\n +\r\n +``` + +Trailers are useful for: +- **Checksums/signatures** — computed as the body streams. +- **Processing status** — whether the server completed successfully. +- **Metadata** — anything that can't be determined until after the body is generated. + +The `Trailer` header in the response declares which trailer fields to expect (though this is advisory, not enforced). + +### Rules + +- Chunk sizes **MUST** be hexadecimal, case-insensitive (`a` and `A` are both valid). +- A zero-length chunk **MUST** be present to terminate the body. +- After the zero-length chunk, the trailer section and final CRLF complete the message. 
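The chunk format and rules above can be sketched as a small decoder. This is a minimal illustration that assumes the whole encoded body is already in memory; `decode_chunked` is a hypothetical helper, and a real server would decode incrementally and enforce size limits.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_chunked(data: bytes) -> tuple[bytes, dict[str, str]]:
    """Decode a complete chunked body and return (body, trailers)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)  # raises ValueError if CRLF is missing
        size_field = data[pos:eol].split(b";", 1)[0]  # drop any chunk extensions
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            raise ValueError(f"invalid chunk size: {size_field!r}")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # the zero-length chunk ends the body
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data is truncated or not followed by CRLF")
        body += chunk
        pos += size + 2

    trailers = {}
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break  # the empty line closes the trailer section
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii")] = value.strip().decode("ascii")
    return bytes(body), trailers
```

Each declared size has to match the number of data bytes exactly: the nine-byte chunk `n chunks.` needs size `9`, and a larger declared size makes the decoder fail on the missing CRLF instead of guessing.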
+ +## Content-Length vs Transfer-Encoding + +A sender **MUST NOT** send both `Content-Length` and `Transfer-Encoding` in the same message. + +RFC 9112 §6.3 is explicit about what a recipient does when it happens anyway: + +> If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling or response splitting and **ought to be handled as an error**. + +### The Request Smuggling Problem + +This ambiguity is the **root cause of HTTP request smuggling**. Consider a message with both headers: + +```http +POST / HTTP/1.1 +Host: example.com +Content-Length: 6 +Transfer-Encoding: chunked + +0\r\n +\r\n +GPOST +``` + +- A parser that uses **Transfer-Encoding** sees a zero-length chunk → body ends immediately. The remaining bytes (`GPOST`) are the start of the next request. +- A parser that uses **Content-Length** reads 6 bytes (`0\r\n\r\nG`) as the body. `POST` becomes part of the next request with a different method. + +If a front-end proxy uses one interpretation and a back-end server uses another, the attacker controls where one request ends and the next begins. This can: +- **Bypass access controls** — smuggle a request to an internal endpoint. +- **Poison caches** — make the cache store an attacker-controlled response for a victim's URL. +- **Hijack connections** — capture another user's request. + +### How Servers Should Handle It + +Strict servers should: +1. **Reject** messages with both `Content-Length` and `Transfer-Encoding` with a 400 response. +2. If not rejecting, **always prioritize `Transfer-Encoding`** and ignore `Content-Length`. +3. **Never trust `Content-Length`** when `Transfer-Encoding` is present. + +This is one of the most critical compliance checks that Http11Probe performs. 
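To make the disagreement concrete, the sketch below frames the smuggling example from this section twice, once by `Content-Length` and once by chunked encoding, and returns the bytes each parser would treat as the start of the next request. The function names are hypothetical and the parsing is deliberately simplified.

```python
RAW = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 6\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
    b"GPOST"
)


def split_message(raw: bytes) -> tuple[dict[bytes, bytes], bytes]:
    """Return (headers keyed by lowercased name, bytes after the header section)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request-line
        name, _, value = line.partition(b":")
        headers[name.lower()] = value.strip()
    return headers, rest


def leftover_by_content_length(raw: bytes) -> bytes:
    """Bytes left on the connection for a parser that trusts Content-Length."""
    headers, rest = split_message(raw)
    return rest[int(headers[b"content-length"]):]


def leftover_by_chunked(raw: bytes) -> bytes:
    """Bytes left on the connection for a parser that trusts Transfer-Encoding."""
    _, rest = split_message(raw)
    pos = 0
    while True:
        eol = rest.index(b"\r\n", pos)
        size = int(rest[pos:eol].split(b";", 1)[0], 16)
        pos = eol + 2
        if size == 0:
            break
        pos += size + 2  # skip the chunk data and its CRLF
    while True:  # skip the trailer section, up to and including the empty line
        eol = rest.index(b"\r\n", pos)
        empty = eol == pos
        pos = eol + 2
        if empty:
            break
    return rest[pos:]
```

The two functions disagree on the same bytes: one leaves `POST` on the connection and the other leaves `GPOST`, which is the desynchronization an attacker relies on.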
+ +## Transfer-Encoding Obfuscation + +Attackers may try to hide `Transfer-Encoding` from one parser while making another recognize it: + +```http +Transfer-Encoding: chunked +Transfer-Encoding : chunked +Transfer-Encoding: xchunked +Transfer-Encoding: chunked\r\n (extra space) +Transfer-Encoding: + chunked +``` + +Each of these variants exploits differences in how parsers handle: +- Whitespace before the colon (forbidden by RFC 9112 §5.1). +- Unknown transfer coding names. +- Obs-fold (deprecated line folding). +- Leading/trailing whitespace in the value. + +Strict, RFC-compliant parsing eliminates these attack surfaces. diff --git a/docs/content/docs/http-overview/caching-and-negotiation.md b/docs/content/docs/http-overview/caching-and-negotiation.md new file mode 100644 index 0000000..0cc51a2 --- /dev/null +++ b/docs/content/docs/http-overview/caching-and-negotiation.md @@ -0,0 +1,195 @@ +--- +title: Caching and Negotiation +description: "Content negotiation with Accept headers, Cache-Control, ETags, conditional requests, and Vary." +weight: 6 +--- + +HTTP/1.1 includes built-in mechanisms for content negotiation and caching. These features reduce bandwidth, latency, and server load without requiring application-level changes. + +## Content Negotiation + +Content negotiation lets the client and server agree on the best **representation** of a resource. A single URL can serve different formats, languages, or encodings depending on the client's capabilities and preferences. + +### Proactive (Server-Driven) Negotiation + +The client sends preferences in `Accept*` headers, and the server chooses the best match: + +```http +GET /document HTTP/1.1 +Host: example.com +Accept: text/html, application/json;q=0.9 +Accept-Language: en-US, pt;q=0.8 +Accept-Encoding: gzip, br +``` + +#### Quality Values + +The `q` parameter (quality value, 0.000–1.000) indicates preference weight: + +- `text/html` — no `q` value means `q=1.0` (highest preference). 
+- `application/json;q=0.9` — acceptable, but HTML is preferred. +- `pt;q=0.8` — Portuguese is acceptable, but English is preferred. + +The server picks the best match and indicates what it chose via `Content-Type`, `Content-Language`, and `Content-Encoding` response headers. + +#### Accept Header Negotiation + +| Accept Header | What It Negotiates | +|---------------|-------------------| +| `Accept` | Media type (e.g., `text/html`, `application/json`, `image/webp`). | +| `Accept-Language` | Natural language (e.g., `en-US`, `pt-BR`, `ja`). | +| `Accept-Encoding` | Compression algorithm (e.g., `gzip`, `deflate`, `br`, `zstd`). | +| `Accept-Charset` | Character encoding (largely obsolete — UTF-8 is near-universal). | + +#### Wildcard Matching + +- `*/*` — accept any media type. +- `text/*` — accept any text subtype. +- `*` in `Accept-Encoding` — accept any encoding. + +### Reactive (Agent-Driven) Negotiation + +Instead of guessing, the server tells the client what's available: + +- **`300 Multiple Choices`** — the server lists available representations and the client picks one. +- **`406 Not Acceptable`** — no representation matches the client's preferences. + +Reactive negotiation is less common because it requires an extra round-trip. + +## Caching + +HTTP/1.1 has a sophisticated caching model defined in RFC 9111. Caches can exist at multiple layers: + +- **Browser cache** — private, per-user cache in the client. +- **Proxy cache** — shared cache at a forward proxy or CDN edge node. +- **Gateway/reverse-proxy cache** — shared cache at the origin's front door (e.g., Varnish, Nginx). + +### Cache-Control + +The `Cache-Control` header is the primary mechanism for controlling caching behavior: + +#### Request Directives + +| Directive | Meaning | +|-----------|---------| +| `no-cache` | The cache must revalidate with the origin before using a stored response. | +| `no-store` | The cache MUST NOT store any part of the request or response. 
| +| `max-age=N` | Accept a cached response that is at most N seconds old. | +| `max-stale[=N]` | Accept a response that has been stale for up to N seconds. | +| `min-fresh=N` | Require the response to be fresh for at least N more seconds. | +| `only-if-cached` | Only return a cached response; don't contact the origin. Return `504` if nothing is cached. | + +#### Response Directives + +| Directive | Meaning | +|-----------|---------| +| `max-age=N` | The response is fresh for N seconds from the time it was generated. | +| `s-maxage=N` | Like `max-age`, but only applies to shared caches (CDNs, proxies). Overrides `max-age`. | +| `no-cache` | The response may be stored but MUST be revalidated before each use. | +| `no-store` | The response MUST NOT be stored by any cache. | +| `private` | The response is intended for a single user. Shared caches MUST NOT store it. | +| `public` | The response may be stored by any cache, even if it would normally be non-cacheable. | +| `must-revalidate` | Once stale, the cache MUST revalidate before using. MUST NOT serve stale on error. | +| `immutable` | The response body will not change. Prevents revalidation even on user refresh. | +| `stale-while-revalidate=N` | Serve stale for up to N seconds while revalidating in the background. | + +### Conditional Requests + +Conditional requests let a cache check whether its stored response is still valid without downloading the full body again. + +#### ETag / If-None-Match + +1. Server sends a response with an `ETag`: + +```http +HTTP/1.1 200 OK +ETag: "abc123" +Content-Length: 5000 + +...body... +``` + +2. Client stores the response. On the next request, it sends the ETag back: + +```http +GET /resource HTTP/1.1 +Host: example.com +If-None-Match: "abc123" +``` + +3. If the resource hasn't changed, the server responds with no body: + +```http +HTTP/1.1 304 Not Modified +ETag: "abc123" +``` + +ETags can be **strong** (`"abc123"`) or **weak** (`W/"abc123"`). 
Strong ETags guarantee byte-for-byte identity. Weak ETags indicate semantic equivalence — the content is "close enough" that a cached version is acceptable. + +#### Last-Modified / If-Modified-Since + +A timestamp-based alternative to ETags: + +1. Server sends `Last-Modified`: + +```http +HTTP/1.1 200 OK +Last-Modified: Mon, 21 Oct 2024 07:28:00 GMT +``` + +2. Client sends `If-Modified-Since`: + +```http +GET /resource HTTP/1.1 +If-Modified-Since: Mon, 21 Oct 2024 07:28:00 GMT +``` + +3. If unmodified, the server responds with `304 Not Modified`. + +ETags are more precise (`Last-Modified` has one-second resolution, so it can miss a resource that changes twice within the same second), but `Last-Modified` is simpler and works well for static files. + +### Vary + +The `Vary` header tells caches which **request headers** affect the response. Without `Vary`, a cache might serve a gzip-compressed response to a client that doesn't support gzip. + +```http +HTTP/1.1 200 OK +Content-Encoding: gzip +Vary: Accept-Encoding +``` + +This tells caches: "the response depends on the `Accept-Encoding` request header." The cache must store separate copies for each unique `Accept-Encoding` value. + +Common `Vary` values: +- `Vary: Accept-Encoding` — different content encodings (gzip, br, uncompressed). +- `Vary: Accept-Language` — different language versions. +- `Vary: Accept` — different media types (HTML vs JSON). +- `Vary: Cookie` — personalized content (effectively disables shared caching). +- `Vary: *` — every request is unique; never serve from cache. + +### Age + +The `Age` header indicates how many seconds a response has been in a cache: + +```http +HTTP/1.1 200 OK +Cache-Control: max-age=3600 +Age: 600 +``` + +This response has been cached for 600 seconds and has 3000 seconds of freshness remaining. 
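The freshness arithmetic in the `Age` example can be written out as a short sketch. It only considers the explicit `max-age` and `s-maxage` directives described earlier and ignores `Expires` and heuristic freshness; the function names are hypothetical.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def remaining_freshness(cache_control: str, age: int, shared_cache: bool = False) -> int:
    """Seconds of freshness left; zero or negative means the response is stale."""
    directives = parse_cache_control(cache_control)
    lifetime = None
    if shared_cache and directives.get("s-maxage") is not None:
        lifetime = directives["s-maxage"]  # s-maxage overrides max-age for shared caches
    elif directives.get("max-age") is not None:
        lifetime = directives["max-age"]
    if lifetime is None:
        raise ValueError("no explicit freshness lifetime in Cache-Control")
    return int(lifetime) - age
```

With `Cache-Control: max-age=3600` and `Age: 600`, `remaining_freshness("max-age=3600", 600)` gives 3000, matching the example.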
+ +### Caching Flow Summary + +``` +Client sends request + ↓ +Cache checks for stored response + ├── No stored response → forward to origin → store response → return + ├── Fresh stored response → return immediately (Age incremented) + └── Stale stored response + ├── must-revalidate → conditional request to origin + │ ├── 304 → update freshness, return stored response + │ └── 200 → store new response, return + └── stale-while-revalidate → return stale, revalidate in background +``` diff --git a/docs/content/docs/http-overview/connections.md b/docs/content/docs/http-overview/connections.md new file mode 100644 index 0000000..abe401b --- /dev/null +++ b/docs/content/docs/http-overview/connections.md @@ -0,0 +1,196 @@ +--- +title: Connections +description: "TCP connection lifecycle, persistent connections, pipelining, Upgrade, and 100 Continue." +weight: 4 +--- + +HTTP/1.1 runs over TCP (or TLS over TCP for HTTPS). This page covers the connection lifecycle and the features HTTP/1.1 provides for efficient connection use. + +## TCP Connection Lifecycle + +A typical HTTP/1.1 exchange: + +1. **DNS resolution** — the client resolves the server's hostname to an IP address. May involve multiple queries (A, AAAA, CNAME). +2. **TCP handshake** — a three-way handshake (SYN → SYN-ACK → ACK) establishes the connection. Adds one round-trip of latency. +3. **TLS handshake** (if HTTPS) — client and server negotiate cipher suites, exchange certificates, and derive session keys. TLS 1.2 adds two round-trips; TLS 1.3 adds one (or zero with 0-RTT resumption). +4. **Request/response exchange** — the client sends one or more requests; the server responds in order. +5. **Connection close** — either side sends `Connection: close`, the TCP connection times out, or a TCP RST is sent. + +Before any HTTP data can flow, the overhead is at minimum one round-trip (TCP) and often two or three (TCP + TLS). This is why connection reuse matters. 
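The round-trip counts in the lifecycle above add up as follows. This is only an arithmetic sketch: `setup_round_trips` is a hypothetical helper, and it ignores DNS lookups and any optimization other than TLS 1.3 0-RTT.

```python
from typing import Optional


def setup_round_trips(tls_version: Optional[str] = None, zero_rtt: bool = False) -> int:
    """Round-trips spent before the first HTTP request can be sent on a new connection."""
    round_trips = 1  # TCP three-way handshake
    if tls_version == "1.2":
        round_trips += 2
    elif tls_version == "1.3":
        round_trips += 0 if zero_rtt else 1
    elif tls_version is not None:
        raise ValueError(f"unsupported TLS version: {tls_version!r}")
    return round_trips


def setup_latency_ms(rtt_ms: float, tls_version: Optional[str] = None) -> float:
    """Connection setup cost in milliseconds for a given network round-trip time."""
    return rtt_ms * setup_round_trips(tls_version)
```

On a link with a 50 ms round-trip time, a new TLS 1.2 connection costs about 150 ms before any HTTP bytes flow, and every reused connection avoids that cost.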
+ +## Persistent Connections (Keep-Alive) + +One of HTTP/1.1's most important improvements over 1.0 is **persistent connections**. + +### How It Changed + +- In **HTTP/1.0**, every request required a new TCP connection. Three-way handshake, slow-start, optional TLS negotiation — all repeated for every request. This added hundreds of milliseconds of latency per resource. +- In **HTTP/1.1**, connections are **persistent by default**. Multiple requests and responses can be sent sequentially over the same TCP connection without renegotiating. + +### Wire Example + +```http +GET /page1 HTTP/1.1 +Host: example.com + +HTTP/1.1 200 OK +Content-Length: 500 + +...body... + +GET /style.css HTTP/1.1 +Host: example.com + +HTTP/1.1 200 OK +Content-Length: 300 + +...body... + +GET /script.js HTTP/1.1 +Host: example.com +Connection: close + +HTTP/1.1 200 OK +Content-Length: 800 + +...body... +(TCP connection closed) +``` + +Three requests over one connection. The third request includes `Connection: close` to signal that the connection should be closed after the response. + +### Benefits + +- **Eliminates TCP handshake overhead** for subsequent requests. +- **TCP congestion window grows** over the life of the connection, improving throughput for later requests. +- **Reduces server resource usage** — fewer sockets, fewer TIME_WAIT entries, less memory. +- **Enables pipelining** (see below). + +### Closing a Connection + +Either side can close the connection: + +- **`Connection: close`** — the sender will close the connection after this message. The recipient should not send further requests on this connection. +- **Server timeout** — most servers close idle connections after a configurable period (e.g., 75 seconds by default in Nginx, 5 seconds by default in Apache). +- **TCP RST** — abrupt connection termination. Can happen if the server crashes, hits a resource limit, or detects a protocol error. 
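The persistence rules above reduce to a small decision function. This sketch uses a hypothetical helper name; the HTTP/1.0 branch reflects the opt-in `Connection: keep-alive` convention, which this page does not otherwise cover.

```python
def connection_persists(http_version: str, connection_header: str = "") -> bool:
    """Decide whether the connection stays open after the current message."""
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if "close" in tokens:
        return False  # either side asked to close
    if http_version == "HTTP/1.1":
        return True  # persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 stays open only when it opts in
```

Applied to the wire example, the first two requests return `True` and the third, which carries `Connection: close`, returns `False`.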
+ +## Pipelining + +HTTP/1.1 allows **pipelining** — sending multiple requests without waiting for each response: + +``` +Client → Server: GET /a GET /b GET /c +Server → Client: resp /a resp /b resp /c +``` + +Responses **MUST** be returned in the same order as the requests. This creates **head-of-line (HOL) blocking**: if `/a` is slow (e.g., a large database query), `/b` and `/c` are delayed even if they're ready. + +### Why Pipelining Failed + +In practice, pipelining is rarely used: + +- **HOL blocking** negates most latency benefits. +- **Buggy intermediaries** — many proxies and load balancers don't handle pipelined requests correctly, sometimes sending responses out of order or dropping requests. +- **Error recovery is complex** — if the connection drops mid-pipeline, the client doesn't know which requests were processed. +- **Browsers never enabled it** — no major browser ships with pipelining on by default. + +HTTP/2's **multiplexing** solves this by allowing interleaved responses on independent streams. + +## Connection Management Headers + +### `Connection` + +The `Connection` header serves two purposes: + +1. **Signaling connection close** — `Connection: close` tells the other side the connection will be closed after this message. +2. **Listing hop-by-hop headers** — any headers listed in `Connection` are hop-by-hop and MUST be removed by proxies before forwarding. For example, `Connection: Keep-Alive, X-Custom` means both `Keep-Alive` and `X-Custom` are consumed by the next hop. + +### `Keep-Alive` + +The `Keep-Alive` header is informational and can suggest parameters: + +```http +Keep-Alive: timeout=5, max=100 +``` + +- `timeout` — how many seconds the server will keep the idle connection open. +- `max` — maximum number of requests the server will accept on this connection. + +These values are **not binding** — either side can close at any time. 
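A `Keep-Alive` value such as the one above can be read with a few lines of code. This is a sketch with a hypothetical helper name; it keeps only parameters with integer values and silently skips anything else.

```python
def parse_keep_alive(value: str) -> dict[str, int]:
    """Parse a Keep-Alive header value into {parameter: integer}."""
    params = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        arg = arg.strip()
        if sep and arg.isascii() and arg.isdigit():
            params[name.strip().lower()] = int(arg)
    return params
```

Because the values are advisory, a client would normally treat `timeout` as an upper bound on how long it may leave the connection idle, not as a guarantee.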
+ +## Protocol Upgrade + +The `Upgrade` header allows switching from HTTP/1.1 to a different protocol on the same connection. + +### Mechanism + +1. Client sends a request with `Upgrade` and `Connection: Upgrade`: + +```http +GET /chat HTTP/1.1 +Host: example.com +Upgrade: websocket +Connection: Upgrade +Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== +Sec-WebSocket-Version: 13 +``` + +2. If the server agrees, it responds with `101 Switching Protocols`: + +```http +HTTP/1.1 101 Switching Protocols +Upgrade: websocket +Connection: Upgrade +Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= +``` + +3. From this point, the connection speaks the new protocol (WebSocket in this example). + +### Common Upgrades + +| Target Protocol | Usage | +|----------------|-------| +| WebSocket | Full-duplex communication for real-time applications. | +| h2c | HTTP/2 over cleartext (no TLS). Rarely used — most HTTP/2 uses ALPN during TLS. | +| TLS/1.0 | Historical — upgrading an HTTP connection to HTTPS (largely replaced by direct HTTPS). | + +## 100 Continue + +The `Expect: 100-continue` mechanism prevents clients from sending large request bodies that the server will reject. + +### Flow + +1. Client sends headers with `Expect: 100-continue` but **withholds the body**: + +```http +POST /upload HTTP/1.1 +Host: example.com +Content-Length: 52428800 +Expect: 100-continue +``` + +2. Server checks the headers (authentication, content-length limits, etc.): + - If acceptable: responds with `100 Continue` — the client then sends the body. + - If not acceptable: responds with a 4xx error (e.g., `413 Content Too Large`) — the client never sends the 50MB body. + +```http +HTTP/1.1 100 Continue + +(client sends body) + +HTTP/1.1 200 OK +``` + +### Why It Matters + +Without `Expect: 100-continue`, a client uploading a large file would send the entire body before learning the server rejects it (wrong auth, too large, wrong content type). This wastes bandwidth and time. 
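The server-side decision in the `100 Continue` flow can be sketched as follows. The helper name and the size limit are assumptions for illustration; a real server would also check authentication, content type, and other headers before answering.

```python
from typing import Optional


def expect_continue_decision(headers: dict, max_body_bytes: int) -> Optional[int]:
    """Return 100 or 413 for a request that sent Expect: 100-continue, else None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("expect", "").strip().lower() != "100-continue":
        return None  # the client is not waiting for an interim response
    length = int(lowered.get("content-length", "0"))
    if length > max_body_bytes:
        return 413  # Content Too Large: the client never sends the body
    return 100  # Continue: the client may now send the body
```

With the upload request shown above and a hypothetical 10 MiB limit, the function returns `413`, so the 50 MiB body never crosses the network.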
+ +## Line Endings and Parsing Strictness + +RFC 9112 §2.2 defines strict rules for line endings in HTTP/1.1 messages: + +- All line endings **MUST** be CRLF (`\r\n`). +- Bare CR (`\r`) without a following LF is **not a valid line terminator**; a recipient MUST treat it as invalid or replace it with a space before processing. +- Bare LF (`\n`) without a preceding CR — the spec says a server **MAY** accept bare LF as a line terminator in the request-line and header fields, but this is a robustness concession, not a requirement. + +These rules exist to prevent parsing ambiguities. If a front-end proxy interprets line endings differently from a back-end server, an attacker can exploit the discrepancy for **request smuggling** or **header injection**. diff --git a/docs/content/docs/http-overview/headers.md b/docs/content/docs/http-overview/headers.md new file mode 100644 index 0000000..ce498dd --- /dev/null +++ b/docs/content/docs/http-overview/headers.md @@ -0,0 +1,101 @@ +--- +title: Headers +description: "HTTP header structure, common request and response headers, and the Host header requirement." +weight: 3 +--- + +Headers are the primary extension mechanism in HTTP. They carry metadata about the message, the resource, the connection, and the client/server. + +## Structure + +``` +field-name ":" OWS field-value OWS CRLF +``` + +- **field-name** is case-insensitive and MUST NOT contain whitespace or colons. It must be a valid `token` — one or more characters from ``!#$%&'*+-.^_`|~``, digits, and letters. +- **OWS** (optional whitespace) may appear between the colon and the value, and after the value. +- **No space before the colon** — RFC 9112 §5.1 forbids whitespace between the field-name and the colon. A server that receives it in a request **MUST** reject the message with 400; a proxy **MUST** strip it from a response before forwarding. +- Header field values can span multiple lines using **obs-fold** (obsolete line folding — a CRLF followed by at least one space or tab), but this is deprecated. 
Servers **MUST** either reject obs-fold with 400 or replace it with a single space before processing. + +## Header Categories + +HTTP headers fall into several categories based on their scope: + +| Category | Description | Examples | +|----------|-------------|---------| +| **Request headers** | Sent by the client to provide context about the request. | `Host`, `Accept`, `Authorization`, `User-Agent` | +| **Response headers** | Sent by the server to provide context about the response. | `Server`, `Set-Cookie`, `WWW-Authenticate` | +| **Representation headers** | Describe the body content in either direction. | `Content-Type`, `Content-Length`, `Content-Encoding` | +| **Hop-by-hop headers** | Consumed by the next intermediary, not forwarded. Listed in the `Connection` header. | `Connection`, `Transfer-Encoding`, `Keep-Alive`, `Upgrade` | +| **End-to-end headers** | Forwarded by intermediaries to the final recipient. | Everything not listed in `Connection`. | + +## Common Request Headers + +| Header | Purpose | +|--------|---------| +| `Host` | **Required** in HTTP/1.1. Identifies the target host and port. Enables virtual hosting. | +| `Content-Type` | Media type of the request body (e.g., `application/json`, `multipart/form-data`). | +| `Content-Length` | Size of the request body in bytes. Must be an exact decimal integer. | +| `Transfer-Encoding` | Body encoding (e.g., `chunked`). Mutually exclusive with `Content-Length` in practice. | +| `Accept` | Media types the client can handle (e.g., `text/html, application/json`). | +| `Accept-Encoding` | Compression algorithms the client supports (e.g., `gzip, deflate, br`). | +| `Accept-Language` | Preferred natural languages (e.g., `en-US, pt;q=0.8`). | +| `Authorization` | Credentials for authenticating the client (e.g., `Bearer `, `Basic `). | +| `User-Agent` | Identifies the client software and version. | +| `Connection` | Controls connection persistence (`keep-alive`, `close`) and lists hop-by-hop headers. 
| +| `Cookie` | Sends stored cookies to the server. | +| `If-None-Match` | Conditional request — send the resource only if the ETag doesn't match (for caching). | +| `If-Modified-Since` | Conditional request — send the resource only if modified after this timestamp. | +| `Expect` | Indicates expectations the server must meet (e.g., `100-continue`). | +| `Referer` | URL of the page that linked to the current request. | + +## Common Response Headers + +| Header | Purpose | +|--------|---------| +| `Content-Type` | Media type of the response body (e.g., `text/html; charset=utf-8`). | +| `Content-Length` | Size of the response body in bytes. | +| `Transfer-Encoding` | Body encoding applied to the response (e.g., `chunked`). | +| `Cache-Control` | Caching directives (e.g., `no-cache`, `max-age=3600`, `private`). | +| `ETag` | Opaque identifier for a specific version of the resource. Used for conditional requests. | +| `Last-Modified` | Timestamp of last modification. Used with `If-Modified-Since`. | +| `Set-Cookie` | Sends a cookie to the client for storage. | +| `Location` | URL to redirect to (used with 3xx and 201 status codes). | +| `Server` | Identifies the server software. | +| `WWW-Authenticate` | Defines the authentication scheme for 401 responses. | +| `Vary` | Lists request headers that affect the response (important for caching). | +| `Allow` | Lists permitted methods for the resource (required with 405 responses). | +| `Retry-After` | Suggests how long the client should wait before retrying (used with 429/503). | + +## The Host Header + +The `Host` header is the **only header that HTTP/1.1 requires** in every request. It was introduced to support **virtual hosting** — multiple websites served from the same IP address and port. + +### Why It's Required + +Before HTTP/1.1, each website needed its own IP address. The `Host` header allows a server to distinguish between `example.com` and `other.com` even when both resolve to the same IP. 
Without it, the server has no way to determine which virtual host the request is for. + +### Rules + +RFC 9112 §3.2 defines strict requirements: + +- A client **MUST** send a `Host` header in every HTTP/1.1 request. +- A server **MUST** respond with **400 Bad Request** if: + - The `Host` header is **missing**. + - There are **multiple** `Host` headers. + - The `Host` value is **invalid**. +- The `Host` value must match the URI authority (hostname and optional port). + +```http +GET / HTTP/1.1 +Host: example.com +``` + +```http +GET /api/data HTTP/1.1 +Host: api.example.com:8443 +``` + +### Host vs :authority + +In HTTP/2 and HTTP/3, the `Host` header is replaced by the `:authority` pseudo-header in the request. However, `Host` is still sent for backward compatibility with intermediaries. diff --git a/docs/content/docs/http-overview/history-and-future.md b/docs/content/docs/http-overview/history-and-future.md new file mode 100644 index 0000000..a198e27 --- /dev/null +++ b/docs/content/docs/http-overview/history-and-future.md @@ -0,0 +1,131 @@ +--- +title: History and Future +description: "HTTP's evolution from 0.9 to 3, the current IETF work, alternatives to HTTP, and learning resources." +weight: 7 +--- + +## History + +| Year | Version | Key Milestone | +|------|---------|---------------| +| 1991 | HTTP/0.9 | Tim Berners-Lee's original protocol. Single-line `GET` request, HTML-only response, no headers, no status codes. | +| 1996 | HTTP/1.0 (RFC 1945) | Added headers, status codes, content types, and `POST`/`HEAD` methods. One request per TCP connection. | +| 1997 | HTTP/1.1 (RFC 2068) | Persistent connections, `Host` header (virtual hosting), chunked encoding, content negotiation. | +| 1999 | HTTP/1.1 (RFC 2616) | Consolidated and revised specification. The reference for over a decade. | +| 2014 | HTTP/1.1 (RFC 7230–7235) | Split into six focused documents, clarified edge cases, obsoleted RFC 2616. | +| 2022 | HTTP (RFC 9110/9112) | Current standard. 
Separated semantics (9110) from message syntax (9112). Version-agnostic semantics. | + +### HTTP/0.9 (1991) + +The original protocol had no version number, no headers, and no status codes. A request was a single line: + +``` +GET /page.html +``` + +The server responded with raw HTML and closed the connection. That's it. No content type, no error handling, no metadata. + +### HTTP/1.0 (1996) + +HTTP/1.0 (RFC 1945) added the features we now consider essential: + +- **Headers** — both request and response headers for metadata. +- **Status codes** — `200 OK`, `404 Not Found`, `500 Internal Server Error`. +- **Content types** — the `Content-Type` header, enabling non-HTML responses. +- **New methods** — `POST` and `HEAD` alongside `GET`. + +The major limitation: **one request per TCP connection**. Loading a page with 20 images meant 20 separate TCP connections, each with handshake overhead. + +### HTTP/1.1 (1997–2022) + +HTTP/1.1 was a major leap that introduced: + +- **Persistent connections** — reuse TCP connections across multiple requests. +- **Host header** — required in every request, enabling virtual hosting. +- **Chunked transfer encoding** — stream responses of unknown size. +- **Content negotiation** — `Accept`, `Accept-Language`, `Accept-Encoding`. +- **Caching** — `Cache-Control`, `ETag`, conditional requests. +- **Range requests** — partial content delivery for resumable downloads. +- **Pipelining** — send multiple requests without waiting (though rarely used in practice). + +The specification was revised multiple times: +- **RFC 2068** (1997) — initial specification. +- **RFC 2616** (1999) — consolidated revision, the reference for 15+ years. +- **RFC 7230–7235** (2014) — split into six focused documents for clarity. +- **RFC 9110–9112** (2022) — current standard, separating semantics from wire format. + +## HTTP Today + +### HTTP/1.1 + +Still widely deployed and **the dominant protocol** for: +- Server-to-server communication behind load balancers. 
+- Reverse proxies and internal APIs. +- Environments where simplicity and debuggability matter. +- Legacy systems and embedded devices. + +Its text-based format makes it uniquely accessible for debugging — you can literally read the bytes on the wire. + +### HTTP/2 (2015, RFC 9113) + +HTTP/2 addressed HTTP/1.1's performance limitations: + +- **Binary framing** — messages are encoded in binary frames instead of text. More compact and less error-prone to parse. +- **Multiplexing** — multiple concurrent request/response exchanges on a single connection, eliminating head-of-line blocking at the HTTP layer. +- **Header compression (HPACK)** — compresses headers using a static table and dynamic indexing. Headers like `Host`, `Accept`, and `User-Agent` that repeat on every request are sent efficiently. +- **Server push** — the server can proactively send resources it knows the client will need. (Largely deprecated — Chrome removed support in 2022.) +- **Stream prioritization** — clients can indicate which resources are more important. + +HTTP/2 keeps the same semantics (methods, status codes, headers) as HTTP/1.1 — it only changes how messages are framed on the wire. Most HTTP/2 deployments use TLS (the `h2` protocol identifier negotiated via ALPN). + +### HTTP/3 (2022, RFC 9114) + +HTTP/3 replaces TCP with **QUIC**, a UDP-based transport: + +- **No TCP head-of-line blocking** — packet loss on one stream doesn't block others. In HTTP/2 over TCP, a single lost packet stalls all streams. +- **0-RTT connection setup** — QUIC combines the transport and TLS handshake into a single round-trip. Resumed connections can send data immediately (0-RTT). +- **Connection migration** — a QUIC connection survives network changes (e.g., switching from Wi-Fi to cellular) because it's identified by a connection ID, not a source IP+port tuple. +- **Built-in encryption** — TLS 1.3 is mandatory and integrated into the transport layer. 
+- **Header compression (QPACK)** — similar to HPACK but designed for QUIC's out-of-order delivery. + +## The Future + +Active work in the IETF HTTP Working Group includes: + +- **WebTransport** — bidirectional, multiplexed transport for web applications, built on HTTP/3. Enables use cases like game networking and live media that need both reliable and unreliable delivery. +- **HTTP Datagrams** (RFC 9297) — unreliable datagram delivery over HTTP connections. Enables latency-sensitive applications that can tolerate packet loss. +- **MASQUE proxying** — using HTTP CONNECT-UDP and CONNECT-IP for tunneling arbitrary IP and UDP traffic through HTTP proxies. Enables VPN-like functionality over HTTP infrastructure. +- **Resumable uploads** — standardizing the ability to pause and resume large file uploads (draft-ietf-httpbis-resumable-upload). +- Ongoing refinement of HTTP semantics, caching specifications, and security best practices. + +## Alternatives to HTTP + +HTTP is not the only application-layer protocol. Depending on the use case, other protocols may be a better fit: + +| Protocol | Transport | Use Case | +|----------|-----------|----------| +| **gRPC** | HTTP/2 | High-performance RPC with Protocol Buffers. Strongly typed contracts, streaming, deadlines. Common for microservice communication. | +| **WebSocket** | TCP (HTTP Upgrade) | Full-duplex, persistent connection. Real-time applications like chat, live dashboards, collaborative editing. | +| **MQTT** | TCP | Lightweight pub/sub messaging for IoT and constrained devices. Tiny packet overhead, QoS levels, retained messages. | +| **CoAP** | UDP | Constrained Application Protocol — REST-like semantics for low-power, lossy networks. Uses UDP with optional reliability. | +| **AMQP** | TCP | Advanced Message Queuing Protocol — reliable message brokering with routing, queuing, and transactions. (RabbitMQ, Azure Service Bus.) | +| **FTP** | TCP | File transfer protocol. 
Still used for legacy integrations, bulk file exchange, and some hosting workflows. | +| **SMTP** | TCP | Email delivery. Purpose-built for store-and-forward message delivery across mail servers. | + +## Learn More + +### Videos + +- [HTTP Crash Course & Explore](https://www.youtube.com/watch?v=iYM2zFP3Zn0) — Traversy Media +- [How HTTP Requests Work](https://www.youtube.com/watch?v=4_-KdOo4rGo) — LiveOverflow +- [HTTP/1 to HTTP/2 to HTTP/3](https://www.youtube.com/watch?v=a-sBfyiXysI) — Hussein Nasser + +### Documentation + +- [MDN: An overview of HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview) — beginner-friendly reference. +- [RFC 9110 — HTTP Semantics](https://www.rfc-editor.org/rfc/rfc9110) — the current specification for HTTP semantics. +- [RFC 9112 — HTTP/1.1](https://www.rfc-editor.org/rfc/rfc9112) — the current specification for HTTP/1.1 message syntax. +- [RFC 9113 — HTTP/2](https://www.rfc-editor.org/rfc/rfc9113) — the HTTP/2 specification. +- [RFC 9114 — HTTP/3](https://www.rfc-editor.org/rfc/rfc9114) — the HTTP/3 specification. +- [IETF HTTP Working Group](https://httpwg.org/) — active drafts, meeting notes, and mailing list. +- [High Performance Browser Networking](https://hpbn.co/) — Ilya Grigorik's free online book covering HTTP, TLS, and networking performance. diff --git a/docs/content/docs/http-overview/message-syntax.md b/docs/content/docs/http-overview/message-syntax.md new file mode 100644 index 0000000..185ea38 --- /dev/null +++ b/docs/content/docs/http-overview/message-syntax.md @@ -0,0 +1,157 @@ +--- +title: Message Syntax +description: "HTTP/1.1 request and response message structure, methods, and status codes." +weight: 2 +--- + +This page covers the wire-level structure of HTTP/1.1 messages as defined by **RFC 9112** (HTTP/1.1 Message Syntax and Routing). 
+ +## General Message Format + +Every HTTP/1.1 message — whether request or response — follows the same structure: + +``` +start-line CRLF +*( header-field CRLF ) +CRLF +[ message-body ] +``` + +The start-line is either a **request-line** or a **status-line**. Headers follow as `field-name: field-value` pairs, each terminated by CRLF. An empty line (bare CRLF) separates headers from the optional body. + +## Request Message + +``` +method SP request-target SP HTTP-version CRLF +*( field-name ":" OWS field-value OWS CRLF ) +CRLF +[ message-body ] +``` + +Example — a `POST` with a JSON body: + +```http +POST /api/users HTTP/1.1 +Host: example.com +Content-Type: application/json +Content-Length: 25 + +{"name":"Alice","age":30} +``` + +Key rules (RFC 9112 §3): +- Exactly **one SP** (space, `0x20`) between method, request-target, and HTTP-version. +- The request-target is usually an absolute path (`/index.html`) or an asterisk (`*`) for `OPTIONS`. +- The HTTP-version **MUST** be `HTTP/1.1` (or `HTTP/1.0` for legacy). +- The request-line **MUST** end with CRLF. No extra whitespace, no trailing characters. + +## Response Message + +``` +HTTP-version SP status-code SP [ reason-phrase ] CRLF +*( field-name ":" OWS field-value OWS CRLF ) +CRLF +[ message-body ] +``` + +Example: + +```http +HTTP/1.1 200 OK +Content-Type: text/html; charset=utf-8 +Content-Length: 1234 +Cache-Control: max-age=3600 + +... +``` + +The reason-phrase (e.g., `OK`, `Not Found`) is purely informational — clients **SHOULD** ignore its content because it is not a reliable channel for information. HTTP/2 and HTTP/3 removed it entirely. + +## Methods + +HTTP/1.1 defines a set of **request methods** that indicate the desired action on a resource: + +| Method | Safe | Idempotent | Purpose | +|--------|------|------------|---------| +| `GET` | Yes | Yes | Retrieve a representation of the resource. | +| `HEAD` | Yes | Yes | Same as `GET` but without the response body. Used to check headers/existence. | +| `POST` | No | No | Submit data to the resource. 
Often creates a new sub-resource or triggers processing. | +| `PUT` | No | Yes | Replace the target resource entirely with the request payload. | +| `DELETE` | No | Yes | Remove the target resource. | +| `PATCH` | No | No | Apply a partial modification to the resource (RFC 5789). | +| `OPTIONS` | Yes | Yes | Describe the communication options for the target resource. Used in CORS preflight. | +| `TRACE` | Yes | Yes | Echo back the received request. Useful for debugging proxies. Often disabled for security. | +| `CONNECT` | No | No | Establish a tunnel to the server, typically for HTTPS through a proxy. | + +### Safe vs Idempotent + +- **Safe** methods do not modify server state. A `GET` request should never create, update, or delete a resource. Caches and prefetchers rely on this guarantee. +- **Idempotent** methods produce the same result whether called once or many times. `PUT /user/1` with the same body always results in the same state. `POST` is not idempotent — calling it twice might create two resources. + +### Method Registration + +Methods are maintained in the [IANA HTTP Method Registry](https://www.iana.org/assignments/http-methods/http-methods.xhtml). Servers that receive an unrecognized method SHOULD respond with `501 Not Implemented`. If the method is recognized but not allowed for the target resource, the server responds with `405 Method Not Allowed` and a required `Allow` header listing permitted methods. + +## Status Codes + +Responses carry a three-digit **status code** grouped into five classes: + +| Range | Class | Meaning | +|-------|-------|---------| +| `1xx` | Informational | Request received, continuing process. | +| `2xx` | Successful | Request received, understood, and accepted. | +| `3xx` | Redirection | Further action needed to complete the request. | +| `4xx` | Client Error | Request contains bad syntax or cannot be fulfilled. | +| `5xx` | Server Error | Server failed to fulfill a valid request. 
| + +### 1xx — Informational + +| Code | Name | Usage | +|------|------|-------| +| `100` | Continue | Server has received the request headers and the client should proceed to send the body. Sent in response to `Expect: 100-continue`. | +| `101` | Switching Protocols | Server agrees to switch protocols via the `Upgrade` header (e.g., WebSocket). | + +### 2xx — Successful + +| Code | Name | Usage | +|------|------|-------| +| `200` | OK | Standard success response. Body contains the requested resource. | +| `201` | Created | Resource was successfully created. `Location` header points to the new resource. | +| `204` | No Content | Success, but no body to return (e.g., after a `DELETE`). | +| `206` | Partial Content | Range request fulfilled. Used for resumable downloads. | + +### 3xx — Redirection + +| Code | Name | Usage | +|------|------|-------| +| `301` | Moved Permanently | Resource has been permanently moved. Clients should update bookmarks. | +| `302` | Found | Temporary redirect. Original URL should still be used in the future. | +| `304` | Not Modified | Conditional request matched — the cached version is still valid. No body sent. | +| `307` | Temporary Redirect | Like 302, but the method and body MUST NOT change. | +| `308` | Permanent Redirect | Like 301, but the method and body MUST NOT change. | + +### 4xx — Client Error + +| Code | Name | Usage | +|------|------|-------| +| `400` | Bad Request | Malformed syntax. The server MUST return this for specific violations (missing Host, duplicate Host, space before colon, etc.). **This is what Http11Probe primarily tests.** | +| `401` | Unauthorized | Authentication required. Must include `WWW-Authenticate` header. | +| `403` | Forbidden | Server understood the request but refuses to fulfill it. | +| `404` | Not Found | Resource does not exist. | +| `405` | Method Not Allowed | Method is recognized but not supported for this resource. Must include `Allow` header. 
| +| `408` | Request Timeout | Server timed out waiting for the request. | +| `411` | Length Required | Server refuses the request without a `Content-Length`. | +| `413` | Content Too Large | Request body exceeds the server's limits. | +| `414` | URI Too Long | Request-target exceeds the server's limits. | +| `431` | Request Header Fields Too Large | Headers are too large. | + +### 5xx — Server Error + +| Code | Name | Usage | +|------|------|-------| +| `500` | Internal Server Error | Generic server failure. | +| `501` | Not Implemented | Server does not recognize the request method. | +| `502` | Bad Gateway | The server, acting as a gateway/proxy, received an invalid response from upstream. | +| `503` | Service Unavailable | Server is temporarily unable to handle the request (overloaded, maintenance). | +| `504` | Gateway Timeout | The server, acting as a gateway/proxy, did not receive a timely response from upstream. | +| `505` | HTTP Version Not Supported | The server does not support the HTTP version used in the request. | diff --git a/docs/content/docs/http-overview/what-is-http.md b/docs/content/docs/http-overview/what-is-http.md new file mode 100644 index 0000000..8eea642 --- /dev/null +++ b/docs/content/docs/http-overview/what-is-http.md @@ -0,0 +1,27 @@ +--- +title: What is HTTP? +description: "What HTTP is, its core characteristics, and the design goals behind the protocol." +weight: 1 +--- + +## Overview + +HTTP (HyperText Transfer Protocol) is an **application-layer, request/response protocol** for exchanging data between clients and servers. A client — a web browser, CLI tool, mobile app, or another service — sends a request message, and the server returns a response message. + +## Core Characteristics + +- **Client-server model** — one side initiates (the client), the other responds (the server). Roles are fixed for a given exchange. The client is always the party that opens the connection and sends the first message. 
+- **Stateless** — each request is independent. The server retains no memory of previous requests unless the application layer (cookies, sessions, tokens) adds state. This simplifies server implementation and enables horizontal scaling. +- **Text-based wire format (in HTTP/1.1)** — request lines, headers, and status lines are human-readable ASCII terminated by CRLF (`\r\n`). This makes the protocol easy to inspect and debug with tools like `curl`, `telnet`, or `netcat`. +- **Layered over a reliable transport** — HTTP/1.1 requires an ordered, reliable byte stream, almost always TCP. TLS may be layered between TCP and HTTP to provide encryption (HTTPS). + +## Design Goals + +HTTP was designed as a **universal interface for web resources**: + +- **Human-readable messages** — developers can craft and read raw requests by hand, making debugging straightforward. You can literally `telnet` to a server and type a valid request. +- **Extensibility via headers** — new capabilities (authentication, caching, content negotiation, security policies) are added through headers without changing the core protocol grammar. This is how HTTP has evolved for over 30 years without breaking backward compatibility. +- **Content negotiation** — clients express preferences for language (`Accept-Language`), encoding (`Accept-Encoding`), and media type (`Accept`), and servers select the best matching representation. A single URL can serve HTML to a browser and JSON to an API client. +- **Support for intermediaries** — proxies, caches, CDNs, gateways, and load balancers can inspect, transform, cache, and forward messages because the format is well-defined and semantically layered. The protocol was explicitly designed with intermediaries in mind. +- **Method semantics** — standardized methods (`GET`, `POST`, `PUT`, `DELETE`, etc.) give shared meaning to operations, enabling generic tooling and middleware. A cache knows `GET` is safe to cache; a proxy knows `CONNECT` means tunnel. 
+- **Resource-oriented** — every interaction targets a **resource** identified by a URI. This abstraction decouples the client from server implementation details — the resource might be a file, a database row, a computed result, or a proxy to another service. diff --git a/docs/content/docs/line-endings/_index.md b/docs/content/docs/line-endings/_index.md index 75b25a4..71ba5eb 100644 --- a/docs/content/docs/line-endings/_index.md +++ b/docs/content/docs/line-endings/_index.md @@ -1,7 +1,7 @@ --- title: Line Endings description: "Line Endings — Http11Probe documentation" -weight: 2 +weight: 4 sidebar: open: false --- diff --git a/docs/content/docs/malformed-input/_index.md b/docs/content/docs/malformed-input/_index.md index dfc80ad..83fcf73 100644 --- a/docs/content/docs/malformed-input/_index.md +++ b/docs/content/docs/malformed-input/_index.md @@ -1,7 +1,7 @@ --- title: Malformed Input description: "Malformed Input — Http11Probe documentation" -weight: 8 +weight: 11 sidebar: open: false --- diff --git a/docs/content/docs/request-line/_index.md b/docs/content/docs/request-line/_index.md index ee5e2c8..8f3a3e7 100644 --- a/docs/content/docs/request-line/_index.md +++ b/docs/content/docs/request-line/_index.md @@ -1,7 +1,7 @@ --- title: Request Line description: "Request Line — Http11Probe documentation" -weight: 3 +weight: 5 sidebar: open: false --- diff --git a/docs/content/docs/smuggling/_index.md b/docs/content/docs/smuggling/_index.md index b10f48d..8b6d85b 100644 --- a/docs/content/docs/smuggling/_index.md +++ b/docs/content/docs/smuggling/_index.md @@ -1,7 +1,7 @@ --- title: Request Smuggling description: "Request Smuggling — Http11Probe documentation" -weight: 7 +weight: 10 sidebar: open: false --- diff --git a/docs/content/docs/upgrade/_index.md b/docs/content/docs/upgrade/_index.md index 68d7895..5d3d45e 100644 --- a/docs/content/docs/upgrade/_index.md +++ b/docs/content/docs/upgrade/_index.md @@ -1,7 +1,7 @@ --- title: Upgrade / WebSocket description: 
"Upgrade / WebSocket — Http11Probe documentation" -weight: 9 +weight: 12 sidebar: open: false --- diff --git a/docs/content/probe-results/_index.md b/docs/content/probe-results/_index.md index c64e706..667460c 100644 --- a/docs/content/probe-results/_index.md +++ b/docs/content/probe-results/_index.md @@ -13,7 +13,7 @@ HTTP/1.1 compliance comparison across frameworks. Each test sends a specific mal ## Summary {{< callout type="info" >}} -These results are from CI runs (`ubuntu-latest`). Click a **server name** to view its Dockerfile and source code. Click on the **Compliance**, **Smuggling**, or **Malformed Input** tabs above for detailed results per category, where you can click any **result cell** to see the full HTTP request and response. +These results are from CI runs (`ubuntu-latest`). Click a **server name** to view its Dockerfile and source code. Click on the **Compliance**, **Smuggling**, **Malformed Input**, or **Normalization** tabs above for detailed results per category, where you can click any **result cell** to see the full HTTP request and response. {{< /callout >}}
diff --git a/docs/hugo.yaml b/docs/hugo.yaml index db64c5a..c153c98 100644 --- a/docs/hugo.yaml +++ b/docs/hugo.yaml @@ -15,39 +15,53 @@ markup: menu: main: - - name: Probe Results + - name: Leaderboards pageRef: /probe-results weight: 1 + - name: Simple Tests + weight: 2 - name: Compliance + parent: Simple Tests pageRef: /compliance - weight: 2 + weight: 1 - name: Smuggling + parent: Simple Tests pageRef: /smuggling - weight: 3 + weight: 2 - name: Malformed Input + parent: Simple Tests pageRef: /malformed-input + weight: 3 + - name: Normalization + parent: Simple Tests + pageRef: /normalization weight: 4 + - name: Sequence Tests + weight: 3 + - name: Coming Soon + parent: Sequence Tests + weight: 1 - name: Glossary pageRef: /docs - weight: 5 + weight: 4 - name: Add a Framework pageRef: /add-a-framework - weight: 6 + weight: 5 - name: Search - weight: 7 + weight: 6 params: type: search - name: Theme Toggle - weight: 8 + weight: 7 params: type: theme-toggle - name: Discord - weight: 9 + weight: 8 url: https://discord.gg/H84B5ZqDXR params: icon: discord - name: GitHub - weight: 10 + weight: 9 url: https://github.com/MDA2AV/Http11Probe params: icon: github