This project is a multithreaded HTTP web proxy implemented in C with a custom LRU cache for responses and an HTTP request parsing library. The proxy accepts HTTP requests from clients (e.g., browsers or curl), forwards them to the remote server, relays the response back to the client, and optionally stores the response in an in‑memory cache so that repeated requests are served faster.
The core logic lives in:
- `proxy_server_with_cache.c` – proxy server, threading, networking, caching, error handling
- `proxy_parse.h` – interface for the HTTP request parsing library used by the proxy
- Acts as an HTTP proxy for HTTP/1.0 and HTTP/1.1 requests.
- Supports only the `GET` method (other methods are rejected / not processed).
- Forwards requests to remote servers (e.g., `google.com`) and streams back responses.
- Response caching with LRU eviction:
  - Cache entries are keyed by the full request string.
  - Cache elements are bounded by `MAX_ELEMENT_SIZE`.
  - Global cache size is bounded by `MAX_SIZE`.
  - LRU (Least Recently Used) eviction when the cache is full.
- Multi‑threaded concurrency:
  - Each client connection is handled in a separate thread.
  - `MAX_CLIENTS` controls the maximum concurrent active clients.
  - A semaphore limits concurrent workers; a mutex protects shared cache state.
- Basic HTTP error responses generated by the proxy itself for invalid/unsupported cases.
- `proxy_parse.h`
  - Declares the `struct ParsedRequest` and `struct ParsedHeader` types.
  - Declares functions to:
    - Create / destroy a parsed request: `ParsedRequest_create`, `ParsedRequest_destroy`.
    - Parse a raw HTTP request string: `ParsedRequest_parse`.
    - Reconstruct ("unparse") a request or just its headers: `ParsedRequest_unparse`, `ParsedRequest_unparse_headers`.
    - Manage headers (set/get/remove): `ParsedHeader_set`, `ParsedHeader_get`, `ParsedHeader_remove`.
  - Provides a documented example of how to parse and manipulate headers programmatically.
- `proxy_server_with_cache.c`
  - Includes system headers for POSIX networking (`socket`, `bind`, `listen`, `accept`, `connect`, `recv`, `send`), threading (`pthread`), semaphores, and time utilities.
  - Implements:
    - Socket server setup on a configurable port.
    - Connection handling via worker threads (`pthread_create`).
    - Client request parsing using the `proxy_parse` library.
    - Request normalization (ensuring `Host` and `Connection: close` headers exist).
    - Connection to the remote web server and streaming of the HTTP response.
    - Caching of responses using an LRU policy.
    - Basic HTTP error response generation.

Key constants in `proxy_server_with_cache.c`:
- `MAX_CLIENTS` – upper bound on simultaneous client connections (also initial semaphore value).
- `MAX_BYTES` – buffer size used for reading/writing data across sockets.
- `MAX_ELEMENT_SIZE` – maximum allowed size for a single cache element.
- `MAX_SIZE` – maximum total size of all cached elements combined.

Startup flow in `main`:
- Initialize a semaphore `semaphore` with value `MAX_CLIENTS`.
- Initialize a mutex `lock` for synchronizing access to the global cache.
- Read the proxy listening port from the command line (default logic expects one argument; the proxy listens on that port).
- Create a TCP socket (`proxy_socketId`).
- Set `SO_REUSEADDR` on the socket so that it can be rebound quickly.
- Bind the socket to `INADDR_ANY` on the chosen port.
- Call `listen(proxy_socketId, MAX_CLIENTS)` to start listening.
- Enter an infinite loop where:
  - `accept` waits for new client connections.
  - For each accepted client socket, a new thread is created using `pthread_create`, running `thread_fn`.

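The socket part of the startup flow (create, `SO_REUSEADDR`, bind, listen) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper name `make_listener` and its parameters are hypothetical, and the semaphore, mutex and `accept` loop are left out.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not in the source): create a TCP socket, set
 * SO_REUSEADDR, bind it to INADDR_ANY on `port`, and start listening.
 * Returns the listening descriptor, or -1 on failure. */
static int make_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Lets the port be rebound quickly after a restart. */
    int reuse = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* all local interfaces */
    addr.sin_port = htons(port);               /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In the proxy, the equivalent of `make_listener(port, MAX_CLIENTS)` would be followed by the `accept` loop that hands each client socket to a new thread.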
Each client connection is handled by `thread_fn`:
- Concurrency control using semaphore:
  - `sem_wait(&semaphore)` decrements the semaphore and blocks if the maximum number of active clients has been reached.
- Read the full HTTP request:
  - Allocate a buffer of size `MAX_BYTES` and read from the client using `recv`.
  - Continue reading until the end of HTTP headers (`"\r\n\r\n"`) or the buffer is full.
- Clone the raw request string:
  - A copy of the entire incoming request (`tempReq`) is created.
  - This copy is later used as the cache key.
- Check cache first:
  - Call `find(tempReq)` to look up a matching cached response.
  - If found, stream the cached `data` back to the client in chunks of `MAX_BYTES` until the full response is sent.
  - Update the element’s LRU timestamp (inside `find`).
- If not cached, parse and forward:
  - Use `ParsedRequest_create` and `ParsedRequest_parse` to parse the raw HTTP request.
  - Only `GET` requests are supported:
    - If `request->method` is not `"GET"`, the proxy prints a message and does not forward.
  - Validate:
    - `request->host` exists.
    - `request->path` exists.
    - HTTP version is `HTTP/1.0` or `HTTP/1.1` via the `checkHTTPVersion` helper.
  - Call `handle_request` to forward the request to the remote server and relay the response.
  - If `handle_request` fails, `sendErrorMessage` is used to return an HTTP error to the client.
- Cleanup:
  - Destroy the parsed request (`ParsedRequest_destroy`).
  - Shutdown and close the client socket (`shutdown` + `close`).
  - Free request buffers and `tempReq`.
  - `sem_post(&semaphore)` increments the semaphore, allowing another client to be handled.

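The "read the full HTTP request" step can be illustrated with a small receive loop. The sketch below is an assumption about how such a loop might look, not a copy of `thread_fn`: the names `headers_complete` and `read_request` are hypothetical, and the buffer capacity stands in for `MAX_BYTES`.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 once the NUL-terminated bytes received so far contain the
 * blank line ("\r\n\r\n") that ends the HTTP header block, else 0. */
static int headers_complete(const char *buf)
{
    return strstr(buf, "\r\n\r\n") != NULL;
}

/* Hypothetical helper: read from `fd` into `buf` (capacity `cap`) until
 * the header terminator arrives, the peer closes, or the buffer is full.
 * Returns the number of bytes stored, or -1 on error. */
static ssize_t read_request(int fd, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;

    size_t used = 0;
    buf[0] = '\0';
    while (used < cap - 1 && !headers_complete(buf)) {
        ssize_t n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;               /* peer closed the connection */
        used += (size_t)n;
        buf[used] = '\0';        /* keep the buffer a valid C string */
    }
    return (ssize_t)used;
}
```

Reserving one byte for the terminator keeps `strstr` safe to call after every `recv`, which is why the loop stops at `cap - 1` rather than `cap`.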
The `handle_request` function is responsible for transforming the client’s request and talking to the real server:
- Build a normalized request line + headers:
  - Start with `"GET"`, the `request->path`, and `request->version`, followed by `"\r\n"`.
  - Ensure that the `Connection` header is set to `"close"` using `ParsedHeader_set`.
  - Ensure that the `Host` header exists; if not, set it to `request->host`.
  - Use `ParsedRequest_unparse_headers` to serialize only the headers into the same buffer, appending them to the request line.
  - Final buffer structure is:
    ```
    GET /path HTTP/1.1\r\n
    Host: example.com\r\n
    Connection: close\r\n
    ...other headers...\r\n
    \r\n
    ```
- Determine upstream server port:
  - Default is `80`.
  - If `request->port` is non‑NULL, convert it using `atoi` and use that port instead.
- Connect to the remote server:
  - Use `connectRemoteServer(request->host, server_port)` to open a TCP connection.
  - This helper:
    - Resolves the host name via `gethostbyname`.
    - Fills a `sockaddr_in` structure.
    - Calls `connect` and returns the socket descriptor on success.
- Send request and stream response:
  - `send` the fully constructed HTTP request to the remote server.
  - Repeatedly `recv` response chunks from the remote server into `buff`.
  - For each chunk:
    - Immediately `send` it to the client.
    - Append it into a dynamically growing buffer (`temp_buffer`) used to accumulate the full response for caching.
  - When `recv` returns `0` or negative, stop reading.
- Cache the response:
  - Null‑terminate `temp_buffer`.
  - Call `add_cache_element(temp_buffer, strlen(temp_buffer), tempReq)`:
    - `tempReq` is the original raw request string, used as the cache key.
    - `data` points to the full HTTP response as received from the remote server.
  - Free temporary buffers and close the remote server socket.

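The "append each chunk into a dynamically growing buffer" step could look something like the sketch below. The `response_buf` type and `response_buf_append` function are hypothetical names introduced for illustration. The sketch also makes one deliberate change from the description above: it tracks the byte count explicitly, because `strlen` stops at the first NUL byte and would under-count a response with a binary body.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the upstream response. */
struct response_buf {
    char  *data;   /* heap buffer, NUL-terminated whenever non-NULL */
    size_t len;    /* bytes stored, excluding the terminator */
    size_t cap;    /* bytes allocated */
};

/* Append one received chunk, doubling the capacity as needed.
 * Returns 0 on success, or -1 if memory cannot be allocated
 * (the existing contents are left intact in that case). */
static int response_buf_append(struct response_buf *rb,
                               const char *chunk, size_t n)
{
    if (rb->len + n + 1 > rb->cap) {
        size_t new_cap = rb->cap ? rb->cap : 4096;
        while (new_cap < rb->len + n + 1)
            new_cap *= 2;
        char *p = realloc(rb->data, new_cap);
        if (p == NULL)
            return -1;
        rb->data = p;
        rb->cap = new_cap;
    }
    memcpy(rb->data + rb->len, chunk, n);
    rb->len += n;
    rb->data[rb->len] = '\0';
    return 0;
}
```

In a relay loop, each successful `recv` of `n` bytes would be followed by a `send` to the client and a call to `response_buf_append`; once the upstream closes, `rb.data` and `rb.len` are what would be handed to the cache.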
`sendErrorMessage(int socket, int status_code)` builds and sends HTML error responses generated entirely by the proxy. It supports:
- `400 Bad Request`
- `403 Forbidden`
- `404 Not Found`
- `500 Internal Server Error`
- `501 Not Implemented`
- `505 HTTP Version Not Supported`

Each response includes:
- An appropriate `HTTP/1.1` status line.
- `Content-Length`, `Content-Type: text/html`, and `Connection: keep-alive` headers.
- A `Date` header formatted with `gmtime` and `strftime`.
- A simple HTML body describing the error.

This function is called when parsing fails, when unsupported methods are used, or when forwarding fails, depending on the logic in `thread_fn` and `handle_request`.
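As an illustration of how such an error response can be assembled, here is a minimal sketch. It is not the project's `sendErrorMessage`: the helper names `reason_phrase` and `build_error_response` are hypothetical, the HTML body is an assumed layout, and the result is written into a caller-supplied buffer instead of being sent on a socket.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Map a status code to its reason phrase; NULL for unsupported codes. */
static const char *reason_phrase(int status_code)
{
    switch (status_code) {
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    case 501: return "Not Implemented";
    case 505: return "HTTP Version Not Supported";
    default:  return NULL;
    }
}

/* Hypothetical helper: format a complete error response into `out`.
 * Returns the number of bytes written, or -1 if the code is
 * unsupported or a buffer is too small. */
static int build_error_response(int status_code, char *out, size_t cap)
{
    const char *reason = reason_phrase(status_code);
    if (reason == NULL)
        return -1;

    char body[256];
    int body_len = snprintf(body, sizeof body,
        "<HTML><HEAD><TITLE>%d %s</TITLE></HEAD>\n"
        "<BODY><H1>%d %s</H1>\n</BODY></HTML>",
        status_code, reason, status_code, reason);
    if (body_len < 0 || (size_t)body_len >= sizeof body)
        return -1;

    /* HTTP date in GMT, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
     * gmtime() uses static storage; gmtime_r() is the thread-safe
     * POSIX alternative. */
    char date[64];
    time_t now = time(NULL);
    struct tm *tm_utc = gmtime(&now);
    if (tm_utc == NULL ||
        strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", tm_utc) == 0)
        return -1;

    int n = snprintf(out, cap,
        "HTTP/1.1 %d %s\r\n"
        "Content-Length: %d\r\n"
        "Connection: keep-alive\r\n"
        "Content-Type: text/html\r\n"
        "Date: %s\r\n"
        "\r\n"
        "%s",
        status_code, reason, body_len, date, body);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Computing the body first means `Content-Length` always matches the bytes that follow the blank line, whatever the status code.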
The cache is built around the `cache_element` struct:
```c
typedef struct cache_element {
    char *data;                  // Full HTTP response
    int len;                     // Length in bytes of data
    char *url;                   // Full request string used as the key
    time_t lru_time_track;       // Last access time, used for LRU
    struct cache_element *next;  // Next entry in the linked list
} cache_element;
```
Global state in [proxy_server_with_cache.c](proxy_server_with_cache.c):
- `cache_element *head;` – head of a singly linked list of cache entries.
- `int cache_size;` – total size (in bytes) of all cache elements.
- `pthread_mutex_t lock;` – protects access to `head` and `cache_size`.
#### 5.1 Cache Lookup: `find`
- Locks the mutex with `pthread_mutex_lock(&lock)`.
- Traverses the linked list starting from `head`.
- Compares each element’s `url` with the requested `url` using `strcmp`.
- If a match is found:
- Prints debug information.
- Updates `lru_time_track` to the current time (`time(NULL)`) to mark it as recently used.
- Returns the `cache_element *`.
- Unlocks the mutex before returning.
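The lookup steps above can be sketched as follows. This is a simplified illustration, not the project's exact code: debug printing is omitted, and returning `NULL` on a miss is an assumption of this sketch.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef struct cache_element {
    char *data;
    int len;
    char *url;
    time_t lru_time_track;
    struct cache_element *next;
} cache_element;

static cache_element *head = NULL;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the entry whose key equals `url`, refreshing its LRU
 * timestamp, or NULL when nothing matches (assumed convention). */
static cache_element *find(const char *url)
{
    cache_element *hit = NULL;

    pthread_mutex_lock(&lock);
    for (cache_element *e = head; e != NULL; e = e->next) {
        if (strcmp(e->url, url) == 0) {
            e->lru_time_track = time(NULL);   /* mark as recently used */
            hit = e;
            break;
        }
    }
    pthread_mutex_unlock(&lock);
    return hit;
}
```

One consequence of this design is that the caller uses the returned pointer after the mutex has been released, so an eviction on another thread could free the entry while it is still being sent; copying the data while the lock is held would avoid that.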
#### 5.2 Cache Eviction: `remove_cache_element`
- Locks the mutex.
- If the cache is non‑empty:
- Iterates over the list to find the element with the **smallest** `lru_time_track` (oldest use).
- Maintains pointers:
- `temp` – current best candidate for eviction.
- `p` – node just before `temp`.
- Removes `temp` from the list:
- If `temp` is the `head`, move `head` to `head->next`.
- Otherwise, set `p->next = temp->next`.
- Decrements `cache_size` by the size of the evicted element:
- Subtract `temp->len` (response size).
- Subtract `sizeof(cache_element)` and `strlen(temp->url) + 1` for metadata and key.
- Frees `temp->data`, `temp->url`, and `temp` itself.
- Unlocks the mutex.
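A minimal sketch of the eviction scan is shown below. It follows the steps listed above but is not the project's code: the function is named `remove_oldest` to make that clear, the mutex is omitted (the caller is assumed to hold it), and `dup_str` and `push_entry` are hypothetical helpers that exist only to build a list to evict from.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef struct cache_element {
    char *data;
    int len;
    char *url;
    time_t lru_time_track;
    struct cache_element *next;
} cache_element;

static cache_element *head = NULL;
static int cache_size = 0;

/* Unlink and free the entry with the smallest lru_time_track. */
static void remove_oldest(void)
{
    if (head == NULL)
        return;

    cache_element *oldest = head, *before_oldest = NULL;
    for (cache_element *prev = head, *cur = head->next; cur != NULL;
         prev = cur, cur = cur->next) {
        if (cur->lru_time_track < oldest->lru_time_track) {
            oldest = cur;
            before_oldest = prev;
        }
    }

    if (before_oldest == NULL)
        head = oldest->next;                  /* oldest was the head */
    else
        before_oldest->next = oldest->next;

    /* Mirror the size accounting used on insert. */
    cache_size -= oldest->len + (int)sizeof(cache_element)
                + (int)strlen(oldest->url) + 1;
    free(oldest->data);
    free(oldest->url);
    free(oldest);
}

/* Hypothetical helper: heap copy of a C string, NULL on failure. */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical helper: push an entry with a chosen timestamp. */
static void push_entry(const char *url, const char *data, time_t stamp)
{
    cache_element *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->data = dup_str(data);
    e->url = dup_str(url);
    if (e->data == NULL || e->url == NULL)
        abort();
    e->len = (int)strlen(data);
    e->lru_time_track = stamp;
    e->next = head;
    head = e;
    cache_size += e->len + (int)sizeof(cache_element) + (int)strlen(url) + 1;
}
```

Because the list is unordered with respect to access time, every eviction is a full O(n) scan; that is simple and works for a small cache, at the cost of scaling linearly with the number of entries.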
#### 5.3 Cache Insert: `add_cache_element`
- Locks the mutex.
- Computes `element_size = size + 1 + strlen(url) + sizeof(cache_element)`.
- If `element_size > MAX_ELEMENT_SIZE`:
- Unlocks and returns without caching (element too big).
- Otherwise, while `cache_size + element_size > MAX_SIZE`:
- Call `remove_cache_element()` until there is enough space.
- Allocate a new `cache_element` and its `data` and `url` buffers.
- Copy the response into `data` and the key into `url`.
- Set `lru_time_track = time(NULL)`.
- Insert the new element at the head of the list: `element->next = head; head = element;`.
- Increment `cache_size` by `element_size`.
- Unlock the mutex.
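The insert path, including the size check and the evict-until-it-fits loop, might be sketched like this. Several things here are assumptions made for illustration: the two limits are tiny placeholder values rather than the project's real constants, the mutex is omitted, `evict_oldest` is a compact stand-in for `remove_cache_element`, and the return value is added so the outcome can be observed.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Placeholder limits for illustration only, not the project's values. */
#define MAX_ELEMENT_SIZE 256
#define MAX_SIZE 512

typedef struct cache_element {
    char *data;
    int len;
    char *url;
    time_t lru_time_track;
    struct cache_element *next;
} cache_element;

static cache_element *head = NULL;
static int cache_size = 0;

/* Stand-in for remove_cache_element: drop the least recently used entry. */
static void evict_oldest(void)
{
    cache_element **victim = &head;
    for (cache_element **pp = &head; *pp != NULL; pp = &(*pp)->next)
        if ((*pp)->lru_time_track < (*victim)->lru_time_track)
            victim = pp;

    cache_element *e = *victim;
    if (e == NULL)
        return;                              /* cache was empty */
    *victim = e->next;
    cache_size -= e->len + 1 + (int)strlen(e->url) + (int)sizeof(cache_element);
    free(e->data);
    free(e->url);
    free(e);
}

/* Returns 1 if the response was cached, 0 if it was skipped. */
static int add_cache_element(const char *data, int size, const char *url)
{
    int element_size = size + 1 + (int)strlen(url) + (int)sizeof(cache_element);
    if (element_size > MAX_ELEMENT_SIZE)
        return 0;                            /* too large to cache */

    while (cache_size + element_size > MAX_SIZE)
        evict_oldest();                      /* make room */

    cache_element *e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    e->data = malloc((size_t)size + 1);
    e->url = malloc(strlen(url) + 1);
    if (e->data == NULL || e->url == NULL) {
        free(e->data);
        free(e->url);
        free(e);
        return 0;
    }
    memcpy(e->data, data, (size_t)size);
    e->data[size] = '\0';
    strcpy(e->url, url);
    e->len = size;
    e->lru_time_track = time(NULL);
    e->next = head;                          /* insert at the head */
    head = e;
    cache_size += element_size;
    return 1;
}
```

The eviction loop terminates as long as `MAX_ELEMENT_SIZE` does not exceed `MAX_SIZE`, since an empty cache then always has room. Note also that `time(NULL)` has one-second resolution, so entries touched within the same second tie on `lru_time_track`.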
This design ensures:
- Cache entries are **bounded per element** and **bounded globally**.
- Recently requested resources stay in the cache.
- Oldest, least recently used responses are evicted first.
### 6. HTTP Request Parsing Library (`proxy_parse.h`)
`proxy_parse.h` defines the abstraction used for parsing and manipulating HTTP requests:
- `struct ParsedRequest` holds:
- `method`, `protocol`, `host`, `port`, `path`, `version`.
- A buffer and length for the raw request line.
- A dynamic array/list of `ParsedHeader` entries.
- `struct ParsedHeader` represents one HTTP header as a `key: value` pair.
Key functions used by the proxy:
- `ParsedRequest_create` – allocate and initialize a `ParsedRequest`.
- `ParsedRequest_parse` – parse a raw request buffer into fields and headers.
- `ParsedHeader_set` – ensure headers like `Host` and `Connection` have desired values.
- `ParsedHeader_get` – check if a particular header (e.g., `Host`) exists.
- `ParsedRequest_unparse_headers` – convert headers back into wire format, appended to the request line built in `handle_request`.
The example in the header shows how these functions work together; the proxy uses them in a similar pattern but tailored to forwarding requests.
---
## Building the Proxy
This code is written for a **POSIX environment** (Linux/Unix/macOS). On Windows you are expected to use something like **WSL** or a POSIX‑compatible toolchain (e.g., MinGW with appropriate adjustments) because it depends on headers like `<unistd.h>`, `<netinet/in.h>`, `<arpa/inet.h>`, and `<pthread.h>`.
Assuming you have a `proxy_parse.c` implementation available, a typical build command with `gcc` would look like:
```bash
gcc -Wall -O2 -pthread -o webproxy \
    proxy_server_with_cache.c proxy_parse.c
```

If the parsing library is provided as a precompiled object file or static library, adjust the command accordingly (e.g., link against `-lproxyparse`).
Run the compiled proxy with a port number:
```bash
./webproxy 8080
```
- The proxy will start and listen on port `8080` (or the port you pass as argument).
- It prints messages about binding and each connected client, including the client’s IP address and port.

To test with `curl` using the proxy:
```bash
curl -x http://localhost:8080 http://example.com/
```
- First request to a URL: fetched from the remote server, response cached.
- Second identical request: should be served from cache (you’ll see debug messages indicating cache hits).

You can also configure your browser’s HTTP proxy settings to point to `localhost:8080` and browse regular HTTP sites through it (HTTPS via `CONNECT` is not implemented).
- Method support: Only `GET` is supported.
- Protocol support: Designed for HTTP/1.0 and HTTP/1.1.
- HTTPS / CONNECT not supported: The proxy does not implement tunneling for HTTPS.
- No persistent connections to upstream: Requests are sent with `Connection: close` and each remote server connection is closed after the response.
- Parsing library dependency: Requires a `proxy_parse` implementation matching `proxy_parse.h`.
- No full header/body parsing on responses: Responses are treated as opaque byte streams and cached as‑is.
- Basic error messages: Error handling is straightforward and mainly used when parsing or network operations fail.

Some natural next steps if you want to grow this project further:
- Add support for additional HTTP methods such as `HEAD` and `POST`.
- Implement HTTPS proxying using the `CONNECT` method.
- Add more robust parsing and validation of both requests and responses.
- Implement configurable cache policies (e.g., using `Cache-Control` or `Expires` headers).
- Implement logging to files with timestamps and request/response metadata.
- Add command‑line flags for cache size, element size, and maximum clients.

This README reflects the full design and behavior implied by the current code: a multi‑threaded HTTP/1.x web proxy with an in‑memory LRU cache, implemented with raw sockets, POSIX threads, and a custom HTTP parsing library.