Search engine for RoboCup Team Description Papers (TDPs). Hybrid dense + sparse search over 2000+ papers across all RoboCup leagues.
Live at tdpsearch.com.
Rust workspace with the following crates:
| Crate | Type | Description |
|---|---|---|
| `web` | Binary | Axum HTTP API server |
| `mcp` | Binary | MCP (Model Context Protocol) server for LLM integration |
| `frontend` | SvelteKit | Web UI with Tailwind CSS |
| `api` | Library | Shared business logic (search, list, filter) |
| `data_access` | Library | Trait-based clients: Qdrant, SQLite, OpenAI |
| `data_processing` | Library | Chunking, embedding, IDF, search orchestration |
| `data_structures` | Library | Shared types (TDPName, League, Chunk, ContentItem, Filter) |
| `configuration` | Library | Config loading and client initialization |
| `tools` | Binaries | CLI tools for initialization, search, and analytics |
- Rust (edition 2024)
- Docker (for Qdrant)
- Node.js 22+ (for frontend)
1. Start the Qdrant vector database:

   ```shell
   make qdrant-restart
   ```

2. Create `config.toml` from the example:

   ```shell
   cp config.toml.example config.toml
   ```

   Fill in your OpenAI API key and the path to your TDP markdown files.

   Note: `embedding_size` must match the embedding model's output dimension. If you change models, re-run `make init` to rebuild the Qdrant collection; mismatches cause silent failures.

3. Initialize the database (parse TDPs, compute embeddings, build IDF):

   ```shell
   make init
   ```

4. Start the web server and frontend:

   ```shell
   make web   # API server on :50000
   make ui    # SvelteKit dev server on :50000
   ```

   Note: `make web` runs `cargo run -p web`, which serves the built frontend from `./static/`. If you run it directly without `make ui`, create a symlink first: `ln -s frontend/build static`.
```shell
make docker        # build and start all services
make docker-logs   # follow logs
make docker-down   # stop
```
Before building Docker images, create your configuration files:
```shell
# Create Docker config from example
cp config.docker.toml.example config.docker.toml
# Edit config.docker.toml and add your OpenAI API key (or leave empty and use env vars)

# Create .env file for runtime overrides
cp .env.example .env
# Edit .env and configure your API keys
```

Docker images bake in `config.docker.toml` as the default config. Settings can be overridden at runtime via environment variables using the `TDP_` prefix and `__` (double underscore) as a separator for nested keys.
For example, to set the OpenAI API key:
```shell
TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY=sk-proj-...
```
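Each `__` in the variable name descends one level of nesting. Assuming the layout implied by the variable name above (check `config.toml.example` for the authoritative structure), the equivalent `config.toml` entry would be:

```toml
# Hypothetical nesting inferred from TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY
[data_access.embed.openai]
api_key = "sk-proj-..."
```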
There are two ways to pass environment variables to the containers:
- `env_file` directive in `docker-compose.yml`: add `env_file: .env` to a service to inject all variables from `.env` directly into the container.
- `environment` block with interpolation: reference host/`.env` variables using `${VAR}` syntax in `docker-compose.yml`. Note: Docker Compose auto-loads `.env` only for `${...}` interpolation within the compose file itself; it does not automatically pass `.env` variables into containers.
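As a sketch, the two approaches look like this in `docker-compose.yml` (the service name `web` here stands in for any service in the compose file):

```yaml
services:
  web:
    # Option 1: inject every variable from .env into the container
    env_file: .env
    # Option 2: pass selected variables, interpolated from the host or .env
    environment:
      TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY: ${TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY}
```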
To get started, copy the example env file and fill in your values:
```shell
cp .env.example .env
```
The `mcp` and `web` services use `depends_on` with a health check on Qdrant's `/healthz` endpoint; they will not start until Qdrant is ready to accept connections. If you need to rebuild after changing `config.docker.toml`, run `docker compose up --build`, since the config is copied into the image at build time.
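A minimal sketch of that startup ordering, assuming Qdrant's default HTTP port 6333 (the exact service definitions in this repo's `docker-compose.yml` may differ):

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    healthcheck:
      # assumes curl is available inside the container image
      test: ["CMD-SHELL", "curl -sf http://localhost:6333/healthz || exit 1"]
      interval: 5s
      retries: 10
  web:
    depends_on:
      qdrant:
        condition: service_healthy
```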
All CLI tools live in the `tools` crate and are run via `cargo run -p tools --bin <name>`.
Parses TDP markdown files, computes embeddings, builds IDF, and upserts everything into Qdrant + SQLite.
```shell
make init
# or: cargo run --release -p tools --bin initialize
```
End-to-end verification: searches every (league, year) combination across all three search types (sparse, dense, hybrid) against a live Qdrant instance. Run after reindexing to catch filter mismatches or embedding alignment issues.
```shell
make smoke-test
```
Interactive search REPL for testing queries from the terminal.
```shell
cargo run -p tools --bin repl
```
Runs a predefined set of sentence-level searches for benchmarking/testing.
```shell
cargo run -p tools --bin search_by_sentence
```
Query the activity log database for usage reports and scraper detection.
```shell
cargo run -p tools --bin activity -- summary                     # event counts by type/source, top queries
cargo run -p tools --bin activity -- summary --since 2025-06-01
cargo run -p tools --bin activity -- recent                      # last 20 events
cargo run -p tools --bin activity -- recent --limit 50
cargo run -p tools --bin activity -- agents                      # user-agent and IP breakdown
cargo run -p tools --bin activity -- agents --since 2025-06-01
```
Or via Make:
```shell
make activity ARGS="summary"
make activity ARGS="agents --since 2025-06-01"
```
Services:
| Target | Description |
|---|---|
| `make web` | Start the Axum API server on :50000 |
| `make mcp` | Start the MCP servers (:50001 open, :50002 OAuth) |
| `make ui` | Start the SvelteKit dev server on :50000 |
Tools:
| Target | Description |
|---|---|
| `make init` | Initialize database (parse, embed, index) |
| `make smoke-test` | End-to-end search verification across all leagues/years |
| `make repl` | Interactive search REPL |
| `make search "query"` | Search for a query |
| `make activity ARGS="..."` | Run the activity analytics CLI |
Infrastructure:
| Target | Description |
|---|---|
| `make qdrant-restart` | Restart Qdrant Docker container |
| `make qdrant-snapshot` | Create Qdrant snapshot for Docker image |
| `make rebuild-index` | Full teardown → reindex → snapshot → Docker rebuild |
| `make docker` | Build and start all services via Docker Compose |
| `make docker-logs` | Follow Docker Compose logs |
| `make docker-down` | Stop Docker Compose |
| `make leagues` | Quick API test: list all leagues |
The MCP server exposes TDP search functionality to LLMs. Available tools:
- `search` - Hybrid semantic + keyword search across all TDPs
- `list_papers` - List papers with optional league/year/team filters
- `list_teams` - List team names with optional hint filters
- `list_leagues` - List all RoboCup leagues
- `list_years` - List years with optional league/year/team filters
- `get_tdp_contents` - Retrieve full markdown of a specific paper
- `get_table_of_contents` - Get the structured table of contents of a paper
- `get_abstract` - Get a paper's abstract
- `get_section` - Get a specific section by content sequence number
- `get_paper_info` - Get paper metadata (team, league, year, authors)
- `get_team_info` - Get team metadata (website, GitHub, socials)
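Over MCP's JSON-RPC transport, an LLM client invokes one of these tools with a `tools/call` request. A rough sketch for `search` (the argument name `query` is illustrative; consult the server's schema via `tools/list` for the actual parameters):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": { "query": "ball interception under noisy vision" }
  }
}
```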
```shell
cargo run -p mcp
```
All interactions (searches, paper opens, list operations) are logged to `data/activity.db` from both Web and MCP sources. HTTP requests from the web server also capture IP and user-agent for scraper detection.
Configure in `config.toml`:

```toml
[event_processing.activity.sqlite]
filename = "data/activity.db"
```
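If the `TDP_` environment-variable convention from the Docker section applies to this key path as well (the variable name below is inferred from the nesting, not confirmed), the log location could be overridden at runtime:

```shell
TDP_EVENT_PROCESSING__ACTIVITY__SQLITE__FILENAME=data/activity.db
```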