Search engine for RoboCup Team Description Papers (TDPs). Hybrid dense + sparse search over 2000+ papers across all RoboCup leagues.
Live at tdpsearch.com.
Rust workspace with the following crates:
| Crate | Type | Description |
|---|---|---|
| `web` | Binary | Axum HTTP API server |
| `mcp` | Binary | MCP (Model Context Protocol) server for LLM integration |
| `frontend` | SvelteKit | Web UI with Tailwind CSS |
| `api` | Library | Shared business logic (search, list, filter) |
| `data_access` | Library | Trait-based clients: Qdrant, SQLite, OpenAI |
| `data_processing` | Library | Chunking, embedding, IDF, search orchestration |
| `data_structures` | Library | Shared types (TDPName, League, Chunk, ContentItem, Filter) |
| `configuration` | Library | Config loading and client initialization |
| `tools` | Binaries | CLI tools for initialization, search, and analytics |
- Rust (edition 2024)
- Docker (for Qdrant)
- Node.js 22+ (for frontend)
1. Start the Qdrant vector database:

   ```shell
   make qdrant-restart
   ```

2. Create `config.toml` from the example:

   ```shell
   cp config.toml.example config.toml
   ```

   Fill in your OpenAI API key and the path to your TDP markdown files.

   Note: `embedding_size` must match the embedding model's output dimension. If you change models, re-run `make init` to rebuild the Qdrant collection; mismatches cause silent failures.

3. Initialize the database (parse TDPs, compute embeddings, build IDF):

   ```shell
   make init
   ```

4. Start the web server and frontend:

   ```shell
   make web   # API server on :50000
   make ui    # SvelteKit dev server on :50000
   ```

   Note: `make web` runs `cargo run -p web`, which serves the built frontend from `./static/`. If you run it directly without `make ui`, create a symlink first: `ln -s frontend/build static`.
```shell
make docker        # build and start all services
make docker-logs   # follow logs
make docker-down   # stop
```
Before building Docker images, create your configuration files:
```shell
# Create Docker config from example
cp config.docker.toml.example config.docker.toml
# Edit config.docker.toml and add your OpenAI API key (or leave empty and use env vars)

# Create .env file for runtime overrides
cp .env.example .env
# Edit .env and configure your API keys
```

Docker images bake in `config.docker.toml` as the default config. Settings can be overridden at runtime via environment variables using the `TDP_` prefix and `__` (double underscore) as a separator for nested keys.
For example, to set the OpenAI API key:
```shell
TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY=sk-proj-...
```
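Each `__` in the variable name descends one level of nesting. Assuming the layout implied by the variable name above (check `config.toml.example` for the authoritative structure), the equivalent `config.toml` entry would be:

```toml
# Hypothetical nesting inferred from TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY
[data_access.embed.openai]
api_key = "sk-proj-..."
```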
There are two ways to pass environment variables to the containers:
- `env_file` directive in `docker-compose.yml`: add `env_file: .env` to a service to inject all variables from `.env` directly into the container.
- `environment` block with interpolation: reference host/`.env` variables using `${VAR}` syntax in `docker-compose.yml`. Note: Docker Compose auto-loads `.env` only for `${...}` interpolation within the compose file itself; it does not automatically pass `.env` variables into containers.
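As a sketch, the two approaches look like this in `docker-compose.yml` (the service name `web` here stands in for any service in the compose file):

```yaml
services:
  web:
    # Option 1: inject every variable from .env into the container
    env_file: .env
    # Option 2: pass selected variables, interpolated from the host or .env
    environment:
      TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY: ${TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY}
```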
To get started, copy the example env file and fill in your values:
```shell
cp .env.example .env
```
The `mcp` and `web` services use `depends_on` with a health check on Qdrant's `/healthz` endpoint; they will not start until Qdrant is ready to accept connections. If you need to rebuild after changing `config.docker.toml`, run `docker compose up --build`, since the config is copied into the image at build time.
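A minimal sketch of that startup ordering, assuming Qdrant's default HTTP port 6333 (the exact service definitions in this repo's `docker-compose.yml` may differ):

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    healthcheck:
      # assumes curl is available inside the container image
      test: ["CMD-SHELL", "curl -sf http://localhost:6333/healthz || exit 1"]
      interval: 5s
      retries: 10
  web:
    depends_on:
      qdrant:
        condition: service_healthy
```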
All CLI tools live in the `tools` crate and are run via `cargo run -p tools --bin <name>`.
Parses TDP markdown files, computes embeddings, builds IDF, and upserts everything into Qdrant + SQLite.
```shell
make init
# or: cargo run --release -p tools --bin initialize
```
End-to-end verification: searches every (league, year) combination across all three search types (sparse, dense, hybrid) against a live Qdrant instance. Run after reindexing to catch filter mismatches or embedding alignment issues.
```shell
make smoke-test
```
Interactive search REPL for testing queries from the terminal.
```shell
cargo run -p tools --bin repl
```
Runs a predefined set of sentence-level searches for benchmarking/testing.
```shell
cargo run -p tools --bin search_by_sentence
```
Query the activity log database for usage reports and scraper detection.
```shell
cargo run -p tools --bin activity -- summary                     # event counts by type/source, top queries
cargo run -p tools --bin activity -- summary --since 2025-06-01
cargo run -p tools --bin activity -- recent                      # last 20 events
cargo run -p tools --bin activity -- recent --limit 50
cargo run -p tools --bin activity -- agents                      # user-agent and IP breakdown
cargo run -p tools --bin activity -- agents --since 2025-06-01
```
Or via Make:
```shell
make activity ARGS="summary"
make activity ARGS="agents --since 2025-06-01"
```
Services:
| Target | Description |
|---|---|
| `make web` | Start the Axum API server on :50000 |
| `make mcp` | Start the MCP servers (:50001 open, :50002 OAuth) |
| `make ui` | Start the SvelteKit dev server on :50000 |
Tools:
| Target | Description |
|---|---|
| `make init` | Initialize database (parse, embed, index) |
| `make smoke-test` | End-to-end search verification across all leagues/years |
| `make repl` | Interactive search REPL |
| `make search "query"` | Search for a query |
| `make activity ARGS="..."` | Run the activity analytics CLI |
Infrastructure:
| Target | Description |
|---|---|
| `make qdrant-restart` | Restart Qdrant Docker container |
| `make qdrant-snapshot` | Create Qdrant snapshot for Docker image |
| `make rebuild-index` | Full teardown → reindex → snapshot → Docker rebuild |
| `make docker` | Build and start all services via Docker Compose |
| `make docker-logs` | Follow Docker Compose logs |
| `make docker-down` | Stop Docker Compose |
| `make leagues` | Quick API test: list all leagues |
The MCP server exposes TDP search functionality to LLMs. Available tools:
- `search` - Hybrid semantic + keyword search across all TDPs
- `list_papers` - List papers with optional league/year/team filters
- `list_teams` - List team names with optional hint filters
- `list_leagues` - List all RoboCup leagues
- `list_years` - List years with optional league/year/team filters
- `get_tdp_contents` - Retrieve full markdown of a specific paper
- `get_table_of_contents` - Get the structured table of contents of a paper
- `get_abstract` - Get a paper's abstract
- `get_section` - Get a specific section by content sequence number
- `get_paper_info` - Get paper metadata (team, league, year, authors)
- `get_team_info` - Get team metadata (website, GitHub, socials)
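Over MCP's JSON-RPC transport, an LLM client invokes one of these tools with a `tools/call` request. A rough sketch for `search` (the argument name `query` is illustrative; consult the server's schema via `tools/list` for the actual parameters):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": { "query": "ball interception under noisy vision" }
  }
}
```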
```shell
cargo run -p mcp
```
All interactions (searches, paper opens, list operations) are logged to `data/activity.db` from both Web and MCP sources. HTTP requests from the web server also capture IP and user-agent for scraper detection.
Configure in `config.toml`:

```toml
[event_processing.activity.sqlite]
filename = "data/activity.db"
```
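If the `TDP_` environment-variable convention from the Docker section applies to this key path as well (the variable name below is inferred from the nesting, not confirmed), the log location could be overridden at runtime:

```shell
TDP_EVENT_PROCESSING__ACTIVITY__SQLITE__FILENAME=data/activity.db
```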