This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Honcho is an infrastructure layer for building AI agents with memory and social cognition. Its primary purposes include:
- Imbuing agents with a sense of identity
- Personalizing user experiences through understanding user psychology
- Providing a Dialectic API that injects personal context just-in-time
- Supporting development of LLM-powered applications that adapt to end users
- Enabling multi-peer sessions where multiple participants (users or agents) can interact
Honcho leverages the inherent reasoning capabilities of LLMs to build coherent models of user psychology over time, enabling more personalized and effective AI interactions.
Honcho uses a peer-based model where both users and agents are represented as "peers". This unified approach enables:
- Multi-participant sessions with mixed human and AI agents
- Configurable observation settings (which peers observe which others)
- Flexible identity management for all participants
- Workspace (formerly App): The root organizational unit containing all resources
- Peer (formerly User): Any participant in the system (human or AI)
- Session: A conversation context that can involve multiple peers
- Message: Data units that can represent communication between peers OR arbitrary data ingested by a peer to enhance its global representation
- Collections & Documents: Internal vector storage for peer representations (not exposed via API)
All API routes follow the pattern: /v1/{resource}/{id}/{action}
- Workspaces: Create, list, update, search
- Peers: Create, list, update, chat (dialectic), messages, representation
- Sessions: Create, list, update, delete, clone, manage peers, get context
- Messages: Create (batch up to 100), list, get, update
- Keys: Create scoped JWTs
- Provides bespoke responses informed by the representation
- Integrates long-term facts from vector storage
- Supports streaming responses
- Configurable LLM providers
- Messages created via API (batch or single)
- Enqueued for background processing:
representation: Update peer's contextsummary: Create session summaries
- Session-based queue processing ensures order
- Results stored internally in vector DB
- Hierarchical config: config.toml + environment variables
- Database settings with connection pooling
- Multiple LLM provider support
- Background worker (deriver) settings
- Authentication can be toggled on/off
- Setup:
uv sync - Run server:
uv run fastapi dev src/main.py - Run tests:
uv run pytest tests/ - Run single test:
uv run pytest tests/path/to/test_file.py::test_function - Linting:
uv run ruff check src/ - Typechecking:
uv run basedpyright - Format code:
uv run ruff format src/
The legacy cf and custom provider tags are gone. Transport is Literal["anthropic", "openai", "gemini"] only — see src/llm/registry.py. Per-component routing happens via <COMPONENT>_MODEL_CONFIG__* env vars (Pydantic settings with env_nested_delimiter="__").
- CF Gateway integration is app-level now, not deployment-level.
src/llm/registry.pyandsrc/embedding_client.pyauto-injectcf-aig-authorization: Bearer $LLM_CF_GATEWAY_AUTH_TOKENon any override client whosebase_urlcontainsgateway.ai.cloudflare.com. SetLLM_CF_GATEWAY_AUTH_TOKENonce globally; the rest is per-componentOVERRIDES__BASE_URL. - Native Gemini works for json_schema. The new
GeminiBackend(src/llm/backends/gemini.py) talks Gemini's native protocol —response_format=json_schemais honored server-side. Route through CF Gateway withbase_url: https://gateway.ai.cloudflare.com/v1/<acct>/<gw>/google-ai-studio(note: NO/openaisuffix — that path was the old OpenAI-compat shim that silently dropped json_schema, deriver/summary used to need workarounds for it). - Native Gemini also fixes
thoughtSignatureround-tripping —src/llm/history_adapters.py:77-78andsrc/llm/executor.py:43-44preserve it across tool iterations. The old "setthinkingBudgetTokens=0for multi-iter tool loops" workaround is no longer needed. - Ollama Cloud routing:
transport: openai+base_url: https://gateway.ai.cloudflare.com/v1/<acct>/<gw>/custom-ollama. Pass the Ollama Cloud key viaMODEL_CONFIG__OVERRIDES__API_KEY_ENV: <env_var_name>so the secret is referenced not duplicated. Note that_uses_max_completion_tokens()insrc/llm/backends/openai.py:21only fires for gpt-5/o-series models — Ollama Cloud chat models stay onmax_tokens. response_format=json_schemastill doesn't work over Ollama Cloud's OpenAI-compat layer. Free-form / tool-call paths are fine; structured-output paths must use a transport whose upstream honors schemas (anthropic, openai/gpt-5+, or gemini-native).- CF AI Gateway remains a transparent proxy. Limitations are upstream-side; the
cf-aig-authorizationheader is the only CF-specific concern in app code.
- Honcho can use LM Studio via
transport: openai+MODEL_CONFIG__OVERRIDES__BASE_URL: http://localhost:1234/v1. - Keep
LLM_OPENAI_API_KEYconfigured for embeddings unless embedding support is added for local models. - For Docker Compose, the per-component
MODEL_CONFIG__OVERRIDES__BASE_URLmust behttp://host.docker.internal:1234/v1, nothttp://localhost:1234/v1. - Pass
MODEL_CONFIG__OVERRIDES__API_KEY: lm-studio(or any non-empty placeholder); LM Studio doesn't validate it. - Current local default model is
qwen2.5-14b-instruct. - When overriding
DIALECTIC_LEVELS__*via env vars, each level needs its full required settings, not justMODEL_CONFIG__TRANSPORTand__MODEL. Include__THINKING_BUDGET_TOKENSandMAX_TOOL_ITERATIONS, and optionallyMAX_OUTPUT_TOKENS. For backups, use the nested__MODEL_CONFIG__FALLBACK__TRANSPORT/__MODELshape. - Docker should own the runtime environment completely. Do not mount the repo onto
/appand do not mount a named volume onto/app/.venv, or the image-built environment can be hidden and replaced with incompatible artifacts. - If Docker services fail with missing Python modules or incompatible native extensions, rebuild the image instead of trying to repair the environment in-place:
docker compose build --no-cache api deriver
docker compose up -d --force-recreate api deriver- Verify LM Studio from the host with:
curl -sS http://localhost:1234/v1/models- Verify LM Studio from Docker with:
docker compose run --rm --entrypoint sh api -lc 'python - <<\"PY\"
import urllib.request
print(urllib.request.urlopen(\"http://host.docker.internal:1234/v1/models\", timeout=5).status)
PY'🚨 DO NOT RUN bun test DIRECTLY. IT WILL NOT WORK. 🚨
The TypeScript SDK tests require a running Honcho server with database and Redis. Running bun test alone will fail immediately because there's no server. The tests are orchestrated via pytest which handles all the infrastructure setup.
The ONLY way to run TypeScript SDK tests:
# From the monorepo root (not from sdks/typescript/)
uv run pytest tests/ -k typescriptTo type-check the TypeScript SDK (this is fine to run directly):
cd sdks/typescript && bun run tsc --noEmit- Follow isort conventions with absolute imports preferred
- Use explicit type hints with SQLAlchemy mapped_column annotations
- snake_case for variables/functions; PascalCase for classes
- Line length: 88 chars (Black compatible)
- Explicit error handling with appropriate exception types
- Docstrings: Use Google style docstrings
- Never hold a DB session during external calls (LLM, embedding, HTTP). If a function needs both a DB session and an external call result, compute the external result first and pass it as a parameter. This avoids tying up DB connections during slow network I/O. Use
tracked_dbfor short-lived, DB-only operations; pass a shared session when multiple DB-only calls can reuse one connection.
Honcho uses three specialized LLM agents that work together to form memories and answer queries:
Role: Memory formation through content ingestion
The Deriver processes incoming messages and extracts observations about peers.
- Trigger: Messages created via API are enqueued for background processing
- Tools:
create_observations,update_peer_card,get_recent_history,search_memory,get_observation_context,search_messages - Output: Explicit observations (direct facts) and deductive observations (inferences)
- Entry point:
src/deriver/agent/worker.py→Agent.run_loop()
Role: Analysis and recall for answering queries
The Dialectic answers questions about peers by strategically gathering context from memory.
- Trigger: API call to
/peers/{peer_id}/chatwithagentic=true - Tools:
search_memory,get_recent_history,get_observation_context,search_messages,get_recent_observations,get_most_derived_observations,get_session_summary,get_peer_card,create_observations(deductive only) - Output: Natural language response grounded in gathered context
- Entry point:
src/dialectic/chat.py→agentic_chat()→DialecticAgent.answer()
Role: Consolidation and self-improvement of memory
The Dreamer explores and consolidates observations to improve memory quality.
- Trigger: Scheduled or explicit dream task via queue
- Tools:
get_recent_observations,get_most_derived_observations,search_memory,create_observations,delete_observations,update_peer_card - Strategy: Random walk exploration - start from recent/high-value observations, search for related content, consolidate redundancies
- Output: Consolidated observations, deleted redundancies
- Entry point:
src/dreamer/agent.py→DreamerAgent.consolidate()
All agents share common infrastructure in src/utils/agent_tools.py:
- Tool definitions: Unified tool schemas used by all agents
- Tool executor:
create_tool_executor()factory creates context-aware executors - LLM client:
honcho_llm_call()handles tool calling loops with configurable iterations
src/
├── main.py # FastAPI app setup with middleware and exception handlers
├── models.py # SQLAlchemy ORM models with proper type annotations
├── schemas.py # Pydantic validation schemas for API
├── config.py # Configuration management
├── db.py # Database connection and session management
├── dependencies.py # Dependency injection (DB sessions)
├── exceptions.py # Custom exception types
├── security.py # JWT authentication
├── embedding_client.py # Embedding service client
├── crud/ # Database operations
│ ├── __init__.py
│ ├── collection.py # Collection CRUD operations
│ ├── deriver.py # Deriver-related CRUD operations
│ ├── document.py # Document CRUD operations
│ ├── message.py # Message CRUD operations
│ ├── peer.py # Peer CRUD operations
│ ├── peer_card.py # Peer Card CRUD operations
│ ├── representation.py # RepresentationManager and representation operations
│ ├── session.py # Session CRUD operations
│ ├── webhook.py # Webhook CRUD operations
│ └── workspace.py # Workspace CRUD operations
├── dialectic/ # Dialectic API implementation
│ ├── __init__.py
│ ├── chat.py # Chat functionality (standard + agentic)
│ ├── prompts.py # Prompt templates
│ └── agent/ # Agentic dialectic implementation
│ ├── __init__.py
│ ├── core.py # DialecticAgent class
│ └── prompts.py # Agent system prompts
├── routers/ # API endpoints
│ ├── workspaces.py
│ ├── peers.py
│ ├── sessions.py
│ ├── messages.py
│ ├── keys.py
│ └── webhooks.py # Webhook endpoints
├── deriver/ # Background processing system
│ ├── __init__.py
│ ├── __main__.py # Deriver entry point
│ ├── consumer.py # Message consumer
│ ├── enqueue.py # Queue operations
│ ├── queue_manager.py # Queue management
│ └── agent/ # Agentic deriver implementation
│ ├── __init__.py
│ ├── core.py # Agent class
│ ├── worker.py # Task processing
│ └── prompts.py # Agent system prompts
├── dreamer/ # Memory consolidation system
│ ├── __init__.py
│ ├── agent.py # DreamerAgent class + process_agent_dream
│ └── dreamer.py # Legacy dreamer (scheduled)
├── utils/ # Utilities
│ ├── __init__.py
│ ├── agent_tools.py # Shared agent tools and executor
│ ├── clients.py # LLM client abstraction
│ ├── files.py # File handling utilities
│ ├── filter.py # Query filtering utilities
│ ├── formatting.py # Message formatting utilities
│ ├── logging.py # Logging and metrics (Rich console output)
│ ├── search.py # Search functionality
│ ├── shared_models.py # Shared data models
│ ├── summarizer.py # Session summarization
│ └── types.py # Type definitions
└── webhooks/ # Webhook system
├── events.py # Webhook event definitions
├── webhook_delivery.py # Webhook delivery logic
└── README.md # Webhook documentation
- Tests in pytest with fixtures in tests/conftest.py
- Use environment variables via python-dotenv (.env)
- All tables use text IDs (nanoid format) as primary keys
- Composite foreign keys for multi-tenant relationships
- Feature flags on workspace, peer, and session levels
- Token counting on messages for usage tracking
- JSONB metadata fields for extensibility
- HNSW indexes for vector similarity search
- Multi-Peer Sessions: Sessions can have multiple participants with different observation settings
- Background Processing: Async queue system for expensive operations
- Provider Abstraction: Model client supports multiple LLM providers
- Scoped Authentication: JWTs can be scoped to workspace, peer, or session level
- Batch Operations: Support for bulk message creation (up to 100 messages)
- Session History: Two-tier summarization (short every 20 messages, long every 60)
- Custom exceptions defined in src/exceptions.py
- Use specific exception types (ResourceNotFoundException, ValidationException, etc.)
- Proper logging with context instead of print statements
- Global exception handlers defined in main.py
- See docs/contributing/error-handling.mdx for details
- Always use
uv runoruvto prefix any commands related to python to ensure you use the virtual environment