Skip to content

Latest commit

 

History

History
310 lines (236 loc) · 15.6 KB

File metadata and controls

310 lines (236 loc) · 15.6 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Honcho Overview

What is Honcho?

Honcho is an infrastructure layer for building AI agents with memory and social cognition. Its primary purposes include:

  • Imbuing agents with a sense of identity
  • Personalizing user experiences through understanding user psychology
  • Providing a Dialectic API that injects personal context just-in-time
  • Supporting development of LLM-powered applications that adapt to end users
  • Enabling multi-peer sessions where multiple participants (users or agents) can interact

Honcho leverages the inherent reasoning capabilities of LLMs to build coherent models of user psychology over time, enabling more personalized and effective AI interactions.

Core Concepts

Peer Paradigm

Honcho uses a peer-based model where both users and agents are represented as "peers". This unified approach enables:

  • Multi-participant sessions with mixed human and AI agents
  • Configurable observation settings (which peers observe which others)
  • Flexible identity management for all participants

Key Primitives

  • Workspace (formerly App): The root organizational unit containing all resources
  • Peer (formerly User): Any participant in the system (human or AI)
  • Session: A conversation context that can involve multiple peers
  • Message: Data units that can represent communication between peers OR arbitrary data ingested by a peer to enhance its global representation
  • Collections & Documents: Internal vector storage for peer representations (not exposed via API)

Architecture Overview

API Structure

All API routes follow the pattern: /v1/{resource}/{id}/{action}

  • Workspaces: Create, list, update, search
  • Peers: Create, list, update, chat (dialectic), messages, representation
  • Sessions: Create, list, update, delete, clone, manage peers, get context
  • Messages: Create (batch up to 100), list, get, update
  • Keys: Create scoped JWTs

Key Features

Dialectic API (/peers/{peer_id}/chat)

  • Provides bespoke responses informed by the representation
  • Integrates long-term facts from vector storage
  • Supports streaming responses
  • Configurable LLM providers

Message Processing Pipeline

  1. Messages created via API (batch or single)
  2. Enqueued for background processing:
    • representation: Update peer's context
    • summary: Create session summaries
  3. Session-based queue processing ensures order
  4. Results stored internally in vector DB

Configuration

  • Hierarchical config: config.toml + environment variables
  • Database settings with connection pooling
  • Multiple LLM provider support
  • Background worker (deriver) settings
  • Authentication can be toggled on/off

Development Guide

Commands

  • Setup: uv sync
  • Run server: uv run fastapi dev src/main.py
  • Run tests: uv run pytest tests/
  • Run single test: uv run pytest tests/path/to/test_file.py::test_function
  • Linting: uv run ruff check src/
  • Typechecking: uv run basedpyright
  • Format code: uv run ruff format src/

LLM provider routing (current as of 2026-05-04 upstream sync)

The legacy cf and custom provider tags are gone. Transport is Literal["anthropic", "openai", "gemini"] only — see src/llm/registry.py. Per-component routing happens via <COMPONENT>_MODEL_CONFIG__* env vars (Pydantic settings with env_nested_delimiter="__").

  • CF Gateway integration is app-level now, not deployment-level. src/llm/registry.py and src/embedding_client.py auto-inject cf-aig-authorization: Bearer $LLM_CF_GATEWAY_AUTH_TOKEN on any override client whose base_url contains gateway.ai.cloudflare.com. Set LLM_CF_GATEWAY_AUTH_TOKEN once globally; the rest is per-component OVERRIDES__BASE_URL.
  • Native Gemini works for json_schema. The new GeminiBackend (src/llm/backends/gemini.py) talks Gemini's native protocol — response_format=json_schema is honored server-side. Route through CF Gateway with base_url: https://gateway.ai.cloudflare.com/v1/<acct>/<gw>/google-ai-studio (note: NO /openai suffix — that path was the old OpenAI-compat shim that silently dropped json_schema, deriver/summary used to need workarounds for it).
  • Native Gemini also fixes thoughtSignature round-trippingsrc/llm/history_adapters.py:77-78 and src/llm/executor.py:43-44 preserve it across tool iterations. The old "set thinkingBudgetTokens=0 for multi-iter tool loops" workaround is no longer needed.
  • Ollama Cloud routing: transport: openai + base_url: https://gateway.ai.cloudflare.com/v1/<acct>/<gw>/custom-ollama. Pass the Ollama Cloud key via MODEL_CONFIG__OVERRIDES__API_KEY_ENV: <env_var_name> so the secret is referenced not duplicated. Note that _uses_max_completion_tokens() in src/llm/backends/openai.py:21 only fires for gpt-5/o-series models — Ollama Cloud chat models stay on max_tokens.
  • response_format=json_schema still doesn't work over Ollama Cloud's OpenAI-compat layer. Free-form / tool-call paths are fine; structured-output paths must use a transport whose upstream honors schemas (anthropic, openai/gpt-5+, or gemini-native).
  • CF AI Gateway remains a transparent proxy. Limitations are upstream-side; the cf-aig-authorization header is the only CF-specific concern in app code.

Local LM Studio Setup

  • Honcho can use LM Studio via transport: openai + MODEL_CONFIG__OVERRIDES__BASE_URL: http://localhost:1234/v1.
  • Keep LLM_OPENAI_API_KEY configured for embeddings unless embedding support is added for local models.
  • For Docker Compose, the per-component MODEL_CONFIG__OVERRIDES__BASE_URL must be http://host.docker.internal:1234/v1, not http://localhost:1234/v1.
  • Pass MODEL_CONFIG__OVERRIDES__API_KEY: lm-studio (or any non-empty placeholder); LM Studio doesn't validate it.
  • Current local default model is qwen2.5-14b-instruct.
  • When overriding DIALECTIC_LEVELS__* via env vars, each level needs its full required settings, not just MODEL_CONFIG__TRANSPORT and __MODEL. Include __THINKING_BUDGET_TOKENS and MAX_TOOL_ITERATIONS, and optionally MAX_OUTPUT_TOKENS. For backups, use the nested __MODEL_CONFIG__FALLBACK__TRANSPORT / __MODEL shape.
  • Docker should own the runtime environment completely. Do not mount the repo onto /app and do not mount a named volume onto /app/.venv, or the image-built environment can be hidden and replaced with incompatible artifacts.
  • If Docker services fail with missing Python modules or incompatible native extensions, rebuild the image instead of trying to repair the environment in-place:
docker compose build --no-cache api deriver
docker compose up -d --force-recreate api deriver
  • Verify LM Studio from the host with:
curl -sS http://localhost:1234/v1/models
  • Verify LM Studio from Docker with:
docker compose run --rm --entrypoint sh api -lc 'python - <<\"PY\"
import urllib.request
print(urllib.request.urlopen(\"http://host.docker.internal:1234/v1/models\", timeout=5).status)
PY'

SDK Testing

TypeScript SDK

🚨 DO NOT RUN bun test DIRECTLY. IT WILL NOT WORK. 🚨

The TypeScript SDK tests require a running Honcho server with database and Redis. Running bun test alone will fail immediately because there's no server. The tests are orchestrated via pytest which handles all the infrastructure setup.

The ONLY way to run TypeScript SDK tests:

# From the monorepo root (not from sdks/typescript/)
uv run pytest tests/ -k typescript

To type-check the TypeScript SDK (this is fine to run directly):

cd sdks/typescript && bun run tsc --noEmit

Code Style

  • Follow isort conventions with absolute imports preferred
  • Use explicit type hints with SQLAlchemy mapped_column annotations
  • snake_case for variables/functions; PascalCase for classes
  • Line length: 88 chars (Black compatible)
  • Explicit error handling with appropriate exception types
  • Docstrings: Use Google style docstrings
  • Never hold a DB session during external calls (LLM, embedding, HTTP). If a function needs both a DB session and an external call result, compute the external result first and pass it as a parameter. This avoids tying up DB connections during slow network I/O. Use tracked_db for short-lived, DB-only operations; pass a shared session when multiple DB-only calls can reuse one connection.

Agent Architecture

Honcho uses three specialized LLM agents that work together to form memories and answer queries:

1. Deriver Agent (src/deriver/agent/)

Role: Memory formation through content ingestion

The Deriver processes incoming messages and extracts observations about peers.

  • Trigger: Messages created via API are enqueued for background processing
  • Tools: create_observations, update_peer_card, get_recent_history, search_memory, get_observation_context, search_messages
  • Output: Explicit observations (direct facts) and deductive observations (inferences)
  • Entry point: src/deriver/agent/worker.pyAgent.run_loop()

2. Dialectic Agent (src/dialectic/agent/)

Role: Analysis and recall for answering queries

The Dialectic answers questions about peers by strategically gathering context from memory.

  • Trigger: API call to /peers/{peer_id}/chat with agentic=true
  • Tools: search_memory, get_recent_history, get_observation_context, search_messages, get_recent_observations, get_most_derived_observations, get_session_summary, get_peer_card, create_observations (deductive only)
  • Output: Natural language response grounded in gathered context
  • Entry point: src/dialectic/chat.pyagentic_chat()DialecticAgent.answer()

3. Dreamer Agent (src/dreamer/agent.py)

Role: Consolidation and self-improvement of memory

The Dreamer explores and consolidates observations to improve memory quality.

  • Trigger: Scheduled or explicit dream task via queue
  • Tools: get_recent_observations, get_most_derived_observations, search_memory, create_observations, delete_observations, update_peer_card
  • Strategy: Random walk exploration - start from recent/high-value observations, search for related content, consolidate redundancies
  • Output: Consolidated observations, deleted redundancies
  • Entry point: src/dreamer/agent.pyDreamerAgent.consolidate()

Shared Agent Infrastructure

All agents share common infrastructure in src/utils/agent_tools.py:

  • Tool definitions: Unified tool schemas used by all agents
  • Tool executor: create_tool_executor() factory creates context-aware executors
  • LLM client: honcho_llm_call() handles tool calling loops with configurable iterations

Project Structure

src/
├── main.py              # FastAPI app setup with middleware and exception handlers
├── models.py            # SQLAlchemy ORM models with proper type annotations
├── schemas.py           # Pydantic validation schemas for API
├── config.py            # Configuration management
├── db.py                # Database connection and session management
├── dependencies.py      # Dependency injection (DB sessions)
├── exceptions.py        # Custom exception types
├── security.py          # JWT authentication
├── embedding_client.py  # Embedding service client
├── crud/                # Database operations
│   ├── __init__.py
│   ├── collection.py     # Collection CRUD operations
│   ├── deriver.py        # Deriver-related CRUD operations
│   ├── document.py       # Document CRUD operations
│   ├── message.py        # Message CRUD operations
│   ├── peer.py           # Peer CRUD operations
│   ├── peer_card.py      # Peer Card CRUD operations
│   ├── representation.py # RepresentationManager and representation operations
│   ├── session.py        # Session CRUD operations
│   ├── webhook.py        # Webhook CRUD operations
│   └── workspace.py      # Workspace CRUD operations
├── dialectic/            # Dialectic API implementation
│   ├── __init__.py
│   ├── chat.py           # Chat functionality (standard + agentic)
│   ├── prompts.py        # Prompt templates
│   └── agent/            # Agentic dialectic implementation
│       ├── __init__.py
│       ├── core.py       # DialecticAgent class
│       └── prompts.py    # Agent system prompts
├── routers/              # API endpoints
│   ├── workspaces.py
│   ├── peers.py
│   ├── sessions.py
│   ├── messages.py
│   ├── keys.py
│   └── webhooks.py      # Webhook endpoints
├── deriver/             # Background processing system
│   ├── __init__.py
│   ├── __main__.py      # Deriver entry point
│   ├── consumer.py      # Message consumer
│   ├── enqueue.py       # Queue operations
│   ├── queue_manager.py # Queue management
│   └── agent/           # Agentic deriver implementation
│       ├── __init__.py
│       ├── core.py      # Agent class
│       ├── worker.py    # Task processing
│       └── prompts.py   # Agent system prompts
├── dreamer/             # Memory consolidation system
│   ├── __init__.py
│   ├── agent.py         # DreamerAgent class + process_agent_dream
│   └── dreamer.py       # Legacy dreamer (scheduled)
├── utils/               # Utilities
│   ├── __init__.py
│   ├── agent_tools.py   # Shared agent tools and executor
│   ├── clients.py       # LLM client abstraction
│   ├── files.py         # File handling utilities
│   ├── filter.py        # Query filtering utilities
│   ├── formatting.py    # Message formatting utilities
│   ├── logging.py       # Logging and metrics (Rich console output)
│   ├── search.py        # Search functionality
│   ├── shared_models.py # Shared data models
│   ├── summarizer.py    # Session summarization
│   └── types.py         # Type definitions
└── webhooks/            # Webhook system
    ├── events.py        # Webhook event definitions
    ├── webhook_delivery.py # Webhook delivery logic
    └── README.md        # Webhook documentation
  • Tests in pytest with fixtures in tests/conftest.py
  • Use environment variables via python-dotenv (.env)

Database Design

  • All tables use text IDs (nanoid format) as primary keys
  • Composite foreign keys for multi-tenant relationships
  • Feature flags on workspace, peer, and session levels
  • Token counting on messages for usage tracking
  • JSONB metadata fields for extensibility
  • HNSW indexes for vector similarity search

Key Architectural Decisions

  1. Multi-Peer Sessions: Sessions can have multiple participants with different observation settings
  2. Background Processing: Async queue system for expensive operations
  3. Provider Abstraction: Model client supports multiple LLM providers
  4. Scoped Authentication: JWTs can be scoped to workspace, peer, or session level
  5. Batch Operations: Support for bulk message creation (up to 100 messages)
  6. Session History: Two-tier summarization (short every 20 messages, long every 60)

Error Handling

  • Custom exceptions defined in src/exceptions.py
  • Use specific exception types (ResourceNotFoundException, ValidationException, etc.)
  • Proper logging with context instead of print statements
  • Global exception handlers defined in main.py
  • See docs/contributing/error-handling.mdx for details

Notes

  • Always use uv run or uv to prefix any commands related to python to ensure you use the virtual environment