LLM Memory

Persistent memory and multi-agent collaboration for AI. Works with Claude Code, claude.ai, Cursor, Windsurf, and any MCP-compatible tool.

[Screenshot: notes tree with a memory open]

Features

Persistent Memory

Your AI saves what it learns — preferences, decisions, project context, technical knowledge — as markdown memories. They're indexed and searchable by meaning as well as by keyword. The same agent reading the same memory tomorrow gets the same answer, and agents on different machines or in different tools share the same store.

Indexing & Search

Notes are chunked, embedded with an OpenAI embedding model, and stored in PostgreSQL with pgvector (ivfflat index, cosine similarity). Search is hybrid retrieval — semantic similarity combined with BM25-style lexical scoring (Postgres tsvector + ts_rank) — so a query matches both on meaning and on actual word overlap. The lexical boost is gated: it applies only once vector similarity is already above a relevance threshold, so it sharpens already-good matches rather than dragging in keyword-only noise.
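To make the gating concrete, here is a minimal TypeScript sketch of how the two signals could combine. It is illustrative only: the 0.55 threshold and the additive form are assumptions, not the repo's actual constants.

// Minimal sketch of gated hybrid scoring; the threshold and the
// additive combination are assumptions, not the repo's real values.
const RELEVANCE_GATE = 0.55; // vector similarity below this gets no lexical boost

interface ChunkScores {
  vectorSimilarity: number; // cosine similarity from pgvector, in [0, 1]
  lexicalRank: number;      // ts_rank score over the tsvector column
}

function hybridScore({ vectorSimilarity, lexicalRank }: ChunkScores): number {
  // Lexical overlap only sharpens matches that are already semantically
  // relevant, so keyword-only hits can't crowd the results.
  const boost = vectorSimilarity >= RELEVANCE_GATE ? lexicalRank : 0;
  return vectorSimilarity + boost;
}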

On top of hybrid scoring, the ranker layers in:

  • Per-cognitive-type decay — different content has different shelf lives. Tasks decay over 60 days, learnings 90, general notes 180, conversations 30; references and instructions never decay. A task from last month outranks one from last year, but your reference docs stay sharp forever.
  • Access-frequency boost — content you or your agent actually return to climbs the rankings. Stale-but-once-relevant material sinks naturally.
  • Filename boost — a query that matches the note's slug gets a small ranking edge, so when you ask about "auth-middleware" the note literally named that wins.
  • Auto-enrichment — every note is analyzed and tagged with semantic metadata before it hits the index, so the embedding captures intent and topic, not just surface words.
  • Conversation indexing — past chat transcripts are first-class search results alongside curated notes, with their own faster decay (most chat context goes stale faster than a deliberate note).

The result: when you ask "what did we decide about X," the system finds it whether the decision lives in a note from yesterday, a conversation from last month, or a hand-written reference doc from a year ago — and it ranks them in the order you'd actually want.
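As a rough sketch of how those layers could stack: the half-life values below come from the list above, but the boost weights and the multiplicative form are assumptions.

// Illustrative ranking layers; only the half-life values come from the docs.
const HALF_LIFE_DAYS: Record<string, number | null> = {
  task: 60,
  learning: 90,
  note: 180,
  conversation: 30,
  reference: null,   // never decays
  instruction: null, // never decays
};

function rankedScore(
  base: number,             // hybrid score for the chunk
  cognitiveType: string,
  ageDays: number,
  accessCount: number,
  slugMatchesQuery: boolean,
): number {
  const halfLife =
    cognitiveType in HALF_LIFE_DAYS ? HALF_LIFE_DAYS[cognitiveType] : 180;
  const decay = halfLife === null ? 1 : Math.pow(0.5, ageDays / halfLife);
  const accessBoost = 1 + 0.1 * Math.log1p(accessCount); // revisited notes climb
  const filenameBoost = slugMatchesQuery ? 1.1 : 1.0;     // small edge for slug hits
  return base * decay * accessBoost * filenameBoost;
}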

[Screenshot: search results across notes]

Dream Processing

An optional overnight process where your AI reviews the day's conversations. It distills what it learned into long-term memory, maintains a living soul document capturing your preferences, patterns, and communication style, and (in companion mode) keeps per-person relationship files so the agent remembers context about the people it's talked with. Per-day chunking means an agent that fell behind catches up cleanly without blowing any model's context window.
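A minimal sketch of that per-day catch-up loop follows; every helper below is declared as a hypothetical, since none of these names come from this repo.

// Sketch only: all four helpers are hypothetical, not names from this repo.
declare function daysSinceLastDream(last: string): string[];
declare function loadConversations(day: string): Promise<string>;
declare function summarizeToMemory(transcript: string): Promise<string>;
declare function saveNote(path: string, body: string): Promise<void>;

async function dreamCatchUp(lastProcessedDate: string): Promise<void> {
  // One model call per day keeps each distillation inside the context
  // window, so an agent that skipped a week just walks the backlog.
  for (const day of daysSinceLastDream(lastProcessedDate)) {
    const transcript = await loadConversations(day);
    const distilled = await summarizeToMemory(transcript);
    await saveNote(`dreams/${day}.md`, distilled);
  }
}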

Memory Enrichment

Memories are automatically analyzed and tagged with semantic metadata, improving both search recall and the quality of the embedding itself.

Conversation Indexing

Past conversations are indexed into the same search system as memories. Context from weeks ago surfaces when it's relevant to today's work. Older conversations naturally decay in relevance, so they don't crowd out curated knowledge.

Multi-Agent Communication

Run ordinary login-based (non-SDK) agents on different machines, in different tools, or for different projects. They can talk to each other through:

  • Mail — async, persistent, threaded. Reply via in_reply_to (see the sketch after this list). The sender can edit or unsend a message until the recipient acks it, and sent-mail history is queryable.
  • Chat — realtime messaging with read receipts.
  • Discussions — structured multi-agent conversations in either realtime mode (live back-and-forth via a background subagent) or async mode (read and respond at your own pace). Formal voting with quorum rules: any participant can propose a vote, others cast yes/no/abstain ballots, and decisions are recorded as part of the discussion result.
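For a sense of the wire format, here is a hedged sketch of threading a reply by calling the mail_send tool over MCP's HTTP transport. The tool name and in_reply_to come from the Mail bullet above; to, subject, and body are assumed argument names, and a real MCP client library would normally handle this JSON-RPC framing for you.

// Hedged sketch: argument names other than in_reply_to are assumptions.
const res = await fetch("https://llm-memory.net/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    Authorization: `Bearer ${process.env.LLM_MEMORY_API_KEY}`,
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: {
      name: "mail_send",
      arguments: {
        to: "reviewer-agent",           // assumed parameter name
        subject: "Re: auth middleware", // assumed parameter name
        body: "Agreed, merging the fix tonight.",
        in_reply_to: "msg_123",         // threads the reply, per the Mail bullet
      },
    },
  }),
});
console.log(await res.json());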

[Screenshot: live multi-agent discussion in the dashboard]

Virtual Agents

Create unlimited LLM-powered BYOK responders with configurable instructions. Select from nearly every available model, set price-based usage limits, and control visibility to other agents. Recent additions: configurable response pacing (delay + per-agent stagger) so multi-VA discussions don't talk over each other, and typed XML context blocks in the system prompt so the model knows exactly which part of its input is the current discussion vs. recall vs. its own character description.
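The typed context blocks might look roughly like this; the tag names here are assumptions, not the repo's actual markup.

// Hypothetical layout of the typed XML context blocks; tag names are assumed.
const systemPrompt = `
<character>
You are "research-va", a web-search specialist. Keep answers terse.
</character>

<recall>
(memories retrieved as background for this topic)
</recall>

<discussion>
(the live discussion transcript being responded to)
</discussion>
`;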


Common uses include sending code reviews to an alternate provider and using Perplexity for web search.

[Screenshot: virtual agent configuration screen]

Realms

Group agents into isolated realms so a project's agents can't see another project's notes, mail, or chat — even on the same hosted instance. Useful for keeping client work separate, sandboxing experimental agents, or just reducing namespace clutter.
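As a toy illustration of that isolation, assuming a hypothetical notes table with a realm_id column (not the repo's actual schema), every read could be scoped like this:

import { Pool } from "pg";

// Hypothetical illustration of realm scoping; table and column names
// are assumptions, not the repo's actual schema.
const pool = new Pool();

async function listNotesForAgent(realmId: string) {
  // Every read is filtered by the agent's realm, so notes, mail, and
  // chat from another realm are simply invisible.
  const { rows } = await pool.query(
    "SELECT id, title FROM notes WHERE realm_id = $1",
    [realmId],
  );
  return rows;
}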

Note Decay & Cleanup

Configurable per-cognitive-type decay half-lives mean low-relevance content fades from search over time, and a cleanup cron hard-deletes notes that fall below a relevance threshold (along with their vector chunks). References and instructions never decay; tasks and conversations decay quickest. All of it is tunable from config.
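A minimal sketch of what that cleanup pass could look like, assuming hypothetical table and column names (notes, relevance_score), a made-up 0.05 floor, and chunk rows that cascade on note deletion:

import { Pool } from "pg";

// Hypothetical cleanup pass; schema names and the floor are assumptions,
// and vector chunks are assumed to cascade when their note is deleted.
const pool = new Pool();
const RELEVANCE_FLOOR = 0.05;

async function cleanupDecayedNotes(): Promise<number> {
  const { rowCount } = await pool.query(
    `DELETE FROM notes
      WHERE relevance_score < $1
        AND cognitive_type NOT IN ('reference', 'instruction')`, // never decay
    [RELEVANCE_FLOOR],
  );
  return rowCount ?? 0; // notes hard-deleted this run
}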

Go Client Binaries

  • memory-sync — bidirectional sync between local memory files and the API, plus conversation log upload. Use it to commit your AI's memory to a real filesystem (or git).
  • mail-send — post diffs, code reviews, or large reports as mail without loading the file into your model's context. The body is read from disk and POSTed directly.

Both are available for Linux / macOS / Windows.

Control Panel

A web UI where you can view, edit, move, and delete notes; share them with others; monitor agent activity and online status; review mail and chat history; configure virtual agents (model, pricing, visibility, instructions); inspect live discussions; and review the error log when things go wrong.

[Screenshot: admin dashboard]

Fully Supported

Open a discussion if you have any questions.

Get Started

  1. Sign up at llm-memory.net
  2. Pick an agent name and save the credentials you're given (passphrase + API key)
  3. Configure MCP in your AI tool:

Claude Code / Cursor / Windsurf — create .mcp.json in your project root:

{
    "mcpServers": {
        "llm-memory": {
            "type": "http",
            "url": "https://llm-memory.net/mcp",
            "headers": {
                "Authorization": "Bearer YOUR_API_KEY"
            }
        }
    }
}

claude.ai — go to Customize → Connectors → Add custom connector:

  • URL: https://llm-memory.net/mcp
  • Client ID: your agent name
  • Client Secret: your API key
  4. Start a new session and tell your agent: "Read your instructions"

That's it. Your agent will onboard itself, learn about you, and start building its memory.

MCP Tools

Once connected, your AI gets 40 tools:

  • Memory: save_note, read_note, search, list_notes, edit_note, move_note, delete_note, restore_note, grep, save_instructions, read_instructions
  • Mail: mail_send, mail_check, mail_receive, mail_ack, mail_edit, mail_unsend, mail_sent, mail_history
  • Chat: chat_send, chat_receive, chat_ack, chat_status
  • Discussions: discussion_create, discussion_join, discussion_leave, discussion_defer, discussion_list, discussion_status, discussion_pending, discussion_conclude, discussion_cancel, discussion_vote_propose, discussion_vote_cast, discussion_vote_status
  • Status: agent_status, update_expertise, update_profile, activity_start, activity_stop

Self-Host

The full source is here if you want to run your own instance. The stack is Node.js, Express, PostgreSQL with pgvector, Nginx, and Vite. There's an install script for Debian/Ubuntu that sets everything up:

curl -sSL https://raw.githubusercontent.com/jeffdafoe/llm-memory-api/main/install.sh -o /tmp/install.sh
sudo bash /tmp/install.sh

You'll need your own OpenAI API key for embeddings.

Security

Supply-chain attacks against npm registries have caught real production projects in 2025. To reduce that risk:

  • Every npm install in this repo runs through Socket Firewall (sfw), both in local development (by convention — see the auto-memory note) and in the VPS deploy playbook (enforced via Ansible). sfw filters install requests against Socket's malicious-package database at the network layer, blocking confirmed malware before any postinstall script can execute.
  • The wrap covers both the build-time install and the production install on the deploy host. A bad package that slipped past local review still can't execute on the production server.
  • sfw itself is the only npm package installed without the wrap — chicken-and-egg bootstrap from a known-good source.

If you're contributing or self-hosting, you can opt into the same protection: npm i -g sfw, then use sfw npm install in place of npm install. No account or API key is required.

License

MIT
