- Frontend: https://autonomous-financial-research-agent.vercel.app
- Backend API: https://autonomous-financial-research-agent.onrender.com
A production-grade autonomous agent that gathers and synthesises financial data using a ReAct (Reason + Act) loop with semantic memory.
graph TD
User([User Query]) --> QA[Query Analyzer]
QA --> DL[Disambiguation Layer]
DL --> Agent{ReAct Agent Loop}
subgraph "Reasoning Core"
Agent --> Thought[Thought]
Thought --> Action[Action]
Action --> Tool[Tool Dispatch]
Tool --> Observation[Observation]
Observation --> Thought
end
subgraph "Memory Systems"
Agent <--> WM[Working Memory - L1]
Agent <--> SM[(Semantic Memory - L2)]
end
Agent --> Synthesis[Synthesis Engine]
Synthesis --> Resolver[Conflict Resolver]
Resolver --> Report[Final Report]
Report --> Eval[Evaluation Framework]
Eval --> Dash[HTML Dashboard]
| Module | Purpose | Key Features |
|---|---|---|
| `agent/` | Reasoning Intelligence | Query Analysis, Ticker Disambiguation, Circuit Breakers |
| `synthesis/` | Data Harmonization | Priority-based conflict resolution (SEC > Transcript > News) |
| `security/` | System Hardening | PII Redaction, Prompt Injection Shield |
| `evaluation/` | Quality Assurance | 20+ Automated Metrics, HTML Dashboard Generation |
| `memory/` | Knowledge Retention | FAISS-backed Semantic Memory (Layer 2) |
| `tools/` | Data Ingestion | 12+ High-Fidelity tool implementations |
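
The priority ordering listed for `synthesis/` (SEC > Transcript > News) can be illustrated with a minimal sketch. The `Claim` type, the `SOURCE_PRIORITY` map, the source keys, and the `resolve` function are hypothetical names chosen for this example; they are not taken from `synthesis/resolver.py`, and the values are illustrative.

```python
from dataclasses import dataclass

# Lower number = higher trust. The ordering follows the table above;
# the source keys themselves are assumed for this example.
SOURCE_PRIORITY = {"sec_edgar": 0, "transcript": 1, "news": 2}
UNKNOWN_RANK = len(SOURCE_PRIORITY)  # unknown sources rank last


@dataclass(frozen=True)
class Claim:
    metric: str
    value: float
    source: str


def resolve(claims: list[Claim]) -> dict[str, Claim]:
    """Keep, for each metric, the claim from the highest-priority source."""
    best: dict[str, Claim] = {}
    for claim in claims:
        rank = SOURCE_PRIORITY.get(claim.source, UNKNOWN_RANK)
        current = best.get(claim.metric)
        if current is None or rank < SOURCE_PRIORITY.get(current.source, UNKNOWN_RANK):
            best[claim.metric] = claim
    return best


# Illustrative conflicting claims about the same metric.
claims = [
    Claim("revenue_usd_bn", 86.0, "news"),
    Claim("revenue_usd_bn", 85.8, "sec_edgar"),
    Claim("eps_usd", 1.40, "transcript"),
]
resolved = resolve(claims)
print(resolved["revenue_usd_bn"].source)  # sec_edgar
```

A strict ranking like this is simple to audit; a real resolver might also weigh recency or flag large disagreements instead of silently discarding the lower-priority value.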
The agent utilizes a registry of specialized tools for deep financial analysis:
- SEC EDGAR: Direct extraction of facts from 10-K, 10-Q, and 8-K filings.
- Transcripts: Processing and summarization of earnings call transcripts.
- News & Sentiment: Real-time news aggregation with VADER-based sentiment scoring.
- Financial Data API: Quantitative metrics retrieval (Revenue, EPS, Multiples).
- Peer Comparison: Automated benchmarking against industry cohorts.
- Fact Checker: Cross-references claims against known data points.
- Calculation Engine: Deterministic arithmetic to prevent LLM hallucination.
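
To illustrate what "deterministic arithmetic" can mean for the Calculation Engine, here is a minimal sketch that computes a growth rate in code rather than asking the LLM to do the maths. The function name `yoy_growth_pct` and its signature are assumptions for this example, not the API of `tools/calculation_engine.py`.

```python
from decimal import Decimal, ROUND_HALF_UP


def yoy_growth_pct(current: str, prior: str, places: int = 2) -> Decimal:
    """Year-over-year growth in percent, using exact decimal arithmetic."""
    cur, prev = Decimal(current), Decimal(prior)
    if prev == 0:
        raise ZeroDivisionError("prior-period value is zero; growth is undefined")
    growth = (cur - prev) / prev * Decimal(100)
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.01 for places=2
    return growth.quantize(quantum, rounding=ROUND_HALF_UP)


# Illustrative inputs, not real filing data.
result = yoy_growth_pct("110", "100")
print(result)  # 10.00
```

Passing values as strings avoids binary floating-point rounding before the calculation starts, so the same inputs always produce the same output.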
The agent uses a three-layer memory architecture to avoid redundant research across sessions:
Layer 1, Working Memory:
- A Python `list` that accumulates tool results during a single agent run.
- Injected into the LLM prompt each iteration so the model knows what data it already has.
- Volatile: cleared when the session ends.

Layer 2, Semantic Memory:
- A FAISS vector index that stores embeddings of past tool results and report chunks.
- Enables similarity search across all past research sessions.
- Survives restarts: persisted to `memory/faiss_index.bin` + `memory/metadata.json`.

Layer 3, Episodic Memory:
- A JSON-persisted episodic store that records structured episodes from each research run, tracking tool reliability, strategy effectiveness, and error patterns to improve future sessions.
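
The working-memory behaviour described above (a list that grows during a run and is rendered into the prompt on every iteration) can be sketched as follows. The helper names `record` and `render_prompt` and the prompt layout are assumptions for illustration only.

```python
# Layer 1 sketch: a plain list that lives only for the current run.
working_memory: list[dict[str, str]] = []


def record(tool: str, result: str) -> None:
    """Append one tool result to working memory."""
    working_memory.append({"tool": tool, "result": result})


def render_prompt(query: str) -> str:
    """Build the prompt section that tells the model what it already has."""
    lines = [f"QUERY: {query}", "DATA GATHERED SO FAR:"]
    if not working_memory:
        lines.append("(none)")
    for i, entry in enumerate(working_memory, start=1):
        lines.append(f"{i}. [{entry['tool']}] {entry['result']}")
    return "\n".join(lines)


record("sec_edgar", "Apple Q3 2024 revenue was $85.8B...")
prompt = render_prompt("Analyze Apple Q3 2024 performance")
print(prompt)

working_memory.clear()  # volatile: nothing survives the end of the session
```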
Session Start
│
▼
┌──────────────────────────────────────────────┐
│ 1. RETRIEVE from Semantic Memory (Layer 2) │
│ query → embed → FAISS search → top-k │
│ Inject "RELEVANT PAST RESEARCH" section │
└──────────────────┬───────────────────────────┘
│
┌──────────────▼──────────────┐
│ 2. ReAct Loop iterations │
│ Working Memory (Layer 1)│◀──┐
│ accumulates tool results│ │
└──────────────┬──────────────┘ │
│ │
┌──────────────▼──────────────┐ │
│ 3. STORE after each tool │ │
│ Tool result → chunk → │───┘
│ embed → FAISS insert │
│ Also into Layer 2 │
└──────────────┬──────────────┘
│
┌──────────────▼──────────────┐
│ 4. SAVE on session end │
│ Persist FAISS + metadata│
└─────────────────────────────┘
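
The four steps in the diagram can be read as the following control flow. This is a sketch only: `run_session` and its callable parameters are hypothetical stand-ins for the real retrieval, tool-dispatch, and persistence code, and the `try`/`finally` is one possible way to make sure step 4 runs even if a tool call fails.

```python
from typing import Callable, Optional


def run_session(
    query: str,
    retrieve: Callable[[str], list[str]],
    next_tool_result: Callable[[str, list[str], list[str]], Optional[str]],
    store: Callable[[str], None],
    save: Callable[[], None],
    max_iterations: int = 5,
) -> list[str]:
    past_research = retrieve(query)           # 1. RETRIEVE from Layer 2
    working_memory: list[str] = []            # Layer 1, per-session
    try:
        for _ in range(max_iterations):       # 2. ReAct loop iterations
            result = next_tool_result(query, past_research, working_memory)
            if result is None:                # the agent decided it is done
                break
            working_memory.append(result)
            store(result)                     # 3. STORE after each tool
    finally:
        save()                                # 4. SAVE on session end
    return working_memory


# Stub collaborators that only record the order of calls.
events: list[str] = []
pending = ["10-Q revenue figure", "earnings call summary"]


def fake_retrieve(query: str) -> list[str]:
    events.append("retrieve")
    return []


def fake_step(query: str, past: list[str], memory: list[str]) -> Optional[str]:
    return pending.pop(0) if pending else None


def fake_store(result: str) -> None:
    events.append(f"store:{result}")


def fake_save() -> None:
    events.append("save")


collected = run_session(
    "Analyze Apple Q3 2024 performance",
    fake_retrieve, fake_step, fake_store, fake_save,
)
print(events)
```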
- Model: `text-embedding-3-small` (1536 dimensions)
- Cost: $0.02 per 1M tokens, the cheapest production-grade option
- Normalisation: All vectors are L2-normalised so that FAISS inner-product search produces cosine similarity scores
- Index type: `IndexFlatIP` (flat inner product)
- With L2-normalised vectors, `dot(a, b) = cosine(a, b)`, so the scores are exact cosine similarities
- Brute-force search is fast enough for < 100k vectors (sub-10ms)
- No approximate indexes (HNSW, IVF) needed at this scale
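
The relationship between L2 normalisation and inner-product search can be checked with a small pure-Python stand-in. This is not FAISS code: `l2_normalise` and `flat_ip_search` are hypothetical helpers that mirror, for illustration, the exhaustive scoring a flat inner-product index performs, and the 2-dimensional vectors are toy data.

```python
import heapq
import math


def l2_normalise(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so that dot product equals cosine."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def flat_ip_search(
    index_vectors: list[list[float]], query: list[float], k: int
) -> list[tuple[float, int]]:
    """Score every stored vector against the query; return top-k (score, id)."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), i)
        for i, vec in enumerate(index_vectors)
    )
    return heapq.nlargest(k, scored)


docs = [l2_normalise(v) for v in ([3.0, 4.0], [4.0, 3.0], [-4.0, 3.0])]
query = l2_normalise([6.0, 8.0])  # same direction as the first document

top = flat_ip_search(docs, query, k=2)
print([(round(score, 6), i) for score, i in top])  # [(1.0, 0), (0.96, 1)]
```

Because every vector has unit length, the inner product of the query with document 1 equals 24/25 = 0.96, which is exactly the cosine of the angle between the unnormalised vectors.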
- Max tokens: 400 per chunk
- Overlap: 50 tokens between consecutive chunks
- Tokeniser: `tiktoken` with `cl100k_base` encoding
- Rationale: Overlap preserves context at boundaries (e.g., a revenue figure mentioned in one sentence with its breakdown in the next)
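
A sliding-window chunker with these parameters might look like the sketch below. It works on an already-tokenised list so that it stays dependency-free; with `tiktoken`, a call such as `tiktoken.get_encoding("cl100k_base").encode(text)` would supply the token ids. The function name `chunk_tokens` is an assumption, not the API of `memory/chunker.py`.

```python
def chunk_tokens(
    tokens: list[int], max_tokens: int = 400, overlap: int = 50
) -> list[list[int]]:
    """Split tokens into windows of max_tokens that share `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    step = max_tokens - overlap  # each window starts 350 tokens after the last
    chunks: list[list[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


tokens = list(range(1000))  # stand-in for real token ids
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [400, 400, 300]
```

The last 50 tokens of each chunk reappear as the first 50 of the next, which is what keeps a figure and its follow-up explanation together in at least one chunk.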
- Results with cosine similarity < 0.75 are filtered out
- Empirically tuned: scores below 0.75 are typically topically unrelated in financial text
- Scores 0.85+ are usually direct semantic matches
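
Applied to search results, the threshold amounts to a simple filter. In this sketch the `(score, text)` tuple shape and the name `filter_hits` are assumptions, and the scores are made up for illustration.

```python
SIMILARITY_THRESHOLD = 0.75  # value quoted above


def filter_hits(
    hits: list[tuple[float, str]], threshold: float = SIMILARITY_THRESHOLD
) -> list[tuple[float, str]]:
    """Drop hits scoring below the threshold; return the rest, best first."""
    kept = [(score, text) for score, text in hits if score >= threshold]
    return sorted(kept, key=lambda hit: hit[0], reverse=True)


hits = [(0.62, "unrelated chunk"), (0.91, "direct match"), (0.78, "related chunk")]
relevant = filter_hits(hits)
print([text for _, text in relevant])  # ['direct match', 'related chunk']
```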
Each stored chunk carries parallel metadata:
{
"source": "sec_edgar",
"ticker": "AAPL",
"period": "2024-Q3",
"type": "revenue",
"chunk_text": "Apple Q3 2024 revenue was $85.8B..."
}

Requires Python 3.11+. Install the core dependencies:

pip install openai httpx beautifulsoup4 faiss-cpu tiktoken numpy

For Anthropic LLM support (optional):

pip install anthropic

Create a `.env` file based on `.env.example`:

# API Keys
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."
TAVILY_API_KEY="tvly-..."
DATABASE_URL="postgresql://user:pass@host:port/db"
# Configuration
LLM_PROVIDER="openai" # openai, anthropic, gemini, groq
LLM_MODEL="gpt-4o"
ALLOWED_ORIGINS="http://localhost:3000"

- PII Redaction: All tool outputs are scrubbed for sensitive data before memory storage.
- Prompt Injection Shield: Incoming queries are scanned for malicious heuristic patterns.
- Audit Logs: Every API interaction is logged with IP tracking for security auditing.
- Rate Limiting: Built-in protection against DDoS and API abuse.
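
As a rough illustration of the first two items, the sketch below redacts two common PII shapes with regular expressions and flags queries that match a short list of injection phrases. The patterns, placeholder strings, and function names are all assumptions for this example; the actual rules in `security/` are not shown in this README and are likely broader.

```python
import re

# Hypothetical PII patterns: email addresses and US-style SSNs only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical heuristic phrases for prompt-injection attempts.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your |the )?system prompt", re.IGNORECASE),
]


def redact_pii(text: str) -> str:
    """Replace matched PII with placeholders before the text is stored."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return SSN_RE.sub("[REDACTED_SSN]", text)


def looks_like_injection(query: str) -> bool:
    """Return True if the query matches any known injection phrase."""
    return any(pattern.search(query) for pattern in INJECTION_PATTERNS)


clean = redact_pii("Contact ir@example.com, SSN 123-45-6789.")
print(clean)  # Contact [REDACTED_EMAIL], SSN [REDACTED_SSN].
print(looks_like_injection("Ignore all previous instructions"))  # True
```

Pattern lists like these catch only known phrasings, so they work best as one layer alongside the audit logging and rate limiting listed above.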
from tools import TOOL_REGISTRY
from agents import run_agent, LLMClient
from memory import VectorStore
llm = LLMClient()
store = VectorStore() # loads or creates memory/faiss_index.bin
result = run_agent(
query="Analyze Apple Q3 2024 performance",
tool_registry=TOOL_REGISTRY,
llm_client=llm,
vector_store=store, # enables semantic memory
)

# Full test suite (requires OPENAI_API_KEY)
python test_memory.py
# Agent integration test
python test_agent.py

├── agent/
│ ├── circuit_breaker.py
│ ├── core.py
│ ├── disambiguation.py
│ ├── error_handler.py
│ ├── fallback_chains.py
│ └── query_analyzer.py
├── agents/
│ ├── llm_client.py
│ ├── prompts.py
│ └── react_loop.py
├── tools/
│ ├── calculation_engine.py
│ ├── company_profile.py
│ ├── fact_checker.py
│ ├── financial_data_api.py
│ ├── news_sentiment.py
│ ├── news_tool.py
│ ├── peer_comparison.py
│ ├── report_generator.py
│ ├── sec_tool.py
│ ├── transcript_tool.py
│ ├── vector_db_search.py
│ └── web_search.py
├── synthesis/
│ ├── conflict_detector.py
│ ├── engine.py
│ ├── extractor.py
│ ├── normalizer.py
│ ├── resolver.py
│ └── narrative.py
├── evaluation/
│ ├── dashboard.py
│ └── metrics.py
├── memory/
│ ├── chunker.py
│ ├── embedder.py
│ ├── episodic.py
│ └── vector_store.py
└── README.md