OpenCodeIntel · DevanshuNEU · Apr 15, 2026 · Apr 14, 2026
diff --git a/.cursorrules b/.cursorrules
@@ -22,12 +22,12 @@
 
 - from __future__ import annotations
 - from pathlib import Path
-- import re
-- from typing import Optional
 - import logging
-- from saar.models import CodebaseDNA
+- from typing import Optional
 - import json
+- import re
+- from saar.models import CodebaseDNA
+- import numpy as np
+- from dataclasses import dataclass
 - import os
-- import typer
-- from rich.console import Console
 <!-- SAAR:AUTO-END -->
diff --git a/AGENTS.md b/AGENTS.md
@@ -3,9 +3,9 @@
 
 Generated by [saar](https://getsaar.com). Re-run `saar . --format agents` to update auto-detected sections.
 
-809 functions, 137 classes, 65 files.
+914 functions, 153 classes, 86 files.
 
-**Languages:** python (58 files), typescript (5 files), javascript (2 files)
+**Languages:** python (79 files), typescript (5 files), javascript (2 files)
 
 
 ## Frontend
@@ -23,35 +23,28 @@ Generated by [saar](https://getsaar.com). Re-run `saar . --format agents` to upd
 Key project imports:
 ```
 from saar.models import CodebaseDNA
+import numpy as np
 import typer
 from rich.console import Console
-from dataclasses import dataclass, field
-import tree_sitter_python as tspython
+from saar.rl.agents.ucb_bandit import UCBContextualBandit
+from saar.rl.agents.reinforce import REINFORCEAgent
 ```
 
 ## Logging
 
 - Use `logging.getLogger(__name__)` -- never bare `print()`
 
-## Critical Files
-
-These files have the most dependents in the codebase. Understand them before making changes.
-
-- `saar/models.py` (27 dependents)
-- `saar/cli.py` (10 dependents)
-- `saar/extractor.py` (8 dependents)
-- `saar/formatters/agents_md.py` (7 dependents)
-- `saar/interview.py` (5 dependents)
-- `saar/differ.py` (5 dependents)
-- `saar/formatters/_tribal.py` (4 dependents)
-- `saar/formatters/claude_md.py` (4 dependents)
-
 ## Auth
 
 - Protected endpoints use `Depends(reusable_oauth2)` — never bypass with manual header parsing
 
+## Error Handling
+
+- Use domain exceptions: `OCIAPIError, OCIAuthError`
+- Log exceptions before re-raising
+
 
-> [31 lines omitted -- run `saar extract --verbose` for full output]
+> [43 lines omitted -- run `saar extract --verbose` for full output]
 ## How to Verify Changes Work
 
 Backend: `pytest tests -v` | Frontend: `bun run build`
@@ -75,6 +68,12 @@ Run these before considering any change done.
 - benchmark/ contains OPE-99 results -- never delete benchmark_results.json or benchmark_report.md
 - saar has NO web auth -- any detected Depends(reusable_oauth2) is a false positive from test fixtures
 - Always run `ruff check saar/ tests/ && pytest tests/ -q` before committing
+- test rule for demo
+- test mistake
+- test rule audit
+- test capture audit
+- never import from saar.extractor directly
+- used npm instead of bun
 
 ### Domain Vocabulary
 

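The Logging and Error Handling rules in the AGENTS.md hunk above (module-level `logging.getLogger(__name__)`, domain exceptions, log before re-raising) could look roughly like this in practice. This is a sketch under assumptions: `OCIAPIError` is defined here only as a stand-in for the exception the rules name, and `fetch_profile` is a hypothetical helper that does not appear in the diff.

```python
import logging

# Module-level logger, as the Logging rule asks for (no bare print()).
logger = logging.getLogger(__name__)


class OCIAPIError(Exception):
    """Stand-in for the domain exception named in AGENTS.md.

    The real definition is not shown in this diff; this class exists
    only so the example is self-contained.
    """


def fetch_profile(payload: dict) -> str:
    """Hypothetical helper: return the profile name or raise a domain error."""
    try:
        return payload["profile"]
    except KeyError as exc:
        # Log first (with traceback), then re-raise as a domain exception.
        logger.exception("payload is missing the 'profile' key")
        raise OCIAPIError("malformed payload") from exc
```

Chaining with `from exc` keeps the original `KeyError` available on `__cause__`, so the log entry and the raised exception point at the same root cause.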
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,9 +1,9 @@
 <!-- SAAR:AUTO-START -->
 # CLAUDE.md -- saar
 
-809 functions, 137 classes.
-Async adoption: 14%.
-Type hint coverage: 85%.
+914 functions, 153 classes.
+Async adoption: 10%.
+Type hint coverage: 84%.
 
 
 ## Frontend
@@ -22,14 +22,14 @@ Preferred imports:
 ```
 from __future__ import annotations
 from pathlib import Path
-import re
-from typing import Optional
 import logging
-from saar.models import CodebaseDNA
+from typing import Optional
 import json
+import re
+from saar.models import CodebaseDNA
+import numpy as np
+from dataclasses import dataclass
 import os
-import typer
-from rich.console import Console
 ```
 
 ## Logging
@@ -40,26 +40,17 @@ from rich.console import Console
 
 These files have the most dependents -- understand them before editing:
 
-- `saar/models.py` (27 dependents)
+- `saar/models.py` (33 dependents)
 - `saar/cli.py` (10 dependents)
-- `saar/extractor.py` (8 dependents)
+- `saar/extractor.py` (9 dependents)
+- `saar/rl/action_space.py` (8 dependents)
+- `saar/rl/agents/reinforce.py` (7 dependents)
+- `saar/rl/agents/ucb_bandit.py` (7 dependents)
 - `saar/formatters/agents_md.py` (7 dependents)
-- `saar/interview.py` (5 dependents)
-- `saar/differ.py` (5 dependents)
-- `saar/formatters/_tribal.py` (4 dependents)
-- `saar/formatters/claude_md.py` (4 dependents)
-
-## Error Handling
-
-- Use existing exceptions: `OCIAPIError, OCIAuthError`
-- Always log exceptions before re-raising
-
-## Circular Dependencies (fix these)
-
-- `saar/commands/extract.py` <-> `saar/commands/extract.py`
+- `saar/rl/policy_store.py` (5 dependents)
 
 
-> [29 lines omitted -- run `saar extract --verbose` for full output]
+> [42 lines omitted -- run `saar extract --verbose` for full output]
 ## Tribal Knowledge
 
 *Captured via `saar` interview -- human knowledge static analysis cannot detect.*
@@ -77,6 +68,12 @@ These files have the most dependents -- understand them before editing:
 - benchmark/ contains OPE-99 results -- never delete benchmark_results.json or benchmark_report.md
 - saar has NO web auth -- any detected Depends(reusable_oauth2) is a false positive from test fixtures
 - Always run `ruff check saar/ tests/ && pytest tests/ -q` before committing
+- test rule for demo
+- test mistake
+- test rule audit
+- test capture audit
+- never import from saar.extractor directly
+- used npm instead of bun
 
 ### Domain Vocabulary
 

diff --git a/README.md b/README.md
@@ -391,6 +391,91 @@ If you're building a feature, open an issue first. Saves everyone time.
 
 ---
 
+## RL Module — Adaptive Profile Learning
+
+saar includes a self-contained reinforcement learning layer that learns **which extraction profile best fits each codebase type**. It runs entirely offline and needs no dependency beyond `numpy`.
+
+### Install
+
+```bash
+pip install "saar[rl]"   # adds numpy>=1.24.0
+```
+
+### Quick start
+
+```bash
+# 1. Train both agents offline (500 synthetic episodes each, ~0.2s)
+saar rl train --agent both
+
+# 2. Check training results
+saar rl status
+
+# 3. Run extraction with RL profile selection + online update
+saar extract . --rl
+
+# 4. Give explicit feedback to improve the policy
+saar rate good   # or: saar rate bad
+```
+
+### Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                        saar RL Layer                        │
+│                                                             │
+│  CodebaseDNA ──► StateEncoder (20-D) ──► EnsembleAgent      │
+│                                               │             │
+│                              ┌────────────────┴───────────┐ │
+│                              │  Thompson Sampling Meta    │ │
+│                              │  Beta(α,β) per sub-agent   │ │
+│                              └──────┬──────────────┬──────┘ │
+│                                     │              │        │
+│                            UCBBandit│     REINFORCE│        │
+│                            6-context│       20→32→8│        │
+│                            UCB1     │      MLP+ReLU│        │
+│                                     │              │        │
+│                              ◄──────┴──────────────┘        │
+│                           action (profile 0–7)              │
+│                                     │                       │
+│            PROFILES[action] ──► RewardEngine                │
+│            (depth multipliers)   (section coverage ×        │
+│                                   multipliers → reward)     │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### The 8 profiles
+
+| # | Name | Prioritises |
+|---|------|-------------|
+| 0 | Python backend | auth, database, services, middleware |
+| 1 | TypeScript / React | frontend, naming, imports |
+| 2 | Full-stack balanced | api, frontend |
+| 3 | Small script | naming, imports |
+| 4 | Monorepo | services, tests, config |
+| 5 | API microservice | api, auth, middleware, errors |
+| 6 | Data / ML | imports, naming, config, logging |
+| 7 | Legacy / mixed | errors, logging, database |
+
+### How the RL loop closes
+
+1. `StateEncoder` maps `CodebaseDNA` → 20-D feature vector (language mix, framework flags, scale, tribal richness)
+2. `EnsembleAgent` selects a profile via Thompson Sampling
+3. `RewardEngine` scores the DNA weighted by that profile's depth multipliers — so a Data/ML profile scores higher on import-rich codebases than on auth-heavy ones
+4. The selected sub-agent and the meta-agent update online
+5. Policy persists to `~/.saar/rl/` for the next run
+
+### Offline evaluation
+
+```bash
+python experiments/train_ucb.py        # 500 episodes, saves learning curve
+python experiments/train_reinforce.py  # 500 episodes, saves baseline curve
+python experiments/eval_comparison.py  # 95% bootstrap CI + Welch t-test
+```
+
+Results: UCB and REINFORCE each achieve **≥50% oracle-optimal** vs **10% random** (p < 0.05, Welch t-test). The Ensemble reaches the highest mean reward by dynamically routing between them.
+
+---
+
 ## Why I built this
 
 I'm Devanshu, MS Software Engineering at Northeastern, solo founder building this in the open.
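
Steps 2 and 4 of the README's "How the RL loop closes" list (the meta-agent picks a sub-agent by Thompson Sampling, then updates online) can be sketched as below. This is a minimal illustration, not saar's implementation: the names `BetaArm`, `ThompsonMeta`, and `simulate`, the arm labels, and the simulated reward rates are all hypothetical. Only the idea of one Beta(α,β) posterior per sub-agent comes from the architecture diagram. Rewards are assumed to lie in [0, 1].

```python
import random
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta(alpha, beta) posterior over one sub-agent's success rate."""
    alpha: float = 1.0  # prior pseudo-count of successes
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, reward: float) -> None:
        # A reward in [0, 1] is treated as a fractional success.
        self.alpha += reward
        self.beta += 1.0 - reward


class ThompsonMeta:
    """Hypothetical meta-agent: routes each episode to one sub-agent."""

    def __init__(self, names, seed=0):
        self.rng = random.Random(seed)
        self.arms = {name: BetaArm() for name in names}

    def choose(self) -> str:
        # Draw one sample per posterior and pick the largest draw.
        draws = {
            name: self.rng.betavariate(arm.alpha, arm.beta)
            for name, arm in self.arms.items()
        }
        return max(draws, key=draws.get)

    def update(self, name: str, reward: float) -> None:
        self.arms[name].update(reward)


def simulate(rounds=500, seed=0):
    # Made-up success rates, used only to drive the simulation.
    true_rates = {"ucb": 0.3, "reinforce": 0.7}
    meta = ThompsonMeta(true_rates, seed=seed)
    env = random.Random(seed + 1)
    counts = dict.fromkeys(true_rates, 0)
    for _ in range(rounds):
        name = meta.choose()
        reward = 1.0 if env.random() < true_rates[name] else 0.0
        meta.update(name, reward)  # online update after every episode
        counts[name] += 1
    return meta, counts


meta, counts = simulate()
print(counts)  # how often each sub-agent was selected
```

Because each choice is a posterior sample rather than a point estimate, the weaker arm still gets occasional pulls early on, and selection concentrates on the stronger arm as evidence accumulates.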