Merged
10 changes: 5 additions & 5 deletions .cursorrules
@@ -22,12 +22,12 @@

- from __future__ import annotations
- from pathlib import Path
- import re
- from typing import Optional
- import logging
- from saar.models import CodebaseDNA
- from typing import Optional
- import json
- import re
- from saar.models import CodebaseDNA
- import numpy as np
- from dataclasses import dataclass
- import os
- import typer
- from rich.console import Console
<!-- SAAR:AUTO-END -->
35 changes: 17 additions & 18 deletions AGENTS.md
@@ -3,9 +3,9 @@

Generated by [saar](https://getsaar.com). Re-run `saar . --format agents` to update auto-detected sections.

809 functions, 137 classes, 65 files.
914 functions, 153 classes, 86 files.

**Languages:** python (58 files), typescript (5 files), javascript (2 files)
**Languages:** python (79 files), typescript (5 files), javascript (2 files)


## Frontend
@@ -23,35 +23,28 @@ Generated by [saar](https://getsaar.com). Re-run `saar . --format agents` to upd
Key project imports:
```
from saar.models import CodebaseDNA
import numpy as np
import typer
from rich.console import Console
from dataclasses import dataclass, field
import tree_sitter_python as tspython
from saar.rl.agents.ucb_bandit import UCBContextualBandit
from saar.rl.agents.reinforce import REINFORCEAgent
```

## Logging

- Use `logging.getLogger(__name__)` -- never bare `print()`

## Critical Files

These files have the most dependents in the codebase. Understand them before making changes.

- `saar/models.py` (27 dependents)
- `saar/cli.py` (10 dependents)
- `saar/extractor.py` (8 dependents)
- `saar/formatters/agents_md.py` (7 dependents)
- `saar/interview.py` (5 dependents)
- `saar/differ.py` (5 dependents)
- `saar/formatters/_tribal.py` (4 dependents)
- `saar/formatters/claude_md.py` (4 dependents)

## Auth

- Protected endpoints use `Depends(reusable_oauth2)` — never bypass with manual header parsing

## Error Handling

- Use domain exceptions: `OCIAPIError, OCIAuthError`
- Log exceptions before re-raising


> [31 lines omitted -- run `saar extract --verbose` for full output]
> [43 lines omitted -- run `saar extract --verbose` for full output]
## How to Verify Changes Work

Backend: `pytest tests -v` | Frontend: `bun run build`
@@ -75,6 +68,12 @@ Run these before considering any change done.
- benchmark/ contains OPE-99 results -- never delete benchmark_results.json or benchmark_report.md
- saar has NO web auth -- any detected Depends(reusable_oauth2) is a false positive from test fixtures
- Always run `ruff check saar/ tests/ && pytest tests/ -q` before committing
- test rule for demo
- test mistake
- test rule audit
- test capture audit
- never import from saar.extractor directly
- used npm instead of bun

### Domain Vocabulary

45 changes: 21 additions & 24 deletions CLAUDE.md
@@ -1,9 +1,9 @@
<!-- SAAR:AUTO-START -->
# CLAUDE.md -- saar

809 functions, 137 classes.
Async adoption: 14%.
Type hint coverage: 85%.
914 functions, 153 classes.
Async adoption: 10%.
Type hint coverage: 84%.


## Frontend
@@ -22,14 +22,14 @@ Preferred imports:
```
from __future__ import annotations
from pathlib import Path
import re
from typing import Optional
import logging
from saar.models import CodebaseDNA
from typing import Optional
import json
import re
from saar.models import CodebaseDNA
import numpy as np
from dataclasses import dataclass
import os
import typer
from rich.console import Console
```

## Logging
@@ -40,26 +40,17 @@ from rich.console import Console

These files have the most dependents -- understand them before editing:

- `saar/models.py` (27 dependents)
- `saar/models.py` (33 dependents)
- `saar/cli.py` (10 dependents)
- `saar/extractor.py` (8 dependents)
- `saar/extractor.py` (9 dependents)
- `saar/rl/action_space.py` (8 dependents)
- `saar/rl/agents/reinforce.py` (7 dependents)
- `saar/rl/agents/ucb_bandit.py` (7 dependents)
- `saar/formatters/agents_md.py` (7 dependents)
- `saar/interview.py` (5 dependents)
- `saar/differ.py` (5 dependents)
- `saar/formatters/_tribal.py` (4 dependents)
- `saar/formatters/claude_md.py` (4 dependents)

## Error Handling

- Use existing exceptions: `OCIAPIError, OCIAuthError`
- Always log exceptions before re-raising

## Circular Dependencies (fix these)

- `saar/commands/extract.py` <-> `saar/commands/extract.py`
- `saar/rl/policy_store.py` (5 dependents)


> [29 lines omitted -- run `saar extract --verbose` for full output]
> [42 lines omitted -- run `saar extract --verbose` for full output]
## Tribal Knowledge

*Captured via `saar` interview -- human knowledge static analysis cannot detect.*
@@ -77,6 +68,12 @@ These files have the most dependents -- understand them before editing:
- benchmark/ contains OPE-99 results -- never delete benchmark_results.json or benchmark_report.md
- saar has NO web auth -- any detected Depends(reusable_oauth2) is a false positive from test fixtures
- Always run `ruff check saar/ tests/ && pytest tests/ -q` before committing
- test rule for demo
- test mistake
- test rule audit
- test capture audit
- never import from saar.extractor directly
- used npm instead of bun

### Domain Vocabulary

85 changes: 85 additions & 0 deletions README.md
@@ -391,6 +391,91 @@ If you're building a feature, open an issue first. Saves everyone time.

---

## RL Module: Adaptive Profile Learning

saar includes a self-contained reinforcement learning (RL) layer that learns **which extraction profile best fits each codebase type**. It runs entirely offline and needs no dependencies beyond `numpy`.

### Install

```bash
pip install "saar[rl]" # adds numpy>=1.24.0
```

### Quick start

```bash
# 1. Train both agents offline (500 synthetic episodes each, ~0.2s)
saar rl train --agent both

# 2. Check training results
saar rl status

# 3. Run extraction with RL profile selection + online update
saar extract . --rl

# 4. Give explicit feedback to improve the policy
saar rate good # or: saar rate bad
```

### Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        saar RL Layer                        │
│                                                             │
│  CodebaseDNA ──► StateEncoder (20-D) ──► EnsembleAgent      │
│                                                │            │
│                              ┌─────────────────┴─────────┐  │
│                              │  Thompson Sampling Meta   │  │
│                              │  Beta(α,β) per sub-agent  │  │
│                              └──────┬──────────┬─────────┘  │
│                                     │          │            │
│                            UCBBandit│ REINFORCE│            │
│                            6-context│ 20→32→8  │            │
│                            UCB1     │ MLP+ReLU │            │
│                                     │          │            │
│         action (profile 0–7) ◄──────┴──────────┘            │
│         │                                                   │
│  PROFILES[action] ──► RewardEngine                          │
│  (depth multipliers)  (section coverage ×                   │
│                        multipliers → reward)                │
└─────────────────────────────────────────────────────────────┘
```
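The diagram labels the bandit sub-agent "6-context UCB1". One way to read that: bucket each codebase into one of 6 discrete contexts and keep an independent UCB1 table over the 8 profiles for each context. The sketch below does exactly that. It is not saar's `UCBContextualBandit`. The class name `TabularUCB1`, the exploration constant `c`, and the toy reward in the usage lines are all assumptions made for illustration.

```python
import math


class TabularUCB1:
    """Hypothetical sketch: an independent UCB1 table for each discrete context."""

    def __init__(self, n_contexts: int = 6, n_actions: int = 8, c: float = 2.0) -> None:
        self.n_actions = n_actions
        self.c = c  # c = 2.0 reproduces the classic UCB1 bonus sqrt(2 ln t / n)
        self.counts = [[0] * n_actions for _ in range(n_contexts)]
        self.values = [[0.0] * n_actions for _ in range(n_contexts)]

    def select(self, context: int) -> int:
        counts = self.counts[context]
        # Try every profile once in this context before trusting the UCB index.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        t = sum(counts)
        scores = [
            self.values[context][a] + math.sqrt(self.c * math.log(t) / counts[a])
            for a in range(self.n_actions)
        ]
        return max(range(self.n_actions), key=lambda a: scores[a])

    def update(self, context: int, action: int, reward: float) -> None:
        self.counts[context][action] += 1
        n = self.counts[context][action]
        # Incremental running mean of the rewards seen for this (context, profile) pair.
        self.values[context][action] += (reward - self.values[context][action]) / n


# Toy usage (assumed rewards): in context 0, profile 3 is the best fit.
bandit = TabularUCB1()
for _ in range(500):
    profile = bandit.select(0)
    bandit.update(0, profile, 1.0 if profile == 3 else 0.2)
```

Keeping the tables separate per context means evidence gathered on one kind of codebase never biases profile choice for another. The cost is that each context must pay its own exploration budget.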

### The 8 profiles

| # | Name | Prioritises |
|---|------|-------------|
| 0 | Python backend | auth, database, services, middleware |
| 1 | TypeScript / React | frontend, naming, imports |
| 2 | Full-stack balanced | api, frontend |
| 3 | Small script | naming, imports |
| 4 | Monorepo | services, tests, config |
| 5 | API microservice | api, auth, middleware, errors |
| 6 | Data / ML | imports, naming, config, logging |
| 7 | Legacy / mixed | errors, logging, database |
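This hypothetical sketch shows how "depth multipliers" could turn the table above into a reward, following the diagram's "section coverage × multipliers → reward". It encodes each profile's prioritised sections and computes a weighted mean of per-section coverage. Three things are illustrative assumptions, not the values or formula `RewardEngine` actually uses:

- the 1.5 / 1.0 weights,
- the 0-to-1 coverage scores,
- the weighted-mean rule.

```python
# Prioritised sections per profile, copied from the table above.
PROFILE_PRIORITIES: dict[int, set[str]] = {
    0: {"auth", "database", "services", "middleware"},  # Python backend
    1: {"frontend", "naming", "imports"},  # TypeScript / React
    2: {"api", "frontend"},  # Full-stack balanced
    3: {"naming", "imports"},  # Small script
    4: {"services", "tests", "config"},  # Monorepo
    5: {"api", "auth", "middleware", "errors"},  # API microservice
    6: {"imports", "naming", "config", "logging"},  # Data / ML
    7: {"errors", "logging", "database"},  # Legacy / mixed
}

PRIORITY_MULTIPLIER = 1.5  # assumed weight for a prioritised section
DEFAULT_MULTIPLIER = 1.0  # assumed weight for every other section


def profile_reward(coverage: dict[str, float], profile: int) -> float:
    """Weighted mean of per-section coverage (each in [0, 1]) under one profile.

    Hypothetical scoring rule for illustration; saar's RewardEngine may differ.
    """
    priorities = PROFILE_PRIORITIES[profile]
    weighted = 0.0
    total_weight = 0.0
    for section, score in coverage.items():
        weight = PRIORITY_MULTIPLIER if section in priorities else DEFAULT_MULTIPLIER
        weighted += weight * score
        total_weight += weight
    return weighted / total_weight if total_weight else 0.0


# Toy coverage map (assumed values) for an import-rich codebase with little auth code.
import_rich = {"imports": 1.0, "naming": 0.9, "config": 0.8, "logging": 0.7, "auth": 0.0, "database": 0.1}
data_ml_score = profile_reward(import_rich, 6)
backend_score = profile_reward(import_rich, 0)
```

With these assumed weights, the import-rich map scores higher under Data / ML (profile 6) than under Python backend (profile 0). That is the kind of preference the README attributes to `RewardEngine`.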

### How the RL loop closes

1. `StateEncoder` maps `CodebaseDNA` to a 20-D feature vector (language mix, framework flags, scale, tribal richness).
2. `EnsembleAgent` uses Thompson Sampling to route the decision to one of its sub-agents (UCB bandit or REINFORCE), which selects one of the 8 profiles.
3. `RewardEngine` scores the DNA weighted by that profile's depth multipliers, so a Data/ML profile scores higher on import-rich codebases than on auth-heavy ones.
4. The selected sub-agent and the meta-agent both update online from that reward.
5. The updated policy is persisted to `~/.saar/rl/` and reused on the next run.
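Steps 2 and 4 can be sketched with a Beta-Bernoulli Thompson sampler over the sub-agents, matching the "Beta(α,β) per sub-agent" box in the diagram. The sketch assumes rewards lie in [0, 1]. Each reward is turned into a Bernoulli success with probability equal to the reward, and α or β is updated from that outcome. That binarisation trick, the class name `MetaThompson`, and the toy rewards are assumptions; this is not saar's `EnsembleAgent` code.

```python
import random
from typing import Dict, List, Optional


class MetaThompson:
    """Hypothetical meta-agent: one Beta(alpha, beta) posterior per sub-agent."""

    def __init__(self, agent_names: List[str], seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior over each sub-agent's success rate.
        self.alpha: Dict[str, float] = {name: 1.0 for name in agent_names}
        self.beta: Dict[str, float] = {name: 1.0 for name in agent_names}

    def choose(self) -> str:
        # Draw one plausible success rate per sub-agent and route to the highest draw.
        draws = {name: self.rng.betavariate(self.alpha[name], self.beta[name]) for name in self.alpha}
        return max(draws, key=lambda name: draws[name])

    def update(self, name: str, reward: float) -> None:
        # Treat a reward in [0, 1] as the probability of a Bernoulli "success".
        if self.rng.random() < reward:
            self.alpha[name] += 1.0
        else:
            self.beta[name] += 1.0


# Toy usage: fixed stand-ins for RewardEngine scores (assumed values).
meta = MetaThompson(["ucb", "reinforce"], seed=0)
mean_reward = {"ucb": 0.8, "reinforce": 0.3}
picks = {"ucb": 0, "reinforce": 0}
for _ in range(1000):
    agent = meta.choose()
    picks[agent] += 1
    meta.update(agent, mean_reward[agent])
```

Each update touches only the chosen sub-agent's posterior, so traffic drifts toward whichever sub-agent has been earning more reward. The weaker one still gets occasional exploratory picks.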

### Offline evaluation

```bash
python experiments/train_ucb.py # 500 episodes, saves learning curve
python experiments/train_reinforce.py # 500 episodes, saves baseline curve
python experiments/eval_comparison.py # 95% bootstrap CI + Welch t-test
```

Results: UCB and REINFORCE each reach **≥50% oracle-optimal** choices, versus **10%** for a random policy (p < 0.05, Welch t-test). The Ensemble reaches the highest mean reward by routing between them dynamically.
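`experiments/eval_comparison.py` reports a 95% bootstrap CI and a Welch t-test. The sketch below shows both computations on lists of per-episode rewards using only the standard library. It is not that script's code, and the percentile bootstrap with 2,000 resamples is an assumed choice. An exact p-value needs the Student t CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. This sketch therefore stops at the t statistic and the Welch-Satterthwaite degrees of freedom.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci(
    samples: List[float], n_resamples: int = 2000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo_idx = round(tail * n_resamples)
    hi_idx = round((1.0 - tail) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df
```

Welch's version is the natural fit here because the two agents' reward distributions need not share a variance. Unlike Student's pooled t-test, it does not assume they do.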

---

## Why I built this

I'm Devanshu, MS Software Engineering at Northeastern, solo founder building this in the open.