56 commits
1101416
chore: step-0 cleanup of dead code in audit scope [phase-0]
Apr 14, 2026
5873da1
feat(outcome-signals): add infer_outcome_from_turn heuristic + tests
Apr 14, 2026
2cf70fb
feat(run-agent): record turn-boundary outcomes via infer_outcome_from…
Apr 14, 2026
b6a55c3
feat(reflection): infer NULL outcomes for aged decisions
Apr 14, 2026
3d31cef
test(closed-loop): e2e integration test for outcome -> reflection pip…
Apr 14, 2026
a1ca6ce
fix(run-agent): guard vision_analyze_tool async call against running …
Apr 14, 2026
0e35e86
fix(reflection): convert gather_reflection_input to native async
Apr 14, 2026
d47fea0
test(concurrency): integration test for concurrent gateway + reflection
Apr 14, 2026
4fa8976
feat(delegate): task classifier for compile mode selection
Apr 14, 2026
74e69a5
feat(delegate): parent-compile-once + per-subagent slice fan-out
Apr 14, 2026
8895821
feat(compile): skip re-injection when conversation already carries co…
Apr 14, 2026
793f72c
feat(compile): 5m TTL cache keyed on (project_id, task_hash)
Apr 14, 2026
153b916
test(routing): classifier unit + multi-agent integration
Apr 14, 2026
0f8c59e
feat(reflection): raise MAX_SKILLS_PER_CYCLE to 1 behind eval gate
Apr 14, 2026
9427c39
feat(skills): auto-invoke matching tasks + record skill_outcomes A/B …
Apr 14, 2026
8e85b52
feat(reflection): propose unused-skill deprecation after 30d
Apr 14, 2026
dd2c8c5
feat(reflection): prune reflection_log entries older than 180d
Apr 14, 2026
80e11b8
feat(hipp0-provider): circuit breaker on compile with 3-fail/60s trip…
Apr 14, 2026
e5dffb9
feat(wal): dead-letter 4xx entries; hermes wal status
Apr 14, 2026
f3fc508
feat(memory): stale marker when last compile >30m ago
Apr 14, 2026
18addc4
fix(user-facts): strict key schema, drop fact_key fallback
Apr 14, 2026
18fa617
fix(reflection): asyncio.wait_for(compile, timeout=5)
Apr 14, 2026
e72b930
perf(session-search): FTS5 index + top-10 recent cache
Apr 14, 2026
7a32cb1
perf(trajectory): asyncio.gather + semaphore(10)
Apr 14, 2026
93b2ebe
fix(tokens): centralize token estimation via estimate_messages_tokens…
Apr 14, 2026
33f4be6
test(closed-loop): e2e chain with outcome attribution + failure mode …
Apr 14, 2026
6d8b739
fix(hipp0-wal): race-safe drain + 0o600 perms on WAL/dead-letter
Apr 14, 2026
6d7c334
perf(state): fold prune_sessions N+1 DELETEs into IN-list
Apr 14, 2026
25582e2
chore: remove stale agent directives doc
Apr 14, 2026
3129647
[phase-11] feat(cost): CostGovernor daily budget + auxiliary_client gate
Apr 14, 2026
8581c7a
[phase-12] feat(calibration): outcome-inference drift detector
Apr 14, 2026
fb327df
[phase-13] feat(router): similarity classifier + routing-outcomes fee…
Apr 14, 2026
91f9844
[phase-15] ci: monthly upstream-drift check opens issue on >50 commits
Apr 14, 2026
45be1da
[phase-11] bench: hermulti hot-path perf gate (outcome inferrer + rou…
Apr 14, 2026
296a1b9
[phase-13] feat(api): GET /admin/routing-quality surfaces routing_out…
Apr 14, 2026
cdb72e3
[phase-15] test(closed-loop): parametrized fault-injection variants
Apr 14, 2026
5fa0b05
feat(signals): add extract_decision_signals and wire into turn loop f…
Apr 15, 2026
c9c415d
feat(skills): SkillLoader parses RESOLVER.md + SKILL.md files
Apr 15, 2026
c80d3ba
feat(skills): TriggerMatcher with regex matching and LLM-classifier f…
Apr 15, 2026
0e4e4c2
feat(skills): SkillRunner executes skills via LLM with structured act…
Apr 15, 2026
1036305
feat(skills): SkillDispatcher orchestrates loader/matcher/runner with…
Apr 15, 2026
1d096e7
feat(skills): wire SkillDispatcher into turn loop with regex extracto…
Apr 15, 2026
5ab14a2
fix(gateway): align gateway tests with current handler shapes and gua…
Apr 15, 2026
91ff8e6
fix(tests): auxiliary client + memory_user_id test alignment
Apr 15, 2026
01f9c69
fix(tools): voice/zombie/browser test alignment
Apr 15, 2026
c493d54
fix(tests): task classifier + ctx halving + CLI test alignment
Apr 15, 2026
8374597
docs(tests): add KNOWN_ISSUES documenting croniter skips
Apr 15, 2026
2d996d6
fix(tests): final two pre-existing failures - opencode model list + g…
Apr 15, 2026
ae35589
feat(e2e): agent-level tests - full turn, multi-turn, fault injection
Apr 15, 2026
794785e
feat(e2e): skill dispatcher integration test
Apr 15, 2026
51c92a4
fix(e2e): align memory provider + multi-turn test with hipp0 contract
Apr 15, 2026
67cb776
test: autouse fixture to snapshot/restore os.environ per test (fixes …
Apr 15, 2026
5cda48f
fix(approval): autouse isolation fixture for approval tests
Apr 15, 2026
6442766
fix(approval): snapshot _permanent_approved under _lock before save
Apr 15, 2026
593b0cd
test: snapshot-copy models_dev cache so in-place mutations don't blee…
Apr 15, 2026
ea1b781
test: widen approval wait + bench tolerance for xdist load robustness
Apr 15, 2026
40 changes: 39 additions & 1 deletion .github/workflows/tests.yml
@@ -37,13 +37,21 @@ jobs:
- name: Run tests
run: |
source .venv/bin/activate
- python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
+ python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --ignore=tests/bench --tb=short -n auto
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""

- name: Hot-path bench (Phase 11 perf gate)
# Runs after the main suite so a perf regression lands with a clean
# signal rather than buried in the full-suite summary. Serial run
# avoids xdist's worker variance dominating the p95 samples.
run: |
source .venv/bin/activate
python -m pytest tests/bench/ -o addopts='' --tb=short

e2e:
runs-on: ubuntu-latest
timeout-minutes: 10
@@ -71,3 +79,33 @@ jobs:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""

closed-loop:
# Phase 10 gate: the full task -> subagent -> compile -> outcome ->
# attribution -> re-rank chain must stay green on every push/PR.
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v5

- name: Set up Python 3.11
run: uv python install 3.11

- name: Install dependencies
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"

- name: Run closed-loop integration test
run: |
source .venv/bin/activate
python -m pytest tests/integration/test_closed_loop.py -v
env:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
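
The hot-path bench step above gates on p95 latency and runs serially so that worker variance stays out of the samples. The contents of `tests/bench/` are not shown in this diff, so the following is only a minimal sketch of what such a gate could look like. `p95_ms`, `measure`, `assert_within_budget`, and any budget value are hypothetical names chosen for illustration.

```python
import statistics
import time


def p95_ms(samples_ms):
    """Return the 95th percentile of latency samples given in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def measure(fn, iterations=200):
    """Call fn() repeatedly in the current thread; return per-call latency in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def assert_within_budget(samples_ms, budget_ms):
    """Fail with a readable message when the observed p95 exceeds the budget."""
    observed = p95_ms(samples_ms)
    assert observed <= budget_ms, (
        f"p95 {observed:.3f} ms exceeds budget {budget_ms} ms"
    )
    return observed
```

A bench test written this way would call `assert_within_budget(measure(hot_path), budget_ms=...)`, where `hot_path` and the budget are whatever the real suite defines. Gating on a percentile rather than the mean keeps a single slow outlier from failing the run while still catching a shifted tail.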
106 changes: 106 additions & 0 deletions .github/workflows/upstream-drift-check.yml
@@ -0,0 +1,106 @@
name: Upstream drift check

# Runs on the 1st of every month — a compromise between the quarterly
# cadence in project_hermulti_upstream_sync.md and monthly visibility into
# fast-moving upstream work. Also runnable on demand via workflow_dispatch.
on:
schedule:
- cron: '0 12 1 * *'
workflow_dispatch:

permissions:
contents: read
issues: write

jobs:
drift:
runs-on: ubuntu-latest
steps:
- name: Checkout hermulti
uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Add upstream remote
run: |
git remote add upstream https://github.com/NousResearch/hermes-agent.git || true
git fetch upstream --depth=200

- name: Measure drift from seed
id: drift
run: |
# Seed commit is documented in memory/project_hermulti_upstream_sync.md
# and pinned here. When we do an upstream reseed, bump this to the
# new ancestor and reset the issue-worthy threshold.
SEED=0493bc7

COMMITS_AHEAD=$(git log "$SEED..upstream/main" --oneline 2>/dev/null | wc -l | tr -d ' ')
SHORTSTAT=$(git diff --shortstat "$SEED" upstream/main 2>/dev/null || echo "compare failed")
RECENT=$(git log "$SEED..upstream/main" --oneline 2>/dev/null | head -20)

echo "commits_ahead=$COMMITS_AHEAD" >> "$GITHUB_OUTPUT"
{
echo "shortstat<<EOF"
echo "$SHORTSTAT"
echo "EOF"
echo "recent<<EOF"
echo "$RECENT"
echo "EOF"
} >> "$GITHUB_OUTPUT"

echo "Upstream is ${COMMITS_AHEAD} commits ahead of seed ${SEED}."
echo "Diff shortstat: $SHORTSTAT"

- name: Open issue if drift exceeds threshold
if: ${{ fromJSON(steps.drift.outputs.commits_ahead) > 50 }}
uses: actions/github-script@v7
with:
script: |
const commitsAhead = ${{ steps.drift.outputs.commits_ahead }};
const shortstat = `${{ steps.drift.outputs.shortstat }}`;
const recent = `${{ steps.drift.outputs.recent }}`;
const title = `Upstream drift: ${commitsAhead} commits from NousResearch/hermes-agent`;
const body = [
`The monthly drift check found ${commitsAhead} upstream commits since the`,
`hermulti seed that have not been merged in.`,
``,
`**Diff shortstat (seed..upstream/main):**`,
'```',
shortstat.trim(),
'```',
``,
`**Most recent upstream commits:**`,
'```',
recent.trim(),
'```',
``,
'See ``memory/project_hermulti_upstream_sync.md`` for the merge strategy;',
'the 4 hazard files (``gateway/run.py``, ``cli.py``, ``run_agent.py``,',
'``hermes_cli/main.py``) need hand review before any 3-way reapply.',
``,
'This issue is opened automatically by ``.github/workflows/upstream-drift-check.yml``.',
].join('\n');

const { data: existing } = await github.rest.issues.listForRepo({
owner: context.repo.owner,
repo: context.repo.repo,
state: 'open',
labels: 'upstream-drift',
per_page: 1,
});
if (existing.length > 0) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: existing[0].number,
body: `Re-check on ${new Date().toISOString().slice(0,10)}: still ${commitsAhead} commits ahead.\n\n${body}`,
});
} else {
await github.rest.issues.create({
owner: context.repo.owner,
repo: context.repo.repo,
title,
body,
labels: ['upstream-drift', 'maintenance'],
});
}
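
The `Measure drift from seed` step writes multi-line values to `$GITHUB_OUTPUT` in the `name<<DELIMITER` form with a fixed `EOF` delimiter. As a hedged illustration of that format, the sketch below renders and parses such entries in Python using a random delimiter, so a value containing a line that reads `EOF` cannot end the block early. `format_multiline_output` and `parse_outputs` are hypothetical helpers that are not part of the workflow, and the parser is a simplification for demonstration, not GitHub's actual implementation.

```python
import secrets


def format_multiline_output(name: str, value: str) -> str:
    """Render one multi-line entry as name<<DELIM, the value, then DELIM."""
    delimiter = f"ghadelim_{secrets.token_hex(8)}"
    # Regenerate in the unlikely case the value contains the delimiter as a line.
    while delimiter in value.splitlines():
        delimiter = f"ghadelim_{secrets.token_hex(8)}"
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


def parse_outputs(text: str) -> dict:
    """Parse name=value lines and name<<DELIM blocks into a dict (simplified)."""
    outputs = {}
    lines = text.split("\n")
    i = 0
    while i < len(lines):
        line = lines[i]
        if "<<" in line and "=" not in line.split("<<", 1)[0]:
            # Multi-line entry: collect lines until the closing delimiter.
            name, delimiter = line.split("<<", 1)
            i += 1
            body = []
            while i < len(lines) and lines[i] != delimiter:
                body.append(lines[i])
                i += 1
            outputs[name] = "\n".join(body)
        elif "=" in line:
            # Single-line entry.
            name, value = line.split("=", 1)
            outputs[name] = value
        i += 1
    return outputs
```

With this shape, a commit subject that happens to be the single word `EOF` round-trips intact, which a fixed delimiter cannot guarantee.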
91 changes: 0 additions & 91 deletions CLAUDE.md

This file was deleted.

12 changes: 12 additions & 0 deletions KNOWN_ISSUES.md
@@ -0,0 +1,12 @@
# Known Issues

This document tracks tests that are intentionally skipped, either because they
require external resources or because they exercise removed behavior that is
not currently planned to return.

## Skipped tests

### tests/cron/test_jobs.py (4 tests)
Skipped: require the optional `croniter` package, which is not installed in the
default dev environment. Install `croniter` to run cron-job scheduling tests:
`pip install croniter`.

21 changes: 21 additions & 0 deletions agent/auxiliary_client.py
@@ -510,6 +510,19 @@ def create(self, **kwargs) -> Any:
if temperature is not None:
anthropic_kwargs["temperature"] = temperature

# Cost-ceiling gate: refuse the call if this project is already over
# its daily budget. Best-effort — a missing project id or governor
# failure falls through to the call rather than blocking.
try:
from agent.cost_governor import get_governor, current_project_id, estimate_cost_usd, BudgetExceeded
pid = current_project_id()
if pid:
get_governor().check_budget(pid)
except BudgetExceeded:
raise
except Exception:
pid = None

response = self._client.messages.create(**anthropic_kwargs)
assistant_message, finish_reason = normalize_anthropic_response(response)

@@ -523,6 +536,14 @@ def create(self, **kwargs) -> Any:
completion_tokens=completion_tokens,
total_tokens=total_tokens,
)
# Record spend post-response. Best-effort; any failure here must
# not propagate to the caller — the LLM result is already in hand.
if pid:
try:
cost = estimate_cost_usd(model, prompt_tokens, completion_tokens)
get_governor().record_spend(pid, cost)
except Exception:
pass

choice = SimpleNamespace(
index=0,
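
The two hunks above follow a check-then-record pattern: refuse the call when the project is over budget, and record spend afterwards on a best-effort basis. `agent/cost_governor.py` is not part of the visible diff, so the sketch below is an assumption about how such a gate could behave, not the module's real implementation. The `CostGovernor` internals, the `gated_call` helper, and the in-memory spend table are all hypothetical; only the names `BudgetExceeded`, `check_budget`, and `record_spend` come from the diff.

```python
from collections import defaultdict
from datetime import date


class BudgetExceeded(RuntimeError):
    """Raised when a project has already used up its daily budget."""


class CostGovernor:
    """Hypothetical in-memory governor tracking spend per project per day."""

    def __init__(self, daily_budget_usd: float):
        self.daily_budget_usd = daily_budget_usd
        self._spend = defaultdict(float)  # (project_id, day) -> USD spent

    def check_budget(self, project_id: str) -> None:
        spent = self._spend[(project_id, date.today())]
        if spent >= self.daily_budget_usd:
            raise BudgetExceeded(
                f"{project_id} spent ${spent:.2f} of ${self.daily_budget_usd:.2f}"
            )

    def record_spend(self, project_id: str, cost_usd: float) -> None:
        self._spend[(project_id, date.today())] += cost_usd


def gated_call(governor, project_id, call, cost_of):
    """Check the budget, run the call, then record spend best-effort."""
    if project_id:
        try:
            governor.check_budget(project_id)
        except BudgetExceeded:
            raise  # over budget: refuse the call on purpose
        except Exception:
            project_id = None  # governor failure: fall through, skip accounting
    result = call()
    if project_id:
        try:
            governor.record_spend(project_id, cost_of(result))
        except Exception:
            pass  # the result is already in hand; accounting must not lose it
    return result
```

One design note on this sketch: `BudgetExceeded` is resolved at module scope, so the `except BudgetExceeded` clause can always be evaluated. With a function-local import inside the `try`, a failing import would leave that name unbound when the clause is checked.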