Audit remediation + autonomous skill dispatcher + LLM decision capture#1
Remove unused typing.Any import in gateway/persistent_agent_router.py.
Simple pure helper that classifies a user turn as "positive"/"negative"/None based on explicit feedback markers. Used by the upcoming turn-boundary record_outcome hook and the reflection NULL-outcome backfill to close the signal loop.
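The marker heuristic can be sketched roughly like this (the marker lists and the negative-wins tie-break are illustrative, not the actual HERMES patterns):

```python
import re
from typing import Optional

# Illustrative feedback markers; the real pattern set lives in HERMES.
_POSITIVE = re.compile(r"\b(thanks|thank you|perfect|great|that worked)\b", re.I)
_NEGATIVE = re.compile(r"\b(wrong|broken|didn't work|useless)\b", re.I)

def infer_outcome_from_turn(user_text: str) -> Optional[str]:
    """Classify a user turn as 'positive'/'negative' on an explicit
    marker; return None for neutral/unknown so the DB column stays NULL."""
    if _NEGATIVE.search(user_text):
        return "negative"   # negative markers win ties in this sketch
    if _POSITIVE.search(user_text):
        return "positive"
    return None
```

Keeping the helper pure (no DB access) is what lets both the turn-boundary hook and the reflection backfill share it.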
…_turn Before this, record_outcome had zero callers in run_agent.py — the outcome column on sessions was only ever set by gateway/telegram reaction handlers, leaving the CLI path (and most sessions) permanently NULL. Now every completed turn tries to infer a coarse positive/negative signal from user feedback markers and persists it via DecisionDB.record_outcome. Wrapped in try/except so outcome recording can never break the turn.
Sessions that stay NULL for more than AGED_NULL_OUTCOME_DAYS (3) never received a reaction signal and are unlikely to. Run the turn-boundary heuristic over the last user message and, when it yields a confident label, persist it so subsequent reflection cycles can actually learn from the session instead of bucketing it under "no outcome recorded". Neutral/unknown cases are left NULL by design.
…eline Exercises all three Phase 1 pieces composed: infer_outcome_from_turn -> SessionDB.record_outcome -> reflection._query_sessions backfill + bucket assignment. Uses a temp state.db via the isolate_hermes_home conftest fixture and monkeypatches reflection._state_db_path to point at it. Also fixes a stale DecisionDB class reference in outcome_signals.py docstring (the class is actually SessionDB).
…loop Calling asyncio.run() from within a thread that already owns a running event loop (the gateway dispatches handle_message via an executor) raises RuntimeError. Mirror the thread-pool bridge pattern used in agent/context_references.py so the Anthropic image fallback works in both sync CLI paths and gateway concurrency.
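A minimal sketch of that bridge pattern, with a hypothetical `_describe_image` coroutine standing in for the real Anthropic fallback call:

```python
import asyncio
import concurrent.futures

async def _describe_image(url: str) -> str:
    # Placeholder for the real async Anthropic fallback request.
    await asyncio.sleep(0)
    return f"description of {url}"

def describe_image_sync(url: str) -> str:
    """Safe whether or not the calling thread owns a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread (plain CLI path): asyncio.run is fine.
        return asyncio.run(_describe_image(url))
    # A loop is already running (gateway executor thread): asyncio.run
    # would raise, so hop to a fresh thread that owns no loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, _describe_image(url)).result()
```

The `.result()` call blocks the calling thread, which is acceptable here because that thread is already an executor worker, not the loop thread itself.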
asyncio.get_event_loop().run_until_complete() inside a coroutine (or any thread with a live loop) is a hard crash under Python 3.12's stricter event-loop policy. Make gather_reflection_input an async function that awaits _try_compile_context directly, and have run_reflection call it rather than duplicating the inline session/tool-usage/memory gather. Top-level cron entry (run_reflection_job) still owns the only asyncio.run() call — the ticker dispatches it from a worker thread with no running loop, which is the supported pattern.
Exercises the two async-in-sync hazards Phase 2 fixed: * _describe_image_for_anthropic_fallback called directly from a live event loop (plus 10-way asyncio.to_thread fan-out) — would raise "asyncio.run() cannot be called from a running event loop" without the thread-pool bridge. * gather_reflection_input awaited from inside a running loop — would raise via get_event_loop().run_until_complete() without the async conversion. Verified the suite fails against HEAD~2 (pre-fix) and passes against HEAD.
Adds classify_task() heuristic that routes delegate invocations to the cheapest compile mode: skip-compile for self-contained tasks, technical/full compile for debugging, user/fast for preferences, and a default fast compile otherwise. Wired into invoke() so the right context reaches the subagent without always hitting the broad full-compile path.
Adds PersistentDelegateTool.invoke_batch: a single broad compile against the joined tasks, then per-subagent slicing of the returned decisions via token-overlap scoring. Each subagent's invoke() accepts a precompiled CompiledContext so the broad-compile output can be passed through without re-hitting /api/compile. Same-project guard: mixed-project batches fall back to the unchanged per-task parallel path to avoid cross-project context leakage. The natural run_agent.py fan-out site (_execute_tool_calls_concurrent) uses a thread pool over sync tool handlers; wiring the async batch path through it is out of scope for this commit. Callers and the integration test exercise invoke_batch directly; the TTL cache in the next commit absorbs the same fan-out when the batch path isn't used.
…ntent Adds _drop_redundant_compiled: for each compiled decision, compute token-overlap against the tail of the parent's recent messages; drop decisions already carried by the conversation (>=80% token overlap). If the whole compile is redundant, zero it out so the prompt block doesn't nag the model with duplicates. Wired into invoke() via parent_agent._session_messages or an explicit recent_messages kwarg.
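A crude illustration of the token-overlap filter (the function names, bag-of-words tokenization, and 5-message tail window are hypothetical simplifications of `_drop_redundant_compiled`):

```python
def _token_overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that also appear in b (crude bag-of-words)."""
    ta = set(a.lower().split())
    if not ta:
        return 0.0
    tb = set(b.lower().split())
    return len(ta & tb) / len(ta)

def drop_redundant(decisions: list[str], recent_messages: list[str],
                   threshold: float = 0.8) -> list[str]:
    """Drop compiled decisions already carried by the conversation tail."""
    tail = " ".join(recent_messages[-5:])  # tail window size is illustrative
    return [d for d in decisions if _token_overlap(d, tail) < threshold]
```

If the returned list is empty, the whole compile block is omitted, matching the "zero it out" behavior described above.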
In-memory dict keyed on (project_id, sha256(task)[:16], fast_mode, namespace) with a 300s TTL. Consulted before provider.compile() and populated with non-degraded results only (avoids pinning the agent in degraded mode past the 5m window). Absorbs N-subagent fan-out when invoke_batch isn't used.
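A minimal sketch of such a cache — the key shape and 300s TTL follow the description above; the class name and `degraded` flag are illustrative:

```python
import hashlib
import time

class CompileCache:
    """In-process TTL cache consulted before provider.compile()."""

    def __init__(self, ttl: float = 300.0):
        self._ttl = ttl
        self._store: dict[tuple, tuple[float, object]] = {}

    @staticmethod
    def _key(project_id: str, task: str, fast_mode: bool, namespace: str):
        # (project_id, sha256(task)[:16], fast_mode, namespace)
        digest = hashlib.sha256(task.encode()).hexdigest()[:16]
        return (project_id, digest, fast_mode, namespace)

    def get(self, project_id, task, fast_mode, namespace):
        key = self._key(project_id, task, fast_mode, namespace)
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[0] < self._ttl:
            return hit[1]
        self._store.pop(key, None)  # expired or missing
        return None

    def put(self, project_id, task, fast_mode, namespace, result,
            degraded=False):
        if degraded:
            return  # never pin degraded output for the full 5m window
        key = self._key(project_id, task, fast_mode, namespace)
        self._store[key] = (time.monotonic(), result)
```

Hashing the task keeps keys bounded in size while still distinguishing distinct task texts.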
tests/test_task_classifier.py: pure unit coverage for classify_task (self-contained skip, technical/user namespace routing, default). tests/integration/test_multiagent_routing.py: spawns 3 heterogeneous delegate tasks via invoke_batch, asserts exactly 1 compile call reaches the mock HIPP0, each subagent gets a task-relevant slice, user_facts propagate to every subagent, and a separate case that verifies the 5m TTL cache absorbs a repeat compile.
Skills can now be auto-created from reflection proposals, capped at 1 per cycle and gated by an evidence eval: the candidate must be anchored to at least one NEGATIVE-outcome session in the lookback window that mentions a topic token from the proposed skill name/hint. Proposals without prior-failure evidence are logged as skill_eval_gate_failed and skipped.
…delta After a skill is created by the reflection gate, wire it into the outcome ledger so its value can be measured:
- Baseline: same-topic outcomes in the 7d BEFORE creation are written as kind='baseline' rows in a new skill_outcomes table (created on demand).
- Matches: up to 3 recent sessions whose first-user-message contains a topic token are written as kind='match' rows.
- New record_skill_outcome_for_session hook lets the outcome pipeline append kind='post' rows so an A/B delta can be computed later.
An A/B summary (baseline total / positive / negative / ratio) is appended to the reflection log at registration time.
Add _propose_unused_skill_deprecation(), called once per reflection cycle. Cross-references skill SKILL.md mtimes against the reflection input's tool_usage map: skills with mtime >30d old whose name tokens do not appear in any recently-used tool are logged as skill_deprecation_proposal entries in the reflection log. Never auto-deletes — a human reviews the log.
Add _prune_reflection_log() invoked at the end of each reflection cycle. Reads the per-agent reflection_log.jsonl, drops JSON entries whose timestamp is older than REFLECTION_LOG_RETENTION_DAYS (180), and rewrites atomically via a .tmp sibling. Fail-open for malformed lines or entries missing a timestamp — we keep them rather than silently destroying rows we can't parse.
… + 2m cooldown Prevent pileup on dead HIPP0 by short-circuiting compile() to degraded-mode while OPEN. CLOSED -> OPEN after 3 unavailable events inside a 60s sliding window; HALF_OPEN probe after 2m cooldown; CLOSE on probe success; re-OPEN on probe failure.
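The state machine can be sketched as follows (thresholds from the description above; the injectable clock is for testing only):

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN after 3 failures in a 60s sliding window;
    HALF_OPEN probe allowed after a 2m cooldown; probe success closes,
    probe failure re-opens."""
    WINDOW, THRESHOLD, COOLDOWN = 60.0, 3, 120.0

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures: list[float] = []
        self._opened_at = None

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # CLOSED
        # OPEN: only let a probe through once the cooldown has elapsed
        return self._clock() - self._opened_at >= self.COOLDOWN

    def record_failure(self) -> None:
        now = self._clock()
        if self._opened_at is not None:
            self._opened_at = now  # probe failed: re-OPEN, restart cooldown
            return
        self._failures = [t for t in self._failures if now - t < self.WINDOW]
        self._failures.append(now)
        if len(self._failures) >= self.THRESHOLD:
            self._opened_at = now  # CLOSED -> OPEN

    def record_success(self) -> None:
        self._failures.clear()
        self._opened_at = None  # probe succeeded: CLOSE
```

While `allow_request()` is False, compile() short-circuits straight to degraded mode instead of queueing on the dead backend.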
4xx replay failures are contract bugs — don't silently drop. Move them to dead_letter.jsonl next to pending.jsonl with timestamp, status, and error body for operator inspection. Dead-lettered entries are never retried. Adds `hermes wal status` CLI showing per-agent WAL depth, dead-letter depth, and oldest entry age.
Prepend `[STALE MEMORY: last successful compile {N}m ago]` to the rendered compile block when the circuit breaker is OPEN or the last successful compile is older than 30 minutes, so the model knows recall may be out of date.
The `f.get("key") or f.get("fact_key")` fallback masked a HIPP0 contract bug: legacy `fact_key`/`fact_value` entries kept rendering under the current endpoint shape. Require `key` strictly; log-and-drop malformed entries so contract violations surface.
Cap the compile-context fetch in gather_reflection_input at 5s so a slow/dead HIPP0 can't stall the reflection pipeline. On timeout, log and proceed without compiled context.
Bump schema to v9: rebuild messages_fts with session_id, role and timestamp UNINDEXED so session/role filters can be served directly from the FTS layer instead of round-tripping to the messages table. Migration drops the old virtual table and triggers, recreates them with the new column set, and repopulates from messages. Add a 5-minute TTL in-process cache on list_sessions_rich() for the top-10 recent sessions hot path (offset=0, limit<=10). Dashboard polling absorbs the churn without hammering SQLite.
Add TrajectoryCompressor.compress_many_async(entries) for ad-hoc batch callers: runs process_entry_async for every entry concurrently behind an asyncio.Semaphore(10) via asyncio.gather, preserving input order. Caps outbound LLM summarization fan-out independently of max_concurrent_requests (which governs the full-directory pipeline).
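The order-preserving bounded fan-out reduces to a standard pattern — `asyncio.gather` returns results in argument order even though the semaphore reorders execution (sketch; names follow the description above):

```python
import asyncio

async def compress_many_async(entries, process_entry_async, limit: int = 10):
    """Run process_entry_async over every entry concurrently, with at
    most `limit` in flight, returning results in input order."""
    sem = asyncio.Semaphore(limit)

    async def _one(entry):
        async with sem:
            return await process_entry_async(entry)

    # gather preserves argument order regardless of completion order
    return await asyncio.gather(*(_one(e) for e in entries))
```

Because the semaphore lives inside this function, the cap is independent of whatever concurrency limit governs the full-directory pipeline.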
…_rough Route all ad-hoc `len(x) // 4` token-estimation sites through the existing `agent.model_metadata.estimate_tokens_rough` / `estimate_messages_tokens_rough` helpers so capacity, compression, and cost-estimate math share one source of truth. Sites swapped:
- trajectory_compressor.count_tokens() tokenizer fallback
- agent/prompt_builder.estimate_prompt_tokens()
- agent/hipp0_memory_provider degraded-compile fallback
- tools/skills_tool._estimate_tokens()
- gateway/platforms/web_platform (transcript/soul/input/output estimates)
- scripts/sample_and_compress tokenizer fallback
Adds tests/test_token_estimation.py with fixture messages and sensible-range assertions.
…+ CI gate Phase 10 final gate. Extends tests/integration/test_closed_loop.py to cover the full task -> subagent -> compile -> outcome inference -> record_outcome -> reflection backfill -> recompile re-ranks chain end-to-end, using a FakeHipp0Provider that records every call and simulates the hipp0-side trust-multiplier effect to prove HERMES-side wiring. Adds two failure-mode tests:
- record_outcome silently dropped -> second compile keeps baseline ranking
- provider.record_outcome raises -> caller surfaces the error
Adds a dedicated closed-loop CI job to .github/workflows/tests.yml so a regression fails with a clear signal independent of the main test suite.
_drain_wal() read the WAL into memory then rewrote it with write_text() — a concurrent _wal_append between those two steps would be silently clobbered. Serialize both operations under an asyncio.Lock and use atomic replace for rewrites. WAL and dead_letter.jsonl hold full conversation payloads but were created with umask defaults (commonly 0o644), exposing memory to other local users on shared hosts. Create both via os.open() with mode 0o600. Tests added:
- test_wal_files_are_mode_0o600 — permission regression
- test_drain_rewrite_preserves_concurrent_append — race regression
prune_sessions_older_than() issued one DELETE FROM messages + one DELETE FROM sessions per session_id in a Python loop. Replace with two IN-list statements so a 10k-session prune collapses from 20k round-trips to 2.
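The batched delete is a straightforward pair of parameterized IN-lists (table and column names assumed from the description). One caveat: very large ID lists can exceed SQLite's per-statement parameter limit on older builds, so a real implementation may need to chunk:

```python
import sqlite3

def prune_sessions(conn: sqlite3.Connection, session_ids: list) -> None:
    """Delete messages and sessions for the given IDs in two statements
    instead of 2*N single-row round-trips."""
    if not session_ids:
        return
    placeholders = ",".join("?" * len(session_ids))
    conn.execute(
        f"DELETE FROM messages WHERE session_id IN ({placeholders})",
        session_ids)
    conn.execute(
        f"DELETE FROM sessions WHERE session_id IN ({placeholders})",
        session_ids)
    conn.commit()
```

Only the placeholder count is interpolated into the SQL; the IDs themselves stay bound parameters, so there is no injection surface.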
…ter) Pure-CPU microbenches on the per-turn classifiers: perf_counter sampler with baseline JSON in tests/bench/budgets.json, 1.4x tolerance. CI runs serial (no xdist) so worker variance doesn't pollute p95. HERMES_BENCH_UPDATE=1 reseeds the baseline.
…comes Handler aggregates the routing-outcomes JSONL and returns per-class decision_count / outcome distribution / positive_rate. Auth-gated via the existing _check_auth path so it respects HERMES_API_TOKEN. Consumed by Phase 13's nightly threshold-tuning job and on-demand dashboards; returns 503 when tools.routing_outcomes can't be imported (e.g. trimmed-down distribution) rather than taking the server down.
Adds FaultyHipp0Provider with switchable compile/record faults:
- compile: hipp0_500, circuit_open, budget_exceeded (BudgetExceeded)
- record: wal_full (OSError ENOSPC), circuit_open
Three new parametrized tests:
1. compile faults must raise a distinguishable exception the turn loop can match on (no silent swallowing).
2. record_outcome faults on WAL / circuit must propagate typed errors, while the local SessionDB record path stays functional so the turn loop keeps making progress with the remote down.
3. recovery test: after a transient compile fault, the next compile succeeds (guards against sticky-failure regressions).
13/13 closed-loop tests green (8 existing + 5 new).
…or passive decision capture Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… priority ordering
…r as fallback Adds agent/skills/llm_adapter.py (AuxiliaryLLMAdapter) bridging the dispatcher's minimal LLMClient Protocol to hermulti's auxiliary_client primitives. In run_agent.py, the existing regex-only decision-signal capture now first tries the SkillDispatcher (fire-and-forget OUTBOUND_MESSAGE event) and falls back to extract_decision_signals when the dispatcher is disabled or no LLM is configured.
…rd matrix SyncError
All in-scope test directories (bench, cron, e2e, environments, honcho_plugin, plugins) pass cleanly. Document the 4 cron tests that skip when the optional croniter package is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ateway progress topics
- test_models_dev: add autouse fixture to save/restore the module-global _models_dev_cache. It was being overwritten with SAMPLE_REGISTRY, breaking downstream opencode-go detection via list_authenticated_providers.
- test_run_progress_topics: pre-load tools.terminal_tool so the tool registry is populated before gateway.run emits progress. Previously the emoji depended on test ordering (registry loaded vs not).
- test_internal_event_bypass_pairing: redirect gateway.pairing.PAIRING_DIR to tmp_path so the _rate_limits.json state does not leak between tests via the real ~/.hermes/platforms/pairing directory.
Companion test to hipp0ai e2e scenario 04. Wires a RecordingLLM returning a record_decision action and a RecordingProvider, then dispatches an OUTBOUND_MESSAGE and asserts the action propagates to provider.record_decision() with the expected payload. Lives under tests/integration/ so the fast unit loop can skip it via collection-time filtering; relies on /root/audit/hipp0ai/skills/ being mounted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Hipp0MemoryProvider.record_decision was POSTing {title, content, ...} to /api/decisions. hipp0 rejects this: the route requires description (not content) and project_id. Without these, every captured decision silently swallowed a 400 in the try/except and returned False.
- test_multi_turn_conversation used a literal string for session_id and `content` on the decision payload. hipp0 enforces that session_id is a UUID, so the test always failed. Register a hermes agent, start a real session to obtain a UUID, and use `description` on the decision.
Wipes tools.approval module-level dicts (_gateway_queues, _gateway_notify_cbs, _session_approved, _permanent_approved, _pending) before and after every approval-related test in tests/gateway/ so xdist workers cannot observe torn state from sibling runs. tools/approval.py already serializes every mutation through _lock, so thread-safety of the module itself is already covered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
save_permanent_allowlist() iterates via list(patterns) outside the lock. If another thread calls approve_permanent() during that iteration, CPython raises "Set changed size during iteration". Three call sites (check_dangerous_command, check_all_command_guards x2) previously passed the live set; now they copy under _lock first. Completes the thread-safety audit started in 5cda48f. The autouse isolation fixture already covers cross-test state pollution under xdist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
All hermulti-side work for the hipp0 GBrain-tier program. Scope expanded from initial audit remediation to include Phase 2 (signal detector) and Phase 5 (autonomous skill dispatcher, Tier B).
Audit remediation (earlier commits)
record_outcome, outcome_signals.py, reflection NULL-outcome backfill, closed-loop integration tests
Phase 2: signal detector (1 commit)
feat(signals): extract_decision_signals() + DecisionSignal dataclass added to agent/outcome_signals.py; record_decision() added to Hipp0MemoryProvider; wired into the turn loop with a try/except fallback. Regex-based (5 patterns for explicit decisions, rejections, confidence inference). 5 tests.
Phase 5: autonomous skill dispatcher (Tier B, 5 commits)
Upgrades from regex-only signal capture to a full skill execution framework driven by an LLM with structured-output actions.
- feat(skills): SkillLoader parses RESOLVER.md + SKILL.md files. Custom YAML frontmatter parser (no external dep). Reads from HIPP0_SKILLS_DIR or defaults to /root/audit/hipp0ai/skills.
- feat(skills): TriggerMatcher — regex-first matching of events against skill triggers, with optional LLM-classifier fallback for ambiguous events. Pre-compiles trigger phrases to EventType tags and literal-substring regexes.
- feat(skills): SkillRunner — builds the LLM prompt from skill body + event, parses structured JSON actions, dispatches to hipp0_provider.record_decision / record_outcome / log / noop. Fully typed via LLMClient and Hipp0ProviderProto Protocols — no hard coupling to concrete classes.
- feat(skills): SkillDispatcher orchestrator with priority ordering: brain-ops READ phase fires FIRST on PRE_TASK events; signal-detector runs in PARALLEL (fire-and-forget, never blocks) on INBOUND/OUTBOUND messages; brain-ops WRITE phase fires LAST on POST_DECISION / POST_OUTCOME.
- feat(skills): Turn loop integration — _get_skill_dispatcher() lazy-init helper on AIAgent. When enabled, dispatches SkillEvent(OUTBOUND_MESSAGE) via fire-and-forget asyncio.create_task(). Regex fallback preserves Phase 2 semantics when the dispatcher is off.
- Uses hermulti's existing auxiliary_client.py via AuxiliaryLLMAdapter (prefers async clients, falls back to sync call_llm through a thread executor).
Config flags
- HIPP0_SKILL_DISPATCHER=auto|on|off (default: auto, enabled iff an LLM is configured)
- HIPP0_SKILL_LLM_PROVIDER=anthropic|codex|openai-codex (default: probe all)
- HIPP0_SKILLS_DIR (default: /root/audit/hipp0ai/skills)
Test plan
- pytest tests/skills/ — 122 tests covering loader (5), matcher (9), runner (8), dispatcher (8) + integration
- pytest tests/agent/test_decision_signals.py — 5 Phase 2 regex tests
- py_compile run_agent.py parses cleanly after the integration edit
Companion PR
Paired with hipp0ai/#3 which adds the hipp0-side GBrain-tier work (phases 1-5).