test: add e2e enforcement scenarios #174
Open
luca-iachini wants to merge 34 commits into
8d0cd33 to 4fd2256
Add the full integration test infrastructure: harness, config, audit utilities, CI workflow, and supporting crate changes. Wire up one scenario (normal_llm_call) to validate the end-to-end flow before the remaining scenarios land in the follow-up PR.
Add 7 scenarios covering the key enforcement policies: block_paste_service, block_unlisted_host, tool_call_exfil, direct_tcp_bypass, fs_read_deny, fs_delete_deny, code_fibonacci.
The supervisor writes a flat AuthorityConfig TOML, but firma authority --config calls load_section(..., "authority"), which expects a section wrapper.
Per-run authority always runs plaintext on loopback. User config may have TLS cert paths and a fixed listen_addr; carrying those into the spawned process causes FRAME_SIZE_ERROR (h2c client vs TLS server). Clear tls config and select an ephemeral loopback port up front.
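The approach described in this commit message (drop the TLS settings, pick an ephemeral loopback port before spawning) could be sketched roughly as follows. This is a minimal, std-only illustration, not the code in this PR: `AuthoritySettings`, its fields, and both function names are hypothetical stand-ins for whatever the real `AuthorityConfig` exposes.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Hypothetical stand-in for the per-run authority settings.
/// Field names are assumptions made for this sketch only.
#[derive(Debug, Clone, PartialEq)]
struct AuthoritySettings {
    listen_addr: SocketAddr,
    tls_cert_path: Option<String>,
    tls_key_path: Option<String>,
}

/// Ask the OS for a free loopback port by binding to port 0 and
/// reading back the address it assigned. The listener is dropped on
/// return, so the port is free again for the spawned process to bind.
fn pick_ephemeral_loopback_addr() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    listener.local_addr()
}

/// Derive per-run settings from user settings: plaintext only,
/// listening on an ephemeral loopback port.
fn sanitize_for_per_run(mut settings: AuthoritySettings) -> io::Result<AuthoritySettings> {
    // Clearing TLS keeps a plaintext (h2c) client from talking to a TLS server.
    settings.tls_cert_path = None;
    settings.tls_key_path = None;
    // Replace any fixed listen address with a fresh loopback one.
    settings.listen_addr = pick_ephemeral_loopback_addr()?;
    Ok(settings)
}
```

One caveat with this pattern: the port is released between the probe and the moment the child process binds it, so another process could in principle take it in that window. For a short-lived local test run that risk is usually considered acceptable.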
e85cd03 to 959bf29
…egration-tests
# Conflicts:
#	Cargo.lock
#	crates/firma-run/src/authority/supervisor.rs
LukeMathWalker
approved these changes
Jun 29, 2026
veeso
approved these changes
Jun 30, 2026
Summary
End-to-end enforcement scenarios driving real agents (`claude`, `codex`) through `firma run`. Each runs two phases: baseline (agent alone, confirms the task is doable) and enforcement (under firma, confirms the expected decision).

Scenarios (the table's remaining cells did not survive extraction; surviving fragments are shown as-is):
- `simple_prompt`
- `allow_http_call`: … `permit` policy both grant it
- `deny_forbidden_http_resource`: `forbid` on the resource UID
- `deny_unclassified_intent`
- `deny_http_call`
- `block_raw_tcp_egress`
- `fs_read_deny`
- `fs_delete_deny`

Run