test: add e2e enforcement scenarios #174

Open: luca-iachini wants to merge 34 commits into main from fir-368-integration-tests

luca-iachini (Contributor) commented Jun 19, 2026

Summary

End-to-end enforcement scenarios that drive real agents (claude, codex) through firma run. Each scenario runs in two phases: a baseline phase (the agent alone, confirming the task is doable) and an enforcement phase (the agent under firma, confirming the expected decision).

| Scenario | What it checks |
|---|---|
| simple_prompt | ALLOW plain chat: no protected action, agent completes |
| allow_http_call | ALLOW outbound HTTP when capability + permit policy both grant it |
| deny_forbidden_http_resource | DENY: explicit forbid on the resource UID |
| deny_unclassified_intent | DENY: unclassified request (no mapping rule) fails closed |
| deny_http_call | DENY: mapped action class the agent holds no capability for |
| block_raw_tcp_egress | BLOCK: raw TCP socket egress refused by the sandbox |
| fs_read_deny | DENY filesystem read of a secret outside allowed scope |
| fs_delete_deny | DENY filesystem delete of a protected file |
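
The DENY rows above differ mainly in which check rejects the request. As a rough sketch of how such a fail-closed decision chain could be ordered (this is not firma's actual implementation; the types, the `classify` mapping, the action-class strings, and the order of the checks are all assumptions made for illustration):

```rust
use std::collections::HashSet;

/// Outcome of an authorization check in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny(&'static str),
}

/// Hypothetical request: an intent label plus the resource UID it targets.
struct Request<'a> {
    intent: &'a str,
    resource: &'a str,
}

/// Hypothetical agent state: capabilities held plus permit/forbid policy sets.
struct Agent<'a> {
    capabilities: HashSet<&'a str>,
    permitted: HashSet<&'a str>,
    forbidden_resources: HashSet<&'a str>,
}

/// Maps an intent to an action class. `None` means no mapping rule matched.
/// The rules here are invented for the example.
fn classify(intent: &str) -> Option<&'static str> {
    match intent {
        "http.get" => Some("net.http"),
        "fs.read" => Some("fs.read"),
        _ => None,
    }
}

fn decide(agent: &Agent, req: &Request) -> Decision {
    // Fail closed: a request with no mapping rule is denied, not allowed.
    let Some(class) = classify(req.intent) else {
        return Decision::Deny("unclassified intent");
    };
    // An explicit forbid on the resource wins over any permit.
    if agent.forbidden_resources.contains(req.resource) {
        return Decision::Deny("resource explicitly forbidden");
    }
    // The agent must hold a capability for the mapped action class.
    if !agent.capabilities.contains(class) {
        return Decision::Deny("missing capability");
    }
    // A permit policy must also grant the action class.
    if !agent.permitted.contains(class) {
        return Decision::Deny("no permit policy");
    }
    Decision::Allow
}
```

In this sketch, allow is reached only when every check passes, which mirrors the allow_http_call row (capability and permit policy both grant the action); every other path ends in a deny.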

Run

cargo nextest run -p firma --test e2e --run-ignored all -E 'test(claude::)'
cargo nextest run -p firma --test e2e --run-ignored all -E 'test(codex::)'
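
Each test selected by these commands follows the two-phase shape from the summary. A minimal sketch of that shape, assuming a hypothetical `check_scenario` helper and `Observation` type (neither name comes from the PR, and the real harness spawns agents rather than taking closures):

```rust
/// Decision a scenario expects under enforcement (labels taken from the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Expected {
    Allow,
    Deny,
    Block,
}

/// What one phase observed. Hypothetical shape for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Observation {
    /// Whether the agent finished the task.
    task_completed: bool,
    /// Decision recorded for the run; `None` when nothing was enforced.
    decision: Option<Expected>,
}

/// Runs the baseline phase, then the enforcement phase, and reports the
/// first mismatch as an error string.
fn check_scenario(
    expected: Expected,
    baseline: impl FnOnce() -> Observation,
    enforced: impl FnOnce() -> Observation,
) -> Result<(), String> {
    // Phase 1: the agent alone must be able to do the task. Otherwise a
    // later DENY proves nothing about enforcement.
    let base = baseline();
    if !base.task_completed {
        return Err("baseline: agent could not complete the task unaided".to_string());
    }

    // Phase 2: under enforcement, the recorded decision must match.
    let obs = enforced();
    if obs.decision != Some(expected) {
        return Err(format!(
            "enforcement: expected {:?}, observed {:?}",
            expected, obs.decision
        ));
    }
    Ok(())
}
```

Running the baseline first is what gives a DENY or BLOCK result its meaning: the failure under enforcement can then be attributed to the policy rather than to the agent being unable to do the task at all.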

luca-iachini force-pushed the fir-368-integration-tests branch from 8d0cd33 to 4fd2256 on June 19, 2026 08:00
luca-iachini marked this pull request as draft on June 19, 2026 17:28
Base automatically changed from fir-368-e2e-tests to main on June 23, 2026 16:33

Add the full integration test infrastructure: harness, config, audit utilities, CI workflow, and supporting crate changes. Wire up one scenario (normal_llm_call) to validate the end-to-end flow before the remaining scenarios land in the follow-up PR.

Add 7 scenarios covering the key enforcement policies: block_paste_service, block_unlisted_host, tool_call_exfil, direct_tcp_bypass, fs_read_deny, fs_delete_deny, code_fibonacci.

The supervisor writes a flat AuthorityConfig TOML, but firma authority --config calls load_section(..., "authority"), which expects a section wrapper.

The per-run authority always runs plaintext on loopback. The user config may have TLS cert paths and a fixed listen_addr; carrying those into the spawned process causes FRAME_SIZE_ERROR (h2c client vs TLS server). Clear the TLS config and select an ephemeral loopback port up front.
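
The last commit message describes two adjustments to the per-run config. A minimal sketch of that idea using only the standard library, assuming hypothetical `AuthorityConfig` and `TlsConfig` shapes (only the `listen_addr` name and the existence of TLS cert paths come from the commit message; everything else is illustrative):

```rust
use std::net::{Ipv4Addr, SocketAddr, TcpListener};
use std::path::PathBuf;

/// Hypothetical stand-in for the authority config; field layout is assumed.
#[derive(Debug, Clone)]
struct AuthorityConfig {
    listen_addr: SocketAddr,
    tls: Option<TlsConfig>,
}

/// Hypothetical TLS settings carried in a user's config.
#[derive(Debug, Clone)]
struct TlsConfig {
    cert_path: PathBuf,
    key_path: PathBuf,
}

/// Derives a per-run config from the user's config: plaintext, loopback,
/// and an OS-assigned port.
fn per_run_config(user: &AuthorityConfig) -> std::io::Result<AuthorityConfig> {
    // Binding to port 0 asks the OS for a free port; read it back afterwards.
    let probe = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    let listen_addr = probe.local_addr()?;
    // Release the port so the spawned process can bind it. Another process
    // could take it in between; this sketch accepts that small race.
    drop(probe);

    let mut cfg = user.clone();
    cfg.listen_addr = listen_addr;
    // Drop TLS settings so a plaintext (h2c) client is not pointed at a
    // TLS listener.
    cfg.tls = None;
    Ok(cfg)
}
```

Working on a clone keeps the user's own config untouched, so only the spawned per-run process sees the loopback address and the cleared TLS settings.
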
luca-iachini force-pushed the fir-368-integration-tests branch from e85cd03 to 959bf29 on June 24, 2026 16:19
codecov Bot commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 77.98742% with 35 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/firma-run/src/authority/supervisor.rs | 73.86% | 11 Missing and 12 partials ⚠️ |
| crates/firma-authority/src/keygen.rs | 87.30% | 6 Missing and 2 partials ⚠️ |
| crates/firma-run/src/runtime.rs | 0.00% | 4 Missing ⚠️ |

luca-iachini marked this pull request as ready for review on June 26, 2026 15:53
luca-iachini changed the title from "test: add remaining e2e enforcement scenarios" to "test: add e2e enforcement scenarios" on Jun 26, 2026
Comment thread tests/e2e/scenarios/fs_delete_deny.rs
Comment thread tests/e2e/scenarios/fs_read_deny.rs Outdated
Comment thread tests/e2e/scenarios/fs_read_deny.rs Outdated
Comment thread tests/e2e/scenarios/fs_read_deny.rs Outdated
Comment thread tests/e2e/scenarios/deny_unmapped_http_call.rs Outdated
Comment thread tests/e2e/scenarios/deny_unclassified_intent.rs
Comment thread tests/e2e/snapshots/e2e__audit__claude_allow_via_policy.snap Outdated
Comment thread crates/firma-run/src/authority/supervisor.rs
3 participants