This document explains the lines that carry behavior or design intent. Straight Rust struct literals, field assignments, imports, and obvious test assertions are left to the code unless they hide a Seaport-specific decision.
- `mod agent;`, `mod error;`, `mod evaluation;`, and `mod telemetry;` keep the implementation split by responsibility while preserving one public crate API.
- The `pub use ...` lines re-export the intended public surface so callers do not need to know the internal file layout.
- `SEAPORT_NAME` gives diagnostics and examples one stable crate identifier.
- `main` delegates to `run` and exits with command-specific status codes so the CLI can be tested without terminating the test process.
- `run` dispatches the top-level `seaport` commands.
- `run_eval` parses the expected local and registered dataset flags, then returns a clear not-implemented error until sandbox execution is wired.
- `dataset` supports both `dataset list` and the `datasets list` alias.
- `init` creates the first task skeleton format users can edit without writing Rust.
- `CliError` carries an exit code alongside the message so usage errors and unimplemented commands are distinguishable.
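
The `main`/`run`/`CliError` split described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the field names, the `usage` and `unimplemented` constructors, the `status_of` helper, and the exit codes 2 and 3 are all assumptions made for the example.

```rust
/// Hypothetical shape of the CLI error: a message plus the exit code
/// the process should report. The real `CliError` may differ.
#[derive(Debug, PartialEq)]
struct CliError {
    code: u8,
    message: String,
}

impl CliError {
    /// Assumed convention for this sketch: 2 means a usage error.
    fn usage(message: impl Into<String>) -> Self {
        Self { code: 2, message: message.into() }
    }

    /// Assumed convention for this sketch: 3 means "not implemented yet".
    fn unimplemented(message: impl Into<String>) -> Self {
        Self { code: 3, message: message.into() }
    }
}

/// Dispatches top-level commands and returns instead of exiting,
/// so tests can call it directly.
fn run(args: &[&str]) -> Result<(), CliError> {
    match args {
        // `datasets list` is accepted as an alias for `dataset list`.
        ["dataset", "list"] | ["datasets", "list"] => Ok(()),
        ["eval", ..] => Err(CliError::unimplemented(
            "eval: sandbox execution is not wired yet",
        )),
        [other, ..] => Err(CliError::usage(format!("unknown command: {other}"))),
        [] => Err(CliError::usage("missing command")),
    }
}

/// Maps the result of `run` to a process status. A real `main` would pass
/// this value to `std::process::ExitCode::from` and return it.
fn status_of(args: &[&str]) -> u8 {
    match run(args) {
        Ok(()) => 0,
        Err(err) => {
            eprintln!("{}", err.message);
            err.code
        }
    }
}
```

Because only `main` would turn the status into a process exit, a test can assert on `run` or `status_of` without ending the test process.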
- `Agent::name` is required because reports, run IDs, and telemetry need a stable agent identity.
- `Agent::respond` returns `Result<String, SeaportError>` so agent failures stay structured instead of becoming plain strings.
- `EchoAgent` and `StaticAgent` are intentionally simple test agents. They make examples and tests deterministic without adding mock frameworks.
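
As a rough sketch of the agent contract described above, the trait and the two test agents might look like this. Only the `Result<String, SeaportError>` return type comes from these notes; the `&self`/`&str` parameters, the `AgentFailed` variant, the `StaticAgent::new` constructor, and the `FailingAgent` type are assumptions added for illustration.

```rust
/// Stand-in for the crate error type; this single variant is hypothetical.
#[derive(Debug, PartialEq)]
enum SeaportError {
    AgentFailed { agent: String, reason: String },
}

trait Agent {
    /// Stable identity used by reports, run IDs, and telemetry.
    fn name(&self) -> &str;

    /// Produces an answer, or a structured error instead of a plain string.
    fn respond(&self, input: &str) -> Result<String, SeaportError>;
}

/// Returns its input unchanged.
struct EchoAgent;

impl Agent for EchoAgent {
    fn name(&self) -> &str {
        "echo"
    }

    fn respond(&self, input: &str) -> Result<String, SeaportError> {
        Ok(input.to_string())
    }
}

/// Always returns the same configured answer.
struct StaticAgent {
    name: String,
    answer: String,
}

impl StaticAgent {
    fn new(name: &str, answer: &str) -> Self {
        Self { name: name.to_string(), answer: answer.to_string() }
    }
}

impl Agent for StaticAgent {
    fn name(&self) -> &str {
        &self.name
    }

    fn respond(&self, _input: &str) -> Result<String, SeaportError> {
        Ok(self.answer.clone())
    }
}

/// Hypothetical agent that always fails, to show the structured error path.
struct FailingAgent;

impl Agent for FailingAgent {
    fn name(&self) -> &str {
        "failing"
    }

    fn respond(&self, _input: &str) -> Result<String, SeaportError> {
        Err(SeaportError::AgentFailed {
            agent: self.name().to_string(),
            reason: "always fails".to_string(),
        })
    }
}
```

Both test agents are pure functions of their input and configuration, so tests built on them stay deterministic without a mocking library.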
- `ErrorKind` separates routing-level categories from exact error codes.
- `SeaportError` variants carry the data needed to debug each failure without parsing formatted text.
- `SeaportError::kind` maps variants to stable categories for metrics and alert grouping.
- `SeaportError::code` returns stable machine-readable codes for telemetry, logs, and external integrations.
- `Display` is implemented for humans; callers should use `code` and `kind` for programmatic decisions.
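
The kind/code/`Display` layering described above could be sketched as below. The variant names, the two `ErrorKind` categories, and the code strings are invented for the example and are not the crate's real values.

```rust
use std::fmt;

/// Coarse categories for metrics and alert grouping (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    Validation,
    Agent,
}

/// Each variant carries the data needed to debug the failure.
#[derive(Debug, PartialEq)]
enum SeaportError {
    EmptyCaseId { index: usize },
    DuplicateCaseId { id: String },
    AgentFailed { agent: String, reason: String },
}

impl SeaportError {
    /// Many variants map to one routing-level category.
    fn kind(&self) -> ErrorKind {
        match self {
            Self::EmptyCaseId { .. } | Self::DuplicateCaseId { .. } => ErrorKind::Validation,
            Self::AgentFailed { .. } => ErrorKind::Agent,
        }
    }

    /// One stable machine-readable code per variant (illustrative strings).
    fn code(&self) -> &'static str {
        match self {
            Self::EmptyCaseId { .. } => "case.empty_id",
            Self::DuplicateCaseId { .. } => "case.duplicate_id",
            Self::AgentFailed { .. } => "agent.failed",
        }
    }
}

/// Human-readable text only; wording may change without breaking callers.
impl fmt::Display for SeaportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EmptyCaseId { index } => write!(f, "test case at index {index} has an empty id"),
            Self::DuplicateCaseId { id } => write!(f, "duplicate test case id: {id}"),
            Self::AgentFailed { agent, reason } => write!(f, "agent {agent} failed: {reason}"),
        }
    }
}
```

A caller that branches on `err.kind()` or logs `err.code()` keeps working even if the `Display` wording is later reworded.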
- `TelemetryEvent::sequence` avoids timestamps in evaluation reports, keeping telemetry deterministic.
- `TelemetryRecorder::new` starts `next_sequence` at one, making event order easy to read.
- `TelemetryRecorder::record` is private so every event goes through the same sorting and sequencing path.
- `telemetry_attributes` accepts a fixed array to keep call sites compact while still sorting attributes deterministically.
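
One way the recorder described above might be put together is sketched here. The event fields, the `case_started` and `case_finished` wrappers, the event names, and the const-generic array signature are assumptions; only the sequence-from-one behavior, the private `record`, and the sorted attributes are taken from these notes.

```rust
/// An event ordered by a sequence number rather than a timestamp.
#[derive(Debug, Clone, PartialEq)]
struct TelemetryEvent {
    sequence: u64,
    name: String,
    attributes: Vec<(String, String)>,
}

#[derive(Debug)]
struct TelemetryRecorder {
    next_sequence: u64,
    events: Vec<TelemetryEvent>,
}

impl TelemetryRecorder {
    /// Sequences start at one so the first event reads as "event 1".
    fn new() -> Self {
        Self { next_sequence: 1, events: Vec::new() }
    }

    /// Hypothetical public wrapper; the real crate may expose other events.
    fn case_started(&mut self, case_id: &str) {
        self.record("case.started", [("case_id", case_id)]);
    }

    /// Hypothetical public wrapper; attributes are passed unsorted on purpose.
    fn case_finished(&mut self, case_id: &str, outcome: &str) {
        self.record("case.finished", [("outcome", outcome), ("case_id", case_id)]);
    }

    fn events(&self) -> &[TelemetryEvent] {
        &self.events
    }

    /// Not `pub`: in a real module layout only the wrappers above could call
    /// it, so every event gets sorted attributes and the next sequence number.
    fn record<const N: usize>(&mut self, name: &str, attributes: [(&str, &str); N]) {
        let event = TelemetryEvent {
            sequence: self.next_sequence,
            name: name.to_string(),
            attributes: telemetry_attributes(attributes),
        };
        self.next_sequence += 1;
        self.events.push(event);
    }
}

/// Converts a fixed-size array of pairs into owned, key-sorted attributes.
fn telemetry_attributes<const N: usize>(pairs: [(&str, &str); N]) -> Vec<(String, String)> {
    let mut attributes: Vec<(String, String)> = pairs
        .iter()
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect();
    attributes.sort();
    attributes
}
```

With no clock involved, two runs that record the same events produce identical telemetry, which is what makes report snapshots comparable.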
- `BTreeSet` is used for duplicate detection because it has deterministic ordering and no randomized hashing.
- `TestCase::new` does not validate immediately; validation happens at run boundaries so batches can report errors consistently.
- `RunConfig::validate` rejects `Some(0)` because a zero-character answer limit is almost always a configuration mistake.
- `Scorer` is a trait so exact match is the default, not the only scoring model.
- `ExactMatchScorer` returns only `1.0` or `0.0`, making the initial evaluator deterministic and easy to audit.
- `Evaluator::evaluate` creates an internal telemetry recorder for the common success path.
- `Evaluator::evaluate_with_telemetry` lets callers keep failure telemetry when the evaluation returns an error.
- `validate_cases` rejects empty IDs, rejects duplicates, then sorts by ID so equivalent case sets produce the same report order and run ID.
- `validate_output_limit` counts `char`s instead of bytes so limits match what humans see in Unicode text.
- `validate_score` rejects NaN and values outside `0.0..=1.0`, keeping summary math stable.
- `deterministic_run_id` hashes the agent, scorer, and ordered case content so the same logical run gets the same ID.
- `StableHash` uses fixed FNV-1a constants instead of `std` randomized hashing.
- `write_str` appends `0xff` as a field separator so adjacent strings cannot accidentally collapse into the same byte stream.
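
The case validation and run-ID hashing described above might fit together roughly as follows. This is a sketch under stated assumptions: the `TestCase` fields, the `case` helper, the `String` error type (the crate uses `SeaportError`), the 64-bit FNV-1a variant, the `write_bytes` helper, and the `run-` ID format are all chosen for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical minimal test case; the real struct may hold more fields.
#[derive(Debug, Clone, PartialEq)]
struct TestCase {
    id: String,
    input: String,
    expected: String,
}

/// Small constructor used only to keep the example compact.
fn case(id: &str, input: &str, expected: &str) -> TestCase {
    TestCase { id: id.to_string(), input: input.to_string(), expected: expected.to_string() }
}

/// Rejects empty and duplicate IDs, then sorts by ID so equivalent
/// case sets always come back in the same order.
fn validate_cases(mut cases: Vec<TestCase>) -> Result<Vec<TestCase>, String> {
    let mut seen = BTreeSet::new();
    for case in &cases {
        if case.id.is_empty() {
            return Err("empty test case id".to_string());
        }
        // `insert` returns false when the ID was already present.
        if !seen.insert(case.id.clone()) {
            return Err(format!("duplicate test case id: {}", case.id));
        }
    }
    cases.sort_by(|a, b| a.id.cmp(&b.id));
    Ok(cases)
}

/// FNV-1a with fixed constants (64-bit variant assumed here), so the
/// result does not depend on per-process hash seeds.
struct StableHash(u64);

impl StableHash {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn new() -> Self {
        Self(Self::OFFSET_BASIS)
    }

    /// FNV-1a step: XOR the byte in, then multiply by the prime.
    fn write_bytes(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(Self::PRIME);
        }
    }

    /// 0xff never occurs in valid UTF-8, so it marks the field boundary
    /// and keeps ("ab", "c") distinct from ("a", "bc").
    fn write_str(&mut self, value: &str) {
        self.write_bytes(value.as_bytes());
        self.write_bytes(&[0xff]);
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Expects `cases` to be already validated and sorted.
fn deterministic_run_id(agent: &str, scorer: &str, cases: &[TestCase]) -> String {
    let mut hash = StableHash::new();
    hash.write_str(agent);
    hash.write_str(scorer);
    for case in cases {
        hash.write_str(&case.id);
        hash.write_str(&case.input);
        hash.write_str(&case.expected);
    }
    format!("run-{:016x}", hash.finish())
}
```

Sorting before hashing is what makes the ID depend on the set of cases rather than the order the caller happened to supply them in.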