Conversation
Ignore known vulnerabilities in dependencies pinned for MSRV 1.70 compatibility:

- RUSTSEC-2026-0007 (bytes): integer overflow, fix requires >= 1.11.1
- RUSTSEC-2026-0009 (time): stack exhaustion, fix requires >= 0.3.47

Both are DoS-only risks, acceptable for a benchmark tool.

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
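The corresponding `.cargo/audit.toml` entry might look like the sketch below. The `[advisories] ignore` key is standard cargo-audit configuration; the inline comments are mine, summarizing the rationale above.

```toml
[advisories]
ignore = [
    "RUSTSEC-2026-0007", # bytes: integer overflow, DoS-only; fix needs >= 1.11.1, blocked by MSRV 1.70
    "RUSTSEC-2026-0009", # time: stack exhaustion, DoS-only; fix needs >= 0.3.47, blocked by MSRV 1.70
]
```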
…ages Pass send timestamp to MessageLatencyRecord instead of capturing it when the record is created. This makes the timestamp represent when the message was sent, so gaps between timestamps now match the actual latency.

Changes:
- MessageLatencyRecord::new() and new_combined() now require a send_timestamp_ns parameter instead of capturing SystemTime::now() internally
- Added MessageLatencyRecord::current_timestamp_ns() helper for call sites that need to capture the current wall-clock time
- Updated all streaming record creation sites in benchmark.rs and benchmark_blocking.rs to capture and pass send timestamps

Cherry-picked from container-to-container-ipc branch (6cb8f9b). Host-container-specific changes excluded (host_container.rs not present).

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
…n documentation

- Add 3 new unit tests for MessageLatencyRecord timestamp handling: test_new_uses_provided_send_timestamp, test_new_combined_uses_provided_send_timestamp, test_current_timestamp_ns_returns_recent_value
- Add 2 end-to-end streaming tests in benchmark.rs: test_one_way_streaming_captures_send_timestamp, test_round_trip_streaming_captures_send_timestamp
- Document sequential one-way/round-trip test execution in README.md (new "Test Execution Order" section), CONFIG.md, the dashboard README, and run() doc comments in benchmark.rs and benchmark_blocking.rs
- All 332 tests pass (331 passed, 1 ignored), zero clippy warnings

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
- Refactor the is_some()/unwrap() anti-pattern in tcp_socket.rs and unix_domain_socket.rs to use idiomatic `if let` destructuring, fixing clippy::unnecessary_unwrap errors on Rust 1.94.0
- Pin uuid to >=1.17, <1.22 to prevent the MSRV CI job from resolving uuid 1.22+, which depends on getrandom 0.4 (Rust 2024 edition, MSRV 1.85), incompatible with our Rust 1.70 MSRV
- All tests passing, clippy clean, formatting verified

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
- uuid 1.21+ (not 1.22+) switched to getrandom ^0.4, which requires the Rust 2024 edition (MSRV 1.85), incompatible with our Rust 1.70 MSRV
- Pin uuid to >=1.17, <1.21 to ensure getrandom ^0.3 is used

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
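In Cargo.toml, the pin described above would look roughly like this. This is a sketch: the dependency table and the `v4` feature flag are assumptions, only the version requirement comes from the commit message.

```toml
[dependencies]
# >=1.17 keeps the API we rely on; <1.21 avoids the switch to
# getrandom ^0.4, whose Rust 2024 edition breaks the 1.70 MSRV build.
uuid = { version = ">=1.17, <1.21", features = ["v4"] }
```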
…aths The previous commit (68afe07) added a send_timestamp_ns parameter to MessageLatencyRecord, but only the blocking round-trip path was actually capturing timestamps at send time. The async round-trip, async combined, and all one-way paths were still using current_timestamp_ns() at record-creation time (post-test), causing all timestamps within a run to cluster into the same second.

Changes:
- Async round-trip: capture a wall-clock timestamp inside the spawned client future before each send(), return Vec<(Duration, u64)>
- Async combined: same pattern for the one-way latency vector
- One-way (async + blocking): the server now writes "wall_send_ns,latency_ns" per line (wall_send_ns = wall_clock_now - latency); client readers parse and use the server-computed send timestamp
- Add parse_latency_file_line() with 7 unit tests covering valid input, missing commas, empty lines, non-numeric values, and extra commas
- Enhance existing end-to-end streaming tests to validate that timestamps fall within the test execution window and are not all identical
- Document streaming output column definitions in README.md, including timestamp_ns semantics and an accuracy note for one-way clock mixing

All 265+ unit tests pass, clippy clean, no scope creep.

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
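A minimal sketch of the server-side line format this commit describes. `latency_file_line` is a hypothetical helper name for illustration; in the real code the logic lives inline in the async and blocking server loops.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Approximate the wall-clock send time by subtracting the measured
// (monotonic) latency from the current wall-clock time, then emit the
// two-field "wall_send_ns,latency_ns" line that the client readers parse.
fn latency_file_line(latency_ns: u64) -> String {
    let wall_now_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_nanos() as u64;
    let wall_send_ns = wall_now_ns.saturating_sub(latency_ns);
    format!("{},{}", wall_send_ns, latency_ns)
}

fn main() {
    let line = latency_file_line(42_000);
    assert_eq!(line.split(',').count(), 2); // exactly two comma-separated fields
    assert!(line.ends_with(",42000"));      // second field is the raw latency
    println!("{line}");
}
```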
dustinblack
left a comment
I closed #92 because fresh testing showed the problem was no longer reproducible. Issue #106 was then created the following day as the basis for this PR, but it contains no evidence of an actual problem — no reproduction steps, no sample output showing incorrect timestamps, and no before/after data.
I'd suggest closing this PR and #106 unless concrete evidence can be provided in #106 demonstrating that the specific code paths being fixed here actually produce incorrect timestamps on the current main branch — similar to what I provided in #92.
These are valid issues/problems. I added evidence to issue #106 showing the problems and before/after scenarios.
dustinblack
left a comment
Thanks for adding the evidence to #106 — the before/after data clearly demonstrates the problem across round-trip, combined, and one-way code paths. The bugs are real and the fixes are technically sound. Withdrawing my earlier suggestion to close.
Technical assessment of the changes:
The fixes are correct. Moving SystemTime::now() capture from record-creation time to send time (via the send_timestamp_ns parameter on new()/new_combined()) is the right approach. The one-way server-side approximation (wall_now - latency) mixes clock domains but is clearly documented and is the best available without clock sync. The parse_latency_file_line() parser is clean, and the test coverage is thorough.
Before this can be approved, it needs:
1. Rebase on main. This branch is based on e826b20, missing PRs #103 and #104. The clippy refactors and .cargo/audit.toml are already on main and will drop out. CI failures should resolve.

2. Coordinate the file format with PR #109. This PR changes the server latency file from latency_ns to wall_send_ns,latency_ns. PR #109's write_latency_buffer() writes the old single-field format. Whichever merges second needs to match. Since #109 is closer to approval, consider whether this PR should rebase on top of #109 or vice versa.

3. Remove documentation that overlaps with PR #108. The "Streaming Output Columns" and "Test Execution Order" README sections appear in both PRs. Whichever PR owns those sections, the other should drop them to avoid merge conflicts. Since those sections are most relevant here, I'd suggest keeping them in this PR and removing them from #108.
Remove sections that belong in the streaming-timestamps PR (#105), not this buffer-sizing PR:
- "Streaming Output Columns" table and timestamp_ns accuracy note
- "Test Execution Order" section
- Streaming JSON/CSV description rewording
- Round-trip CLI example comment expansion

The README diff now only contains buffer-sizing documentation changes: auto-sizing table, error prevention updates, and example buffer size correction.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
…Hangs (#108)

* fix: Use fixed 64KB buffer for standalone SHM to enable streaming

  Previously, standalone SHM mode calculated the buffer size to fit ALL messages (e.g., 6.4MB for 50k messages). This caused the writer to dump everything instantly while the reader slowly drained using pthread_cond_timedwait with 500us polling timeouts, leading to huge accumulated latencies (~489ms for 64B).

  The fix uses a fixed 64KB buffer (or 2x message size if larger), matching container behavior. This enables proper streaming where the writer blocks when the buffer is full.

  Before: 489ms mean latency for 64B/50k messages
  After: 1.95ms mean latency for 64B/10k messages (blocking), 15.85ms mean latency for 64B/10k messages (async)

  Also updates test_transport_config_buffer_size_logic to expect the new fixed-buffer behavior for SHM while keeping the message-count sizing for TCP/UDS.

  Cherry-picked from container-to-container-ipc branch (3b49877).

  AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
  Made-with: Cursor

* test/docs: Add buffer sizing and SHM backpressure tests, update documentation

  - Add 9 new tests covering SHM buffer sizing, backpressure, and condvar timed-wait behavior:
    - test_shm_large_message_buffer_sizing: verifies the 2x msg size path when messages exceed 32KB (async)
    - test_shm_duration_mode_uses_fixed_buffer: verifies SHM gets 64KB in duration mode, not the 1GB TCP/UDS default (async)
    - test_blocking_transport_config_buffer_size_logic: full buffer sizing test for blocking mode (SHM, TCP, PMQ, duration)
    - test_blocking_shm_duration_mode_uses_fixed_buffer: SHM 64KB in blocking duration mode
    - test_blocking_shm_large_message_buffer_sizing: 2x path (blocking)
    - test_backpressure_with_small_buffer: exercises the timed condvar wait with a 1KB buffer and 20 messages
    - test_payload_integrity_under_backpressure: byte-level payload verification through backpressure-induced blocking writes
    - test_ring_buffer_wrap_around_under_backpressure: write_pos wraps the circular buffer multiple times under backpressure
    - test_shutdown_detected_during_blocked_write: server closes while the client is blocked waiting for buffer space
  - Update README.md Buffer Size Configuration with a per-mechanism auto-sizing table and updated error prevention guidance
  - Update CONFIG.md SHM defaults from 8192 to 64KB (auto) and add an automatic buffer sizing explanation
  - All tests passing, clippy clean

  AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
  Made-with: Cursor

* test/docs: Add coverage tests and fix stale buffer sizing documentation

  - Add test_user_buffer_size_overrides_shm_default (async + blocking): verifies a user-provided --buffer-size overrides SHM's 64KB default
  - Add test_shm_buffer_sizing_at_32kb_boundary (async + blocking): tests the exact transition where 2*(msg_size+64) crosses the 64KB threshold
  - Add test_high_volume_condvar_stress: 100 messages through a 512-byte buffer to stress the pthread_cond_timedwait retry loop
  - Update the create_transport_config_internal doc comment in benchmark.rs to describe per-mechanism buffer sizing (SHM, PMQ, TCP/UDS)
  - Update the Adaptive Buffer Sizing doc in benchmark_blocking.rs to describe per-mechanism behavior instead of a vague description
  - Fix README.md example output: SharedMemory buffer size 10240000 -> 65536 to reflect the new fixed 64KB auto-sizing
  - All 340 tests passing, zero clippy warnings

  AI-assisted-by: Claude claude-4.6-opus-high-thinking
  Made-with: Cursor

* fix(msrv): Pin transitive dependencies to maintain Rust 1.70 compatibility

  The CI MSRV job removes Cargo.lock and resolves fresh dependencies. Several transitive dependencies recently bumped their MSRV above 1.70:
  - uuid 1.21+ requires Rust 1.85 → pinned to <1.21
  - tempfile 3.25+ pulls getrandom >=0.3,<0.5, which resolves to 0.4.x (edition 2024, unparseable by Rust 1.70's cargo) → pinned to <3.25
  - zmij 1.0.20+ requires Rust 1.71 → pinned to =1.0.19
  - quote 1.0.45+ requires Rust 1.71 → pinned to =1.0.44
  - syn 2.0.115+ requires Rust 1.71 → pinned to =2.0.114
  - unicode-ident 1.0.23+ requires Rust 1.71 → pinned to =1.0.22

  Verified: MSRV builds and tests pass both with and without Cargo.lock in a Rust 1.70 container. Local clippy, fmt, and tests all clean.

  AI-assisted-by: Claude claude-4.6-opus-high-thinking
  Made-with: Cursor

* ci: trigger CI rebuild with updated MSRV dependency pins

  No code changes. Forces a new CI run to pick up the dependency pins from commit cd28295 (uuid <1.21, tempfile <3.25, zmij =1.0.19, quote =1.0.44, syn =2.0.114, unicode-ident =1.0.22).

  AI-assisted-by: Claude claude-4.6-opus-high-thinking
  Made-with: Cursor

* fix: Remove out-of-scope condvar/polling code per review

  Remove container-IPC code that was out of scope for issue #107 (buffer sizing fix). This scopes the PR to items 4-5 only.

  Removed:
  - write_data_polling() and read_data_polling() fallback functions
  - pthread_cond_timedwait (reverted to pthread_cond_wait)
  - Broken-condvar detection (100-iteration/10ms heuristic)
  - Mutex-lock-failure fallbacks to polling paths
  - 30-second timeout counters (wait_count > 60000)
  - test_high_volume_condvar_stress test

  Restored from main (PR #104):
  - write_data_blocking() signature with a timestamp_offset parameter so latency measurement excludes backpressure wait time
  - read_data_blocking() with a clean pthread_cond_wait
  - Proper pthread_cond_signal in both write and read paths

  The two functional regressions cited in review are resolved:
  1. Timestamp regression: write_data_polling() lacked timestamp_offset, but that function no longer exists. The only write path now refreshes the timestamp after the condvar wait.
  2. Missing condvar signal: write_data_polling() never called pthread_cond_signal(&data_ready), but that function no longer exists. The only write path signals after every write.

  All tests passing (42/42). No clippy warnings.

  AI-assisted-by: Claude claude-4.6-opus-high-thinking
  Made-with: Cursor

* docs: Remove out-of-scope README sections per review

  Remove sections that belong in the streaming-timestamps PR (#105), not this buffer-sizing PR:
  - "Streaming Output Columns" table and timestamp_ns accuracy note
  - "Test Execution Order" section
  - Streaming JSON/CSV description rewording
  - Round-trip CLI example comment expansion

  The README diff now only contains buffer-sizing documentation changes: auto-sizing table, error prevention updates, and example buffer size correction.

  AI-assisted-by: Claude claude-4.6-opus-high-thinking
  Made-with: Cursor

* fix: Unify message overhead constant across buffer sizing paths

  Replace hardcoded 64 and 32 values with the MESSAGE_OVERHEAD constant in both benchmark.rs and benchmark_blocking.rs:
  - TCP/UDS msg-count sizing: was hardcoded 64, now MESSAGE_OVERHEAD
  - SHM logging/validation: was hardcoded 32, now MESSAGE_OVERHEAD
  - Add a comment explaining what MESSAGE_OVERHEAD covers: 8 (id) + 8 (timestamp) + 8 (bincode vec length) + 1 (message type) + 4 (ring buffer length prefix) = 29 bytes, rounded to 64

  Addresses review feedback about inconsistent overhead values.

  AI-assisted-by: Claude claude-4.6-opus-high-thinking
  Made-with: Cursor
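The SHM sizing rule these commits describe (a fixed 64KB buffer, or twice the framed message size when that exceeds 64KB) can be sketched as below. `buffer_size_for_shm` and `SHM_DEFAULT_BUFFER` are illustrative names, not the actual identifiers; only MESSAGE_OVERHEAD and the arithmetic come from the commit messages.

```rust
// 8 (id) + 8 (timestamp) + 8 (bincode vec length) + 1 (message type)
// + 4 (ring buffer length prefix) = 29 bytes, rounded up to 64.
const MESSAGE_OVERHEAD: usize = 64;
const SHM_DEFAULT_BUFFER: usize = 64 * 1024;

fn buffer_size_for_shm(message_size: usize) -> usize {
    // Fixed 64KB, or 2x the framed message size if that is larger, so the
    // writer blocks on a full buffer instead of pre-sizing for all messages.
    SHM_DEFAULT_BUFFER.max(2 * (message_size + MESSAGE_OVERHEAD))
}

fn main() {
    assert_eq!(buffer_size_for_shm(64), 65_536);     // small messages: fixed 64KB
    assert_eq!(buffer_size_for_shm(32_704), 65_536); // 2*(32704+64) == 65536: exact boundary
    assert_eq!(buffer_size_for_shm(32_705), 65_538); // just past the boundary: 2x path wins
}
```

The boundary values mirror what test_shm_buffer_sizing_at_32kb_boundary is described as checking: the 2x path takes over exactly where 2*(msg_size+64) crosses 64KB.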
PR: Fix Streaming Output Timestamps Across All Code Paths
Summary
Streaming output timestamps (timestamp_ns in JSON/CSV per-message records) were inaccurate across multiple code paths. Instead of recording when each message was actually sent, timestamps were captured at record-creation time, either post-test during batch iteration or at file-read time. This caused all timestamps within a test run to cluster into the same second, making time-series analysis of streaming data meaningless.
Because this tool's primary purpose is precision measurement, incorrect
timestamps in the streaming output undermine the validity of per-message
latency records for any downstream analysis or visualization.
Branch: fix/streaming-timestamps
Base: main
Files changed: 11 (496 insertions, 60 deletions)
Root Cause
Commit e0ff7fa (Oct 8, 2025, PR #82) refactored the benchmark runner to collect latencies inside spawned async futures, then batch-create MessageLatencyRecord objects after the futures completed. This moved the SystemTime::now() call from inside the hot loop (where it reflected each message's send time) to the post-test iteration (where it reflected the moment the record was created, all within the same second).

The initial fix on this branch (68afe07) changed the MessageLatencyRecord API to accept send_timestamp_ns as a parameter, but only the blocking round-trip path was actually capturing timestamps at send time. Three other code path families remained broken.
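A simplified, synchronous sketch of the fixed pattern: capture the wall-clock timestamp immediately before each send and carry it alongside the monotonic latency. This is illustrative only; the real code uses spawned async futures and the project's transport types, and `send` here is a stand-in.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

fn current_timestamp_ns() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_nanos() as u64
}

// Stand-in for the IPC send + reply round trip.
fn send(_msg: &[u8]) {}

// After the fix: the client loop returns Vec<(Duration, u64)>, pairing each
// monotonic latency with the wall-clock timestamp captured at send time, so
// the post-test loop can build MessageLatencyRecord with the true send time.
fn run_client(iterations: usize) -> Vec<(Duration, u64)> {
    let mut out = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let wall_ts = current_timestamp_ns(); // send time, not record-creation time
        let start = Instant::now();
        send(b"payload");
        out.push((start.elapsed(), wall_ts));
    }
    out
}

fn main() {
    let results = run_client(3);
    assert_eq!(results.len(), 3);
    // Timestamps are non-decreasing across sends, instead of one shared
    // post-test value for the whole run.
    assert!(results.windows(2).all(|w| w[0].1 <= w[1].1));
}
```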
Problems and Fixes
1. Async round-trip timestamps captured post-test

Problem: The client future returned Vec<Duration>. After the future completed, the post-test loop called current_timestamp_ns() at record-creation time. All records received the same timestamp.

Fix: Changed the future to capture current_timestamp_ns() before each send() and return Vec<(Duration, u64)>. The post-test loop uses the captured timestamp.

Applies to: Both duration-based and count-based async round-trip loops in src/benchmark.rs.

2. Async combined timestamps captured post-test
Problem: Same pattern as round-trip: the combined test (one-way + round-trip in a single run) collected latency durations inside the spawned future and created records post-test with stale timestamps.

Fix: Changed one_way_latencies to Vec<(Duration, u64)> to carry the wall-clock timestamp alongside each measurement. The timestamp is captured before each send() inside the future.

Applies to: Both duration-based and count-based async combined loops in src/benchmark.rs.

3. One-way timestamps from file-read time instead of send time
Problem: One-way tests use a server process that measures receive latency and writes results to a temporary file. The original format was one latency_ns value per line. The client read this file post-test and called current_timestamp_ns() at file-read time, so all records received the same timestamp.

Fix (server side): Both async and blocking server loops now write wall_send_ns,latency_ns per line. The wall-clock send time is computed as SystemTime::now() - latency_ns, approximating when the message entered the IPC channel.

Fix (client side): A new parse_latency_file_line() function parses the two-field format. Both async and blocking file readers use the parsed wall_send_ns as send_timestamp_ns.

Applies to: src/main.rs (both server loops), src/benchmark.rs (async reader), src/benchmark_blocking.rs (blocking reader).

Known Limitations
Wall-clock / monotonic clock mixing in one-way path

For one-way tests, the server computes the send timestamp by subtracting the measured monotonic latency from its current wall-clock time. This mixes two clock domains:

- latency_ns = monotonic_receive - monotonic_send (from the message timestamp)
- wall_send_ns = SystemTime::now() - latency_ns

If NTP adjusts the system clock between message send and receive, the computed wall_send_ns will be slightly off. This is the best approximation available without clock synchronization between the client and server processes. The error is bounded by the magnitude of any NTP adjustment during the test (typically microseconds).
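A sketch of the two-field parser, reconstructed from the behavior the unit tests exercise (valid input, zeros, missing comma, empty line, non-numeric fields, extra commas); the real implementation may differ in details.

```rust
/// Parse one "wall_send_ns,latency_ns" line, returning None for empty
/// lines, missing commas, extra commas, or non-numeric fields.
fn parse_latency_file_line(line: &str) -> Option<(u64, u64)> {
    let line = line.trim();
    if line.is_empty() {
        return None;
    }
    let mut fields = line.split(',');
    let wall_send_ns: u64 = fields.next()?.trim().parse().ok()?;
    let latency_ns: u64 = fields.next()?.trim().parse().ok()?;
    if fields.next().is_some() {
        return None; // reject "1,2,3"-style extra fields
    }
    Some((wall_send_ns, latency_ns))
}

fn main() {
    assert_eq!(parse_latency_file_line("123,456"), Some((123, 456)));
    assert_eq!(parse_latency_file_line("0,0"), Some((0, 0)));
    assert_eq!(parse_latency_file_line(""), None);        // empty line
    assert_eq!(parse_latency_file_line("123456"), None);  // missing comma
    assert_eq!(parse_latency_file_line("abc,789"), None); // non-numeric first field
    assert_eq!(parse_latency_file_line("123,xyz"), None); // non-numeric second field
    assert_eq!(parse_latency_file_line("1,2,3"), None);   // extra commas
}
```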
Timestamp capture ordering

In the round-trip and combined futures, current_timestamp_ns() is captured one instruction before Instant::now(). The wall-clock timestamp therefore slightly predates the monotonic measurement start. The gap is single-digit nanoseconds, orders of magnitude below the IPC latencies being measured.
Tests Added
Unit tests for parse_latency_file_line (7 tests):
- test_parse_latency_file_line_valid ("170...,42000")
- test_parse_latency_file_line_zeros ("0,0")
- test_parse_latency_file_line_missing_comma
- test_parse_latency_file_line_empty
- test_parse_latency_file_line_non_numeric_first ("abc,789")
- test_parse_latency_file_line_non_numeric_second ("123,xyz")
- test_parse_latency_file_line_extra_commas ("1,2,3")

Enhanced end-to-end streaming tests (2 tests): Both test_one_way_streaming_captures_send_timestamp and test_round_trip_streaming_captures_send_timestamp now validate that timestamp_ns values fall within the test execution window (before_ns <= ts <= after_ns).

Pre-existing timestamp API tests (3 tests in results.rs):
- test_new_uses_provided_send_timestamp
- test_new_combined_uses_provided_send_timestamp
- test_current_timestamp_ns_returns_recent_value

Documentation Added
README.md: Streaming Output Columns

Added a new "Streaming Output Columns" section documenting all six per-message streaming columns with types, descriptions, and nullable semantics:

| Column | Type |
| --- | --- |
| timestamp_ns | u64 |
| message_id | u64 |
| mechanism | string |
| message_size | u64 |
| one_way_latency_ns | u64/null |
| round_trip_latency_ns | u64/null |

Includes a note on timestamp_ns accuracy for one-way tests (wall-clock / monotonic clock mixing).
Validation
Unit and Integration Tests
Functional Verification
Ran 3-second duration benchmarks across all mechanisms with streaming output enabled, validating that timestamps span the full test window.

All mechanisms show timestamps distributed across the full test duration. The PMQ delta is 45ms short due to backpressure from the shallow system queue depth (typically 10 messages).
Before/After: Timestamp Distribution
Before fix (main branch): All timestamp_ns values in streaming output were within the same second, regardless of test duration. A 3-second test with 200,000+ messages would show all timestamps clustering around a single epoch second.

After fix: Timestamps span the full test duration. Each message's timestamp_ns reflects the approximate wall-clock time it was sent, enabling meaningful time-series analysis of per-message latency data.
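The before/after distinction can be expressed as a small predicate, similar in spirit to what the enhanced streaming tests check. `timestamps_span_window` is a hypothetical helper written for this description, not the project's test code.

```rust
/// True when every timestamp falls inside [before_ns, after_ns] and the
/// run's timestamps are not all identical (the pre-fix clustering symptom).
fn timestamps_span_window(ts: &[u64], before_ns: u64, after_ns: u64) -> bool {
    !ts.is_empty()
        && ts.iter().all(|&t| (before_ns..=after_ns).contains(&t))
        && ts.iter().any(|&t| t != ts[0])
}

fn main() {
    assert!(timestamps_span_window(&[10, 20, 30], 5, 35));  // spread across the window
    assert!(!timestamps_span_window(&[10, 10, 10], 5, 35)); // clustered: pre-fix behavior
    assert!(!timestamps_span_window(&[10, 40], 5, 35));     // outside the window
}
```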
Files Changed
| File | Change |
| --- | --- |
| src/benchmark.rs | Capture wall_ts inside async round-trip and combined futures before each send(); update the one-way file reader to parse the new wall_send_ns,latency_ns format; add parse_latency_file_line() with 7 unit tests; enhance 2 end-to-end streaming tests with timestamp validation |
| src/benchmark_blocking.rs | Blocking file reader uses parse_latency_file_line() |
| src/main.rs | Server loops write wall_send_ns,latency_ns per line (was latency_ns only) |
| README.md | New "Streaming Output Columns" section |
| src/results.rs | MessageLatencyRecord::new() and new_combined() accept a send_timestamp_ns parameter; add current_timestamp_ns() helper; 3 API tests |
| src/ipc/tcp_socket.rs | is_some()/unwrap() to if let (clippy fix) |
| src/ipc/unix_domain_socket.rs | Same clippy fix |
| Cargo.toml | Pin uuid to <1.21 for MSRV compatibility |
| .cargo/audit.toml | Ignore RUSTSEC-2026-0007 and RUSTSEC-2026-0009 (DoS-only, MSRV-pinned deps) |
| CONFIG.md | Test execution order documentation |
| utils/dashboard/README.md | Test execution order documentation |

Risk Assessment
- The latency measurement path (get_monotonic_time_ns()) is untouched. Only the wall-clock metadata timestamp in streaming records is changed.
- The streaming output schema is unchanged (same columns, same types). The internal server-to-client latency file format changed from latency_ns to wall_send_ns,latency_ns, but this file is ephemeral (created and deleted within a single benchmark run) and never exposed to users.