
Fix/buffer size overloading - Fix SHM Buffer Over-Sizing and Condvar Hangs #108

Merged
ewchong merged 8 commits into main from fix/buffer-size-overloading
Mar 31, 2026

Conversation

@mcurrier2
Collaborator

Description

Brief description of changes

Type of Change

  • [x] Bug fix
  • [ ] New feature
  • [ ] Breaking change
  • [x] Documentation update

Testing

  • [x] Tests pass locally
  • [ ] Added tests for new functionality
  • [x] Updated documentation

Checklist

  • Code follows style guidelines
  • Self-review completed
  • Comments added for complex code
  • Documentation updated
  • No breaking changes (or marked as breaking)

PR: Fix SHM Buffer Over-Sizing and Condvar Hangs

Summary

Shared memory (SHM) benchmarks produced misleading latency measurements
because the auto-sized ring buffer was large enough to hold all messages
at once. The writer dumped everything instantly while the reader slowly
drained the buffer, causing artificially huge accumulated latencies for
later messages. Additionally, the blocking SHM ring buffer used
pthread_cond_wait (infinite wait) for synchronization, which could
hang indefinitely when condition variable signals were missed — a known
failure mode in containerized environments with glibc ABI mismatches.

This PR replaces the all-messages-at-once buffer sizing with a fixed
64KB streaming buffer for SHM, and replaces infinite condvar waits
with timed waits plus a polling fallback for robustness.

Branch: fix/buffer-size-overloading
Base: main
Files changed: 6 (802 insertions, 43 deletions)


Problems and Fixes

1. SHM buffer sized to fit all messages (batching instead of streaming)

Problem: When --buffer-size was omitted, both async and blocking
benchmark runners auto-sized the SHM ring buffer to fit every message:
msg_count * (msg_size + 64). For a 10,000-message × 1KB benchmark,
this created a ~10MB buffer. The writer filled it instantly without
ever blocking, while the reader drained messages one at a time. Later
messages sat in the buffer for seconds, accumulating artificial latency
that had nothing to do with IPC performance.

Before:

// Buffer size logic (same for all mechanisms):
let buffer_size = self.config.buffer_size.unwrap_or_else(|| {
    if is_pmq {
        PMQ_SAFE_DEFAULT_BUFFER_SIZE
    } else if self.config.duration.is_some() {
        DURATION_MODE_BUFFER_SIZE
    } else {
        // For message-count mode, size the buffer to fit all messages.
        self.get_msg_count() * (self.config.message_size + 64)
    }
});

After:

// Buffer size logic (per-mechanism):
let is_shm = self.mechanism == IpcMechanism::SharedMemory;
const SHM_DEFAULT_BUFFER_SIZE: usize = 65536; // 64KB
const MESSAGE_OVERHEAD: usize = 64;

let buffer_size = self.config.buffer_size.unwrap_or_else(|| {
    if is_pmq {
        PMQ_SAFE_DEFAULT_BUFFER_SIZE
    } else if is_shm {
        // Fixed buffer for SHM to enable streaming, not batching
        let min_for_message = (self.config.message_size + MESSAGE_OVERHEAD) * 2;
        std::cmp::max(SHM_DEFAULT_BUFFER_SIZE, min_for_message)
    } else if self.config.duration.is_some() {
        DURATION_MODE_BUFFER_SIZE
    } else {
        // TCP/UDS handle backpressure via kernel buffers
        self.get_msg_count() * (self.config.message_size + 64)
    }
});

Effect: SHM now uses a fixed 64KB ring buffer (or 2x message size
for messages larger than 32KB). The writer blocks when the buffer is
full, producing per-message latencies that reflect actual IPC
throughput rather than queue drain time.

| Mechanism | Old Auto-Size | New Auto-Size |
|---|---|---|
| SHM | msg_count * (msg_size + 64) | Fixed 64 KB (or 2x msg size) |
| PMQ | 8,192 bytes | Unchanged |
| TCP/UDS (msg-count) | msg_count * (msg_size + 64) | Unchanged |
| TCP/UDS (duration) | 1 GB | Unchanged |
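For reference, the sizing rule reduces to a single max(): the crossover to the 2x path sits just above a 32 KB message, where 2 × (size + 64) first exceeds 65,536 (i.e. message_size > 32,704). A self-contained sketch using the constants from the PR (the free function is illustrative, not the exact shape of the code in benchmark.rs):

```rust
// Sketch of the new SHM auto-sizing rule (constants match the PR;
// the standalone function is illustrative).
const SHM_DEFAULT_BUFFER_SIZE: usize = 65_536; // 64 KB
const MESSAGE_OVERHEAD: usize = 64;

fn shm_auto_buffer_size(message_size: usize) -> usize {
    // At least two framed messages must fit, so the writer can
    // always make progress even for messages larger than 32 KB.
    let min_for_message = (message_size + MESSAGE_OVERHEAD) * 2;
    std::cmp::max(SHM_DEFAULT_BUFFER_SIZE, min_for_message)
}

fn main() {
    // Small messages get the fixed 64 KB buffer.
    println!("{}", shm_auto_buffer_size(1_024));  // 65536
    // The 2x path, matching the large-message test's 80,128-byte buffer.
    println!("{}", shm_auto_buffer_size(40_000)); // 80128
}
```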

Applies to: src/benchmark.rs (async) and
src/benchmark_blocking.rs (blocking) — identical logic in both.

2. SHM condvar infinite wait causing hangs

Problem: write_data_blocking() and read_data_blocking() used
pthread_cond_wait() with no timeout. If the condition variable signal
was missed (common in cross-process scenarios, especially in containers
with glibc ABI mismatches), the thread would hang indefinitely.

Before:

unsafe fn write_data_blocking(&self, data: &[u8]) -> Result<()> {
    // ...
    libc::pthread_mutex_lock(&self.mutex as *const _ as *mut _);

    while self.available_write_space() < required_space {
        if self.shutdown.load(Ordering::Acquire) {
            libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
            return Err(anyhow!("Connection closed"));
        }

        // Infinite wait — hangs if signal is missed
        libc::pthread_cond_wait(
            &self.space_ready as *const _ as *mut _,
            &self.mutex as *const _ as *mut _,
        );
    }
    // ...
}

After:

unsafe fn write_data_blocking(&self, data: &[u8]) -> Result<()> {
    // ...
    let lock_result = libc::pthread_mutex_lock(&self.mutex as *const _ as *mut _);
    if lock_result != 0 {
        return self.write_data_polling(data); // Fallback
    }

    let mut wait_count = 0;
    let loop_start = std::time::Instant::now();
    while self.available_write_space() < required_space {
        if self.shutdown.load(Ordering::Acquire) { /* ... */ }

        // Detect broken condvar (100+ iterations in <10ms)
        if wait_count >= 100 && loop_start.elapsed() < Duration::from_millis(10) {
            libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
            return self.write_data_polling(data);
        }

        // Timed wait (500µs) instead of infinite wait
        let mut timeout = libc::timespec { tv_sec: 0, tv_nsec: 0 };
        libc::clock_gettime(libc::CLOCK_REALTIME, &mut timeout);
        timeout.tv_nsec += 500_000; // 500µs
        if timeout.tv_nsec >= 1_000_000_000 {
            timeout.tv_sec += 1;
            timeout.tv_nsec -= 1_000_000_000;
        }

        libc::pthread_cond_timedwait(
            &self.space_ready as *const _ as *mut _,
            &self.mutex as *const _ as *mut _,
            &timeout,
        );

        wait_count += 1;
        if wait_count > 60000 { // 30s timeout (60000 * 500µs)
            libc::pthread_mutex_unlock(&self.mutex as *const _ as *mut _);
            return Err(anyhow!("Timeout waiting for buffer space"));
        }
    }
    // ...
}

The same pattern was applied to read_data_blocking().

Changes:

  1. pthread_cond_wait → pthread_cond_timedwait with 500µs timeout
  2. Broken-condvar detection (100 iterations in <10ms → polling fallback)
  3. 30-second overall timeout (60,000 × 500µs)
  4. Mutex lock failure → polling fallback

3. No fallback when pthread primitives fail

Problem: If pthread_mutex_lock failed or pthread_cond_timedwait
malfunctioned (both possible in containerized environments with
mismatched glibc versions), there was no alternative code path. The
operation would either crash or hang.

Fix: Added write_data_polling() and read_data_polling() methods
that use thread::sleep(Duration::from_micros(100)) polling with a
30-second timeout. These are activated when:

  • pthread_mutex_lock returns non-zero
  • The broken-condvar detector fires (100+ iterations in <10ms)

unsafe fn write_data_polling(&self, data: &[u8]) -> Result<()> {
    let start = std::time::Instant::now();
    let timeout = Duration::from_secs(30);

    // required_space is derived from data.len() plus framing,
    // as in write_data_blocking
    while self.available_write_space() < required_space {
        if self.shutdown.load(Ordering::Acquire) {
            return Err(anyhow!("Connection closed"));
        }
        if start.elapsed() > timeout {
            return Err(anyhow!(
                "Timeout waiting for buffer space (polling fallback)"
            ));
        }
        thread::sleep(Duration::from_micros(100));
    }
    // ... write data using atomics only (no mutex) ...
}

Applies to: src/ipc/shared_memory_blocking.rs — both
write_data_polling() and read_data_polling().

4. SHM buffer warning log was misleading

Problem: When the SHM buffer was smaller than total data volume,
the benchmark logged a warn! saying "This may cause backpressure,
which is a valid test scenario." With the new fixed 64KB buffer,
this warning fired on virtually every SHM run and was no longer
indicating an unusual condition — it was the intended design.

Before:

warn!(
    "Buffer size ({} bytes) is smaller than the total data size \
     ({} bytes). This may cause backpressure, which is a valid \
     test scenario.",
    buffer_size, total_message_data
);

After:

debug!(
    "SHM using fixed {}KB buffer for {} bytes of data \
     - streaming mode enabled",
    buffer_size / 1024, total_message_data
);

Applies to: Both src/benchmark.rs and
src/benchmark_blocking.rs.


Tests Added

Buffer sizing tests (5 tests)

| Test | File | Coverage |
|---|---|---|
| test_transport_config_buffer_size_logic (updated) | benchmark.rs | Added Scenario 2a: SHM uses fixed 64KB, not total; split TCP/UDS into Scenario 2b |
| test_shm_large_message_buffer_sizing | benchmark.rs | SHM auto-size uses 2x msg size when message > 32KB (40KB message → 80,128 byte buffer) |
| test_shm_duration_mode_uses_fixed_buffer | benchmark.rs | SHM in duration mode gets 64KB, not the 1GB TCP/UDS default |
| test_blocking_transport_config_buffer_size_logic | benchmark_blocking.rs | 5-scenario blocking test: user-provided, SHM fixed, TCP msg-count, TCP duration, PMQ safe default |
| test_blocking_shm_duration_mode_uses_fixed_buffer | benchmark_blocking.rs | Blocking SHM in duration mode gets 64KB |
| test_blocking_shm_large_message_buffer_sizing | benchmark_blocking.rs | Blocking SHM uses 2x msg size for large messages |

SHM backpressure and robustness tests (4 tests)

| Test | File | Coverage |
|---|---|---|
| test_backpressure_with_small_buffer | shared_memory_blocking.rs | 20 messages × 200 bytes through 1KB buffer — writer must block repeatedly, exercises pthread_cond_timedwait path |
| test_payload_integrity_under_backpressure | shared_memory_blocking.rs | 10 messages × 256 bytes through 512-byte buffer — verifies byte-level payload correctness after backpressure-induced blocking |
| test_ring_buffer_wrap_around_under_backpressure | shared_memory_blocking.rs | 50 messages × 128 bytes through 1KB buffer — write position wraps the circular buffer multiple times |
| test_shutdown_detected_during_blocked_write | shared_memory_blocking.rs | Server closes after reading 1 of 2 messages while writer is blocked on full buffer — verifies shutdown/timeout error |
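For intuition, the wrap-around test exercises the modular index arithmetic at the heart of the ring buffer. A minimal sketch of that arithmetic (illustrative only — the real buffer also tracks a read position and the 64-byte per-message overhead):

```rust
// Illustrative modular write-index arithmetic for a circular buffer.
fn count_wraps(capacity: usize, msg_size: usize, msgs: usize) -> usize {
    let mut write_pos = 0usize;
    let mut wraps = 0;
    for _ in 0..msgs {
        let next = (write_pos + msg_size) % capacity;
        if next < write_pos {
            wraps += 1; // write position crossed the end of the buffer
        }
        write_pos = next;
    }
    wraps
}

fn main() {
    // 50 messages × 128 bytes through a 1 KB buffer wrap the
    // write position several times, as the test asserts.
    println!("wraps: {}", count_wraps(1_024, 128, 50)); // wraps: 6
}
```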

Documentation Updated

CONFIG.md

  • --buffer-size default changed from 8192 to *auto*
  • SHM buffer_size default changed from 8192 to 64 KB (auto)
  • Added auto-sizing explanation paragraph
  • SHM optimal settings: concurrency → 1, buffer guidance →
    "leave at automatic"

README.md

  • Replaced Buffer Size Configuration prose with per-mechanism
    auto-sizing table
  • Updated Error Prevention troubleshooting for new SHM timeout
    messages

Validation

Unit and Integration Tests

$ cargo test
test result: ok. 262 passed; 0 failed; 1 ignored  (unit)
+ 27 integration tests, 42 doc tests — 335 total

$ cargo clippy --all-targets -- -D warnings
# zero warnings

$ cargo fmt --check
# clean

Before/After: SHM Buffer Sizing (Actual Test Results)

Test command (identical on both branches):

./target/release/ipc-benchmark -m shm --blocking -i 1000 -s 1024 \
    --one-way -w 0 --streaming-output-csv output.csv

Before fix (main branch):

Buffer Size:  1,088,000 bytes (auto-sized to fit all 1000 messages)

Benchmark Results:
  One-Way Latency:
           Mean: 2.56 ms, P95: 2.78 ms, P99: 2.78 ms
           Min:  169.11 us, Max: 2.78 ms

Per-message latency from streaming CSV:
  Msg 1:        169 µs
  Msg 250:    2,768 µs  (16x worse than msg 1)
  Msg 500:    2,594 µs
  Msg 750:    2,635 µs
  Msg 1000:   2,592 µs

The writer dumps all 1000 messages into the 1MB buffer instantly.
The reader drains them one at a time. By message 250, latency has
climbed to 2.7ms and stays there — this is queue-drain time, not
IPC latency.

After fix (fix/buffer-size-overloading branch):

Buffer Size:  65,536 bytes (fixed 64KB streaming buffer)

Benchmark Results:
  One-Way Latency:
           Mean: 304.54 us, P95: 667.65 us, P99: 683.52 us
           Min:  10.03 us, Max: 687.00 us

Per-message latency from streaming CSV:
  Msg 1:        187 µs
  Msg 250:      354 µs  (1.9x msg 1)
  Msg 500:      282 µs
  Msg 750:       10 µs
  Msg 1000:      14 µs

The writer blocks when the 64KB buffer fills (~60 messages), waits
for the reader to drain space, then continues. Every message's
latency reflects actual IPC throughput.
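The ~60-message figure follows from simple buffer arithmetic; a quick check (the helper function is illustrative, using the 64-byte per-message overhead from the sizing code):

```rust
// How many framed messages fit in the ring buffer before the
// writer must block and wait for the reader to drain space.
fn messages_before_block(buffer: usize, msg_size: usize, overhead: usize) -> usize {
    buffer / (msg_size + overhead)
}

fn main() {
    // 65_536 / (1_024 + 64) = 60 slots, then the writer blocks.
    println!("{}", messages_before_block(65_536, 1_024, 64)); // 60
}
```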

Improvement:

| Metric | Before (main) | After (fix) | Improvement |
|---|---|---|---|
| Mean latency | 2,560 µs | 304 µs | 8.4x |
| Max latency | 2,780 µs | 687 µs | 4.0x |
| Last-message latency | 2,592 µs | 14 µs | 185x |
| Buffer size | 1,088,000 bytes | 65,536 bytes | 16.6x smaller |
| Throughput | 18.31 MB/s | 17.11 MB/s | ~same |

Throughput is nearly identical because the total data transfer rate
is the same — the fix changes when messages are written, not how
fast the IPC channel moves data.

Before/After: Condvar Wait Behavior

Before fix (main branch): write_data_blocking() and
read_data_blocking() used pthread_cond_wait() with no timeout.
If a condition variable signal was missed (common in cross-process
scenarios, especially in containers with glibc ABI mismatches), the
thread would hang indefinitely — no timeout, no error, no recovery.

After fix: pthread_cond_timedwait with 500µs timeout means a
missed signal costs at most 500µs of latency, not an infinite hang.
If the condvar is completely broken (100 no-ops in <10ms), the code
falls back to polling. If nothing works, a 30-second timeout produces
a clear error message:

Error: Timeout waiting for buffer space
Error: Timeout waiting for data
Error: Timeout waiting for buffer space (polling fallback)
Error: Timeout waiting for data (polling fallback)

Files Changed

| File | Change |
|---|---|
| src/ipc/shared_memory_blocking.rs | Replace pthread_cond_wait with pthread_cond_timedwait (500µs) in both write_data_blocking and read_data_blocking; add broken-condvar detection; add write_data_polling() and read_data_polling() fallbacks; add mutex lock failure handling; add 30-second overall timeouts; 4 new backpressure/robustness tests |
| src/benchmark.rs | Add SHM-specific fixed 64KB buffer sizing path; downgrade SHM buffer log from warn! to debug!; update existing buffer test; add 2 new SHM buffer sizing tests |
| src/benchmark_blocking.rs | Mirror all async buffer sizing changes for blocking mode; add Duration import; add 3 new blocking buffer sizing tests |
| README.md | Rewrite Buffer Size Configuration with per-mechanism auto-sizing table; update Error Prevention troubleshooting |
| CONFIG.md | Update --buffer-size default to auto; update SHM buffer_size default and optimal settings |
| .cargo/audit.toml | Ignore MSRV-pinned dependency advisories (RUSTSEC-2026-0007, RUSTSEC-2026-0009) |

Risk Assessment

  • Low risk. Buffer sizing changes only affect the default when
    --buffer-size is omitted. User-provided values are always
    respected. TCP, UDS, and PMQ behavior is completely unchanged.
  • SHM condvar changes are additive. The primary path still uses
    pthread condvars — just with a timeout instead of infinite wait.
    The polling fallback only activates on detected failures.
  • Backward compatible. No CLI changes, no output format changes.
    Users who were explicitly passing --buffer-size see no difference.
  • All 335 tests pass with zero clippy warnings.
  • 9 new tests specifically cover the buffer sizing logic and SHM
    backpressure behavior introduced by this branch.

Collaborator

@dustinblack dustinblack left a comment


PR #108 Review

Buffer sizing fix — looks good

The core change (fixed 64KB SHM buffer instead of sizing to fit all messages) is well-motivated and the before/after data is compelling. The buffer sizing logic in benchmark.rs and benchmark_blocking.rs is clean, the warn! → debug! downgrade makes sense, and the test coverage is thorough (boundary tests, large messages, duration mode, user override).

Condvar/polling changes must be removed

src/ipc/shared_memory_blocking.rs:286-432 (write path) and src/ipc/shared_memory_blocking.rs:437-598 (read path)

The pthread_cond_timedwait, broken-condvar detection, write_data_polling(), and read_data_polling() are the same container-IPC code that was removed from PR #104 at review (ba2e60c). The container-to-container IPC work is out of scope and will be re-architected in the future, so this code should not be introduced here.

Beyond the scoping concern, this code has two functional problems:

  1. Regresses PR #104's timestamp fix. write_data_polling() has no timestamp_offset parameter, so when the polling fallback triggers, the timestamp is not updated before the write. This undoes the latency accuracy fix from #104.

  2. Missing condvar signal after polling write. write_data_polling() writes data and updates write_pos but never calls pthread_cond_signal(&data_ready). A reader blocked on the condvar path won't wake until its 500µs timed wait expires. PR #103 fixed this same class of issue.

Please scope this PR to issue #107 items 4-5 (buffer sizing and tests). The condvar/polling work (items 1-3) should be excluded — those issue items should be moved to a future PR when the container-IPC architecture is ready.

Minor: overlapping files

.cargo/audit.toml, Cargo.lock, and Cargo.toml changes overlap with PRs #103, #104, and #105. Whichever merges last will need to resolve conflicts. Not a blocker, just a heads-up.

@mcurrier2
Collaborator Author

Issue #107 updated to reflect this PR only addresses:

Fix SHM automatic buffer sizing to use a fixed 64 KB buffer (or 2x message size for large messages) instead of sizing the ring buffer to fit all messages at once. A small buffer forces natural writer/reader pacing through backpressure, producing latencies representative of steady-state IPC performance.

Update test_transport_config_buffer_size_logic to validate the new SHM-specific fixed-buffer behavior while preserving message-count-based sizing for TCP/UDS.

Here's a summary of what was removed/restored in src/ipc/shared_memory_blocking.rs (net -270 lines):

Removed:

  1. write_data_polling() — the polling fallback for writes (no mutex, no condvar signal)
  2. read_data_polling() — the polling fallback for reads
  3. pthread_cond_timedwait — replaced with pthread_cond_wait in both write and read paths
  4. Broken-condvar detection logic (100-iteration / 10ms heuristic)
  5. Mutex-lock-failure fallbacks (falling back to polling on pthread_mutex_lock != 0)
  6. 30-second timeout counters (wait_count > 60000)
  7. test_high_volume_condvar_stress test

Restored from main:

  • write_data_blocking() signature with data: &mut [u8] and timestamp_offset: Option<Range> — timestamp is refreshed inside the function after the condvar wait, so latency excludes backpressure time
  • read_data_blocking() with clean pthread_cond_wait
  • Call site now passes &mut serialized and Some(Message::timestamp_offset())

All validation passes: cargo fmt, cargo check, cargo test (42 passed), cargo clippy (0 warnings).

Both functional problems are fully resolved by the removal. Here's why:

  1. Timestamp regression: write_data_polling() was the code path that lacked the timestamp_offset parameter. We deleted that function entirely. The only remaining write path is write_data_blocking(), which now takes timestamp_offset: Option<Range> and refreshes the timestamp after the condvar wait, right before the memory write. There is no fallback path that could bypass it.

  2. Missing condvar signal: write_data_polling() was the code path that wrote data without calling pthread_cond_signal(&data_ready). Again, that function no longer exists. The only remaining write path (write_data_blocking) calls pthread_cond_signal(&data_ready) after every write. And on the read side, read_data_polling() is also gone — read_data_blocking() always uses pthread_cond_wait, and always calls pthread_cond_signal(&space_ready) after consuming data.

Both problems existed exclusively in the polling fallback functions and the broken-condvar detection logic that routed into them. Since all of that code is removed, neither issue can occur. The write and read paths now match exactly what's on main (from PR #104), with clean pthread_cond_wait and proper signaling in both directions.

Collaborator

@dustinblack dustinblack left a comment


Thanks for removing the condvar/polling code. The buffer sizing fix looks good and is ready to go, but this branch needs a rebase on main before I can approve — it's based on e826b20, missing PRs #103 and #104.

After rebasing, please also clean up the duplication:

  • shared_memory_blocking.rs timestamp fix — already merged via #104. After rebase this should drop out automatically, but verify no duplicate changes remain.
  • .cargo/audit.toml — already on main via #103. Same.
  • README "Streaming Output Columns" and "Test Execution Order" sections — these are out of scope for a buffer-sizing PR. Please remove them; if they're needed, they belong in a separate PR with their own justification.

The MSRV pins in Cargo.toml (uuid <1.21, zmij, quote, syn, unicode-ident, tempfile <3.25) look like legitimate CI fixes and are fine to keep.

@dustinblack dustinblack mentioned this pull request Mar 25, 2026
Previously, standalone SHM mode calculated buffer size to fit ALL
messages (e.g., 6.4MB for 50k messages). This caused the writer to
dump everything instantly while the reader slowly drained using
pthread_cond_timedwait with 500us polling timeouts, leading to huge
accumulated latencies (~489ms for 64B).

The fix uses a fixed 64KB buffer (or 2x message size if larger),
matching container behavior. This enables proper streaming where the
writer blocks when the buffer is full.

Before: 489ms mean latency for 64B/50k messages
After:  1.95ms mean latency for 64B/10k messages (blocking)
        15.85ms mean latency for 64B/10k messages (async)

Also updates test_transport_config_buffer_size_logic to expect the
new fixed-buffer behavior for SHM while keeping the message-count
sizing for TCP/UDS.

Cherry-picked from container-to-container-ipc branch (3b49877).

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
…entation

- Add 9 new tests covering SHM buffer sizing, backpressure, and
  condvar timed-wait behavior:
  - test_shm_large_message_buffer_sizing: verifies 2x msg size path
    when messages exceed 32KB (async)
  - test_shm_duration_mode_uses_fixed_buffer: verifies SHM gets 64KB
    in duration mode, not 1GB TCP/UDS default (async)
  - test_blocking_transport_config_buffer_size_logic: full buffer
    sizing test for blocking mode (SHM, TCP, PMQ, duration)
  - test_blocking_shm_duration_mode_uses_fixed_buffer: SHM 64KB in
    blocking duration mode
  - test_blocking_shm_large_message_buffer_sizing: 2x path (blocking)
  - test_backpressure_with_small_buffer: exercises timed condvar wait
    with 1KB buffer and 20 messages
  - test_payload_integrity_under_backpressure: byte-level payload
    verification through backpressure-induced blocking writes
  - test_ring_buffer_wrap_around_under_backpressure: write_pos wraps
    the circular buffer multiple times under backpressure
  - test_shutdown_detected_during_blocked_write: server closes while
    client is blocked waiting for buffer space
- Update README.md Buffer Size Configuration with per-mechanism
  auto-sizing table and updated error prevention guidance
- Update CONFIG.md SHM defaults from 8192 to 64KB (auto) and add
  automatic buffer sizing explanation
- All tests passing, clippy clean

AI-assisted-by: Claude claude-4.6-opus-high-thinking (Anthropic)
Made-with: Cursor
- Add test_user_buffer_size_overrides_shm_default (async + blocking):
  verifies user-provided --buffer-size overrides SHM's 64KB default
- Add test_shm_buffer_sizing_at_32kb_boundary (async + blocking):
  tests exact transition where 2*(msg_size+64) crosses 64KB threshold
- Add test_high_volume_condvar_stress: 100 messages through 512-byte
  buffer to stress pthread_cond_timedwait retry loop
- Update create_transport_config_internal doc comment in benchmark.rs
  to describe per-mechanism buffer sizing (SHM, PMQ, TCP/UDS)
- Update Adaptive Buffer Sizing doc in benchmark_blocking.rs to
  describe per-mechanism behavior instead of vague description
- Fix README.md example output: SharedMemory buffer size 10240000 ->
  65536 to reflect new fixed 64KB auto-sizing
- All 340 tests passing, zero clippy warnings

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
…ility

The CI MSRV job removes Cargo.lock and resolves fresh dependencies.
Several transitive dependencies recently bumped their MSRV above 1.70:

- uuid 1.21+ requires Rust 1.85 → pinned to <1.21
- tempfile 3.25+ pulls getrandom >=0.3,<0.5 which resolves to 0.4.x
  (edition 2024, unparseable by Rust 1.70's cargo) → pinned to <3.25
- zmij 1.0.20+ requires Rust 1.71 → pinned to =1.0.19
- quote 1.0.45+ requires Rust 1.71 → pinned to =1.0.44
- syn 2.0.115+ requires Rust 1.71 → pinned to =2.0.114
- unicode-ident 1.0.23+ requires Rust 1.71 → pinned to =1.0.22

Verified: MSRV builds and tests pass both with and without Cargo.lock
in a Rust 1.70 container. Local clippy, fmt, and tests all clean.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
No code changes. Forces new CI run to pick up dependency pins
from commit cd28295 (uuid <1.21, tempfile <3.25, zmij =1.0.19,
quote =1.0.44, syn =2.0.114, unicode-ident =1.0.22).

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
Remove container-IPC code that was out of scope for issue #107
(buffer sizing fix). This scopes the PR to items 4-5 only.

Removed:
- write_data_polling() and read_data_polling() fallback functions
- pthread_cond_timedwait (reverted to pthread_cond_wait)
- Broken-condvar detection (100-iteration/10ms heuristic)
- Mutex-lock-failure fallbacks to polling paths
- 30-second timeout counters (wait_count > 60000)
- test_high_volume_condvar_stress test

Restored from main (PR #104):
- write_data_blocking() signature with timestamp_offset parameter
  so latency measurement excludes backpressure wait time
- read_data_blocking() with clean pthread_cond_wait
- Proper pthread_cond_signal in both write and read paths

The two functional regressions cited in review are resolved:
1. Timestamp regression: write_data_polling() lacked timestamp_offset,
   but that function no longer exists. The only write path now refreshes
   the timestamp after the condvar wait.
2. Missing condvar signal: write_data_polling() never called
   pthread_cond_signal(&data_ready), but that function no longer exists.
   The only write path signals after every write.

All tests passing (42/42). No clippy warnings.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
Remove sections that belong in the streaming-timestamps PR (#105),
not this buffer-sizing PR:

- "Streaming Output Columns" table and timestamp_ns accuracy note
- "Test Execution Order" section
- Streaming JSON/CSV description rewording
- Round-trip CLI example comment expansion

README diff now only contains buffer-sizing documentation changes:
auto-sizing table, error prevention updates, and example buffer
size correction.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
@mcurrier2 mcurrier2 force-pushed the fix/buffer-size-overloading branch from 44aaa3f to c64e906 on March 25, 2026 at 14:47
@mcurrier2
Collaborator Author

PR #108 is updated with 7 clean commits rebased on main:

a7690b9 — Fix: Use fixed 64KB buffer for standalone SHM
c1e9537 — Tests/docs: Buffer sizing and SHM backpressure tests
cd879f4 — Tests/docs: Coverage tests and stale doc fixes
970e408 — Fix(msrv): Pin transitive dependencies for Rust 1.70
2ecf3e9 — CI: Trigger rebuild with MSRV pins
8925b1d — Fix: Remove out-of-scope condvar/polling code per review
c64e906 — Docs: Remove out-of-scope README sections per review

All items addressed:

No duplicate .cargo/audit.toml (dropped by rebase)
No duplicate shared_memory_blocking.rs timestamp/condvar code (dropped by rebase)
Out-of-scope README sections removed

@mcurrier2 mcurrier2 requested a review from dustinblack March 25, 2026 15:04
dustinblack previously approved these changes Mar 25, 2026
Collaborator

@dustinblack dustinblack left a comment


Rebase looks clean — duplication from #103/#104 dropped out, out-of-scope README sections removed. The remaining diff is focused on buffer sizing with good test coverage.

The beta CI failure (tcp_one_way_blocking_smoke) is a flaky integration test unrelated to these changes — all other platforms and stable pass.

Approved.

@github-actions

📈 Changed lines coverage: 83.33% (10/12)

🚨 Uncovered lines in this PR

  • src/benchmark.rs: 1681
  • src/benchmark_blocking.rs: 595

📊 Code Coverage Summary

File Line Coverage Uncovered Lines
src/benchmark.rs 81.12%
(477/588)
75, 78, 89, 93, 102, 105, 107, 124, 392, 397-402, 409-414, 481-484, 589, 673, 679-681, 685-687, 707, 776-778, 783, 804, 809, 827, 933, 937-940, 942, 951-954, 956, 1032, 1063, 1066, 1068-1069, 1078-1079, 1132-1136, 1138, 1221, 1234, 1241, 1250, 1270-1274, 1276, 1372, 1381, 1383-1384, 1387-1388, 1394-1395, 1400-1401, 1403, 1408-1409, 1413-1415, 1420-1421, 1424-1425, 1457, 1515-1518, 1520-1526, 1528, 1531, 1681, 1696
src/benchmark_blocking.rs 74.24%
(317/427)
97, 111, 127, 263, 369, 375-377, 380-382, 402, 434, 488, 583, 595, 609, 639-642, 723-726, 745, 749, 764, 806-808, 811, 814-816, 818, 821, 823-827, 829-830, 838-842, 844-848, 851-852, 856-857, 892, 941, 1020, 1031, 1061, 1064, 1131-1135, 1137, 1192-1195, 1200, 1213, 1216-1219, 1223-1225, 1227, 1229-1230, 1232-1233, 1236, 1238-1242, 1244, 1248-1249, 1251, 1253, 1274, 1286-1290, 1292, 1312-1315
src/cli.rs 92.39%
(85/92)
630, 729, 769, 771, 792-794
src/execution_mode.rs 100.00%
(14/14)
``
src/ipc/mod.rs 65.28%
(47/72)
115, 425, 427-430, 740-741, 756-757, 775-776, 807, 810, 813, 818, 845-846, 860, 862, 882, 884, 1007-1009
src/ipc/posix_message_queue.rs 46.09%
(59/128)
139-140, 213-215, 217, 224, 229, 332-335, 337, 345, 437, 441-442, 446, 449-452, 454-458, 539, 679, 782, 789-790, 807-808, 819-820, 831-832, 849-850, 906, 910-911, 914-919, 921-923, 927, 929-931, 933, 935-937, 941-943, 945-947, 994-995, 1017
src/ipc/posix_message_queue_blocking.rs 81.94%
(127/155)
172, 182, 221, 251-255, 274, 325, 368, 387-390, 416-418, 422-423, 425-426, 436, 455, 457-458, 460-461
src/ipc/shared_memory.rs 69.36%
(163/235)
61, 141, 145, 246-247, 257-258, 262, 390-391, 417-419, 421, 439-441, 443-444, 446-450, 467, 474, 480, 483-484, 488, 492, 496-497, 502-503, 666-667, 670-671, 674, 676, 681-682, 709-710, 713-714, 721-723, 725, 727-732, 734-735, 738-739, 741-745, 752, 782, 784-785, 787, 791
src/ipc/shared_memory_blocking.rs 77.74%
(206/265)
196-198, 200-201, 204-206, 209-210, 212, 217, 219, 223-225, 230, 238-240, 243-245, 248-249, 251, 254, 257-258, 261-262, 266-267, 269, 273-274, 276, 310-312, 318-319, 378-379, 403-407, 498, 506, 556, 573, 660, 726, 789, 798, 808, 830
src/ipc/shared_memory_direct.rs 83.80%
(150/179)
372-375, 444-451, 455, 482, 506-509, 513-514, 556-557, 569, 598, 605-606, 629-630, 636
src/ipc/tcp_socket.rs 59.43%
(63/106)
31-32, 61, 96, 113-114, 118, 124-125, 129, 136-137, 141, 147-148, 152, 171-172, 175-177, 184-185, 188, 362-363, 366-367, 370-371, 376-377, 422, 429, 447-449, 478, 480-482, 484, 487
src/ipc/tcp_socket_blocking.rs 97.62%
(82/84)
134, 159
src/ipc/unix_domain_socket.rs 59.43%
(63/106)
29-30, 58, 93, 103, 122-123, 127, 133-134, 138, 145-146, 150, 156-157, 161, 180-181, 184-186, 193-194, 197, 346-347, 350-351, 354-355, 360-361, 412-414, 443, 445-447, 449, 452, 468
src/ipc/unix_domain_socket_blocking.rs 94.34%
(100/106)
276-277, 283-285, 287
src/logging.rs 100.00%
(13/13)
src/main.rs 42.39%
(142/335)
83-85, 87, 124-125, 135-139, 143-145, 147-148, 150-151, 171-174, 198-202, 210, 216, 219, 224-227, 232-233, 239, 245, 247-249, 251, 257-258, 264, 269, 272-273, 277, 279-280, 284-285, 287, 293, 297-298, 300-305, 307-308, 311, 320, 323-324, 327, 374-377, 384, 386-390, 393-396, 398-399, 401-402, 404, 406-412, 416, 418-421, 424, 428-430, 434, 436, 439, 443, 448-451, 457-458, 464-465, 471, 473-474, 478, 480, 485-487, 491, 494-495, 497-498, 503, 505-507, 511-512, 514, 521, 526-527, 529-534, 536-537, 541, 550, 553-554, 557, 559, 578, 585, 589-591, 593, 623-624, 632, 665, 716, 720, 723-726, 775-778, 815-816, 823-824, 827, 854-855, 858, 912-913, 917-920, 960, 969, 974, 979-980
src/metrics.rs 79.79%
(150/188)
455-460, 493-494, 552, 558, 579-582, 732-734, 736, 768, 788, 833, 838, 881, 904, 923-924, 926-927, 930-932, 952, 980, 984, 1005, 1007-1008, 1013
src/results.rs 56.92%
(255/448)
717, 726-728, 730-731, 734-735, 738, 760, 763-764, 767, 769, 772, 776-781, 791-792, 795-800, 817, 829-830, 832, 834, 837-838, 840, 844, 871, 895-897, 900-901, 905-907, 910, 936, 941, 946, 952, 971, 973-974, 976, 978-982, 984, 986-987, 1021, 1062-1063, 1066, 1072-1073, 1077, 1081-1083, 1085-1086, 1110-1114, 1117-1120, 1123-1130, 1140-1141, 1160-1161, 1163-1167, 1169, 1186-1187, 1189-1194, 1196, 1214, 1216-1221, 1239, 1242, 1258-1259, 1274-1276, 1278-1280, 1282-1283, 1285-1286, 1288-1289, 1291, 1293-1294, 1296-1299, 1301-1303, 1305-1307, 1310, 1314-1315, 1323-1328, 1330-1331, 1335-1336, 1340-1342, 1344, 1348-1349, 1358-1361, 1365-1367, 1371, 1373, 1376, 1381-1382, 1387, 1394-1398, 1400, 1598-1599, 1819-1820, 1822-1823, 1828
src/results_blocking.rs 95.48%
(296/310)
489-490, 492-493, 544, 769, 774, 779, 815, 818-819, 827-828, 886
src/utils.rs 70.73%
(29/41)
71, 143, 147-149, 153, 159, 198-202
Total 72.92%
(2838/3892)

@sberg-rh

Question: Intentional difference in message overhead values?

I noticed that MESSAGE_OVERHEAD = 64 is defined and used for SHM buffer sizing in both benchmark.rs and benchmark_blocking.rs, but there are two other places that use different hardcoded values in the same function:

  1. SHM sizing uses the constant:

    const MESSAGE_OVERHEAD: usize = 64;
    let min_for_message = (self.config.message_size + MESSAGE_OVERHEAD) * 2;
  2. TCP/UDS sizing uses a hardcoded 64:

    self.get_msg_count() * (self.config.message_size + 64)
  3. Validation/logging check uses a hardcoded 32:

    self.get_msg_count() * (self.config.message_size + 32)

Is the 32 in the validation path intentionally different from the 64 used elsewhere? I wasn't sure if that reflects a different overhead calculation for logging purposes or if it should be using MESSAGE_OVERHEAD as well. Might be worth a quick comment if it's intentional.

Applies to:

  • src/benchmark.rs (lines ~1658, ~1677)
  • src/benchmark_blocking.rs (lines ~574, ~591)

Replace hardcoded 64 and 32 values with the MESSAGE_OVERHEAD
constant in both benchmark.rs and benchmark_blocking.rs:

- TCP/UDS msg-count sizing: was hardcoded 64, now MESSAGE_OVERHEAD
- SHM logging/validation: was hardcoded 32, now MESSAGE_OVERHEAD
- Add comment explaining what MESSAGE_OVERHEAD covers: 8 (id) +
  8 (timestamp) + 8 (bincode vec length) + 1 (message type) +
  4 (ring buffer length prefix) = 29 bytes, rounded to 64

Addresses review feedback about inconsistent overhead values.

AI-assisted-by: Claude claude-4.6-opus-high-thinking
Made-with: Cursor
@mcurrier2
Collaborator Author

It should always be 64 bytes.
The hardcoded 32 and 64 values have been replaced with the 64-byte MESSAGE_OVERHEAD constant, and a comment was added in each file describing what makes up the "overhead".

@github-actions

📈 Changed lines coverage: 88.89% (16/18)

🚨 Uncovered lines in this PR

  • src/benchmark.rs: 1686
  • src/benchmark_blocking.rs: 600

📊 Code Coverage Summary

File Line Coverage Uncovered Lines
src/benchmark.rs 81.15%
(478/589)
75, 78, 89, 93, 102, 105, 107, 124, 392, 397-402, 409-414, 481-484, 589, 673, 679-681, 685-687, 707, 776-778, 783, 804, 809, 827, 933, 937-940, 942, 951-954, 956, 1032, 1063, 1066, 1068-1069, 1078-1079, 1132-1136, 1138, 1221, 1234, 1241, 1250, 1270-1274, 1276, 1372, 1381, 1383-1384, 1387-1388, 1394-1395, 1400-1401, 1403, 1408-1409, 1413-1415, 1420-1421, 1424-1425, 1457, 1515-1518, 1520-1526, 1528, 1531, 1686, 1701
src/benchmark_blocking.rs 74.30%
(318/428)
97, 111, 127, 263, 369, 375-377, 380-382, 402, 434, 488, 587, 600, 614, 644-647, 728-731, 750, 754, 769, 811-813, 816, 819-821, 823, 826, 828-832, 834-835, 843-847, 849-853, 856-857, 861-862, 897, 946, 1025, 1036, 1066, 1069, 1136-1140, 1142, 1197-1200, 1205, 1218, 1221-1224, 1228-1230, 1232, 1234-1235, 1237-1238, 1241, 1243-1247, 1249, 1253-1254, 1256, 1258, 1279, 1291-1295, 1297, 1317-1320
src/cli.rs 92.39%
(85/92)
630, 729, 769, 771, 792-794
src/execution_mode.rs 100.00%
(14/14)
src/ipc/mod.rs 65.28%
(47/72)
115, 425, 427-430, 740-741, 756-757, 775-776, 807, 810, 813, 818, 845-846, 860, 862, 882, 884, 1007-1009
src/ipc/posix_message_queue.rs 46.09%
(59/128)
139-140, 213-215, 217, 224, 229, 332-335, 337, 345, 437, 441-442, 446, 449-452, 454-458, 539, 679, 782, 789-790, 807-808, 819-820, 831-832, 849-850, 906, 910-911, 914-919, 921-923, 927, 929-931, 933, 935-937, 941-943, 945-947, 994-995, 1017
src/ipc/posix_message_queue_blocking.rs 81.94%
(127/155)
172, 182, 221, 251-255, 274, 325, 368, 387-390, 416-418, 422-423, 425-426, 436, 455, 457-458, 460-461
src/ipc/shared_memory.rs 69.36%
(163/235)
61, 141, 145, 246-247, 257-258, 262, 390-391, 417-419, 421, 439-441, 443-444, 446-450, 467, 474, 480, 483-484, 488, 492, 496-497, 502-503, 666-667, 670-671, 674, 676, 681-682, 709-710, 713-714, 721-723, 725, 727-732, 734-735, 738-739, 741-745, 752, 782, 784-785, 787, 791
src/ipc/shared_memory_blocking.rs 78.87%
(209/265)
196-198, 200-201, 204-206, 209-210, 212, 217, 219, 223-225, 230, 238-240, 243-245, 248-249, 251, 254, 257-258, 261-262, 266-267, 269, 273-274, 276, 311-312, 378-379, 403-407, 498, 506, 556, 573, 660, 726, 789, 798, 808, 830
src/ipc/shared_memory_direct.rs 83.80%
(150/179)
372-375, 444-451, 455, 482, 506-509, 513-514, 556-557, 569, 598, 605-606, 629-630, 636
src/ipc/tcp_socket.rs 59.43%
(63/106)
31-32, 61, 96, 113-114, 118, 124-125, 129, 136-137, 141, 147-148, 152, 171-172, 175-177, 184-185, 188, 362-363, 366-367, 370-371, 376-377, 422, 429, 447-449, 478, 480-482, 484, 487
src/ipc/tcp_socket_blocking.rs 97.62%
(82/84)
134, 159
src/ipc/unix_domain_socket.rs 59.43%
(63/106)
29-30, 58, 93, 103, 122-123, 127, 133-134, 138, 145-146, 150, 156-157, 161, 180-181, 184-186, 193-194, 197, 346-347, 350-351, 354-355, 360-361, 412-414, 443, 445-447, 449, 452, 468
src/ipc/unix_domain_socket_blocking.rs 94.34%
(100/106)
276-277, 283-285, 287
src/logging.rs 100.00%
(13/13)
src/main.rs 42.39%
(142/335)
83-85, 87, 124-125, 135-139, 143-145, 147-148, 150-151, 171-174, 198-202, 210, 216, 219, 224-227, 232-233, 239, 245, 247-249, 251, 257-258, 264, 269, 272-273, 277, 279-280, 284-285, 287, 293, 297-298, 300-305, 307-308, 311, 320, 323-324, 327, 374-377, 384, 386-390, 393-396, 398-399, 401-402, 404, 406-412, 416, 418-421, 424, 428-430, 434, 436, 439, 443, 448-451, 457-458, 464-465, 471, 473-474, 478, 480, 485-487, 491, 494-495, 497-498, 503, 505-507, 511-512, 514, 521, 526-527, 529-534, 536-537, 541, 550, 553-554, 557, 559, 578, 585, 589-591, 593, 623-624, 632, 665, 716, 720, 723-726, 775-778, 815-816, 823-824, 827, 854-855, 858, 912-913, 917-920, 960, 969, 974, 979-980
src/metrics.rs 79.79%
(150/188)
455-460, 493-494, 552, 558, 579-582, 732-734, 736, 768, 788, 833, 838, 881, 904, 923-924, 926-927, 930-932, 952, 980, 984, 1005, 1007-1008, 1013
src/results.rs 56.92%
(255/448)
717, 726-728, 730-731, 734-735, 738, 760, 763-764, 767, 769, 772, 776-781, 791-792, 795-800, 817, 829-830, 832, 834, 837-838, 840, 844, 871, 895-897, 900-901, 905-907, 910, 936, 941, 946, 952, 971, 973-974, 976, 978-982, 984, 986-987, 1021, 1062-1063, 1066, 1072-1073, 1077, 1081-1083, 1085-1086, 1110-1114, 1117-1120, 1123-1130, 1140-1141, 1160-1161, 1163-1167, 1169, 1186-1187, 1189-1194, 1196, 1214, 1216-1221, 1239, 1242, 1258-1259, 1274-1276, 1278-1280, 1282-1283, 1285-1286, 1288-1289, 1291, 1293-1294, 1296-1299, 1301-1303, 1305-1307, 1310, 1314-1315, 1323-1328, 1330-1331, 1335-1336, 1340-1342, 1344, 1348-1349, 1358-1361, 1365-1367, 1371, 1373, 1376, 1381-1382, 1387, 1394-1398, 1400, 1598-1599, 1819-1820, 1822-1823, 1828
src/results_blocking.rs 95.48%
(296/310)
489-490, 492-493, 544, 769, 774, 779, 815, 818-819, 827-828, 886
src/utils.rs 70.73%
(29/41)
71, 143, 147-149, 153, 159, 198-202
Total 73.01%
(2843/3894)


@sberg-rh sberg-rh left a comment


Looks good with latest updates! Approved

@mcurrier2 mcurrier2 requested a review from ewchong March 30, 2026 13:39
@ewchong ewchong merged commit f7f8eb9 into main Mar 31, 2026
12 checks passed
@ewchong ewchong deleted the fix/buffer-size-overloading branch March 31, 2026 13:35


Development

Successfully merging this pull request may close these issues.

Fix SHM buffer sizing and condvar latency spikes causing unreliable shared memory benchmarks
