Skip to content

fix(sync): close data race on catchup() count read (F-32, #112)#160

Merged
adequatelimited merged 1 commit intoaudit-fixesfrom
audit/F-32-catchup-count-race
Apr 13, 2026
Merged

fix(sync): close data race on catchup() count read (F-32, #112)#160
adequatelimited merged 1 commit intoaudit-fixesfrom
audit/F-32-catchup-count-race

Conversation

@adequatelimited
Copy link
Copy Markdown
Collaborator

Summary

Closes #112 (F-32) — partial. The underflow concern in the original report was analyzed and determined not to be a real risk; see this comment on the issue for the per-thread decrement trace.

The real issue is the data race on the shared `count` read at sync.c:136:

  • Reader: `while (count && ecode == VEOK)` — unsynchronized.
  • Writers: `count--` at sync.c:148 and sync.c:172 — both inside `OMP_CRITICAL_`.

Per the OpenMP memory model, the flush on critical-exit is not paired with an unsynchronized reader, so the read can observe stale cached values (and TSan will flag it as a race).

Marking the parameter `volatile` forces a memory read each iteration. Combined with the writers' implicit flush on critical-exit, this establishes the needed happens-before. Same treatment applied to the analogous `ecode` read in `tfile.c` under F-10.

Test plan

  • Full build clean under `-Wall -Werror -Wextra -Wpedantic`.
  • One-character change (`word32` → `volatile word32`); no behavioral diff on x86, removes the UB by the C11/OpenMP memory model.

@adequatelimited adequatelimited force-pushed the audit/F-32-catchup-count-race branch from 9512c23 to 7328cd6 Compare April 13, 2026 23:07
The while(count && ecode == VEOK) condition at sync.c:138 reads the
shared count outside any OMP_CRITICAL_ section, while the two
decrements are inside critical sections. Per the OpenMP memory model,
the flush on critical-exit is not paired with an unsynchronized
reader, so the read can observe stale cached values (or be a data
race by C11; TSan will flag it).

Shadow the parameter with a local `volatile word32 count = count_in`.
volatile forces a memory read each iteration; combined with the
writers' implicit flush on critical-exit, this establishes the
needed happens-before. Same treatment applied to the analogous
ecode read in tfile.c under F-10.

Analyzed the underflow claim from the original issue and confirmed it
is not reachable -- each thread's loop body decrements at most once
before breaking. See github.com//issues/112 comment
for the trace.
@adequatelimited adequatelimited force-pushed the audit/F-32-catchup-count-race branch from 7328cd6 to b4c2d47 Compare April 13, 2026 23:07
@adequatelimited adequatelimited merged commit 9d8b31d into audit-fixes Apr 13, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant