lib,src,test: fix race during tracing toggles#441
lib,src,test: fix race during tracing toggles#441santigimeno wants to merge 2 commits intonode-v24.x-nsolid-v6.xfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
🚧 Files skipped from review as they are similar to previous changes (5)
WalkthroughThe diff centralizes trace flags into a cached Changes
Sequence DiagramsequenceDiagram
participant NB as NativeBinding
participant NT as nsolid_trace
participant OC as OTEL_Core
participant TR as Tracer
NB->>NT: toggleTracingFn(flags: uint32)
activate NT
NT->>NT: currentTraceFlags = flags
NT->>OC: emit flagsUpdated(flags)
deactivate NT
activate OC
alt flags > 0
OC->>OC: enable API/instrumentation
else flags == 0
OC->>OC: disable API/instrumentation
end
deactivate OC
TR->>NT: getTraceFlags()
activate NT
NT-->>TR: currentTraceFlags
deactivate NT
alt !options.internal && flags == 0
TR->>TR: return invalidSpanContext
else
TR->>TR: create span using flags bitmask
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/common/nsolid-grpc-agent/client.js`:
- Around line 127-129: The CLI top-level trace handling is missing support for
the "fetch" trace, so invoking the helper with args.values.trace === 'fetch'
no-ops; update the CLI flow that checks args.values.trace (the direct path
starting around where args.values.trace is evaluated) to handle 'fetch' by
invoking the same logic as handleTrace() does for fetch (i.e., call
execFetchTransaction() or delegate to handleTrace('fetch')), ensuring the code
paths that currently only handle other trace values also include a branch for
'fetch' so -t fetch triggers the expected behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f5e7f60a-979e-47db-a74a-f6aa8b1ec80b
📒 Files selected for processing (8)
lib/internal/nsolid_trace.jslib/internal/otel/core.jslib/internal/otel/trace.jssrc/nsolid/nsolid_api.cctest/addons/nsolid-tracing/binding.cctest/addons/nsolid-tracing/test-otel-fetch-enable-disable-race.jstest/agents/test-grpc-tracing-race.mjstest/common/nsolid-grpc-agent/client.js
Keep JS tracing decisions tied to the env-local tracing state materialized by the native toggle path instead of rereading live flags in multiple places. Tracing flags can be updated from any thread, so reading them directly during span creation can observe a mid-flight toggle. That could make Tracer.startSpan() return a NonRecordingSpan after the caller had already passed an internal tracing gate, leading to crashes like TypeError: span._pushSpanDataString is not a function. Pass the full tracing bitmask from EnvList::update_tracing_flags() into JS, cache it in lib/internal/nsolid_trace.js, and make the internal OTel code consume that cached per-thread state. This removes the split-brain behavior where callback binding/unbinding followed flagsUpdated but Tracer.startSpan() could independently observe a later disable and return a NonRecordingSpan. For internal spans, stop Tracer.startSpan() from rechecking the current trace flags after the caller has already crossed an internal tracing gate. That keeps internal span creation locally consistent while tracing is being enabled and disabled and avoids crashes in code paths that expect a real N|Solid span object. Also add targeted repro coverage for the toggle race: - extend the tracing test addon with configurable setupTracing(flags), stopTracing(), and skipExpectedTracesCheck() - add a deterministic fetch-based repro that toggles HTTP client tracing while concurrent fetch() traffic is in flight - add a grpc-based repro harness for tracing reconfiguration and cover both fetch and http traffic - teach the grpc agent test client how to generate fetch transactions Together these changes make trace flags materialize per env/thread in JS, preserve the caller's local tracing decision for internal spans, and add regression coverage for the tracing enable/disable race.
03ccece to
c6ab0ff
Compare
Keep JS tracing decisions tied to the env-local tracing state materialized by the native toggle path instead of rereading live flags in multiple places.
Tracing flags can be updated from any thread, so reading them directly during span creation can observe a mid-flight toggle. That could make Tracer.startSpan() return a NonRecordingSpan after the caller had already passed an internal tracing gate, leading to crashes like TypeError: span._pushSpanDataString is not a function.
Pass the full tracing bitmask from EnvList::update_tracing_flags() into JS, cache it in lib/internal/nsolid_trace.js, and make the internal OTel code consume that cached per-thread state. This removes the split-brain behavior where callback binding/unbinding followed flagsUpdated but Tracer.startSpan() could independently observe a later disable and return a NonRecordingSpan.
For internal spans, stop Tracer.startSpan() from rechecking the current trace flags after the caller has already crossed an internal tracing gate. That keeps internal span creation locally consistent while tracing is being enabled and disabled and avoids crashes in code paths that expect a real N|Solid span object.
Also add targeted repro coverage for the toggle race:
Together these changes make trace flags materialize per env/thread in JS, preserve the caller's local tracing decision for internal spans, and add regression coverage for the tracing enable/disable race.
Summary by CodeRabbit
Refactor
Tests