sdk-telemetry: tighten cross-repo counter contract + auto atexit + scheme handling#91
Closed
mmercuri wants to merge 1 commit into
Closed
sdk-telemetry: tighten cross-repo counter contract + auto atexit + scheme handling#91mmercuri wants to merge 1 commit into
mmercuri wants to merge 1 commit into
Conversation
…heme
Three findings from the cross-PR panel audit of the metrics stack:
F24 — Counter label-set divergence from atlas-app Go declaration
atlas-app/apps/shared/observability/metrics.go declares
atlas_sdk_events_total with EXACTLY two labels (surface, event). The
SDK previously carried an "allowlist" that forwarded command +
resource + outcome + status_code to the OTel counter, producing a
3-or-4-label Prometheus series server-side. Dashboards, cardinality
budgets, and TestEmitSiteArity are all built against the 2-label
declaration — the divergence meant some series didn't match the
intended query shape.
Fix: strip ALL attributes from the counter emission. Keep the
``attributes=`` parameter on the event() signature for forward
compatibility, but drop it before counter.add(). Extra dimensions
belong on OTel SPAN attributes, not the counter — and atlas-app's Go
declaration stays the single source of truth for the label schema.
Test renamed from test_attributes_allowlist to
test_counter_labels_match_atlas_app_declaration and asserts the
exact 2-key contract.
F26 — OTLP endpoint scheme vs LAYERLENS_OTLP_INSECURE ambiguity
Previously:
LAYERLENS_OTLP_ENDPOINT=https://otel.layerlens.ai:4317
LAYERLENS_OTLP_INSECURE=true
would pass both to OTLPMetricExporter, and the gRPC exporter's
resolution was version-dependent — sometimes honouring the scheme,
sometimes the flag. Customers couldn't predict which would win.
Fix: detect scheme from the endpoint. https:// → insecure=False,
http:// → insecure=True. The LAYERLENS_OTLP_INSECURE env var is now
only consulted for scheme-less endpoints (the pure "host:port"
form). Strip the scheme before handing the endpoint to the exporter
since the gRPC exporter expects schemeless host:port.
F27 — Non-CLI SDK consumers lost up to 30s of telemetry on exit
PeriodicExportingMetricReader flushes every 30s and on
meter_provider.shutdown(). CLI callers registered
atexit.register(_telemetry.shutdown) explicitly; library consumers
that just imported Stratix and opted in via env had nothing hooking
their process exit. Up to 30s of init/cmd_run events were silently
dropped on a clean exit.
Fix: auto-register atexit.register(shutdown) on first successful
_try_init(). _atexit_registered guard keeps it idempotent, so the
CLI's explicit atexit.register is still a no-op second registration.
Library consumers now flush automatically.
Verification:
PYTHONPATH=src python -m pytest tests/test_telemetry.py -v
=> 16 passed in 0.52s
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacks on #90. F24+F26+F27 from cross-PR panel audit. See commit message for detail. Tests: 16/16 PASS.