x402d :9090/metrics returns HTTP 500 (method-label corruption) → native metrics scrape down #6

@anvztor

Summary

x402d serves Prometheus metrics on :9090/metrics, but the endpoint returns HTTP 500 Internal Server Error. The process itself is healthy and serving traffic; only the metrics endpoint is broken. Suspected cause: a corrupted or invalid value for the method label on one of the metrics (an invalid label value, possibly combined with unbounded label cardinality), which makes the Prometheus client registry fail when encoding the exposition output.

Symptom / Impact

  • GET :9090/metrics → HTTP/1.1 500 Internal Server Error (no body served).
  • Prometheus cannot scrape the monitoring/x402d-native target → up=0.
  • Mainnet alerts firing as a correct consequence:
    • X402dMetricsTargetDown (critical → telegram-critical) since 2026-06-03 10:53 UTC
    • generic TargetDown for job="monitoring/x402d-native" (warning) since 2026-06-03 11:00 UTC
  • All native x402d metrics are currently blind on mainnet (no scrape data).

Environment

  • Cluster: mainnet goat-mainnet-usw2 (acct 361027011257, us-west-2)
  • Namespace: goat-network-app
  • Pod (at time of report): goat-x402d-649cdfc67b-jnc5j (Running, 1/1 ready, 53d uptime, 0 restarts)
  • Instance scraped: 10.102.113.110:9090

Repro

kubectl exec -n goat-network-app goat-x402d-649cdfc67b-jnc5j -c goat-x402d -- \
  sh -c 'wget -S -qO- http://localhost:9090/metrics'
# => HTTP/1.1 500 Internal Server Error

Suspected fix area

  • The metrics registration / labeling path where the method label is set. The likely culprit is HTTP middleware or instrumentation that records the raw, non-normalized request method, producing a label value that is invalid UTF-8, empty, or drawn from an unbounded set (high cardinality).
  • The metrics handler: check whether an error returned during gather()/encode is being swallowed and turned into a bare 500, and normalize/validate the method label before each observation.
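A minimal sketch of the normalization rule, written as a shell function purely for illustration; x402d's actual instrumentation code and language are not shown in this issue. The function name normalize_method and the allowlist of standard HTTP methods are assumptions. Any method outside the allowlist, including empty or non-UTF-8 input, collapses into a single "other" bucket, so the label value is always valid and its cardinality stays bounded.

```shell
#!/bin/sh
# Hypothetical helper (not taken from x402d): map a raw request method onto a
# fixed, bounded set of values before it is used as the "method" label.
# Matching is exact and case-sensitive, because HTTP method names are.
normalize_method() {
  case "$1" in
    GET|HEAD|POST|PUT|PATCH|DELETE|OPTIONS|CONNECT|TRACE)
      # Known method: pass it through unchanged.
      printf '%s\n' "$1"
      ;;
    *)
      # Empty, lowercase, unknown, or non-UTF-8 input all collapse to "other".
      printf '%s\n' "other"
      ;;
  esac
}
```

For example, normalize_method GET prints GET, while normalize_method get prints other. Mapping lowercase input to "other" rather than upper-casing it keeps the rule strict; upper-casing first would be an alternative if legitimate clients are known to send lowercase methods.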

Ops note

While this is being fixed, a time-boxed (7-day) Alertmanager silence scoped to job="monitoring/x402d-native" is being applied to suppress X402dMetricsTargetDown + the generic TargetDown for this job. The alert rules are NOT being deleted — they should re-fire if this isn't resolved when the silence expires. Real x402 balance alerts (X402GoatTssBalanceLow, X402OtherChainTssBalanceLow) are intentionally left firing.
