feat: add Prometheus datasource connector for real-time metrics querying #173

Open
RichardoMrMu wants to merge 1 commit into derisk-ai:main from RichardoMrMu:feat/prometheus-datasource

Summary

Add a PrometheusConnector that lets OpenDerisk agents query the Prometheus HTTP API for real-time metrics data, replacing the dependency on static OpenRCA datasets for metrics analysis.

Motivation

Currently, OpenDerisk's SRE agents rely on static OpenRCA datasets (CSV files) for metrics analysis during root cause diagnosis. This limits the system to offline/historical analysis and prevents real-time monitoring and diagnostics.

By adding a Prometheus datasource connector, agents can:

  • Query live production metrics via PromQL
  • Perform real-time anomaly detection using actual Prometheus data
  • Access scrape targets, alerting rules, and active alerts for comprehensive diagnostics
  • Integrate with existing Prometheus-based monitoring infrastructure

Changes

| Action | File | Lines |
| --- | --- | --- |
| Added | packages/derisk-ext/src/derisk_ext/datasource/conn_prometheus.py | +460 |
| Added | packages/derisk-ext/src/derisk_ext/datasource/tests/__init__.py | +0 |
| Added | packages/derisk-ext/src/derisk_ext/datasource/tests/test_conn_prometheus.py | +229 |

Key Features

  • Instant query — PromQL evaluation at a single point in time
  • Range query — PromQL evaluation over a time range with configurable step
  • Series discovery — Find time series matching label selectors
  • Label enumeration — List label names and values
  • Target and rule inspection — Scrape targets, alerting/recording rules
  • Active alerts listing — Current firing alerts
  • Health check — Prometheus server health verification
  • Metric metadata — Type, help text, and unit information
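
Each feature above corresponds to an endpoint of the Prometheus HTTP API. As a rough sketch of that mapping, URL construction might look like the following. The `ENDPOINTS` keys and the `build_url` helper are hypothetical names chosen for illustration and are not taken from conn_prometheus.py; only the paths come from the Prometheus API.

```python
from urllib.parse import urlencode

# Paths are from the Prometheus HTTP API; the dictionary keys are
# illustrative names, not the connector's actual method names.
# Label values live under /api/v1/label/<name>/values and are omitted
# here because the path depends on the label name.
ENDPOINTS = {
    "instant_query": "/api/v1/query",
    "range_query": "/api/v1/query_range",
    "series": "/api/v1/series",
    "label_names": "/api/v1/labels",
    "targets": "/api/v1/targets",
    "rules": "/api/v1/rules",
    "alerts": "/api/v1/alerts",
    "metadata": "/api/v1/metadata",
    "health": "/-/healthy",
}


def build_url(base_url, feature, params=None):
    """Join the base URL, the endpoint path, and an encoded query string."""
    url = base_url.rstrip("/") + ENDPOINTS[feature]
    if params:
        # doseq=True expands list values, e.g. repeated match[] selectors.
        url += "?" + urlencode(params, doseq=True)
    return url


print(build_url("http://localhost:9090", "instant_query", {"query": "up"}))
print(build_url("http://localhost:9090", "series", {"match[]": ["up", "node_load1"]}))
```

Keeping the paths in one table like this makes the URL-construction tests straightforward, since each feature reduces to a path plus a parameter dictionary.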

Design Decisions

  • Inherits from BaseConnector (not RDBMSConnector) since Prometheus is a time-series database, not a relational database
  • Uses the requests library for HTTP API calls (already a project dependency)
  • Implements the run() method for compatibility with the BaseConnector interface, treating the command as a PromQL expression
  • Supports authentication via basic auth and custom headers
  • SSL/TLS configurable for production environments
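
To make the authentication and TLS options concrete, here is a minimal sketch of how basic auth, custom headers, and certificate verification could be assembled. It deliberately uses only the standard library so it runs without dependencies, whereas the PR itself uses requests (where the equivalents would be the `auth`, `headers`, and `verify` arguments). The function name `build_request_options` and its parameters are assumptions, not the connector's actual signature.

```python
import base64
import ssl


def build_request_options(username=None, password=None,
                          extra_headers=None, verify_ssl=True):
    """Return (headers, ssl_context) for an HTTPS call to Prometheus.

    Hypothetical helper for illustration; not part of the PR.
    """
    headers = dict(extra_headers or {})
    if username is not None and password is not None:
        # HTTP basic auth: base64("user:password") in the Authorization header.
        raw = f"{username}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")

    context = ssl.create_default_context()
    if not verify_ssl:
        # Disable hostname checking first; setting CERT_NONE while it is
        # still enabled raises ValueError.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return headers, context
```

Verification stays on by default in this sketch, so skipping certificate checks has to be an explicit choice, which is usually what a production deployment wants.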

Usage Example

from derisk_ext.datasource.conn_prometheus import PrometheusConnector

connector = PrometheusConnector(
    host="prometheus.example.com",
    port=9090,
    scheme="https",
)

# Instant query
results = connector.instant_query('up{job="derisk"}')

# Range query
results = connector.range_query(
    query='rate(http_requests_total[5m])',
    start='2024-01-01T00:00:00Z',
    end='2024-01-01T01:00:00Z',
    step='60s',
)

# Check health
is_healthy = connector.check_health()

# Get active alerts
alerts = connector.get_alerts()

Testing

Comprehensive unit tests with mocked HTTP responses covering:

  • Parameter configuration and URL construction
  • Instant and range query execution
  • run() interface compatibility
  • Health check success/failure
  • Error handling (API errors, connection failures)
  • Result formatting for both instant and range query results
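
For reviewers unfamiliar with the response shape these tests mock, the sketch below shows one way to handle API errors and to flatten instant (vector) and range (matrix) results. The envelope fields (`status`, `errorType`, `error`, `data.resultType`, `data.result`) follow the Prometheus HTTP API; the `flatten_result` function and its row format are assumptions and may differ from what the connector and test_conn_prometheus.py actually do.

```python
def flatten_result(payload):
    """Turn a Prometheus query response into a list of flat rows.

    Hypothetical helper; only vector and matrix results are handled.
    """
    if payload.get("status") != "success":
        raise RuntimeError(
            f"{payload.get('errorType', 'unknown')}: {payload.get('error', '')}"
        )

    data = payload["data"]
    result_type = data["resultType"]
    if result_type not in ("vector", "matrix"):
        raise ValueError(f"unsupported resultType: {result_type}")

    rows = []
    for series in data["result"]:
        # Instant vectors carry a single "value" pair; range matrices
        # carry a list of pairs under "values".
        samples = [series["value"]] if result_type == "vector" else series["values"]
        for timestamp, value in samples:
            rows.append({
                "labels": series["metric"],
                "timestamp": timestamp,
                "value": float(value),  # Prometheus encodes sample values as strings
            })
    return rows


# Mocked payloads in the shape Prometheus returns.
instant = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"__name__": "up", "job": "derisk"},
                    "value": [1704067200, "1"]}],
    },
}
print(flatten_result(instant))
```

Because the function works on plain dictionaries, a test can feed it a hand-written payload without any HTTP mocking at all.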

Related

Add PrometheusConnector that enables OpenDerisk agents to query
Prometheus HTTP API for real-time metrics data, replacing the
dependency on static OpenRCA datasets.

Features:
- Instant query (PromQL evaluation at a single point in time)
- Range query (PromQL evaluation over a time range with configurable step)
- Series discovery (find time series matching label selectors)
- Label enumeration (list label names and values)
- Target and rule inspection (scrape targets, alerting/recording rules)
- Active alerts listing
- Health check endpoint
- Metric metadata retrieval
- Basic authentication and custom headers support
- SSL/TLS configuration
- Formatted output compatible with BaseConnector.run() interface

The connector inherits from BaseConnector and can be used by SRE agents
for real-time diagnostics and root cause analysis.

Includes comprehensive unit tests with mocked HTTP responses.
github-actions bot added the enhancement (New feature or request) label on Mar 23, 2026