Hotfix/observability monitoring fix #774

Closed
vprashrex wants to merge 7 commits into main from hotfix/observability-monitoring-fix

Conversation

@vprashrex
Collaborator

@vprashrex vprashrex commented Apr 21, 2026

Summary

Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features

    • Added comprehensive observability and tracing to improve system monitoring and request tracking.
    • Added health check endpoint for system status verification.
  • Improvements

    • Enhanced logging with service identification for better diagnostics.
    • Improved error tracking and context propagation across asynchronous tasks.
    • Strengthened exception handling in collection and LLM operations with better error reporting.
  • Chores

    • Updated dependencies to support observability infrastructure.
    • Refined error handling behavior in backend services.

- Integrated OpenTelemetry tracing into collection creation and deletion processes to improve observability.
- Added logging context for better traceability during job execution.
- Refactored job execution methods to include detailed span attributes and error handling.
- Updated callback mechanisms to ensure success and failure responses are properly logged and sent.
- Improved error handling in LLM job execution, including telemetry for provider calls and response handling.
- Updated the lock file to reflect changes in Python version requirements.
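
The "logging context" mentioned in the commit notes above could be sketched roughly as follows. The PR names `log_context` and `LogContextFilter` but does not show their bodies, so this implementation (a `contextvars`-backed dictionary copied onto each log record as a `context` attribute) is an assumption, not the author's code.

```python
import contextvars
import logging
from contextlib import contextmanager

# Hypothetical storage for the active logging fields; the PR does not show
# how backend/app/core/telemetry.py actually stores them.
_log_fields = contextvars.ContextVar("log_fields", default={})


@contextmanager
def log_context(**fields):
    """Attach key/value fields to every log record emitted inside the block."""
    merged = {**_log_fields.get(), **fields}  # inner blocks extend outer ones
    token = _log_fields.set(merged)
    try:
        yield merged
    finally:
        _log_fields.reset(token)  # restore the previous fields on exit


class LogContextFilter(logging.Filter):
    """Copy the active fields onto each record as `record.context`."""

    def filter(self, record):
        record.context = dict(_log_fields.get())
        return True  # never drop the record, only annotate it
```

Because the fields live in a `ContextVar`, nested blocks compose and the outer fields come back when an inner block exits, which is the behaviour a job runner wrapping each task in `log_context(job_id=...)` would presumably want.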
@vprashrex vprashrex requested a review from AkhileshNegi April 21, 2026 06:24
@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR introduces comprehensive OpenTelemetry-based observability across the backend, adding structured logging, distributed tracing, and telemetry metrics. It instruments HTTP handlers, API endpoints, Celery tasks, and service execution flows; integrates Sentry for error tracking; and configures trace context propagation from API requests through async job execution.
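
To make "trace context propagation from API requests through async job execution" concrete, here is a standard-library-only sketch of the idea: the enqueuing side writes a W3C `traceparent` value into the task headers, and the worker side parses it to recover the parent trace. The function names `inject_traceparent` and `extract_traceparent` are hypothetical. The PR's own helpers (`_enqueue_with_trace_context`, `_extract_parent_context`) are not shown and most likely delegate to OpenTelemetry's propagation API instead of hand-rolling the header.

```python
import re

# W3C Trace Context, version 00: "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>"
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject_traceparent(headers, trace_id, span_id, sampled=True):
    """Return a copy of the task headers carrying the caller's trace context."""
    carrier = dict(headers)  # do not mutate the caller's dict
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return carrier


def extract_traceparent(headers):
    """Parse the parent trace context from task headers, or return None."""
    match = _TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    # All-zero identifiers are invalid under the specification.
    if set(trace_id) == {"0"} or set(parent_span_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 1),
    }
```

On the worker, a `None` result would mean "start a fresh trace", while a parsed context would become the parent of the task's span so the API request and the background job appear in one trace.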

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Environment | .env.example, backend/app/core/config.py | Added OTEL_ENABLED and OTEL_SERVICE_NAME environment variables and corresponding Settings fields. |
| Core Telemetry Setup | backend/app/core/telemetry.py | New module providing centralized telemetry initialization (setup_telemetry), structured logging context (log_context, LogContextFilter), GenAI span attribute helpers, Sentry metric recording, and SQLAlchemy/FastAPI instrumentation utilities. |
| Sentry Integration | backend/app/core/sentry_filters.py | New module for Sentry transaction/span filtering, removing low-signal data (health checks, DB queries, noise endpoints) while preserving meaningful traces. |
| Logging Enhancement | backend/app/core/logger.py | Refactored root logger configuration into configure_logging() function; added ServiceNameFilter to inject service name into logs; expanded third-party logger suppression; now includes service_name in output format. |
| Middleware & Request Tracing | backend/app/core/middleware.py | Enhanced http_request_logger with OpenTelemetry span context (method, route, status, duration); integrated Sentry metrics for request counts and durations; added HTTP route resolution helper. |
| Database Instrumentation | backend/app/core/db.py | Instrumented SQLAlchemy engine for telemetry via instrument_db_engine. |
| Langfuse Refactoring | backend/app/core/langfuse/langfuse.py | Renamed extract_output_value to extract_response_output; updated LangfuseTracer method signatures to accept more flexible input/output types (Any instead of Dict[str, Any]); simplified control flow. |
| Application Initialization | backend/app/main.py | Added logging and telemetry setup; integrated Sentry with OTEL-based instrumentation; added /health endpoint; fixed unique ID generation to handle routes without tags. |
| Authentication Context | backend/app/api/deps.py | Added _set_tenant_span_attributes() to tag spans with user.id and tenant identifiers after successful authentication across all three auth paths. |
| Collection API Routes | backend/app/api/routes/collections.py | Wrapped create_collection and delete_collection handlers in log_context telemetry blocks; added collection_id to delete job creation. |
| LLM API Routes | backend/app/api/routes/llm.py | Added log_context and span attribute setting to llm_call and get_llm_call_status; refactored callback validation and job creation inside telemetry context; updated response handling to tolerate missing llm_call.usage. |
| Celery Worker Observability | backend/app/celery/celery_app.py | Introduced _initialize_worker_observability() for one-time worker setup; added Sentry initialization with OTEL instrumentation; replaced warm_llm_modules prefork hook with initialize_worker_process; registered task_postrun hook for telemetry flushing. |
| Celery Task Enqueuing | backend/app/celery/utils.py | Added _enqueue_with_trace_context() helper to inject OTel trace headers into task headers; updated all start_* functions to use this helper instead of direct task.delay(). |
| Celery Task Execution | backend/app/celery/tasks/job_execution.py | Added _extract_parent_context() and _run_with_otel_parent() for OTel trace context propagation; wrapped all exported task execution calls to attach parent context when no current span exists. |
| Collection Service Tracing | backend/app/services/collections/create_collection.py, backend/app/services/collections/delete_collection.py | Added comprehensive OpenTelemetry tracing with spans for job execution and provider operations; wrapped logic in log_context; updated error handling to record exceptions and re-raise; moved exception paths inside tracing context. |
| LLM Service Tracing | backend/app/services/llm/jobs.py | Extensive tracing additions: nested spans for config resolution, guardrails, provider execution, and response recording; replaced Langfuse decorator with explicit span hierarchy; added record_llm_call_started/finished and GenAI attribute recording; added telemetry flushing in finally blocks; updated callback and error handling with dedicated spans. |
| Test Updates | backend/app/tests/services/collections/test_*.py | Updated tests to expect execute_job to re-raise exceptions using pytest.raises(...); added new tests for provider factory and local deletion failure scenarios. |
| Dependencies | backend/pyproject.toml | Added OpenTelemetry core packages and instrumentation libraries for FastAPI, Celery, httpx, requests, and logging. |
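
As an illustration of the Logging Enhancement entry, the following is a minimal sketch of what a `ServiceNameFilter` and `configure_logging()` pair might look like. Only the two names and the presence of `service_name` in the output format come from the walkthrough. The parameters, the format string, and the list of suppressed third-party loggers are assumptions made for the example.

```python
import logging

# Hypothetical list; the PR does not say which third-party loggers it quiets.
NOISY_LOGGERS = ("httpx", "urllib3")


class ServiceNameFilter(logging.Filter):
    """Stamp every record with the name of the emitting service."""

    def __init__(self, service_name):
        super().__init__()
        self.service_name = service_name

    def filter(self, record):
        record.service_name = self.service_name
        return True


def configure_logging(service_name, stream=None, level=logging.INFO):
    """Replace the root logger's handlers with one service-aware handler."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    # The filter sits on the handler, not the logger: logger-level filters
    # are skipped for records propagated up from child loggers.
    handler.addFilter(ServiceNameFilter(service_name))
    handler.setFormatter(
        logging.Formatter("%(levelname)s [%(service_name)s] %(name)s: %(message)s")
    )
    root = logging.getLogger()
    for existing in list(root.handlers):
        root.removeHandler(existing)  # avoid duplicate output on repeated calls
    root.addHandler(handler)
    root.setLevel(level)
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(logging.WARNING)
    return handler
```

Calling `configure_logging("backend-api")` once at startup and a different name in the Celery worker would let a shared log stream distinguish which process produced each line.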

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • AkhileshNegi
  • Prajna1999
  • kartpop

Poem

🐰✨ A rabbit hops through traces bright,
Telemetry's a wondrous sight!
With spans and logs now flowing free,
Observability's the key. 🔍🚀

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.42% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Hotfix/observability monitoring fix' is vague and does not clearly describe the main changes. While observability is mentioned, the title does not specify what was actually implemented (OpenTelemetry tracing, logging context, span attributes, etc.) and could apply to many different improvements. | Replace with a more specific title that describes the primary changes, e.g., 'Add OpenTelemetry tracing and structured logging context across collections and LLM services' or 'Implement distributed tracing for collection and LLM job execution'. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@vprashrex vprashrex closed this Apr 21, 2026