ref(llm-detection): Refactor Seer integration to fetch traces via RPC #104485
Conversation
```python
NUM_TRANSACTIONS_TO_PROCESS = 20
LOWER_SPAN_LIMIT = 20
UPPER_SPAN_LIMIT = 500
```
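For context, limits like these typically gate which traces get processed. A purely illustrative sketch of that gating, assuming dict-shaped traces with a `total_spans` key (the real filtering logic is not shown in this PR and is moving to Seer):

```python
# NOTE: illustrative only; constant values mirror the removed constants,
# but the filtering logic itself is assumed, not taken from the PR.
NUM_TRANSACTIONS_TO_PROCESS = 20
LOWER_SPAN_LIMIT = 20
UPPER_SPAN_LIMIT = 500

def eligible_traces(traces):
    """Keep traces whose span count is inside the configured bounds,
    capped at the per-run transaction limit."""
    in_bounds = [
        t for t in traces
        if LOWER_SPAN_LIMIT <= t["total_spans"] <= UPPER_SPAN_LIMIT
    ]
    return in_bounds[:NUM_TRANSACTIONS_TO_PROCESS]
```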
these will be handled on the seer side
```python
if processed_count >= NUM_TRANSACTIONS_TO_PROCESS:
    break

seer_request = {
    "telemetry": [{**trace.dict(), "kind": "trace"} for trace in evidence_traces],
}
```
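To see what this bundling produces, here is a self-contained sketch. The `TraceEvidence` dataclass is a hypothetical stand-in for the PR's evidence objects; only the `seer_request` comprehension comes from the diff above:

```python
from dataclasses import dataclass, asdict

# Hypothetical stand-in for the evidence objects; the real code uses
# models exposing .dict(), so a dataclass with asdict() plays that role.
@dataclass
class TraceEvidence:
    trace_id: str
    transaction_name: str

    def dict(self):
        return asdict(self)

evidence_traces = [TraceEvidence("abc123", "GET /api/projects")]

# Bundle every trace into a single request, tagging each telemetry item
# with its kind so the receiver can dispatch on it.
seer_request = {
    "telemetry": [{**trace.dict(), "kind": "trace"} for trace in evidence_traces],
}
```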
feels like we could use better variable names here since it's just the id/name instead of an actual trace now
agree - cleaned it up on the seer side, updating this pr to match
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master  #104485   +/-  ##
========================================
  Coverage   80.52%   80.52%
========================================
  Files        9330     9330
  Lines      400645   400699    +54
  Branches    25689    25689
========================================
+ Hits       322624   322669    +45
- Misses      77555    77564     +9
  Partials      466      466
```
```python
try:
    raw_response_data = response.json()
    response_data = IssueDetectionResponse.parse_obj(raw_response_data)
except (ValueError, TypeError, ValidationError) as e:
    raise LLMIssueDetectionError(
        message="Seer response parsing error",
        status=response.status,
        project_id=project_id,
        organization_id=organization_id,
        response_data=response.data.decode("utf-8"),
        error_message=str(e),
    )
```
Bug: Task fails due to unhandled LLMIssueDetectionError from Seer API.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The detect_llm_issues_for_project task lacks proper error handling for failures originating from the Seer API. If make_signed_seer_api_request (line 249) fails, or if the response status is not 2xx (lines 255-262), or if JSON parsing fails (lines 264-275), an LLMIssueDetectionError is raised uncaught. This causes the entire task to fail, preventing any issue detection for the project, unlike the previous implementation which handled such errors gracefully.
💡 Suggested Fix
Implement a try-except block around the Seer API request and response processing to catch LLMIssueDetectionError and log it, allowing the task to continue or retry gracefully.
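A minimal sketch of that suggested fix, using stand-in names (`call_seer` is hypothetical; `LLMIssueDetectionError` is redefined locally so the example is self-contained):

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the PR's error type and Seer request call,
# illustrating the fix: catch and log instead of letting the task die.
class LLMIssueDetectionError(Exception):
    pass

def call_seer(project_id):
    raise LLMIssueDetectionError("Seer response parsing error")

def detect_llm_issues_for_project(project_id):
    try:
        return call_seer(project_id)
    except LLMIssueDetectionError:
        # Log with context and swallow the error so one bad project
        # doesn't abort the whole run; a retry could go here instead.
        logger.exception(
            "llm_issue_detection.seer_request_failed",
            extra={"project_id": project_id},
        )
        return None
```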
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/tasks/llm_issue_detection/detection.py#L255-L275
Potential issue: The `detect_llm_issues_for_project` task lacks proper error handling
for failures originating from the Seer API. If `make_signed_seer_api_request` (line 249)
fails, or if the response status is not 2xx (lines 255-262), or if JSON parsing fails
(lines 264-275), an `LLMIssueDetectionError` is raised uncaught. This causes the entire
task to fail, preventing any issue detection for the project, unlike the previous
implementation which handled such errors gracefully.
Reference ID: 6587752
Problem
The LLM issue detection task was fetching full span data for every trace in Sentry, then sending bits of that telemetry to Seer in individual requests. We want to use `EAPTrace` instead, which includes much more data in a format better optimized for LLM analysis. This requires a significant restructuring of the request/response formats between this task and its Seer endpoint.
There was also a small bug in how we were selecting traces for each transaction; this PR clears that up and introduces a bit of variation into the trace selection logic.
Solution
Changed the request/response flow so Sentry sends only trace IDs to Seer in a single bundled request. Seer now fetches the full `EAPTrace` data itself via Sentry's existing `get_trace_waterfall` RPC endpoint and uses that as the input for LLM detection.

Changes to Sentry → Seer Request
Before:
- Each trace sent in its own request with `trace_id`, `project_id`, `transaction_name`, `total_spans`, `spans: list[Span]`

After:
- Only `trace_id` and normalized `transaction_name`; Seer fetches `EAPTrace` data via RPC

Changes to Seer → Sentry Response

Updated the `DetectedIssue` model to include context fields:
- `trace_id: str` - which trace the issue was found in
- `transaction_name: str` - normalized transaction name

Trace Selection Logic
- Traces are now selected by `sum(span.duration)` over a 30-minute window
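The selection heuristic above can be sketched as follows, under the assumption that candidate traces have already been narrowed to the 30-minute window (the dict shape and function names here are illustrative, not the PR's actual code):

```python
# Score each trace by the sum of its span durations; highest wins.
def trace_score(trace):
    return sum(span["duration"] for span in trace["spans"])

def select_trace(window_traces):
    return max(window_traces, key=trace_score)

traces = [
    {"trace_id": "a", "spans": [{"duration": 5.0}, {"duration": 3.0}]},
    {"trace_id": "b", "spans": [{"duration": 20.0}]},
]
print(select_trace(traces)["trace_id"])  # prints "b"
```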
This is a breaking change to the Seer integration. Deployment requires:
- Disabling the feature flag (`issue-detection.llm-detection.enabled = false`)

This will not impact any customers.
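To illustrate why flipping that flag makes the deploy safe, here is a hedged sketch of an option-gated task. The option name comes from the PR; the `OPTIONS` lookup is an assumed stand-in, not Sentry's real options API:

```python
# Assumed stand-in for an options store; only the option name is real.
OPTIONS = {"issue-detection.llm-detection.enabled": False}

def detection_enabled():
    return bool(OPTIONS.get("issue-detection.llm-detection.enabled", False))

def run_detection_task():
    # With the flag off, the task no-ops, so Sentry and Seer can be
    # deployed independently without mismatched request formats.
    if not detection_enabled():
        return "skipped"
    return "ran"
```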