-
Notifications
You must be signed in to change notification settings - Fork 727
feat: add sink dashboard metrics (CM-811) #3670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conventional Commits FTW!
services/libs/tinybird/pipes/cdp_dashboard_metrics_total_sink.pipe
Outdated
Show resolved
Hide resolved
| SELECT countState() AS activitiesTotalState | ||
| FROM activityRelations_enriched_deduplicated_ds | ||
| WHERE | ||
| snapshotId = (SELECT max(snapshotId) FROM activityRelations_enriched_deduplicated_ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate snapshot queries may cause inconsistent activity counts
Low Severity
The nodes activitiesGlobalAllTimeState and activitiesGlobalLast30State each compute max(snapshotId) via separate subqueries. If the underlying data receives a new snapshot between these evaluations, the two nodes could query different snapshots. This could result in activitiesLast30Days exceeding activitiesTotal, which is logically impossible. Other pipes in the codebase compute max(snapshotId) once per query to avoid this. Extracting the snapshot ID to a shared node would ensure consistency.
🔬 Verification Test
Why verification test was not possible: This is a race condition that requires specific timing where a new snapshot is inserted between the two scalar subquery evaluations during pipe execution. Reproducing this would require access to the Tinybird environment, the ability to insert new snapshots at precise moments during query execution, and control over query timing - none of which are available in this context. The bug is identified through static code analysis comparing this pattern against other pipes in the codebase (like activities_filtered.pipe) which compute max(snapshotId) only once.
Additional Locations (1)
| -- member-based global metrics (single scan over cdp_member_segment_aggregates_MV) | ||
| SELECT | ||
| uniqCombined(memberId) AS membersTotal, | ||
| uniqCombinedIf(memberId, lastActive >= now() - INTERVAL 30 DAY) AS membersLast30Days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing empty string filter for member and organization counts
Medium Severity
The codebase consistently filters out empty string IDs when counting unique members and organizations using patterns like uniq(case when memberId != '' then memberId else null end) (see active_contributors.pipe and active_organizations.pipe). The new sink uses uniqCombined(memberId) and uniqCombined(organizationId) without this filter. If the underlying MVs contain entries with empty string IDs, they would be incorrectly counted as valid entities, inflating the membersTotal, membersLast30Days, organizationsTotal, and organizationsLast30Days metrics.
🔬 Verification Test
Why verification test was not possible: This bug requires access to the Tinybird environment with the actual cdp_member_segment_aggregates_MV and cdp_organization_segment_aggregates_MV materialized views to verify whether they contain entries with empty string IDs. The bug was identified through static code analysis by comparing the pattern used in this new pipe against the established patterns in active_contributors.pipe (lines 31, 85, 103) and active_organizations.pipe (lines 31, 59), which all use the case when ... != '' then ... else null end pattern that this code omits.
PR: Dashboard Total Metrics Sink + Kafka Connect Image Update
Overview
This pull request introduces improvements to the data pipeline and infrastructure stability:
🚀 Changes
Note
Introduces a Tinybird sink to compute and publish global dashboard metrics.
cdp_segment_metrics_total_sink.pipewith nodes aggregatingactivitiesTotal/activitiesLast30Days,membersTotal/membersLast30Days, andorganizationsTotal/organizationsLast30Daysusingcdp_member_segment_aggregates_MVandcdp_organization_segment_aggregates_MVactivityRelations_enriched_deduplicated_dssnapshot; last-30-day filters applied viatimestamp/lastActivelfx-oracle-kafka-streaming, topiccdp_dashboard_metrics_total_sink, schedule0 9 * * *, formatjson, strategy@newWritten by Cursor Bugbot for commit 6cf808b. This will update automatically on new commits. Configure here.