@ulemons ulemons commented Dec 3, 2025

PR: Dashboard Total Metrics Sink + Kafka Connect Image Update

Overview

This pull request adds a new sink to the data pipeline and updates the Kafka Connect image for stability:

🚀 Changes

  • Added a new sink pipeline to aggregate and expose total dashboard metrics
  • Updated the Kafka Connect Docker image to a more reliable and compatible version

Note

Introduces a Tinybird sink to compute and publish global dashboard metrics.

  • New cdp_segment_metrics_total_sink.pipe with nodes aggregating activitiesTotal/activitiesLast30Days, membersTotal/membersLast30Days, and organizationsTotal/organizationsLast30Days using cdp_member_segment_aggregates_MV and cdp_organization_segment_aggregates_MV
  • Activities derived from latest activityRelations_enriched_deduplicated_ds snapshot; last-30-day filters applied via timestamp/lastActive
  • Exports as Kafka sink: connection lfx-oracle-kafka-streaming, topic cdp_dashboard_metrics_total_sink, schedule 0 9 * * *, format json, strategy @new

Written by Cursor Bugbot for commit 6cf808b. This will update automatically on new commits.

@ulemons ulemons self-assigned this Dec 3, 2025
@ulemons ulemons added the Feature Created by Linear-GitHub Sync label Dec 3, 2025

@github-actions github-actions bot left a comment

Conventional Commits FTW!

@ulemons ulemons changed the title from "Feat/add sink dashboard metrics" to "feat: add sink dashboard metrics (CM-811)" Dec 3, 2025
@ulemons ulemons marked this pull request as ready for review January 7, 2026 09:36
SELECT countState() AS activitiesTotalState
FROM activityRelations_enriched_deduplicated_ds
WHERE
    snapshotId = (SELECT max(snapshotId) FROM activityRelations_enriched_deduplicated_ds)

Separate snapshot queries may cause inconsistent activity counts

Low Severity

The nodes activitiesGlobalAllTimeState and activitiesGlobalLast30State each compute max(snapshotId) via separate subqueries. If the underlying data receives a new snapshot between these evaluations, the two nodes could query different snapshots. This could result in activitiesLast30Days exceeding activitiesTotal, which is logically impossible. Other pipes in the codebase compute max(snapshotId) once per query to avoid this. Extracting the snapshot ID to a shared node would ensure consistency.
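One way to remove the second snapshot lookup is sketched below. It is an illustration only, not the fix adopted in this PR: instead of a shared node it resolves max(snapshotId) once as a scalar and derives both counts from a single scan using conditional aggregation. The alias latestSnapshotId is hypothetical, plain count()/countIf() are used in place of the -State aggregates from the diff, and the timestamp column is assumed from the PR summary.

```sql
-- Sketch under stated assumptions: one snapshot lookup, one scan, two counts.
-- latestSnapshotId is a hypothetical alias; timestamp is assumed to be the
-- activity time column referenced in the PR summary.
WITH (
    SELECT max(snapshotId)
    FROM activityRelations_enriched_deduplicated_ds
) AS latestSnapshotId
SELECT
    count() AS activitiesTotal,
    countIf(timestamp >= now() - INTERVAL 30 DAY) AS activitiesLast30Days
FROM activityRelations_enriched_deduplicated_ds
WHERE snapshotId = latestSnapshotId
```

Because both counts come from the same filtered rows, activitiesLast30Days cannot exceed activitiesTotal in this form.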

🔬 Verification Test

Why verification test was not possible: This is a race condition that requires specific timing where a new snapshot is inserted between the two scalar subquery evaluations during pipe execution. Reproducing this would require access to the Tinybird environment, the ability to insert new snapshots at precise moments during query execution, and control over query timing - none of which are available in this context. The bug is identified through static code analysis comparing this pattern against other pipes in the codebase (like activities_filtered.pipe) which compute max(snapshotId) only once.

-- member-based global metrics (single scan over cdp_member_segment_aggregates_MV)
SELECT
    uniqCombined(memberId) AS membersTotal,
    uniqCombinedIf(memberId, lastActive >= now() - INTERVAL 30 DAY) AS membersLast30Days

Missing empty string filter for member and organization counts

Medium Severity

The codebase consistently filters out empty string IDs when counting unique members and organizations using patterns like uniq(case when memberId != '' then memberId else null end) (see active_contributors.pipe and active_organizations.pipe). The new sink uses uniqCombined(memberId) and uniqCombined(organizationId) without this filter. If the underlying MVs contain entries with empty string IDs, they would be incorrectly counted as valid entities, inflating the membersTotal, membersLast30Days, organizationsTotal, and organizationsLast30Days metrics.
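A minimal sketch of how the member counts could exclude empty IDs is shown below, assuming memberId and lastActive are plain columns readable from cdp_member_segment_aggregates_MV as in the diff above. It uses the -If aggregate combinator rather than the case when ... else null end pattern cited from the other pipes; the two should be equivalent here, since uniq-family functions skip NULL values.

```sql
-- Sketch under stated assumptions: ignore empty-string member IDs in both counts.
-- The FROM clause is assumed from the comment in the diff; the organization
-- metrics would follow the same shape with organizationId.
SELECT
    uniqCombinedIf(memberId, memberId != '') AS membersTotal,
    uniqCombinedIf(
        memberId,
        memberId != '' AND lastActive >= now() - INTERVAL 30 DAY
    ) AS membersLast30Days
FROM cdp_member_segment_aggregates_MV
```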

🔬 Verification Test

Why verification test was not possible: This bug requires access to the Tinybird environment with the actual cdp_member_segment_aggregates_MV and cdp_organization_segment_aggregates_MV materialized views to verify whether they contain entries with empty string IDs. The bug was identified through static code analysis by comparing the pattern used in this new pipe against the established patterns in active_contributors.pipe (lines 31, 85, 103) and active_organizations.pipe (lines 31, 59), which all use the case when ... != '' then ... else null end pattern that this code omits.
