feat(CLP-JSON): Add Messages and Files stats display to CLP-JSON Web UI Ingest page.#2293
feat(CLP-JSON): Add Messages and Files stats display to CLP-JSON Web UI Ingest page.#2293junhaoliao wants to merge 2 commits into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis PR extends the archive metadata pipeline to track and persist ChangesArchive Statistics Tracking Enhancement
Sequence DiagramsequenceDiagram
participant ArchiveWriter
participant ArchiveStats
participant CompressionTask
participant Database
participant DetailsUI as Details UI
ArchiveWriter->>ArchiveWriter: increment m_num_files on range open
ArchiveWriter->>ArchiveStats: construct with m_num_files, m_num_messages
ArchiveStats->>ArchiveStats: serialize to JSON
CompressionTask->>Database: insert archive stats with num_messages, num_files
DetailsUI->>Database: query archives table
Database->>DetailsUI: return aggregated SUM(num_messages), SUM(num_files)
DetailsUI->>DetailsUI: render unified Messages and Files components
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (1)
23-42:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd an explicit schema migration path for existing
archivestables.
CREATE TABLE IF NOT EXISTSwon’t add new columns to already-existing tables. With the new inserts expectingnum_messagesandnum_files, upgrades can fail at runtime unless these columns are backfilled via migration.Suggested fix direction
def _create_archives_table(db_cursor, archives_table_name: str) -> None: db_cursor.execute( f""" CREATE TABLE IF NOT EXISTS `{archives_table_name}` ( ... `num_messages` BIGINT NULL DEFAULT NULL, `num_files` BIGINT NULL DEFAULT NULL, ... ) """ ) + + # Ensure upgrades add newly introduced nullable columns on existing tables. + # Implement with your existing migration strategy/dialect-safe checks. + # Example: conditionally ALTER TABLE when column is missing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py` around lines 23 - 42, The current _create_archives_table uses CREATE TABLE IF NOT EXISTS which does not add new columns to existing tables; add an idempotent migration step after the CREATE that checks information_schema (or equivalent) for the presence of the columns `num_messages` and `num_files` on the table named by archives_table_name and, if missing, runs an ALTER TABLE to add them with the intended types (BIGINT NULL DEFAULT NULL) and constraints; ensure this migration is safe to re-run (skip ALTER if column exists), backfill any needed default values if desired, and log/report failures so runtime inserts referencing num_messages/num_files will not error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py`:
- Around line 23-42: The current _create_archives_table uses CREATE TABLE IF NOT
EXISTS which does not add new columns to existing tables; add an idempotent
migration step after the CREATE that checks information_schema (or equivalent)
for the presence of the columns `num_messages` and `num_files` on the table
named by archives_table_name and, if missing, runs an ALTER TABLE to add them
with the intended types (BIGINT NULL DEFAULT NULL) and constraints; ensure this
migration is safe to re-run (skip ALTER if column exists), backfill any needed
default values if desired, and log/report failures so runtime inserts
referencing num_messages/num_files will not error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7cc1c17f-5180-4fe2-9b6a-bffef29ebbad
📒 Files selected for processing (8)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.pycomponents/core/src/clp/streaming_archive/Constants.hppcomponents/core/src/clp_s/ArchiveWriter.cppcomponents/core/src/clp_s/ArchiveWriter.hppcomponents/job-orchestration/job_orchestration/executor/compress/compression_task.pycomponents/webui/client/src/pages/IngestPage/Details/index.tsxcomponents/webui/client/src/pages/IngestPage/Details/sql.tscomponents/webui/client/src/pages/IngestPage/sqlConfig.ts
…um_files` stat. Previously, `m_num_files` was incremented in `add_field_to_current_range()` every time a new range was opened, which over-counted files when a single large input file was split across multiple archives. Now the increment is moved to the callers (`ingest_json`/`ingest_kvier`) where `file_split_number` is available, so only the first split (split 0) of each input file is counted.
Description
Problem
The CLP-JSON (clp-s) web UI Ingest page Details component only shows the TimeRange card, while CLP-Text additionally renders Messages (total log event count) and Files (total file count) cards. The root cause is that CLP-S doesn't populate the same data pipeline that CLP-Text relies on:
num_messagessourceSUM(num_messages)from files table (per-file,NOT NULL)ArchiveStatsomits itnum_filessourceCOUNT(DISTINCT orig_file_id)from files tableArchiveStatsomits itnum_messages,num_filesleft asNULLby C++ inserterNULLarchives+filestablesarchives+filestables, but files table is empty for CLP-SBecause CLP-S never populates the
filestable, the join always yieldsnum_messages=0andnum_files=0.Solution
C++ layer — Extend
ArchiveStatsto includenum_messagesandnum_files, matching how CLP-Text's data ultimately reaches the UI (but via a different source):ArchiveMetadata— nonum_messages/num_filesfieldsArchiveStats— now includes bothnum_messagescomputed byfilestableArchiveWriter::m_next_log_event_id(running log-event count)num_filescomputed byCOUNT(DISTINCT orig_file_id)in SQL fromfilestableArchiveWriter::m_num_files(range-index range count){id, uncompressed_size, size}onlynum_messagesandnum_filesDatabase layer — Add nullable
BIGINTcolumnsnum_messagesandnum_filesto thearchivestable schema. The Python compression task'supdate_archive_metadata()now inserts these values from the C++ stats output, while CLP-Text's C++ inserter continues leaving themNULL(hence nullable).Web UI layer — Rewrite
buildMultiDatasetDetailsSql()to query only from thearchivestable, eliminating the dependency on the always-emptyfilestable:clp_archives+clp_files(join)<prefix>_<dataset>_archivesonly (UNION ALL across datasets)num_messagesSQLSUM(num_messages)from filesCOALESCE(SUM(num_messages), 0)from archivesnum_filesSQLCOUNT(DISTINCT orig_file_id)from filesCOALESCE(SUM(num_files), 0)from archivesUpdate the CLP-S Details component to render Messages and Files cards using the same grid layout as CLP-Text.
Checklist
breaking change.
Validation performed
Build
Task: Verify the full project builds successfully including the new C++ changes and web UI.
Command:
Output:
End-to-end compression test (CLP-S)
Task: Verify the full compression pipeline —
clp-snow outputsnum_messagesandnum_filesin its JSON stats, the Python task writes them to the DB, and the archives table contains non-NULL values.Command:
Output:
Database verification
Task: Verify the
num_messagesandnum_filescolumns in the archives table contain non-NULL, non-zero values after compression.Command:
Output:
Explanation: Both
num_messages(1,000,000 log events) andnum_files(1 file) are populated with non-NULL values, confirming the C++ → Python → DB pipeline works correctly.Web UI browser validation (CLP-S Ingest page)
Task: Verify the CLP-JSON Ingest page now displays Messages and Files cards alongside the TimeRange card, using a browser.
Navigated to
http://localhost:4000/and confirmed the Details section renders all three cards with correct values:Explanation: The screenshot shows the Time Range, Messages (1,000,000), and Files (1) cards all rendering correctly. The grid layout has 3 children (timeRange spanning 2 columns, Messages and Files each in 1 column), matching the CLP-Text layout.
Details grid layout verification
Task: Verify the CSS grid layout structure matches CLP-Text (2-column grid with Time Range spanning full width).
Inspected the computed styles of the details grid element in the browser:
Explanation: The grid uses a 2-column layout with 8px gap. The Time Range child spans both columns (via
grid-column: span 2CSS), while Messages and Files each occupy one column — matching the CLP-Text Details layout exactly.Summary by CodeRabbit
Release Notes
New Features
UI Changes
Improvements