Skip to content

[Proposal] Specializing Agent Skills for ClickHouse-Native Data Modeling (Time-Series analytics & Materialized Views) #13

@shuvajyotikar13

Description

@shuvajyotikar13

Hi @pjhampton ,

Following the feedback on PR #9, I wanted to transition the discussion to the architectural "why" behind those skills. My goal is to bridge the gap where general-purpose LLMs fail to utilize ClickHouse’s specific strengths, particularly in Time-Series Analytics.

The Problem: The "Generic DDL" Trap

Standard LLMs default to standard SQL patterns. For high-volume time-series data, this leads to:

Query-Time Latency: Agents suggest simple SELECT queries on raw data rather than shifting compute to insert-time.

Storage Inefficiency: Missing opportunities for column-level codecs that can reduce disk footprint by over 80%.

Proposed Architectural Skills (Non-Executable Rules):

These can be implemented as purely descriptive rules or prompts, requiring no code execution in your AI cloud:

  1. data-modeling-materialized-views
    Guides the agent to recognize heavy aggregation workloads (e.g., IoT sensor heartbeats) and enforce the Target Table + MV pattern.

Example Heuristic: If the workload requires COUNT/SUM on time-intervals, the agent must recommend SummingMergeTree or AggregatingMergeTree.

  1. schema-compression-codecs
    Codifies specific heuristics for high-volume columns that go beyond default LZ4.

Timestamp Columns: Enforce DoubleDelta for monotonic sequences.

Metric/Sensor Data: Enforce Gorilla for floating-point values.

Low-Cardinality Strings: Enforce LowCardinality(String) with ZSTD.

Why this matters:

In our testing with IoT-scale time-series data, applying these specific codecs via the agent reduced storage requirements by ~75% compared to generic AI-generated DDL.

I’d love to discuss how we can codify these as "High-Performance Rules" within the new contributing guide you are developing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions