Volo/fix bools autocomplete#7848

Draft
volokluev wants to merge 4 commits into master from volo/fix-bools-autocomplete

@volokluev volokluev commented Mar 27, 2026

This PR accomplishes what #7642 set out to do but failed to. No AI was used in writing this PR because migrations are very delicate, as this incident shows.

What happened

#7642 created a new materialized view to populate attributes_bool. Alongside it, the key_hash field in the materialized view was updated to include the new column. The problem is that key_hash was never populated from this materialized view. It was actually populated from the materialized column on the table.

It turns out that using a materialized column in a materialized view does absolutely nothing. I updated the view statement to look like this:

SELECT
    organization_id AS organization_id,
    project_id AS project_id,
    item_type as item_type,
    toMonday(timestamp) AS date,
    retention_days as retention_days,
    arrayConcat({_attr_str_names}) AS attributes_string,
    arrayConcat({_attr_num_names}) AS attributes_float,
    mapKeys(attributes_bool) as attributes_bool,
    cityHash64(arraySort(arrayConcat(attributes_string, attributes_float, attributes_bool))) AS key_hash
FROM eap_items_1_local

and it broke inserts! Turns out:

DB::Exception: Cannot insert column key_hash, because it is MATERIALIZED column. 
(ILLEGAL_COLUMN) (version 25.3.6.10034.altinitystable (altinity build))

How this fixes it

This PR does the following:

  1. Updates the materialized column statement to include bools
  2. Changes the column type from MATERIALIZED to DEFAULT so that we don't make this kind of error again. With DEFAULT, a materialized view can supply key_hash explicitly and thereby change how the column is calculated
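
To make the MATERIALIZED versus DEFAULT distinction concrete, here is a toy Python model of the insert rules. It is only a sketch: `Column`, `insert`, `names_key`, and `table` are hypothetical names invented for illustration, and the string key stands in for the real cityHash64 expression. It is not ClickHouse or Snuba code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class Column:
    name: str
    kind: str = "ORDINARY"  # "ORDINARY", "DEFAULT", or "MATERIALIZED"
    expr: Optional[Callable[[Row], Any]] = None  # expression for computed columns


def insert(columns: List[Column], values: Row) -> Row:
    """Toy model: MATERIALIZED columns reject explicit values, DEFAULT columns accept them."""
    by_name = {c.name: c for c in columns}
    for name in values:
        if by_name[name].kind == "MATERIALIZED":
            raise ValueError(
                f"Cannot insert column {name}, because it is MATERIALIZED column"
            )
    row = dict(values)
    # Fill in any computed column the writer did not supply.
    for col in columns:
        if col.name not in row and col.expr is not None:
            row[col.name] = col.expr(row)
    return row


def names_key(row: Row) -> str:
    # Stand-in for cityHash64(arraySort(arrayConcat(...))): sorted names joined by "|".
    return "|".join(sorted(row["attributes_string"] + row["attributes_bool"]))


def table(kind: str) -> List[Column]:
    # Hypothetical, cut-down version of the co-occurring attributes table.
    return [
        Column("attributes_string"),
        Column("attributes_bool"),
        Column("key_hash", kind, names_key),
    ]


if __name__ == "__main__":
    base = {"attributes_string": ["env"], "attributes_bool": ["active"]}
    # DEFAULT: a writer (for example a materialized view) may supply key_hash itself.
    print(insert(table("DEFAULT"), {**base, "key_hash": "from_view"}))
    # MATERIALIZED: the same write is rejected.
    try:
        insert(table("MATERIALIZED"), {**base, "key_hash": "from_view"})
    except ValueError as exc:
        print(exc)
```

In this model both column kinds compute key_hash when the writer omits it; the only difference is whether an explicit value is accepted, which is the property this PR relies on.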

How I tested it

This SQL statement on master reproduces the problem:

INSERT INTO default.eap_items_1_local
  (organization_id, project_id, item_type, timestamp, trace_id, item_id,
   sampling_weight, sampling_factor, server_sample_rate, client_sample_rate,
   retention_days, downsampled_retention_days,
   attributes_string_0, attributes_float_0, attributes_bool)
VALUES
  (1, 1, 1, '2026-03-27 00:00:00', generateUUIDv4(), 1,
   1, 1.0, 1.0, 1.0,
   30, 30,
   {'env': 'prod'}, {'count': 420.69}, {'active': true}),
  (1, 1, 1, '2026-03-27 00:00:00', generateUUIDv4(), 2,
   1, 1.0, 1.0, 1.0,
   30, 30,
   {'env': 'prod'}, {'count': 420.69}, {'active': true, 'verified': false}),
  (1, 1, 1, '2026-03-27 00:00:00', generateUUIDv4(), 2,
   1, 1.0, 1.0, 1.0,
   30, 30,
   {'env': 'prod'}, {'count': 420.69}, {'inactive': true, 'verified': false});

Note that each inserted row has the same string and float attributes and different boolean attributes. After this insert, run this query:

SELECT * FROM eap_item_co_occurring_attrs_1_local OPTIMIZE FINAL;
   ┌─organization_id─┬─project_id─┬─item_type─┬───────date─┬─retention_days─┬─attributes_string─┬─attributes_float─┬─attributes_bool─────────┐
1. │               1 │          1 │         1 │ 2026-03-23 │             30 │ ['env']           │ ['count']        │ ['inactive','verified'] │
   └─────────────────┴────────────┴───────────┴────────────┴────────────────┴───────────────────┴──────────────────┴─────────────────────────┘

Switching to this branch, the same insert statement yields the following table:

 SELECT * FROM eap_item_co_occurring_attrs_1_local OPTIMIZE FINAL;

SELECT *
FROM eap_item_co_occurring_attrs_1_local AS OPTIMIZE
FINAL

Query id: 4367ab68-106a-428a-80cf-25440bba7d93

   ┌─organization_id─┬─project_id─┬─item_type─┬───────date─┬─retention_days─┬─attributes_string─┬─attributes_float─┬─attributes_bool─────────┬─────────────key_hash─┐
1. │               1 │          1 │         1 │ 2026-03-23 │             30 │ ['env']           │ ['count']        │ ['active']              │    21113431093458980 │
2. │               1 │          1 │         1 │ 2026-03-23 │             30 │ ['env']           │ ['count']        │ ['active','verified']   │  2877887750331984573 │
3. │               1 │          1 │         1 │ 2026-03-23 │             30 │ ['env']           │ ['count']        │ ['inactive','verified'] │ 14219356809293241088 │
   └─────────────────┴────────────┴───────────┴────────────┴────────────────┴───────────────────┴──────────────────┴─────────────────────────┴──────────────────────┘
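
The difference between the two result tables can be reproduced with a small Python sketch. It assumes that the table keeps one row per key and that the last inserted row wins, which matches the output on master above. `key_hash` and `merge` are hypothetical helpers, and SHA-256 is only a stand-in for cityHash64, so the hash values will not match the real ones.

```python
import hashlib
from typing import Dict, List

Row = Dict[str, List[str]]


def key_hash(*attr_lists: List[str]) -> int:
    """Mimic cityHash64(arraySort(arrayConcat(...))) with SHA-256 as a stand-in hash."""
    names = sorted(name for attrs in attr_lists for name in attrs)
    digest = hashlib.sha256("\x00".join(names).encode()).digest()
    return int.from_bytes(digest[:8], "big")


# Attribute names from the three rows of the reproduction INSERT.
ROWS: List[Row] = [
    {"attributes_string": ["env"], "attributes_float": ["count"],
     "attributes_bool": ["active"]},
    {"attributes_string": ["env"], "attributes_float": ["count"],
     "attributes_bool": ["active", "verified"]},
    {"attributes_string": ["env"], "attributes_float": ["count"],
     "attributes_bool": ["inactive", "verified"]},
]


def merge(rows: List[Row], include_bools: bool) -> List[Row]:
    """Keep one row per key; a later row replaces an earlier one with the same key."""
    merged: Dict[int, Row] = {}
    for row in rows:
        lists = [row["attributes_string"], row["attributes_float"]]
        if include_bools:
            lists.append(row["attributes_bool"])
        merged[key_hash(*lists)] = row
    return list(merged.values())


if __name__ == "__main__":
    # Old expression (no bools): every row hashes to the same key.
    print(len(merge(ROWS, include_bools=False)))
    # New expression (with bools): each row keeps its own key.
    print(len(merge(ROWS, include_bools=True)))
```

With bools left out of the hash, the three rows share a key and only the last one survives; with bools included, all three survive.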

Impacts

Even though this table has had the attributes_bool column, rows that differed only in their attributes_bool values would have been merged away because of this mistake. This means we have to wait another 30 days before we can stop double writing.

@github-actions

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py ()

-- start migrations

-- forward migration events_analytics_platform : 0054_fix_bools_in_autocomplete
Local op: ALTER TABLE eap_item_co_occurring_attrs_1_local ON CLUSTER 'cluster_one_sh' MODIFY COLUMN key_hash UInt64 DEFAULT cityHash64(arraySort(arrayConcat(attributes_string, attributes_float, attributes_bool)));
-- end forward migration events_analytics_platform : 0054_fix_bools_in_autocomplete




-- backward migration events_analytics_platform : 0054_fix_bools_in_autocomplete
Local op: ALTER TABLE eap_item_co_occurring_attrs_1_local ON CLUSTER 'cluster_one_sh' MODIFY COLUMN key_hash UInt64 MATERIALIZED cityHash64(arraySort(arrayConcat(attributes_string, attributes_float)));
-- end backward migration events_analytics_platform : 0054_fix_bools_in_autocomplete
