fix: Preserve KvlistValue as JSON object when nested_kv_max_depth > 0#317
fix: Preserve KvlistValue as JSON object when nested_kv_max_depth > 0#317sheldor1510 wants to merge 1 commit intostreamfold:mainfrom
Conversation
When nested_kv_max_depth > 0 (default 3), KvlistValue attributes should be stored as JSON objects rather than flattened into dotted paths. The flatten_keyvalues_* functions were always recursing into KvlistValues, which caused them to be silently dropped when sitting alongside scalar siblings under the same parent. Now KvlistValues are only flattened in flat mode (max_depth == 0). In nested mode, they pass through to anyvalue_to_jsontype which correctly handles Object conversion with depth tracking.
b7d732e to
2779b7f
Compare
|
Hey @sheldor1510! So I'm trying to understand this a bit more since I'm not seeing the end behavior that you're noticing here. There's a test in the I've applied the following patch to the main branch which mimics your test scenario: When I then query ClickHouse I see this as the SpanAttributes: I believe that is the correct representation of these fields based on our interpretation of OTLP -> JSON encoding of attributes. I'm curious if you are expecting a different JSON representation in Clickhouse? When you query this data does it appear differently? I see your test is comparing against the intermediary values generated from the transformer before they are sent to Clickhouse. I do think there is a difference there because Clickhouse treats dotted notations as flattened, nested keys by default. However, the end result should be the same. I DO think there may be an issue here in that the original key flattening doesn't respect the depth argument and may actually produce a deeper depth in keys, eg. Anyways, would like to see what you're experiencing when these are encoded into Clickhouse and if the CH representation is coming out different. |
|
@sheldor1510 Curious if you can test against #327 |
Summary
When
nested_kv_max_depth > 0, theflatten_keyvalues_*functions were unconditionally recursing intoKvlistValueentries, expanding them into dotted paths. This causedKvlistValueattributes to be silently dropped when sitting alongside scalar siblings at the same level.The Bug
When a
KvlistValuesits alongside scalar siblings at the same level — for example:The flattening code would unconditionally recurse into the
KvlistValue, producing dotted keys likeuser_message.roleanduser_message.content. These dotted keys then conflicted with the depth-tracking logic fromnested_kv_max_depth > 0, causing the nested attribute to be silently lost in the ClickHouse output.Tweaking
max_depthdoes not help — theKvlistValuewas being destroyed byflatten_keyvaluesbefore depth tracking ever ran.The Fix
Add a guard (
if self.nested_kv_max_depth == 0) to theKvlistValuematch arm in all three flatten variants:flatten_keyvalues_mapflatten_keyvalues_borrowedflatten_keyvalues_ownedWhen
nested_kv_max_depth == 0(flat mode, the default): behavior is unchanged —KvlistValueis recursively expanded into dotted paths.When
nested_kv_max_depth > 0(nested mode):KvlistValueentries fall through toanyvalue_to_jsontype/anyvalue_to_jsontype_owned, which correctly converts them to JSON Objects with depth tracking.Tests
Three new tests demonstrate the fix and verify backwards compatibility:
test_nested_mode_preserves_kvlist_alongside_scalar_siblings— The core bug scenario: aKvlistValue(user_message) next to a scalar sibling (is_active) withnested_kv_max_depth = 3. Verifies theKvlistValueis stored as a JSON Object (not flattened into dotted paths), and checks the inner values (role,content) are correct.test_nested_mode_owned_preserves_kvlist_alongside_scalar_siblings— Same scenario for the owned variant (transform_attrs_kv_owned/flatten_keyvalues_owned), which is used when attributes need to outlive the source protobuf data.test_flat_mode_still_flattens_kvlist_into_dotted_paths— Backwards compatibility: withnested_kv_max_depth == 0(the default),KvlistValueIS still flattened into dotted paths (user_message.role,user_message.content), ensuring no regression for users not using nested mode.All 15 transformer tests pass.