TurboQuant: Inv direction norms (correctness fix)#8049
Draft
connortsui20 wants to merge 3 commits into
Draft
Conversation
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
225.1 µs | 187.9 µs | +19.81% |
| ❌ | Simulation | new_alp_prim_test_between[f32, 32768] |
154 µs | 182.9 µs | -15.8% |
| ❌ | Simulation | new_alp_prim_test_between[f32, 16384] |
104.6 µs | 119.2 µs | -12.19% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ct/tq-inv-direction-norms (353a0ee) with develop (f852d72)1
Footnotes
27f4be5 to
accf369
Compare
Signed-off-by: "Connor Tsui" <connor.tsui20@gmail.com>
Signed-off-by: "Connor Tsui" <connor.tsui20@gmail.com>
Fix two correctness bugs in the L2Norm(TQDecode(_)) fast path. (1) The kernel coerced the returned norms to the child's nullability rather than the parent's, so wider-child-validity storage shapes that parse_storage accepts errored out at the dtype invariant. The kernel now coerces to parent.dtype().nullability() and a new test mirrors the malformed.rs shape. (2) The per-row inv_direction_norm computation could store a 0.0 sentinel for finite rows whose squared sum overflowed to +inf in f32 (or a +inf for denormal norm_squared), making decode emit zeros while the kernel returned the nonzero stored norm. Encode now rejects non-finite input norms up front and the denormal recip is guarded by is_normal(); regression tests cover both cases. Several should-fix items go with the must-fix: parse_storage_norms_only lets the kernel skip executing the codes and inv_direction_norms children it does not consume; the parity test pins down the exact new = old * inv_direction_norm[row] relationship rather than asserting "the values differ"; file roundtrip now asserts the new field survives serialization and the kernel still preserves stored norms; tests are parameterized over f16/f32/f64 and across padded vs unpadded dimensions; the kernel result is cross-checked against canonical L2Norm of the materialized decode. The hypothetical defensive metadata check on the kernel is dropped (registry key plus TQDecode signature already enforce shape). The dev-dep on vortex-array switches to workspace = true to match sibling encodings. Over-long doc lines are reflowed. Every type in the crate now has a doc comment, emphasizing the new inv_direction_norms storage child and the 0.0 sentinel semantics. Module docs single-source the storage schema rationale in storage.rs; lib.rs and the scalar-fn modules defer to it. Verified: cargo check, cargo clippy --all-targets --all-features, cargo +nightly fmt --all --check, cargo doc --no-deps, and cargo nextest run (102 tests, +14 new) all clean. Signed-off-by: "Connor Tsui" <connor.tsui20@gmail.com>
accf369 to
353a0ee
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Tracking issue: #7830
TODO
Testing
Still pretty basic testing