`DecimalArray` Logical and Physical type mismatch

We have this documentation on `DecimalArray`:

```rust
/// These are just the maximal ranges for each scalar type, but it is perfectly legal to store
/// values with precision that does not match this exactly. For example, a valid DecimalArray with
/// precision=39 may store its values in an `i8` if all of the actual values fit into it.
///
/// Similarly, a `DecimalArray` can be built that stores a set of precision=2 values in a
/// `Buffer<i256>`.
```

This presents a problem (which is the cause of many of the recent fuzzer crashes): If we try to patch values or use `fill_null` with a decimal value that is valid according to the logical `DType`, but is **not** compatible with the storage `DType`, then we just panic!

This is highly concerning! The physical type has leaked and now governs what we are allowed to do instead of the logical type.

I see 3 ways forward:

1. Keep this behavior. I think this is a bad idea, as it has completely different behavior from how
   `PrimitiveArray` works with `BitPackedArray`, for example.
2. Reallocate the entire array whenever we run into this so that the new storage type can fit the new values. 
   I also think this is a bad idea in terms of performance.
3. We instead require that the storage type must always be able to fit any value in the domain of the logical 
   `DecimalDType`. 

For 3, by requiring this constraint we no longer need to worry about checking every single value during validation, we can just do a single check in the constructor that the `DType` fits the storage.

---

We are about to do a huge migration to the new Vortex operator world (where everything is lazily computed / executed), which means we don't necessarily need to do this migration in the old world: We can just make sure we implement the correct behavior in the new world.

In the new world, we need to make sure that the `DecimalVector` the array tree executes into holds a `DVector` that can store every possible value given by the `DecmialDType`'s `PrecisionScale`. 

In practice, this will mean finding the _smallest_ integer storage type for the given `PrecisionScale`. Note we don't necessarily need to use the minimal storage type, though that will be the most helpful for performance.

The more important thing is that we **do not allow** executing a `DecimalArray` into a `DecimalVector` that cannot fit all possible values according to the `PrecisionScale`.

And also note that we need to handle old files that we were written in the old world with this bad logic. 

---

### Fuzzer Crashes

- https://github.com/vortex-data/vortex/issues/5590
- https://github.com/vortex-data/vortex/issues/5744
- https://github.com/vortex-data/vortex/issues/5806
- https://github.com/vortex-data/vortex/issues/5807
- https://github.com/vortex-data/vortex/issues/5810
- https://github.com/vortex-data/vortex/issues/5811


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`DecimalArray` Logical and Physical type mismatch #5820

Fuzzer Crashes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

DecimalArray Logical and Physical type mismatch #5820

Description

Fuzzer Crashes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`DecimalArray` Logical and Physical type mismatch #5820