Implement DFExtensionType for Arrow's Canonical Extension Types #21144

@tobixdev

Description

Is your feature request related to a problem or challenge?

Now that it seems like we can merge #20312 soon, we should implement the full range of Arrow's canonical extension types. Currently, only UUID is supported in the PR.

This issue tracks adding the remaining canonical extension types:

  • Fixed shape tensor
  • Variable shape tensor
  • JSON
  • UUID
  • Opaque
  • 8-bit Boolean
  • Parquet Variant
  • Timestamp With Offset
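
For reference, here is a rough sketch of how the types in this list map to extension names, i.e. the value stored under the `ARROW:extension:name` field metadata key. The `CanonicalExtension` enum and its methods are hypothetical and only for illustration; they are not the arrow-rs or DataFusion API. The name strings are as I understand them from the Arrow canonical extension type docs and should be double-checked against the spec.

```rust
/// Hypothetical enum for illustration only; not an arrow-rs or DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CanonicalExtension {
    FixedShapeTensor,
    VariableShapeTensor,
    Json,
    Uuid,
    Opaque,
    Bool8,
    ParquetVariant,
    TimestampWithOffset,
}

impl CanonicalExtension {
    /// Every variant, in the same order as the list in this issue.
    const ALL: [CanonicalExtension; 8] = [
        CanonicalExtension::FixedShapeTensor,
        CanonicalExtension::VariableShapeTensor,
        CanonicalExtension::Json,
        CanonicalExtension::Uuid,
        CanonicalExtension::Opaque,
        CanonicalExtension::Bool8,
        CanonicalExtension::ParquetVariant,
        CanonicalExtension::TimestampWithOffset,
    ];

    /// Extension name as it would appear under `ARROW:extension:name`
    /// (assumed from the Arrow docs; verify against the spec).
    fn name(self) -> &'static str {
        match self {
            CanonicalExtension::FixedShapeTensor => "arrow.fixed_shape_tensor",
            CanonicalExtension::VariableShapeTensor => "arrow.variable_shape_tensor",
            CanonicalExtension::Json => "arrow.json",
            CanonicalExtension::Uuid => "arrow.uuid",
            CanonicalExtension::Opaque => "arrow.opaque",
            CanonicalExtension::Bool8 => "arrow.bool8",
            CanonicalExtension::ParquetVariant => "arrow.parquet.variant",
            CanonicalExtension::TimestampWithOffset => "arrow.timestamp_with_offset",
        }
    }

    /// Reverse lookup; returns `None` for names outside the list above.
    fn from_name(name: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|ext| ext.name() == name)
    }
}
```

A per-type `DFExtensionType` implementation could then be selected from the name found in the field metadata, e.g. `CanonicalExtension::from_name("arrow.bool8")`.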

Describe the solution you'd like

Implement DFExtensionType for each of these types, following the existing UUID implementation.

The question that remains is how we implement pretty-printing for these types.

  • Do we try to pretty-print tensors?
  • Do we pretty-print JSON using newlines?
  • I guess Parquet Variant would benefit from a nice representation in tests/CLIs. @friendlymatthew, maybe you have 2 cents to add here?
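
To make the pretty-printing question a bit more concrete, here is a minimal sketch of what value-level formatting could look like for three of the types. All three functions are hypothetical standalone helpers that operate on plain Rust values; they do not use the `DFExtensionType` trait or any arrow-rs formatter, and `format_tensor` assumes a row-major buffer with at least as many values as the shape requires.

```rust
use std::fmt::Write;

/// Hypothetical helper: render the 16 storage bytes of a UUID in the
/// usual hyphenated 8-4-4-4-12 form.
fn format_uuid(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(36);
    for (i, b) in bytes.iter().enumerate() {
        // Hyphens go before bytes 4, 6, 8, and 10.
        if matches!(i, 4 | 6 | 8 | 10) {
            out.push('-');
        }
        write!(out, "{:02x}", b).expect("writing to a String cannot fail");
    }
    out
}

/// Hypothetical helper: 8-bit Boolean is stored as Int8, where zero is
/// false and any other value is true.
fn format_bool8(value: i8) -> &'static str {
    if value == 0 {
        "false"
    } else {
        "true"
    }
}

/// Hypothetical helper: render a row-major flat buffer as nested brackets.
/// Assumes `values` holds at least as many elements as `shape` requires;
/// it panics on a shorter buffer.
fn format_tensor(values: &[i64], shape: &[usize]) -> String {
    match shape.split_first() {
        // No dimensions left: this slice holds a single scalar.
        None => values.first().map(|v| v.to_string()).unwrap_or_default(),
        Some((&dim, rest)) => {
            // Number of scalars in each sub-tensor along this dimension.
            let stride: usize = rest.iter().product();
            let parts: Vec<String> = (0..dim)
                .map(|i| format_tensor(&values[i * stride..(i + 1) * stride], rest))
                .collect();
            format!("[{}]", parts.join(", "))
        }
    }
}
```

With these helpers, `format_tensor(&[1, 2, 3, 4, 5, 6], &[2, 3])` yields `[[1, 2, 3], [4, 5, 6]]`. Whether large tensors should be truncated or wrapped across lines is exactly the open question above.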

Describe alternatives you've considered

We could implement the formatters within arrow-rs and just use them in DataFusion. But I am unsure where they best fit.

Maybe starting in DataFusion and migrating them to arrow-rs sometime in the future (depending on a use case) is a good choice.

Additional context

Some (maybe) related issues I've found:

Labels: enhancement (New feature or request)