
@chenghuichen (Contributor) commented Feb 17, 2025

Purpose

Linked issue: close #38

Expose More Metadata in Object APIs to Benefit External Read/Write Optimizations

Tests

  • test_read_type_metadata
  • test_split_metadata

API and Format

Yes.

  • Add a new object RowType with the method as_arrow.
  • Add new methods row_count, file_size, and file_paths to the object Split.
  • Add a new method read_type to the object ReadBuilder, following the design of Java Paimon.
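A minimal sketch of how an external reader might use the new Split metadata to plan work before reading any data. The Split interface below only mirrors the three method names listed above. The return types (an int row count, an int size assumed to be in bytes, a list of path strings), the FakeSplit class, and the plan_batches helper are all hypothetical and are not part of pypaimon.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class Split(ABC):
    """Hypothetical interface mirroring the Split methods named in this PR."""

    @abstractmethod
    def row_count(self) -> int: ...

    @abstractmethod
    def file_size(self) -> int: ...  # assumed: total size in bytes

    @abstractmethod
    def file_paths(self) -> List[str]: ...


@dataclass
class FakeSplit(Split):
    """In-memory stand-in used only for illustration."""
    rows: int
    size: int
    paths: List[str]

    def row_count(self) -> int:
        return self.rows

    def file_size(self) -> int:
        return self.size

    def file_paths(self) -> List[str]:
        return list(self.paths)


def plan_batches(splits: List[Split], max_bytes: int) -> List[List[Split]]:
    """Greedily pack splits, in order, into batches of at most max_bytes.

    A split larger than max_bytes on its own still gets its own batch.
    """
    batches: List[List[Split]] = []
    current: List[Split] = []
    current_bytes = 0
    for s in splits:
        # Start a new batch if adding this split would exceed the budget.
        if current and current_bytes + s.file_size() > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(s)
        current_bytes += s.file_size()
    if current:
        batches.append(current)
    return batches
```

Because the metadata is available from the Split object itself, a scheduler along these lines could balance batches across parallel readers without opening the underlying files.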

Documentation

def new_predicate_builder(self) -> 'PredicateBuilder':
    return PredicateBuilder(self._j_row_type)

def read_type(self) -> 'RowType':
Contributor

Maybe we can return pa.schema directly?

Contributor Author

For API extensibility, I personally think this is better.
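The extensibility argument can be illustrated with a minimal sketch, assuming RowType is a thin wrapper that read_type() returns instead of a raw schema object. The (name, type) field representation, the injected to_arrow converter, and the field_names method are hypothetical and are not the actual pypaimon implementation. A real as_arrow would build a pyarrow schema.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical field representation: (column name, type name).
Field = Tuple[str, str]


class RowType:
    """Sketch of a wrapper returned by read_type() instead of a pa.Schema."""

    def __init__(self, fields: List[Field], to_arrow: Callable[[List[Field]], Any]):
        self._fields = list(fields)
        # Converter injected for illustration; a real implementation would
        # construct a pyarrow schema from the table's row type.
        self._to_arrow = to_arrow

    def as_arrow(self) -> Any:
        # Conversion happens on demand, so callers that never need Arrow
        # do not pay for it.
        return self._to_arrow(self._fields)

    # An accessor that could be added later without changing the return
    # type of read_type(), which is the extensibility benefit discussed above.
    def field_names(self) -> List[str]:
        return [name for name, _ in self._fields]
```

Returning pa.Schema directly would fix the return type to an Arrow object. A wrapper keeps Arrow as one view among several, so new accessors can be added without breaking existing callers.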

@yuzelin (Contributor) commented Feb 19, 2025

+1

@yuzelin merged commit 75d00d7 into apache:main on Feb 19, 2025
3 checks passed

Successfully merging this pull request may close these issues:

  • [Enhancement] Expose More Metadata in Object APIs

2 participants