@geruh commented Dec 24, 2025

Related to #2775 and #2792.

Rationale for this change

This PR adds the Pydantic models for the REST catalog server-side scan planning API, focusing on the synchronous use cases first.

There's some redundancy here, for example RESTDataFile vs. the existing DataFile in the manifest module.
As mentioned in #2792, the manifest encoding/decoding logic depends on Avro. Rather than trying to solve that unification problem upfront and blocking ourselves, I went with separate REST Pydantic types for now. The plan is to eventually add conversion methods or a common interface so these can work together with our existing scan task and content file types.
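For illustration only, here is a minimal sketch of what a standalone REST-side type can look like, assuming Pydantic v2. RESTDataFileSketch, its subset of fields, and the to_internal_dict helper are hypothetical names, not the models in this PR; the helper just stands in for the kind of conversion hook mentioned above.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class RESTDataFileSketch(BaseModel):
    """Hypothetical subset of a REST data file; not the PR's RESTDataFile."""

    # Accept both the kebab-case wire names and the Python attribute names.
    model_config = ConfigDict(populate_by_name=True)

    content: Literal["data"] = Field(default="data")
    file_path: str = Field(alias="file-path")
    file_format: str = Field(alias="file-format")
    record_count: int = Field(alias="record-count")
    file_size_in_bytes: int = Field(alias="file-size-in-bytes")


def to_internal_dict(rest_file: RESTDataFileSketch) -> dict[str, object]:
    """Hypothetical conversion hook: hand snake_case values to other code."""
    # model_dump() defaults to attribute names (snake_case), not aliases.
    return rest_file.model_dump(exclude={"content"})


payload = {
    "content": "data",
    "file-path": "s3://bucket/db/tbl/data/00000-0.parquet",
    "file-format": "parquet",
    "record-count": 100,
    "file-size-in-bytes": 4096,
}
rest_file = RESTDataFileSketch.model_validate(payload)
print(to_internal_dict(rest_file))
```

Aliases keep the wire format in the spec's kebab-case while the Python attributes stay snake_case, which is the shape a later conversion into the manifest types would most likely consume.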

The models and tests here align with the Java implementation.

Are these changes tested?

Yes. Added tests, and this works against a POC.

Are there any user-facing changes?

No

@kevinjqliu left a comment

LGTM! Agreed on the DataFile serialization problem. We can refactor later and unify the representation.

For now, let's move this forward. Excited to see scan planning in action for pyiceberg 😄

"""Position delete file from REST API."""

content: Literal["position-deletes"] = Field(default="position-deletes")
referenced_data_file: str | None = Field(alias="referenced-data-file", default=None)

Suggested change
- referenced_data_file: str | None = Field(alias="referenced-data-file", default=None)

PositionDeleteFile doesn't have this field in the REST catalog OpenAPI spec:
https://github.com/apache/iceberg/blob/0651b8913d27c3b1c9aca4a9609bec521905fb36/open-api/rest-catalog-open-api.yaml#L4450-L4466
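A minimal sketch of the reviewer's point, assuming Pydantic v2: the content literal can act as a discriminator so each payload parses into the right type, and the position-delete variant simply has no referenced-data-file field. DataFileSketch, PositionDeleteFileSketch, and ContentFileSketch are hypothetical names used only here, not the PR's classes.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, ConfigDict, Field, TypeAdapter


class DataFileSketch(BaseModel):
    """Hypothetical data file model keyed by content="data"."""

    model_config = ConfigDict(populate_by_name=True)
    content: Literal["data"] = Field(default="data")
    file_path: str = Field(alias="file-path")


class PositionDeleteFileSketch(BaseModel):
    """Hypothetical position delete file; no referenced-data-file field."""

    model_config = ConfigDict(populate_by_name=True)
    content: Literal["position-deletes"] = Field(default="position-deletes")
    file_path: str = Field(alias="file-path")


# Pydantic selects the concrete model by reading the "content" value.
ContentFileSketch = Annotated[
    Union[DataFileSketch, PositionDeleteFileSketch],
    Field(discriminator="content"),
]

adapter = TypeAdapter(list[ContentFileSketch])
files = adapter.validate_python(
    [
        {"content": "data", "file-path": "s3://bucket/data/a.parquet"},
        {"content": "position-deletes", "file-path": "s3://bucket/data/a-deletes.parquet"},
    ]
)
for f in files:
    print(type(f).__name__, f.file_path)
```

By default Pydantic ignores unknown keys, so a stray referenced-data-file in a response would be dropped rather than rejected; setting extra="forbid" in model_config would turn it into a validation error instead.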


snapshot_id: int | None = Field(alias="snapshot-id", default=None)
select: list[str] | None = Field(default=None)
filter: BooleanExpression | None = Field(default=None)

Suggested change
- filter: BooleanExpression | None = Field(default=None)
+ filter: BooleanExpression | None = Field(default=None)
+ min_rows_requested: int | None = Field(alias="min-rows-requested", default=None)

This is missing min_rows_requested, which the REST catalog OpenAPI spec defines: https://github.com/apache/iceberg/blob/0651b8913d27c3b1c9aca4a9609bec521905fb36/open-api/rest-catalog-open-api.yaml#L4483-L4507
