Feature Request / Improvement
Now that Java has added server-side scan planning support (PR #14480), I believe Python is a great place to integrate this functionality. We have all the building blocks, and they are nearly complete. I'm creating this issue to track all of the tasks needed to drive it through.
Context
We have some open PRs with the needed model changes, but we have pivoted to reusing our existing models and ensuring they serialize properly with Pydantic.
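To make the two serialization concerns concrete, here is a minimal, dependency-free sketch. The classes `EqualTo`, `And` and `DataFileSketch` and the helpers `expression_to_rest` and `data_file_from_rest` are hypothetical stand-ins, not PyIceberg's API. The expression JSON shape (`{"type": "eq", "term": ..., "value": ...}`, `{"type": "and", "left": ..., "right": ...}`) and the extra DataFile keys (`record-count`, `file-size-in-bytes`) reflect my reading of the REST spec and should be treated as assumptions. Only the kebab-case vs. snake_case mismatch comes from this issue.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


# --- Expression -> REST JSON (hypothetical minimal expression tree) ---
@dataclass(frozen=True)
class EqualTo:
    term: str  # column name, serialized as a plain reference
    value: Any


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[EqualTo, And]


def expression_to_rest(expr: Expr) -> Dict[str, Any]:
    """Serialize to the JSON shape assumed for REST filter expressions."""
    if isinstance(expr, EqualTo):
        return {"type": "eq", "term": expr.term, "value": expr.value}
    if isinstance(expr, And):
        return {
            "type": "and",
            "left": expression_to_rest(expr.left),
            "right": expression_to_rest(expr.right),
        }
    raise TypeError(f"unsupported expression: {expr!r}")


# --- REST JSON (kebab-case) -> DataFile-like model (snake_case) ---
def kebab_to_snake(key: str) -> str:
    """Map an OpenAPI-style key such as 'file-path' to 'file_path'."""
    return key.replace("-", "_")


@dataclass(frozen=True)
class DataFileSketch:
    # Hypothetical, trimmed-down stand-in for PyIceberg's DataFile;
    # the real model has many more fields.
    file_path: str
    file_format: str
    record_count: int
    file_size_in_bytes: int


def data_file_from_rest(payload: Dict[str, Any]) -> DataFileSketch:
    """Build the model from a kebab-case server payload, ignoring unknown keys."""
    known = {f.name for f in fields(DataFileSketch)}
    converted = {kebab_to_snake(k): v for k, v in payload.items()}
    return DataFileSketch(**{k: v for k, v in converted.items() if k in known})
```

In PyIceberg itself the key translation would more naturally live in Pydantic field aliases on the existing models; the sketch only shows which mapping has to happen in each direction.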
For example, we can start with:
- Expression Serialization: ensure BooleanExpression and its subclasses serialize correctly for the REST API. Related to @Fokko's work on "Remove Generic from expressions" #2750 and @rambleraptor's "Server-side planning models" #2435.
- DataFile Serialization: ensure we can properly deserialize the data/delete files from the server response. The OpenAPI spec uses kebab-case (file-format, file-path), but our models expect snake_case (file_format, file_path).

REST API Endpoints to Implement
Based on the Iceberg REST spec:
- POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan - Submit a scan for planning
- GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id} - Fetch the planning result
- DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id} - Cancel planning
- POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/tasks - Fetch the scan tasks for a plan task

Tasks
Initially we can start with core synchronous planning. Once that's in place, we can add async support, which appears to already exist in the Java test adapter: https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java
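A rough sketch of the synchronous flow against the endpoints listed above, under several assumptions: the `send` transport callable is hypothetical (in PyIceberg its role would fall to the REST catalog's HTTP session), `table_path` and `plan_table_scan_sync` are illustrative names rather than the eventual API, and the field names `status`, `file-scan-tasks`, `plan-tasks` and `plan-task`, plus the 0x1F namespace separator, are my reading of the REST spec and should be checked against it. Anything other than a `completed` status is rejected, since async handling is the follow-up.

```python
from typing import Any, Callable, Dict, List, Sequence
from urllib.parse import quote

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def table_path(prefix: str, namespace: Sequence[str], table: str) -> str:
    """Build /v1/{prefix}/namespaces/{namespace}/tables/{table}."""
    # Multi-level namespaces are joined with the 0x1F unit separator and
    # percent-encoded (assumed per the REST spec's path-parameter rules).
    ns = quote("\x1f".join(namespace), safe="")
    return f"/v1/{prefix}/namespaces/{ns}/tables/{quote(table, safe='')}"


def plan_table_scan_sync(
    send: Transport,
    prefix: str,
    namespace: Sequence[str],
    table: str,
    body: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Submit a scan plan and collect every file scan task (sync case only)."""
    base = table_path(prefix, namespace, table)
    result = send("POST", f"{base}/plan", body)
    status = result.get("status")
    if status != "completed":
        # Async planning ('submitted') is follow-up work; not handled here.
        raise NotImplementedError(f"unsupported planning status: {status!r}")
    tasks: List[Dict[str, Any]] = list(result.get("file-scan-tasks", []))
    # Plan tasks are opaque tokens; each one is exchanged via /tasks for more
    # file scan tasks and, possibly, further plan tasks.
    pending: List[str] = list(result.get("plan-tasks", []))
    while pending:
        fetched = send("POST", f"{base}/tasks", {"plan-task": pending.pop(0)})
        tasks.extend(fetched.get("file-scan-tasks", []))
        pending.extend(fetched.get("plan-tasks", []))
    return tasks
```

Using a plain list as a work queue keeps plan-task fan-out sequential and easy to test with a fake transport; fetching plan tasks in parallel would be a possible later optimization.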
Core Sync Planning
- Expression classes serialize properly with Pydantic.
- DataScan behavior
- FileScanTask objects (handle Data/DeleteFile construction)
- plan_table_scan() methods

Full Scan planning support (Follow-up)
Complete the full scan planning API with async operations and pagination.
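For the async follow-up, one possible shape is a polling loop over the GET and DELETE plan endpoints listed above. This is a sketch under assumptions: `wait_for_plan` and the `send` transport are hypothetical, the status values `submitted`, `completed`, `failed` and `cancelled` and the `plan-id` field are my reading of the REST spec, and the timeout-then-cancel policy is a design choice, not something the spec mandates. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def wait_for_plan(
    send: Transport,
    base: str,
    submitted: Dict[str, Any],
    timeout_s: float = 60.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll GET {base}/plan/{plan-id} until the plan leaves the 'submitted' state.

    `base` is the table path: /v1/{prefix}/namespaces/{namespace}/tables/{table}.
    """
    plan_id = submitted["plan-id"]
    deadline = clock() + timeout_s
    result = submitted
    while result.get("status") == "submitted":
        if clock() >= deadline:
            # Cancel server-side work we will no longer consume, then give up.
            send("DELETE", f"{base}/plan/{plan_id}", {})
            raise TimeoutError(f"scan planning {plan_id} did not finish within {timeout_s}s")
        sleep(poll_interval_s)
        result = send("GET", f"{base}/plan/{plan_id}", {})
    if result.get("status") != "completed":
        # e.g. 'failed' or 'cancelled'
        raise RuntimeError(f"scan planning {plan_id} ended with status {result.get('status')!r}")
    return result
```

Any `plan-tasks` in the completed result would still need to be exchanged through the /tasks endpoint; that step and pagination of the results are outside this sketch.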
RestCatalog