[#10685] feat(iceberg-rest): Support vended credentials on planTableScan endpoint#10800
[#10685] feat(iceberg-rest): Support vended credentials on planTableScan endpoint#10800sachinnn99 wants to merge 2 commits into
Conversation
|
@sachinnn99 Could u resolve the conflicts? @laserninja Could u help me review this pull request? |
| LoadCredentialsResponse credentialsResponse = | ||
| icebergCatalogWrapperManager | ||
| .getCatalogWrapper(catalogName) | ||
| .getCredentialsIfEligible(tableIdentifier, true, CredentialPrivilege.READ); |
There was a problem hiding this comment.
[HIGH] Bypasses the dispatcher/event-listener layer
All other credential operations go through tableOperationDispatcher (e.g. getTableCredentials), which triggers event listeners and authorization hooks. Here the endpoint accesses IcebergCatalogWrapperManager directly, so:
- Credential vending on
planTableScanwon't emit events - Authorization hooks won't fire for the credential fetch
The existing loadTable/createTable endpoints delegate credential vending to CatalogWrapperForREST via the dispatcher. The same pattern should be followed here — e.g. add a planTableScan overload in CatalogWrapperForREST that accepts requestCredential, or route through a dispatcher method.
| if (!credentialsResponse.credentials().isEmpty()) { | ||
| return buildScanResponseWithCredentials(scanResponse, credentialsResponse); | ||
| } | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
[HIGH] Overly broad catch (Exception e) silently swallows errors
This catches all exceptions including IllegalArgumentException, NullPointerException, authorization failures, etc. A client requesting vended-credentials that fails due to a misconfiguration would silently get no credentials with only a WARN log — very hard to debug.
The existing getTableCredentials method only catches ServiceUnavailableException. This should follow the same pattern:
} catch (ServiceUnavailableException e) {Other exceptions (auth failures, illegal state, NPE) should propagate so they surface as proper error responses.
| public LoadCredentialsResponse getCredentialsIfEligible( | ||
| TableIdentifier identifier, boolean requestCredential, CredentialPrivilege privilege) { | ||
| try { | ||
| LoadTableResponse loadTableResponse = super.loadTable(identifier); |
There was a problem hiding this comment.
[HIGH] Redundant loadTable call
planTableScan already loads and scans the table via tableOperationDispatcher.planTableScan(...). Then getCredentialsIfEligible calls super.loadTable(identifier) again — a second full table load just to check the location and generate credentials. This is wasteful for every credential-vended scan.
Consider either:
- Passing the table metadata/location obtained during
planTableScaninto the credential vending method, or - Caching or reusing the already-loaded table within the same request
| } | ||
| } | ||
|
|
||
| private Response buildScanResponseWithCredentials( |
There was a problem hiding this comment.
[MEDIUM] JSON-level merge is fragile — add a TODO with issue reference
Manually constructing JSON via ObjectNode/ArrayNode is fine as a temporary workaround for Iceberg 1.10.1's lack of PlanTableScanResponse.Builder.withCredentials(), but:
- Please add a
// TODO(#XXXX): Replace with PlanTableScanResponse.Builder.withCredentials() after Iceberg 1.11.0 upgradecomment so this is tracked for cleanup. - Verify the JSON structure matches the Iceberg REST spec's
CompletedPlanningResultschema — specifically thestorage-credentialsarray format. - This also bypasses the standard
IcebergRESTUtils.ok()serialization path. Any future changes to response serialization (e.g. custom serializers, response wrapping) won't apply here.
| s3ScanJson.has("storage-credentials"), "S3 table should have storage-credentials"); | ||
| JsonNode credentials = s3ScanJson.get("storage-credentials"); | ||
| Assertions.assertTrue(credentials.isArray() && credentials.size() > 0); | ||
| JsonNode firstCred = credentials.get(0); |
There was a problem hiding this comment.
[MEDIUM] Tests don't verify prefix or refresh properties
The tests verify config contains the credential type, but don't verify:
- The
prefixfield is set correctly — currentlygetCredentialsIfEligiblereturns""(empty string), but the existinggetTableCredentials/injectCredentialConfigmethods use the table's metadata location as the prefix. This discrepancy should be tested. - Refresh properties (e.g.
client.refresh-credentials-endpoint) are present —IcebergRESTUtils.toRESTCredential()adds these but this code path uses rawCredentialPropertyUtils.toIcebergProperties()which omits them. - The overall credential structure matches what
loadTablewithvended-credentialsreturns for the same table, ensuring consistency.
Consider adding:
Assertions.assertFalse(firstCred.get("prefix").asText().isEmpty(),
"Credential prefix should be the table location");c4aece4 to
dac1df3
Compare
|
@laserninja Addressed everything in the latest push:
Conflicts with main are resolved. |
Verification: All Requested Changes ✓Excellent follow-up! The latest revision addresses all the HIGH and MEDIUM issues raised in the previous review: Issues Fixed[HIGH] Bypasses dispatcher layer
[HIGH] Broad
[HIGH] Redundant
[MEDIUM] JSON-level merge fragility
[MEDIUM] Test coverage for prefix & refresh properties
📝 Minor ItemsDuplicate Javadoc on no-arg FederatedCatalogWrapper doesn't override
LGTM with one minor cleanup (duplicate Javadoc). |
Code Coverage Report
Files |
…TableScan endpoint
986ed6f to
fe1a580
Compare
|
@laserninja Thanks for the thorough verification! Removed the duplicate Javadoc on the no-arg overload in the latest push. Ready for approval when you get a chance. |
|
Hey @roryqi, I've addressed all of laserninja's feedback and resolved the conflicts. Could you please take a look? Thanks! |
|
@lasdf1234 Could u help me review this pull request? |
Got. |
| Table table = getCatalog().loadTable(tableIdentifier); | ||
| Optional<PlanTableScanResponse> cachedResponse = | ||
| scanPlanCache.get(ScanPlanCacheKey.create(tableIdentifier, table, scanRequest)); | ||
| if (cachedResponse.isPresent()) { |
There was a problem hiding this comment.
ScanPlanCacheKey does not include requestCredentialVending, and we cache the full PlanTableScanResponse (with or without credentials). This can cause:
1、Plan without header → plan with vended-credentials: cache hit returns a response without storage-credentials.
2、 Plan with header → plan without header: cache hit may return stale/extra credentials.
3、Cached credentials may expire while the scan plan remains valid.
Suggestion: Cache only credential-free scan plans. Inject credentials after cache lookup on every request (hit or miss), similar to how loadTable generates credentials per request:
|
@sachinnn99 Another issue is that you need to consider the scenario of a federation IRC. |
What changes were proposed in this pull request?
Add
X-Iceberg-Access-Delegationheader support to theplanTableScanendpoint and return storage credentials in the response whenvended-credentialsis requested. Mirrors the existing credential-vending flow fromcreateTable/loadTable/registerTable(#10684).Changes:
IcebergTableOperations.planTableScan: add@HeaderParam(X_ICEBERG_ACCESS_DELEGATION), computeisCredentialVending, build the 3-argIcebergRequestContext, mergestorage-credentialsinto the response JSON when eligibleIcebergTableOperations: injectIcebergCatalogWrapperManager(same pattern asIcebergConfigOperations) and add a privatebuildScanResponseWithCredentialshelper that serializes the response to anObjectNodeand appends thestorage-credentialsarrayCatalogWrapperForREST: addgetCredentialsIfEligible(identifier, requestCredential, privilege)that wrapsshouldGenerateCredential+getCredentialand returns credentials separately, reusing the same Gravitino→Iceberg credential conversion pattern asgetTableCredentialsMockIcebergTableOperations: update constructor to match the parentTestIcebergTableOperations: add three tests —testPlanTableScanWithCredentialVending(no header / local / S3),testPlanTableScanRemoteSigningNotSupported,testPlanTableScanInvalidAccessDelegationNote:
PlanTableScanResponsein Iceberg 1.10.1 has noconfig/credentialsfield (credentials support was added in Iceberg PR #14518, targeting 1.11.0+). Credentials are therefore merged into the response JSON at the endpoint. When Gravitino upgrades to Iceberg 1.11.0+, the JSON-level merge can be replaced with a directPlanTableScanResponse.Builder.withCredentials(...)call inCatalogWrapperForREST.planTableScan. Credential eligibility (shouldGenerateCredential) is still enforced at the wrapper level so local/HDFS tables correctly receive no credentials.Why are the changes needed?
The Iceberg REST spec defines
X-Iceberg-Access-Delegationas a valid header onplanTableScanandCompletedPlanningResultincludes astorage-credentialsfield. Currently, clients performing server-side scan planning must make a separateGET .../credentialscall to obtain storage access credentials before reading the data files returned in the scan plan.Fixes #10685
Does this PR introduce any user-facing change?
Yes. The
planTableScanREST endpoint now accepts theX-Iceberg-Access-Delegationheader and returns vended credentials in the responsestorage-credentialsfield when requested. Backward compatible — clients that do not send the header get existing behavior.How was this patch tested?
Added unit tests in
TestIcebergTableOperations:testPlanTableScanWithCredentialVending— no vending without header, no vending forfile://location, vending present fors3://location with expectedDUMMY_CREDENTIAL_TYPEtestPlanTableScanRemoteSigningNotSupported— 406 response forremote-signingtestPlanTableScanInvalidAccessDelegation— 400 response for invalid header valuesAll existing
TestIcebergTableOperationstests still pass (no regressions).