[#11642] improvement(fileset): Hide credential properties from fileset catalog properties#11674
[#11642] improvement(fileset): Hide credential properties from fileset catalog properties#11674yuqi1129 wants to merge 8 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR prevents sensitive cloud storage credentials from leaking through outward-facing properties() APIs by marking those keys as hidden in fileset-related property metadata, while keeping credential vending behavior intact via propertiesWithCredentialProviders().
Changes:
- Introduces a shared hidden metadata map for storage credential keys and reuses it across catalog/schema/fileset metadata.
- Adds unit tests verifying credential keys are hidden from
properties()but still available for credential vending. - Confirms non-sensitive connection properties remain visible.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogPropertiesMetadata.java | Adds shared hidden property entries for storage credentials and wires them into catalog properties metadata. |
| catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetSchemaPropertiesMetadata.java | Reuses the shared hidden storage-credential entries at the schema level. |
| catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetPropertiesMetadata.java | Reuses the shared hidden storage-credential entries at the fileset level. |
| catalogs/catalog-fileset/src/test/java/org/apache/gravitino/catalog/fileset/TestFilesetCatalogPropertiesMetadata.java | Adds tests asserting credential keys are hidden and non-credential keys remain visible. |
| catalogs/catalog-fileset/src/test/java/org/apache/gravitino/catalog/fileset/TestFilesetCatalogCredential.java | Adds end-to-end tests for hidden properties() vs. vending-friendly propertiesWithCredentialProviders(). |
c98adec to
c75ec86
Compare
…fileset catalog properties Mark cloud storage credential properties (S3/OSS access & secret keys, GCS service account file, Azure storage account name/key and client secret) as hidden in FilesetCatalogPropertiesMetadata, FilesetSchemaPropertiesMetadata and FilesetPropertiesMetadata, so BaseCatalog.properties() filters them out, consistent with how the JDBC catalog hides jdbc-user and jdbc-password. Hiding does not break credential vending: the raw credentials remain available to the server-side credential manager via propertiesWithCredentialProviders(), and the matching credential provider is auto-injected so clients (e.g. GVFS) can still obtain vended credentials.
c75ec86 to
36eb9d4
Compare
Code Coverage Report
Files
|
|
I don’t see any usage of client.credential in either the Java or Python client. Is it right? |
I don’t see any usage of client.credential in either the Java or Python client. Is it right?
…end them at the fileset level Hide cloud storage credential properties (S3/OSS/Azure access & secret keys) from the fileset catalog/schema/fileset properties() by reusing the shared hidden PropertyEntry definitions in core, consistent with the JDBC catalog (apache#11149) and Hive/Iceberg/Glue (apache#11264). To keep the "client without credentials, using server-side credentials" flow working after hiding, make fileset-level (path-based) credential vending infer the storage credential provider from static credentials when no explicit credential-providers is set. The detection is extracted into CredentialUtils.getStorageCredentialProviders and reused by BaseCatalog, so it mirrors catalog-level vending: a catalog configured with only static credentials can now vend them at the fileset level. GVFS clients that do not provide credentials must enable credential vending (fs.gravitino.enableCredentialVending=true) to obtain the server-side credentials. When credentials are missing and vending is disabled, GVFS now surfaces an actionable AccessDenied message pointing to the right config.
…issing credentials in GVFS When a GVFS operation is denied because the client provides no credentials and credential vending is disabled, the Python GVFS now wraps the PermissionError with a message guiding the user to set enable_credential_vending=True or provide the storage credentials in the client configuration. This mirrors the Java GVFS behavior.
… over vended credentials When credential vending is enabled, the vended credential provider was merged after (and thus overrode) the client-configured static storage credentials. Make the GVFS clients (Java and Python) skip credential vending when the client already supplied its own storage credentials, so client-provided credentials always win. Detection is delegated to each storage provider/handler (S3/OSS/GCS/Azure) via SupportsCredentialVending#containsClientCredentials (Java) and StorageHandler.contains_client_credentials (Python), checking the Gravitino-style credential keys.
…enabled The client-credential precedence check resolved the storage provider/handler unconditionally, which broke flows where credential vending is disabled and the storage handler cannot be resolved for the path (e.g. the mocked local filesystem in test_gvfs_with_hook). Gate the check behind enableCredentialVending so the provider/handler is only resolved when vending is actually used; when vending is off no credentials are fetched anyway.
|
For the fileset, we need to change the behaviour(by default, |
|
After this PR, users need to explicitly set 'fs.gravitino.enableCredentialVending' to true to utilize credentials provided by the Gravitino server if no credentials like AKSK were set in the Gravitino client. |
What changes were proposed in this pull request?
Hide cloud storage credential properties for the fileset catalog, and make the credentials reachable to clients (e.g. GVFS/Spark) through credential vending so they are never returned in plaintext.
Hide static credentials —
FilesetCatalogPropertiesMetadata,FilesetSchemaPropertiesMetadataandFilesetPropertiesMetadatanow reuse the shared hiddenPropertyEntrydefinitions in core (S3PropertiesMetadata/OSSPropertiesMetadata/AzurePropertiesMetadata/GCSPropertiesMetadata), so S3/OSS/Azure credentials are filtered out ofcatalog/schema/fileset.properties(). Consistent with the JDBC catalog ([#11148] feat(authz): Support credential vending for the JDBC MYSQL connector #11149), Hive/Iceberg/Glue ([#11263] feat(authz): Extend credential vending to Hive/Iceberg/Glue/JDBC catalog types #11264) and Paimon ([Bug report] Paimon catalog exposes sensitive properties (jdbc-password, S3 secret key) in REST API responses #11644).Vend static credentials at the fileset level — Fileset access uses path-based credential vending, which previously only honored an explicit
credential-providerssetting and did not infer a provider from static credentials (unlike catalog-level vending). The detection logic is extracted intoCredentialUtils.getStorageCredentialProvidersand reused byBaseCatalog;CredentialUtils.getCredentialProvidersByOrdernow falls back to inferring the provider (e.g.s3-secret-key) from static credentials when no explicitcredential-providersis configured. This lets a catalog configured with only static credentials vend them at the fileset level, so a GVFS client that provides no credentials (withfs.gravitino.enableCredentialVending=true) obtains the server-side credentials.Actionable error — When a GVFS operation is denied because the client provides no credentials and credential vending is disabled, GVFS now surfaces an
AccessDeniedExceptionwhose message points the user to setfs.gravitino.enableCredentialVending=trueor provide the storage credentials in the client configuration.Why are the changes needed?
For fileset catalogs, credential properties (S3/OSS/Azure keys) were not hidden, so
catalog.properties()exposed them in plaintext. The JDBC catalog already hidesjdbc-user/jdbc-password; the fileset catalog should follow the same approach. Hiding alone would break the "client without credentials relies on server-side credentials" flow, so the fileset-level vending is extended to infer providers from static credentials.Fix: #11642
Does this PR introduce any user-facing change?
catalog/schema/fileset.properties().fs.gravitino.enableCredentialVending=trueto obtain the server-side credentials. Clients that configure their own credentials are unaffected.fs.gravitino.enableCredentialVendingkeeps its default offalse.gravitino.catalog.credential.backfillToProperties=truerestores the previous behavior.How was this patch tested?
TestFilesetCatalogPropertiesMetadata(credentials hidden across catalog/schema/fileset, non-sensitive props visible);TestFilesetCatalogCredential(properties() filters credentials while propertiesWithCredentialProviders() retains them);TestCredentialUtils(static credential inference + explicit providers take precedence).gravitino-docker-test):FilesetS3CatalogIT(catalog.properties() hides credentials, getCredentials() returns S3SecretKeyCredential);GravitinoVirtualFileSystemS3IT.testS3CredentialVendingWithServerStaticCredentials(catalog with only static credentials + client withenableCredentialVending=trueand no credentials reads/writes S3 via vended credentials), plus the existing 9 GVFS S3 tests still pass.