GCS: Throw NotFoundException for inexistent input GCS file #15734

Open
findinpath wants to merge 1 commit into apache:main from findinpath:findinpath/gcs-not-found

findinpath (Contributor) commented Mar 23, 2026

Signal early to the TableOperations that no retry is needed for files that do not exist: reading a missing GCS input file now throws NotFoundException.

Additional context

Relevant apache/iceberg code using this change

Tasks.foreach(newLocation)
    .retry(numRetries)
    .exponentialBackoff(100, 5000, 600000, 4.0 /* 100, 400, 1600, ... */)
    .throwFailureWhenFinished()
    .stopRetryOn(NotFoundException.class) // overridden if shouldRetry is non-null
    .shouldRetryTest(shouldRetry)
    .run(metadataLocation -> newMetadata.set(metadataLoader.apply(metadataLocation)));
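
To illustrate why the exception type matters here, the following is a minimal, self-contained sketch of a retry loop that stops as soon as a designated exception class is thrown. It is not the Iceberg `Tasks` implementation: `RetrySketch`, `runWithRetry`, `delayMs`, and the nested `NotFoundException` are hypothetical stand-ins, the delay formula (minimum times scale to the power of the attempt, capped at the maximum) is an assumption based on the `100, 400, 1600, ...` comment above, and jitter and the total-duration limit are left out.

```java
import java.util.function.Supplier;

final class RetrySketch {
  private RetrySketch() {}

  /** Hypothetical stand-in for org.apache.iceberg.exceptions.NotFoundException. */
  static class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
      super(message);
    }
  }

  /** Capped exponential delay for a 1-based attempt number (assumed formula, no jitter). */
  static long delayMs(int attempt, long minMs, long maxMs, double scale) {
    return (long) Math.min(minMs * Math.pow(scale, attempt - 1), maxMs);
  }

  /**
   * Runs the task and retries up to numRetries times after a failure. An exception
   * that is an instance of stopOn is rethrown at once, without any further attempt.
   */
  static <T> T runWithRetry(
      Supplier<T> task, int numRetries, Class<? extends RuntimeException> stopOn) {
    int attempt = 0;
    while (true) {
      attempt++;
      try {
        return task.get();
      } catch (RuntimeException e) {
        if (stopOn.isInstance(e) || attempt > numRetries) {
          throw e;
        }
        sleep(delayMs(attempt, 100, 5000, 4.0));
      }
    }
  }

  private static void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      // Restore the interrupt flag and give up on retrying.
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
  }
}
```

With a loop of this shape, a reader that throws a generic I/O failure for a missing file is retried until the retries run out, while one that throws the stop-on type fails on the first attempt.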

Issue found while testing GCS credentials vending with apache/iceberg-rest-fixture:1.10.1 in trinodb/trino (trinodb/trino#28423):

[qtp1357563986-31] WARN org.apache.iceberg.util.Tasks - Retrying task after failure: sleepTimeMs=403 Failed to read file: gs://trino-ci-test/gcs-vending-rest-test-w9a718ba0b/tpch/test_drop_table_with_missing_metadata_file_a422ditwmf-555e6d30e3834820993299f645ee11c1/metadata/00000-6a7be733-c273-463a-8700-0f02b17561b8.metadata.json
org.apache.iceberg.exceptions.RuntimeIOException: Failed to read file: gs://trino-ci-test/gcs-vending-rest-test-w9a718ba0b/tpch/test_drop_table_with_missing_metadata_file_a422ditwmf-555e6d30e3834820993299f645ee11c1/metadata/00000-6a7be733-c273-463a-8700-0f02b17561b8.metadata.json
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:311)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:294)
	at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:180)
	at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:199)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:199)
	at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
	at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:167)
	at org.apache.iceberg.jdbc.JdbcTableOperations.doRefresh(JdbcTableOperations.java:100)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:88)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:71)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:49)
	at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:329)
	at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:420)
	at org.apache.iceberg.rest.RESTServerCatalogAdapter.handleRequest(RESTServerCatalogAdapter.java:42)
	at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:628)
	at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:609)
	at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:108)
	at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:500)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:587)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:822)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1381)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1303)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
	at org.eclipse.jetty.server.Server.handle(Server.java:563)
	at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:287)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:421)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:390)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:277)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:199)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: com.google.cloud.storage.StorageException: 404 Not Found
GET https://storage.googleapis.com/download/storage/v1/b/trino-ci-test/o/gcs-vending-rest-test-w9a718ba0b%2Ftpch%2Ftest_drop_table_with_missing_metadata_file_a422ditwmf-555e6d30e3834820993299f645ee11c1%2Fmetadata%2F00000-6a7be733-c273-463a-8700-0f02b17561b8.metadata.json?alt=media
No such object: trino-ci-test/gcs-vending-rest-test-w9a718ba0b/tpch/test_drop_table_with_missing_metadata_file_a422ditwmf-555e6d30e3834820993299f645ee11c1/metadata/00000-6a7be733-c273-463a-8700-0f02b17561b8.metadata.json
	at com.google.cloud.storage.BaseStorageReadChannel.read(BaseStorageReadChannel.java:143)
	at org.apache.iceberg.gcp.gcs.GCSInputStream.read(GCSInputStream.java:177)
	at org.apache.iceberg.gcp.gcs.GCSInputStream.read(GCSInputStream.java:141)
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:547)
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:137)
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:266)
	at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1874)
	at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1273)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3924)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:309)
	... 54 more
Caused by: com.google.cloud.storage.StorageException: 404 Not Found
GET https://storage.googleapis.com/download/storage/v1/b/trino-ci-test/o/gcs-vending-rest-test-w9a718ba0b%2Ftpch%2Ftest_drop_table_with_missing_metadata_file_a422ditwmf-555e6d30e3834820993299f645ee11c1%2Fmetadata%2F00000-6a7be733-c273-463a-8700-0f02b17561b8.metadata.json?alt=media

findinpath force-pushed the findinpath/gcs-not-found branch from d2f525d to cdde9bd on March 24, 2026 15:24
Signal to the TableOperations that there is no retry needed
for files which do not exist.
final class GCSExceptionUtil {
  private GCSExceptionUtil() {}

  static void throwNotFoundIfPresent(IOException ioException, BlobId blobId) {
I keep reading this method name and either I'm not understanding or it feels very awkward. Shouldn't it say throwNotFoundIfNotPresent?
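
For readers following the naming discussion, here is a rough sketch of what a helper with this signature might do. It is not the code in this PR, whose method body is not shown above. `StorageException` and `NotFoundException` are local stand-ins for the GCS client and Iceberg classes so the example compiles on its own, and a plain `String location` replaces `BlobId`. Under this reading, "IfPresent" would refer to a 404 being present in the cause chain, not to the file being present.

```java
import java.io.IOException;

final class GcsNotFoundSketch {
  private GcsNotFoundSketch() {}

  /** Hypothetical stand-in for com.google.cloud.storage.StorageException. */
  static class StorageException extends RuntimeException {
    private final int code;

    StorageException(int code, String message) {
      super(message);
      this.code = code;
    }

    int getCode() {
      return code;
    }
  }

  /** Hypothetical stand-in for org.apache.iceberg.exceptions.NotFoundException. */
  static class NotFoundException extends RuntimeException {
    NotFoundException(Throwable cause, String message) {
      super(message, cause);
    }
  }

  /**
   * Throws NotFoundException if an HTTP 404 StorageException appears anywhere in the
   * cause chain of the given IOException; returns normally otherwise, leaving the
   * caller to rethrow the original failure.
   */
  static void throwNotFoundIfPresent(IOException ioException, String location) {
    for (Throwable t = ioException; t != null; t = t.getCause()) {
      if (t instanceof StorageException && ((StorageException) t).getCode() == 404) {
        throw new NotFoundException(ioException, "Location does not exist: " + location);
      }
    }
  }
}
```

The chain walk mirrors the stack trace in the description, where the `StorageException: 404 Not Found` is the cause of a `java.io.IOException`.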

pos += 1;
channel.read(singleByteBuffer);
try {
channel.read(singleByteBuffer);
Is there any chance that openChannel or seek could throw as well?
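
If those calls can surface the same 404, one option would be to route every channel operation through a single translating wrapper instead of adding a try/catch at each call site. The sketch below is only an illustration of that idea, not a proposal for the exact code: `TranslatingCallSketch`, `IOAction`, and `translateNotFound` are hypothetical names, and the nested exception classes are stand-ins for the GCS client and Iceberg types.

```java
import java.io.IOException;

final class TranslatingCallSketch {
  private TranslatingCallSketch() {}

  /** Hypothetical stand-in for com.google.cloud.storage.StorageException. */
  static class StorageException extends RuntimeException {
    final int code;

    StorageException(int code, String message) {
      super(message);
      this.code = code;
    }
  }

  /** Hypothetical stand-in for org.apache.iceberg.exceptions.NotFoundException. */
  static class NotFoundException extends RuntimeException {
    NotFoundException(Throwable cause, String message) {
      super(message, cause);
    }
  }

  /** A channel operation (open, seek, read) that may fail with an IOException. */
  @FunctionalInterface
  interface IOAction {
    void run() throws IOException;
  }

  /** Runs the action and converts a 404 found in the failure's cause chain into NotFoundException. */
  static void translateNotFound(String location, IOAction action) throws IOException {
    try {
      action.run();
    } catch (IOException | StorageException e) {
      if (hasNotFound(e)) {
        throw new NotFoundException(e, "Location does not exist: " + location);
      }
      throw e; // any other failure is rethrown unchanged
    }
  }

  private static boolean hasNotFound(Throwable failure) {
    for (Throwable t = failure; t != null; t = t.getCause()) {
      if (t instanceof StorageException && ((StorageException) t).code == 404) {
        return true;
      }
    }
    return false;
  }
}
```

A call site could then look like `translateNotFound(location, () -> channel.read(singleByteBuffer))`, and the same wrapper could be applied to opening and seeking if they turn out to throw as well.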
