Conversation

@dengzhhu653
Member

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?


Comment on lines +4653 to +4655
part_vals = getPartValsFromName(t, dropPartitionReq.getPartName());
}
partNames.add(Warehouse.makePartName(t.getPartitionKeys(), part_vals));
Contributor

Suggested change
- part_vals = getPartValsFromName(t, dropPartitionReq.getPartName());
- }
- partNames.add(Warehouse.makePartName(t.getPartitionKeys(), part_vals));
+ partNames.add(dropPartitionReq.getPartName());
+ } else {
+ partNames.add(Warehouse.makePartName(t.getPartitionKeys(), part_vals));
+ }

private A result;
private boolean async;
private Future<A> future;
private ExecutorService executor;
Contributor

Would it be better to use a shared thread pool for the operation handlers? In the current implementation, the number of threads is not bounded, which could lead to resource exhaustion or even crashes.

Member Author

The number of handler threads is limited by the maximum number of threads the Thrift server can spawn, which is set by hive.metastore.server.max.threads.

Member Author
@dengzhhu653 dengzhhu653 Jan 5, 2026

In production, I don't think we would see such a high rate of drop database/table operations happening at nearly the same time; usually the database is the bottleneck before the Metastore hits the limit. If that is the case, we can tune down hive.metastore.server.max.threads.

Contributor

In async mode, a thread may trigger multiple operation handlers, so hive.metastore.server.max.threads cannot limit the total number of threads here. If we configure a fixed-size pool for the async operations, it can help limit service traffic to some extent.
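For illustration, a rough sketch of the kind of shared, bounded pool being suggested here (the field names and pool size are hypothetical, not part of this patch):

// Hypothetical shared pool for async operation handlers; a fixed size caps
// the number of background threads no matter how many async requests arrive.
private static final int ASYNC_POOL_SIZE = 16; // assumed value, would be made configurable
private static final ExecutorService ASYNC_HANDLER_POOL =
    Executors.newFixedThreadPool(ASYNC_POOL_SIZE, r -> {
      Thread t = new Thread(r, "async-operation-handler");
      t.setDaemon(true);
      return t;
    });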

Member Author
@dengzhhu653 dengzhhu653 Jan 6, 2026

Usually the new handler runs inside the same thread as the parent; for example, a DropDatabaseHandler involves multiple DropTableHandlers, and these DropTableHandlers run inside the same thread as the DropDatabaseHandler.

Contributor

Oh, I mean that for an async request, the HMSHandler thread could create an operation handler whose executor starts a new thread. The HMSHandler thread then finishes this request immediately and can handle another request, which may produce yet another new thread.

If such async requests are frequent, this may lead to an explosion in the number of threads.

Member Author

Though the Metastore handles the async request in the background, the client doesn't; it polls the server for the status until the end:

while (!resp.isFinished() && !Thread.currentThread().isInterrupted()) {
  resp = client.drop_database_req(req);
  if (resp.getMessage() != null) {
    LOG.info(resp.getMessage());
  }
}

The client will know at the end whether the request succeeded, as usual, and the Metastore needs a handler thread to answer each poll.

> the HMSHandler thread finishes this request immediately

result = async ? future.get(timeout, TimeUnit.MILLISECONDS) : future.get();

Now it will wait up to 5 seconds before answering the API for a long-running drop; if the request is satisfied within this timeout, we can return the result right away.
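As a rough sketch of the pattern described above (the helper names buildFinishedResponse, buildInProgressResponse, buildFailedResponse and the surrounding types are placeholders, not the actual patch code):

// Submit the drop work, wait up to a short timeout, and fall back to an
// "in progress" answer so the client keeps polling for the final status.
Future<Result> future = executor.submit(this::execute);
try {
  Result result = future.get(5, TimeUnit.SECONDS);  // assumed 5s wait, mirroring the comment above
  return buildFinishedResponse(result);
} catch (TimeoutException e) {
  return buildInProgressResponse(operationId);      // not done yet; the client polls again
} catch (InterruptedException | ExecutionException e) {
  return buildFailedResponse(e);                     // error handling simplified for the sketch
}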

Contributor

There are custom clients based on ThriftHiveMetastore.Iface, which may not guarantee such behavior.

Member Author

asyncDrop defaults to false; unless it is explicitly set to true, nothing changes for a custom client, and if it is set to true, the custom client should take care of this case itself.

@saihemanth-cloudera
Contributor

A couple of test failures seem to be related to this patch.

wh.addToChangeManagement(funcCmPath);
}
if (req.isDeleteData()) {
// Moving the data deletion out of the async handler.
Contributor

I think we should move this into the operation handler, because if a Thrift client only calls this API once in async mode, such cleanup code would never run.

Member Author
@dengzhhu653 dengzhhu653 Jan 6, 2026

In async mode, the client still needs to poll the server for the operation status until the end, since it needs to know whether the request failed or not.
The main reason is that the TUGIBasedProcessor/TUGIAssumingProcessor might close the shared FileSystem behind the scenes, causing "java.io.IOException: Filesystem closed" for the handler running in the background.

Still, we need to address this "Filesystem closed" issue, as we don't know whether there are FileSystem operations in the Metastore listeners.
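For context, a rough illustration of the failure mode being described (variable names like clientUgi, conf, and tablePath are placeholders): FileSystem.get() returns an instance cached per UGI, so closing all instances for that UGI invalidates the object a background handler is still holding.

// The handler obtains a cached FileSystem under the client's UGI...
FileSystem fs = clientUgi.doAs(
    (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
// ...and keeps using it in the background after the thrift call returns.
FileSystem.closeAllForUGI(clientUgi); // done by the processor when the request ends
fs.exists(tablePath);                 // -> java.io.IOException: Filesystem closed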

Contributor

FileSystem.closeAllForUGI(clientUgi); in TUGIAssumingProcessor seems like a bug: if there are two requests with the same UGI handling the same path URI concurrently, it may also hit the "Filesystem closed" issue.

This is indeed a tricky problem; I'm not sure whether only removing the cache for inactive UGIs would solve it. And for this thread, there is still an issue: if the client crashes between two polls before the operation handler finishes, the cleanup code will not take effect either.

Member Author

Nice catch, we should take a client crash into account.

if (ugiTransport.getClientUGI() == null) {
  ugiTransport.setClientUGI(clientUgi);
}
clientUgi = ugiTransport.getClientUGI();
Contributor

Is this line unnecessary? clientUgi is already initialized.

Member Author

The ugi is identical: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L483-L491,
so we reuse the UGI cached in ugiTransport if possible, and the connection will get the same FileSystem instance from the cache for its whole lifetime.
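To spell out why reusing the exact UGI object matters (a sketch with illustrative names; conf is a placeholder): the FileSystem cache key includes the UGI, and UserGroupInformation.equals() compares the underlying Subject by reference, so only the very same UGI object maps back to the same cached FileSystem instance.

UserGroupInformation ugi = ugiTransport.getClientUGI();
// Both lookups hit the same FileSystem cache entry because they use the same UGI object.
FileSystem fs1 = ugi.doAs((PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
FileSystem fs2 = ugi.doAs((PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
// fs1 == fs2: the connection sees one FileSystem instance for its whole lifetime.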

if (request.isNeedResult()) {
AddPartitionsHandler addPartsOp = AbstractOperationHandler.offer(this, request);
if (addPartsOp.success() && request.isNeedResult()) {
AddPartitionsHandler.AddPartitionsResult addPartsResult = addPartsOp.getResult();
Contributor

Can we store the partition list in the AddPartitionsResult and return it directly here?

Member Author

Not enough; addPartsOp.success() needs to check the state (success or not) of addPartsOp.getResult().

if (async) {
  OPID_CLEANER.schedule(() -> OPID_TO_HANDLER.remove(id), 1, TimeUnit.HOURS);
}
afterExecute(resultV);
Contributor

If afterExecute() is needed only when execute() succeeds, we can check the result here:

Suggested change
- afterExecute(resultV);
+ if (resultV != null && resultV.success()) {
+   afterExecute(resultV);
+ }

Member Author

afterExecute is also called in case of failure, to free up resources the handler might hold.
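A minimal sketch of that contract (simplified; the result type and execute() signature are assumptions, not the patch code): afterExecute runs on both the success and the failure path so the handler can always release what it holds.

OperationResult resultV = null;
try {
  resultV = execute();    // may fail or return a non-successful result
} finally {
  afterExecute(resultV);  // runs either way; resultV may be null or a failed result here
}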
