Skip to content

[#11706] fix(catalog-hive): replace switch-on-enum with if-else in HiveCatalogCapability.caseSensitiveOnName()#11707

Open
JoegenUSTC wants to merge 6 commits into
apache:mainfrom
JoegenUSTC:fix/hive-capability-switch-to-if-else
Open

[#11706] fix(catalog-hive): replace switch-on-enum with if-else in HiveCatalogCapability.caseSensitiveOnName()#11707
JoegenUSTC wants to merge 6 commits into
apache:mainfrom
JoegenUSTC:fix/hive-capability-switch-to-if-else

Conversation

@JoegenUSTC

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Replace the switch-on-enum with if-else in
HiveCatalogCapability.caseSensitiveOnName(), and add unit tests to
verify both the functional behavior and the absence of the synthetic
$1 class in the compiled output.

Why are the changes needed?

Fix: #11706

SchemaNormalizeDispatcher calls caseSensitiveOnName() after
IsolatedClassLoader.withClassLoader() exits — i.e. outside the catalog's
isolated classloader context. The switch-on-enum causes the Java compiler
to generate a synthetic helper class HiveCatalogCapability$1. In certain
classloader initialization timing windows, the server classloader is involved
in loading $1 but cannot find it in the server classpath. The JVM
permanently caches this load failure for the process lifetime
, causing all
subsequent calls to throw NoClassDefFoundError until restart.

Replacing switch with if-else using == on enum constants compiles to
if_acmpeq (reference comparison) with no synthetic class dependency, making
caseSensitiveOnName() safe to call from any classloader context.

Does this PR introduce any user-facing change?

No. The method behavior is identical; only the compiled representation changes.

How was this patch tested?

Added TestHiveCatalogCapability with three test methods:

  • Case-insensitive scopes (SCHEMA, TABLE, COLUMN) return unsupported
  • Case-sensitive scopes (FILESET, TOPIC, PARTITION, MODEL) return supported
  • HiveCatalogCapability$1 does not exist in the compiled output
    (guards against future regression back to switch)

…Capability.caseSensitiveOnName()

SchemaNormalizeDispatcher calls caseSensitiveOnName() outside the
IsolatedClassLoader.withClassLoader() boundary. The switch-on-enum
causes the Java compiler to generate a synthetic $1 helper class.
In certain classloader initialization timing windows the server
classloader ends up requesting $1 and cannot find it; the JVM then
permanently caches this load failure for the process lifetime, so all
subsequent calls throw NoClassDefFoundError until restart.

Replace switch with if-else using == on enum constants. This compiles
to if_acmpeq (reference comparison) with no synthetic class, making
caseSensitiveOnName() safe to call from any classloader context.

Add TestHiveCatalogCapability covering:
- case-insensitive scopes (SCHEMA, TABLE, COLUMN) return unsupported
- case-sensitive scopes (FILESET, TOPIC, PARTITION, MODEL) return supported
- HiveCatalogCapability$1 does not exist in the compiled output

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses an intermittent NoClassDefFoundError caused by the compiler-generated synthetic $1 helper class from a switch-on-enum in HiveCatalogCapability.caseSensitiveOnName(), which can be invoked outside the catalog’s isolated classloader context.

Changes:

  • Replaced switch-on-enum with if/else enum reference comparisons in HiveCatalogCapability.caseSensitiveOnName() to avoid generating HiveCatalogCapability$1.
  • Added unit tests validating case sensitivity behavior for different Capability.Scope values.
  • Added a regression test asserting HiveCatalogCapability$1 is absent from compiled output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
catalogs/catalog-hive/src/main/java/org/apache/gravitino/catalog/hive/HiveCatalogCapability.java Removes enum-switch bytecode pattern to prevent synthetic $1 generation and classloader-related failures.
catalogs/catalog-hive/src/test/java/org/apache/gravitino/catalog/hive/TestHiveCatalogCapability.java Adds behavior tests plus a guard test ensuring the synthetic $1 class is not generated.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

Code Coverage Report

Overall Project 67.16% 🟢
Files changed 71.43% 🟢

Module Coverage
aliyun 1.72% 🔴
api 46.82% 🟢
authorization-common 85.96% 🟢
aws 3.66% 🔴
azure 2.47% 🔴
catalog-common 10.4% 🔴
catalog-fileset 80.23% 🟢
catalog-glue 66.91% 🟢
catalog-hive 79.89% -0.07% 🟢
catalog-jdbc-clickhouse 80.02% 🟢
catalog-jdbc-common 44.22% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.29% 🟢
catalog-jdbc-starrocks 78.51% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 58.53% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 85.86% 🟢
catalog-lakehouse-paimon 82.14% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 78.01% 🟢
common 50.17% 🟢
core 82.59% 🟢
filesystem-hadoop3 77.27% 🟢
flink 0.0% 🔴
flink-common 47.12% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.88% 🔴
hive-metastore-common 53.77% 🟢
iceberg-common 58.15% 🟢
iceberg-rest-server 73.98% 🟢
idp-basic 86.2% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 20.83% 🔴
lance-rest-server 60.13% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.96% 🟢
server-common 74.18% 🟢
spark 28.57% 🔴
spark-common 41.66% 🟢
trino-connector 40.25% 🟢
Files
Module File Coverage
catalog-hive HiveCatalogCapability.java 71.43% 🟢

JoegenUSTC and others added 2 commits June 17, 2026 17:16
…ean test signature

- Add explicit null check in caseSensitiveOnName() to throw NPE for null
  scope, preserving the same fail-fast behaviour the original switch-on-enum
  had (switch throws NPE on null by JLS spec).
- Remove redundant 'throws ClassNotFoundException' from
  testCaseSensitiveOnNameSyntheticClassAbsent(): the exception is fully
  handled inside assertThrows and cannot escape the test method.

@diqiu50 diqiu50 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution The fix is correct, but it's a workaround rather than a root fix. The real issue is that CatalogWrapper.capabilities() returns a catalog-classloader-loaded object that is then used outside withClassLoader() — any method on a Capability implementation that touches plugin-jar classes will hit the same problem. Worth a follow-up issue to ensure all Capability method calls are made within the correct classloader context.

@yuqi1129

yuqi1129 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

If this problem did exist for the Hive catalog, we may need to check all catalogs to avoid similar problem.

Replace manual 'throw new NullPointerException' with
Preconditions.checkArgument(scope != null, ...) to follow the
Gravitino coding convention. Also add a dedicated test
testCaseSensitiveOnNameNullScopeThrows to verify the
IllegalArgumentException is thrown for null scope.
@JoegenUSTC

Copy link
Copy Markdown
Contributor Author

same

Thanks for the thorough review! Agreed — this is a workaround rather than a root fix. I've opened a follow-up issue #11726 to track ensuring all Capability method calls are made within the correct classloader context.
@diqiu50

@JoegenUSTC

JoegenUSTC commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

If this problem did exist for the Hive catalog, we may need to check all catalogs to avoid similar problem.

Thanks for the note! That's a valid concern — any catalog Capability implementation that references plugin-jar classes could hit the same issue. I've opened #11726 to track a proper root fix. Looking forward to your investigation, and happy to help if needed.

@JoegenUSTC

Copy link
Copy Markdown
Contributor Author

Hi @diqiu50 @yuqi1129, thanks for the valuable feedback!

Two updates:

  1. Null check: Updated to use Preconditions.checkArgument(scope != null, "scope cannot be null") per the Gravitino coding convention, with a dedicated test added to cover the null scope case.
  2. Root fix: Opened follow-up issue [Improvement] Ensure all Capability method calls are made within the correct classloader context #11726 to track ensuring all Capability method calls are made within the correct classloader context — agreed this PR is a workaround and the broader fix deserves its own issue.

CI is now passing. Could this PR be merged as-is, with the root fix tracked in #11726?

@JoegenUSTC JoegenUSTC requested a review from diqiu50 June 22, 2026 06:32
@yuqi1129

Copy link
Copy Markdown
Contributor

I have no more comments.

@diqiu50

diqiu50 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Hi @diqiu50 @yuqi1129, thanks for the valuable feedback!

Two updates:

  1. Null check: Updated to use Preconditions.checkArgument(scope != null, "scope cannot be null") per the Gravitino coding convention, with a dedicated test added to cover the null scope case.
  2. Root fix: Opened follow-up issue [Improvement] Ensure all Capability method calls are made within the correct classloader context #11726 to track ensuring all Capability method calls are made within the correct classloader context — agreed this PR is a workaround and the broader fix deserves its own issue.

CI is now passing. Could this PR be merged as-is, with the root fix tracked in #11726?

I don’t think this workaround should be merged. We should fix the real problem instead.

@JoegenUSTC

Copy link
Copy Markdown
Contributor Author

Hi @diqiu50 @yuqi1129, thanks for the valuable feedback!
Two updates:

  1. Null check: Updated to use Preconditions.checkArgument(scope != null, "scope cannot be null") per the Gravitino coding convention, with a dedicated test added to cover the null scope case.
  2. Root fix: Opened follow-up issue [Improvement] Ensure all Capability method calls are made within the correct classloader context #11726 to track ensuring all Capability method calls are made within the correct classloader context — agreed this PR is a workaround and the broader fix deserves its own issue.

CI is now passing. Could this PR be merged as-is, with the root fix tracked in #11726?

I don’t think this workaround should be merged. We should fix the real problem instead.

Thanks for the clear direction, @diqiu50!

I've implemented the root fix in a new PR: #11755

The approach wraps the Capability delegate in a JDK dynamic proxy inside CapabilityHelpers.getCapability(). The proxy's InvocationHandler executes every method call inside classLoader.withClassLoader(), so the correct classloader context is always active regardless of the Capability implementation style — no per-call-site changes needed.

This makes the switch-on-enum → if-else change here unnecessary. I'll close this PR once the root fix is accepted. Would appreciate your review on the new PR when you have a moment!

@yuqi1129

Copy link
Copy Markdown
Contributor

@JoegenUSTC
Could you share which JDK you are using? so we could reproduce the problem you raise.

@JoegenUSTC

Copy link
Copy Markdown
Contributor Author

@JoegenUSTC Could you share which JDK you are using? so we could reproduce the problem you raise.

The issue was observed with JDK 17 (the version required by Gravitino's build).

@yuqi1129

Copy link
Copy Markdown
Contributor

@JoegenUSTC Could you share which JDK you are using? so we could reproduce the problem you raise.

The issue was observed with JDK 17 (the version required by Gravitino's build).

More details are preferable, and we have never run into the problem in JDK17(JDK17 is also our debugging JDK version). Can you paste the result of java --version here?

@JoegenUSTC

Copy link
Copy Markdown
Contributor Author

@JoegenUSTC Could you share which JDK you are using? so we could reproduce the problem you raise.

The issue was observed with JDK 17 (the version required by Gravitino's build).

More details are preferable, and we have never run into the problem in JDK17(JDK17 is also our debugging JDK version). Can you paste the result of java --version here?
@yuqi1129
/gravitino$ java --version
openjdk 17.0.15 2025-04-15
OpenJDK Runtime Environment Temurin-17.0.15+6 (build 17.0.15+6)
OpenJDK 64-Bit Server VM Temurin-17.0.15+6 (build 17.0.15+6, mixed mode, sharing)

@JoegenUSTC

Copy link
Copy Markdown
Contributor Author

Hi @diqiu50 @yuqi1129, thanks for the valuable feedback!
Two updates:

  1. Null check: Updated to use Preconditions.checkArgument(scope != null, "scope cannot be null") per the Gravitino coding convention, with a dedicated test added to cover the null scope case.
  2. Root fix: Opened follow-up issue [Improvement] Ensure all Capability method calls are made within the correct classloader context #11726 to track ensuring all Capability method calls are made within the correct classloader context — agreed this PR is a workaround and the broader fix deserves its own issue.

CI is now passing. Could this PR be merged as-is, with the root fix tracked in #11726?

I don’t think this workaround should be merged. We should fix the real problem instead.

@diqiu50 @yuqi1129 — Thanks again for the valuable feedback on both PRs. I wanted to share a new development and get your thoughts.

Following @diqiu50's direction to address the root problem, I opened #11755 with a JDK proxy approach. During review, @yuqi1129 raised a concern — supported by a small standalone test — that the proxy's withClassLoader wrapping may not be sufficient for the $1 case, since the synthetic class is resolved by the defining classloader rather than the TCCL. His test showed the if-else version handles this more reliably, even in edge cases where the classloader has already been closed.

This suggests the classloader lifecycle fix (ensuring IsolatedClassLoader isn't closed while Capability instances are still in use) is a broader problem that deserves its own careful design, now tracked in #11726.

Given this, I'd like to gently revisit whether this PR could be merged as a well-scoped, targeted fix for the immediate problem. I fully respect @diqiu50's preference to avoid landing workarounds, and I remain committed to the lifecycle root fix in #11726. That said, since the root fix will take more time to design and validate properly, would you both be open to landing this as a stable interim improvement while #11726 progresses?

Happy to hear your thoughts — and thank you both for the continued engagement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants