HIVE-29308: Exception when JDBC table names are case-sensitive #6197

soumyakanti3578 · 2025-11-18T07:59:33Z

What changes were proposed in this pull request?

In GenericJdbcDatabaseAccessor.getQualifiedTableName(), quote the schema and table names, since the qualified name is used to run queries against different DBs.
In CalcitePlanner.genTableLogicalPlan(), remove the double quotes (unescape the identifier) from Constants.JDBC_SCHEMA and Constants.JDBC_TABLE.
In JdbcStorageHandler.getURIForAuth(), unescape Constants.JDBC_TABLE; otherwise the generated JDBC URI string contains double quotes (").
Store the quoting characters for the different databases in DatabaseType.
Add helper methods to unescape identifiers.
Ignore the SQLException thrown by parameterMetaData.getParameterType; jdbc_table_with_schema_oracle.q was failing because of it.
Add tests for different DBs with case-sensitive schemas and tables.
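For the SQLException item above, a minimal sketch of the idea, assuming the caller only needs a best-effort JDBC type and can live with a fallback. The class name `ParameterTypes`, the method `typeOrFallback`, and the choice of fallback value are hypothetical, not the PR's actual code; only `java.sql.ParameterMetaData.getParameterType(int)` is the real JDBC API.

```java
import java.sql.ParameterMetaData;
import java.sql.SQLException;

// Hypothetical helper illustrating "ignore SQLException from getParameterType".
final class ParameterTypes {
  private ParameterTypes() {}

  /**
   * Returns the java.sql.Types code of the 1-based parameter, or the given fallback
   * when the driver throws SQLException instead of reporting the type.
   */
  static int typeOrFallback(ParameterMetaData md, int index, int fallback) {
    try {
      return md.getParameterType(index);
    } catch (SQLException e) {
      // Some drivers cannot describe parameters for every statement; do not fail the query.
      return fallback;
    }
  }
}
```

Swallowing the exception keeps the query running, but the fallback type is a guess, so whichever value is chosen should be safe for the way the parameter is later bound.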

Why are the changes needed?

We see exceptions, as shown here:
https://issues.apache.org/jira/browse/HIVE-29308

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true -Dqfile="jdbc_case_sensitive_postgres.q,jdbc_case_sensitive_mssql.q,jdbc_case_sensitive_mariadb.q,jdbc_case_sensitive_oracle.q,jdbc_table_with_schema_oracle.q"

sonarqubecloud · 2025-12-30T06:29:01Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

zabetak

Left quite a few comments/questions, but the most important ones are:

the potential breaking changes
the conflicts/ambiguity between case sensitive and insensitive identifiers.

zabetak · 2026-01-02T16:07:44Z

jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/GenericJdbcDatabaseAccessor.java

      return null;
    }
-    String schemaName = conf.get(Constants.JDBC_SCHEMA);
+    String schemaName = quoteIdentifier(unescapeHiveJdbcIdentifier(conf.get(Constants.JDBC_SCHEMA)), dbType);


If we always quote the schema/table name, then we make all code relying on this method case-sensitive. In other words, if the user previously defined "hive.sql.table" = "Country", then queries over this table may stop working due to case sensitivity. Quoting everything seems to be a breaking change.

This method, getQualifiedTableName, is used to build the queries that are run in the backend DB. We have to quote the table and schema names here if we want to support case sensitivity. Currently, if the user defines "hive.sql.table" = "Country", the table name in the query run in the backend DB is effectively country. If we always quote, we change the table name to Country, which might break existing jobs.
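A sketch of what "quote the qualified name per backend" could look like, assuming the quote characters live on the database-type enum as the PR description says. The enum `SketchDbType` and its methods are hypothetical stand-ins for the PR's DatabaseType. The characters reflect common defaults: PostgreSQL and Oracle use double quotes, MySQL/MariaDB use backticks (double quotes only with ANSI_QUOTES), and SQL Server uses square brackets (double quotes also work when QUOTED_IDENTIFIER is ON).

```java
// Hypothetical stand-in for DatabaseType carrying per-backend identifier quotes.
enum SketchDbType {
  POSTGRES('"', '"'),
  ORACLE('"', '"'),
  MYSQL('`', '`'),
  MARIADB('`', '`'),
  MSSQL('[', ']');

  final char open;
  final char close;

  SketchDbType(char open, char close) {
    this.open = open;
    this.close = close;
  }

  /** Wraps a raw identifier in this backend's quotes, doubling any embedded closing quote. */
  String quote(String raw) {
    String c = String.valueOf(close);
    return open + raw.replace(c, c + c) + close;
  }

  /** Builds schema.table, or just the table when no schema is configured. */
  String qualifiedName(String schema, String table) {
    return schema == null ? quote(table) : quote(schema) + "." + quote(table);
  }
}
```

Doubling the closing quote is how these databases escape a quote character inside a quoted identifier, so the raw name survives even if it contains one.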

I think it's easy to check whether the user "escaped" the table and schema names before we "unescape" and quote them. If they were not escaped, we can skip this step.
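A minimal sketch of the conditional approach, assuming `"` is the Hive-side quoting character: values the user wrapped in double quotes are treated as case-sensitive and re-quoted for the backend, while unquoted values pass through untouched so existing tables keep today's behaviour. `ConditionalQuoting` and its methods are hypothetical names.

```java
// Hypothetical helper: quote only what the user explicitly quoted.
final class ConditionalQuoting {
  private ConditionalQuoting() {}

  /** True if the property value is wrapped in double quotes. */
  static boolean isHiveQuoted(String value) {
    return value != null && value.length() >= 2
        && value.charAt(0) == '"' && value.charAt(value.length() - 1) == '"';
  }

  /** Strips the outer quotes and collapses doubled inner quotes ("" becomes "). */
  static String unquote(String value) {
    return value.substring(1, value.length() - 1).replace("\"\"", "\"");
  }

  /** Re-quotes user-quoted names with the backend's characters; leaves others unchanged. */
  static String toBackendIdentifier(String value, char open, char close) {
    if (!isHiveQuoted(value)) {
      return value; // legacy path: the backend applies its usual case folding
    }
    String raw = unquote(value);
    String c = String.valueOf(close);
    return open + raw.replace(c, c + c) + close;
  }
}
```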

Conditional escaping is a good idea. I have the impression, though, that we may also need to make quoting conditional. There are various options for the condition:

Based on the content/value of the hive.sql.table property as proposed above

Based on a new global config property (e.g., hive.jdbc.identifiers.casesensitive)

Based on table level property (e.g., hive.sql.identifiers.casesensitive)

Not sure what's the best option to move forward. Maybe we can take some inspiration from what other engines (Presto/Trino/Spark/etc.) are doing.
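If the second and third options were combined, one plausible resolution order is "table property wins, global config is the default". The sketch below assumes exactly that; both keys are the names floated in this thread, not settings this PR adds, and `CaseSensitivitySetting` is hypothetical.

```java
import java.util.Map;

// Hypothetical resolution of the proposed case-sensitivity switches.
final class CaseSensitivitySetting {
  // Names proposed in the review discussion; treated here as hypothetical keys.
  static final String GLOBAL_KEY = "hive.jdbc.identifiers.casesensitive";
  static final String TABLE_KEY = "hive.sql.identifiers.casesensitive";

  private CaseSensitivitySetting() {}

  /** Table-level property overrides the global default; both default to false. */
  static boolean isCaseSensitive(Map<String, String> globalConf, Map<String, String> tableProps) {
    String tableValue = tableProps.get(TABLE_KEY);
    if (tableValue != null) {
      return Boolean.parseBoolean(tableValue);
    }
    return Boolean.parseBoolean(globalConf.getOrDefault(GLOBAL_KEY, "false"));
  }
}
```

Defaulting to `false` keeps existing tables unquoted, which avoids the breaking change discussed above.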

zabetak · 2026-01-02T16:13:45Z

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

+   * @return the unescaped identifier
+   */
+  public static String unescapeHiveJdbcIdentifier(String identifier) {
+    return unescapeIdentifier(identifier, '"');


The choice of " as the "Hive" quoting character seems a bit arbitrary. Who defines, and where, the values that are allowed inside hive.sql.table, hive.sql.schema, etc.?

What prevents an MSSQL user from writing "hive.sql.table"="[WorldData]"? Was this working before? Is it working now?

I used " as a POC for now, but I think we just have to decide on a quoting character for Hive. Either ' or " should be fine, but we should definitely support only one. An MSSQL user can create a table in the backend using [..], but when creating an external Hive table they should use the quoting character that Hive supports. What do you think?
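If Hive settles on a single quoting character, the property value can be parsed strictly. This sketch assumes `"` is that character and SQL-style doubling (`""`) escapes it. Under this rule `[WorldData]` is not special: it is an unquoted name containing brackets. `HiveIdentifierParser` is a hypothetical name.

```java
// Hypothetical strict parser for one Hive quoting convention.
final class HiveIdentifierParser {
  // Assumption: '"' is the only quoting character accepted in hive.sql.* properties.
  static final char QUOTE = '"';

  private HiveIdentifierParser() {}

  /**
   * Unquoted values are returned unchanged; quoted values lose their outer quotes and
   * have doubled inner quotes collapsed. Malformed quoted values are rejected.
   */
  static String parse(String value) {
    if (value == null || value.isEmpty() || value.charAt(0) != QUOTE) {
      return value;
    }
    int end = value.length() - 1;
    if (end == 0 || value.charAt(end) != QUOTE) {
      throw new IllegalArgumentException("Unterminated quoted identifier: " + value);
    }
    StringBuilder raw = new StringBuilder();
    for (int i = 1; i < end; i++) {
      char ch = value.charAt(i);
      if (ch != QUOTE) {
        raw.append(ch);
      } else if (i + 1 < end && value.charAt(i + 1) == QUOTE) {
        raw.append(QUOTE); // "" inside the quotes stands for a single "
        i++;
      } else {
        throw new IllegalArgumentException("Unescaped quote inside identifier: " + value);
      }
    }
    if (raw.length() == 0) {
      throw new IllegalArgumentException("Empty quoted identifier");
    }
    return raw.toString();
  }
}
```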

At this stage, I am trying to understand the impact on existing use-cases. Does "hive.sql.table"="[WorldData]" work without the changes in this PR? Was there any kind of quoting that was working even without the changes in this PR?

zabetak · 2026-01-02T16:16:55Z

ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java

+            final String schemaName = unescapeHiveJdbcIdentifier(tabMetaData.getProperty(Constants.JDBC_SCHEMA));
+            final String tableName = unescapeHiveJdbcIdentifier(tabMetaData.getProperty(Constants.JDBC_TABLE));


What happens if the user defines two tables with the following properties:

"hive.sql.table"="[WorldData]"

"hive.sql.table"="WorldData"
The underlying DBMS may also have two tables defined as such.

Aren't we risking having conflicts/ambiguity if we "unescape" the content of the properties?

As discussed in another comment above, we should probably decide on a quoting character for Hive.

zabetak · 2026-01-02T16:19:55Z

ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java

@@ -137,7 +138,7 @@ public void setJobProperties(Map<String, String> jobProperties) {
    this.jobProperties = jobProperties;
  }

-  @Explain(displayName = "jobProperties", explainLevels = { Level.EXTENDED })
+  @Explain(displayName = ExplainTask.JOB_PROPERTIES, explainLevels = { Level.EXTENDED })


This change seems unrelated to the PR. From a quick search, it seems that whenever we use @Explain(displayName we always pass a plain string, so for consistency I would leave it as is.

It's related, but tangentially :)
I think I had to add CONFIG_USERNAME in JdbcStorageConfigManager.java to support case sensitivity for one of the backend DBs (maybe Oracle, but I have to check). We don't want to print usernames and passwords in the explain plans, so I removed the username from the explain plans in ExplainTask.java, where I also use the variable ExplainTask.JOB_PROPERTIES.

It's better to use the variable ExplainTask.JOB_PROPERTIES here to avoid bugs in the future arising from changes in the displayName.

Without these changes we will see username in some plans.

Using a variable creates dependencies (import) to classes that were not present before. Another point that is a bit worrisome is that we rely on the displayName (that should be relevant only for an end-user) to perform some actions in the code. I don't know if this pattern is already present in the code but overall seems shaky and we should better avoid it if possible.

Anyway, as mentioned elsewhere, this change should land separately so we can continue the discussion on the respective PR.
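One way to keep credentials out of plans without keying on the @Explain displayName is to filter the job properties by property key before they are rendered. This is a sketch under assumptions: the two keys are the ones used in this PR's .q files, the full list of sensitive keys is a guess, and `ExplainProperties.forExplain` is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical filter applied to job properties before they reach the explain output.
final class ExplainProperties {
  // Keys seen in the test .q files; the complete sensitive set is an assumption.
  static final Set<String> SENSITIVE_KEYS =
      Set.of("hive.sql.dbcp.username", "hive.sql.dbcp.password");

  private ExplainProperties() {}

  /** Returns a sorted copy of the properties without credential entries. */
  static Map<String, String> forExplain(Map<String, String> jobProperties) {
    Map<String, String> out = new TreeMap<>(); // sorted for stable plan output
    for (Map.Entry<String, String> e : jobProperties.entrySet()) {
      if (!SENSITIVE_KEYS.contains(e.getKey())) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }
}
```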

zabetak · 2026-01-02T16:27:19Z

jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/JdbcStorageHandler.java

      "jdbc:metastore://" : tableProperties.get(Constants.JDBC_URL);
-    String table_name = tableProperties.get(Constants.JDBC_TABLE);
-    return new URI(host_url+"/"+table_name);
+    String tableName = unescapeHiveJdbcIdentifier(tableProperties.get(Constants.JDBC_TABLE));


If we "unescape" some of the content of the property, aren't we risking conflicts/ambiguity?
For instance:

"hive.sql.table"="\"Country\""

"hive.sql.table"="Country"

Moreover, it seems that the current unescaping won't work for other quoting characters:

"hive.sql.table"="[Country]"

"hive.sql.table"="`Country`"
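To make the ambiguity concrete, consider a hypothetical naive unescape that strips one pair of surrounding double quotes (not necessarily how the PR's Utilities.unescapeIdentifier is implemented). The two distinct values `"Country"` and `Country` collapse to the same string, while bracket- and backtick-quoted values pass through untouched.

```java
// Hypothetical naive unescape, used only to illustrate the collision.
final class UnescapeAmbiguity {
  private UnescapeAmbiguity() {}

  /** Strips one pair of surrounding double quotes, if present. */
  static String naiveUnescape(String value) {
    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
      return value.substring(1, value.length() - 1);
    }
    return value;
  }
}
```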

Yes, I checked, and we do run into conflicts here. I added

CREATE TABLE "WorldData".Country ( id int, name varchar(20) ); INSERT INTO "WorldData".Country VALUES (5, 'France'), (6, 'Italy'), (7, 'Spain'), (8, 'Portugal');

to q_test_case_sensitive.postgres.sql

While creating the external table, if we pass "hive.sql.table" = "country", the select outputs the new data shown above, but if we pass "hive.sql.table" = "Country" or "hive.sql.table" = "\"Country\"", the select returns data from the case sensitive table.

I think we can check whether the value of "hive.sql.table" is escaped. If not, we can probably always use lower case. Let me know what you think.

Using lower case may not always be an option since, as far as I recall, some DBMSs store unquoted identifiers using different conventions (e.g., upper case).
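Rather than hard-coding lower case, the backend's convention can be read from the JDBC driver: `DatabaseMetaData.storesUpperCaseIdentifiers()` and `storesLowerCaseIdentifiers()` describe how unquoted identifiers are stored (Oracle folds to upper case, PostgreSQL to lower case; MySQL's answer depends on server settings). A sketch, with `UnquotedIdentifierFolding` as a hypothetical name:

```java
import java.sql.DatabaseMetaData;
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical helper: fold an unquoted identifier the way the backend stores it.
final class UnquotedIdentifierFolding {
  private UnquotedIdentifierFolding() {}

  static String fold(String unquoted, DatabaseMetaData md) throws SQLException {
    if (md.storesUpperCaseIdentifiers()) {
      return unquoted.toUpperCase(Locale.ROOT); // e.g. Oracle
    }
    if (md.storesLowerCaseIdentifiers()) {
      return unquoted.toLowerCase(Locale.ROOT); // e.g. PostgreSQL
    }
    // Mixed-case storage: leave the identifier as written.
    return unquoted;
  }
}
```

This needs a live connection (or cached metadata), which may matter for where the folding happens.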

Apart from ambiguity, quoted identifiers may contain arbitrary characters inside the quotes that are not URI-friendly (e.g., ?%+-/&$). We may have to come up with a more robust normalization strategy, but this depends on how getURIForAuth is used and whether we care about supporting such use cases.
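One candidate normalization is to percent-encode the table value as a single URI path segment instead of unescaping it. Percent-encoding is reversible, so `"Country"` and `Country` stay distinct, and characters such as `?`, `/` and `&` can no longer change the URI's structure. A sketch; `AuthUriSketch` is hypothetical and whether authorization plugins expect encoded names is an open assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical encoding of the table value for getURIForAuth-style URIs.
final class AuthUriSketch {
  private AuthUriSketch() {}

  static String encodeSegment(String s) {
    // URLEncoder does form encoding (space becomes '+'); a literal '+' is already %2B,
    // so rewriting the remaining '+' to %20 only affects spaces.
    return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
  }

  static URI forAuth(String jdbcUrl, String tableValue) throws URISyntaxException {
    return new URI(jdbcUrl + "/" + encodeSegment(tableValue));
  }
}
```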

zabetak · 2026-01-02T17:10:18Z

ql/src/test/queries/clientpositive/jdbc_case_sensitive_postgres.q

+
+
+-- Test Case-Sensitive Query Field Names
+-- (Should fail in SerDe/Iterator with Column not found)


If the test is expected to fail, it should be in the negative directory.

I think the comments are incorrect, left over from an earlier version. I will remove these too.

zabetak · 2026-01-02T17:13:10Z

ql/src/test/queries/clientpositive/jdbc_case_sensitive_postgres.q

+-- Cleanup
+DROP TABLE country_test;
+DROP TABLE cities_test;
+DROP TABLE geography_test;


I have the impression that the framework takes care of the cleanup so these DROP statements are redundant.

zabetak · 2026-01-02T17:18:43Z

jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/conf/DatabaseType.java

+      throw new IllegalArgumentException("Database type string cannot be null");
+    }
+    // METASTORE must be handled before valueOf
+    if (METASTORE.name().equalsIgnoreCase(dbType)) {


Why does METASTORE need special handling?

zabetak · 2026-01-02T17:19:23Z

jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/conf/DatabaseType.java

+   * @return The matching DatabaseType.
+   * @throws IllegalArgumentException if the dbType is null or not a valid type.
+   */
+  public static DatabaseType from(String dbType) {


Why do we need this method and not simply use valueOf?
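For context on this question: the implicit `valueOf(String)` of a Java enum is an exact, case-sensitive match that throws IllegalArgumentException for unknown names and NullPointerException for null. A separate `from` method is only justified if configuration values may arrive in other cases or with whitespace; in that case, normalizing once also removes the need to special-case METASTORE. The enum below is a hypothetical subset, not Hive's actual DatabaseType.

```java
import java.util.Locale;

// Hypothetical subset of database types, used only to contrast valueOf with from.
enum DbKind {
  MYSQL, POSTGRES, ORACLE, MSSQL, DERBY, METASTORE;

  /** Case-insensitive lookup; unknown names still fail through valueOf. */
  static DbKind from(String dbType) {
    if (dbType == null) {
      throw new IllegalArgumentException("Database type string cannot be null");
    }
    return valueOf(dbType.trim().toUpperCase(Locale.ROOT));
  }
}
```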

zabetak · 2026-01-02T17:27:10Z

ql/src/test/queries/clientpositive/jdbc_case_sensitive_postgres.q

+    "hive.sql.jdbc.url" = "${system:hive.test.database.qdb.jdbc.url}",
+    "hive.sql.dbcp.username" = "${system:hive.test.database.qdb.jdbc.username}",
+    "hive.sql.dbcp.password" = "${system:hive.test.database.qdb.jdbc.password}",
+    "hive.sql.schema" = "\"WorldData\"",


I have the impression that we also support single-quoted table properties, so it may be more readable to use those instead of escaping double quotes:

'hive.sql.schema'='"WorldData"'

asf-ci-hive added tests pending tests unstable and removed tests pending labels Nov 18, 2025

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Dec 18, 2025

soumyakanti3578 added 2 commits December 19, 2025 14:12

HIVE-29308: Exception when JDBC table names are case-sensitive

ed0011d

Ignore SqlException from parameterMetaData.getParameterType

7b5e97a

soumyakanti3578 force-pushed the HIVE-29308 branch from b44ea6f to 82c4ade Compare December 19, 2025 22:21

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Dec 19, 2025

soumyakanti3578 force-pushed the HIVE-29308 branch from 82c4ade to c88fdfa Compare December 29, 2025 20:43

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Dec 29, 2025

Stop printing JDBC username in Explain plan

1415a39

soumyakanti3578 force-pushed the HIVE-29308 branch from c88fdfa to 1415a39 Compare December 30, 2025 05:03

asf-ci-hive added tests pending and removed tests unstable labels Dec 30, 2025

asf-ci-hive added tests passed and removed tests pending labels Dec 30, 2025

soumyakanti3578 marked this pull request as ready for review December 30, 2025 17:29

zabetak reviewed Jan 2, 2026

