[CALCITE-7279] Resolve ClickHouse identifier resolution error by aliasing nested JOIN projections #4692

Dwrite · 2025-12-17T14:58:44Z

ClickHouse does not support standard nested JOIN syntax and has strict scoping rules for identifiers in subqueries. When a JOIN's right side is another JOIN, it must be wrapped in a subquery.

This commit ensures that the ClickHouse dialect:

Wraps nested JOINs in a subquery to satisfy syntax requirements.

Generates explicit column aliases (e.g., AS col_name) for all projected columns within the subquery. This prevents "Unknown Identifier" errors by ensuring internal table qualifiers are flattened and resolvable by the outer query block.

Avoids redundant subquery nesting to maintain optimal SQL structure.

…ialect

core/src/main/java/org/apache/calcite/sql/SqlDialect.java

core/src/main/java/org/apache/calcite/sql/dialect/ClickHouseSqlDialect.java

core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java

core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

core/src/main/java/org/apache/calcite/sql/SqlDialect.java

xiedeyantu · 2025-12-18T01:05:25Z

core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java

+        .withClickHouse()
+        .exec();
+
+    assertThat(sql, containsString("LEFT JOIN (SELECT *"));


I think use following is fine. Just using a expected string.

sql(query) .schema(CalciteAssert.SchemaSpec.JDBC_SCOTT) .withClickHouse() .ok(expected);

Thanks for the suggestion. I agree that using ok(expected) is generally preferred
and is what we should use whenever possible.

In this case, however, the generated SQL contains auto-generated subquery aliases
(e.g. t220414 / t220415) coming from Calcite’s alias generation, and these IDs are
not stable across different runs (for example, gradlew test vs IDE runs).
This makes a strict ok(expected) assertion flaky.

To avoid introducing non-deterministic tests, this test verifies the structural
properties of the generated SQL instead (i.e. whether the required wrapper
SELECT * FROM (...) is present or absent), rather than asserting exact alias
names.

I’ve added a comment in the test to document this reasoning. If alias generation
becomes stable in the future, we can switch this back to ok(expected).

I feel that if a fixed SQL statement is processed in a specific way, the alias should be stable. Can you explain why it might be unstable?

I believe this issue remains to be discussed and has not been resolved.

xiedeyantu · 2025-12-18T01:11:12Z

Tip: The commit message, Jira title and PR title should be consistent. I think the commit message title is fine. PR title may lose something.

core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java

core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java

core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

…cit column aliases

dssysolyatin · 2025-12-22T07:53:07Z

core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

+      String targetName = fieldNames.get(i);
+
+      // If the expression is already aliased, strip the AS to get the raw expression.
+      if (expr.getKind() == SqlKind.AS) {


It is quite strange. What if there are ambiguous fields?
What will happen if I want to have something like t1.id as customer_id, t2.id as product_id?
It will become t1.id as id, t2.id as id

Also, can you point me to ClickHouse documentation that shows this limitation?
Have you tried creating an issue in the ClickHouse repository as well ?

I have updated the PR description with the specific error log from ClickHouse 25.12.1.

To answer your concern about ambiguity: Calcite's RelDataType already ensures that all field names in the row type are unique. My implementation uses these unique names for explicit aliasing (e.g., t1.id AS id, t2.id AS id_0). This guarantees that the subquery 'exports' a clean and deterministic schema to the outer query, resolving the ClickHouse scoping limitation without risking field collisions.

I also renamed the methods to shouldWrapNestedJoin and wrapNestedJoinWithAliases to better reflect their purpose.

A test might be a good way to illustrate this.

dssysolyatin · 2025-12-22T08:04:13Z

core/src/main/java/org/apache/calcite/sql/dialect/ClickHouseSqlDialect.java

    writer.endList(frame);
  }
+
+  @Override public boolean supportWrapNestedJoin(RelNode rel) {


Why dialect itself detect if join is nested ? Why dialect just don't have:

boolean supportWrapNestedJoin() { return true // just true for clickhouse without logic }

Similar to requiresAliasForFromItems

Then RelToSqlConverter detect whether the join is nested or not, instead of moving such logic on dialect level

That is a great point regarding architectural consistency. The reason I passed the RelNode to the dialect instead of using a simple boolean flag is that not all nested JOINs require wrapping in ClickHouse.

Context-Specific Wrapping: ClickHouse's syntax restriction primarily targets cases where the right side of a JOIN is another JOIN. If we used a global supportWrapNestedJoin() boolean, RelToSqlConverter might end up wrapping every single JOIN or every nested structure indiscriminately, which would lead to redundant subqueries and potential performance regressions for simple linear joins (e.g., A JOIN B JOIN C).

Scoping & Aliasing Complexity: As discussed, the fix isn't just about adding a SELECT * wrapper; it's about explicitly aliasing the projected columns based on the specific RelDataType of that join node. By passing the RelNode, the dialect (or the converter guided by the dialect) can inspect the row type to ensure identifiers are flattened correctly.

Flexibility for Future Dialects: Some dialects might only need wrapping for specific types of JOINs (e.g., RIGHT JOIN or OUTER JOIN). Passing the RelNode provides the necessary metadata (like JoinRelType) to make an informed decision, whereas a simple boolean flag like requiresAliasForFromItems is too "blunt" for this specific scoping issue.

However, I agree we should keep the common logic in RelToSqlConverter. My implementation aims to keep the "When to wrap" decision in the Dialect (since it's a dialect-specific quirk) while keeping the "How to wrap" logic in the Converter.

I think @dssysolyatin makes sense. Are there other databases that have similar problems?

I have verified this against PostgreSQL, and as expected, PostgreSQL handles nested join identifiers natively without requiring any subquery wrapping or explicit aliasing. It can correctly resolve identifiers like d2.loc across nested join boundaries.

This confirms that the issue is specific to certain dialects like ClickHouse, which enforce stricter subquery isolation and scope boundaries. In ClickHouse, the internal table qualifiers are effectively "hidden" from the outer join scope unless they are flattened via an explicit AS alias in a subquery.

By introducing dialect.shouldWrapNestedJoin(), we can keep the generated SQL clean and standard for databases like PostgreSQL, while providing the necessary "scoping fix" for dialects with these specific constraints.

mihaibudiu · 2025-12-22T20:00:12Z

core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java

+    Result rightResult = visitInput(e, 1).resetAlias();
+
+    if (dialect.shouldWrapNestedJoin(e)) {
+      String newAlias = "t" + e.getId(); // alias


can you explain why this identifier is safe and won't for example hide another identifier with the same name that happens to exist?

Regarding the safety of the identifier t + e.getId():

In the RelToSqlConverter phase, original SQL aliases (like 'j') are often not preserved or can become ambiguous after multiple logical transformations. Using rel.getId() is a standard practice in Calcite for generating synthetic aliases because:

Global Uniqueness: RelNode IDs are globally unique within a single planning session. This ensures that the generated alias t{id} will not collide with any other synthetic aliases generated by the converter for different nodes in the same query tree.

Deterministic Scoping: Since we are creating a new subquery scope to resolve the ClickHouse scoping issue, this unique identifier provides a clean boundary. Even if a user-defined table named t123 exists, the risk of collision is significantly lower than attempting to reuse or guess original aliases which might have been pruned or renamed by optimization rules.

Avoids Identifier Shadowing: By forcing this unique alias at the join boundary, we ensure that the outer scope has a deterministic way to reference the flattened columns without worrying about shadowing identifiers from deeper nested structures.

While using SqlValidatorUtil.uniquify is another option, it requires maintaining a usedNames set across the entire implementation. Given Calcite's internal conventions, t + id provides the best balance between implementation robustness and identifier safety for this dialect-specific fix.

Significantly lower != 0.
A compiler has to work for all user programs.
The worst case is that some programs are compiled to produce incorrect results. These kinds of bugs are also very hard to diagnose when they happen. So why not avoid them?
I can easily see a program that has many tables named T0, T1, etc. where such a collision can happen.
For column names Calcite automatically generates fresh names if there is a collision.
If that's not true for relation names (I don't know myself), I think the implementation has to make the extra effort to generate correct code, for example by computing the used names before generating a fresh identifier. If uniquify does what you need, why not use it?

core/src/main/java/org/apache/calcite/sql/dialect/ClickHouseSqlDialect.java

sonarqubecloud · 2025-12-24T01:53:07Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
83.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

[CALCITE-7279] Add subquery wrappers for nested JOINs in ClickHouse d…

a62efc6

…ialect

mihaibudiu reviewed Dec 17, 2025

View reviewed changes

xiedeyantu reviewed Dec 18, 2025

View reviewed changes

xuzifu666 reviewed Dec 18, 2025

View reviewed changes

Dwrite changed the title ~~[CALCITE-7279] Add subquery wrappers for nested JOINs in ClickHouse d…~~ [CALCITE-7279] Add subquery wrappers for nested JOINs in ClickHouse dialect Dec 19, 2025

dssysolyatin reviewed Dec 20, 2025

View reviewed changes

core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java Outdated Show resolved Hide resolved

[CALCITE-7279] ClickHouse dialect should wrap nested JOINs with expli…

6a0ce52

…cit column aliases

dssysolyatin reviewed Dec 22, 2025

View reviewed changes

Dwrite changed the title ~~[CALCITE-7279] Add subquery wrappers for nested JOINs in ClickHouse dialect~~ [CALCITE-7279] Resolve ClickHouse identifier resolution error by aliasing nested JOIN projections Dec 22, 2025

Dwrite requested a review from mihaibudiu December 22, 2025 15:12

Update func name

0f0a280

mihaibudiu reviewed Dec 22, 2025

View reviewed changes

core/src/main/java/org/apache/calcite/sql/dialect/ClickHouseSqlDialect.java Outdated Show resolved Hide resolved

krooswu added 3 commits December 24, 2025 09:19

Update comment

eb97c36

Update comment

4d15d72

Update comment

b32d386

Dwrite requested review from dssysolyatin and mihaibudiu December 25, 2025 14:10

[CALCITE-7279] Resolve ClickHouse identifier resolution error by aliasing nested JOIN projections #4692

Are you sure you want to change the base?

[CALCITE-7279] Resolve ClickHouse identifier resolution error by aliasing nested JOIN projections #4692

Uh oh!

Conversation

Dwrite commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xiedeyantu commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dwrite Dec 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sonarqubecloud bot commented Dec 24, 2025

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Dwrite commented Dec 17, 2025 •

edited

Loading

xiedeyantu commented Dec 18, 2025 •

edited

Loading

Dwrite Dec 22, 2025 •

edited

Loading