Skip to content

feat(tesseract): Support SQL API grouped sub-query joins#11138

Open
ovr wants to merge 3 commits into
masterfrom
feat/tesseract-support-subquery-joins
Open

feat(tesseract): Support SQL API grouped sub-query joins#11138
ovr wants to merge 3 commits into
masterfrom
feat/tesseract-support-subquery-joins

Conversation

@ovr

@ovr ovr commented Jun 22, 2026

Copy link
Copy Markdown
Member

The SQL API encodes grouped join push-downs (e.g.
JOIN (SELECT ... GROUP BY ... ORDER BY ... LIMIT n) t ON ...) as a subqueryJoins entry on the query. Only the legacy planner consumed it (via customSubQueryJoins); buildSqlAndParamsRust dropped it before the native call and Tesseract had no way to receive it. As a result the joined sub-query was omitted entirely: queries returned unfiltered rows (the inner LIMIT/ORDER vanished) and projections referencing the sub-query alias failed with "missing FROM-clause entry".

Thread subqueryJoins end to end:

  • BaseQuery.js: pass subqueryJoins into both native queryParams.
  • New SubqueryJoin bridge (sql/joinType/alias + on member expression).
  • QueryPropertiesCompiler compiles each on into a SqlCall; stored on QueryProperties as LogicalSubqueryJoinItems.
  • SimpleQueryPlanner folds them into the query's LogicalJoin.
  • Physical builder emits them as SingleSource::RawSubquerySql joins; the no-join fast path now also checks subquery_joins so alias-only projections still materialize the join.

The opaque sub-query SQL (incl. its ORDER BY/LIMIT) is emitted verbatim, so top-N semantics are preserved. The SQL API sends the join alias pre-quoted and references it verbatim in the ON condition, so the raw sub-query source is emitted with its alias as-is (no re-quoting). Changes are gated on a non-empty subquery_joins, leaving existing plans untouched.

The SQL API encodes grouped join push-downs (e.g.
`JOIN (SELECT ... GROUP BY ... ORDER BY ... LIMIT n) t ON ...`) as a
`subqueryJoins` entry on the query. Only the legacy planner consumed it
(via `customSubQueryJoins`); `buildSqlAndParamsRust` dropped it before
the native call and Tesseract had no way to receive it. As a result the
joined sub-query was omitted entirely: queries returned unfiltered rows
(the inner LIMIT/ORDER vanished) and projections referencing the
sub-query alias failed with "missing FROM-clause entry".

Thread `subqueryJoins` end to end:
- BaseQuery.js: pass `subqueryJoins` into both native queryParams.
- New `SubqueryJoin` bridge (sql/joinType/alias + `on` member expression).
- QueryPropertiesCompiler compiles each `on` into a `SqlCall`; stored on
  QueryProperties as `LogicalSubqueryJoinItem`s.
- SimpleQueryPlanner folds them into the query's LogicalJoin.
- Physical builder emits them as `SingleSource::RawSubquerySql` joins; the
  no-join fast path now also checks `subquery_joins` so alias-only
  projections still materialize the join.

The opaque sub-query SQL (incl. its ORDER BY/LIMIT) is emitted verbatim,
so top-N semantics are preserved. The SQL API sends the join alias
pre-quoted and references it verbatim in the ON condition, so the raw
sub-query source is emitted with its alias as-is (no re-quoting). Changes
are gated on a non-empty `subquery_joins`, leaving existing plans untouched.
@ovr ovr requested review from a team as code owners June 22, 2026 16:41
@github-actions github-actions Bot added rust Pull requests that update Rust code javascript Pull requests that update Javascript code labels Jun 22, 2026
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 2m 38s —— View job


Review complete — 1 high, 1 medium, 4 low

Expand for full review

End-to-end threading of subqueryJoins from BaseQuery.js through the native query params, into a new SubqueryJoin bridge, compiled into LogicalSubqueryJoinItems, folded into LogicalJoin, and emitted as SingleSource::RawSubquerySql reads cleanly. The no-join fast path correctly checks subquery_joins, and the derive-root-from-ON-hints branch in simple_query_planer.rs is a nice touch for the alias-only projection case. Integration tests exercise the two main shapes.

Findings

# Sev File:Line Issue
1 High physical_plan_builder/processors/logical_join.rs:99-103 Unknown join_type (RIGHT/FULL/CROSS/...) silently downgraded to LEFT — match it explicitly and error on the rest.
2 Medium multi_fact_join_groups.rs:1 Unused JoinHintItem import — will fail -D warnings.
3 Low cube_bridge/subquery_join.rs:17-21 alias consumed verbatim and sql emitted verbatim — document the pre-quoted/trusted-producer contract on the trait.
4 Low query_properties.rs:1173-1182 PartialEq compares only (sql, alias, join_type), not on_sql — two joins with the same triple but different ON conditions compare equal; risk for plan-cache reuse.
5 Low planners/simple_query_planer.rs:83-108 Degenerate case subqueryJoins with empty ON dependencies → make_join_logical_plan_with_join_hints(empty, …); worth a guard or test.
6 Low tests/integration/member_expressions.rs:208-304 Tests cover only INNER + substring asserts; add LEFT path, an insta snapshot, and a "RIGHT must error" case once #1 is fixed.

Other observations

  • BaseQuery.js:965, 1015: forwarded in both buildSqlAndParamsRust and findPreAggregationForQueryRust — good, both rust entrypoints see it.
  • physical_plan/from.rs:104 correctly uses query_aliased_prequoted only for RawSubquerySql; template variable quoted_alias matches the existing template definition.
  • references_builder.rs:272-279: returning None for RawSubquerySql is correct since members are referenced as alias-prefixed literals, not resolved against a schema.
  • No new unit tests for query_properties_compiler::compile_subquery_joins (the Struct expression rejection path, the join_type default fallback) — covered indirectly via integration tests but a focused unit test would be cheap.
  • The LogicalSubqueryJoinItem struct uses public fields, while its sibling LogicalJoinItem exposes accessors — minor style inconsistency, not blocking.

No security issues from the cubesql producer side, but the trust boundary on sql/alias should be documented (finding #3).

· [Branch](https://github.com/cube-js/cube/tree/feat/tesseract-support-subquery-joins)

Comment thread rust/cube/cubesqlplanner/cubesqlplanner/src/planner/multi_fact_join_groups.rs Outdated
Comment on lines +17 to +21
pub sql: String,
#[serde(rename = "joinType")]
pub join_type: Option<String>,
pub alias: String,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alias contract is trust-based. alias is consumed verbatim (no quoting) and the inner sql is emitted verbatim into the outer query. The PR description states the SQL API sends a pre-quoted alias and a controlled inner SELECT, which is fine for the cubesql producer, but the trait itself doesn't enforce or document the invariant. If any other caller of buildSqlAndParamsRust ever populates subqueryJoins (the JS BaseQuery now forwards this.options.subqueryJoins from any caller), an unquoted/attacker-controlled alias would land in the FROM clause untouched.

Consider tightening the doc-comment on this struct to state explicitly that:

  1. alias MUST be a fully-quoted identifier produced by the caller, and
  2. sql is emitted verbatim and MUST come from a trusted, planner-side path.

Optionally validate at compile time that alias starts/ends with the quote character (" or the dialect-specific one). Worth at least a // SAFETY: …-style note here so the next reader doesn't reintroduce quoting at a different layer.

Comment on lines +1173 to 1182
// Compare sub-query joins by their request-derived identity (the
// compiled `on_sql` is derived deterministically from these).
&& subquery_joins.len() == other.subquery_joins.len()
&& subquery_joins
.iter()
.zip(other.subquery_joins.iter())
.all(|(a, b)| {
a.sql == b.sql && a.alias == b.alias && a.join_type == b.join_type
})
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialEq ignores on_sql. The justification (the compiled SqlCall is derived deterministically from sql/alias/join_type) isn't quite right — on_sql is compiled from the on member expression, not from the three fields compared here. Two subqueryJoins entries can share (sql, alias, join_type) while having different on conditions and still compare equal.

In practice QueryProperties::eq seems to be used for plan-caching/equivalence — a false positive here could let one query reuse another's cached plan. Either:

  • include on_sql in the comparison (e.g. via SqlCall's structural eq if available), or
  • explicitly call this out as "identity, not semantic equality" in the doc.

Comment on lines +83 to 108
let subquery_joins = self.query_properties.subquery_joins();
let source = if let Some(join) = &join {
self.join_planner
.make_join_logical_plan(join, subquery_dimension_queries)
} else if !subquery_joins.is_empty() {
// No selected member resolves the base cube, but the sub-query joins'
// ON conditions reference it (e.g. `SELECT t.col FROM Cube JOIN (...) t
// ON Cube.x = t.x`). Derive the join root from those ON references so
// the base cube — and the sub-query joins below — are emitted.
let mut hints = JoinHints::new();
for subquery_join in subquery_joins {
for dep in subquery_join.on_sql.get_dependencies() {
hints.extend(&collect_join_hints(&dep)?);
}
}
self.join_planner
.make_join_logical_plan_with_join_hints(hints, subquery_dimension_queries)?
} else {
self.join_planner.make_empty_join_logical_plan()
};
let source = if subquery_joins.is_empty() {
source
} else {
source.with_subquery_joins(subquery_joins.clone())
};
Ok(source)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor smells in the assembly:

  1. subquery_joins.clone() is called even though with_subquery_joins already takes ownership of a Vec. Since subquery_joins here is a &Vec, the clone is needed — but the resulting source then carries a copy of items that the QueryProperties already owns. Not a correctness issue, just a small allocation per query.

  2. When join.is_none() and subquery_joins.is_empty() you fall through to make_empty_join_logical_plan(). If someone sends a subqueryJoins with an empty ON dependency set (degenerate but possible), hints ends up empty and make_join_logical_plan_with_join_hints(empty_hints, …) is called — worth a quick check that that path doesn't blow up. Adding a test for "subqueryJoins with no cube reference in ON" would close that gap.

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 42s —— View job


Re-review after chore: fixes (51fe01e)

One-line verdict: 1 high, 4 low still open — the unused-import nit is fixed.

Expand for details

Addressed in 51fe01e:

  • Medium — Unused JoinHintItem import in multi_fact_join_groups.rs removed. Thread resolved.
  • Test helper make_subquery_join now returns Result and propagates MockMemberSql::new failures cleanly — small but nice.

Still open from the prior review:

  • Highphysical_plan_builder/processors/logical_join.rs:99-103 still silently downgrades unknown join_type (RIGHT/FULL/CROSS) to LEFT. Recommend matching INNER/LEFT explicitly and erroring on the rest.
  • Low — cube_bridge/subquery_join.rs:17-21 — trust contract on alias/sql (verbatim, pre-quoted) still undocumented on the trait.
  • Low — query_properties.rs:1173-1182PartialEq still excludes on_sql; two entries with the same (sql, alias, join_type) but different ON conditions compare equal.
  • Low — planners/simple_query_planer.rs:83-108 — degenerate subqueryJoins with empty ON dependencies is still unguarded/untested.
  • Low — tests/integration/member_expressions.rs — coverage now passes via ? but is still INNER-only with assert!(... .contains(...)); LEFT path + an insta snapshot + a "RIGHT must error" case would all strengthen this.

No new findings introduced by this commit.

· [Branch](https://github.com/cube-js/cube/tree/feat/tesseract-support-subquery-joins)

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.12%. Comparing base (bd8b6e6) to head (5f8da2b).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11138      +/-   ##
==========================================
- Coverage   83.63%   79.12%   -4.51%     
==========================================
  Files         256      473     +217     
  Lines       79019    96514   +17495     
  Branches        0     3558    +3558     
==========================================
+ Hits        66085    76368   +10283     
- Misses      12934    19628    +6694     
- Partials        0      518     +518     
Flag Coverage Δ
cube-backend 58.74% <ø> (?)
cubesql 83.63% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 1m 1s —— View job


Re-review after chore: review (52f9357)

One-line verdict: 0 high, 3 low still open — High and one Low resolved.

Expand for details

Addressed in 52f9357:

  • Highphysical_plan_builder/processors/logical_join.rs:99-110 now matches INNER/LEFT explicitly and returns CubeError::user("Unsupported join type ...") for anything else. Thread resolved.
  • Low — tests/integration/member_expressions.rs now exercises the LEFT path (test_subquery_join_grouped_left), uses insta::assert_snapshot! on all three sub-query-join tests, and adds test_subquery_join_unknown_join_type covering the RIGHT rejection. Thread resolved.

Still open from the prior review:

  • Low — cube_bridge/subquery_join.rs:11-21 — doc-comment describes shape but not the trust contract on alias/sql (alias must be pre-quoted, sql is emitted verbatim and must come from a trusted producer). A // SAFETY:-style note would prevent future re-introduction of quoting at the wrong layer.
  • Low — query_properties.rs:1173-1182PartialEq still excludes on_sql. The new doc-comment says on_sql is "derived deterministically from these" three fields, but that's not quite right — on_sql is compiled from the on member expression, so two entries with the same (sql, alias, join_type) and different on still compare equal. Either include on_sql or reword the comment to call this out as identity-not-semantic equality.
  • Low — planners/simple_query_planer.rs:83-108 — degenerate subqueryJoins with no cube reference in ON (empty hints) is still unguarded/untested.

No new findings introduced by this commit. Snapshot files look clean and match the verbatim-alias / verbatim-inner-SQL invariants.

· [Branch](https://github.com/cube-js/cube/tree/feat/tesseract-support-subquery-joins)

@ovr ovr force-pushed the feat/tesseract-support-subquery-joins branch from 52f9357 to 5f8da2b Compare June 22, 2026 17:45
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @ovr's task in 50s —— View job


Re-review (no new commits since 5f8da2b)

One-line verdict: no change — 3 low still open, all tracked in existing unresolved threads.

Details

HEAD is still 5f8da2b chore: review. No new commits have landed since the previous re-review, so findings are identical:

  • Low — cube_bridge/subquery_join.rs:21 — trust contract on alias (pre-quoted) / sql (verbatim) still undocumented on the trait. Tracked in unresolved thread.
  • Low — query_properties.rs:1180PartialEq still excludes on_sql; the doc-comment's "derived deterministically from these three" claim isn't quite right (on_sql is compiled from the on member expression, not from sql/alias/join_type). Tracked in unresolved thread.
  • Low — simple_query_planer.rs:83-108 — degenerate subqueryJoins with no cube reference in ON (empty hintsmake_join_logical_plan_with_join_hints(empty, …)) still unguarded/untested. Tracked in unresolved thread.

Re-affirmed the 3 prior threads still apply; no new inline comments needed. Resolved threads from prior rounds (high join_type validation, unused import, INNER-only tests) remain addressed.

· Branch ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

javascript Pull requests that update Javascript code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant