feat(tesseract): Support SQL API grouped sub-query joins#11138
Conversation
The SQL API encodes grouped join push-downs (e.g. `JOIN (SELECT ... GROUP BY ... ORDER BY ... LIMIT n) t ON ...`) as a `subqueryJoins` entry on the query. Only the legacy planner consumed it (via `customSubQueryJoins`); `buildSqlAndParamsRust` dropped it before the native call and Tesseract had no way to receive it. As a result the joined sub-query was omitted entirely: queries returned unfiltered rows (the inner LIMIT/ORDER vanished) and projections referencing the sub-query alias failed with "missing FROM-clause entry". Thread `subqueryJoins` end to end: - BaseQuery.js: pass `subqueryJoins` into both native queryParams. - New `SubqueryJoin` bridge (sql/joinType/alias + `on` member expression). - QueryPropertiesCompiler compiles each `on` into a `SqlCall`; stored on QueryProperties as `LogicalSubqueryJoinItem`s. - SimpleQueryPlanner folds them into the query's LogicalJoin. - Physical builder emits them as `SingleSource::RawSubquerySql` joins; the no-join fast path now also checks `subquery_joins` so alias-only projections still materialize the join. The opaque sub-query SQL (incl. its ORDER BY/LIMIT) is emitted verbatim, so top-N semantics are preserved. The SQL API sends the join alias pre-quoted and references it verbatim in the ON condition, so the raw sub-query source is emitted with its alias as-is (no re-quoting). Changes are gated on a non-empty `subquery_joins`, leaving existing plans untouched.
|
Claude finished @ovr's task in 2m 38s —— View job Review complete — 1 high, 1 medium, 4 lowExpand for full reviewEnd-to-end threading of Findings
Other observations
No security issues from the cubesql producer side, but the trust boundary on |
| pub sql: String, | ||
| #[serde(rename = "joinType")] | ||
| pub join_type: Option<String>, | ||
| pub alias: String, | ||
| } |
There was a problem hiding this comment.
Alias contract is trust-based. alias is consumed verbatim (no quoting) and the inner sql is emitted verbatim into the outer query. The PR description states the SQL API sends a pre-quoted alias and a controlled inner SELECT, which is fine for the cubesql producer, but the trait itself doesn't enforce or document the invariant. If any other caller of buildSqlAndParamsRust ever populates subqueryJoins (the JS BaseQuery now forwards this.options.subqueryJoins from any caller), an unquoted/attacker-controlled alias would land in the FROM clause untouched.
Consider tightening the doc-comment on this struct to state explicitly that:
aliasMUST be a fully-quoted identifier produced by the caller, andsqlis emitted verbatim and MUST come from a trusted, planner-side path.
Optionally validate at compile time that alias starts/ends with the quote character (" or the dialect-specific one). Worth at least a // SAFETY: …-style note here so the next reader doesn't reintroduce quoting at a different layer.
| // Compare sub-query joins by their request-derived identity (the | ||
| // compiled `on_sql` is derived deterministically from these). | ||
| && subquery_joins.len() == other.subquery_joins.len() | ||
| && subquery_joins | ||
| .iter() | ||
| .zip(other.subquery_joins.iter()) | ||
| .all(|(a, b)| { | ||
| a.sql == b.sql && a.alias == b.alias && a.join_type == b.join_type | ||
| }) | ||
| } |
There was a problem hiding this comment.
PartialEq ignores on_sql. The justification (the compiled SqlCall is derived deterministically from sql/alias/join_type) isn't quite right — on_sql is compiled from the on member expression, not from the three fields compared here. Two subqueryJoins entries can share (sql, alias, join_type) while having different on conditions and still compare equal.
In practice QueryProperties::eq seems to be used for plan-caching/equivalence — a false positive here could let one query reuse another's cached plan. Either:
- include
on_sqlin the comparison (e.g. viaSqlCall's structural eq if available), or - explicitly call this out as "identity, not semantic equality" in the doc.
| let subquery_joins = self.query_properties.subquery_joins(); | ||
| let source = if let Some(join) = &join { | ||
| self.join_planner | ||
| .make_join_logical_plan(join, subquery_dimension_queries) | ||
| } else if !subquery_joins.is_empty() { | ||
| // No selected member resolves the base cube, but the sub-query joins' | ||
| // ON conditions reference it (e.g. `SELECT t.col FROM Cube JOIN (...) t | ||
| // ON Cube.x = t.x`). Derive the join root from those ON references so | ||
| // the base cube — and the sub-query joins below — are emitted. | ||
| let mut hints = JoinHints::new(); | ||
| for subquery_join in subquery_joins { | ||
| for dep in subquery_join.on_sql.get_dependencies() { | ||
| hints.extend(&collect_join_hints(&dep)?); | ||
| } | ||
| } | ||
| self.join_planner | ||
| .make_join_logical_plan_with_join_hints(hints, subquery_dimension_queries)? | ||
| } else { | ||
| self.join_planner.make_empty_join_logical_plan() | ||
| }; | ||
| let source = if subquery_joins.is_empty() { | ||
| source | ||
| } else { | ||
| source.with_subquery_joins(subquery_joins.clone()) | ||
| }; | ||
| Ok(source) |
There was a problem hiding this comment.
Two minor smells in the assembly:
-
subquery_joins.clone()is called even thoughwith_subquery_joinsalready takes ownership of aVec. Sincesubquery_joinshere is a&Vec, the clone is needed — but the resulting source then carries a copy of items that theQueryPropertiesalready owns. Not a correctness issue, just a small allocation per query. -
When
join.is_none()andsubquery_joins.is_empty()you fall through tomake_empty_join_logical_plan(). If someone sends asubqueryJoinswith an empty ON dependency set (degenerate but possible),hintsends up empty andmake_join_logical_plan_with_join_hints(empty_hints, …)is called — worth a quick check that that path doesn't blow up. Adding a test for "subqueryJoins with no cube reference in ON" would close that gap.
|
Claude finished @ovr's task in 42s —— View job Re-review after
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #11138 +/- ##
==========================================
- Coverage 83.63% 79.12% -4.51%
==========================================
Files 256 473 +217
Lines 79019 96514 +17495
Branches 0 3558 +3558
==========================================
+ Hits 66085 76368 +10283
- Misses 12934 19628 +6694
- Partials 0 518 +518
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Claude finished @ovr's task in 1m 1s —— View job Re-review after
|
52f9357 to
5f8da2b
Compare
|
Claude finished @ovr's task in 50s —— View job Re-review (no new commits since
|
The SQL API encodes grouped join push-downs (e.g.
JOIN (SELECT ... GROUP BY ... ORDER BY ... LIMIT n) t ON ...) as asubqueryJoinsentry on the query. Only the legacy planner consumed it (viacustomSubQueryJoins);buildSqlAndParamsRustdropped it before the native call and Tesseract had no way to receive it. As a result the joined sub-query was omitted entirely: queries returned unfiltered rows (the inner LIMIT/ORDER vanished) and projections referencing the sub-query alias failed with "missing FROM-clause entry".Thread
subqueryJoinsend to end:subqueryJoinsinto both native queryParams.SubqueryJoinbridge (sql/joinType/alias +onmember expression).oninto aSqlCall; stored on QueryProperties asLogicalSubqueryJoinItems.SingleSource::RawSubquerySqljoins; the no-join fast path now also checkssubquery_joinsso alias-only projections still materialize the join.The opaque sub-query SQL (incl. its ORDER BY/LIMIT) is emitted verbatim, so top-N semantics are preserved. The SQL API sends the join alias pre-quoted and references it verbatim in the ON condition, so the raw sub-query source is emitted with its alias as-is (no re-quoting). Changes are gated on a non-empty
subquery_joins, leaving existing plans untouched.