Skip to content

Add relational algebra module (rel) #33

@julianhyde

Description

@julianhyde

Add a rel module implementing a relational algebra layer for Morel, modelled on Apache Calcite's RelNode / RelBuilder API. The module provides a clean intermediate representation for query plans that is independent of Morel's surface syntax.

Motivation

Morel already compiles from/where/yield expressions into an internal step-based representation (core::StepKind) and performs structural optimizations in from_builder.rs. This works, but the representation is tightly coupled to the surface syntax. A proper relational algebra layer would:

  • Give us a clean, SQL-agnostic IR that can be optimized independently of the surface syntax.
  • Make it straightforward to target Morel from SQL front-ends, or to emit SQL.
  • Provide the foundation for a cost-based query optimizer in the future.
  • Align Morel's architecture with the well-understood vocabulary of Apache Calcite.

What we're building

Core operators (Rel enum)

A Rel enum whose variants correspond to the standard logical relational operators, mirroring Calcite's RelNode hierarchy:

Variant Calcite equivalent Description
Rel::TableScan LogicalTableScan Scan a named table from a schema
Rel::Filter LogicalFilter Select rows matching a boolean condition
Rel::Project LogicalProject Compute a new row from expressions over the input
Rel::Aggregate LogicalAggregate Group by key fields and apply aggregate functions
Rel::Sort LogicalSort Order rows; also used for LIMIT/OFFSET
Rel::Join LogicalJoin Inner, left, right, full, semi, and anti joins
Rel::Union LogicalUnion Union of two or more relations (all or distinct)
Rel::Intersect LogicalIntersect Intersection
Rel::Minus LogicalMinus Set difference
Rel::Values LogicalValues Inline constant rows

Rel is a Rust enum, not a trait object hierarchy. The set of operators is closed, so exhaustive match is more idiomatic than Box<dyn RelNode> and avoids vtable overhead.

RelBuilder

A fluent builder that maintains an internal stack of Rel nodes, mirroring Calcite's RelBuilder:

let mut b = RelBuilder::new(scott_schema());
b.scan(&["scott", "EMP"])
 .filter(b.gt(b.field("SAL"), b.literal(Val::Int(1000), int())))
 .project(&[b.field("ENAME"), b.field("SAL")])
 .sort(&[b.desc(b.field("SAL"))]);
let plan = b.build();

Builder methods apply structural simplifications eagerly (e.g. filter(false) → empty Values, identity projections are eliminated, duplicate sort keys are dropped).

Expressions

Scalar expressions inside Rel nodes reuse compile::core::Expr — Morel's existing typed expression tree. There is no separate RexNode hierarchy. The one new variant needed is Expr::Scalar(Box<Rel>) for scalar subqueries.

Types

Row types reuse compile::types::Type. A relation's row type is Type::Record(false, …) and the relation's own type is Type::Bag(Box<row_type>) (unordered) or Type::List(Box<row_type>) after a Sort. No new SqlType, RowType, or TypeFactory is introduced.

Acceptance criteria

Port Calcite's RelBuilderTest to Rust (in tests/rel_builder_test.rs) and pass ≥ 80 % of the ~230 test methods. The tests cover all core operators, the builder's simplification rules, alias handling, error conditions, and plan-display output.

Out of scope

  • Row-level execution (planning layer only)
  • Cost-based optimisation / rule engine
  • Physical conventions (EnumerableConvention, etc.)
  • Window functions, MATCH_RECOGNIZE, PIVOT, UNPIVOT
  • Table functions, temporal tables, dynamic parameters
  • Correlated subqueries (Correlate node)
  • Integration with the Morel compiler (separate issue)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions