|
| 1 | +# Window Functions and CTE Implementation for StackQL Parser |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document describes the experimental implementation of SQL window functions and CTEs (Common Table Expressions) in the stackql-parser fork. This is a proof-of-concept implementation to validate the approach before implementing it properly in the main stackql-parser repository. |
| 6 | + |
| 7 | +## Summary of Changes |
| 8 | + |
| 9 | +### Files Modified |
| 10 | + |
| 11 | +The following files in `internal/stackql-parser-fork/go/vt/sqlparser/` were modified: |
| 12 | + |
| 13 | +1. **ast.go** - Added AST types for window functions and CTEs |
| 14 | +2. **sql.y** - Added grammar rules for parsing |
| 15 | +3. **token.go** - Added keyword mappings |
| 16 | +4. **constants.go** - Added constants for frame types |
| 17 | +5. **external_visitor.go** - Added Accept methods for new types |
| 18 | + |
| 19 | +### New Test Files Created |
| 20 | + |
| 21 | +- `window_test.go` - Unit tests for window function parsing |
| 22 | +- `cte_test.go` - Unit tests for CTE parsing |
| 23 | + |
| 24 | +## Implementation Details |
| 25 | + |
| 26 | +### Window Functions |
| 27 | + |
| 28 | +#### AST Types Added (ast.go) |
| 29 | + |
| 30 | +```go |
| 31 | +// OverClause represents an OVER clause for window functions |
| 32 | +OverClause struct { |
| 33 | + WindowName ColIdent |
| 34 | + WindowSpec *WindowSpec |
| 35 | +} |
| 36 | + |
| 37 | +// WindowSpec represents a window specification |
| 38 | +WindowSpec struct { |
| 39 | + PartitionBy Exprs |
| 40 | + OrderBy OrderBy |
| 41 | + Frame *FrameClause |
| 42 | +} |
| 43 | + |
| 44 | +// FrameClause represents a frame clause (ROWS/RANGE) |
| 45 | +FrameClause struct { |
| 46 | + Unit string // ROWS or RANGE |
| 47 | + Start *FramePoint |
| 48 | + End *FramePoint |
| 49 | +} |
| 50 | + |
| 51 | +// FramePoint represents a frame boundary |
| 52 | +FramePoint struct { |
| 53 | + Type string // UNBOUNDED PRECEDING, CURRENT ROW, etc. |
| 54 | + Expr Expr // for N PRECEDING or N FOLLOWING |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +The `FuncExpr` struct was extended with an `Over *OverClause` field. |
| 59 | + |
| 60 | +#### Grammar Rules Added (sql.y) |
| 61 | + |
| 62 | +- `over_clause_opt` - Optional OVER clause after function calls |
| 63 | +- `window_spec` - Window specification (PARTITION BY, ORDER BY, frame) |
| 64 | +- `partition_by_opt` - Optional PARTITION BY clause |
| 65 | +- `frame_clause_opt` - Optional frame specification (ROWS/RANGE) |
| 66 | +- `frame_point` - Frame boundary points |
| 67 | + |
| 68 | +#### Tokens Added |
| 69 | + |
| 70 | +- `OVER`, `ROWS`, `RANGE`, `UNBOUNDED`, `PRECEDING`, `FOLLOWING`, `CURRENT`, `ROW` |
| 71 | + |
| 72 | +#### Supported Syntax |
| 73 | + |
| 74 | +```sql |
| 75 | +-- Simple window function |
| 76 | +SELECT SUM(count) OVER () FROM t |
| 77 | + |
| 78 | +-- With ORDER BY |
| 79 | +SELECT RANK() OVER (ORDER BY count DESC) FROM t |
| 80 | + |
| 81 | +-- With PARTITION BY |
| 82 | +SELECT SUM(count) OVER (PARTITION BY category) FROM t |
| 83 | + |
| 84 | +-- With PARTITION BY and ORDER BY |
| 85 | +SELECT SUM(count) OVER (PARTITION BY category ORDER BY name) FROM t |
| 86 | + |
| 87 | +-- With frame clause |
| 88 | +SELECT SUM(count) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t |
| 89 | + |
| 90 | +-- Multiple window functions |
| 91 | +SELECT SUM(x) OVER (), COUNT(*) OVER (ORDER BY y) FROM t |
| 92 | + |
| 93 | +-- Window functions with aggregates |
| 94 | +SELECT serviceName, COUNT(*) as count, SUM(COUNT(*)) OVER () as total FROM t GROUP BY serviceName |
| 95 | +``` |
| 96 | + |
| 97 | +### CTEs (Common Table Expressions) |
| 98 | + |
| 99 | +#### AST Types Added (ast.go) |
| 100 | + |
| 101 | +```go |
| 102 | +// With represents a WITH clause (CTE) |
| 103 | +With struct { |
| 104 | + Recursive bool |
| 105 | + CTEs []*CommonTableExpr |
| 106 | +} |
| 107 | + |
| 108 | +// CommonTableExpr represents a single CTE definition |
| 109 | +CommonTableExpr struct { |
| 110 | + Name TableIdent |
| 111 | + Columns Columns |
| 112 | + Subquery *Subquery |
| 113 | +} |
| 114 | +``` |
| 115 | + |
| 116 | +The `Select` struct was extended with a `With *With` field. |
| 117 | + |
| 118 | +#### Grammar Rules Added (sql.y) |
| 119 | + |
| 120 | +- `cte_list` - List of CTE definitions |
| 121 | +- `cte` - Single CTE definition |
| 122 | +- Extended `base_select` with WITH clause alternatives |
| 123 | + |
| 124 | +#### Tokens Added |
| 125 | + |
| 126 | +- `RECURSIVE` |
| 127 | + |
| 128 | +#### Supported Syntax |
| 129 | + |
| 130 | +```sql |
| 131 | +-- Simple CTE |
| 132 | +WITH cte AS (SELECT id FROM t) SELECT * FROM cte |
| 133 | + |
| 134 | +-- CTE with column list |
| 135 | +WITH cte (col1, col2) AS (SELECT id, name FROM t) SELECT * FROM cte |
| 136 | + |
| 137 | +-- Multiple CTEs |
| 138 | +WITH cte1 AS (SELECT id FROM t1), cte2 AS (SELECT id FROM t2) SELECT * FROM cte1 JOIN cte2 |
| 139 | + |
| 140 | +-- Recursive CTE |
| 141 | +WITH RECURSIVE cte AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM cte WHERE n < 10) SELECT * FROM cte |
| 142 | + |
| 143 | +-- CTE with window function |
| 144 | +WITH sales AS (SELECT product, amount FROM orders) |
| 145 | +SELECT product, SUM(amount) OVER (ORDER BY product) FROM sales |
| 146 | +``` |
| 147 | + |
| 148 | +## Key Design Decisions |
| 149 | + |
| 150 | +### Window Functions |
| 151 | + |
| 152 | +1. **OVER clause placement**: Added `over_clause_opt` to the `function_call_generic` rule to allow OVER on any generic function call. |
| 153 | + |
| 154 | +2. **Frame specification**: Supports both ROWS and RANGE frame types with: |
| 155 | + - UNBOUNDED PRECEDING |
| 156 | + - UNBOUNDED FOLLOWING |
| 157 | + - CURRENT ROW |
| 158 | + - N PRECEDING |
| 159 | + - N FOLLOWING |
| 160 | + |
| 161 | +3. **Named windows**: The grammar supports `OVER window_name` syntax for referencing named windows (though WINDOW clause definition is not yet implemented). |
| 162 | + |
| 163 | +### CTEs |
| 164 | + |
| 165 | +1. **Grammar approach**: Instead of using an optional `with_clause_opt` rule that includes an empty alternative (which caused grammar conflicts), we directly added WITH alternatives to the `base_select` rule. |
| 166 | + |
| 167 | +2. **Recursive CTEs**: Supported via the `WITH RECURSIVE` syntax. |
| 168 | + |
| 169 | +3. **Column lists**: Optional column list specification for CTEs is supported. |
| 170 | + |
| 171 | +## Parser Conflicts |
| 172 | + |
| 173 | +The implementation increases reduce/reduce conflicts from 461 to 464. This is acceptable for an experimental implementation. |
| 174 | + |
| 175 | +## Testing |
| 176 | + |
| 177 | +### Unit Tests |
| 178 | + |
| 179 | +All parser unit tests pass: |
| 180 | +- 8 window function tests |
| 181 | +- 5 CTE tests |
| 182 | + |
| 183 | +### Running Tests |
| 184 | + |
| 185 | +```bash |
| 186 | +cd /home/user/stackql-devel/internal/stackql-parser-fork/go/vt/sqlparser |
| 187 | +go test -run "TestWindowFunctions|TestCTEs" -v |
| 188 | +``` |
| 189 | + |
| 190 | +## Next Steps for Production Implementation |
| 191 | + |
| 192 | +1. **Upstream the changes**: Apply these changes to the main `stackql-parser` repository. |
| 193 | + |
| 194 | +2. **Execution layer**: Implement window function and CTE execution in the SQLite backend: |
| 195 | + - SQLite already supports window functions and CTEs natively |
| 196 | + - Need to ensure the parsed AST is correctly converted to SQL for execution |
| 197 | + |
| 198 | +3. **Named Windows**: Add support for the `WINDOW` clause to define named windows: |
| 199 | + ```sql |
| 200 | + SELECT SUM(x) OVER w FROM t WINDOW w AS (PARTITION BY y ORDER BY z) |
| 201 | + ``` |
| 202 | + |
| 203 | +4. **Additional window functions**: The parser supports any function name with OVER. Consider adding specific handling for: |
| 204 | + - ROW_NUMBER() |
| 205 | + - RANK() |
| 206 | + - DENSE_RANK() |
| 207 | + - LEAD() |
| 208 | + - LAG() |
| 209 | + - FIRST_VALUE() |
| 210 | + - LAST_VALUE() |
| 211 | + - NTH_VALUE() |
| 212 | + - NTILE() |
| 213 | + |
| 214 | +5. **Robot tests**: Add integration tests that verify window functions and CTEs work end-to-end with actual cloud provider data. |
| 215 | + |
| 216 | +## Known Limitations |
| 217 | + |
| 218 | +1. **No execution support**: This implementation only adds parsing support. The execution layer still needs to be updated to handle window functions and CTEs. |
| 219 | + |
| 220 | +2. **Pre-existing test failures**: The parser has some pre-existing test failures unrelated to window functions/CTEs (table name quoting, OR operator rendering). These should be addressed separately. |
| 221 | + |
| 222 | +3. **Build complexity**: The local fork approach with replace directive in go.mod can cause directory conflicts when building the main stackql binary. For production, the changes should be upstreamed to stackql-parser. |
| 223 | + |
| 224 | +## Files to Review |
| 225 | + |
| 226 | +- `internal/stackql-parser-fork/go/vt/sqlparser/ast.go` - AST type definitions |
| 227 | +- `internal/stackql-parser-fork/go/vt/sqlparser/sql.y` - Grammar rules (lines ~3074-3410) |
| 228 | +- `internal/stackql-parser-fork/go/vt/sqlparser/token.go` - Keyword mappings |
| 229 | +- `internal/stackql-parser-fork/go/vt/sqlparser/constants.go` - Frame type constants |
| 230 | +- `internal/stackql-parser-fork/go/vt/sqlparser/window_test.go` - Window function tests |
| 231 | +- `internal/stackql-parser-fork/go/vt/sqlparser/cte_test.go` - CTE tests |
0 commit comments