| 1 | +# Grammar Extension Patterns for SqlScriptDOM |
| 2 | + |
| 3 | +This guide documents common patterns for extending the SqlScriptDOM parser grammar to support new syntax or enhance existing functionality. |
| 4 | + |
| 5 | +## Pattern 1: Extending Literals to Expressions |
| 6 | + |
| 7 | +### When to Use |
| 8 | +When existing grammar rules only accept literal values but need to support dynamic expressions like parameters, variables, or computed values. |
| 9 | + |
| 10 | +### Example Problem |
| 11 | +Functions or constructs that currently accept only: |
| 12 | +- `IntegerLiteral` (e.g., `TOP_N = 10`) |
| 13 | +- `StringLiteral` (e.g., `VALUE = 'literal'`) |
| 14 | + |
| 15 | +But need to support: |
| 16 | +- Parameters: `@parameter` |
| 17 | +- Variables: `@variable` |
| 18 | +- Column references: `table.column` |
| 19 | +- Outer references: `outerref.column` |
| 20 | +- Function calls: `FUNCTION(args)` |
| 21 | +- Computed expressions: `value + 1` |
| 22 | + |
| 23 | +### ⚠️ Critical Warning: Avoid Modifying Shared Grammar Rules |
| 24 | + |
| 25 | +**DO NOT** modify existing shared grammar rules like `identifierColumnReferenceExpression` that are used throughout the codebase. This can cause unintended side effects and break other functionality. |
| 26 | + |
| 27 | +**Instead**, create specialized rules for your specific context. |
| 28 | + |
| 29 | +### Solution Template |
| 30 | + |
| 31 | +#### Step 1: Update AST Definition (`Ast.xml`) |
| 32 | +```xml |
| 33 | +<!-- Before: --> |
| 34 | +<Member Name="PropertyName" Type="IntegerLiteral" Summary="Description" /> |
| 35 | + |
| 36 | +<!-- After: --> |
| 37 | +<Member Name="PropertyName" Type="ScalarExpression" Summary="Description" /> |
| 38 | +``` |
| 39 | + |
| 40 | +#### Step 2: Create Context-Specific Grammar Rule (`TSql*.g`) |
| 41 | +```antlr |
| 42 | +// Create a specialized rule for your context |
| 43 | +yourContextColumnReferenceExpression returns [ColumnReferenceExpression vResult = this.FragmentFactory.CreateFragment<ColumnReferenceExpression>()] |
| 44 | +{ |
| 45 | + MultiPartIdentifier vMultiPartIdentifier; |
| 46 | +} |
| 47 | + : |
| 48 | + vMultiPartIdentifier=multiPartIdentifier[2] // Allows table.column syntax |
| 49 | + { |
| 50 | + vResult.ColumnType = ColumnType.Regular; |
| 51 | + vResult.MultiPartIdentifier = vMultiPartIdentifier; |
| 52 | + } |
| 53 | + ; |
| 54 | + |
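| 55 | +// Use the specialized rule in your custom grammar. As written, it accepts only integers, variables, and column references; function calls and computed expressions such as value + 1 need a broader rule (for example, expression). |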
| 56 | +yourContextParameterRule returns [ScalarExpression vResult] |
| 57 | + : vResult=signedInteger |
| 58 | + | vResult=variable |
| 59 | + | vResult=yourContextColumnReferenceExpression // Context-specific rule |
| 60 | + ; |
| 61 | +``` |
| 62 | + |
| 63 | +#### Step 3: Verify Script Generator |
| 64 | +Most script generators that use `GenerateNameEqualsValue()` or similar methods work with `ScalarExpression` automatically, so changes are typically not needed. Confirm this by generating a script from the parsed fragment. |
| 65 | + |
| 66 | +#### Step 4: Add Test Coverage |
| 67 | +```sql |
| 68 | +-- Test parameter |
| 69 | +FUNCTION_NAME(PARAM = @parameter) |
| 70 | + |
| 71 | +-- Test outer reference |
| 72 | +FUNCTION_NAME(PARAM = outerref.column) |
| 73 | + |
| 74 | +-- Test computed expression |
| 75 | +FUNCTION_NAME(PARAM = value + 1) |
| 76 | +``` |
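| | + |
| | +Beyond script-level tests, it can help to confirm that the value was parsed as the intended node type. The sketch below is a minimal illustration, not the repository's test harness: `VariableCollector` is a hypothetical helper, and `TOP (@k)` stands in for the construct under test because `FUNCTION_NAME` above is a placeholder. It assumes the public `Microsoft.SqlServer.TransactSql.ScriptDom` package and its `TSql160Parser`. |
| | + |
| | +```csharp |
| | +using System; |
| | +using System.Collections.Generic; |
| | +using System.IO; |
| | +using Microsoft.SqlServer.TransactSql.ScriptDom; |
| | + |
| | +// Hypothetical helper: records the name of every variable reference it visits. |
| | +public sealed class VariableCollector : TSqlFragmentVisitor |
| | +{ |
| | +    public List<string> Names { get; } = new List<string>(); |
| | + |
| | +    public override void Visit(VariableReference node) |
| | +    { |
| | +        Names.Add(node.Name); |
| | +    } |
| | +} |
| | + |
| | +public static class Program |
| | +{ |
| | +    public static void Main() |
| | +    { |
| | +        // Stand-in statement; substitute the syntax you are extending. |
| | +        const string sql = "SELECT TOP (@k) c1 FROM dbo.t1;"; |
| | + |
| | +        var parser = new TSql160Parser(true); |
| | +        TSqlFragment fragment = parser.Parse(new StringReader(sql), out IList<ParseError> errors); |
| | +        if (errors.Count > 0) |
| | +        { |
| | +            throw new InvalidOperationException("Parse failed: " + errors[0].Message); |
| | +        } |
| | + |
| | +        // Walk the tree and list the variable references that were found. |
| | +        var collector = new VariableCollector(); |
| | +        fragment.Accept(collector); |
| | +        Console.WriteLine(string.Join(", ", collector.Names)); |
| | +    } |
| | +} |
| | +``` |
| | + |
| | +The same visitor approach should work for other node types, for example by overriding `Visit(ColumnReferenceExpression node)` to check multi-part identifiers. |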
| 77 | + |
| 78 | +### Real-World Example: VECTOR_SEARCH TOP_N |
| 79 | + |
| 80 | +**Problem**: The `VECTOR_SEARCH` `TOP_N` parameter accepted only integer literals. |
| 81 | + |
| 82 | +**❌ Wrong Approach**: Modify `identifierColumnReferenceExpression` to use `multiPartIdentifier[2]` |
| 83 | +- **Result**: Broke `CreateIndexStatementErrorTest` because other grammar rules started accepting invalid syntax |
| 84 | + |
| 85 | +**✅ Correct Approach**: Create `vectorSearchColumnReferenceExpression` specialized for VECTOR_SEARCH |
| 86 | +- **Result**: VECTOR_SEARCH supports multi-part identifiers without affecting other functionality |
| 87 | + |
| 88 | +**Final Implementation**: |
| 89 | +```antlr |
| 90 | +signedIntegerOrVariableOrColumnReference returns [ScalarExpression vResult] |
| 91 | + : vResult=signedInteger |
| 92 | + | vResult=variable |
| 93 | + | vResult=vectorSearchColumnReferenceExpression // VECTOR_SEARCH-specific rule |
| 94 | + ; |
| 95 | + |
| 96 | +vectorSearchColumnReferenceExpression returns [ColumnReferenceExpression vResult = ...] |
| 97 | + : |
| 98 | + vMultiPartIdentifier=multiPartIdentifier[2] // Allows table.column syntax |
| 99 | + { |
| 100 | + vResult.ColumnType = ColumnType.Regular; |
| 101 | + vResult.MultiPartIdentifier = vMultiPartIdentifier; |
| 102 | + } |
| 103 | + ; |
| 104 | +``` |
| 105 | + |
| 106 | +**Result**: Now supports dynamic TOP_N values: |
| 107 | +```sql |
| 108 | +-- Parameters |
| 109 | +VECTOR_SEARCH(..., TOP_N = @k) AS ann |
| 110 | + |
| 111 | +-- Outer references |
| 112 | +VECTOR_SEARCH(..., TOP_N = outerref.max_results) AS ann |
| 113 | +``` |
| 114 | + |
| 115 | +## Pattern 2: Adding New Enum Members |
| 116 | + |
| 117 | +### When to Use |
| 118 | +When adding new operators, keywords, or options to existing constructs. |
| 119 | + |
| 120 | +### Solution Template |
| 121 | + |
| 122 | +#### Step 1: Update Enum in AST (`Ast.xml`) |
| 123 | +```xml |
| 124 | +<Enum Name="ExistingEnumType"> |
| 125 | + <Member Name="ExistingValue1" /> |
| 126 | + <Member Name="ExistingValue2" /> |
| 127 | + <Member Name="NewValue" /> <!-- Add this --> |
| 128 | +</Enum> |
| 129 | +``` |
| 130 | + |
| 131 | +#### Step 2: Update Grammar Rule (`TSql*.g`) |
| 132 | +```antlr |
| 133 | +// Add new token matching |
| 134 | +| tNewValue:Identifier |
| 135 | +{ |
| 136 | + Match(tNewValue, CodeGenerationSupporter.NewValue); |
| 137 | + vResult.EnumProperty = ExistingEnumType.NewValue; |
| 138 | +} |
| 139 | +``` |
| 140 | + |
| 141 | +#### Step 3: Update Script Generator |
| 142 | +```csharp |
| 143 | +// Add mapping in appropriate generator file |
| 144 | +private static readonly Dictionary<ExistingEnumType, string> _enumGenerators = |
| 145 | + new Dictionary<ExistingEnumType, string>() |
| 146 | +{ |
| 147 | + { ExistingEnumType.ExistingValue1, CodeGenerationSupporter.ExistingValue1 }, |
| 148 | + { ExistingEnumType.ExistingValue2, CodeGenerationSupporter.ExistingValue2 }, |
| 149 | + { ExistingEnumType.NewValue, CodeGenerationSupporter.NewValue }, // Add this |
| 150 | +}; |
| 151 | +``` |
| 152 | + |
| 153 | +## Pattern 3: Adding New Function or Statement |
| 154 | + |
| 155 | +### When to Use |
| 156 | +When adding completely new T-SQL functions or statements. |
| 157 | + |
| 158 | +### Solution Template |
| 159 | + |
| 160 | +#### Step 1: Define AST Node (`Ast.xml`) |
| 161 | +```xml |
| 162 | +<Class Name="NewFunctionCall" Base="PrimaryExpression"> |
| 163 | + <Member Name="Parameter1" Type="ScalarExpression" /> |
| 164 | + <Member Name="Parameter2" Type="StringLiteral" /> |
| 165 | +</Class> |
| 166 | +``` |
| 167 | + |
| 168 | +#### Step 2: Add Grammar Rule (`TSql*.g`) |
| 169 | +```antlr |
| 170 | +newFunctionCall returns [NewFunctionCall vResult = FragmentFactory.CreateFragment<NewFunctionCall>()] |
| 171 | +{ |
| 172 | + ScalarExpression vParam1; |
| 173 | + StringLiteral vParam2; |
| 174 | +} |
| 175 | + : |
| 176 | + tFunction:Identifier LeftParenthesis |
| 177 | + { |
| 178 | + Match(tFunction, CodeGenerationSupporter.NewFunction); |
| 179 | + UpdateTokenInfo(vResult, tFunction); |
| 180 | + } |
| 181 | + vParam1 = expression |
| 182 | + { |
| 183 | + vResult.Parameter1 = vParam1; |
| 184 | + } |
| 185 | + Comma vParam2 = stringLiteral |
| 186 | + { |
| 187 | + vResult.Parameter2 = vParam2; |
| 188 | + } |
| 189 | + RightParenthesis |
| 190 | + ; |
| 191 | +``` |
| 192 | + |
| 193 | +#### Step 3: Integrate with Existing Rules |
| 194 | +Add the new rule to the appropriate places in the grammar (e.g., `functionCall` or `primaryExpression`) so that the parser can reach it. |
| 195 | + |
| 196 | +#### Step 4: Create Script Generator |
| 197 | +```csharp |
| 198 | +public override void ExplicitVisit(NewFunctionCall node) |
| 199 | +{ |
| 200 | + GenerateIdentifier(CodeGenerationSupporter.NewFunction); |
| 201 | + GenerateSymbol(TSqlTokenType.LeftParenthesis); |
| 202 | + GenerateFragmentIfNotNull(node.Parameter1); |
| 203 | + GenerateSymbol(TSqlTokenType.Comma); |
| 204 | + GenerateFragmentIfNotNull(node.Parameter2); |
| 205 | + GenerateSymbol(TSqlTokenType.RightParenthesis); |
| 206 | +} |
| 207 | +``` |
| 208 | + |
| 209 | +## Best Practices |
| 210 | + |
| 211 | +### 1. Backward Compatibility |
| 212 | +- Always ensure existing syntax continues to work |
| 213 | +- Extend rather than replace existing rules |
| 214 | +- Test both old and new syntax |
| 215 | + |
| 216 | +### 2. Testing Strategy |
| 217 | +- Add comprehensive test cases in `TestScripts/` |
| 218 | +- Update baseline files with expected output |
| 219 | +- Test edge cases and error conditions |
| 220 | + |
| 221 | +### 3. Documentation |
| 222 | +- Update grammar comments with new syntax |
| 223 | +- Add examples in code comments |
| 224 | +- Document any limitations or requirements |
| 225 | + |
| 226 | +### 4. Version Targeting |
| 227 | +- Add new features to the appropriate SQL Server version grammar |
| 228 | +- Consider whether the feature should be backported to earlier versions |
| 229 | +- Update all relevant grammar files if syntax is version-independent |
| 230 | + |
| 231 | +## Common Pitfalls |
| 232 | + |
| 233 | +### 1. Forgetting Script Generator Updates |
| 234 | +- Grammar changes often require corresponding script generator changes |
| 235 | +- Test the round-trip: parse → generate → parse again |
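| | + |
| | +The round-trip check can be sketched as follows. This is a minimal illustration under stated assumptions rather than the repository's test harness: `RoundTripCheck` and `RoundTrip` are hypothetical names, the sample statement is arbitrary, and the code assumes the public `Microsoft.SqlServer.TransactSql.ScriptDom` package with its `TSql160Parser` and `Sql160ScriptGenerator` classes. |
| | + |
| | +```csharp |
| | +using System; |
| | +using System.Collections.Generic; |
| | +using System.IO; |
| | +using Microsoft.SqlServer.TransactSql.ScriptDom; |
| | + |
| | +public static class RoundTripCheck |
| | +{ |
| | +    // Parses the input, regenerates script text from the AST, then parses that text again. |
| | +    public static string RoundTrip(string sql) |
| | +    { |
| | +        var parser = new TSql160Parser(true); |
| | +        TSqlFragment fragment = parser.Parse(new StringReader(sql), out IList<ParseError> errors); |
| | +        if (errors.Count > 0) |
| | +        { |
| | +            throw new InvalidOperationException("First parse failed: " + errors[0].Message); |
| | +        } |
| | + |
| | +        // A missing script generator update usually shows up here as lost or malformed text. |
| | +        var generator = new Sql160ScriptGenerator(); |
| | +        generator.GenerateScript(fragment, out string regenerated); |
| | + |
| | +        parser.Parse(new StringReader(regenerated), out errors); |
| | +        if (errors.Count > 0) |
| | +        { |
| | +            throw new InvalidOperationException("Regenerated script failed to parse: " + errors[0].Message); |
| | +        } |
| | + |
| | +        return regenerated; |
| | +    } |
| | + |
| | +    public static void Main() |
| | +    { |
| | +        Console.WriteLine(RoundTrip("SELECT TOP (@k) c1 FROM dbo.t1;")); |
| | +    } |
| | +} |
| | +``` |
| | + |
| | +A successful second parse does not prove the output is equivalent to the input, so comparing the regenerated text against a baseline is still worthwhile. |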
| 236 | + |
| 237 | +### 2. Incomplete Test Coverage |
| 238 | +- Test all supported expression types when extending to `ScalarExpression` |
| 239 | +- Include error cases and boundary conditions |
| 240 | + |
| 241 | +### 3. Missing Version Updates |
| 242 | +- New syntax should be added to all relevant grammar versions |
| 243 | +- Consider SQL Server version compatibility |
| 244 | + |
| 245 | +### 4. AST Design Issues |
| 246 | +- Choose appropriate base classes for new AST nodes |
| 247 | +- Consider reusing existing AST patterns where possible |
| 248 | +- Ensure proper inheritance hierarchy |
| 249 | + |
| 250 | +## Reference Examples |
| 251 | + |
| 252 | +- **VECTOR_SEARCH TOP_N Extension**: Literal to expression pattern |
| 253 | +- **REGEXP_LIKE Predicate**: Boolean parentheses recognition pattern |
| 254 | +- **EVENT SESSION Predicates**: Function-style vs operator-style predicates |
| 255 | + |
| 256 | +For detailed step-by-step examples, see [BUG_FIXING_GUIDE.md](BUG_FIXING_GUIDE.md). |