Skip to content

Commit 52523ab

Browse files
authored
Fixes a bug in VECTOR_SEARCH TopN should also accept parameters and outer references (#167)
1 parent cd170a8 commit 52523ab

File tree

8 files changed

+454
-12
lines changed

8 files changed

+454
-12
lines changed

.github/BUG_FIXING_GUIDE.md

Lines changed: 102 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -16,29 +16,126 @@ The process of fixing a bug, especially one that involves adding new syntax, fol
1616
3. **Script Generation Update**:
1717
* Update the script generator to handle the new AST node or enum. This typically involves modifying files in `SqlScriptDom/ScriptDom/SqlServer/ScriptGenerator/`. For the `NOT LIKE` example, this meant adding an entry to the `_booleanComparisonTypeGenerators` dictionary in `SqlScriptGeneratorVisitor.CommonPhrases.cs`.
1818

19-
4. **Build the Project**:
19+
5. **Build the Project**:
2020
* After making code changes, run a build to regenerate the parser and ensure everything compiles correctly:
2121
```bash
2222
dotnet build
2323
```
2424

25-
5. **Add a Unit Test**:
25+
6. **Add a Unit Test**:
2626
* Create a new `.sql` file in `Test/SqlDom/TestScripts/` that contains the specific syntax for the new test case.
2727

28-
6. **Define the Test Case**:
28+
7. **Define the Test Case**:
2929
* Add a new `ParserTest` entry to the appropriate `Only<version>SyntaxTests.cs` files (e.g., `Only130SyntaxTests.cs`). This entry points to your new test script and defines the expected number of parsing errors for each SQL Server version.
3030

31-
7. **Generate and Verify Baselines**:
31+
8. **Generate and Verify Baselines**:
3232
This is a critical and multi-step process:
3333
* **a. Create Placeholder Baseline Files**: Before running the test, create empty or placeholder baseline files in the corresponding `Test/SqlDom/Baselines<version>/` directories. The filename must match the test script's filename.
3434
* **b. Run the Test to Get the Generated Script**: Run the specific test that you just added. It is *expected to fail* because the placeholder baseline will not match the script generated by the parser.
3535
```bash
3636
# Example filter for running a specific test
3737
dotnet test --filter "FullyQualifiedName~YourTestMethodName"
3838
```
39-
* **c. Update the Baseline Files**: Copy the "Actual" output from the test failure log. This is the correctly formatted script generated from the AST. Paste this content into all the baseline files you created in step 7a.
39+
* **c. Update the Baseline Files**: Copy the "Actual" output from the test failure log. This is the correctly formatted script generated from the AST. Paste this content into all the baseline files you created in step 8a.
4040
* **d. Re-run the Tests**: Run the same test command again. This time, the tests should pass, confirming that the generated script matches the new baseline.
4141
42+
9. **⚠️ CRITICAL: Run Full Test Suite**:
43+
* **Always run the complete test suite** to ensure your changes didn't break existing functionality:
44+
```bash
45+
dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug
46+
```
47+
* **Why this is critical**: Grammar changes can have unintended side effects on other parts of the parser. Shared grammar rules are used in multiple contexts.
48+
* **Common issues**: Modifying shared rules like `identifierColumnReferenceExpression` can cause other tests to fail because they now accept syntax that should be rejected.
49+
* **Solution**: If tests fail, create context-specific grammar rules instead of modifying shared ones.
50+
51+
By following these steps, you can ensure that new syntax is correctly parsed, represented in the AST, generated back into a script, and fully validated by the testing framework without breaking existing functionality.
52+
53+
## Special Case: Extending Grammar Rules from Literals to Expressions
54+
55+
A common type of bug involves extending existing grammar rules that only accept literal values (like integers or strings) to accept full expressions (parameters, variables, outer references, etc.). This pattern was used to fix the VECTOR_SEARCH TOP_N parameter issue.
56+
57+
### Example: VECTOR_SEARCH TOP_N Parameter Extension
58+
59+
**Problem**: VECTOR_SEARCH's TOP_N parameter only accepted integer literals (`TOP_N = 10`) but needed to support parameters (`TOP_N = @k`) and outer references (`TOP_N = outerref.col`).
60+
61+
**Solution Steps**:
62+
63+
1. **Update AST Definition** (`SqlScriptDom/Parser/TSql/Ast.xml`):
64+
```xml
65+
<!-- Before: -->
66+
<Member Name="TopN" Type="IntegerLiteral" Summary="..." />
67+
68+
<!-- After: -->
69+
<Member Name="TopN" Type="ScalarExpression" Summary="..." />
70+
```
71+
72+
2. **Update Grammar Rule** (`SqlScriptDom/Parser/TSql/TSql170.g`):
73+
```antlr
74+
// Before - Variable declaration:
75+
IntegerLiteral vTopN;
76+
77+
// After - Variable declaration:
78+
ScalarExpression vTopN;
79+
80+
// Before - Parsing rule:
81+
vTopN = integer
82+
83+
// After - Parsing rule:
84+
vTopN = expression
85+
```
86+
87+
3. **Script Generator**: Usually no changes needed if using `GenerateNameEqualsValue()` or similar generic methods that work with `TSqlFragment`.
88+
89+
4. **Test Cases**: Add tests covering the new expression types:
90+
```sql
91+
-- Parameter test
92+
TOP_N = @k
93+
94+
-- Outer reference test
95+
TOP_N = outerref.max_results
96+
```
97+
98+
### When to Apply This Pattern
99+
100+
Use this pattern when:
101+
- ✅ Existing grammar accepts only literal values (integer, string, etc.)
102+
- ✅ Users need to pass dynamic values (parameters, variables, computed expressions)
103+
- ✅ The SQL Server syntax actually supports expressions in that position
104+
- ✅ Backward compatibility must be maintained (literals still work)
105+
106+
### ⚠️ Critical Warning: Shared Grammar Rules
107+
108+
**DO NOT modify shared grammar rules** like `identifierColumnReferenceExpression` that are used throughout the codebase. This can cause:
109+
- ✅ Other tests to fail unexpectedly
110+
- ✅ Unintended syntax acceptance in different contexts
111+
- ✅ Breaking changes in existing functionality
112+
113+
**Instead, create specialized rules** for your specific use case:
114+
```antlr
115+
// ❌ WRONG: Modifying shared rule
116+
identifierColumnReferenceExpression: multiPartIdentifier[2] // Affects ALL usage
117+
118+
// ✅ CORRECT: Create specialized rule
119+
vectorSearchColumnReferenceExpression: multiPartIdentifier[2] // Only for VECTOR_SEARCH
120+
```
121+
122+
### Common Expression Types to Support
123+
124+
When extending from literals to expressions, consider supporting:
125+
- **Parameters**: `@parameter`
126+
- **Variables**: `@variable`
127+
- **Column references**: `table.column`
128+
- **Outer references**: `outerref.column`
129+
- **Function calls**: `FUNCTION(args)`
130+
- **Arithmetic expressions**: `value + 1`
131+
- **Case expressions**: `CASE WHEN ... END`
132+
133+
### Files Typically Modified
134+
135+
1. **`Ast.xml`**: Change member type from specific literal to `ScalarExpression`
136+
2. **`TSql*.g`**: Update variable declarations and parsing rules
137+
3. **Test files**: Add comprehensive test coverage
138+
4. **Script generators**: Usually no changes needed for well-designed generators
42139
By following these steps, you can ensure that new syntax is correctly parsed, represented in the AST, generated back into a script, and fully validated by the testing framework.
43140
44141
## Special Case: Parser Predicate Recognition Issues
Lines changed: 256 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,256 @@
1+
# Grammar Extension Patterns for SqlScriptDOM
2+
3+
This guide documents common patterns for extending the SqlScriptDOM parser grammar to support new syntax or enhance existing functionality.
4+
5+
## Pattern 1: Extending Literals to Expressions
6+
7+
### When to Use
8+
When existing grammar rules only accept literal values but need to support dynamic expressions like parameters, variables, or computed values.
9+
10+
### Example Problem
11+
Functions or constructs that currently accept only:
12+
- `IntegerLiteral` (e.g., `TOP_N = 10`)
13+
- `StringLiteral` (e.g., `VALUE = 'literal'`)
14+
15+
But need to support:
16+
- Parameters: `@parameter`
17+
- Variables: `@variable`
18+
- Column references: `table.column`
19+
- Outer references: `outerref.column`
20+
- Function calls: `FUNCTION(args)`
21+
- Computed expressions: `value + 1`
22+
23+
### ⚠️ Critical Warning: Avoid Modifying Shared Grammar Rules
24+
25+
**DO NOT** modify existing shared grammar rules like `identifierColumnReferenceExpression` that are used throughout the codebase. This can cause unintended side effects and break other functionality.
26+
27+
**Instead**, create specialized rules for your specific context.
28+
29+
### Solution Template
30+
31+
#### Step 1: Update AST Definition (`Ast.xml`)
32+
```xml
33+
<!-- Before: -->
34+
<Member Name="PropertyName" Type="IntegerLiteral" Summary="Description" />
35+
36+
<!-- After: -->
37+
<Member Name="PropertyName" Type="ScalarExpression" Summary="Description" />
38+
```
39+
40+
#### Step 2: Create Context-Specific Grammar Rule (`TSql*.g`)
41+
```antlr
42+
// Create a specialized rule for your context
43+
yourContextColumnReferenceExpression returns [ColumnReferenceExpression vResult = this.FragmentFactory.CreateFragment<ColumnReferenceExpression>()]
44+
{
45+
MultiPartIdentifier vMultiPartIdentifier;
46+
}
47+
:
48+
vMultiPartIdentifier=multiPartIdentifier[2] // Allows table.column syntax
49+
{
50+
vResult.ColumnType = ColumnType.Regular;
51+
vResult.MultiPartIdentifier = vMultiPartIdentifier;
52+
}
53+
;
54+
55+
// Use the specialized rule in your custom grammar
56+
yourContextParameterRule returns [ScalarExpression vResult]
57+
: vResult=signedInteger
58+
| vResult=variable
59+
| vResult=yourContextColumnReferenceExpression // Context-specific rule
60+
;
61+
```
62+
63+
#### Step 3: Verify Script Generator
64+
Most script generators using `GenerateNameEqualsValue()` or similar methods work automatically with `ScalarExpression`. No changes typically needed.
65+
66+
#### Step 4: Add Test Coverage
67+
```sql
68+
-- Test parameter
69+
FUNCTION_NAME(PARAM = @parameter)
70+
71+
-- Test outer reference
72+
FUNCTION_NAME(PARAM = outerref.column)
73+
74+
-- Test computed expression
75+
FUNCTION_NAME(PARAM = value + 1)
76+
```
77+
78+
### Real-World Example: VECTOR_SEARCH TOP_N
79+
80+
**Problem**: `VECTOR_SEARCH` TOP_N parameter only accepted integer literals.
81+
82+
**❌ Wrong Approach**: Modify `identifierColumnReferenceExpression` to use `multiPartIdentifier[2]`
83+
- **Result**: Broke `CreateIndexStatementErrorTest` because other grammar rules started accepting invalid syntax
84+
85+
**✅ Correct Approach**: Create `vectorSearchColumnReferenceExpression` specialized for VECTOR_SEARCH
86+
- **Result**: VECTOR_SEARCH supports multi-part identifiers without affecting other functionality
87+
88+
**Final Implementation**:
89+
```antlr
90+
signedIntegerOrVariableOrColumnReference returns [ScalarExpression vResult]
91+
: vResult=signedInteger
92+
| vResult=variable
93+
| vResult=vectorSearchColumnReferenceExpression // VECTOR_SEARCH-specific rule
94+
;
95+
96+
vectorSearchColumnReferenceExpression returns [ColumnReferenceExpression vResult = ...]
97+
:
98+
vMultiPartIdentifier=multiPartIdentifier[2] // Allows table.column syntax
99+
{
100+
vResult.ColumnType = ColumnType.Regular;
101+
vResult.MultiPartIdentifier = vMultiPartIdentifier;
102+
}
103+
;
104+
```
105+
106+
**Result**: Now supports dynamic TOP_N values:
107+
```sql
108+
-- Parameters
109+
VECTOR_SEARCH(..., TOP_N = @k) AS ann
110+
111+
-- Outer references
112+
VECTOR_SEARCH(..., TOP_N = outerref.max_results) AS ann
113+
```
114+
115+
## Pattern 2: Adding New Enum Members
116+
117+
### When to Use
118+
When adding new operators, keywords, or options to existing constructs.
119+
120+
### Solution Template
121+
122+
#### Step 1: Update Enum in AST (`Ast.xml`)
123+
```xml
124+
<Enum Name="ExistingEnumType">
125+
<Member Name="ExistingValue1" />
126+
<Member Name="ExistingValue2" />
127+
<Member Name="NewValue" /> <!-- Add this -->
128+
</Enum>
129+
```
130+
131+
#### Step 2: Update Grammar Rule (`TSql*.g`)
132+
```antlr
133+
// Add new token matching
134+
| tNewValue:Identifier
135+
{
136+
Match(tNewValue, CodeGenerationSupporter.NewValue);
137+
vResult.EnumProperty = ExistingEnumType.NewValue;
138+
}
139+
```
140+
141+
#### Step 3: Update Script Generator
142+
```csharp
143+
// Add mapping in appropriate generator file
144+
private static readonly Dictionary<EnumType, string> _enumGenerators =
145+
new Dictionary<EnumType, string>()
146+
{
147+
{ EnumType.ExistingValue1, CodeGenerationSupporter.ExistingValue1 },
148+
{ EnumType.ExistingValue2, CodeGenerationSupporter.ExistingValue2 },
149+
{ EnumType.NewValue, CodeGenerationSupporter.NewValue }, // Add this
150+
};
151+
```
152+
153+
## Pattern 3: Adding New Function or Statement
154+
155+
### When to Use
156+
When adding completely new T-SQL functions or statements.
157+
158+
### Solution Template
159+
160+
#### Step 1: Define AST Node (`Ast.xml`)
161+
```xml
162+
<Class Name="NewFunctionCall" Base="PrimaryExpression">
163+
<Member Name="Parameter1" Type="ScalarExpression" />
164+
<Member Name="Parameter2" Type="StringLiteral" />
165+
</Class>
166+
```
167+
168+
#### Step 2: Add Grammar Rule (`TSql*.g`)
169+
```antlr
170+
newFunctionCall returns [NewFunctionCall vResult = FragmentFactory.CreateFragment<NewFunctionCall>()]
171+
{
172+
ScalarExpression vParam1;
173+
StringLiteral vParam2;
174+
}
175+
:
176+
tFunction:Identifier LeftParenthesis
177+
{
178+
Match(tFunction, CodeGenerationSupporter.NewFunction);
179+
UpdateTokenInfo(vResult, tFunction);
180+
}
181+
vParam1 = expression
182+
{
183+
vResult.Parameter1 = vParam1;
184+
}
185+
Comma vParam2 = stringLiteral
186+
{
187+
vResult.Parameter2 = vParam2;
188+
}
189+
RightParenthesis
190+
;
191+
```
192+
193+
#### Step 3: Integrate with Existing Rules
194+
Add the new rule to appropriate places in the grammar (e.g., `functionCall`, `primaryExpression`, etc.).
195+
196+
#### Step 4: Create Script Generator
197+
```csharp
198+
public override void ExplicitVisit(NewFunctionCall node)
199+
{
200+
GenerateIdentifier(CodeGenerationSupporter.NewFunction);
201+
GenerateSymbol(TSqlTokenType.LeftParenthesis);
202+
GenerateFragmentIfNotNull(node.Parameter1);
203+
GenerateSymbol(TSqlTokenType.Comma);
204+
GenerateFragmentIfNotNull(node.Parameter2);
205+
GenerateSymbol(TSqlTokenType.RightParenthesis);
206+
}
207+
```
208+
209+
## Best Practices
210+
211+
### 1. Backward Compatibility
212+
- Always ensure existing syntax continues to work
213+
- Extend rather than replace existing rules
214+
- Test both old and new syntax
215+
216+
### 2. Testing Strategy
217+
- Add comprehensive test cases in `TestScripts/`
218+
- Update baseline files with expected output
219+
- Test edge cases and error conditions
220+
221+
### 3. Documentation
222+
- Update grammar comments with new syntax
223+
- Add examples in code comments
224+
- Document any limitations or requirements
225+
226+
### 4. Version Targeting
227+
- Add new features to the appropriate SQL Server version grammar
228+
- Consider whether feature should be backported to earlier versions
229+
- Update all relevant grammar files if syntax is version-independent
230+
231+
## Common Pitfalls
232+
233+
### 1. Forgetting Script Generator Updates
234+
- Grammar changes often require corresponding script generator changes
235+
- Test the round-trip: parse → generate → parse again
236+
237+
### 2. Incomplete Test Coverage
238+
- Test all supported expression types when extending to `ScalarExpression`
239+
- Include error cases and boundary conditions
240+
241+
### 3. Missing Version Updates
242+
- New syntax should be added to all relevant grammar versions
243+
- Consider SQL Server version compatibility
244+
245+
### 4. AST Design Issues
246+
- Choose appropriate base classes for new AST nodes
247+
- Consider reusing existing AST patterns where possible
248+
- Ensure proper inheritance hierarchy
249+
250+
## Reference Examples
251+
252+
- **VECTOR_SEARCH TOP_N Extension**: Literal to expression pattern
253+
- **REGEXP_LIKE Predicate**: Boolean parentheses recognition pattern
254+
- **EVENT SESSION Predicates**: Function-style vs operator-style predicates
255+
256+
For detailed step-by-step examples, see [BUG_FIXING_GUIDE.md](BUG_FIXING_GUIDE.md).

0 commit comments

Comments
 (0)