-
Notifications
You must be signed in to change notification settings - Fork 181
Use Calcite's validation system for type checking & coercion #4892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughIntroduces a Calcite-based validation pipeline with implicit type coercion for PPL: converts RelNode→SqlNode, validates with a new PplValidator and type coercion, then SqlNode→RelNode. Adds dialect, convertlet table, converters, shuttles, operator table provider, and type utilities. Updates UDF operand metadata, removes legacy coercion/type-checkers, and revises tests/docs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant QueryService
participant RelToSqlRel as PplRelToSqlRelShuttle
participant RelToSql as PplRelToSqlNodeConverter
participant Validator as PplValidator(+PplTypeCoercion)
participant SqlRewrite as SqlRewriteShuttle
participant SqlToRel as PplSqlToRelConverter
participant Optimizer
Client->>QueryService: executeWithCalcite(RelNode)
QueryService->>RelToSqlRel: prepare RelNode (normalize literals/intervals)
RelToSqlRel-->>QueryService: RelNode'
QueryService->>RelToSql: convert RelNode' -> SqlNode
RelToSql-->>QueryService: SqlNode
QueryService->>SqlRewrite: rewrite (COUNT()->COUNT(*), IN tuple, unqualify)
SqlRewrite-->>QueryService: SqlNode'
QueryService->>Validator: validate(SqlNode') with convertlets/dialect
Validator-->>QueryService: Validated SqlNode'' (implicit CASTs applied)
QueryService->>SqlToRel: convert back to RelNode (SEMI/ANTI join fixups)
SqlToRel-->>QueryService: RelNode''
QueryService->>Optimizer: optimize(RelNode'')
Optimizer-->>QueryService: Physical plan
QueryService-->>Client: Result/plan
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
e085f81 to
fc6dd27
Compare
…checking Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # core/src/main/java/org/opensearch/sql/executor/QueryService.java # Conflicts: # core/src/main/java/org/opensearch/sql/executor/QueryService.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… logics Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- 2 more ITs passed in PPLBuiltinFunctionIT Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- this fix testRand, where desrialization of sarg does not restore its type - todo: update the toRex in ExtendedRelJson to the align with the latest version Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…estamp; (time, timestamp) -> timestamp (1240/1599) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…2/1872) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- allow type cast - rewrite call to sql compare to custom ip comapre Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…1356/1599 | 1476/1915) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…d in mvindex's implementation (1580/2015) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…iting (1579/2015) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…pe hint Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…e inference (1701/2015) Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
| // TODO: Refactor this match method | ||
| public boolean match(FunctionName functionName, List<RelDataType> argTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does match() method is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's no longer used. I removed it.
| * internally. | ||
| */ | ||
| private final ImmutableMap<BuiltinFunctionName, List<Pair<CalciteFuncSignature, FunctionImp>>> | ||
| private final ImmutableMap<BuiltinFunctionName, Pair<CalciteFuncSignature, FunctionImp>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a FunctionRegistry required? It is needed because CalciteRelNodeVisitor must resolve functions during analysis and translation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still used when building logical plans from PPL's AST. yes, it's used in CalviteRelNodeVisitor. Now it's only name-based resolve
| import org.opensearch.sql.calcite.utils.OpenSearchTypeFactory; | ||
|
|
||
| @UtilityClass | ||
| public class ValidationUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add comments/javadoc if some function is copy from Calcite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I annotated every method I copied from Calcite's implementation
| * }</pre> | ||
| */ | ||
| @Override | ||
| protected Result visitAntiOrSemiJoin(Join e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add qualifier can also solve the issue? e.g. t0.name. Then we can avoid the ANTI JOIN visit in PplSqlToRelConverter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding a qualifier does solve the problem.
RCA
I investigated this query: source = opensearch-sql_test_index_state_country | where country = 'Canada' OR country = 'England' | left semi join left=a, right=b ON a.name = b.name opensearch-sql_test_index_occupation | sort a.age. It's original logical plan from CalciteRelNodeVisitor is:
LogicalSort(sort0=[$5], dir0=[ASC-nulls-first])
LogicalJoin(condition=[=($0, $6)], joinType=[semi])
LogicalProject(name=[$0], country=[$1], state=[$2], month=[$3], year=[$4], age=[$5])
LogicalFilter(condition=[SEARCH($1, Sarg['Canada':VARCHAR, 'England':VARCHAR]:VARCHAR)])
CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_state_country]])
LogicalProject(name=[$0], country=[$1], occupation=[$2], month=[$3], salary=[$4], year=[$5])
CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_occupation]])
Its SQL is however
SELECT `name`, `country`, `state`, `month`, `year`, `age`
FROM `OpenSearch`.`opensearch-sql_test_index_state_country` AS `t0`
WHERE `country` IN ('Canada', 'England') AND EXISTS (SELECT 1
FROM (SELECT `name`, `country`, `occupation`, `month`, `salary`, `year`
FROM `OpenSearch`.`opensearch-sql_test_index_occupation`) AS `t1`
WHERE `name` = `t1`.`name`) -- left `name` is unqualified
ORDER BY `age`Why is the first name unqualified
In RelToSqlConverter, the condition is initially created with qualifier WHERE t0.name = t1.name:
SqlNode sqlCondition =
convertConditionToSqlNode(e.getCondition(), leftContext, rightContext);The following lines removes the qualifier because it's attempting to create a correlated subquery, and in standard SQL, outer query references in correlated subqueries are typically unqualified.
if (leftResult.neededAlias != null) {
SqlShuttle visitor =
new AliasReplacementShuttle(leftResult.neededAlias,
e.getLeft().getRowType(), sqlSelect.getSelectList());
sqlCondition = sqlCondition.accept(visitor);
}The visitor replaces the identifiers whose first qualifier equals to leftResult.neededAlias:
@Override public SqlNode visit(SqlIdentifier id) {
...
if (tableAlias.equals(id.names.get(0))) {
int index =
requireNonNull(tableType.getField(id.names.get(1), false, false),
() -> "field " + id.names.get(1) + " is not found in "
+ tableType)
.getIndex();
SqlNode selectItem = source.get(index);
if (selectItem.getKind() == SqlKind.AS) {
selectItem = ((SqlCall) selectItem).operand(0);
}
return selectItem.clone(id.getParserPosition());
...
}This replacement shuttle is there because the alias of the outer query t0 does not exist under some conditions. However, SQL's name resolution rules resolve the unqualified name to the innermost scope first, and there happen to exist such name in the innermost scope, so the name resolution is falsely resolved to t1.name == t1.name in the end.
Resolution
It can be resolved by either removing the replacement shuttle or enabling ANTI / SEMI join syntax. But the former solution risks encountering an non-existing t0 alias if the following condition is not met:
if (leftResult.neededAlias != null && sqlSelect.getFrom() != null) {
sqlSelect.setFrom(as(sqlSelect.getFrom(), leftResult.neededAlias));
}(Note: the above condition is updated by CALCITE-3810: Render ANTI and SEMI join to NOT EXISTS and EXISTS in the JDBC adapter. The latest update may have resolved this problem, but the update is not included in Calcite 1.41)
I think adopting the second solution, enabling ANTI/SEMI join, is better:
-
It's safer because we will not run into trouble when the above condition is not met.
-
Spark SQL supports ANTI and SEMI JOIN
-
As we are creating ANTI / SEMI JOIN in CalciteRelNodeVisitor already, it will be more consistent if the logical plan translated back also contains them.
sql/core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Lines 1288 to 1293 in d0195cb
if (node.getJoinType() == SEMI || node.getJoinType() == ANTI) { // semi and anti join only return left table outputs context.relBuilder.join( JoinAndLookupUtils.translateJoinType(node.getJoinType()), joinCondition); return context.relBuilder.peek(); }
| * }</pre> | ||
| */ | ||
| @Override | ||
| protected Result visitAntiOrSemiJoin(Join e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add qualifier can also solve the issue? e.g. t0.name. Then we can avoid the ANTI JOIN visit in PplSqlToRelConverter?
| @Override | ||
| public SqlKind getKind() { | ||
| return SqlKind.DIVIDE; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to enable operator-dependent type derivation.
| * internally. | ||
| */ | ||
| private final ImmutableMap<BuiltinFunctionName, List<Pair<CalciteFuncSignature, FunctionImp>>> | ||
| private final ImmutableMap<BuiltinFunctionName, Pair<CalciteFuncSignature, FunctionImp>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still used when building logical plans from PPL's AST. yes, it's used in CalviteRelNodeVisitor. Now it's only name-based resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
402-429: Document the intentional re-registration design or prevent duplicate registrations at initialization.The code logs warnings when functions are registered multiple times but continues with silent overwrites. Since
OpenSearchExecutionEngineis instantiated per-request and callsregisterOpenSearchFunctions()in its constructor, external functions (GEOIP, DISTINCT_COUNT_APPROX) are re-registered on every request, generating repeated warnings and making actual configuration errors difficult to detect.Either:
- Document that re-registration is intentional for plugin override scenarios
- Ensure functions are registered only once during startup (e.g., in a static initializer or plugin loader rather than per-request)
- Make duplicate registrations an error to fail fast on configuration mistakes
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
💤 Files with no reviewable changes (1)
- core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/executor/QueryService.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (1)
PPLOperandTypes(20-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (14)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (2)
9-62: LGTM: Imports align with validation pipeline.The new imports for Calcite's validation infrastructure (SqlValidator, SqlNode, SqlToRelConverter, etc.) and PPL-specific validation components are appropriate for the validation pipeline being introduced.
124-125: LGTM: Clean validation integration.The validation step is consistently integrated into both execute and explain paths, positioned logically between filter merging and optimization.
Also applies to: 166-167
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (4)
1-58: LGTM: Well-structured class declaration.The class follows proper patterns with a factory method, extends the appropriate Calcite base class, and includes clear documentation.
81-90: LGTM: Proper UDT handling.The override correctly identifies cases where UDTs require casting from character types, addressing a gap in the base implementation.
92-141: LGTM: Datetime and IP comparison handling.The custom comparison type resolution correctly handles datetime cross-type comparisons and IP-to-string comparisons as described in the PR objectives. The nullability is properly preserved.
Note: The previous review comment about using
&&instead of&on line 120 has been addressed.
143-181: LGTM: Robust operand coercion.The method properly handles dynamic parameters, nullability synchronization, and uses SAFE_CAST to avoid malformed data errors as mentioned in the PR objectives.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (8)
256-256: LGTM: Import additions are appropriate.The new imports (
LocaleandOpenSearchTypeUtil) are properly used in the code.Localeis used for internationalized string formatting, andOpenSearchTypeUtilsupports the runtime type detection for ADD operations.Also applies to: 284-284
359-394: LGTM: Registry simplification aligns with Calcite validation migration.The migration from signature-based function lists to single-implementation-per-function simplifies the registry structure. Thread safety is maintained: immutable maps for internal registries and
ConcurrentHashMapfor external ones.
431-478: LGTM: Resolution logic correctly simplified.The resolution logic is now straightforward—direct name-based lookup and invocation. Type checking has been migrated to Calcite's validation phase (before resolution), eliminating the need for signature matching loops. The
compulsoryCastmethod remains as a documented ad-hoc workaround for the REDUCE function's specific casting requirements.
501-560: LGTM: Type checker migration to Calcite completed correctly.The migration from
PPLTypeCheckertoSqlOperandTypeCheckeris properly implemented. TheextractTypeCheckerFromUDFhelper correctly handles both user-defined and built-in operators. Note that some registrations passnullas the type checker (e.g., forJSON_ARRAY,JSON_OBJECT,TYPEOF), which is acceptable for functions with custom validation logic.
562-703: LGTM: Standard operator registration follows correct patterns.The registration of comparison and standard operators correctly uses Calcite's type checkers. The
SUBTRACToperator properly supports both numeric and datetime operands (for timestamp binning). The use of composite type checkers with.or()follows Calcite conventions.
811-867: LGTM: Function implementations handle edge cases correctly.The
TOSTRING,MVJOIN,SPLIT, andMVINDEXimplementations properly handle their respective requirements:
TOSTRINGsupports both 1-arg (cast) and 2-arg (format) formsSPLITcorrectly handles the empty delimiter edge case usingREGEXP_EXTRACT_ALLMVINDEXsupports both 2-arg and 3-arg variants with appropriate type checking
926-1228: LGTM: Remaining function registrations and builder classes are consistent.The rest of the
populate()method and builder classes follow consistent patterns:
- Type checkers properly use Calcite's
OperandTypesandSqlTypeFamilyIS_NOT_NULL/IS_NULLcorrectly useIGNOREfamily to support struct types (good fix)- Internal builder throws on duplicates (deterministic), while external registry warns (flexible for plugins)—this distinction is appropriate
- Aggregate function registrations consistently use
SqlOperandTypeChecker
914-925: Document the ADD operator's dual-behavior type checking pattern.The ADD operator uses
SqlStdOperatorTable.PLUS.getOperandTypeChecker()but implements runtime type detection to choose between CONCAT (for strings) and PLUS (for numbers). While the inline comment explains this is intentional to avoid confusion during function resolution, add a Javadoc comment or expanded code comment explaining:
- Why PLUS's type checker is used despite ADD accepting character types
- Whether Calcite's validation phase permits string operands to reach this runtime resolution, or if there's custom validator logic that handles this case
- Reference this as a Calcite-specific workaround pattern for dual-behavior operators
| @Override | ||
| public @Nullable RelDataType implicitCast(RelDataType in, SqlTypeFamily expected) { | ||
| RelDataType casted = super.implicitCast(in, expected); | ||
| if (casted == null) { | ||
| // String -> DATETIME is converted to String -> TIMESTAMP | ||
| if (OpenSearchTypeUtil.isCharacter(in) && expected == SqlTypeFamily.DATETIME) { | ||
| return createUDTWithAttributes(factory, in, OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP); | ||
| } | ||
| return null; | ||
| } | ||
| return switch (casted.getSqlTypeName()) { | ||
| case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY -> | ||
| createUDTWithAttributes(factory, in, casted.getSqlTypeName()); | ||
| default -> casted; | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace switch expression with Java 11 compatible syntax.
The switch expression on lines 70-74 uses Java 14+ syntax, but the coding guidelines require Java 11 compatibility for OpenSearch 2.x.
🔎 Proposed fix using traditional switch statement
- return switch (casted.getSqlTypeName()) {
- case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY ->
- createUDTWithAttributes(factory, in, casted.getSqlTypeName());
- default -> casted;
- };
+ switch (casted.getSqlTypeName()) {
+ case DATE:
+ case TIME:
+ case TIMESTAMP:
+ case BINARY:
+ return createUDTWithAttributes(factory, in, casted.getSqlTypeName());
+ default:
+ return casted;
+ }As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
around lines 60 to 75, replace the Java 14+ switch expression with a Java
11-compatible traditional switch statement: compute the result into a local
RelDataType variable (e.g., result), use a traditional
switch(casted.getSqlTypeName()) with case labels for DATE, TIME, TIMESTAMP,
BINARY that set result = createUDTWithAttributes(factory, in,
casted.getSqlTypeName()); and a default that sets result = casted; then return
result; ensuring all code paths assign result before the return.
| private static SqlNode castTo(SqlNode node, RelDataType type) { | ||
| if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { | ||
| ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); | ||
| return switch (exprType) { | ||
| case ExprCoreType.DATE -> | ||
| PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); | ||
| case ExprCoreType.TIMESTAMP -> | ||
| PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); | ||
| case ExprCoreType.TIME -> | ||
| PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); | ||
| case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); | ||
| default -> throw new UnsupportedOperationException("Unsupported type: " + exprType); | ||
| }; | ||
| } | ||
| // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted | ||
| return SqlLibraryOperators.SAFE_CAST.createCall( | ||
| node.getParserPosition(), | ||
| node, | ||
| SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace switch expression with Java 11 compatible syntax.
The switch expression on lines 186-195 uses Java 14+ syntax. This needs to be converted for Java 11 compatibility.
🔎 Proposed fix using traditional switch statement
private static SqlNode castTo(SqlNode node, RelDataType type) {
if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) {
ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type);
- return switch (exprType) {
- case ExprCoreType.DATE ->
- PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
- case ExprCoreType.TIMESTAMP ->
- PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
- case ExprCoreType.TIME ->
- PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
- case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
- default -> throw new UnsupportedOperationException("Unsupported type: " + exprType);
- };
+ if (exprType == ExprCoreType.DATE) {
+ return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
+ } else if (exprType == ExprCoreType.TIMESTAMP) {
+ return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
+ } else if (exprType == ExprCoreType.TIME) {
+ return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
+ } else if (exprType == ExprCoreType.IP) {
+ return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
+ } else {
+ throw new UnsupportedOperationException("Unsupported type: " + exprType);
+ }
}As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private static SqlNode castTo(SqlNode node, RelDataType type) { | |
| if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { | |
| ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); | |
| return switch (exprType) { | |
| case ExprCoreType.DATE -> | |
| PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); | |
| case ExprCoreType.TIMESTAMP -> | |
| PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); | |
| case ExprCoreType.TIME -> | |
| PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); | |
| case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); | |
| default -> throw new UnsupportedOperationException("Unsupported type: " + exprType); | |
| }; | |
| } | |
| // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted | |
| return SqlLibraryOperators.SAFE_CAST.createCall( | |
| node.getParserPosition(), | |
| node, | |
| SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); | |
| } | |
| private static SqlNode castTo(SqlNode node, RelDataType type) { | |
| if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { | |
| ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); | |
| if (exprType == ExprCoreType.DATE) { | |
| return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); | |
| } else if (exprType == ExprCoreType.TIMESTAMP) { | |
| return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); | |
| } else if (exprType == ExprCoreType.TIME) { | |
| return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); | |
| } else if (exprType == ExprCoreType.IP) { | |
| return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); | |
| } else { | |
| throw new UnsupportedOperationException("Unsupported type: " + exprType); | |
| } | |
| } | |
| // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted | |
| return SqlLibraryOperators.SAFE_CAST.createCall( | |
| node.getParserPosition(), | |
| node, | |
| SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); | |
| } |
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
around lines 183 to 202, the code uses a Java 14+ switch expression for mapping
ExprType to SqlNode; replace it with a Java 11-compatible traditional switch
statement: declare a SqlNode result variable, switch on exprType with case
labels that assign result = PPLBuiltinOperators.X.createCall(...); include a
default case that throws the same UnsupportedOperationException, then break each
case and after the switch return result; keep the existing parser position and
node arguments and preserve surrounding logic (including the SAFE_CAST return
for non-datetime/ip types).
| /** | ||
| * Validates a RelNode by converting it to SqlNode, performing validation, and converting back. | ||
| * | ||
| * <p>This process enables Calcite's type validation and implicit casting mechanisms to work on | ||
| * PPL queries. | ||
| * | ||
| * @param relNode the relation node to validate | ||
| * @param context the Calcite plan context containing the validator | ||
| * @return the validated (and potentially modified) relation node | ||
| */ | ||
| private RelNode validate(RelNode relNode, CalcitePlanContext context) { | ||
| SkipRelValidationShuttle skipShuttle = new SkipRelValidationShuttle(); | ||
| relNode.accept(skipShuttle); | ||
| if (skipShuttle.shouldSkipValidation()) { | ||
| return relNode; | ||
| } | ||
| // Fix interval literals before conversion to SQL | ||
| RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, true)); | ||
|
|
||
| // Convert RelNode to SqlNode for validation | ||
| RelToSqlConverter rel2sql = new PplRelToSqlNodeConverter(OpenSearchSparkSqlDialect.DEFAULT); | ||
| SqlImplementor.Result result = rel2sql.visitRoot(sqlRelNode); | ||
| SqlNode root = result.asStatement(); | ||
|
|
||
| // Rewrite SqlNode to remove database qualifiers | ||
| SqlNode rewritten = root.accept(new SqlRewriteShuttle()); | ||
| SqlValidator validator = context.getValidator(); | ||
| try { | ||
| validator.validate(Objects.requireNonNull(rewritten)); | ||
| } catch (CalciteContextException e) { | ||
| if (ValidationUtils.tolerantValidationException(e)) { | ||
| return relNode; | ||
| } | ||
| throw new ExpressionEvaluationException(e.getMessage(), e); | ||
| } | ||
|
|
||
| SqlToRelConverter.Config sql2relConfig = | ||
| SqlToRelConverter.config() | ||
| // Do not remove sort in subqueries so that the orders for queries like `... | sort a | ||
| // | fields b` is preserved | ||
| .withRemoveSortInSubQuery(false) | ||
| // Disable automatic JSON_TYPE_OPERATOR wrapping for nested JSON functions. | ||
| // See CALCITE-4989: Calcite wraps nested JSON functions with JSON_TYPE by default | ||
| .withAddJsonTypeOperatorEnabled(false) | ||
| // Set hint strategy so that hints can be properly propagated. | ||
| // See SqlToRelConverter.java#convertSelectImpl | ||
| .withHintStrategyTable(PPLHintStrategyTable.getHintStrategyTable()); | ||
| SqlToRelConverter sql2rel = | ||
| new PplSqlToRelConverter( | ||
| context.config.getViewExpander(), | ||
| validator, | ||
| validator.getCatalogReader().unwrap(CalciteCatalogReader.class), | ||
| context.relBuilder.getCluster(), | ||
| PplConvertletTable.INSTANCE, | ||
| sql2relConfig); | ||
| // Convert the validated SqlNode back to RelNode | ||
| RelNode validatedRel = sql2rel.convertQuery(rewritten, false, true).project(); | ||
| return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, false)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find ValidationUtils.java file
find . -name "ValidationUtils.java" -type f 2>/dev/null
# Search for tolerantValidationException method usage
rg -n "tolerantValidationException" --type=java -B2 -A5Repository: opensearch-project/sql
Length of output: 1922
🏁 Script executed:
#!/bin/bash
# List all Java files in executor module that might contain ValidationUtils
fd "ValidationUtils" . --type f --extension java
# Also search for the method definition and all its usages
rg -n "tolerantValidationException\s*\(" --type=java -A3 -B2Repository: opensearch-project/sql
Length of output: 1782
🏁 Script executed:
#!/bin/bash
# Find all Validation-related classes
fd -i "validation" . --type f --extension java | head -20
# Search for CalciteContextException handling
rg "CalciteContextException" --type=java -n -B3 -A5
# Look for logging patterns in the validate method
rg -n "log\." core/src/main/java/org/opensearch/sql/executor/QueryService.java 2>/dev/null || echo "File check"Repository: opensearch-project/sql
Length of output: 9799
🏁 Script executed:
#!/bin/bash
# Find the QueryService.java file and examine the validate method context
rg -n "class QueryService" --type=java -A200 | grep -A50 "private RelNode validate"
# Find ValidationUtils and tolerantValidationException
rg -rn "class ValidationUtils" --type=java
# Check for any logging configuration related to validation
rg -n "logger\|log\.info\|log\.debug" core/src/main/java/org/opensearch/sql/executor/QueryService.javaRepository: opensearch-project/sql
Length of output: 152
🏁 Script executed:
#!/bin/bash
# Search for ValidationUtils implementation
find . -path "*/src/*" -name "*.java" -exec grep -l "ValidationUtils" {} \;
# Get the file location and content
rg -n "class ValidationUtils|static.*tolerantValidationException" --type=java --max-count=5Repository: opensearch-project/sql
Length of output: 1172
🏁 Script executed:
#!/bin/bash
# Find all files mentioning ValidationUtils
rg -l "ValidationUtils" --type=java | head -10
# For each file, search for the method
for file in $(rg -l "ValidationUtils" --type=java | head -5); do
echo "=== File: $file ==="
rg -A10 "tolerantValidationException" "$file"
doneRepository: opensearch-project/sql
Length of output: 1416
🏁 Script executed:
#!/bin/bash
# Find where QueryService.java is and look at the surrounding code
find . -name "QueryService.java" -type f 2>/dev/null | head -1 | xargs -I {} bash -c 'wc -l {} && echo "File found" && grep -n "tolerantValidationException" {}'
# Search for ValidationUtils definition
rg -n "class ValidationUtils" --type=java
rg -n "static.*boolean.*tolerantValidationException" --type=javaRepository: opensearch-project/sql
Length of output: 468
🏁 Script executed:
#!/bin/bash
# Find ValidationUtils class
rg -l "class ValidationUtils" --type=java
# Show the full implementation
rg -A 30 "static.*tolerantValidationException" --type=javaRepository: opensearch-project/sql
Length of output: 1202
🏁 Script executed:
#!/bin/bash
# Get the complete ValidationUtils.java file
cat core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.javaRepository: opensearch-project/sql
Length of output: 7184
🏁 Script executed:
#!/bin/bash
# Get more context around the tolerantValidationException method
head -170 core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java | tail -30Repository: opensearch-project/sql
Length of output: 1190
Add logging when validation is skipped for known Calcite compatibility issues.
The tolerant validation exception handling is well-documented and properly scoped to specific Calcite bugs (aggregate nesting and windowed aggregates in GROUP BY). However, when the fallback to the original RelNode occurs at line 325, there is no logging to track when queries bypass validation. Add a debug or warn log statement to improve observability and aid troubleshooting when these edge cases are encountered.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/executor/QueryService.java around lines
294 to 352, when validator.validate(...) throws a tolerant Calcite exception the
code currently returns the original relNode without any observability; add a log
entry (debug or warn based on severity policy) inside the catch branch where
ValidationUtils.tolerantValidationException(e) is true before returning relNode,
including the exception message and context (e.g., which validation rule was
tolerated and a short identifier of the relNode or query) so operators can see
when validation was skipped.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @yuancu. * #4892 (comment) The following files were modified: * `core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java` * `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` * `core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java` * `core/src/main/java/org/opensearch/sql/calcite/ExtendedRexBuilder.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/binning/BinnableField.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/SqlOperatorTableProvider.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java` * `core/src/main/java/org/opensearch/sql/executor/QueryService.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ExistsFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ForallFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVAppendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapAppendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapRemoveFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java` * `core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java` * `core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java` * `core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonAppendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonKeysFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/DivideFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMaxFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMinFunction.java` * `core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/clickbench/PPLClickBenchIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendcolIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLExplainIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/ppl/DateTimeFunctionIT.java` * `integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (2)
60-75: Replace switch expression with Java 11 compatible syntax.The switch expression on lines 70-74 uses Java 14+ syntax, but the coding guidelines require Java 11 compatibility for OpenSearch 2.x. This issue was previously flagged but has not yet been addressed.
🔎 Proposed fix using traditional switch statement
@Override public @Nullable RelDataType implicitCast(RelDataType in, SqlTypeFamily expected) { RelDataType casted = super.implicitCast(in, expected); if (casted == null) { // String -> DATETIME is converted to String -> TIMESTAMP if (OpenSearchTypeUtil.isCharacter(in) && expected == SqlTypeFamily.DATETIME) { return createUDTWithAttributes(factory, in, OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP); } return null; } - return switch (casted.getSqlTypeName()) { - case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY -> - createUDTWithAttributes(factory, in, casted.getSqlTypeName()); - default -> casted; - }; + switch (casted.getSqlTypeName()) { + case DATE: + case TIME: + case TIMESTAMP: + case BINARY: + return createUDTWithAttributes(factory, in, casted.getSqlTypeName()); + default: + return casted; + } }As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
183-202: Replace switch expression with Java 11 compatible syntax.The switch expression on lines 186-195 uses Java 14+ syntax. This needs to be converted for Java 11 compatibility. This issue was previously flagged but has not yet been addressed.
🔎 Proposed fix using if-else chain
private static SqlNode castTo(SqlNode node, RelDataType type) { if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); - return switch (exprType) { - case ExprCoreType.DATE -> - PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); - case ExprCoreType.TIMESTAMP -> - PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); - case ExprCoreType.TIME -> - PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); - case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); - default -> throw new UnsupportedOperationException("Unsupported type: " + exprType); - }; + if (exprType == ExprCoreType.DATE) { + return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); + } else if (exprType == ExprCoreType.TIMESTAMP) { + return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); + } else if (exprType == ExprCoreType.TIME) { + return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); + } else if (exprType == ExprCoreType.IP) { + return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); + } else { + throw new UnsupportedOperationException("Unsupported type: " + exprType); + } } // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted return SqlLibraryOperators.SAFE_CAST.createCall( node.getParserPosition(), node, SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); }As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (1)
56-58: Consider adding JavaDoc to the public constructor.While the factory method above has good documentation, coding guidelines require JavaDoc on all public methods. A brief description would improve API documentation completeness.
📝 Suggested JavaDoc
+ /** + * Constructs a PplTypeCoercion instance. + * + * @param typeFactory the type factory + * @param validator the SQL validator + */ public PplTypeCoercion(RelDataTypeFactory typeFactory, SqlValidator validator) { super(typeFactory, validator); }As per coding guidelines, all public classes and methods must have proper JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
402-408: Consider making duplicate registration warnings more actionable.The duplicate registration warning is helpful, but could guide developers on next steps.
🔎 Suggested enhancement to warning message
if (externalFunctionRegistry.containsKey(functionName)) { logger.warn( - String.format(Locale.ROOT, "Function %s is registered multiple times", functionName)); + String.format( + Locale.ROOT, + "Function %s is already registered and will be overwritten. Consider using a unique function name.", + functionName)); }
914-925: Improve comment clarity for the ADD operator runtime selection logic.The dynamic selection between CONCAT and PLUS based on operand types should be clarified. Since
RexNode.getType()is guaranteed to return non-null (per Calcite contract), the current implementation is safe without additional filtering. However, a comment explaining the runtime selection rationale would improve maintainability.Consider expanding the comment to explain that CONCAT is selected when all operands are character types, otherwise PLUS is used for numeric addition. This clarifies the intent for future maintainers.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.javacore/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
💤 Files with no reviewable changes (1)
- core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
🚧 Files skipped from review as they are similar to previous changes (1)
- core/src/main/java/org/opensearch/sql/executor/QueryService.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.javacore/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.javacore/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧬 Code graph analysis (3)
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
PPLFuncImpTable(291-1269)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (2)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
OpenSearchTypeFactory(62-394)core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
PplTypeCoercionRule(45-107)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java (1)
UserDefinedFunctionUtils(52-320)core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (1)
PPLOperandTypes(20-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (19)
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java (5)
19-28: LGTM! Simplified registry access.The change from
Pair<?, AggHandler>to directAggHandlerstorage simplifies the API and eliminates unnecessary wrapping. The reflection-based approach is appropriate for test utilities that need to verify internal registry state.
30-39: Consistent with simplified registry type.The method correctly uses the new
Map<BuiltinFunctionName, AggHandler>type and directly accesses the handler without Pair extraction. The assertions remain appropriate and meaningful.
47-54: Direct handler access is cleaner.Line 50 demonstrates the benefit of the simplified registry—direct assignment of
AggHandlerwithout needing to extract from aPair. This makes the code more straightforward and easier to understand.
56-70: Consistent type updates.Both methods correctly use the updated
Map<BuiltinFunctionName, AggHandler>type. Since these methods primarily operate on registry keys and size, the value type simplification has minimal impact while maintaining consistency across the test base.
72-98: Unaffected helper methods remain correct.These utility methods operate independently of the aggregate function registry and are unaffected by the type simplification. They continue to provide useful validation for function name resolution and enum mapping.
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (5)
36-54: LGTM!The class structure and factory method are well-designed. The JavaDoc clearly explains the purpose and the blacklist approach for PPL-specific type coercion.
77-90: LGTM!The special handling for user-defined types (UDTs) is well-documented and necessary to prevent UDTs from being incorrectly treated as character types during cast validation.
92-111: LGTM!The datetime-string equality handling is well-implemented, correctly managing bidirectional comparisons and preserving UDT types when appropriate.
113-141: LGTM!The common type resolution for binary comparisons correctly implements PPL-specific rules:
- Date/time combinations coerce to timestamp
- IP/string comparisons coerce to IP
- Nullability is properly preserved
143-181: LGTM!The operand coercion logic is well-structured with proper handling of:
- Dynamic parameters (correctly skipped)
- Attribute synchronization
- SAFE_CAST nullability propagation
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (9)
359-394: Registry simplification aligns well with Calcite validation migration.The migration from
List<Pair<CalciteFuncSignature, FunctionImp>>to singleFunctionImpper function name is a significant architectural improvement that complements the new Calcite-based type checking and coercion. The simplified initialization logic usingputAllis cleaner and more maintainable.Based on learnings, this follows the pattern of leveraging Calcite's built-in mechanisms rather than maintaining custom parallel systems.
460-478: Simplified resolution logic is cleaner and more performant.The removal of signature list iteration and coercion logic at this layer makes sense given that type checking and coercion are now handled by Calcite's validation pipeline. The direct implementation lookup is more efficient.
504-560: Type checker migration to SqlOperandTypeChecker is well-structured.The updated signatures using
SqlOperandTypeCheckerand the helper methodextractTypeCheckerFromUDFprovide clean integration with Calcite's type system. The direct use ofOperandTypes.family()inregisterDivideFunctionis more maintainable than wrapped custom type checkers.
813-821: TOSTRING implementation correctly handles both signatures.The inlined implementation cleanly handles the 1-argument case (direct cast to VARCHAR) and 2-argument case (delegates to PPLBuiltinOperators.TOSTRING). The type checker using
.or()correctly validates both signatures.
1082-1110: Builder simplifications are consistent with the single-implementation model.The Builder and AggBuilder now enforce single registration per function name, which is appropriate for the new architecture. The exception thrown on duplicate internal registration (line 1091) correctly prevents configuration errors during initialization.
1137-1191: Aggregate function registrations correctly use type checkers.The aggregate registrations properly extract type checkers from UDFs using
extractTypeCheckerFromUDF()and use appropriateOperandTypesfor built-in aggregates. The MEDIAN implementation's reuse of PERCENTILE_APPROX with a fixed percentile value is elegant.
1246-1268: Excellent JavaDoc for extractTypeCheckerFromUDF.The comprehensive JavaDoc clearly explains the purpose of unwrapping
UDFOperandMetadatato extract the underlyingSqlOperandTypeCheckerand references the relevantPPLOperandTypesclass. This documentation is valuable given the method's increased usage across the refactored registration logic.
994-1009: The SUBSTRING and SUBSTR type checkers are correctly implemented and documented.Both functions properly register with custom type checkers using
OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER).or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER, SqlTypeFamily.INTEGER))to support both 2-arg and 3-arg variants. This aligns with Calcite's SUBSTRING(string FROM start [FOR length]) pattern and matches OpenSearch PPL's substring(str, start) or substring(str, start, length) syntax. The inline comment in the code already documents that manual type checker creation is necessary sinceSqlStdOperatorTable.SUBSTRING.getOperandTypeChecker()is null. Test coverage exists for both variants (e.g., AstBuilderTest.java validates 3-arg form, SemanticAnalyzerScalarFunctionTest.java covers multiple cases).
564-581: Verify OperandTypes type family handling with OpenSearch datetime and temporal types.The direct
OperandTypes.family(SqlTypeFamily.DATETIME, SqlTypeFamily.DATETIME)registration at lines 580-581 is correct. OpenSearch-specific UDT types (TIMESTAMP, TIME, DATE, BINARY) are properly mapped to SqlTypes through the PplValidator's type derivation layer (sqlTypeToUserDefinedType()in PplValidator.java andcreateUDTWithAttributes()in ValidationUtils.java), which handles the integration between Calcite's standard type families and OpenSearch's custom types during validation.
…king is deferred to the validation phase against the finished logical plan Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
397-424: Add null validation for external registration parameters.Both
registerExternalOperatorandregisterExternalAggOperatorshould validate that theoperatorandaggFunctionparameters are not null before registration. Currently, null operators would cause NPE during function resolution.🔎 Proposed null checks
public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) { + if (operator == null) { + throw new IllegalArgumentException("Operator cannot be null"); + } if (externalFunctionRegistry.containsKey(functionName)) { logger.warn( String.format(Locale.ROOT, "Function %s is registered multiple times", functionName)); } externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args)); } public void registerExternalAggOperator( BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) { + if (aggFunction == null) { + throw new IllegalArgumentException("AggFunction cannot be null"); + } if (aggExternalFunctionRegistry.containsKey(functionName)) { logger.warn( String.format( Locale.ROOT, "Aggregate function %s is registered multiple times", functionName)); }
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (2)
48-52: Consider extracting UDT type-checking pattern.Multiple methods repeat the pattern of checking
isUserDefinedType, casting toAbstractExprRelDataType<?>, and then accessinggetUdt()orgetExprType(). While not critical, consider extracting this pattern to reduce duplication.💡 Example refactor
Add a private helper method:
private static <T> Optional<T> getUdtAttribute(RelDataType type, Function<AbstractExprRelDataType<?>, T> extractor) { if (isUserDefinedType(type)) { return Optional.of(extractor.apply((AbstractExprRelDataType<?>) type)); } return Optional.empty(); }Then refactor methods like
isDate:public static boolean isDate(RelDataType type) { return getUdtAttribute(type, AbstractExprRelDataType::getUdt) .map(udt -> udt == OpenSearchTypeFactory.ExprUDT.EXPR_DATE) .orElse(SqlTypeName.DATE.equals(type.getSqlTypeName())); }Note: This is a stylistic suggestion; the current implementation is also acceptable.
Also applies to: 72-78, 93-96, 111-115, 130-134, 156-158, 172-175
33-55: Clarify JavaDoc for mixed numeric/character checking.The method checks for both numeric and character types among built-in SQL types, but only numeric types among UDTs. The JavaDoc could make this asymmetry more explicit to avoid confusion.
🔎 Suggested clarification
/** - * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types - * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric - * types, and string types (VARCHAR, CHAR). + * Checks if the RelDataType represents a numeric or character type. + * + * <p>For built-in SQL types, this method checks both numeric types (INTEGER, BIGINT, SMALLINT, + * TINYINT, FLOAT, DOUBLE, DECIMAL, REAL) and character types (VARCHAR, CHAR). For OpenSearch + * UDTs, only numeric types are checked (character UDTs like VARCHAR-mapped types are excluded). * * @param fieldType the RelDataType to check - * @return true if the type is numeric or string, false otherwise + * @return true if the type is numeric (built-in or UDT) or character (built-in only), false + * otherwise */ public static boolean isNumericOrCharacter(RelDataType fieldType) {core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
479-493: Add bounds checking in compulsoryCast.The
REDUCEcase accessesargs[1]andargs[2]without verifying that the array has at least 3 elements. While Calcite's validation should ensure correct arity, defensive bounds checking would prevent potentialArrayIndexOutOfBoundsExceptionand provide clearer error messages.🔎 Proposed bounds check
switch (functionName) { case BuiltinFunctionName.REDUCE: + if (args.length < 3) { + throw new IllegalArgumentException( + String.format("REDUCE requires at least 3 arguments, got %d", args.length)); + } // Set the second argument to the return type of the lambda function, so that // code generated with linq4j can correctly accumulate the result. RexLambda call = (RexLambda) args[2]; args[1] = builder.makeCast(call.getType(), args[1], true, true); break;
286-1114: Consider adding class-level JavaDoc and monitoring file size.The
PPLFuncImpTableclass serves as a central registry for PPL function implementations and plays a critical role in the Calcite integration. Consider adding comprehensive class-level JavaDoc that explains:
- The migration from multi-signature to single-implementation model
- How Calcite's validation handles type checking and coercion
- The relationship between internal and external registries
- When to use
registerExternalOperatorvs internal registrationThe file currently exceeds 1000 lines, which may warrant future refactoring to extract operator groups (e.g., date/time operators, string operators) into separate builder classes.
As per coding guidelines, classes over 500 lines may need organization review.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (7)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
16-31: Well-designed utility class for UDT-aware type checking.The utility class provides a clean, centralized API for type checking with proper UDT support. The design follows established patterns (check built-in types first, then UDTs) and integrates well with Calcite's
SqlTypeUtil. This aligns with the PR's objectives to support UDTs during Calcite validation.core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (6)
455-473: LGTM! External function override logic is correct.The resolution logic correctly prioritizes external function implementations over internal ones, allowing data-storage-dependent implementations to override built-in functions. The null check ensures fail-fast behavior for unregistered functions.
752-760: LGTM! TOSTRING implementation handles both single and multi-arg cases correctly.The implementation appropriately uses Calcite's CAST for the simple single-argument case and delegates to a custom operator for more complex scenarios. The nullable VARCHAR type is correctly specified.
769-794: LGTM! SPLIT implementation elegantly handles empty delimiter edge case.The use of a CASE expression to conditionally choose between character splitting (via REGEXP_EXTRACT_ALL with '.') and standard delimiter splitting is a clean solution. The inline comments clearly explain the behavior.
531-536: LGTM! Migration to Calcite standard comparison operators.The direct registration of comparison operators using
SqlStdOperatorTablealigns with the PR objective of leveraging Calcite's built-in validation and type coercion instead of custom implementations.
968-982: LGTM! Simplified Builder enforces single-registration constraint.The refactored
Builderclass enforces that each function name maps to exactly one implementation, preventing ambiguity. The use ofIllegalStateExceptionwith a clear error message ensures fail-fast behavior during initialization.
984-1098: LGTM! AggBuilder follows consistent pattern with simplified registration.The
AggBuilderrefactor mirrors theBuilderchanges, maintaining consistency across the codebase. The aggregation handler registrations are clean and leverage lambda expressions effectively. The inline implementations (AVG, COUNT, PERCENTILE_APPROX, MEDIAN, EARLIEST, LATEST, FIRST, LAST) are well-structured.
| public class OpenSearchTypeUtil { | ||
| /** | ||
| * Whether a given RelDataType is a user-defined type (UDT) | ||
| * | ||
| * @param type the RelDataType to check | ||
| * @return true if the type is a user-defined type, false otherwise | ||
| */ | ||
| public static boolean isUserDefinedType(RelDataType type) { | ||
| return type instanceof AbstractExprRelDataType<?>; | ||
| } | ||
|
|
||
| /** | ||
| * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types | ||
| * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric | ||
| * types, and string types (VARCHAR, CHAR). | ||
| * | ||
| * @param fieldType the RelDataType to check | ||
| * @return true if the type is numeric or string, false otherwise | ||
| */ | ||
| public static boolean isNumericOrCharacter(RelDataType fieldType) { | ||
| // Check standard SQL numeric types & string types (VARCHAR, CHAR) | ||
| if (SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType)) { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for OpenSearch UDT numeric types | ||
| if (isUserDefinedType(fieldType)) { | ||
| AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType; | ||
| ExprType udtType = exprType.getExprType(); | ||
| return ExprCoreType.numberTypes().contains(udtType); | ||
| } | ||
|
|
||
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Checks if the RelDataType represents a time-based field (timestamp, date, or time). Supports | ||
| * both standard SQL time types (including TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE, DATE, TIME, | ||
| * and their timezone variants) and OpenSearch UDT time types. | ||
| * | ||
| * @param fieldType the RelDataType to check | ||
| * @return true if the type is time-based, false otherwise | ||
| */ | ||
| public static boolean isDatetime(RelDataType fieldType) { | ||
| // Check standard SQL time types | ||
| if (SqlTypeUtil.isDatetime(fieldType)) { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for OpenSearch UDT types (EXPR_TIMESTAMP mapped to VARCHAR) | ||
| if (isUserDefinedType(fieldType)) { | ||
| AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType; | ||
| ExprType udtType = exprType.getExprType(); | ||
| return udtType == ExprCoreType.TIMESTAMP | ||
| || udtType == ExprCoreType.DATE | ||
| || udtType == ExprCoreType.TIME; | ||
| } | ||
|
|
||
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a date type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#DATE} type and | ||
| * OpenSearch's user-defined date type {@link OpenSearchTypeFactory.ExprUDT#EXPR_DATE}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a date type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isDate(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_DATE) { | ||
| return true; | ||
| } | ||
| } | ||
| return SqlTypeName.DATE.equals(type.getSqlTypeName()); | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a timestamp type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIMESTAMP} type and | ||
| * OpenSearch's user-defined timestamp type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIMESTAMP}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a timestamp type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isTimestamp(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| if (((AbstractExprRelDataType<?>) type).getUdt() | ||
| == OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP) { | ||
| return true; | ||
| } | ||
| } | ||
| return SqlTypeName.TIMESTAMP.equals(type.getSqlTypeName()); | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a time type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIME} type and | ||
| * OpenSearch's user-defined time type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIME}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a time type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isTime(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_TIME) { | ||
| return true; | ||
| } | ||
| } | ||
| return SqlTypeName.TIME.equals(type.getSqlTypeName()); | ||
| } | ||
|
|
||
| /** | ||
| * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because | ||
| * user-defined types also have VARCHAR as their SqlTypeName. | ||
| */ | ||
| public static boolean isCharacter(RelDataType type) { | ||
| return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type); | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents an IP address type. | ||
| * | ||
| * <p>This method returns true only for OpenSearch's user-defined IP type {@link | ||
| * OpenSearchTypeFactory.ExprUDT#EXPR_IP}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is an IP address type, false otherwise | ||
| */ | ||
| public static boolean isIp(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| return ((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_IP; | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a binary type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in binary types (BINARY, VARBINARY) and | ||
| * OpenSearch's user-defined binary type {@link OpenSearchTypeFactory.ExprUDT#EXPR_BINARY}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a binary type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isBinary(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| return ((AbstractExprRelDataType<?>) type).getUdt() | ||
| == OpenSearchTypeFactory.ExprUDT.EXPR_BINARY; | ||
| } | ||
| return SqlTypeName.BINARY_TYPES.contains(type.getSqlTypeName()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Search for test files related to OpenSearchTypeUtil
find . -type f -name "*OpenSearchTypeUtil*Test*.java" 2>/dev/null | head -20Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Also search for references to OpenSearchTypeUtil in test files
rg -l "OpenSearchTypeUtil" --type java -g "*Test*.java" 2>/dev/nullRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check if the main file exists in the context provided
ls -la core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java 2>/dev/null || echo "File not found in repository"Repository: opensearch-project/sql
Length of output: 190
🏁 Script executed:
# Search for method references in any test files
rg "isUserDefinedType|isNumericOrCharacter|isDatetime|isDate|isTimestamp|isTime|isCharacter|isIp|isBinary" --type java -g "*Test.java" | head -30Repository: opensearch-project/sql
Length of output: 1197
🏁 Script executed:
# Also search without the Test suffix to catch integration tests (*IT.java)
rg "isUserDefinedType|isNumericOrCharacter|isDatetime|isDate|isTimestamp|isTime|isCharacter|isIp|isBinary" --type java -g "*IT.java" | head -30Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check for any test files in the calcite/utils directory
find core/src/test -type f -path "*calcite/utils*" -name "*.java" 2>/dev/nullRepository: opensearch-project/sql
Length of output: 288
🏁 Script executed:
# Search for any direct callers of OpenSearchTypeUtil in the codebase
rg "OpenSearchTypeUtil\." --type java -g "!*Test.java" -g "!*IT.java" | wc -lRepository: opensearch-project/sql
Length of output: 68
🏁 Script executed:
# Search for imports of OpenSearchTypeUtil in test files (could be indirect testing)
rg "import.*OpenSearchTypeUtil" --type java | grep -i testRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Check for files that call OpenSearchTypeUtil methods in tests
rg "OpenSearchTypeUtil\." --type java | grep -i "test" | head -20Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
# Get a list of test files that might indirectly test these utilities
fd -e java -t f "Test.java$" "core/src/test" | xargs grep -l "isUserDefinedType\|isNumericOrCharacter\|isDatetime\|isDate\|isTimestamp\|isTime\|isCharacter\|isIp\|isBinary" 2>/dev/nullRepository: opensearch-project/sql
Length of output: 50
Create comprehensive unit tests for OpenSearchTypeUtil utility methods.
Unit tests are mandatory per coding guidelines for all new functions. This utility class contains 9 public static methods with no dedicated test coverage. Create OpenSearchTypeUtilTest.java in core/src/test/java/org/opensearch/sql/calcite/utils/ with test cases covering:
isUserDefinedType()– verify UDT detection and non-UDT handlingisNumericOrCharacter()– test both SQL numeric/character types and UDT numeric typesisDatetime()– verify SQL time types and UDT time types (TIMESTAMP, DATE, TIME)isDate(),isTimestamp(),isTime()– test built-in and UDT variants for each typeisCharacter()– verify non-UDT VARCHAR/CHAR detection onlyisIp()– test UDT IP type detectionisBinary()– test both SQL binary types and UDT binary type
Include edge cases: null inputs (once null handling is added), mixed type scenarios, and both code paths (built-in vs. UDT) for each method per Calcite integration test requirements.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
(around lines 22-178), there are nine public static helpers with no unit tests;
create
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtilTest.java
and implement focused JUnit tests that validate: isUserDefinedType() for both
UDT and non-UDT RelDataType instances; isNumericOrCharacter() for Calcite
numeric/character types and for AbstractExprRelDataType UDT numeric types;
isDatetime() for Calcite datetime types and UDT TIMESTAMP/DATE/TIME; isDate(),
isTimestamp(), isTime() each for built-in SqlTypeName and corresponding
OpenSearchTypeFactory.ExprUDT variants; isCharacter() ensuring it returns true
only for non-UDT VARCHAR/CHAR; isIp() for UDT EXPR_IP; isBinary() for
SqlTypeName binary types and UDT EXPR_BINARY; include mixed-type scenarios and
edge-case assertions (null inputs handling should be tested after adding
null-safety in the production methods — if null-safety is not present, assert
expected NullPointerException to reflect current behavior), and construct UDT
types by instantiating or mocking AbstractExprRelDataType and
OpenSearchTypeFactory.ExprUDT as needed so both code paths (built-in vs UDT) are
covered.
| public static boolean isUserDefinedType(RelDataType type) { | ||
| return type instanceof AbstractExprRelDataType<?>; | ||
| } | ||
|
|
||
| /** | ||
| * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types | ||
| * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric | ||
| * types, and string types (VARCHAR, CHAR). | ||
| * | ||
| * @param fieldType the RelDataType to check | ||
| * @return true if the type is numeric or string, false otherwise | ||
| */ | ||
| public static boolean isNumericOrCharacter(RelDataType fieldType) { | ||
| // Check standard SQL numeric types & string types (VARCHAR, CHAR) | ||
| if (SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType)) { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for OpenSearch UDT numeric types | ||
| if (isUserDefinedType(fieldType)) { | ||
| AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType; | ||
| ExprType udtType = exprType.getExprType(); | ||
| return ExprCoreType.numberTypes().contains(udtType); | ||
| } | ||
|
|
||
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Checks if the RelDataType represents a time-based field (timestamp, date, or time). Supports | ||
| * both standard SQL time types (including TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE, DATE, TIME, | ||
| * and their timezone variants) and OpenSearch UDT time types. | ||
| * | ||
| * @param fieldType the RelDataType to check | ||
| * @return true if the type is time-based, false otherwise | ||
| */ | ||
| public static boolean isDatetime(RelDataType fieldType) { | ||
| // Check standard SQL time types | ||
| if (SqlTypeUtil.isDatetime(fieldType)) { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for OpenSearch UDT types (EXPR_TIMESTAMP mapped to VARCHAR) | ||
| if (isUserDefinedType(fieldType)) { | ||
| AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType; | ||
| ExprType udtType = exprType.getExprType(); | ||
| return udtType == ExprCoreType.TIMESTAMP | ||
| || udtType == ExprCoreType.DATE | ||
| || udtType == ExprCoreType.TIME; | ||
| } | ||
|
|
||
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a date type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#DATE} type and | ||
| * OpenSearch's user-defined date type {@link OpenSearchTypeFactory.ExprUDT#EXPR_DATE}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a date type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isDate(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_DATE) { | ||
| return true; | ||
| } | ||
| } | ||
| return SqlTypeName.DATE.equals(type.getSqlTypeName()); | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a timestamp type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIMESTAMP} type and | ||
| * OpenSearch's user-defined timestamp type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIMESTAMP}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a timestamp type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isTimestamp(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| if (((AbstractExprRelDataType<?>) type).getUdt() | ||
| == OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP) { | ||
| return true; | ||
| } | ||
| } | ||
| return SqlTypeName.TIMESTAMP.equals(type.getSqlTypeName()); | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a time type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIME} type and | ||
| * OpenSearch's user-defined time type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIME}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a time type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isTime(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_TIME) { | ||
| return true; | ||
| } | ||
| } | ||
| return SqlTypeName.TIME.equals(type.getSqlTypeName()); | ||
| } | ||
|
|
||
| /** | ||
| * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because | ||
| * user-defined types also have VARCHAR as their SqlTypeName. | ||
| */ | ||
| public static boolean isCharacter(RelDataType type) { | ||
| return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type); | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents an IP address type. | ||
| * | ||
| * <p>This method returns true only for OpenSearch's user-defined IP type {@link | ||
| * OpenSearchTypeFactory.ExprUDT#EXPR_IP}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is an IP address type, false otherwise | ||
| */ | ||
| public static boolean isIp(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| return ((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_IP; | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| /** | ||
| * Checks whether a {@link RelDataType} represents a binary type. | ||
| * | ||
| * <p>This method returns true for both Calcite's built-in binary types (BINARY, VARBINARY) and | ||
| * OpenSearch's user-defined binary type {@link OpenSearchTypeFactory.ExprUDT#EXPR_BINARY}. | ||
| * | ||
| * @param type the type to check | ||
| * @return true if the type is a binary type (built-in or user-defined), false otherwise | ||
| */ | ||
| public static boolean isBinary(RelDataType type) { | ||
| if (isUserDefinedType(type)) { | ||
| return ((AbstractExprRelDataType<?>) type).getUdt() | ||
| == OpenSearchTypeFactory.ExprUDT.EXPR_BINARY; | ||
| } | ||
| return SqlTypeName.BINARY_TYPES.contains(type.getSqlTypeName()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add null parameter validation to public methods.
All public methods accept RelDataType parameters but perform no null checks. Since this is a new public utility API, consider adding defensive null checks or documenting the non-null requirement with annotations.
🔎 Example fix for null validation
Option 1: Add explicit null checks with meaningful exceptions:
public static boolean isUserDefinedType(RelDataType type) {
+ if (type == null) {
+ throw new IllegalArgumentException("RelDataType parameter cannot be null");
+ }
return type instanceof AbstractExprRelDataType<?>;
}Option 2: Use Lombok's @NonNull annotation (requires adding Lombok dependency if not already present):
+import lombok.NonNull;
+
-public static boolean isUserDefinedType(RelDataType type) {
+public static boolean isUserDefinedType(@NonNull RelDataType type) {
return type instanceof AbstractExprRelDataType<?>;
}Apply similar changes to all other public methods in this class.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
around lines 29 to 177, all public methods that accept RelDataType
(isUserDefinedType, isNumericOrCharacter, isDatetime, isDate, isTimestamp,
isTime, isCharacter, isIp, isBinary) lack null validation; add defensive null
checks at the start of each public method (e.g., Objects.requireNonNull(type,
"type must not be null") or throw a clear NPE with a descriptive message) or
annotate parameters with a non-null annotation if your codebase prefers that
style, ensuring consistency across all methods and corresponding unit tests are
updated to expect the NPE/annotation contract.
| /** | ||
| * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because | ||
| * user-defined types also have VARCHAR as their SqlTypeName. | ||
| */ | ||
| public static boolean isCharacter(RelDataType type) { | ||
| return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve JavaDoc formatting and grammar.
The isCharacter method uses a comment-style documentation instead of the JavaDoc format used by other methods. Additionally, "in place for" should be "in place of".
🔎 Proposed fix
- /**
- * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
- * user-defined types also have VARCHAR as their SqlTypeName.
- */
+ /**
+ * Checks whether a {@link RelDataType} represents a character type.
+ *
+ * <p>This method should be used in place of {@link SqlTypeUtil#isCharacter(RelDataType)} because
+ * user-defined types also have VARCHAR as their SqlTypeName but are not character types.
+ *
+ * @param type the type to check
+ * @return true if the type is a character type (excluding UDTs), false otherwise
+ */
public static boolean isCharacter(RelDataType type) {
return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
}🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
around lines 138 to 144, replace the comment-style documentation with proper
JavaDoc and fix the grammar: change "in place for" to "in place of", keep the
{@link SqlTypeUtil#isCharacter(RelDataType)} reference, and ensure the JavaDoc
sentence clearly explains that this method should be used instead of
SqlTypeUtil.isCharacter(...) because user-defined types also have VARCHAR as
their SqlTypeName; update punctuation and formatting to match surrounding method
docs.
| protected void registerOperator(BuiltinFunctionName functionName, SqlOperator operator) { | ||
| register( | ||
| functionName, | ||
| (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args), | ||
| typeChecker); | ||
| functionName, (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add null validation for operator parameter.
The registerOperator method should validate that the operator parameter is not null before creating the lambda. This prevents cryptic NPEs during function resolution and provides clearer error messages at registration time.
🔎 Proposed null check
protected void registerOperator(BuiltinFunctionName functionName, SqlOperator operator) {
+ if (operator == null) {
+ throw new IllegalArgumentException(
+ String.format("Operator cannot be null for function %s", functionName));
+ }
register(
functionName, (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args));
}🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
around lines 511 to 514, the registerOperator method currently accepts a
potentially null SqlOperator which can cause cryptic NPEs later; add an explicit
null check for the operator parameter and throw an IllegalArgumentException (or
Objects.requireNonNull with a clear message) before calling register so the
error is raised at registration time with a descriptive message.
| // Register ADD (+ symbol) for string concatenation and numeric addition | ||
| // Not creating PPL builtin operator as it will cause confusion during function resolution | ||
| FunctionImp add = | ||
| (builder, args) -> { | ||
| SqlOperator op = | ||
| (Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter)) | ||
| ? SqlStdOperatorTable.CONCAT | ||
| : SqlStdOperatorTable.PLUS; | ||
| return builder.makeCall(op, args); | ||
| }; | ||
| register(ADD, add); | ||
| register(ADDFUNCTION, add); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the PPLFuncImpTable file and examine resolve method calls
find . -name "PPLFuncImpTable.java" -type f 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 146
🏁 Script executed:
#!/bin/bash
# Search for resolve() method calls on FunctionImp
rg -n "\.resolve\(" --type=java -g "**/calcite/**" -A 3 -B 3 | head -100
# Search for similar type checking patterns with Stream
rg -n "Stream.*allMatch.*getType" --type=java | head -50
# Search for null type handling in expression implementations
rg -n "getType\(\).*=.*null\|null.*getType\(\)" --type=java -g "**/expression/**" | head -50Repository: opensearch-project/sql
Length of output: 11845
🏁 Script executed:
#!/bin/bash
# Get the PPLFuncImpTable.resolve method signature
sed -n '1,100p' ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
# Search for FunctionImp interface definition
rg -n "interface FunctionImp\|class.*FunctionImp" --type=java -A 10
# Check for Stream.allMatch with empty args handling
rg -n "allMatch.*getType\|getType.*allMatch" --type=java -B 2 -A 2 | head -50Repository: opensearch-project/sql
Length of output: 7943
🏁 Script executed:
#!/bin/bash
# Look at the PPLFuncImpTable file around lines 842-853 for ADD function
sed -n '800,900p' ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
# Also search for the resolve method signature
rg -n "public.*resolve\(" ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java -A 5
# Find how OpenSearchTypeUtil.isCharacter is implemented
rg -n "isCharacter" --type=java -B 2 -A 5Repository: opensearch-project/sql
Length of output: 19671
🏁 Script executed:
#!/bin/bash
# Check context where resolve is called - look for test patterns
rg -n "resolve.*ADD\|resolve.*ADDFUNCTION" --type=java -B 3 -A 3 | head -60Repository: opensearch-project/sql
Length of output: 48
Add defensive guard for empty args and document the dynamic operator selection pattern.
The type-based dispatch for ADD/ADDFUNCTION works correctly in practice (types are guaranteed post-Calcite validation), but needs defensive handling and documentation:
Stream.of(args).allMatch()returnstruefor empty args, incorrectly selecting CONCAT instead of PLUS- No documentation explaining the Calcite-specific dynamic dispatch pattern (required by coding guidelines)
Add an empty check and Calcite-specific documentation:
FunctionImp add =
(builder, args) -> {
// In Calcite, dynamic operator selection at plan-build time based on validated argument types.
// All RexNodes have types post-validation; allMatch returns true on empty stream, so guard it.
SqlOperator op = (args.length > 0
&& Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter))
? SqlStdOperatorTable.CONCAT
: SqlStdOperatorTable.PLUS;
return builder.makeCall(op, args);
};🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
around lines 842 to 853, the ADD/ADDFUNCTION implementation uses
Stream.of(args).allMatch(...) which returns true for empty args and thus wrongly
picks CONCAT; add a defensive guard to check args.length > 0 before evaluating
allMatch and update the inline comment to document the Calcite-specific dynamic
operator selection (types are available post-Calcite validation and allMatch on
empty streams yields true), so choose CONCAT only when args.length > 0 and all
argument types are character, otherwise select PLUS.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
131-141: Operand count validation doesn't respectthrowOnFailureparameter.While the fix now correctly passes
throwOnFailuretocheckSingleOperandTypeat line 140, the operand count validation at line 133-134 still doesn't respect thethrowOnFailureparameter. When the operand count is invalid andthrowOnFailureis true, the method should throw a validation exception rather than silently returning false.🔎 Proposed fix
@Override public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) { + if (throwOnFailure) { + throw callBinding.newValidationSignatureError(); + } return false; } return IntStream.range(0, 2) .allMatch( i -> checkSingleOperandType( callBinding, callBinding.operand(i), i, throwOnFailure)); }core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)
48-81: Operand count validation doesn't enforce odd argument count requirement.The current implementation improves upon the previous version by enforcing a minimum of 3 arguments (line 72). However, it still doesn't validate that the total argument count is odd, which is required by the function's contract (json_string + path/value pairs).
With an even argument count (e.g., 4 arguments), the validation at lines 56-68 would pass (checking only arg[1] as a path), but the runtime
evalmethod would fail at line 104 when trying to accesskeys.get(i + 1)for a missing value argument.🔎 Proposed fix
@Override public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { final int count = callBinding.getOperandCount(); + // Validate odd argument count (json_string + path/value pairs) + if (count % 2 == 0) { + if (throwOnFailure) { + throw callBinding.newValidationSignatureError(); + } + return false; + } for (int i = 1; i < count; i += 2) { RelDataType nameType = callBinding.getOperandType(i); if (!OpenSearchTypeUtil.isCharacter(nameType)) { if (throwOnFailure) { throw callBinding.newError(RESOURCE.expectedCharacter()); } return false; } } return true; }
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
62-152: Consider documenting the arbitrary additional arguments behavior.The implementation allows arbitrary arguments after the first two (array and function), as noted in the comment at line 63. However, the runtime behavior in the
evalmethod (lines 195-208) currently only supports one captured variable. The limitation is documented in comments at lines 197-202, but users might not be aware of this constraint when calling the function.Consider adding JavaDoc to the class or method level explaining:
- The function accepts additional arguments as captured variables for the lambda
- Currently only the first captured variable is fully supported
- The behavior difference between Function1 (no captures/index) and Function2 (captures or index)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
84-90: Consider instance-scoped dependency injection over static mutable state.The static
operatorTableProviderfield with a public setter creates global mutable state that could complicate testing (tests must coordinate access to shared state) and scenarios where different contexts need different operator tables. While this pattern breaks circular dependencies as noted in the PR objectives, consider whether you could pass the provider via thecreate()factory method or constructor instead. If the static approach is necessary, document the initialization contract and consider thread-safety ifsetOperatorTableProvidercould be called concurrently.Alternative instance-based approach
@Setter private static SqlOperatorTableProvider operatorTableProvider; + private final SqlOperatorTableProvider instanceOperatorTableProvider; private CalcitePlanContext(FrameworkConfig config, SysLimit sysLimit, QueryType queryType) { + this(config, sysLimit, queryType, operatorTableProvider); + } + + private CalcitePlanContext(FrameworkConfig config, SysLimit sysLimit, QueryType queryType, SqlOperatorTableProvider provider) { this.config = config; this.sysLimit = sysLimit; this.queryType = queryType; + this.instanceOperatorTableProvider = provider; // ... rest of initializationThen use
instanceOperatorTableProvideringetValidator()instead of the static field.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧬 Code graph analysis (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java (1)
LambdaUtils(11-29)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (5)
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)
CalciteToolsHelper(99-379)core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)
OpenSearchSparkSqlDialect(28-113)core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (1)
PplTypeCoercion(43-203)core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
PplTypeCoercionRule(45-107)core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java (1)
PplValidator(50-205)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java (1)
91-101: LGTM! Empty allowedParamTypes case is now handled correctly.The previous review concern about invalid operand count range when
allowedParamTypesis empty has been properly addressed. The code now checks for null/empty at line 92 and returns a valid range of (0,0) before computing min/max.core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)
56-68: Ensure comprehensive validation before runtime execution.The operand type validation correctly checks that path arguments are character types, which prevents type mismatches. However, this validation should be paired with the odd-count check (flagged separately) to ensure all runtime preconditions are validated during the validation phase rather than failing later in
eval().
| public SqlValidator getValidator() { | ||
| if (validator == null) { | ||
| synchronized (this) { | ||
| // Double-Checked Locking for thread-safety | ||
| if (validator == null) { | ||
| final CalciteServerStatement statement; | ||
| try { | ||
| statement = connection.createStatement().unwrap(CalciteServerStatement.class); | ||
| } catch (SQLException e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| if (operatorTableProvider == null) { | ||
| throw new IllegalStateException( | ||
| "SqlOperatorTableProvider must be set before creating CalcitePlanContext"); | ||
| } | ||
| SqlValidator.Config validatorConfig = | ||
| SqlValidator.Config.DEFAULT | ||
| .withTypeCoercionRules(PplTypeCoercionRule.instance()) | ||
| .withTypeCoercionFactory(PplTypeCoercion::create) | ||
| // Use lenient conformance for PPL compatibility | ||
| .withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance()) | ||
| // Use Spark SQL's NULL collation (NULLs sorted LOW/FIRST) | ||
| .withDefaultNullCollation(NullCollation.LOW) | ||
| // This ensures that coerced arguments are replaced with cast version in sql | ||
| // select list because coercion is performed during select list expansion during | ||
| // sql validation. Affects 4356.yml | ||
| // See SqlValidatorImpl#validateSelectList and AggConverter#translateAgg | ||
| .withIdentifierExpansion(true); | ||
| validator = | ||
| PplValidator.create( | ||
| statement, | ||
| config, | ||
| operatorTableProvider.getOperatorTable(), | ||
| TYPE_FACTORY, | ||
| validatorConfig); | ||
| } | ||
| } | ||
| } | ||
| return validator; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find CalciteServerStatement in the codebase
echo "=== Searching for CalciteServerStatement ==="
find . -type f -name "*.java" | xargs grep -l "CalciteServerStatement" | head -20
# Look for the file and class definition
echo -e "\n=== Checking if CalciteServerStatement is from Calcite library ==="
grep -r "class CalciteServerStatement\|interface CalciteServerStatement" --include="*.java" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 403
🏁 Script executed:
#!/bin/bash
# Find CalcitePlanContext to examine the actual implementation
echo "=== Finding CalcitePlanContext file ==="
find . -type f -name "CalcitePlanContext.java"
# Get the complete file content to understand the context
echo -e "\n=== CalcitePlanContext content ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javaRepository: opensearch-project/sql
Length of output: 11109
🏁 Script executed:
#!/bin/bash
# Search for PplValidator.create to understand how statement is used
echo "=== Searching for PplValidator.create ==="
find . -type f -name "*.java" | xargs grep -A 20 "PplValidator.create\|class PplValidator" | head -100Repository: opensearch-project/sql
Length of output: 6457
🏁 Script executed:
#!/bin/bash
# Check how CalciteServerStatement is used elsewhere in the codebase
echo "=== CalciteServerStatement usage patterns ==="
find . -type f -name "*.java" -exec grep -H "CalciteServerStatement\|createStatement" {} \; | grep -v "\.class" | head -50Repository: opensearch-project/sql
Length of output: 3384
🏁 Script executed:
#!/bin/bash
# Get the complete CalciteToolsHelper file
echo "=== CalciteToolsHelper content ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javaRepository: opensearch-project/sql
Length of output: 14929
🏁 Script executed:
#!/bin/bash
# Also check PplValidator to see if it stores statement reference
echo "=== PplValidator.java (full file) ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.javaRepository: opensearch-project/sql
Length of output: 8498
🏁 Script executed:
#!/bin/bash
# Check how CalcitePlanContext is created and used - who creates it?
echo "=== Searching for CalcitePlanContext.create() callers ==="
rg -B2 -A2 "CalcitePlanContext\.create\|new CalcitePlanContext" --type java | head -100Repository: opensearch-project/sql
Length of output: 123
🏁 Script executed:
#!/bin/bash
# Search for CalcitePlanContext usage in test and main files
echo "=== CalcitePlanContext usage in codebase ==="
rg "CalcitePlanContext" --type java -B 1 -A 1 | head -150Repository: opensearch-project/sql
Length of output: 15090
🏁 Script executed:
#!/bin/bash
# Look for unit tests of CalcitePlanContext to understand lifecycle
echo "=== CalcitePlanContext test files ==="
find . -type f -name "*CalcitePlanContext*Test*.java" -o -name "*Test*CalcitePlanContext*.java"Repository: opensearch-project/sql
Length of output: 103
🏁 Script executed:
#!/antml:function_calls
# Check if there is a close() method being called on CalcitePlanContext anywhere
echo "=== Searching for CalcitePlanContext close/cleanup ==="
rg "\.close\(\)|try.*CalcitePlanContext|AutoCloseable.*CalcitePlanContext" --type javaRepository: opensearch-project/sql
Length of output: 10796
🏁 Script executed:
#!/bin/bash
# Check how withPrepare is used - does it handle statement cleanup?
echo "=== CalciteToolsHelper.withPrepare usage and statement handling ==="
rg "withPrepare\|perform" core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java -B 2 -A 5
# Check if there's any statement closing elsewhere in related code
echo -e "\n=== Searching for statement management patterns ==="
rg "createStatement\(\)" --type java -B 2 -A 5 core/src/main/java/org/opensearch/sql/calcite/ | head -80Repository: opensearch-project/sql
Length of output: 1995
🏁 Script executed:
#!/bin/bash
# Verify CalciteServerStatement is from Calcite library and check its inheritance
echo "=== Check Calcite imports for CalciteServerStatement ==="
grep -r "import.*CalciteServerStatement" core/src/main/java/org/opensearch/sql/calcite/ --include="*.java"
# Look at how statements created elsewhere are managed (e.g., in tests that use try-with-resources)
echo -e "\n=== Statement resource patterns in tests ==="
rg "try.*createStatement\|createStatement.*try" --type java -B 1 -A 3 integ-test/ | head -50Repository: opensearch-project/sql
Length of output: 558
Address resource leak: Close the statement after validator creation.
The underlying Statement resource created by connection.createStatement() is not being closed. Although CalciteServerStatement itself doesn't implement AutoCloseable, it wraps a JDBC Statement that does hold resources requiring proper cleanup.
The statement is only needed to extract the prepare context via statement.createPrepareContext(). Since it's not stored or reused, close it immediately after validator initialization:
final CalciteServerStatement statement;
try {
statement = connection.createStatement().unwrap(CalciteServerStatement.class);
if (operatorTableProvider == null) {
throw new IllegalStateException(
"SqlOperatorTableProvider must be set before creating CalcitePlanContext");
}
SqlValidator.Config validatorConfig =
SqlValidator.Config.DEFAULT
.withTypeCoercionRules(PplTypeCoercionRule.instance())
.withTypeCoercionFactory(PplTypeCoercion::create)
.withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
.withDefaultNullCollation(NullCollation.LOW)
.withIdentifierExpansion(true);
validator =
PplValidator.create(
statement,
config,
operatorTableProvider.getOperatorTable(),
TYPE_FACTORY,
validatorConfig);
} catch (SQLException e) {
throw new RuntimeException(e);
} finally {
if (statement != null) {
try {
statement.close();
} catch (SQLException e) {
// log or handle exception
}
}
}Also check CalciteToolsHelper.withPrepare() for the same pattern—it creates a statement at line 137 that should be closed after OpenSearchPrepareImpl().perform() completes.
The double-checked locking implementation with volatile is correct.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129-168, the Statement created via
connection.createStatement().unwrap(CalciteServerStatement.class) is never
closed causing a resource leak; after using the statement to create the
validator, ensure the underlying JDBC Statement is closed (e.g., wrap
creation/use in try/finally or try-with-resources so statement.close() is always
called and handle/log any SQLException from close) and keep the existing
exception handling for creation; also review CalciteToolsHelper.withPrepare()
(around line 137) where a statement is created and ensure that statement is
likewise closed after OpenSearchPrepareImpl().perform() completes.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
138-144: JavaDoc grammar issue persists.The phrase "in place for" on line 139 should be "in place of" as noted in a previous review. Additionally, the JavaDoc format differs from other methods in this class.
🔎 Proposed fix
/** - * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because - * user-defined types also have VARCHAR as their SqlTypeName. + * Checks whether a {@link RelDataType} represents a character type. + * + * <p>This method should be used in place of {@link SqlTypeUtil#isCharacter(RelDataType)} because + * user-defined types also have VARCHAR as their SqlTypeName but are not character types. + * + * @param type the type to check + * @return true if the type is a character type (excluding UDTs), false otherwise */ public static boolean isCharacter(RelDataType type) {
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java (1)
57-79: Signature string doesn't reflect all accepted operand types.The
getAllowedSignaturesmethod returns"CIDRMATCH(<IP>, <STRING>)"but the type checker also accepts(STRING, STRING)viaOperandTypes.CHARACTER_CHARACTER.or(...). Consider updating the signature to reflect both accepted forms.🔎 Proposed fix
@Override public String getAllowedSignatures(SqlOperator op, String opName) { - return "CIDRMATCH(<IP>, <STRING>)"; + return "CIDRMATCH(<IP>, <STRING>), CIDRMATCH(<STRING>, <STRING>)"; }core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
492-509: Consider usingcreateCallinstead of mutating the SqlCall in place.The
rewriteCallimplementation mutates theSqlBasicCallviasetOperator(op)before returning it. While this pattern is consistent with similar operator rewriting inPplConvertletTable, creating a new call viacreateCallwould be more explicit and safer. The codebase usescreateCallfor similar transformation scenarios elsewhere, so adopting this approach would improve consistency.@Override public SqlNode rewriteCall(SqlValidator validator, SqlCall call) { SqlOperator op = call.getOperandList().size() == 2 ? SqlStdOperatorTable.ATAN2 : SqlStdOperatorTable.ATAN; - ((SqlBasicCall) call).setOperator(op); - return call; + return op.createCall(call.getParserPosition(), call.getOperandList()); } };
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : All new business logic requires unit tests
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Encourage comprehensive test coverage in code reviews
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Prefer `Optional<T>` for nullable returns in Java
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
OpenSearchTypeFactory(62-394)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (8)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (2)
242-293: Well-structured custom type checkers for scalar validation.The
SCALARandSCALAR_OPTIONAL_INTEGERimplementations correctly delegate toOpenSearchTypeUtil.isScalarand follow the established pattern for customSqlOperandTypeCheckerimplementations. The operand count validation before type checking is appropriate defensive programming.
101-156: Good documentation of Calcite schema differences.The inline comments explaining the difference between OpenSearch type system (INTERVAL as STRING) and Calcite SCOTT schema behavior are helpful for maintainability. This follows the coding guideline to document Calcite-specific workarounds.
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java (1)
50-71: Clean implementation of IP function operand validation.The custom
SqlOperandTypeCheckercorrectly validates that the single operand is either an IP or character type, and properly uses thethrowOnFailureparameter to throw a validation signature error when needed. The signature string accurately reflects both accepted input types.core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java (1)
26-62: Clean API simplification.The refactored
wrapmethod provides a simpler interface that delegates all type checking to the wrappedSqlOperandTypeChecker. TheUnsupportedOperationExceptionforparamTypesandparamNamesis appropriate since these methods are not used in the current context, and the comments make this clear.core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (2)
216-224: Good explicit null handling in isScalar.The
isScalarmethod explicitly handles null input and documents it in the JavaDoc (@param type the type to check; may be null). This is appropriate for this method's use case. Note that other methods in this class don't have null handling, which is acceptable if callers guarantee non-null inputs, but inconsistent.
159-178: Well-documented Calcite workaround for IP type.The JavaDoc clearly explains that
SqlTypeName.OTHERis "borrowed" to represent IP type during validation becauseSqlTypeName.IPdoesn't exist. This follows the coding guideline to document Calcite-specific workarounds. The overload pattern allows callers to opt-in to accepting OTHER when appropriate.core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (2)
530-541: Parameter type narrowed from SqlOperator to SqlFunction.The parameter type change from
SqlOperatortoSqlFunctionis a narrowing change. Since this is a private method, the impact is limited to this class. This aligns with the broader refactoring converting operators to functions throughout the file.
510-514: Clear explanation for explicit SPARK function declaration.The comment explains the design decision to explicitly declare SPARK dialect functions to avoid operator resolution issues with multiple overrides. This is helpful documentation for future maintainers.
| (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING)) | ||
| .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent use of SqlTypeFamily.STRING vs SqlTypeFamily.CHARACTER.
Line 78 uses SqlTypeFamily.STRING while the rest of the file consistently uses SqlTypeFamily.CHARACTER for string types (e.g., lines 41, 65-66). SqlTypeFamily.STRING and SqlTypeFamily.CHARACTER are distinct in Calcite—CHARACTER is the standard family for CHAR/VARCHAR.
🔎 Proposed fix
public static final UDFOperandMetadata NUMERIC_STRING_OR_STRING_STRING =
UDFOperandMetadata.wrap(
- (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
- .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING)));
+ (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.CHARACTER))
+ .or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER)));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING)) | |
| .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING))); | |
| (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.CHARACTER)) | |
| .or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER))); |
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
around lines 78 to 79, there's an inconsistent use of SqlTypeFamily.STRING;
replace SqlTypeFamily.STRING with SqlTypeFamily.CHARACTER so the operand family
call matches the rest of the file (use CHARACTER for CHAR/VARCHAR), ensuring all
string-type checks consistently use SqlTypeFamily.CHARACTER.
| return UDFOperandMetadata.wrap( | ||
| new SqlOperandTypeChecker() { | ||
| @Override | ||
| public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { | ||
| if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) { | ||
| return false; | ||
| } | ||
| return OpenSearchTypeUtil.isIp(callBinding.getOperandType(0), true) | ||
| && OpenSearchTypeUtil.isIp(callBinding.getOperandType(1), true); | ||
| } | ||
|
|
||
| @Override | ||
| public SqlOperandCountRange getOperandCountRange() { | ||
| return SqlOperandCountRanges.of(2); | ||
| } | ||
|
|
||
| @Override | ||
| public String getAllowedSignatures(SqlOperator op, String opName) { | ||
| return String.format(Locale.ROOT, "%s(<IP>, <IP>)", opName); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
rg -n "CompareIpFunction|compare.*IP.*STRING|GREATER_IP|LESS_IP" --type java -g "*Test*.java" -A 3 | head -60Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
fd -t f "*Test*.java" | xargs rg -l "Ip|IP" | head -20Repository: opensearch-project/sql
Length of output: 1964
🏁 Script executed:
cat -n core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javaRepository: opensearch-project/sql
Length of output: 8471
🏁 Script executed:
find . -type f -name "OpenSearchTypeUtil.java" | head -5Repository: opensearch-project/sql
Length of output: 143
🏁 Script executed:
rg -n "isIp.*acceptOther|public.*isIp.*boolean" --type java -A 15 | head -80Repository: opensearch-project/sql
Length of output: 1941
Class-level documentation claims unsupported STRING operand signatures.
The class Javadoc (lines 42-47) documents signatures (IP, STRING) and (STRING, IP), but the type checker uses isIp(type, true) which only accepts:
- User-defined EXPR_IP type
- SqlTypeName.OTHER
It does not accept VARCHAR/STRING types. While the runtime implementation can convert String objects to ExprIpValue (lines 194-195), these conversions only occur after type validation, which rejects VARCHAR operands. The getAllowedSignatures method correctly reflects the actual type checker behavior.
Update the class Javadoc to remove the STRING signature claims, or modify the type checker to accept VARCHAR types if STRING support is intended.
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
around lines 127 to 147, the runtime only accepts EXPR_IP/OTHER via
OpenSearchTypeUtil.isIp(type, true) and rejects VARCHAR/STRING, so update the
class Javadoc (earlier Javadoc lines ~42-47) to remove the documented (IP,
STRING) and (STRING, IP) signatures (or alternatively, if STRING support is
intended, change the type checker here to also accept
SqlTypeName.VARCHAR/CHAR/STRING and add tests); ensure the Javadoc signatures
match the actual type checker behavior and, if keeping STRING support, also
move/adjust validation so the runtime string-to-ExprIpValue conversion (around
lines 194-195) is reachable during type checking.
| + " LogicalTableScan(table=[[scott, EMP]])\n"; | ||
| verifyLogical(root, expectedLogical); | ||
|
|
||
| String expectedSparkSql = | ||
| "SELECT `EMPNO`, `COMM`, COALESCE(`COMM`, `EMPNO`, 'fallback') `result`\n" | ||
| "SELECT `EMPNO`, `COMM`, COALESCE(`COMM`, `EMPNO`) `result`\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to check: is this optimized because EMPNO is nonnull?
the rest optimization seems reasonable.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml (1)
10-11: Physical plan uses SCRIPT-based pushdown with serialized expressions.The physical plan now uses script-based pushdowns containing base64-encoded RexNode expressions for the SAFE_CAST comparisons. While this aligns with the PR's architecture changes, the opaque base64 payloads reduce the readability and maintainability of this expected output file.
Consider documenting (in a comment or companion file) what these serialized scripts represent, or providing a utility to decode them for debugging purposes. This would help future maintainers verify correctness when the expected output changes.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml (1)
13-13: Recommend verifying SCRIPT pushdown correctness and performance.The physical plan now uses SCRIPT-based pushdowns with serialized, base64-encoded operator trees. This raises several concerns:
Maintainability: The encoded script payloads are not human-readable, making debugging pushdown issues difficult.
Performance: OpenSearch script-based filters are typically slower than native filters. Three separate script clauses for SAFE_CAST operations may impact query performance, especially on large datasets.
Correctness: The test expectation validates plan shape but not execution results. Ensure:
- The actual test verifies query results match expected values
- Edge cases (malformatted data, NULL handling) are tested
- Timezone handling in EXPR_TIMESTAMP VARCHAR is correct (note the UTC "Z" suffix in the JSON range query)
Consider:
- Adding integration tests that validate query results, not just plan shape
- Benchmarking query performance compared to the previous FILTER-based approach
- Verifying the script execution produces correct results with representative data
Based on learnings, test SQL generation and optimization paths for Calcite integration changes.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yaml (1)
1-9: Ensure test coverage for validation pipeline edge cases across the suite.This test file correctly follows the expected YAML format and represents a realistic clickbench query with a high field count (90 fields). As per the coding guidelines for test resources, verify that the broader test suite covers edge cases and boundary conditions for the new Calcite validation pipeline, such as:
- Queries with various field index patterns and ranges
- Type coercion scenarios (UDTs, cross-datetime comparisons, SAFE_CAST)
- Preservation of hints, sort orders, and null directions across RelNode↔SqlNode conversion
- Operator overload resolution (IP comparator, ATAN, ADD ambiguity)
Based on learnings: Test SQL generation and optimization paths for Calcite integration changes.
integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yaml (1)
1-20: Consider additional test cases for edge cases.This test validates the max=1 case with inner join and proper pushdown optimization. To improve test coverage for the max option feature, consider adding test cases for:
- Different max values (e.g., max=2, max=10)
- Join without max option to validate baseline behavior
- Edge case where right side has no matches (empty result)
- Comparing against the SEMI/ANTI join behavior where max has no effect (as noted in learnings)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (71)
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javainteg-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_chart_multi_group_key.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_chart_single_group_key.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_count_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search_call.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
✅ Files skipped from review due to trivial changes (7)
- integ-test/src/test/resources/expectedOutput/calcite/explain.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search_call.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_chart_single_group_key.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_chart_multi_group_key.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_count_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_rename.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.yaml
🚧 Files skipped from review as they are similar to previous changes (27)
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
- integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
- integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_pushdown.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yaml
- integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
- integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
- integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
🧰 Additional context used
📓 Path-based instructions (4)
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yamlcore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q30.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q41_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_field_list_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javainteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the `copy` method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: test-sql-cli-integration (21)
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0},agg#0=LITERAL_AGG(1)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f1":{"top_hits":{"from":0,"size":1,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","account_number","firstname","address","balance","city","employer","state","age","email","lastname"],"excludes":[]},"script_fields":{}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find test files that reference this expected output
rg -n "explain_complex_dedup" --type=java -C 3 integ-test/Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# List the directory structure for expected output files
ls -la integ-test/src/test/resources/expectedOutput/calcite/ | head -20Repository: opensearch-project/sql
Length of output: 1703
🏁 Script executed:
#!/bin/bash
# Read the YAML file to verify it's valid and examine its structure
cat -n integ-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yamlRepository: opensearch-project/sql
Length of output: 1740
🏁 Script executed:
#!/bin/bash
# Check if there are related dedup test files
fd -e yaml integ-test/src/test/resources/expectedOutput/calcite/ | grep -i dedupRepository: opensearch-project/sql
Length of output: 583
🏁 Script executed:
#!/bin/bash
# Search for test files that reference expectedOutput directory or YAML files
rg -n "expectedOutput" --type=java integ-test/ | head -20Repository: opensearch-project/sql
Length of output: 2964
🏁 Script executed:
#!/antml:bin/bash
# Search for files referencing YAML test data patterns
rg -n "\.yaml" --type=java integ-test/src/test/ | grep -i explain | head -10Repository: opensearch-project/sql
Length of output: 1477
🏁 Script executed:
#!/bin/bash
# Look for test configuration files or test data loaders
find integ-test/src/test -type f -name "*Test.java" | head -5 | xargs rg -l "explain"Repository: opensearch-project/sql
Length of output: 50
🏁 Script executed:
#!/bin/bash
# Check the structure of test files to understand how expected outputs are loaded
find integ-test/src/test/java -type f -name "*Test.java" | head -1 | xargs cat | head -50Repository: opensearch-project/sql
Length of output: 1935
🏁 Script executed:
#!/bin/bash
# Search more broadly for "dedup" in test files
rg -n "dedup\|complex_dedup" --type=java integ-test/src/test/Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search in PPLIntegTestCase to understand how YAML files are loaded and mapped
rg -A 10 -B 5 "loadExpectedPlan\|loadFromFile" integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java | head -40Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check what test classes exist in PPL test directory
ls -la integ-test/src/test/java/org/opensearch/sql/ppl/*IT.java | head -10Repository: opensearch-project/sql
Length of output: 1264
🏁 Script executed:
#!/bin/bash
# Look for Calcite-specific SQL tests
fd -e java integ-test/src/test/java/org/opensearch/sql/sql/ | head -5Repository: opensearch-project/sql
Length of output: 563
🏁 Script executed:
#!/bin/bash
# Examine DedupCommandIT.java to understand dedup test structure
cat -n integ-test/src/test/java/org/opensearch/sql/ppl/DedupCommandIT.java | head -100Repository: opensearch-project/sql
Length of output: 4058
🏁 Script executed:
#!/bin/bash
# Search for SQL test files (not PPL)
fd --full-path 'integ-test/src/test/java/org/opensearch/sql/sql/' -e java | head -10Repository: opensearch-project/sql
Length of output: 767
🏁 Script executed:
#!/bin/bash
# Look for CalciteExplain or similar test classes
rg -n "CalciteExplain\|ExplainCalcite" --type=java integ-test/Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if there are any tests that directly reference Calcite explain outputs
rg -n "calcite/explain" --type=java integ-test/src/test/ | head -20Repository: opensearch-project/sql
Length of output: 2203
🏁 Script executed:
#!/bin/bash
# Search for explain_complex_dedup in CalciteExplainIT.java
rg -n "explain_complex_dedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -A 5 -B 5Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if there are YAML files in calcite expectedOutput directory
fd . 'integ-test/src/test/resources/expectedOutput/calcite/' --type f | grep -i dedupRepository: opensearch-project/sql
Length of output: 3013
🏁 Script executed:
#!/bin/bash
# List all dedup-related YAML files in expectedOutput
find integ-test/src/test/resources/expectedOutput -name "*dedup*" -type fRepository: opensearch-project/sql
Length of output: 3573
🏁 Script executed:
#!/bin/bash
# Look at the CalciteExplainIT.java file to find dedup test methods
rg -n "dedup\|Dedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -B 2 -A 10Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Examine CalciteExplainIT.java to understand how expected output files are loaded
cat -n integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | grep -A 20 "loadFromFile\|@Test" | head -100Repository: opensearch-project/sql
Length of output: 4771
🏁 Script executed:
#!/bin/bash
# Check if there's a parameterized test or data provider mechanism
rg -n "@ParameterizedTest\|@ValueSource\|@CsvSource\|arguments(" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | head -20Repository: opensearch-project/sql
Length of output: 228
🏁 Script executed:
#!/bin/bash
# Look at the full CalciteExplainIT.java file to find dedup-related tests
wc -l integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javaRepository: opensearch-project/sql
Length of output: 151
🏁 Script executed:
#!/bin/bash
# Search for test method names that might match the expected output files
rg -n "testDedup\|test.*dedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -iRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search for references to any of the dedup expected output files in the test files
rg -n "explain_dedup_push\|explain_complex_dedup\|explain_dedup_complex" integ-test/src/test/java/ --type=java | head -20Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check the CalciteExplainIT.java file structure - look for all @Test methods
rg -n "@Test" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | wc -lRepository: opensearch-project/sql
Length of output: 69
🏁 Script executed:
#!/bin/bash
# Search for test data providers or parameterized test implementations
rg -n "Stream|stream|arguments" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | head -20Repository: opensearch-project/sql
Length of output: 1799
🏁 Script executed:
#!/bin/bash
# Check if there's a parent class or superclass that might contain the dedup tests
grep -n "class CalciteExplainIT\|extends" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | head -5Repository: opensearch-project/sql
Length of output: 118
🏁 Script executed:
#!/bin/bash
# Check the parent class ExplainIT for dedup test methods
rg -n "testDedup\|test.*[Dd]edup\|explain_complex_dedup" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java -iRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look for where dedup tests are parameterized or data-driven
rg -n "explain_dedup\|DedupProvider\|dedupProvider" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java -iRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if there's a common test base class that provides parameterized tests
find integ-test/src/test/java -name "PPLIntegTestCase.java" -o -name "ExplainIT.java" | xargs grep -l "Stream.of\|@MethodSource"Repository: opensearch-project/sql
Length of output: 50
🏁 Script executed:
#!/bin/bash
# Search for explain test methods that are data-driven in ExplainIT
rg -n "@MethodSource|@ParameterizedTest|arguments" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search all Java test files for any reference to the dedup expected output files
rg -n "explain_dedup\|explain_complex_dedup" integ-test/src/test/java/ --type=javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look at the parent class ExplainIT to understand its structure
head -100 integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.javaRepository: opensearch-project/sql
Length of output: 3784
🏁 Script executed:
#!/bin/bash
# Check the PPLIntegTestCase base class for test data loading mechanisms
grep -n "loadFromFile\|loadExpectedPlan\|Stream.of\|@ParameterizedTest" integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java | head -30Repository: opensearch-project/sql
Length of output: 253
🏁 Script executed:
#!/bin/bash
# Check if there's a test discovery mechanism based on file patterns
find integ-test/src/test/java -type f -name "*.java" -exec grep -l "Files.list\|Files.walk\|explain_dedup" {} \;Repository: opensearch-project/sql
Length of output: 279
🏁 Script executed:
#!/bin/bash
# Look for method that invokes the test with explain_complex_dedup
rg -n "testDedup\|complex_dedup" integ-test/src/test/java/org/opensearch/sql/ppl/ -iRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if tests are discovered dynamically - look for file scanning in base test class
grep -n "Files\|Path\|Stream\|discover" integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java | head -20Repository: opensearch-project/sql
Length of output: 192
🏁 Script executed:
#!/bin/bash
# Check the full structure of ExplainIT to see if there are additional test methods after line 100
sed -n '100,200p' integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.javaRepository: opensearch-project/sql
Length of output: 3774
🏁 Script executed:
#!/bin/bash
# Look for any test that references the gender field and dedup together (like in the YAML)
rg -n "gender\|dedup" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java -B 2 -A 5 | head -50Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look at the entire PPLIntegTestCase to understand the test infrastructure
cat -n integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java | tail -100Repository: opensearch-project/sql
Length of output: 4326
🏁 Script executed:
#!/bin/bash
# Search for parameterized test methods in CalciteExplainIT
sed -n '1,500p' integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | grep -A 10 "@ParameterizedTest\|@MethodSource\|Stream.of"Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look for any test that loads YAML files dynamically or uses dedup + gender
rg -n "Stream.of\|@ParameterizedTest\|@MethodSource" integ-test/src/test/java/org/opensearch/sql/calcite/ --type=javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if CalcitePPLExplainIT might have the dedup tests
grep -n "dedup\|Dedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLExplainIT.java -iRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search for test methods in CalciteExplainIT that use dedup in their queries
rg -n "dedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -B 5 -A 5Repository: opensearch-project/sql
Length of output: 9557
🏁 Script executed:
#!/bin/bash
# Look for any class that might be implementing parameterized tests for dedup scenarios
find integ-test/src/test/java -type f -name "*.java" -exec grep -l "explain_dedup\|explain_complex_dedup" {} \;Repository: opensearch-project/sql
Length of output: 209
🏁 Script executed:
#!/antml:bin/bash
# Check if the YAML test files are used by inspecting files that reference them
find integ-test -type f -name "*.java" -exec grep -l "explain_dedup_push\|explain_complex_dedup" {} \;Repository: opensearch-project/sql
Length of output: 128
🏁 Script executed:
#!/bin/bash
# Look at the complete CalciteExplainIT for any tests with dedup queries
rg -n "source=.*dedup|dedup.*|DEDUP" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -i -B 3 -A 3Repository: opensearch-project/sql
Length of output: 9362
🏁 Script executed:
#!/bin/bash
# Search specifically for the explain_complex_dedup.yaml file reference in test code
rg -n "explain_complex_dedup" integ-test/src/test/java/ --type=javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look for all test methods that load dedup complex expected outputs to understand the pattern
grep -n "explain_dedup_complex\|testComplexDedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -B 2 -A 15Repository: opensearch-project/sql
Length of output: 1650
🏁 Script executed:
#!/bin/bash
# Check if explain_complex_dedup.yaml is actually used or if it's an orphaned file
find integ-test -type f \( -name "*.java" -o -name "*.kt" -o -name "*.scala" \) -exec grep -l "explain_complex_dedup\.yaml\|explain_complex_dedup" {} \;Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search for any test that might use a data provider pattern or file discovery for dedup tests
rg -n "@Test.*[Dd]edup|testDedup\|testComplexDedup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -B 5 -A 20 | head -60Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check the Git history or file metadata for when this file was added
cd integ-test/src/test/resources/expectedOutput/calcite && ls -ltr *dedup*.yaml | tail -5Repository: opensearch-project/sql
Length of output: 446
🏁 Script executed:
#!/bin/bash
# Double-check by searching with different patterns - maybe it's used in a different way
grep -r "complex_dedup\|explain_complex_dedup" integ-test/src/test/ --include="*.java"Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look at the full test method signature for testComplexDedup to ensure we have context
sed -n '2014,2035p' integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javaRepository: opensearch-project/sql
Length of output: 1139
🏁 Script executed:
#!/antml:bin/bash
# Check if there's an alternative test class or integration test that might use this
find integ-test -type f -name "*.java" | xargs grep -l "ComplexDedup\|complex.dedup" 2>/dev/nullRepository: opensearch-project/sql
Length of output: 146
This expected output file is orphaned and not covered by any test.
The explain_complex_dedup.yaml file exists in the test resources but is never referenced by any test method. The testComplexDedup() method in CalciteExplainIT.java (line 2014) uses explain_dedup_complex1.yaml, explain_dedup_complex2.yaml, explain_dedup_complex3.yaml, and explain_dedup_complex4.yaml instead.
Either remove this unused file or add a corresponding test case that validates the physical plan against actual Calcite query generation.
🤖 Prompt for AI Agents
integ-test/src/test/resources/expectedOutput/calcite/explain_complex_dedup.yaml
(lines 9-10): this resource file is orphaned (no tests reference it) so either
delete it or add a test that validates its contents; to fix, either 1) remove
the file from the resources directory and update any resource index if present,
or 2) add a test reference in CalciteExplainIT.java (around line ~2014 where
testComplexDedup() is defined) by including this file alongside the other
explain_dedup_complex*.yaml files (or create a new test method that loads the
SQL used for complex dedup, generates the physical plan, and asserts that the
produced plan matches explain_complex_dedup.yaml), then run the test suite to
ensure the new/updated test passes and commit the resource+test change.
| calcite: | ||
| logical: | | ||
| LogicalSystemLimit(sort0=[$1], sort1=[$3], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last], fetch=[10000], type=[QUERY_SIZE_LIMIT]) | ||
| LogicalProject(account_number=[$0], gender=[$1], age=[$2], state=[$3], new_gender=[$4], new_state=[$5]) | ||
| LogicalFilter(condition=[<=($6, SAFE_CAST(2:BIGINT))]) | ||
| LogicalProject(account_number=[$0], gender=[$1], age=[$2], state=[$3], new_gender=[$4], new_state=[$5], _row_number_dedup_=[ROW_NUMBER() OVER (PARTITION BY $4, $5)]) | ||
| LogicalFilter(condition=[AND(IS NOT NULL($4), IS NOT NULL($5))]) | ||
| LogicalSort(sort0=[$1], sort1=[$3], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last]) | ||
| LogicalProject(account_number=[$0], gender=[$4], age=[$8], state=[$7], new_gender=[LOWER($4)], new_state=[LOWER($7)]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"new_gender":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"missing_bucket":false,"order":"asc"}}},{"new_state":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["account_number","gender","age","state"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the test file that uses this expected output
fd "explain_dedup" --type fRepository: opensearch-project/sql
Length of output: 3405
🏁 Script executed:
#!/bin/bash
# Search for tests related to explain and dedup functionality
fd "explain" integ-test/src/test -type f | head -20Repository: opensearch-project/sql
Length of output: 236
🏁 Script executed:
#!/bin/bash
# Look for serialization/deserialization tests in the codebase
find . -name "*Test.java" -type f | xargs grep -l "serialize\|deserialize" | head -10Repository: opensearch-project/sql
Length of output: 1076
🏁 Script executed:
#!/bin/bash
# Check if there are utilities that generate or validate these expected output files
rg -l "base64\|serialize" --type java | grep -i test | head -15Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Find the test file that uses these expected output YAML files
rg "explain_dedup_expr\.yaml" --type javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search for the test class that handles these YAML comparisons
rg -l "expectedOutput.*calcite.*yaml" integ-test/src/test -A 5Repository: opensearch-project/sql
Length of output: 146
🏁 Script executed:
#!/bin/bash
# Look for how YAML expected outputs are loaded and compared
rg "base64\|rO0AB" integ-test/src/test/resources -A 2 -B 2 | head -50Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Find test class that handles EXPLAIN DEDUP
fd ".*Test.*\.java" integ-test/src/test | xargs rg -l "DEDUP" | head -5Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Examine the CalciteExplainIT test file
cat integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java | head -200Repository: opensearch-project/sql
Length of output: 7974
🏁 Script executed:
#!/bin/bash
# Look for how the test loads and compares YAML files
rg "explain_dedup" integ-test/src/test/java/org/opensearch/sql/calcite -B 5 -A 5 | head -100Repository: opensearch-project/sql
Length of output: 12616
🏁 Script executed:
#!/bin/bash
# Check if there's a test data generator or snapshot mechanism
fd ".*\\.java" integ-test/src/test | xargs rg -l "expectedOutput\|snapshot\|yaml" | head -10Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Look at the parent class ExplainIT to understand how YAML is loaded and compared
rg "loadExpectedPlan\|assertYamlEqualsIgnoreId" integ-test/src/test -A 3 -B 1 | head -80Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check for a base test class that handles YAML loading
fd "ExplainIT.java" integ-test/src/testRepository: opensearch-project/sql
Length of output: 434
🏁 Script executed:
#!/bin/bash
# Look for how expected output files are loaded
rg "expectedOutput\|LoadFile\|loadExpected" integ-test/src/test/java -A 2 | head -60Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if the explain_dedup_expr.yaml file is actually used in any test (not expr1, expr2, etc.)
rg "explain_dedup_expr\.yaml" integ-test/src/test/javaRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search more broadly for any reference to explain_dedup_expr.yaml
rg "explain_dedup_expr" --type yaml --type java | grep -v "explain_dedup_expr[0-9]" | head -20Repository: opensearch-project/sql
Length of output: 709
🏁 Script executed:
#!/bin/bash
# Check the parent test class ExplainIT
cat integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java | grep -A 20 "loadExpectedPlan\|assertYamlEqualsIgnoreId" | head -50Repository: opensearch-project/sql
Length of output: 1627
🏁 Script executed:
#!/bin/bash
# Look for the actual test method that might use explain_dedup_expr.yaml
rg "testDedupExpr|test.*[Dd]edup" integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java -A 15 | head -100Repository: opensearch-project/sql
Length of output: 4926
🏁 Script executed:
#!/bin/bash
# Final verification: search ALL test files for exact reference to explain_dedup_expr.yaml
find integ-test/src/test/java -name "*.java" -exec grep -l "explain_dedup_expr\.yaml" {} \;Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Also check git history to see if this file was recently added and if there was a test for it
git log --oneline --all -- "integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yaml" | head -10Repository: opensearch-project/sql
Length of output: 102
🏁 Script executed:
#!/bin/bash
# Check if the file was in the original state of the repository when added
git log -p --follow -- "integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yaml" | head -100Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check the commit that added this file to understand context
git show ff91bdc25 --stat | head -50Repository: opensearch-project/sql
Length of output: 2933
🏁 Script executed:
#!/bin/bash
# Look for any test that might reference the file without explicit mention
rg "loadExpectedPlan" integ-test/src/test/java | grep -c "expr"Repository: opensearch-project/sql
Length of output: 68
🏁 Script executed:
#!/bin/bash
# Check the exact state of the file in the repository
ls -lh integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yamlRepository: opensearch-project/sql
Length of output: 191
🏁 Script executed:
#!/bin/bash
# Verify if this is a newly added file in this PR by checking PR context
git diff --name-only HEAD~1 HEAD | grep "explain_dedup_expr"Repository: opensearch-project/sql
Length of output: 252
🏁 Script executed:
#!/bin/bash
# Check if there's any parametrized or dynamic test loading mechanism
rg "@Parameterized|@ParameterizedTest|@CsvSource" integ-test/src/test/java | grep -i dedupRepository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check the exact content of explain_dedup_expr.yaml to understand what query it represents
head -15 integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yamlRepository: opensearch-project/sql
Length of output: 3928
🏁 Script executed:
#!/bin/bash
# Search for any test method that might use a dynamic file name pattern
rg "testDedupExpr\(\)" integ-test/src/test/java -A 50 | head -80Repository: opensearch-project/sql
Length of output: 9941
🏁 Script executed:
#!/bin/bash
# Check if there's a parent test class method that uses this file
rg "explain_dedup_expr" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.javaRepository: opensearch-project/sql
Length of output: 48
Remove the unused test resource file or add the corresponding test.
The file explain_dedup_expr.yaml is not referenced by any test method. All dedup expression tests use the numbered variants (explain_dedup_expr1.yaml through explain_dedup_expr4.yaml with alternatives), but this base file has no corresponding test case. Either add a test that uses this expected output or remove the file if it's not needed.
🤖 Prompt for AI Agents
In integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr.yaml
lines 1-12: the file is an unused expected-output resource (no test references
explain_dedup_expr.yaml); either delete this file from the repo to remove dead
test data, or add a corresponding test case that loads this expected output
(follow the naming/structure of the existing explain_dedup_expr1..4 tests),
register the test in the test suite so it runs in CI, and verify the test
passes; commit only one of these fixes so the repository no longer contains an
unreferenced expected-output file.
| LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) | ||
| LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10], new_gender=[$17], new_state=[$18]) | ||
| LogicalFilter(condition=[<=($19, 2)]) | ||
| LogicalFilter(condition=[<=($19, SAFE_CAST(2:BIGINT))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and view context around line 5
head -20 integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlRepository: opensearch-project/sql
Length of output: 3284
🏁 Script executed:
# Run the pattern search from the review comment to check consistency
rg -n 'SAFE_CAST\(\d+:(?:BIGINT|INTEGER|TINYINT|SMALLINT)\)' integ-test/src/test/resources/expectedOutput/calcite/ | head -30Repository: opensearch-project/sql
Length of output: 5001
🏁 Script executed:
# Also check for SAFE_CAST usage more broadly to understand the pattern
rg -n 'SAFE_CAST' integ-test/src/test/resources/expectedOutput/calcite/ | wc -lRepository: opensearch-project/sql
Length of output: 69
🌐 Web query:
Apache Calcite SAFE_CAST type coercion validation when applied to literal constants
💡 Result:
Summary — how SAFE_CAST interacts with validation/coercion for literals in Calcite
- SAFE_CAST is implemented as a separate operator that returns NULL on conversion failure (Calcite SQL reference). [1]
- During validation Calcite may perform implicit type coercion and literal-type inference (TypeCoercion / validator). If a literal can be coerced to the target type during validation, that coercion can be applied before or instead of leaving the literal for a runtime SAFE_CAST. In other words, literals are subject to the same validator/coercion rules as other expressions. [2][3]
- Because SAFE_CAST must return NULL on runtime conversion failure, Calcite’s simplifying/transformation logic must avoid applying CAST simplifications that would change SAFE_CAST semantics. The project notes and issue discussion call out the need to treat SAFE_CAST specially (do not accidentally simplify like normal CAST). [4][5]
- There have been related bugfixes (e.g. CALCITE-5557 adding SAFE_CAST, and CALCITE-6117 about converting SAFE_CAST between RexCall/SqlCall and preserving type info) showing the implementation and unparse/validation edge cases have been addressed in recent releases. Check those issues if you see behavior where literal coercion or simplification alters SAFE_CAST results. [4][5]
If you have a specific example SQL (literal + SAFE_CAST) that behaves unexpectedly, paste it and I’ll analyze why Calcite’s validator/coercion or simplifier is producing that result and point to the exact code path or bug.
Sources:
[1] Calcite SQL reference (SAFE_CAST).
[2] Calcite TypeCoercion / TypeCoercionImpl docs.
[3] Calcite SQL language docs — implicit coercion / literal handling.
[4] CALCITE-5557 (Add SAFE_CAST).
[5] CALCITE-6117 (SAFE_CAST RexCall→SqlCall issue).
SAFE_CAST on literal integer constants should be optimized away by the validation layer.
This pattern is confirmed across 116 instances in the test files: literal integers like 2, 1, 30, 40, 10000 are systematically wrapped in SAFE_CAST(N:BIGINT). While this preserves semantic correctness (SAFE_CAST returns NULL on conversion failure, but literal integers cannot fail), it appears to be an optimization opportunity. Calcite's validator should simplify away SAFE_CAST wrapping literals that are guaranteed to convert successfully, especially since the PR focuses on validation and optimization paths. Consider whether the type coercion logic should include a simplification pass to eliminate unnecessary SAFE_CAST on compile-time constants.
penghuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Description
Previously, we have implemented our own type checking mechanism due to a lacking of SQL validation phase, which relies on
SqlNode, and compatibility issues with user-defined types.With this PR, I try to solve these problems and migrate to Calcite's type checking mechanism by introducing a SqlNode layer. This is the Approach 1 described in #3865
It has the following advantages:
sqrt(x)by replacing it withpow(x, 0.5)in the SqlNode layer.Please feel free to ask why do you change this and why does that no longer match during review!
Work items:
SqlTypeName, therefore there is no space for UDT there. I did the trick by replacing the operand types of a function call before type checking and restore the types after type checking. E.g. EXPR_TIME -> SqlTypeName.TIME (1090/1599 ITs passed)SqlTypeName.TIMESTAMP). However, when the return type is related to the argument type, this will lead the return type to be sql type instead of UDTs, leaving to an incorrect plan and validation failures. We need to find a way that only use UDT at the top level, using SQL types in intermediate steps.Fixed by overriding
commonTypeForBinaryComparison(1240/1599 ITs passed)IPtype in calcite. For the purpose of type checking, I may have to use an existing sql type to fake it.ATANfunction has a one-operator and a two-operator version: define a new operator that rewrites sql call conditionallyADDcan be string concatenation and number addition - fixed by resolve call conditionally in FuncImpTableLOG(a, b)testRand)reducefunction's type inference is incomplete -- it will derive return types in bothSqlNodeandRelNodelevel, ensuring that they are the same. The current implementation works only at theRelNodeleveltestStatsWithBinsOnTimeAndTermField_AvgandtestStatsWithBinsOnTimeAndTermField_CounttestStatsWithBinsOnTimeField_CountandtestStatsWithBinsOnTimeField_Avgbucket_nullablecorrectlyrel node -> sql node -> rel noderound-trip conversion: fixed by addingSqlHintafter convertingLogicalAggregationRelated Issues
Resolves #4636, resolves #3865
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.