-
Notifications
You must be signed in to change notification settings - Fork 181
Mvexpand feature #4944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Mvexpand feature #4944
Conversation
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…it supposed to Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/user/ppl/cmd/mvexpand.md (1)
7-9: Add language specifier to syntax code block for consistency.The fenced code block at line 7 is missing a language identifier, unlike the other example code blocks in the file (lines 24, 40, 54, 64, 77) which use
text. This creates a minor formatting inconsistency.## Syntax -``` +```text mvexpand <field> [limit=<int>]</blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 44814ab0922099584138430aab7a0e704d0c65a9 and f9dd692b38579ba69040a53247c6de3600cc2b6d. </details> <details> <summary>📒 Files selected for processing (3)</summary> * `docs/category.json` (2 hunks) * `docs/user/ppl/cmd/mvexpand.md` (1 hunks) * `docs/user/ppl/index.rst` (1 hunks) </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary> * docs/user/ppl/index.rst </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (2)</summary> <details> <summary>📓 Common learnings</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns inCalciteRelNodeVisitorandCalciteRexNodeVisitorfor Calcite integration</details> <details> <summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: All tests must pass before merge; fix or ask for guidance if blocked**Applied to files:** - `docs/category.json` </details> </details><details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> <details> <summary>docs/user/ppl/cmd/mvexpand.md</summary> 7-7: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>🔇 Additional comments (2)</summary><blockquote> <details> <summary>docs/category.json (1)</summary><blockquote> `30-30`: **✅ mvexpand entry is correctly placed and formatted.** The mvexpand documentation entry is now properly located in the `ppl_cli_calcite` section with the `.md` extension, maintaining alphabetical ordering between `multisearch.md` and `replace.md`. Previous merge conflict issues have been resolved, and the JSON is syntactically valid. </blockquote></details> <details> <summary>docs/user/ppl/cmd/mvexpand.md (1)</summary><blockquote> `1-98`: **✅ Documentation is comprehensive and covers edge cases well.** The mvexpand documentation clearly describes the command purpose, provides syntax with required and optional parameters, and includes five well-structured examples covering key scenarios: basic expansion, limit parameter, empty arrays, null arrays, single-value arrays, and missing fields. This aligns with the PR objectives and provides users with clear guidance on command behavior. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/user/ppl/cmd/mvexpand.md(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/mvexpand.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (1)
docs/user/ppl/cmd/mvexpand.md (1)
1-125: Comprehensive documentation with good edge-case coverage.The documentation is well-structured and covers the key mvexpand requirements from the PR objectives: basic expansion, optional limit parameter, and important edge cases (empty arrays, null arrays, single-element arrays, missing fields). The examples use clear tabular output formatting and stable doctest methodology (filtering by
case=<name>).
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
Show resolved
Hide resolved
Signed-off-by: Srikanth Padakanti <srikanth29.9@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1)
94-94: Add JavaDoc for visitor method.Per coding guidelines, public methods must have JavaDoc documentation with @param and @return tags. A previous review already flagged this same issue.
Based on coding guidelines, all public classes and methods must have proper JavaDoc.
Also applies to: 872-878
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
3107-3182: Consider extracting the missing field handling logic to improve readability.The
visitMvExpandmethod is ~75 lines, exceeding the 50-line guideline for this file. The missing field handling (lines 3135-3156) is a self-contained 22-line block that could be extracted into a helper method likebuildEmptyRelationWithField(RelBuilder, RelDataType, String). This would improve readability and maintainability.The core logic is sound:
- Missing field correctly returns an EMPTY relation (0 rows) while preserving schema
- Non-array types properly throw
SemanticCheckException- Limit validation and array expansion are handled correctly
🔎 Suggested refactoring to extract complex logic
Extract the missing field handling into a helper method:
/** * Builds an empty relation (0 rows) while preserving schema and adding the missing field as NULL. */ private void buildEmptyRelationWithField( RelBuilder relBuilder, RelDataType rowType, String missingFieldName) { List<RelDataTypeField> fields = rowType.getFieldList(); List<RexNode> projects = new ArrayList<>(); List<String> names = new ArrayList<>(); // Keep existing fields for (RelDataTypeField f : fields) { projects.add(relBuilder.field(f.getIndex())); names.add(f.getName()); } // Add NULL for missing field projects.add(relBuilder.literal(null)); names.add(missingFieldName); relBuilder.project(projects, names); // Return 0 rows relBuilder.filter(relBuilder.literal(false)); }Then simplify visitMvExpand:
// 2A. Missing field → true EMPTY relation (no schema, no rows) if (matched == null) { - // Schema must include the missing field, even if no rows returned. - List<RelDataTypeField> fields = rowType.getFieldList(); - List<RexNode> projects = new ArrayList<>(); - List<String> names = new ArrayList<>(); - - // Keep existing fields - for (RelDataTypeField f : fields) { - projects.add(relBuilder.field(f.getIndex())); - names.add(f.getName()); - } - - // Add NULL for missing field - projects.add(relBuilder.literal(null)); - names.add(fieldName); - - relBuilder.project(projects, names); - - // Now return 0 rows - relBuilder.filter(relBuilder.literal(false)); - + buildEmptyRelationWithField(relBuilder, rowType, fieldName); return relBuilder.peek(); }
3184-3196: Consider making the fallback logic more explicit.The
extractFieldNamemethod correctly handlesQualifiedNameby joining parts. However, the fallbacktoString().replace("", "")` is somewhat fragile as it:
- Assumes backticks are the only wrapper characters
- Depends on
UnresolvedExpression.toString()implementation details- Could incorrectly strip backticks that are part of the actual field name
Consider adding a comment explaining the fallback behavior or handling other known
UnresolvedExpressionsubtypes explicitly.🔎 Example: More explicit fallback handling
private String extractFieldName(Field f) { UnresolvedExpression inner = f.getField(); if (inner instanceof QualifiedName) { List<String> parts = ((QualifiedName) inner).getParts(); if (!parts.isEmpty()) { return String.join(".", parts); } } - // Fallback - return clean string + // Fallback: Remove backtick delimiters that may be added by the parser. + // This assumes field names don't contain backticks as part of their actual name. return inner.toString().replace("`", ""); }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
core/src/main/java/org/opensearch/sql/analysis/Analyzer.java(2 hunks)core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java(2 hunks)core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java(6 hunks)core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java(1 hunks)docs/category.json(2 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(2 hunks)ppl/src/main/antlr/OpenSearchPPLLexer.g4(1 hunks)ppl/src/main/antlr/OpenSearchPPLParser.g4(3 hunks)ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java(2 hunks)ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java(2 hunks)ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- docs/category.json
- ppl/src/main/antlr/OpenSearchPPLParser.g4
- integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
- ppl/src/main/antlr/OpenSearchPPLLexer.g4
- integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧰 Additional context used
📓 Path-based instructions (9)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.javacore/src/main/java/org/opensearch/sql/analysis/Analyzer.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javacore/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.javappl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.javappl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.javacore/src/main/java/org/opensearch/sql/analysis/Analyzer.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javacore/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.javappl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.javappl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
**/ppl/**/*.java
⚙️ CodeRabbit configuration file
**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases
- Check AST generation for new syntax
- Ensure corresponding AST builder classes are updated
- Validate edge cases and boundary conditions
Files:
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.javappl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javappl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/analysis/Analyzer.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javacore/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read
- Suggest extracting complex logic into helper methods
- Check for code organization and logical grouping
- Validate all RelNode types are handled
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/ast/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/ast/**/*.java: - AST nodes must be immutable where possible
- Follow visitor pattern for AST traversal
- Ensure proper toString() implementation for debugging
Files:
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
**/*Test.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*Test.java: All new business logic requires unit tests
Name unit tests with*Test.javasuffix in OpenSearch SQL
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (5)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
Applied to files:
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/analysis/Analyzer.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.javacore/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.javappl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧬 Code graph analysis (1)
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/utils/StringUtils.java (1)
StringUtils(17-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (9)
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)
71-71: LGTM! MvExpand visitor follows established pattern.The new
visitMvExpandmethod correctly follows the visitor pattern established in this class, delegating tovisitChildrenas expected.Also applies to: 458-460
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)
85-85: LGTM! MvExpand anonymization correctly implemented.The
visitMvExpandmethod properly anonymizes the mvexpand command by masking the field and limit values, following the established anonymization patterns in this class.Also applies to: 658-667
core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (1)
83-83: LGTM! MvExpand correctly designated as Calcite-only.The
visitMvExpandoverride properly throws a Calcite-only exception, consistent with other commands like Expand, Flatten, and Reverse that follow the same pattern in this file.Also applies to: 717-720
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)
866-875: LGTM! Good test coverage for mvexpand anonymization.The new tests cover both the basic mvexpand command and the variant with a limit parameter, following the established test patterns in this file.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4)
94-94: LGTM! Imports are necessary for MVExpand support.The new imports for
QualifiedNameandMvExpandare correctly added to support the MVExpand command implementation.Also applies to: 126-126
849-854: LGTM! Backward compatible integration.The
visitPatternsmethod correctly passesnullforperDocLimit, maintaining the existing expand behavior without per-document limits.
3102-3102: LGTM! Backward compatible integration.The
visitExpandmethod correctly passesnullforperDocLimit, maintaining the existing expand behavior without per-document limits.
3441-3494: LGTM! Clean integration of per-document limit support.The
buildExpandRelNodechanges are well-designed:
- The
@Nullable Integer perDocLimitparameter clearly indicates optional behavior- The limit is correctly applied after uncollect but before building the right node (lines 3468-3470), ensuring it limits values per document
- Backward compatibility is maintained for existing
ExpandandPatternsoperations- The integration follows existing patterns in the codebase
Based on learnings: This follows existing patterns in
CalciteRelNodeVisitorfor Calcite integration.core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
1111-1118: The IGNORE type family fallback is an established design pattern, not a risk.The second registration for
INTERNAL_ITEMcorrectly implements a designed fallback mechanism:
Intentional design: The
SqlTypeFamily.IGNOREpattern is explicitly handled inPPLTypeChecker.java(line 81) and already used for other operators (IS_NULL,IS_NOT_NULL,IS_PRESENT), confirming it's a well-established, deliberate pattern—not a permissive edge case.Proper registration order: The first registration (lines 1103-1110) validates specific types (ARRAY+INTEGER, MAP+ANY), while the fallback (lines 1115-1118) only activates when the first doesn't match, enabling runtime resolution for UNDEFINED element types as required by MVEXPAND.
Test coverage exists: Integration tests (
CalciteMvExpandCommandIT.java) and planning tests (CalcitePPLMvExpandTest.java) cover array expansion scenarios including edge cases (empty arrays, null values, primitives). ThetestInternalItemAccessOnStructstest inCalciteExplainIT.javavalidatesINTERNAL_ITEMaccess patterns on dynamic structures.The implementation is sound and follows established patterns for deferred type resolution.
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/user/ppl/index.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist
Applied to files:
docs/user/ppl/index.md
🔇 Additional comments (2)
docs/user/ppl/index.md (2)
81-83: No duplicate entries found—all three commands appear once with unique entries.The addtotals, addcoltotals, and mvexpand commands each appear only once in the documentation table at their respective lines (81–83). The mvexpand.md documentation file exists and is properly formatted. The table entries are correctly formatted with appropriate version and status information.
Likely an incorrect or invalid review comment.
83-83: mvexpand.md documentation file exists and is appropriately structured.The mvexpand.md file is present at docs/user/ppl/cmd/mvexpand.md with comprehensive documentation including clear syntax definition, five detailed examples covering normal cases and edge cases (empty arrays, null values, missing fields, limit parameter), and explanatory notes. The documentation follows the established pattern of other PPL command docs.
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml
Outdated
Show resolved
Hide resolved
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
Show resolved
Hide resolved
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/user/ppl/cmd/mvexpand.md (1)
7-10: Add language identifier to syntax code block.The fenced code block is missing a language identifier, which triggers markdownlint MD040. For consistency with other code blocks in this file (which use
```ppl), add the language specifier.Proposed fix
## Syntax -``` +```ppl mvexpand <field> [limit=<int>]</details> </blockquote></details> </blockquote></details> <details> <summary>🧹 Nitpick comments (1)</summary><blockquote> <details> <summary>core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)</summary><blockquote> `3147-3150`: **String-based exception detection is fragile.** The helper `isMissingFieldException` relies on parsing the exception message content. If the message format changes in `rexVisitor.analyze()` or the field resolution logic, this detection will silently break. Consider alternatives: - Define a custom `FieldNotFoundException` to catch explicitly - Check the exception type more specifically if one exists - Document this coupling prominently </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between b1f2e591d78441925e777de66b374f3239c9dc0e and f7d942d4ef51011987e0d4bc2020145fb8699bd1. </details> <details> <summary>📒 Files selected for processing (11)</summary> * `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java` * `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` * `docs/user/ppl/cmd/mvexpand.md` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` * `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml` * `integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml` * `integ-test/src/test/resources/mvexpand_int_mapping.json` * `integ-test/src/test/resources/mvexpand_missing_field_mapping.json` * `integ-test/src/test/resources/mvexpand_not_array_mapping.json` </details> <details> <summary>✅ Files skipped from review due to trivial changes (2)</summary> * integ-test/src/test/resources/mvexpand_missing_field_mapping.json * integ-test/src/test/resources/mvexpand_not_array_mapping.json </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (3)</summary> * integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java * integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java * integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>📓 Path-based instructions (7)</summary> <details> <summary>integ-test/src/test/resources/**/*</summary> **⚙️ CodeRabbit configuration file** > `integ-test/src/test/resources/**/*`: - Verify test data is realistic and representative > - Check data format matches expected schema > - Ensure test data covers edge cases and boundary conditions > Files: - `integ-test/src/test/resources/mvexpand_int_mapping.json` - `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml` </details> <details> <summary>**/*.java</summary> **📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)** > `**/*.java`: Use `PascalCase` for class names (e.g., `QueryExecutor`) > Use `camelCase` for method and variable names (e.g., `executeQuery`) > Use `UPPER_SNAKE_CASE` for constants (e.g., `MAX_RETRY_COUNT`) > Keep methods under 20 lines with single responsibility > All public classes and methods must have proper JavaDoc > Use specific exception types with meaningful messages for error handling > Prefer `Optional<T>` for nullable returns in Java > Avoid unnecessary object creation in loops > Use `StringBuilder` for string concatenation in loops > Validate all user inputs, especially queries > Sanitize data before logging to prevent injection attacks > Use try-with-resources for proper resource cleanup in Java > Maintain Java 11 compatibility when possible for OpenSearch 2.x > Document Calcite-specific workarounds in code Files: - `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java` - `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` --- **⚙️ CodeRabbit configuration file** > `**/*.java`: - Flag methods >50 lines as potentially too complex - suggest refactoring > - Flag classes >500 lines as needing organization review > - Check for dead code, unused imports, and unused variables > - Identify code reuse opportunities across similar implementations > - Assess holistic maintainability - is code easy to understand and modify? > - Flag code that appears AI-generated without sufficient human review > - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables) > - Check for proper JavaDoc on public classes and methods > - Flag redundant comments that restate obvious code > - Ensure proper error handling with specific exception types > - Check for Optional<T> usage instead of null returns > - Validate proper use of try-with-resources for resource management > Files: - `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java` - `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> <details> <summary>core/src/main/java/**/*.java</summary> **⚙️ CodeRabbit configuration file** > `core/src/main/java/**/*.java`: - New functions MUST have unit tests in the same commit > - Public methods MUST have JavaDoc with @param, @return, and @throws > - Follow existing function implementation patterns in the same package > - New expression functions should follow ExpressionFunction interface patterns > - Validate function naming follows project conventions (camelCase) > Files: - `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java` - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> <details> <summary>core/src/main/java/org/opensearch/sql/ast/**/*.java</summary> **⚙️ CodeRabbit configuration file** > `core/src/main/java/org/opensearch/sql/ast/**/*.java`: - AST nodes must be immutable where possible > - Follow visitor pattern for AST traversal > - Ensure proper toString() implementation for debugging > Files: - `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java` </details> <details> <summary>**/test/**/*.java</summary> **⚙️ CodeRabbit configuration file** > `**/test/**/*.java`: - Verify NULL input tests for all new functions > - Check boundary condition tests (min/max values, empty inputs) > - Validate error condition tests (invalid inputs, exceptions) > - Ensure multi-document tests for per-document operations > - Flag smoke tests without meaningful assertions > - Check test naming follows pattern: test<Functionality><Condition> > - Verify test data is realistic and covers edge cases > - Verify test coverage for new business logic > - Ensure tests are independent and don't rely on execution order > - Validate meaningful test data that reflects real-world scenarios > - Check for proper cleanup of test resources > Files: - `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` </details> <details> <summary>**/calcite/**/*.java</summary> **⚙️ CodeRabbit configuration file** > `**/calcite/**/*.java`: - Follow existing Calcite integration patterns > - Verify RelNode visitor implementations are complete > - Check RexNode handling follows project conventions > - Validate SQL generation is correct and optimized > - Ensure Calcite version compatibility > - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor > - Document any Calcite-specific workarounds > - Test compatibility with Calcite version constraints > Files: - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> <details> <summary>core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java</summary> **⚙️ CodeRabbit configuration file** > `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`: - Flag methods >50 lines - this file is known to be hard to read > - Suggest extracting complex logic into helper methods > - Check for code organization and logical grouping > - Validate all RelNode types are handled > Files: - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> </details><details> <summary>🧠 Learnings (9)</summary> <details> <summary>📓 Common learnings</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changesLearnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes</details> <details> <summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with*IT.javasuffix in OpenSearch SQL**Applied to files:** - `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` </details> <details> <summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes**Applied to files:** - `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` - `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml` </details> <details> <summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with*Test.javasuffix in OpenSearch SQL**Applied to files:** - `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java` </details> <details> <summary>📚 Learning: 2025-12-11T05:27:39.856Z</summary>Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, themaxoption has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing themaxoption is intentional and correct behavior.**Applied to files:** - `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml` - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> <details> <summary>📚 Learning: 2025-12-29T05:32:11.893Z</summary>Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override thecopymethod. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.**Applied to files:** - `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml` </details> <details> <summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns inCalciteRelNodeVisitorandCalciteRexNodeVisitorfor Calcite integration**Applied to files:** - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> <details> <summary>📚 Learning: 2025-12-29T05:32:03.491Z</summary>Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.**Applied to files:** - `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` </details> <details> <summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist**Applied to files:** - `docs/user/ppl/cmd/mvexpand.md` </details> </details><details> <summary>🧬 Code graph analysis (1)</summary> <details> <summary>core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (2)</summary><blockquote> <details> <summary>core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)</summary> * `AbstractNodeVisitor` (50-424) </details> <details> <summary>legacy/src/main/java/org/opensearch/sql/legacy/domain/Field.java (1)</summary> * `Field` (17-147) </details> </blockquote></details> </details><details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> <details> <summary>docs/user/ppl/cmd/mvexpand.md</summary> 8-8: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>🔇 Additional comments (8)</summary><blockquote> <details> <summary>integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml (2)</summary><blockquote> `2-7`: **Logical plan correctly shows Unnest path.** The logical plan structure is correct and includes LogicalUnnest, which is the appropriate Calcite operator for expanding array fields (mvexpand operation). This addresses the previous concern about the plan shortcircuiting. --- `1-13`: **Verify per-document limit parameter representation and edge case coverage.** According to the PR objectives, mvexpand supports an "optional limit parameter to restrict number of expanded values per event/document." However, this plan only shows a QUERY_SIZE_LIMIT (10000), which appears to be a system-level limit on total query results rather than a per-document array expansion limit. Additionally, while the table name `mvexpand_edge_cases` suggests edge case testing, it's unclear from the plan alone which specific edge cases are covered (e.g., null arrays, empty arrays, non-array fields, large arrays). Please verify: 1. Does this test exercise mvexpand without the limit parameter, or if a per-document limit is used, how is it represented in the Calcite plan? 2. What specific edge cases from the PR objectives (empty/null arrays, non-array fields, large arrays) are being validated by this explain test? Based on coding guidelines for test resource files, ensure test data covers the documented edge cases and boundary conditions. </blockquote></details> <details> <summary>core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (1)</summary><blockquote> `17-46`: **LGTM! Clean AST node implementation following established patterns.** The `MvExpand` node correctly extends `UnresolvedPlan`, uses Lombok for boilerplate reduction, and properly implements the visitor pattern. The immutable `ImmutableList` return in `getChild()` is a good practice. One minor consideration: the `limit` field accepts any `Integer` including negative values. Based on the Calcite visitor implementation (which throws for negative limits in join's `max` option), you may want to add validation here or document that validation occurs downstream. </blockquote></details> <details> <summary>integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)</summary><blockquote> `702-723`: **LGTM! Test index entries properly defined.** The new MVEXPAND test indices follow the established enum pattern and provide good coverage for edge cases (empty/null arrays, non-array fields, integer fields, missing fields). The `null` dataset for some entries is consistent with the pattern used elsewhere when test data is populated programmatically. Minor nit: There's an extra blank line between `MVEXPAND_NOT_ARRAY` and `MVEXPAND_INT_FIELD` (line 712) that differs from the formatting of other entries, though this doesn't affect functionality. </blockquote></details> <details> <summary>core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)</summary><blockquote> `3104-3145`: **Well-structured MVExpand visitor with proper validation and edge-case handling.** The implementation correctly: 1. Validates field type is ARRAY before expansion 2. Handles missing fields gracefully by preserving schema with empty results 3. Reuses the shared `buildExpandRelNode` infrastructure Two considerations: 1. **Limit validation**: If `mvExpand.getLimit()` is negative, it passes through to `context.relBuilder.limit(0, perDocLimit)`. Should negative limits be rejected with a `SemanticCheckException`? 2. **String-based exception detection** (lines 3147-3150): The `isMissingFieldException` method relies on parsing the exception message string. This is fragile if the message format changes. Consider a more robust approach if feasible. --- `3415-3425`: **Shared expand logic cleanly extended for per-document limits.** The `buildExpandRelNode` method is cleanly updated to support the optional `perDocLimit` parameter. Applying the limit to the right-side (unnested) node before the correlate join is the correct approach for limiting expanded values per document. Verified that existing callers (`visitExpand`, `visitPatterns`) pass `null` to preserve their existing behavior. </blockquote></details> <details> <summary>integ-test/src/test/resources/mvexpand_int_mapping.json (1)</summary><blockquote> `1-8`: **LGTM! Simple test mapping for non-array field scenario.** This mapping correctly defines `skills` as a scalar `integer` type (not an array), which aligns with testing the `mvexpand` error handling for non-array fields. The `SemanticCheckException` thrown in `visitMvExpand` when the field type is not `ARRAY` should be exercised by tests using this mapping. </blockquote></details> <details> <summary>docs/user/ppl/cmd/mvexpand.md (1)</summary><blockquote> `1-166`: **Comprehensive documentation with clear examples.** The documentation effectively covers: - Syntax with required and optional parameters - Multiple realistic examples using `array()` for deterministic doctests - Edge cases: empty arrays, null arrays, single-value arrays, missing fields - Clear explanation of output field naming behavior The approach of using `eval` + `array()` for doctest stability is a good practice that was incorporated from prior review feedback. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
3107-3161: MVExpand implementation is solid, but consider extracting missing-field logic.The method correctly handles the MVExpand semantics including:
- Missing field → empty results with preserved schema (Splunk-compatible behavior, as noted in past review)
- ARRAY type enforcement with clear error messaging
- Per-document limit propagation via
buildExpandRelNodeHowever, the method is ~42 lines, exceeding the 20-line guideline.
Optional refactor: Extract missing-field handling
Consider extracting the missing-field handling (lines 3128-3148) into a private helper method to improve readability:
private void handleMissingFieldForMvExpand( RelBuilder relBuilder, String fieldName, CalcitePlanContext context) { // Preserve schema by projecting NULL ARRAY column with the expected name final RelDataTypeFactory typeFactory = relBuilder.getTypeFactory(); final RelDataType arrayAny = typeFactory.createArrayType(typeFactory.createSqlType(SqlTypeName.ANY), -1); relBuilder.projectPlus( List.of( relBuilder.alias(relBuilder.getRexBuilder().makeNullLiteral(arrayAny), fieldName))); // Force empty relation (no rows), preserving schema relBuilder.filter(relBuilder.literal(false)); }Then in
visitMvExpand:try { arrayFieldRex = (RexInputRef) rexVisitor.analyze(field, context); } catch (IllegalArgumentException e) { if (isMissingFieldException(e)) { handleMissingFieldForMvExpand(relBuilder, fieldName, context); return relBuilder.peek(); } throw e; }Also, consider enhancing the JavaDoc to explicitly document the missing-field behavior:
/** * MVExpand command visitor. * * <p>For Calcite remote planning, mvexpand shares the same expansion mechanics as {@link Expand}: * it unnests the target multivalue field and joins back to the original relation. The additional * mvexpand semantics (such as an optional per-document limit) are surfaced via the MVExpand AST * node but reuse the same underlying RelBuilder pipeline as expand at this layer. * * <p><b>Missing field handling:</b> When the target field does not exist, mvexpand returns an * EMPTY result set (0 rows) while preserving the schema by projecting a NULL ARRAY column. * This behavior matches Splunk PPL semantics and avoids hard failures in chained queries. * * @param mvExpand MVExpand command to be visited * @param context CalcitePlanContext containing the RelBuilder and other context * @return RelNode representing records with the expanded multi-value field (or empty with preserved schema) * @throws SemanticCheckException if the target field exists but is not of ARRAY type */
3411-3464: Clean perDocLimit integration, but method exceeds 50-line guideline.The
buildExpandRelNodesignature change is well-executed:
@Nullable Integer perDocLimitparameter added cleanly- All existing callers (
visitExpand,visitPatterns) passnull→ backward compatible- Limit applied correctly after
uncollectand before building right node (lines 3438-3440)- Only MVExpand benefits from the new per-document limit feature
However, the method is 54 lines, exceeding the 50-line complexity guideline.
Given the method's length and Calcite API complexity, you may optionally consider extracting the right-node building logic (lines 3431-3441) into a helper:
private RelNode buildExpandRightNode( RexNode correlArrayFieldAccess, String arrayFieldName, @Nullable Integer perDocLimit, CalcitePlanContext context) { context .relBuilder .push(LogicalValues.createOneRow(context.relBuilder.getCluster())) .project(List.of(correlArrayFieldAccess), List.of(arrayFieldName)) .uncollect(List.of(), false); if (perDocLimit != null) { context.relBuilder.limit(0, perDocLimit); } return context.relBuilder.build(); }This would bring the main method closer to the guideline while maintaining clarity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read
- Suggest extracting complex logic into helper methods
- Check for code organization and logical grouping
- Validate all RelNode types are handled
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
54-54: LGTM: Clean import additions.The new imports for
RelDataTypeFactoryandMvExpandare necessary and properly used in the MVExpand implementation.Also applies to: 126-126
849-854: LGTM: Backward-compatible signature updates.The updates to
visitExpandandvisitPatternscorrectly passnullfor the newperDocLimitparameter, maintaining existing behavior while enabling MVExpand's per-document limit feature.Also applies to: 3102-3102
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
3415-3468: Method exceeds complexity threshold and missing parameter documentation.This method has two issues:
Length: At 54 lines, it exceeds the 50-line threshold. Consider extracting the right-node building logic (lines 3435-3445) into a helper method like
buildExpandRightNode.Missing JavaDoc: The
perDocLimitparameter lacks@paramdocumentation. Document its purpose: "per-document limit for expansion; if non-null, limits the number of expanded values per input row."As per coding guidelines, methods over 50 lines should be reviewed for complexity and refactored.
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
3160-3170: Fragile exception detection via string matching.This concern was already raised in a previous review. The
isMissingFieldExceptionmethod relies on regex pattern matching to detect missing field errors, which is brittle and will break silently if error message formats change inCalciteRexNodeVisitor.analyze().Consider implementing the previously suggested solution: introduce a typed
FieldNotFoundExceptionin the codebase and catch it specifically invisitMvExpand, or at minimum use more defensive string matching.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read
- Suggest extracting complex logic into helper methods
- Check for code organization and logical grouping
- Validate all RelNode types are handled
Files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
3133-3142: Verify empty expansion schema preservation is needed.The missing field handling projects a NULL
ARRAY(ANY)column then immediately filters it out withliteral(false). While this preserves the schema before filtering (adding the field name), the indirection is notable.Confirm this approach is required for schema consistency in downstream operations. If the added field is never observed (due to the immediate filter), consider whether a simpler approach could work.
Based on learnings, this treats missing fields as empty expansion matching Splunk behavior and avoiding hard failures in chained queries.
3442-3444: Per-document limit implementation looks correct.The placement of the limit after
uncollectbut before building the right node achieves the intended per-document limiting behavior. Because the expansion happens within a correlate scope, each left row's array expansion is limited independently, which is the correct semantic for MVExpand's limit parameter.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Show resolved
Hide resolved
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
|
@ykmr1224 |
Description
This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.
Key features introduced:
Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.
Related Issues
Resolves #4439
#4439
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.