Skip to content

Conversation

@srikanthpadakanti
Copy link

Description

This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.

Key features introduced:

Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.

Related Issues

Resolves #4439
#4439

Check List

  • [ X] New functionality includes testing.
  • [ X] New functionality has been documented.
  • [ X] New functionality has javadoc added.
  • [ X] New functionality has a user manual doc added.
  • [ X] New PPL command checklist all confirmed.
  • [ X] API changes companion pull request created.
  • [ X] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Srikanth Padakanti added 30 commits November 22, 2025 14:42
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…it supposed to

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/user/ppl/cmd/mvexpand.md (1)

7-9: Add language specifier to syntax code block for consistency.

The fenced code block at line 7 is missing a language identifier, unlike the other example code blocks in the file (lines 24, 40, 54, 64, 77) which use text. This creates a minor formatting inconsistency.

 ## Syntax
-```
+```text
 mvexpand <field> [limit=<int>]

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 44814ab0922099584138430aab7a0e704d0c65a9 and f9dd692b38579ba69040a53247c6de3600cc2b6d.

</details>

<details>
<summary>📒 Files selected for processing (3)</summary>

* `docs/category.json` (2 hunks)
* `docs/user/ppl/cmd/mvexpand.md` (1 hunks)
* `docs/user/ppl/index.rst` (1 hunks)

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary>

* docs/user/ppl/index.rst

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧠 Learnings (2)</summary>

<details>
<summary>📓 Common learnings</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor for Calcite integration


</details>
<details>
<summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: All tests must pass before merge; fix or ask for guidance if blocked


**Applied to files:**
- `docs/category.json`

</details>

</details><details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

<details>
<summary>docs/user/ppl/cmd/mvexpand.md</summary>

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (2)</summary><blockquote>

<details>
<summary>docs/category.json (1)</summary><blockquote>

`30-30`: **✅ mvexpand entry is correctly placed and formatted.**

The mvexpand documentation entry is now properly located in the `ppl_cli_calcite` section with the `.md` extension, maintaining alphabetical ordering between `multisearch.md` and `replace.md`. Previous merge conflict issues have been resolved, and the JSON is syntactically valid.

</blockquote></details>
<details>
<summary>docs/user/ppl/cmd/mvexpand.md (1)</summary><blockquote>

`1-98`: **✅ Documentation is comprehensive and covers edge cases well.**

The mvexpand documentation clearly describes the command purpose, provides syntax with required and optional parameters, and includes five well-structured examples covering key scenarios: basic expansion, limit parameter, empty arrays, null arrays, single-value arrays, and missing fields. This aligns with the PR objectives and provides users with clear guidance on command behavior.

</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

srikanthpadakanti and others added 3 commits December 11, 2025 12:12
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9dd692 and 2464675.

📒 Files selected for processing (1)
  • docs/user/ppl/cmd/mvexpand.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/mvexpand.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (1)
docs/user/ppl/cmd/mvexpand.md (1)

1-125: Comprehensive documentation with good edge-case coverage.

The documentation is well-structured and covers the key mvexpand requirements from the PR objectives: basic expansion, optional limit parameter, and important edge cases (empty arrays, null arrays, single-element arrays, missing fields). The examples use clear tabular output formatting and stable doctest methodology (filtering by case=<name>).

Signed-off-by: Srikanth Padakanti <srikanth29.9@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1)

94-94: Add JavaDoc for visitor method.

Per coding guidelines, public methods must have JavaDoc documentation with @param and @return tags. A previous review already flagged this same issue.

Based on coding guidelines, all public classes and methods must have proper JavaDoc.

Also applies to: 872-878

🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)

3107-3182: Consider extracting the missing field handling logic to improve readability.

The visitMvExpand method is ~75 lines, exceeding the 50-line guideline for this file. The missing field handling (lines 3135-3156) is a self-contained 22-line block that could be extracted into a helper method like buildEmptyRelationWithField(RelBuilder, RelDataType, String). This would improve readability and maintainability.

The core logic is sound:

  • Missing field correctly returns an EMPTY relation (0 rows) while preserving schema
  • Non-array types properly throw SemanticCheckException
  • Limit validation and array expansion are handled correctly
🔎 Suggested refactoring to extract complex logic

Extract the missing field handling into a helper method:

/**
 * Builds an empty relation (0 rows) while preserving schema and adding the missing field as NULL.
 */
private void buildEmptyRelationWithField(
    RelBuilder relBuilder, RelDataType rowType, String missingFieldName) {
  List<RelDataTypeField> fields = rowType.getFieldList();
  List<RexNode> projects = new ArrayList<>();
  List<String> names = new ArrayList<>();

  // Keep existing fields
  for (RelDataTypeField f : fields) {
    projects.add(relBuilder.field(f.getIndex()));
    names.add(f.getName());
  }

  // Add NULL for missing field
  projects.add(relBuilder.literal(null));
  names.add(missingFieldName);

  relBuilder.project(projects, names);
  // Return 0 rows
  relBuilder.filter(relBuilder.literal(false));
}

Then simplify visitMvExpand:

     // 2A. Missing field → true EMPTY relation (no schema, no rows)
     if (matched == null) {
-      // Schema must include the missing field, even if no rows returned.
-      List<RelDataTypeField> fields = rowType.getFieldList();
-      List<RexNode> projects = new ArrayList<>();
-      List<String> names = new ArrayList<>();
-
-      // Keep existing fields
-      for (RelDataTypeField f : fields) {
-        projects.add(relBuilder.field(f.getIndex()));
-        names.add(f.getName());
-      }
-
-      // Add NULL for missing field
-      projects.add(relBuilder.literal(null));
-      names.add(fieldName);
-
-      relBuilder.project(projects, names);
-
-      // Now return 0 rows
-      relBuilder.filter(relBuilder.literal(false));
-
+      buildEmptyRelationWithField(relBuilder, rowType, fieldName);
       return relBuilder.peek();
     }

3184-3196: Consider making the fallback logic more explicit.

The extractFieldName method correctly handles QualifiedName by joining parts. However, the fallback toString().replace("", "")` is somewhat fragile as it:

  • Assumes backticks are the only wrapper characters
  • Depends on UnresolvedExpression.toString() implementation details
  • Could incorrectly strip backticks that are part of the actual field name

Consider adding a comment explaining the fallback behavior or handling other known UnresolvedExpression subtypes explicitly.

🔎 Example: More explicit fallback handling
 private String extractFieldName(Field f) {
   UnresolvedExpression inner = f.getField();

   if (inner instanceof QualifiedName) {
     List<String> parts = ((QualifiedName) inner).getParts();
     if (!parts.isEmpty()) {
       return String.join(".", parts);
     }
   }

-  // Fallback - return clean string
+  // Fallback: Remove backtick delimiters that may be added by the parser.
+  // This assumes field names don't contain backticks as part of their actual name.
   return inner.toString().replace("`", "");
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f86c52 and 32d3867.

📒 Files selected for processing (12)
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (6 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1 hunks)
  • docs/category.json (2 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (3 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (2 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (2 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • docs/category.json
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧰 Additional context used
📓 Path-based instructions (9)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
**/ppl/**/*.java

⚙️ CodeRabbit configuration file

**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases

  • Check AST generation for new syntax
  • Ensure corresponding AST builder classes are updated
  • Validate edge cases and boundary conditions

Files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/ast/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/ast/**/*.java: - AST nodes must be immutable where possible

  • Follow visitor pattern for AST traversal
  • Ensure proper toString() implementation for debugging

Files:

  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
core/src/main/java/org/opensearch/sql/expression/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns

  • Type handling must be consistent with project type system
  • Error handling must use appropriate exception types
  • Null handling must be explicit and documented

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (5)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes

Applied to files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax

Applied to files:

  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧬 Code graph analysis (1)
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/utils/StringUtils.java (1)
  • StringUtils (17-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (9)
core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)

71-71: LGTM! MvExpand visitor follows established pattern.

The new visitMvExpand method correctly follows the visitor pattern established in this class, delegating to visitChildren as expected.

Also applies to: 458-460

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)

85-85: LGTM! MvExpand anonymization correctly implemented.

The visitMvExpand method properly anonymizes the mvexpand command by masking the field and limit values, following the established anonymization patterns in this class.

Also applies to: 658-667

core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (1)

83-83: LGTM! MvExpand correctly designated as Calcite-only.

The visitMvExpand override properly throws a Calcite-only exception, consistent with other commands like Expand, Flatten, and Reverse that follow the same pattern in this file.

Also applies to: 717-720

ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

866-875: LGTM! Good test coverage for mvexpand anonymization.

The new tests cover both the basic mvexpand command and the variant with a limit parameter, following the established test patterns in this file.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4)

94-94: LGTM! Imports are necessary for MVExpand support.

The new imports for QualifiedName and MvExpand are correctly added to support the MVExpand command implementation.

Also applies to: 126-126


849-854: LGTM! Backward compatible integration.

The visitPatterns method correctly passes null for perDocLimit, maintaining the existing expand behavior without per-document limits.


3102-3102: LGTM! Backward compatible integration.

The visitExpand method correctly passes null for perDocLimit, maintaining the existing expand behavior without per-document limits.


3441-3494: LGTM! Clean integration of per-document limit support.

The buildExpandRelNode changes are well-designed:

  • The @Nullable Integer perDocLimit parameter clearly indicates optional behavior
  • The limit is correctly applied after uncollect but before building the right node (lines 3468-3470), ensuring it limits values per document
  • Backward compatibility is maintained for existing Expand and Patterns operations
  • The integration follows existing patterns in the codebase

Based on learnings: This follows existing patterns in CalciteRelNodeVisitor for Calcite integration.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

1111-1118: The IGNORE type family fallback is an established design pattern, not a risk.

The second registration for INTERNAL_ITEM correctly implements a designed fallback mechanism:

  1. Intentional design: The SqlTypeFamily.IGNORE pattern is explicitly handled in PPLTypeChecker.java (line 81) and already used for other operators (IS_NULL, IS_NOT_NULL, IS_PRESENT), confirming it's a well-established, deliberate pattern—not a permissive edge case.

  2. Proper registration order: The first registration (lines 1103-1110) validates specific types (ARRAY+INTEGER, MAP+ANY), while the fallback (lines 1115-1118) only activates when the first doesn't match, enabling runtime resolution for UNDEFINED element types as required by MVEXPAND.

  3. Test coverage exists: Integration tests (CalciteMvExpandCommandIT.java) and planning tests (CalcitePPLMvExpandTest.java) cover array expansion scenarios including edge cases (empty arrays, null values, primitives). The testInternalItemAccessOnStructs test in CalciteExplainIT.java validates INTERNAL_ITEM access patterns on dynamic structures.

The implementation is sound and follows established patterns for deferred type resolution.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32d3867 and 07509ae.

📒 Files selected for processing (1)
  • docs/user/ppl/index.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/index.md
🔇 Additional comments (2)
docs/user/ppl/index.md (2)

81-83: No duplicate entries found—all three commands appear once with unique entries.

The addtotals, addcoltotals, and mvexpand commands each appear only once in the documentation table at their respective lines (81–83). The mvexpand.md documentation file exists and is properly formatted. The table entries are correctly formatted with appropriate version and status information.

Likely an incorrect or invalid review comment.


83-83: mvexpand.md documentation file exists and is appropriately structured.

The mvexpand.md file is present at docs/user/ppl/cmd/mvexpand.md with comprehensive documentation including clear syntax definition, five detailed examples covering normal cases and edge cases (empty arrays, null values, missing fields, limit parameter), and explanatory notes. The documentation follows the established pattern of other PPL command docs.

Srikanth Padakanti added 2 commits December 18, 2025 17:18
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
dai-chen
dai-chen previously approved these changes Dec 19, 2025
@srikanthpadakanti
Copy link
Author

Hello @penghuo @ykmr1224 Can you please look into this PR. Thanks

Srikanth Padakanti added 2 commits January 2, 2026 17:11
Srikanth Padakanti added 2 commits January 2, 2026 20:54
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/user/ppl/cmd/mvexpand.md (1)

7-10: Add language identifier to syntax code block.

The fenced code block is missing a language identifier, which triggers markdownlint MD040. For consistency with other code blocks in this file (which use ```ppl), add the language specifier.

Proposed fix
 ## Syntax
-```
+```ppl
 mvexpand <field> [limit=<int>]
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (1)</summary><blockquote>

<details>
<summary>core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)</summary><blockquote>

`3147-3150`: **String-based exception detection is fragile.**

The helper `isMissingFieldException` relies on parsing the exception message content. If the message format changes in `rexVisitor.analyze()` or the field resolution logic, this detection will silently break.

Consider alternatives:
- Define a custom `FieldNotFoundException` to catch explicitly
- Check the exception type more specifically if one exists
- Document this coupling prominently

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between b1f2e591d78441925e777de66b374f3239c9dc0e and f7d942d4ef51011987e0d4bc2020145fb8699bd1.

</details>

<details>
<summary>📒 Files selected for processing (11)</summary>

* `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java`
* `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
* `docs/user/ppl/cmd/mvexpand.md`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java`
* `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`
* `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml`
* `integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml`
* `integ-test/src/test/resources/mvexpand_int_mapping.json`
* `integ-test/src/test/resources/mvexpand_missing_field_mapping.json`
* `integ-test/src/test/resources/mvexpand_not_array_mapping.json`

</details>

<details>
<summary>✅ Files skipped from review due to trivial changes (2)</summary>

* integ-test/src/test/resources/mvexpand_missing_field_mapping.json
* integ-test/src/test/resources/mvexpand_not_array_mapping.json

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (3)</summary>

* integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
* integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
* integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Path-based instructions (7)</summary>

<details>
<summary>integ-test/src/test/resources/**/*</summary>


**⚙️ CodeRabbit configuration file**

> `integ-test/src/test/resources/**/*`: - Verify test data is realistic and representative
> - Check data format matches expected schema
> - Ensure test data covers edge cases and boundary conditions
> 

Files:
- `integ-test/src/test/resources/mvexpand_int_mapping.json`
- `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml`

</details>
<details>
<summary>**/*.java</summary>


**📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)**

> `**/*.java`: Use `PascalCase` for class names (e.g., `QueryExecutor`)
> Use `camelCase` for method and variable names (e.g., `executeQuery`)
> Use `UPPER_SNAKE_CASE` for constants (e.g., `MAX_RETRY_COUNT`)
> Keep methods under 20 lines with single responsibility
> All public classes and methods must have proper JavaDoc
> Use specific exception types with meaningful messages for error handling
> Prefer `Optional<T>` for nullable returns in Java
> Avoid unnecessary object creation in loops
> Use `StringBuilder` for string concatenation in loops
> Validate all user inputs, especially queries
> Sanitize data before logging to prevent injection attacks
> Use try-with-resources for proper resource cleanup in Java
> Maintain Java 11 compatibility when possible for OpenSearch 2.x
> Document Calcite-specific workarounds in code

Files:
- `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java`
- `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

---

**⚙️ CodeRabbit configuration file**

> `**/*.java`: - Flag methods >50 lines as potentially too complex - suggest refactoring
> - Flag classes >500 lines as needing organization review
> - Check for dead code, unused imports, and unused variables
> - Identify code reuse opportunities across similar implementations
> - Assess holistic maintainability - is code easy to understand and modify?
> - Flag code that appears AI-generated without sufficient human review
> - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
> - Check for proper JavaDoc on public classes and methods
> - Flag redundant comments that restate obvious code
> - Ensure proper error handling with specific exception types
> - Check for Optional<T> usage instead of null returns
> - Validate proper use of try-with-resources for resource management
> 

Files:
- `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java`
- `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>
<details>
<summary>core/src/main/java/**/*.java</summary>


**⚙️ CodeRabbit configuration file**

> `core/src/main/java/**/*.java`: - New functions MUST have unit tests in the same commit
> - Public methods MUST have JavaDoc with @param, @return, and @throws
> - Follow existing function implementation patterns in the same package
> - New expression functions should follow ExpressionFunction interface patterns
> - Validate function naming follows project conventions (camelCase)
> 

Files:
- `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java`
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>
<details>
<summary>core/src/main/java/org/opensearch/sql/ast/**/*.java</summary>


**⚙️ CodeRabbit configuration file**

> `core/src/main/java/org/opensearch/sql/ast/**/*.java`: - AST nodes must be immutable where possible
> - Follow visitor pattern for AST traversal
> - Ensure proper toString() implementation for debugging
> 

Files:
- `core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java`

</details>
<details>
<summary>**/test/**/*.java</summary>


**⚙️ CodeRabbit configuration file**

> `**/test/**/*.java`: - Verify NULL input tests for all new functions
> - Check boundary condition tests (min/max values, empty inputs)
> - Validate error condition tests (invalid inputs, exceptions)
> - Ensure multi-document tests for per-document operations
> - Flag smoke tests without meaningful assertions
> - Check test naming follows pattern: test<Functionality><Condition>
> - Verify test data is realistic and covers edge cases
> - Verify test coverage for new business logic
> - Ensure tests are independent and don't rely on execution order
> - Validate meaningful test data that reflects real-world scenarios
> - Check for proper cleanup of test resources
> 

Files:
- `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`

</details>
<details>
<summary>**/calcite/**/*.java</summary>


**⚙️ CodeRabbit configuration file**

> `**/calcite/**/*.java`: - Follow existing Calcite integration patterns
> - Verify RelNode visitor implementations are complete
> - Check RexNode handling follows project conventions
> - Validate SQL generation is correct and optimized
> - Ensure Calcite version compatibility
> - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
> - Document any Calcite-specific workarounds
> - Test compatibility with Calcite version constraints
> 

Files:
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>
<details>
<summary>core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java</summary>


**⚙️ CodeRabbit configuration file**

> `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`: - Flag methods >50 lines - this file is known to be hard to read
> - Suggest extracting complex logic into helper methods
> - Check for code organization and logical grouping
> - Validate all RelNode types are handled
> 

Files:
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>

</details><details>
<summary>🧠 Learnings (9)</summary>

<details>
<summary>📓 Common learnings</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes


Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes


</details>
<details>
<summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with *IT.java suffix in OpenSearch SQL


**Applied to files:**
- `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`

</details>
<details>
<summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes


**Applied to files:**
- `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`
- `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml`

</details>
<details>
<summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with *Test.java suffix in OpenSearch SQL


**Applied to files:**
- `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`

</details>
<details>
<summary>📚 Learning: 2025-12-11T05:27:39.856Z</summary>

Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the max option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the max option is intentional and correct behavior.


**Applied to files:**
- `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml`
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>
<details>
<summary>📚 Learning: 2025-12-29T05:32:11.893Z</summary>

Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:11.893Z
Learning: In opensearch-project/sql, when creating custom Calcite RelNode classes that extend EnumerableLimitSort or other Calcite RelNode types, always override the copy method. Without overriding copy, the class will downgrade to its parent class type during copy operations, losing the custom implementation.


**Applied to files:**
- `integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml`

</details>
<details>
<summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor for Calcite integration


**Applied to files:**
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>
<details>
<summary>📚 Learning: 2025-12-29T05:32:03.491Z</summary>

Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.


**Applied to files:**
- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`

</details>
<details>
<summary>📚 Learning: 2025-12-02T17:27:55.938Z</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist


**Applied to files:**
- `docs/user/ppl/cmd/mvexpand.md`

</details>

</details><details>
<summary>🧬 Code graph analysis (1)</summary>

<details>
<summary>core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (2)</summary><blockquote>

<details>
<summary>core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)</summary>

* `AbstractNodeVisitor` (50-424)

</details>
<details>
<summary>legacy/src/main/java/org/opensearch/sql/legacy/domain/Field.java (1)</summary>

* `Field` (17-147)

</details>

</blockquote></details>

</details><details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

<details>
<summary>docs/user/ppl/cmd/mvexpand.md</summary>

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (8)</summary><blockquote>

<details>
<summary>integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml (2)</summary><blockquote>

`2-7`: **Logical plan correctly shows Unnest path.**

The logical plan structure is correct and includes LogicalUnnest, which is the appropriate Calcite operator for expanding array fields (mvexpand operation). This addresses the previous concern about the plan shortcircuiting.

---

`1-13`: **Verify per-document limit parameter representation and edge case coverage.**

According to the PR objectives, mvexpand supports an "optional limit parameter to restrict number of expanded values per event/document." However, this plan only shows a QUERY_SIZE_LIMIT (10000), which appears to be a system-level limit on total query results rather than a per-document array expansion limit.

Additionally, while the table name `mvexpand_edge_cases` suggests edge case testing, it's unclear from the plan alone which specific edge cases are covered (e.g., null arrays, empty arrays, non-array fields, large arrays).

Please verify:
1. Does this test exercise mvexpand without the limit parameter, or if a per-document limit is used, how is it represented in the Calcite plan?
2. What specific edge cases from the PR objectives (empty/null arrays, non-array fields, large arrays) are being validated by this explain test?



Based on coding guidelines for test resource files, ensure test data covers the documented edge cases and boundary conditions.

</blockquote></details>
<details>
<summary>core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (1)</summary><blockquote>

`17-46`: **LGTM! Clean AST node implementation following established patterns.**

The `MvExpand` node correctly extends `UnresolvedPlan`, uses Lombok for boilerplate reduction, and properly implements the visitor pattern. The immutable `ImmutableList` return in `getChild()` is a good practice.

One minor consideration: the `limit` field accepts any `Integer` including negative values. Based on the Calcite visitor implementation (which throws for negative limits in join's `max` option), you may want to add validation here or document that validation occurs downstream.

</blockquote></details>
<details>
<summary>integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)</summary><blockquote>

`702-723`: **LGTM! Test index entries properly defined.**

The new MVEXPAND test indices follow the established enum pattern and provide good coverage for edge cases (empty/null arrays, non-array fields, integer fields, missing fields). The `null` dataset for some entries is consistent with the pattern used elsewhere when test data is populated programmatically.

Minor nit: There's an extra blank line between `MVEXPAND_NOT_ARRAY` and `MVEXPAND_INT_FIELD` (line 712) that differs from the formatting of other entries, though this doesn't affect functionality.

</blockquote></details>
<details>
<summary>core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)</summary><blockquote>

`3104-3145`: **Well-structured MVExpand visitor with proper validation and edge-case handling.**

The implementation correctly:
1. Validates field type is ARRAY before expansion
2. Handles missing fields gracefully by preserving schema with empty results
3. Reuses the shared `buildExpandRelNode` infrastructure

Two considerations:

1. **Limit validation**: If `mvExpand.getLimit()` is negative, it passes through to `context.relBuilder.limit(0, perDocLimit)`. Should negative limits be rejected with a `SemanticCheckException`?

2. **String-based exception detection** (lines 3147-3150): The `isMissingFieldException` method relies on parsing the exception message string. This is fragile if the message format changes. Consider a more robust approach if feasible.

---

`3415-3425`: **Shared expand logic cleanly extended for per-document limits.**

The `buildExpandRelNode` method is cleanly updated to support the optional `perDocLimit` parameter. Applying the limit to the right-side (unnested) node before the correlate join is the correct approach for limiting expanded values per document.

Verified that existing callers (`visitExpand`, `visitPatterns`) pass `null` to preserve their existing behavior.

</blockquote></details>
<details>
<summary>integ-test/src/test/resources/mvexpand_int_mapping.json (1)</summary><blockquote>

`1-8`: **LGTM! Simple test mapping for non-array field scenario.**

This mapping correctly defines `skills` as a scalar `integer` type (not an array), which aligns with testing the `mvexpand` error handling for non-array fields. The `SemanticCheckException` thrown in `visitMvExpand` when the field type is not `ARRAY` should be exercised by tests using this mapping.

</blockquote></details>
<details>
<summary>docs/user/ppl/cmd/mvexpand.md (1)</summary><blockquote>

`1-166`: **Comprehensive documentation with clear examples.**

The documentation effectively covers:
- Syntax with required and optional parameters
- Multiple realistic examples using `array()` for deterministic doctests
- Edge cases: empty arrays, null arrays, single-value arrays, missing fields
- Clear explanation of output field naming behavior

The approach of using `eval` + `array()` for doctest stability is a good practice that was incorporated from prior review feedback.

</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Srikanth Padakanti added 2 commits January 2, 2026 21:08
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)

3107-3161: MVExpand implementation is solid, but consider extracting missing-field logic.

The method correctly handles the MVExpand semantics including:

  • Missing field → empty results with preserved schema (Splunk-compatible behavior, as noted in past review)
  • ARRAY type enforcement with clear error messaging
  • Per-document limit propagation via buildExpandRelNode

However, the method is ~42 lines, exceeding the 20-line guideline.

Optional refactor: Extract missing-field handling

Consider extracting the missing-field handling (lines 3128-3148) into a private helper method to improve readability:

private void handleMissingFieldForMvExpand(
    RelBuilder relBuilder, 
    String fieldName, 
    CalcitePlanContext context) {
  // Preserve schema by projecting NULL ARRAY column with the expected name
  final RelDataTypeFactory typeFactory = relBuilder.getTypeFactory();
  final RelDataType arrayAny =
      typeFactory.createArrayType(typeFactory.createSqlType(SqlTypeName.ANY), -1);
  
  relBuilder.projectPlus(
      List.of(
          relBuilder.alias(relBuilder.getRexBuilder().makeNullLiteral(arrayAny), fieldName)));
  
  // Force empty relation (no rows), preserving schema
  relBuilder.filter(relBuilder.literal(false));
}

Then in visitMvExpand:

try {
  arrayFieldRex = (RexInputRef) rexVisitor.analyze(field, context);
} catch (IllegalArgumentException e) {
  if (isMissingFieldException(e)) {
    handleMissingFieldForMvExpand(relBuilder, fieldName, context);
    return relBuilder.peek();
  }
  throw e;
}

Also, consider enhancing the JavaDoc to explicitly document the missing-field behavior:

/**
 * MVExpand command visitor.
 *
 * <p>For Calcite remote planning, mvexpand shares the same expansion mechanics as {@link Expand}:
 * it unnests the target multivalue field and joins back to the original relation. The additional
 * mvexpand semantics (such as an optional per-document limit) are surfaced via the MVExpand AST
 * node but reuse the same underlying RelBuilder pipeline as expand at this layer.
 *
 * <p><b>Missing field handling:</b> When the target field does not exist, mvexpand returns an 
 * EMPTY result set (0 rows) while preserving the schema by projecting a NULL ARRAY column. 
 * This behavior matches Splunk PPL semantics and avoids hard failures in chained queries.
 *
 * @param mvExpand MVExpand command to be visited
 * @param context CalcitePlanContext containing the RelBuilder and other context
 * @return RelNode representing records with the expanded multi-value field (or empty with preserved schema)
 * @throws SemanticCheckException if the target field exists but is not of ARRAY type
 */

3411-3464: Clean perDocLimit integration, but method exceeds 50-line guideline.

The buildExpandRelNode signature change is well-executed:

  • @Nullable Integer perDocLimit parameter added cleanly
  • All existing callers (visitExpand, visitPatterns) pass null → backward compatible
  • Limit applied correctly after uncollect and before building right node (lines 3438-3440)
  • Only MVExpand benefits from the new per-document limit feature

However, the method is 54 lines, exceeding the 50-line complexity guideline.

Given the method's length and Calcite API complexity, you may optionally consider extracting the right-node building logic (lines 3431-3441) into a helper:

private RelNode buildExpandRightNode(
    RexNode correlArrayFieldAccess,
    String arrayFieldName,
    @Nullable Integer perDocLimit,
    CalcitePlanContext context) {
  context
      .relBuilder
      .push(LogicalValues.createOneRow(context.relBuilder.getCluster()))
      .project(List.of(correlArrayFieldAccess), List.of(arrayFieldName))
      .uncollect(List.of(), false);
  
  if (perDocLimit != null) {
    context.relBuilder.limit(0, perDocLimit);
  }
  
  return context.relBuilder.build();
}

This would bring the main method closer to the guideline while maintaining clarity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca94e2 and e747edb.

📒 Files selected for processing (1)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)

54-54: LGTM: Clean import additions.

The new imports for RelDataTypeFactory and MvExpand are necessary and properly used in the MVExpand implementation.

Also applies to: 126-126


849-854: LGTM: Backward-compatible signature updates.

The updates to visitExpand and visitPatterns correctly pass null for the new perDocLimit parameter, maintaining existing behavior while enabling MVExpand's per-document limit feature.

Also applies to: 3102-3102

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

3415-3468: Method exceeds complexity threshold and missing parameter documentation.

This method has two issues:

  1. Length: At 54 lines, it exceeds the 50-line threshold. Consider extracting the right-node building logic (lines 3435-3445) into a helper method like buildExpandRightNode.

  2. Missing JavaDoc: The perDocLimit parameter lacks @param documentation. Document its purpose: "per-document limit for expansion; if non-null, limits the number of expanded values per input row."

As per coding guidelines, methods over 50 lines should be reviewed for complexity and refactored.

♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

3160-3170: Fragile exception detection via string matching.

This concern was already raised in a previous review. The isMissingFieldException method relies on regex pattern matching to detect missing field errors, which is brittle and will break silently if error message formats change in CalciteRexNodeVisitor.analyze().

Consider implementing the previously suggested solution: introduce a typed FieldNotFoundException in the codebase and catch it specifically in visitMvExpand, or at minimum use more defensive string matching.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e747edb and 587ccb2.

📒 Files selected for processing (1)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)

3133-3142: Verify empty expansion schema preservation is needed.

The missing field handling projects a NULL ARRAY(ANY) column then immediately filters it out with literal(false). While this preserves the schema before filtering (adding the field name), the indirection is notable.

Confirm this approach is required for schema consistency in downstream operations. If the added field is never observed (due to the immediate filter), consider whether a simpler approach could work.

Based on learnings, this treats missing fields as empty expansion matching Splunk behavior and avoiding hard failures in chained queries.


3442-3444: Per-document limit implementation looks correct.

The placement of the limit after uncollect but before building the right node achieves the intended per-document limiting behavior. Because the expansion happens within a correlate scope, each left row's array expansion is limited independently, which is the correct semantic for MVExpand's limit parameter.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
@srikanthpadakanti
Copy link
Author

srikanthpadakanti commented Jan 5, 2026

@ykmr1224
can you please trigger the workflows and review my changes. thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native support for mvexpand command in PPL

3 participants