[#11590] fix(doris): fix partition parsing for Doris 3.0+ format#11732
Open
jiangxt2 wants to merge 6 commits into
Open
[#11590] fix(doris): fix partition parsing for Doris 3.0+ format#11732jiangxt2 wants to merge 6 commits into
jiangxt2 wants to merge 6 commits into
Conversation
Code Coverage Report
Files
|
Contributor
There was a problem hiding this comment.
Pull request overview
Improves Apache Doris catalog compatibility (Doris 3.0+ SHOW CREATE TABLE output) by making partition parsing more tolerant to whitespace and more robust for LIST partition definitions, with corresponding unit test updates.
Changes:
- Relaxed partition header regex to tolerate whitespace and updated partition column splitting/backtick handling.
- Added LIST partition assignment parsing that can handle nested parentheses for multi-column LIST values.
- Updated/expanded
TestDorisUtilscases and migrated assertions to JUnit JupiterAssertions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| catalogs/catalog-jdbc-doris/src/main/java/org/apache/gravitino/catalog/doris/utils/DorisUtils.java | Updates partition regex and adds LIST partition assignment parsing logic. |
| catalogs/catalog-jdbc-doris/src/test/java/org/apache/gravitino/catalog/doris/utils/TestDorisUtils.java | Expands partition parsing tests (including Doris 3.0+ spacing) and switches to JUnit Jupiter assertions. |
| } else if (RANGE_PARTITION.equals(partitionType)) { | ||
| return Optional.of(Transforms.range(new String[] {columns[0]})); | ||
| // Merge all lines to handle multi-line partition definitions | ||
| String mergedSql = createTableSql.replaceAll("\\n", " "); |
Comment on lines
+118
to
+125
| String[][] filedNames = | ||
| Arrays.stream(columns).map(s -> new String[] {s}).toArray(String[][]::new); | ||
| // Try to extract partition assignments | ||
| ListPartition[] assignments = extractListPartitionAssignments(mergedSql); | ||
| if (assignments.length > 0) { | ||
| return Optional.of(Transforms.list(filedNames, assignments)); | ||
| } | ||
| return Optional.of(Transforms.list(filedNames)); |
| // Locate "PARTITION <name> VALUES IN (" and extract the outer paren content manually | ||
| // to correctly handle multi-column partitions: VALUES IN (("a", 1), ("b", 2)) | ||
| Pattern headerPattern = | ||
| Pattern.compile("PARTITION\\s+(?:`(\\w+)`|(\\w+))\\s+VALUES\\s+IN\\s*\\("); |
Comment on lines
+153
to
156
| transform = DorisUtils.extractPartitionInfoFromSql(createTableSql); | ||
| Assertions.assertTrue(transform.isPresent()); | ||
| Assertions.assertEquals("list", transform.get().name()); | ||
| } |
- Add whitespace tolerance in partition regex (PARTITION BY LIST ( vs LIST() - Fix partition column extraction: split by comma + trim + backtick stripping - Support backtick-quoted partition names (PARTITION `p1` VALUES IN ...) - Add bracket-depth-aware parsing for multi-column LIST partitions Related to apache#11590 Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com> Co-Authored-By: ArtificialIdoit <bill.sea@hotmail.com> Co-Authored-By: cwq222 <15503804976@163.com> Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
a36570f to
badba44
Compare
- Replace regex-based replaceAll with char-level replace for newline merging - Fix typo: filedNames -> fieldNames - Allow backtick-quoted partition names with hyphens, dots and spaces by using [^`]+ instead of \w+ in headerPattern - Add partition name extraction assertions and boundary case tests: empty assignment list, multi-line SQL, backtick-quoted column names with special characters, partition names with spaces Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com> Co-Authored-By: Chang-Tong <zdcheerful@hotmail.com> Co-Authored-By: ArtificialIdoit <bill.sea@hotmail.com> Co-Authored-By: cwq222 <15503804976@163.com>
badba44 to
b62a75e
Compare
Author
|
Thanks for the review. After checking each comment against the actual PR diff:
|
Contributor
|
@Copilot Please trigger the CI workflow. |
Signed-off-by: jiangxt2 <jiangxt2@vip.qq.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Partition Regex Fix (
DorisUtils.java)\\s*toPARTITION_INFO_PATTERNto tolerate whitespace betweenLIST/RANGEand(in Doris 3.0+SHOW CREATE TABLEoutput (PARTITION BY LIST (vsPARTITION BY LIST()Partition Column Extraction Fix (Copilot C3)
split(", ")tosplit(",")+trim()to handle commas with or without trailing spaceMulti-column LIST Partition Parsing
VALUES IN (("a", 1), ("b", 2))Backtick Partition Name Support (Copilot C4)
(\\w+)to(?:\\x60(\\w+)\\x60|(\\w+))to match both backtick-quoted and bare partition namesTest Coverage (Copilot C5)
PARTITION \\x60p1\\x60 VALUES IN (...))Does this PR introduce any user-facing change?
No. Partition parsing improvements are internal to Gravitino metadata loading.
How was this patch tested?
Unit tests in
TestDorisUtils:testExtractPartitionInfoFromSql: RANGE/LIST with/without spaces, multi-column LIST, backtick partition names, non-partitioned tablestestGeneratePartitionSqlFragment: RANGE MAXVALUE, LIST single/multi/multi-column valuesIntegration tests (Doris 4.0.6 and 3.0.6.2):
Related to #11590