Migrate case conversion and substr_index to fallible string view builder APIs#23074
Open
kosiew wants to merge 3 commits into
Open
Migrate case conversion and substr_index to fallible string view builder APIs#23074kosiew wants to merge 3 commits into
kosiew wants to merge 3 commits into
Conversation
…esting - Refactored case conversion paths in common.rs to utilize try_append_* methods. - Updated substr_index_general and map_strings in substrindex.rs to use try_append_* methods, and added an all-null regression test. - Enhanced strings.rs by adding a fallible StringViewArrayBuilder::try_append_value method, a try_append_placeholder method, and a corresponding builder test.
…yBuilder - Changed limit to field in string_view_overflow_error for better clarity. - Removed unreachable expect in StringViewArrayBuilder::append_placeholder to improve code quality. - Introduced make_long_view_checked and restored the receiver make_long_view for enhanced functionality. - Shortened the all-null test case in substr_index for better readability and maintainability.
… documentation - Reused new try_ensure_long_capacity in try_append_value. - Updated ensure_long_capacity to delegate to a fallible helper while preserving panic mode. - Added error documentation for try_append_placeholder.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
This PR continues the migration to fallible string builder APIs for string UDFs that previously relied on infallible
append_valueandappend_placeholdercalls.The non-ASCII case conversion paths in
string/common.rsand the generic implementations inunicode/substrindex.rscould panic when string view metadata exceeds ByteView'si32::MAXlimits. Converting these call sites to use fallible builder APIs allows overflow conditions to be propagated asDataFusionErrors instead of causing panics.This work is part of the broader effort in #22688 to eliminate panic-based overflow handling in string builders.
What changes are included in this PR?
Migrated non-ASCII case conversion paths in
string/common.rsto use:try_append_valuetry_append_placeholderMigrated
substr_index_generalandmap_stringsinunicode/substrindex.rsto use:try_append_valuetry_append_placeholderAdded fallible APIs to
StringViewArrayBuilder:try_append_valuetry_append_placeholdertry_ensure_long_capacityAdded overflow error generation for StringView-specific limits:
string_view_overflow_errorRefactored StringView view construction to validate:
Preserved existing infallible APIs (
append_value,append_placeholder,ensure_long_capacity) as wrappers around the fallible implementations, maintaining existing behavior for callers that still use them.Are these changes tested?
Yes.
Added the following unit tests:
string_view_builder_try_append_success_pathtest_substr_index_all_nullsThese tests verify:
substr_index_general.Existing case conversion and
substr_indextests continue to validate the normal execution paths. No dedicated overflow-triggering tests were added in this PR.Are there any user-facing changes?
No user-facing functionality changes are expected.
The primary behavioral change is that certain internal string-builder overflow conditions can now be returned as
DataFusionErrors rather than triggering panics, improving robustness in extreme cases.LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed.