Skip to content

Migrate case conversion and substr_index to fallible string view builder APIs#23074

Open
kosiew wants to merge 3 commits into
apache:mainfrom
kosiew:refactor-byte-accounting-03-22688
Open

Migrate case conversion and substr_index to fallible string view builder APIs#23074
kosiew wants to merge 3 commits into
apache:mainfrom
kosiew:refactor-byte-accounting-03-22688

Conversation

@kosiew

@kosiew kosiew commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

This PR continues the migration to fallible string builder APIs for string UDFs that previously relied on infallible append_value and append_placeholder calls.

The non-ASCII case conversion paths in string/common.rs and the generic implementations in unicode/substrindex.rs could panic when string view metadata exceeds ByteView's i32::MAX limits. Converting these call sites to use fallible builder APIs allows overflow conditions to be propagated as DataFusionErrors instead of causing panics.

This work is part of the broader effort in #22688 to eliminate panic-based overflow handling in string builders.

What changes are included in this PR?

  • Migrated non-ASCII case conversion paths in string/common.rs to use:

    • try_append_value
    • try_append_placeholder
  • Migrated substr_index_general and map_strings in unicode/substrindex.rs to use:

    • try_append_value
    • try_append_placeholder
  • Added fallible APIs to StringViewArrayBuilder:

    • try_append_value
    • try_append_placeholder
    • try_ensure_long_capacity
  • Added overflow error generation for StringView-specific limits:

    • string_view_overflow_error
  • Refactored StringView view construction to validate:

    • value length
    • buffer offsets
    • completed buffer count
  • Preserved existing infallible APIs (append_value, append_placeholder, ensure_long_capacity) as wrappers around the fallible implementations, maintaining existing behavior for callers that still use them.

Are these changes tested?

Yes.

Added the following unit tests:

  • string_view_builder_try_append_success_path
  • test_substr_index_all_nulls

These tests verify:

  • Successful operation of the new fallible StringView builder APIs.
  • Correct null propagation behavior in substr_index_general.

Existing case conversion and substr_index tests continue to validate the normal execution paths. No dedicated overflow-triggering tests were added in this PR.

Are there any user-facing changes?

No user-facing functionality changes are expected.

The primary behavioral change is that certain internal string-builder overflow conditions can now be returned as DataFusionErrors rather than triggering panics, improving robustness in extreme cases.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed.

kosiew added 3 commits June 22, 2026 14:26
…esting

- Refactored case conversion paths in common.rs to utilize try_append_* methods.
- Updated substr_index_general and map_strings in substrindex.rs to use try_append_* methods, and added an all-null regression test.
- Enhanced strings.rs by adding a fallible StringViewArrayBuilder::try_append_value method, a try_append_placeholder method, and a corresponding builder test.
…yBuilder

- Changed limit to field in string_view_overflow_error for better clarity.
- Removed unreachable expect in StringViewArrayBuilder::append_placeholder to improve code quality.
- Introduced make_long_view_checked and restored the receiver make_long_view for enhanced functionality.
- Shortened the all-null test case in substr_index for better readability and maintainability.
… documentation

- Reused new try_ensure_long_capacity in try_append_value.
- Updated ensure_long_capacity to delegate to a fallible helper while preserving panic mode.
- Added error documentation for try_append_placeholder.
@github-actions github-actions Bot added the functions Changes to functions implementation label Jun 22, 2026
@kosiew kosiew marked this pull request as ready for review June 22, 2026 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant