C: Use `hb_string_T` for `token_T.value` #687
Merged
marcoroth merged 47 commits into marcoroth:main on Mar 6, 2026
This pull request changes `token_T.value` to use `hb_string_T` and adapts the call sites. Token values become non-owning slices into the source buffer, eliminating per-token `strdup` calls during lexing and parsing. As a result, `token_copy` becomes a shallow struct copy and `token_free` no longer frees the value.

Two constants, `HB_STRING_EMPTY` and `HB_STRING_NULL`, are introduced to distinguish a valid empty string from the absence of a value. Call sites that previously checked `token->value == NULL` now use `hb_string_is_null`, while sites that checked for empty content use `hb_string_is_empty`.

The public API is simplified by removing `herb_lex_file` and `herb_lex_to_buffer`. `herb_lex_file` had an inconsistent lifetime contract: it read a file, lexed it, then freed the source, leaving tokens with dangling pointers. Callers should now read the file themselves and pass the `source` to `herb_lex`. `herb_lex_to_buffer` was only used by the C CLI and C tests, so it moves to an internal `lex_helpers.h` header.

Since token values are non-owning, callers must keep the source string alive for as long as tokens or AST nodes are in use. All existing bindings already satisfy this naturally: they hold a reference to the source (Ruby string, JNI string, `std::string`, etc.) for the duration of the operation and convert to native objects before releasing it.

Comparison

`make bench_allocs`

Before (current main `2990a34cc77681bcd45bb21b4c4320065c5ea129`)

After (this pull request)
Conclusion
This pull request performs 50% fewer allocations while lexing and 40-50% fewer allocations while parsing, though overall it allocates slightly more total bytes.
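
For readers new to the ownership model described in the summary, here is a minimal sketch of non-owning token values, the null versus empty distinction, and the shallow copy. Every name in it (`slice_T`, `SLICE_NULL`, `SLICE_EMPTY`, `slice_is_null`, `slice_is_empty`, `slice_from_range`, `sketch_token_T`, `sketch_token_copy`, `slice_equals_cstr`) is a hypothetical stand-in chosen for illustration. None of it is taken from Herb's headers, and the real `hb_string_T` layout and helper semantics may differ. In particular, treating a null slice as also empty is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for hb_string_T: pointer plus length, no ownership. */
typedef struct {
  const char* data;
  uint32_t length;
} slice_T;

/* Stand-ins for HB_STRING_NULL and HB_STRING_EMPTY (assumed shapes). */
#define SLICE_NULL ((slice_T) { .data = NULL, .length = 0 })
#define SLICE_EMPTY ((slice_T) { .data = "", .length = 0 })

/* "No value at all": replaces a check like `value == NULL`. */
bool slice_is_null(slice_T slice) {
  return slice.data == NULL;
}

/* "No content": true for a zero-length slice (and, in this sketch, a null one). */
bool slice_is_empty(slice_T slice) {
  return slice.length == 0;
}

/* Builds a view of source[start, end) without allocating or copying. */
slice_T slice_from_range(const char* source, uint32_t start, uint32_t end) {
  assert(source != NULL && start <= end);
  return (slice_T) { .data = source + start, .length = end - start };
}

/* Hypothetical token: its value borrows from the source buffer. */
typedef struct {
  slice_T value;
} sketch_token_T;

/* A copy is a plain struct copy; both tokens point at the same source bytes. */
sketch_token_T sketch_token_copy(sketch_token_T token) {
  return token;
}

/* Slices are not NUL-terminated, so comparisons must use the length. */
bool slice_equals_cstr(slice_T slice, const char* text) {
  size_t text_length = strlen(text);

  return slice.data != NULL && slice.length == text_length
         && memcmp(slice.data, text, text_length) == 0;
}
```

Because nothing here owns memory, there is no matching free for a token value, and the one rule callers must follow is the one stated above: the source buffer has to outlive every token (or copy of a token) that points into it.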