C: Use Arena Allocator for Lexing and Parsing#726

Merged
marcoroth merged 15 commits into main from arena on Mar 4, 2026

@marcoroth marcoroth (Owner) commented Oct 24, 2025

This pull request introduces arena allocation for the lexer and parser, replacing per-object malloc/free calls with bulk allocation from a memory arena.

All allocated AST nodes, tokens, and internal strings are placed into a single arena that is freed in one shot after the parse tree has been converted to the binding's native objects.
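
To illustrate the idea of allocating everything from one arena and releasing it in one shot, here is a minimal bump-arena sketch. The `sketch_arena_*` names and the single-page layout are assumptions made for illustration; this is not the actual `hb_arena_T` implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical single-page bump arena; not Herb's actual hb_arena_T. */
typedef struct {
  uint8_t* memory;  /* one backing block obtained from malloc */
  size_t capacity;  /* total bytes in the backing block */
  size_t position;  /* offset of the next free byte */
} sketch_arena_T;

/* Reserve the backing block up front. Returns 1 on success, 0 on failure. */
static int sketch_arena_init(sketch_arena_T* arena, size_t capacity) {
  arena->memory = malloc(capacity);
  arena->capacity = arena->memory != NULL ? capacity : 0;
  arena->position = 0;
  return arena->memory != NULL;
}

/* Hand out `size` bytes by bumping the position; there is no per-object free. */
static void* sketch_arena_alloc(sketch_arena_T* arena, size_t size) {
  const size_t alignment = _Alignof(max_align_t);
  size_t aligned = (size + alignment - 1) & ~(alignment - 1);

  /* Reject overflowed sizes and requests that do not fit in the block. */
  if (aligned < size || aligned > arena->capacity - arena->position) { return NULL; }

  void* pointer = arena->memory + arena->position;
  arena->position += aligned;
  return pointer;
}

/* Release every allocation at once by freeing the backing block. */
static void sketch_arena_free(sketch_arena_T* arena) {
  free(arena->memory);
  arena->memory = NULL;
  arena->capacity = 0;
  arena->position = 0;
}
```

In this sketch a caller would allocate nodes, tokens, and strings through `sketch_arena_alloc` while parsing and call `sketch_arena_free` once after the tree has been converted, which mirrors the lifetime described above.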

The arena is accessed through hb_allocator_T, a vtable-based allocator abstraction introduced in #1287. This pull request switches the default backend from malloc to hb_arena_T across all bindings and the CLI.

The malloc backend remains available as a fallback. The intent is to validate the arena approach in production and then, once confident, simplify the abstraction away and use hb_arena_T directly.

The hb_allocator_T interface was extended with a destroy function pointer and a high-level hb_allocator_init(allocator, type) constructor that takes an hb_allocator_type_T enum (HB_ALLOCATOR_ARENA, HB_ALLOCATOR_MALLOC).
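
As a rough sketch of what a vtable-based allocator with a `destroy` pointer and a type-selecting constructor can look like, consider the following. All `sketch_*` names, the fixed 4096-byte block, and the field layout are assumptions for illustration; the real `hb_allocator_T` fields and signatures may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the allocator type enum and vtable. */
typedef enum {
  SKETCH_ALLOCATOR_MALLOC,
  SKETCH_ALLOCATOR_ARENA
} sketch_allocator_type_T;

typedef struct sketch_allocator {
  void* (*alloc)(struct sketch_allocator* self, size_t size);
  void (*dealloc)(struct sketch_allocator* self, void* pointer);
  void (*destroy)(struct sketch_allocator* self);
  void* context; /* backend-specific state, NULL for the malloc backend */
} sketch_allocator_T;

/* malloc backend: every object is allocated and freed individually. */
static void* sketch_malloc_alloc(sketch_allocator_T* self, size_t size) {
  (void) self;
  return malloc(size);
}
static void sketch_malloc_dealloc(sketch_allocator_T* self, void* pointer) {
  (void) self;
  free(pointer);
}
static void sketch_malloc_destroy(sketch_allocator_T* self) {
  (void) self; /* nothing to tear down */
}

/* arena backend: one fixed block, bump allocation, bulk release in destroy. */
typedef struct {
  unsigned char* memory;
  size_t capacity;
  size_t position;
} sketch_bump_T;

static void* sketch_arena_backend_alloc(sketch_allocator_T* self, size_t size) {
  sketch_bump_T* bump = self->context;
  const size_t alignment = _Alignof(max_align_t);
  size_t aligned = (size + alignment - 1) & ~(alignment - 1);
  if (aligned < size || aligned > bump->capacity - bump->position) { return NULL; }
  void* pointer = bump->memory + bump->position;
  bump->position += aligned;
  return pointer;
}
static void sketch_arena_backend_dealloc(sketch_allocator_T* self, void* pointer) {
  (void) self;
  (void) pointer; /* individual frees are no-ops for an arena */
}
static void sketch_arena_backend_destroy(sketch_allocator_T* self) {
  sketch_bump_T* bump = self->context;
  free(bump->memory);
  free(bump);
  self->context = NULL;
}

/* Constructor that picks the backend from the enum value. */
static bool sketch_allocator_init(sketch_allocator_T* allocator, sketch_allocator_type_T type) {
  if (type == SKETCH_ALLOCATOR_MALLOC) {
    *allocator = (sketch_allocator_T) {
      sketch_malloc_alloc, sketch_malloc_dealloc, sketch_malloc_destroy, NULL
    };
    return true;
  }

  sketch_bump_T* bump = malloc(sizeof(*bump));
  if (bump == NULL) { return false; }
  bump->capacity = 4096; /* assumed size, for the sketch only */
  bump->position = 0;
  bump->memory = malloc(bump->capacity);
  if (bump->memory == NULL) {
    free(bump);
    return false;
  }

  *allocator = (sketch_allocator_T) {
    sketch_arena_backend_alloc, sketch_arena_backend_dealloc, sketch_arena_backend_destroy, bump
  };
  return true;
}
```

Call sites in such a design only touch `alloc`, `dealloc`, and `destroy`, so switching the default backend is a one-argument change at initialization.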

A separate hb_allocator_init_with_size variant accepts a custom initial arena size for edge cases. The default arena size is controlled by a compile-time HB_ALLOCATOR_DEFAULT_ARENA_SIZE flag.
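
One common way to wire a compile-time default together with a size-taking variant is shown below. The `SKETCH_*` macro, its 512 KiB value, and the `sketch_config_*` helpers are hypothetical; the actual default value and the behavior of `hb_allocator_init_with_size` are not stated in this pull request.

```c
#include <assert.h>
#include <stddef.h>

/* Overridable at build time, e.g. with -DSKETCH_DEFAULT_ARENA_SIZE=1048576.
   The value here is an assumption, not Herb's actual default. */
#ifndef SKETCH_DEFAULT_ARENA_SIZE
#  define SKETCH_DEFAULT_ARENA_SIZE ((size_t) 512 * 1024)
#endif

/* Hypothetical configuration record holding the chosen initial size. */
typedef struct {
  size_t initial_size;
} sketch_arena_config_T;

/* Size-taking variant: a zero size falls back to the compile-time default. */
static void sketch_config_init_with_size(sketch_arena_config_T* config, size_t initial_size) {
  config->initial_size = initial_size > 0 ? initial_size : (size_t) SKETCH_DEFAULT_ARENA_SIZE;
}

/* Plain variant: always uses the compile-time default. */
static void sketch_config_init(sketch_arena_config_T* config) {
  sketch_config_init_with_size(config, (size_t) SKETCH_DEFAULT_ARENA_SIZE);
}
```

Keeping the plain initializer as a thin wrapper around the sized one means there is only one code path to maintain, and the default stays tunable without touching call sites.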

The extract API (herb_extract, herb_extract_ruby_*, herb_extract_html_*) was also updated to accept an hb_allocator_T*.

Depends on #1287

@marcoroth marcoroth changed the title from "C: Use Arena for Lexing and Parsing" to "C: Use Arena Allocator for Lexing and Parsing" Oct 24, 2025
@marcoroth marcoroth marked this pull request as ready for review February 14, 2026 01:40
 #include <string.h>

-#define hb_arena_for_each_page(allocator, page) \
+#define hb_arena_for_each_page(allocator, _page) \
Contributor

Would it make sense to remove the unused argument or do you have future plans for it? The macro name doesn't suggest that.

Owner Author

The point was to self-document the variable that's going to be available inside the loop. But maybe that's confusing?

hb_arena_for_each_page(arena, page) {
  total += page->position;
}

vs:

hb_arena_for_each_page(arena) {
  total += page->position;
  //       ^ where is page coming from?
}
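
For comparison, one way to keep the two-argument form while making the second argument genuinely used is to let the macro declare the loop variable under the name the caller passes in. The sketch below assumes a hypothetical linked list of pages with a `position` field; the real macro body and the change made in the follow-up commit are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page list; the real arena page layout may differ. */
typedef struct sketch_page {
  struct sketch_page* next;
  size_t position; /* bytes used in this page */
} sketch_page_T;

typedef struct {
  sketch_page_T* head;
} sketch_pages_T;

/* The caller chooses the loop variable name, so the argument is used by the
   expansion and tools such as clangd have nothing to flag as unused. */
#define sketch_for_each_page(arena, page) \
  for (sketch_page_T* page = (arena)->head; page != NULL; page = page->next)

/* Sum the used bytes across all pages. */
static size_t sketch_total_used(const sketch_pages_T* arena) {
  size_t total = 0;

  sketch_for_each_page(arena, page) {
    total += page->position;
  }

  return total;
}
```

With this shape the call site still reads `sketch_for_each_page(arena, page)`, and the origin of `page` is visible both to readers and to tooling.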

Contributor

I'd say it is slightly confusing because clangd sees it as unused. So to me it looks less self-documenting and more like a leftover from an older implementation. But there are bigger fish to fry. 😃

Owner Author

I updated it in a8e2f0c 🙌🏼

@github-actions github-actions bot added the rust label Feb 18, 2026
pkg-pr-new bot commented Feb 18, 2026

npx https://pkg.pr.new/@herb-tools/formatter@726
npx https://pkg.pr.new/@herb-tools/language-server@726
npx https://pkg.pr.new/@herb-tools/linter@726

commit: f6fb2c1


@marcoroth marcoroth added this to the v1.0.0 milestone Mar 1, 2026
marcoroth added a commit that referenced this pull request Mar 3, 2026
This pull request extracts a `hb_allocator_T` for providing an interface
for allocating memory in the lexer and parser. In this pull request we
implement both a `malloc`-based and a `hb_arena_T`-based allocator.

Additionally, this updates all call-sites and functions to accept a new
allocator that can later be swapped to use the `hb_arena_T`-based
allocator in #726.
@marcoroth marcoroth merged commit 31471e9 into main Mar 4, 2026
32 checks passed
@marcoroth marcoroth deleted the arena branch March 4, 2026 08:49
marcoroth added a commit that referenced this pull request Mar 4, 2026
Context: https://github.com//pull/726#pullrequestreview-3811782126

Co-Authored-By: Michael Kohl <me@citizen428.net>
marcoroth added a commit that referenced this pull request Mar 4, 2026
Context: https://github.com//pull/726#pullrequestreview-3811782126

Co-Authored-By: Michael Kohl <me@citizen428.net>
@marcoroth marcoroth mentioned this pull request Mar 9, 2026
4 tasks