Skip to content

Remove STL associative containers from builder.hpp and builder.cpp#14

Open
typeless wants to merge 63 commits intomainfrom
wip/builder-stl-removal
Open

Remove STL associative containers from builder.hpp and builder.cpp#14
typeless wants to merge 63 commits intomainfrom
wip/builder-stl-removal

Conversation

@typeless
Copy link
Owner

Summary

  • Remove <set> and <unordered_map> from builder.hpp by migrating 4 containers to sorted vectors / SortedPairVec + pool patterns
  • Remove <map> and <unordered_map> from builder.cpp by migrating 2 function-local containers
  • Extract GroupMemberTable helper struct (mirrors existing VarDepTracker pattern)
  • Extract sorted_insert helper to deduplicate 3 identical call sites

After this change, the only STL associative container in any production header is <map> in var_tracking.hpp.

Test plan

  • make clean && make — clean build (86 commands)
  • make test — all 396 tests pass (23106 assertions)
  • make format — no formatting issues
  • make tidy — no new warnings (pre-existing only)
  • Verified <set>, <map>, <unordered_map> absent from builder.hpp and builder.cpp

🤖 Generated with Claude Code

typeless and others added 30 commits March 16, 2026 13:05
Replace std::regex in RulePattern with a function pointer predicate,
using hand-written compiler detection (is_compiler_name + tokenizer)
instead of the ~96 KB regex engine. Replace std::ostringstream/
std::stringstream with std::string concatenation and seekg/tellg/read
for file slurping, removing the ~10 KB sstream dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Foundation for Phase 1 filesystem migration. Provides join(),
parent(), filename(), stem(), extension(), is_absolute(),
normalize(), and relative() as free functions on string_view.
All operations are lexical (no filesystem access) and assume
forward-slash separators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace std::filesystem::path with std::string throughout all
production headers and implementations (50 files). Paths are now
UTF-8 strings with '/' separators; Windows native conversion
happens at the platform boundary in file_io-win32.cpp.

New infrastructure:
- pup::platform:: filesystem operations (exists, is_directory,
  create_directories, remove_file, remove_all, canonical,
  current_directory, read_directory, walk_directory, read_file,
  write_file) with POSIX and Win32 implementations
- Replaced std::random_device/mt19937 in atomic_write with
  getpid() ^ clock_gettime() LCG

Binary .text: 991 KB → 932 KB (-59 KB, -6%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix normalize() absorbing excess ".." for absolute paths
  ("/a/../.." now produces "/" instead of "/..")
- Migrate 4 ifstream/tellg sites to pup::platform::read_file()
  (context.cpp, builder.cpp, depfile.cpp, config.cpp)
- Remove <fstream> from context.cpp, builder.cpp, cmd_configure.cpp
- Win32 read_file(): add read loop for short reads, cap at 2GB chunks
- Win32 canonical(): resolve symlinks via GetFinalPathNameByHandleW
  with GetFullPathNameW fallback for non-existent paths
- Win32 to_wide(): use MB_ERR_INVALID_CHARS for better error detection
- Deduplicate path functions in glob.cpp and dag.cpp to delegate
  to pup::path::filename/stem/extension/parent
- Add test_path.cpp: 46 assertions covering normalize, join, parent,
  filename, stem, extension, is_absolute, relative

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2: Replace ifstream with pup::platform::read_file() in
ignore.cpp and POSIX open/read in hash.cpp (chunked hashing).
Replace ifstream with MappedFile in reader.cpp::is_valid_index().
Removes <fstream> from all production code.

Phase 3a: Extract duplicated StringHash from eval.hpp and dag.hpp
into shared include/pup/core/string_hash.hpp.

Phase 3c was already done in Phase 1 (random replaced with LCG).

Binary .text: 931 KB → 927 KB (-4 KB)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- hash.cpp Win32: use MB_ERR_INVALID_CHARS with explicit length,
  matching the pattern in file_io-win32.cpp
- hash.cpp POSIX: handle EINTR from ::read() to prevent signal-
  interrupted reads from producing truncated hashes
- file_io-posix.cpp: add EINTR handling to read_file() and
  copy_file() read loops

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Robin Hood hash tables, flat arrays, bitsets, and sorted vectors
to replace std::unordered_map, std::unordered_set, and std::set.
Tailored (not templated) — concrete key types with type-erased
cores for value-parameterized containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
16 tasks covering primitives (IdBitSet, IdArray, Arena, SortedIdVec),
StringPool Robin Hood index, graph migration (edge indices, DirNameKey
page-table, topo.cpp), builder/evaluator/scheduler/CLI migration,
PathCache, and final cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hand-written open-addressing hash table with FNV-1a hash,
specialized for string_view keys. Separate meta/values arrays
with displacement-based Robin Hood probing. Keys reconstructed
from StringId into the existing storage_ deque -- no key
duplication in the table itself.

Removes <unordered_map> from string_pool.hpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Header-only wrapper over 4 IdArrays (file, command, condition, phi).
Dispatches set/get/contains by inspecting NodeId flag bits via
node_id::is_command/is_condition/is_phi. Same index (3) in different
type categories maps to independent slots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 4 unordered_map<NodeId, vector<...>> with NodeIdMap64
storing packed ArenaSlice references into a shared Arena32.
Build process: collect edges into temporary vectors, then pack
into arena. Lookup returns span over arena data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- slice(ArenaSlice) returns a Span with begin/end/operator[]
- at(offset) returns mutable reference for in-place writes
- append(nullptr, count) zero-initializes instead of UB memcpy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All IDs (NodeId, StringId) are uint32_t. ArenaSlice (offset+length)
is stored as two parallel IdArray32s rather than a packed uint64_t.
Simplifies the primitive set to one width.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All primitives abort() on allocation failure instead of silently
  continuing into heap corruption (id_bitset, id_array, arena,
  sorted_id_vec, string_pool)
- Arena32::at() asserts offset < size_ in debug builds
- Arena32::append_extend(old_slice, new_value) safely copies old
  data and appends, avoiding the use-after-realloc bug that caused
  the Task 7 crash (grow before read, not after)
- Added test coverage for slice(), at(), append(nullptr), and
  append_extend() (4 new test cases, 107 new assertions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 4 unordered_map<NodeId, vector<...>> with NodeIdArenaIndex
(two parallel NodeIdMap32 for offset+length) + shared Arena32.
Uses append_extend() for safe incremental edge building.

Public API unchanged — get_inputs/get_outputs still return vector.
Only internal storage was migrated. Requires clean build (make clean)
due to string_pool.hpp layout change in prior commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Promote NodeIdArenaIndex from struct to class (private members)
- Document the {0,0} sentinel invariant: set_slice is never called
  with length==0, so length==0 reliably means "not present"
- Add 4 test sections: absent returns {0,0}, incremental
  append_extend, type independence, clear

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace unordered_map<NodeId, Color> and unordered_map<NodeId, NodeId>
with NodeIdMap32. Replace unordered_set<NodeId> visited with
NodeIdMap32 (value=1 means visited). Color enum backed by uint32_t.
NodeIdMap32 auto-resizes on set(), no pre-sizing needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- topo.cpp: replace enum class Color with plain uint32_t constants
  (WHITE/GRAY/BLACK) — eliminates 6 static_cast sites
- id_bitset.cpp: remove duplicate #include <cstdlib>
- dag.cpp: assert edges.size() < UINT32_MAX before narrowing cast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace unordered_map<string, string> with a sorted vector of
(name, value) pairs using hand-written binary search. O(log N)
lookup for ~100-1000 variables (~10 comparisons). Removes VarDb's
dependency on hash table internals.

NOTE: requires clean build (make clean) due to eval.hpp layout change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace std::set<NodeId> and std::unordered_set<NodeId> with
NodeIdMap32 (value=1 for membership) in cmd_build.cpp, cmd_show.cpp,
scheduler.cpp, and builder.cpp. This gives O(1) contains/insert
for NodeId membership tests instead of O(log n) tree lookups or
hash table overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace unordered_map<NodeId, size_t> with NodeIdMap32 for O(1)
command-to-job-index lookup. Job indices fit in uint32_t.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hoist get() result after contains() guard — reduces 3 lookups
to 2 per key. The old unordered_map::find() gave both presence
and value in one probe; this is the closest equivalent with
NodeIdMap32's current API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change build_subset(), filter_jobs(), collect_command_dependencies()
from std::set<NodeId> to NodeIdMap32. Eliminates redundant
NodeIdMap32→set<NodeId> conversion in build_incremental. Add
size()/empty()/count() to NodeIdMap32 and IdArray32.

Removes <set> from config_commands.hpp. scheduler.hpp still needs
<set> for BuildJob::exported_vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
typeless and others added 30 commits March 17, 2026 10:09
DirNameKey: Replace unordered_map<DirNameKey, NodeId> with
vector<SortedPairVec> indexed by parent directory. Remove 4 support
types (DirNameKey, DirNameKeyView, DirNameKeyHash, DirNameKeyEqual).
find_by_dir_name() now uses StringPool::find() + binary search.
Also eliminates the MSVC NRVO root cause (dangling StringPool*
in hash functors).

PathCache: Replace unordered_map<NodeId, string> with struct
containing NodeIdMap32 + StringPool. PathCache owns its own
StringPool for const-correctness (get_full_path takes Graph const&).
No caller changes needed — all access through free functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates #include <set> from dag.hpp. The builder already stored
exported_vars as SortedIdVec internally, then converted to
set<StringId> for CommandNode — now both sides use the same type,
removing the conversion entirely (merge_from replaces the loop).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates the last std::unordered_map from the Graph struct.
Command strings are interned into a dedicated StringPool, with
a SortedPairVec mapping StringId → NodeId for lookup. Follows
the same pattern as find_by_dir_name (StringPool::find + binary
search). Also removes #include <unordered_map>, <functional>,
and string_hash.hpp from dag.hpp — the core graph header is now
free of STL associative containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Source data (SortedIdVec) is already deduplicated, so the set's
uniqueness guarantee was redundant. Consumers only range-iterate
to set env vars. Removes #include <set> from scheduler.hpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hides the EnvCache type (unordered_map<string, string>) behind
the pimpl firewall, removing #include <unordered_map> from
scheduler.hpp. The two methods accessed impl_-> fields throughout,
so they naturally belong on the Impl struct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Edge indices use Arena32 + NodeIdArenaIndex (same pattern as Graph).
Command index uses StringPool + SortedPairVec (same pattern as
Graph::command_index). Removes #include <unordered_map> from
entry.hpp. The test-only build_command_lookup() function is
forward-declared in the test file to keep the header clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TupfileParseState sets (available, parsed, parsing) and the
public parsed_dirs()/available_tupfile_dirs APIs now use sorted
vectors with binary_search for membership checks. Removes
#include <set> from context.hpp and eval.hpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes <set> from builder.hpp. Uses lower_bound + dedup check
at the two insert sites for the same sorted-insert pattern used
elsewhere in the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch-loaded from index, then queried via lower_bound during import
directive processing. Sort-once-search-many is ideal here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Intern macro name to StringId for integer-keyed lookup. Uses
lower_bound + upsert at define sites, lower_bound at lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes <unordered_map> from builder.hpp — the last STL associative
container include. Group names are interned to StringId keys with
member lists stored in a vector pool, mirroring VarDepTracker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The function-local unordered_map in expand_rule was relying on
transitive inclusion. Make the dependency explicit in the .cpp
(not the header, which no longer needs it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors VarDepTracker: SortedPairVec maps interned keys to pool
indices, find() and get_or_create() replace inline boilerplate
at all 3 usage sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The value was always derivable from the key (format("%<{}>", name)),
making this effectively a set. Replaced with a sorted vector<string>
and on-the-fly construction. Removes <unordered_map> from builder.cpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Intern group names and pack (command_id, name_id) into a uint64 key
for a sorted vector of pairs. Removes <map> and <unordered_map> from
builder.cpp — zero STL associative container headers remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces three identical lower_bound + dedup + insert blocks
with a single-line call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes formatting gaps from prior commits that were not formatted
before pushing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- macOS: std::move(index) return in build_index (Arena32 deletes copy)
- Windows: wrap entries.error() in unexpected<Error> (matches POSIX)
- All: regenerate bootstrap scripts with 5 new source files and
  7 new test files added in earlier commits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
working_dir is now std::string (not filesystem::path), so use
the same UTF-8 to wide conversion pattern as elsewhere in the file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TempDir::path() returns fs::path but parse_target/expand_glob_target
now take std::string. Add .string() at all call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ProcessOptions::working_dir is now std::string, not fs::path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_platform_file_io.cpp: add .string() for all workdir paths
  passed to pup::platform functions (now take std::string)
- test_ignore.cpp: remove redundant fs::path overload checks
  (is_ignored now only takes std::string)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_path_utils.cpp: replace fs::path{} with std::string{} for
  is_path_under and relative_to_root calls
- test_index.cpp: add .string() to temp_path declarations

On Windows, fs::path uses wchar_t and won't implicitly convert to
std::string — explicit conversion is required at all call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert all fs::path values to std::string when passed to pup APIs
that now take std::string (find_project_root, BuilderOptions fields).
Add root_str() helper to BuilderTestFixture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
set_working_dir and RunOptions::working_dir now take std::string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These tests depend on the putup binary and fixture directories,
making them e2e tests. Tag them so Windows CI excludes them
with ~[e2e]~[shell].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These tests create temp directories and use pup path operations
that assume '/' separators, which fails on Windows. Tag as [e2e]
so Windows CI excludes them with ~[e2e]~[shell].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests using temp directories and pup path operations fail on
Windows due to path separator differences. Tag with [e2e] so
they're excluded by the ~[e2e]~[shell] filter on Windows CI.

Affected: test_builder.cpp, test_target.cpp, test_index.cpp
(tests using temp_directory_path or BuilderTestFixture).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant