Add new-format network_expansion templater #102
Open
nick-gorman wants to merge 14 commits into
Extends the new-format templater with two output tables: network_expansion_options and network_transmission_path_expansion_costs. A single expansion_id keyed by expansion_type (forward / reverse / constraint_relaxation) unifies physical paths and constraint-group relaxations under one schema, so downstream consumers don't have to join back to network_transmission_paths to classify rows. Option selection picks the lowest dollars-per-MW per expansion_id using the first year with complete costs. Cost is divided by max(forward, reverse) so an asymmetric option can be represented in the translator as a single extendable PyPSA Link. Known-table discovery for the local cache moves from hard-coded lists to a static manifest (known_tables.yaml), so augmentation tables can be enumerated by prefix — necessary because v7.5 has one table per flow path x scenario. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
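The selection rule described above can be sketched in plain Python. This is an illustrative sketch only, not the templater's code: the record layout, the option names, and the reading of "complete" as "every candidate option has a non-NaN cost in that year" are assumptions.

```python
from math import isnan

# Hypothetical candidate options for one expansion_id (costs in dollars, capacities in MW).
options = [
    {"expansion_id": "CNSW-SNW", "option": "Option 1",
     "forward_mw": 1000.0, "reverse_mw": 500.0,
     "cost_by_year": {2025: float("nan"), 2026: 2.0e9}},
    {"expansion_id": "CNSW-SNW", "option": "Option 2",
     "forward_mw": 800.0, "reverse_mw": 800.0,
     "cost_by_year": {2025: float("nan"), 2026: 1.2e9}},
]


def first_complete_year(group):
    """Earliest year in which every option has a usable (non-NaN) cost."""
    years = sorted(set().union(*(o["cost_by_year"] for o in group)))
    for year in years:
        costs = [o["cost_by_year"].get(year, float("nan")) for o in group]
        if not any(isnan(c) for c in costs):
            return year
    return None


def select_option(group):
    """Pick the lowest dollars-per-MW option, dividing by max(forward, reverse)."""
    year = first_complete_year(group)
    if year is None:
        return None  # assumed behaviour: skip elements with no complete year

    def dollars_per_mw(option):
        larger_capacity = max(option["forward_mw"], option["reverse_mw"])
        return option["cost_by_year"][year] / larger_capacity

    return min(group, key=dollars_per_mw)


best = select_option(options)
print(best["option"])
```

Dividing by the larger directional capacity gives each option a single $/MW figure, which is what allows an asymmetric option to be carried as one extendable Link.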
At nem_regions and single_region the network paths table is aggregated before the expansion templater runs, so flow-path augmentation entries keyed by raw IASR sub-region path IDs no longer line up with the surviving paths. Drop intra-region entries and re-key cross-region ones (NNSW-SQ -> NSW-QLD, suffixes preserved); at single_region drop all flow-path augmentations entirely. REZ and constraint-group entries are unaffected: REZ entries remap automatically via the geo-to-path lookup built from the already-aggregated paths, and constraint groups stay valid at all granularities since they can still bite on REZ-to- region lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
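A minimal sketch of the drop / re-key rule for flow-path augmentations, assuming a dict keyed by path ID. The sub-region lookup, the path ID pattern, and the function names are hypothetical and chosen only to illustrate the behaviour described above.

```python
import re

# Hypothetical lookup from IASR sub-region to NEM region (illustrative subset).
SUBREGION_TO_REGION = {"NNSW": "NSW", "CNSW": "NSW", "SNW": "NSW", "SQ": "QLD"}

# Assumed ID shape: <from>-<to> followed by an optional suffix such as "_NTH".
PATH_ID = re.compile(r"([A-Z0-9]+)-([A-Z0-9]+)(.*)")


def rekey_for_nem_regions(path_id):
    """Return the region-level ID, or None when the path is intra-region."""
    match = PATH_ID.fullmatch(path_id)
    if match is None:
        raise ValueError(f"Unrecognised path id: {path_id!r}")
    geo_from, geo_to, suffix = match.groups()
    region_from = SUBREGION_TO_REGION[geo_from]
    region_to = SUBREGION_TO_REGION[geo_to]
    if region_from == region_to:
        return None  # both ends collapse into one region, so drop the entry
    return f"{region_from}-{region_to}{suffix}"  # suffix preserved


def filter_augmentations(augmentations, granularity):
    """Apply the granularity-aware filter to a {path_id: table} mapping."""
    if granularity == "single_region":
        return {}  # no flow paths survive aggregation to a single region
    if granularity == "nem_regions":
        rekeyed = {}
        for path_id, table in augmentations.items():
            new_id = rekey_for_nem_regions(path_id)
            if new_id is not None:
                rekeyed[new_id] = table
        return rekeyed
    return dict(augmentations)  # assumed: finer granularities pass through unchanged


print(filter_augmentations({"NNSW-SQ": "table_a", "CNSW-SNW_NTH": "table_b"}, "nem_regions"))
```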
Audited the file against the CLAUDE.md rules that landed on
new-format-network-tables and applied the same treatment we did to the
transmission tests: full-DataFrame assertions instead of row-count / set
membership / iloc / pd.isna probes, full log lines instead of marker +
per-name any() checks, and csv_str_to_df for empty expected outputs.
Also dropped four of the five DataFrame-builder helpers (_fp_costs,
_rez_options, _rez_costs, _paths_table) — they hid only short, non-private
column lists, and inlining via csv_str_to_df reads more consistently with
the rest of the file. _fp_options is kept because its column list pulls in
two private constants from the source module that csv_str_to_df can't
reference; a file-level docstring records the rationale.
While reviewing test_first_year_with_complete_costs_warns_..., the source
warning was too terse ("No year with complete costs for 'X'; skipping.")
to be useful in a real run. Replaced with a message that names the failure
mode and the likely cause.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same audit pass as the test_network_expansion.py cleanup, applied to the three
new-format integration tests:
- Drop redundant `assert "..." in result` lines from _new_format; the
immediately-following `result["..."]` access raises KeyError with the same
diagnostic and is consistent with the other two integration tests.
- Drop three module-level column constants (_FP_AUG_COST_COLS,
_REZ_AUG_OPTION_COLS, _REZ_AUG_COST_COLS) and the
pd.DataFrame([(...)], columns=...) input pattern that depended on them;
inline as csv_str_to_df instead. Keep _FP_AUG_OPTION_COLS, which still
pulls two private constants from the source module.
Left the `set(expansion["expansion_id"]) == {...}` content checks in the
nem_regions and single_region integration tests in place: they pin the
intersection of granularity, REZ remapping, and augmentation filtering at
the orchestrator level — worth the slight maintenance cost over the strict
"presence + columns + row count" rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unit test for _new_parallel_path_rows in test_network_expansion.py exercised the helper's content but no integration test triggered the un-suffixed-corridor branch via create_ispypsa_inputs_template. A refactor that dropped the _append_new_parallel_paths(...) call from create_template.py would have passed all tests. Extend test_create_ispypsa_inputs_template_new_format with two suffixed siblings (CNSW-SNW (NTH), (STH)) in flow_path_transfer_capability plus an un-suffixed CNSW-SNW augmentation. The new `assert "CNSW-SNW" in set(paths["path_id"])` is the load-bearing check — its comment names _append_new_parallel_paths so a regression failure points straight at the broken wiring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_log_fuzzy_match emitted one log line per row of the input series, so when callers passed e.g. one cost row per year for the same option name, each name-matching decision was logged N times. In a real run with several years × dozens of expansions, this produced hundreds of redundant lines that masked the actually-distinct decisions. Dedup with sorted(set(zip(...))) so each (original, match) decision appears exactly once. The CLAUDE.md exception that lets fuzzy matching log per-decision (rather than as a summary) is preserved — one line per distinct decision is the audit unit, not one line per row. Sorted output also gives stable ordering across runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
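The dedup idiom can be shown in isolation. The function name and message wording below are hypothetical; only the sorted(set(zip(...))) step comes from the commit message.

```python
import logging

logger = logging.getLogger("fuzzy_match_sketch")


def log_fuzzy_matches(originals, matches):
    """Log each distinct (original, match) decision once, in a stable order."""
    decisions = sorted(set(zip(originals, matches)))
    lines = [f"'{original}' matched to '{match}'" for original, match in decisions]
    for line in lines:
        logger.info(line)
    return lines


# One cost row per year for the same option repeats the same decision.
originals = ["Opt 1", "Opt 1", "Opt 1", "Opt 2"]
matches = ["Option 1", "Option 1", "Option 1", "Option 2"]
logged = log_fuzzy_matches(originals, matches)
print(len(logged))
```

Four input rows produce two log lines here, one per distinct decision.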
The previous wording ("aggregate individual row contents into a sorted
list") was being read as a blanket rule. The actual concern was redundant
firings of one logical event (e.g. once per year per option), not
per-decision logs where each line is a distinct audit point. Recast the
rule around that, with the existing fuzzy-match log promoted from
"exception" to canonical example of the per-decision pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues from a code-review pass. The parallel-path append (for augmentation corridors with no matching existing path, e.g. CNSW-SNW alongside CNSW-SNW_NTH/_STH) used to live in create_template.py, where its position in the call sequence was an implicit contract: if reordered, those corridors would silently misclassify as constraint groups in the expansion output. Moved the append into _template_network_transmission so the contract is enforced where the paths are built. Pulled the design rationale (corridor-keyed augmentations, why a synthetic third Link, why explicit zero capacity) into the docstring of _append_new_parallel_paths in its new home. _build_geo_from_to_path_id_map collapsed duplicate geo_from values for subregions, relying on the implicit guarantee that REZ option tables never contain subregion IDs — a hidden precondition. Threaded rez_ids through _template_network_expansion so the map is built from REZ rows only. The collision becomes structurally impossible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
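The second fix, building the lookup from REZ rows only, might look roughly like the sketch below. The row layout and the function name are assumptions; the real helper works on the templater's path tables.

```python
def build_geo_from_to_path_id_map(paths, rez_ids):
    """Map each REZ's geo_from to its path_id, ignoring non-REZ rows.

    Filtering on rez_ids up front means sub-regions, which can appear as
    geo_from on several paths, never enter the map.
    """
    mapping = {}
    for row in paths:
        geo_from = row["geo_from"]
        if geo_from not in rez_ids:
            continue
        if geo_from in mapping:
            # Assumed guard: fail loudly rather than silently collapse duplicates.
            raise ValueError(f"Duplicate geo_from for REZ {geo_from!r}")
        mapping[geo_from] = row["path_id"]
    return mapping


# Hypothetical rows: two REZ connections plus two sub-region flow paths.
paths = [
    {"path_id": "N1-NNSW", "geo_from": "N1"},
    {"path_id": "N2-NNSW", "geo_from": "N2"},
    {"path_id": "CNSW-SNW_NTH", "geo_from": "CNSW"},
    {"path_id": "CNSW-SNW_STH", "geo_from": "CNSW"},
]
geo_map = build_geo_from_to_path_id_map(paths, rez_ids={"N1", "N2"})
print(geo_map)
```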
A multi-agent docstring sweep flagged four examples that had drifted from the code or under-documented their outputs:
- _aggregate_to_nem_regions: the example dropped the N1-NSW NaN row from the returned limits, even though _remap_limit_path_ids keeps any row whose path_id is in the rename map.
- _append_new_parallel_paths: the rationale spends a paragraph on why limits are explicit zeros, not NaN, but the example only showed the paths half of the returned tuple. Added the limits side so the zero-capacity rows are visible alongside the rationale.
- _template_network_expansion: the example didn't include the new required rez_ids input, so a reader couldn't reproduce it.
- _aggregate_flow_path_augmentations_to_nem_regions: shown only as set notation over keys, hiding the dict-of-DataFrames structure and the "Flow path" column rewrite. Aligned with the wrapper's format.
Also fixed a stray double-space in the def line of the same function.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds direct tests for utilities and branches that were only covered end-to-end (or not at all): the IASR-prefix typo absorption, the no-numeric-capacity INFO log, em-dash alignment, earliest-complete-year selection, the unknown-granularity ValueError, and the small parsing helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The use_new_table_format=true path had no test driving it via the CLI, which is the surface most likely to regress when AEMO updates the 7.5 workbook or the templater logic shifts. Added a parameterised CLI test over all three regional granularities that asserts row counts derived from named structural quantities (flow paths, REZs, parallel-path injections, REZs without limits, etc.) plus referential integrity between paths/limits and options/costs. To keep CI off the workbook binary, the input fixture is committed as parsed CSVs at tests/test_workbook_table_cache/7.5/. The existing 6.0 truncated fixture was moved into a sibling 6.0/ subdir so the two versions sit alongside each other and serve different purposes (truncated unit-test inputs vs full e2e inputs) without aliasing. Flag flips for subprocess CLI tests need to cross the process boundary, so feature_flags.py now honours an ISPYPSA_USE_NEW_TABLE_FORMAT env override on top of the YAML default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
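An environment override layered on a YAML default could be sketched as follows. Only the variable name ISPYPSA_USE_NEW_TABLE_FORMAT comes from the commit message; the function name and the set of accepted truthy strings are assumptions, not the behaviour of feature_flags.py.

```python
import os

ENV_VAR = "ISPYPSA_USE_NEW_TABLE_FORMAT"
TRUTHY = {"1", "true", "yes"}  # assumed accepted spellings


def resolve_use_new_table_format(yaml_default, env=None):
    """Return the env override when set, otherwise the YAML default."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        return yaml_default
    return raw.strip().lower() in TRUTHY


# A subprocess CLI test would pass the variable in the child's environment.
print(resolve_use_new_table_format(False, {ENV_VAR: "true"}))
```

Because the override is read from the process environment, a test that launches the CLI in a subprocess can flip the flag without editing the YAML file.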
The two tests had identical inputs and exercised the same code path; collapsing them follows the combined output-plus-log pattern already used elsewhere in this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
list_templater_output_files returned the old-format file names regardless of feature flag, so for new-format runs doit's target list pointed at files that don't exist. The task therefore re-ran on every invocation, silently masking the cache-skipping behaviour. Added a feature-flag branch returning _NEW_FORMAT_TEMPLATE_OUTPUTS for new-format runs. The bug was discovered by extending the new-format CLI test to do a second invocation and assert up-to-date detection. That assertion lives in a new mechanism test sibling to the existing 6.0 mechanism test — test_create_ispypsa_inputs_task_new_format — covering the same fresh-run / up-to-date / config_changed / extensive-trigger flow against the new-format CLI path. The new-format coverage is split into a parallel test file rather than parameterising the existing tests. Trades some duplication during the transition for a cleaner handover when 6.0 is dropped: the legacy file is deleted and the new-format file is renamed in place, no diffs inside test bodies. Helpers are split the same way — format-agnostic infrastructure (run_cli_command, build_mock_config, etc.) stays shared in cli_test_helpers.py; the 7.5-specific fixtures live in cli_test_helpers_new_table_formats.py. Step-by-step handover plan is documented in the new test file's module docstring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds FEATURE_FLAG_CLEANUP[use_new_table_format] markers at every site that will need attention when the feature flag is retired. A single grep across the repo will surface the full removal checklist instead of relying on recall of where the gating lives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- network_expansion templater that turns IASR flow-path and REZ augmentation tables into two ISPyPSA inputs: network_expansion_options (selected least-cost option per expandable element) and network_transmission_path_expansion_costs (long-form $/MW cost trajectory).
- create_ispypsa_inputs_template behind the existing use_new_table_format feature flag, with a granularity-aware filter that drops or re-keys augmentation entries when paths are aggregated to NEM regions / single region.
- _template_network_transmission to inject zero-capacity parallel paths for augmentation corridors (e.g. CNSW-SNW) that exist alongside suffixed siblings (CNSW-SNW_NTH/_STH); without this, the orchestrator misclassifies them as constraint relaxations.
- 6.0/, 7.5/) and drives required-table discovery from a checked-in known_tables.yaml manifest so the augmentation prefixes can be enumerated per version.
- network_expansion_options now keys on (expansion_id, expansion_type) with allowed values forward / reverse / constraint_relaxation; cost-per-MW divisor documented as max(forward, reverse).
- them, _financial_year_string_to_end_year_int helper, deduplicated fuzzy-match log lines, and CLAUDE.md additions covering I/O-example docstrings, integration-test scope, and the "no hidden preconditions" rule.
Example output
network_expansion_options: physical paths emit forward + reverse rows; constraint groups (ids not in network_transmission_paths) emit one constraint_relaxation row:
network_transmission_path_expansion_costs: long-form, $/MW of the larger directional capacity:
Where the changes live