Add new-format network_expansion templater #102

Open
nick-gorman wants to merge 14 commits into main from new-format-network-expansion

@nick-gorman nick-gorman commented May 7, 2026

Summary

  • Adds a new-format network_expansion templater that turns IASR flow-path and REZ
    augmentation tables into two ISPyPSA inputs: network_expansion_options (selected
    least-cost option per expandable element) and network_transmission_path_expansion_costs
    (long-form $/MW cost trajectory).
  • Wires the templater into create_ispypsa_inputs_template behind the existing
    use_new_table_format feature flag, with a granularity-aware filter that drops or
    re-keys augmentation entries when paths are aggregated to NEM regions / single region.
  • Extends _template_network_transmission to inject zero-capacity parallel paths for
    augmentation corridors (e.g. CNSW-SNW) that exist alongside suffixed siblings
    (CNSW-SNW_NTH/_STH) — without this, the orchestrator misclassifies them as
    constraint relaxations.
  • Partitions the IASR table cache by workbook version on disk (6.0/, 7.5/) and
    drives required-table discovery from a checked-in known_tables.yaml manifest so the
    augmentation prefixes can be enumerated per version.
  • Schema updates: network_expansion_options now keys on (expansion_id, expansion_type)
    with allowed values forward/reverse/constraint_relaxation; cost-per-MW divisor
    documented as max(forward, reverse).
  • Supporting changes: env-var override for feature flags so subprocess CLI tests can flip
    them, _financial_year_string_to_end_year_int helper, deduplicated fuzzy-match log
    lines, and CLAUDE.md additions covering I/O-example docstrings, integration-test
    scope, and the "no hidden preconditions" rule.
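
For reviewers unfamiliar with the financial-year convention, here is a minimal sketch of what a helper like _financial_year_string_to_end_year_int could do. The "YYYY-YY" input format and the function name financial_year_to_end_year are assumptions for illustration, not taken from the diff.

```python
import re

# Assumed input format: "2024-25" (start year, then two-digit end year).
_FY_PATTERN = re.compile(r"^(\d{4})-(\d{2})$")


def financial_year_to_end_year(financial_year: str) -> int:
    """Hypothetical helper: map a 'YYYY-YY' financial year to its end year."""
    match = _FY_PATTERN.match(financial_year.strip())
    if match is None:
        raise ValueError(f"Expected 'YYYY-YY', got {financial_year!r}")
    end_year = int(match.group(1)) + 1
    # Guard against strings such as "2024-26" that do not span one year.
    if end_year % 100 != int(match.group(2)):
        raise ValueError(f"Inconsistent financial year: {financial_year!r}")
    return end_year
```

Under that assumption, "2024-25" maps to 2025, which matches the integer year column in the cost table.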

Example output

network_expansion_options — physical paths emit forward+reverse rows; constraint
groups (ids not in network_transmission_paths) emit one constraint_relaxation row:

expansion_id  expansion_type          allowed_expansion  expansion_option
CQ-NQ         forward                 1000               Option 1   # asymmetric path: forward MW from selected option
CQ-NQ         reverse                 1200               Option 1   # ...and reverse MW from same option (least $/MW)
DN1-CNSW      forward                 500                Option 2a  # source had reverse_mw = NaN
DN1-CNSW      reverse                 0                  Option 2a  # NaN -> 0 (option provides no expansion this direction)
N1-NNSW       forward                 1660               Option 1   # REZ path: symmetric (forward == reverse)
N1-NNSW       reverse                 1660               Option 1
SWQLD1        constraint_relaxation   330                Option 1   # constraint group: single row, not in network_transmission_paths
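
The row-emission rule in the table above can be sketched in plain Python. This is an illustration only: the function name, the tuple layout, and the choice to carry a constraint group's capacity in forward_mw are assumptions, and the real templater works on DataFrames.

```python
import math


def expansion_option_rows(expansion_id, forward_mw, reverse_mw, option, path_ids):
    """Hypothetical sketch: emit option rows for one selected option."""

    def mw(value):
        # NaN (or missing) means the option provides no expansion that way.
        return 0.0 if value is None or math.isnan(value) else float(value)

    if expansion_id in path_ids:
        # Physical path: one row per direction, both from the same option.
        return [
            (expansion_id, "forward", mw(forward_mw), option),
            (expansion_id, "reverse", mw(reverse_mw), option),
        ]
    # Not in network_transmission_paths: treat as a constraint group.
    return [(expansion_id, "constraint_relaxation", mw(forward_mw), option)]
```

Calling it with ("DN1-CNSW", 500, float("nan"), "Option 2a", {"DN1-CNSW"}) reproduces the forward 500 / reverse 0 pair shown for DN1-CNSW.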

network_transmission_path_expansion_costs — long-form, $/MW of the larger directional
capacity:

expansion_id  year  cost
CQ-NQ         2025  416666.67    # 500M / max(1000, 1200) = 500M / 1200 (asymmetric -> divisor is the larger side)
CQ-NQ         2026  425000.00    # next-year cost / same divisor (escalation visible across years)
N1-NNSW       2025  3539566.27   # ~5.876B / 1660 (symmetric -> unambiguous divisor)
N1-NNSW       2026  3593401.81
SWQLD1        2025  1515.15      # 500k / 330 (constraint group: divisor is its own capacity)
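
The divisor rule behind these figures, as a small worked sketch. The function name cost_per_mw is hypothetical; the inputs are the figures quoted in the comments above.

```python
def cost_per_mw(total_cost, forward_mw, reverse_mw):
    """Hypothetical sketch: $/MW of the larger directional capacity."""
    divisor = max(forward_mw, reverse_mw)
    if divisor <= 0:
        raise ValueError("Option provides no capacity in either direction")
    return total_cost / divisor


# CQ-NQ, asymmetric: the divisor is the larger side (1200 MW).
print(round(cost_per_mw(500e6, 1000, 1200), 2))  # 416666.67
# SWQLD1, constraint group: the divisor is its own capacity (330 MW).
print(round(cost_per_mw(500e3, 330, 0), 2))  # 1515.15
```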

Where the changes live

src/ispypsa/
├── feature_flags.py                            # env-var overrides for subprocess tests
├── cli/dodo.py                                 # version-aware cache target list
├── iasr_table_caching/
│   ├── known_tables.yaml                       # NEW — per-version table manifest
│   └── local_cache.py                          # version-partitioned cache, prefix-driven aug discovery
├── templater/
│   ├── create_template.py                      # new-format branch wires in network_expansion
│   ├── network_expansion.py                    # NEW — orchestrator + helpers (~970 lines)
│   ├── transmission.py                         # _append_new_parallel_paths + flow_path_options arg
│   └── helpers.py                              # _financial_year_string_to_end_year_int, dedup'd fuzzy log
└── validation/schemas/
    ├── network_expansion_options.yaml          # expansion_type column, composite uniqueness
    └── network_transmission_path_expansion_costs.yaml  # cost-per-MW divisor doc

tests/
├── test_workbook_table_cache/
│   ├── 6.0/                                    # existing fixtures, moved
│   └── 7.5/                                    # NEW — fixtures for new-format pathway
├── test_templater/
│   ├── test_network_expansion.py               # NEW — ~870 lines, per-helper coverage
│   ├── test_transmission.py                    # parallel-path wiring
│   └── test_create_ispypsa_inputs_template.py  # integration wiring
├── test_cli/
│   ├── cli_test_helpers_new_table_formats.py   # NEW
│   └── test_create_ispypsa_inputs_new_table_formats.py  # NEW — end-to-end CLI run
└── test_iasr_table_caching/test_local_cache.py # version-partitioned cache assertions

scripts/build_75_test_cache.py                  # NEW — one-off to regenerate 7.5 fixtures
CLAUDE.md                                       # I/O example, integration test, hidden-precondition rules

nick-gorman and others added 13 commits May 6, 2026 11:10

Extends the new-format templater with two output tables:
network_expansion_options and network_transmission_path_expansion_costs.
A single expansion_id keyed by expansion_type (forward / reverse /
constraint_relaxation) unifies physical paths and constraint-group
relaxations under one schema, so downstream consumers don't have to
join back to network_transmission_paths to classify rows.

Option selection picks the lowest dollars-per-MW per expansion_id using
the first year with complete costs. Cost is divided by max(forward,
reverse) so an asymmetric option can be represented in the translator
as a single extendable PyPSA Link.
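
A rough sketch of that selection rule, using plain dicts rather than the DataFrames the templater works on. It assumes "complete" means every option has a cost in that year; the function names and input shapes are hypothetical.

```python
def first_complete_year(costs_by_option):
    """Earliest year in which every option has a cost (assumed meaning)."""
    years = sorted({y for costs in costs_by_option.values() for y in costs})
    for year in years:
        if all(c.get(year) is not None for c in costs_by_option.values()):
            return year
    return None


def select_least_cost_option(costs_by_option, capacity_by_option):
    """Pick the option with the lowest $/MW in the first complete year."""
    year = first_complete_year(costs_by_option)
    if year is None:
        return None  # the caller would warn and skip this expansion_id

    def dollars_per_mw(option):
        forward, reverse = capacity_by_option[option]
        return costs_by_option[option][year] / max(forward, reverse)

    return min(costs_by_option, key=dollars_per_mw)
```

If Option 1 has no 2025 cost, the comparison moves to 2026 for all options, so no option is ranked on a year the others lack.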

Known-table discovery for the local cache moves from hard-coded lists
to a static manifest (known_tables.yaml), so augmentation tables can
be enumerated by prefix — necessary because v7.5 has one table per
flow path x scenario.
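
As an illustration of prefix-driven discovery, here is a sketch over an already-parsed manifest. The manifest layout (version mapped to a list of table names) and the table names are assumptions, not the contents of known_tables.yaml.

```python
def tables_with_prefix(manifest, version, prefix):
    """Hypothetical sketch: list a version's tables that share a prefix."""
    return sorted(
        name for name in manifest.get(version, []) if name.startswith(prefix)
    )


# Assumed shape of the parsed manifest; the table names are made up.
manifest = {
    "6.0": ["flow_path_augmentation_options"],
    "7.5": [
        "flow_path_augmentation_options_CQ-NQ",
        "flow_path_augmentation_options_NNSW-SQ",
        "rez_augmentation_options",
    ],
}
print(tables_with_prefix(manifest, "7.5", "flow_path_augmentation_options_"))
```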

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

At nem_regions and single_region the network paths table is aggregated
before the expansion templater runs, so flow-path augmentation entries
keyed by raw IASR sub-region path IDs no longer line up with the
surviving paths. Drop intra-region entries and re-key cross-region
ones (NNSW-SQ -> NSW-QLD, suffixes preserved); at single_region drop
all flow-path augmentations entirely. REZ and constraint-group entries
are unaffected: REZ entries remap automatically via the geo-to-path
lookup built from the already-aggregated paths, and constraint groups
stay valid at all granularities since they can still bite on REZ-to-
region lines.
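
The drop / re-key rule for flow-path augmentations might look roughly like this. The sub-region to region mapping, the "_SUFFIX" id form, the "sub_regions" granularity name, and both function names are assumptions for illustration.

```python
def rekey_flow_path(path_id, region_of):
    """Return the region-level id, or None for an intra-region path."""
    base, sep, suffix = path_id.partition("_")  # keep any "_SUFFIX" part
    start, end = base.split("-", 1)
    region_from, region_to = region_of[start], region_of[end]
    if region_from == region_to:
        return None
    return f"{region_from}-{region_to}{sep}{suffix}"


def filter_flow_path_augmentations(path_ids, granularity, region_of):
    """Map surviving augmentation ids to their ids at this granularity."""
    if granularity == "sub_regions":  # assumed name of the default level
        return {p: p for p in path_ids}
    if granularity == "single_region":
        return {}  # every flow path is internal to the single region
    if granularity == "nem_regions":
        rekeyed = {p: rekey_flow_path(p, region_of) for p in path_ids}
        return {old: new for old, new in rekeyed.items() if new is not None}
    raise ValueError(f"Unknown regional granularity: {granularity!r}")
```

With region_of mapping NNSW to NSW and SQ to QLD, "NNSW-SQ" becomes "NSW-QLD", while "CNSW-SNW_NTH" is dropped at nem_regions because both ends sit in NSW.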

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Audited the file against the CLAUDE.md rules that landed on
new-format-network-tables and applied the same treatment we did to the
transmission tests: full-DataFrame assertions instead of row-count / set
membership / iloc / pd.isna probes, full log lines instead of marker +
per-name any() checks, and csv_str_to_df for empty expected outputs.

Also dropped four of the five DataFrame-builder helpers (_fp_costs,
_rez_options, _rez_costs, _paths_table) — they hid only short, non-private
column lists, and inlining via csv_str_to_df reads more consistently with
the rest of the file. _fp_options is kept because its column list pulls in
two private constants from the source module that csv_str_to_df can't
reference; a file-level docstring records the rationale.

While reviewing test_first_year_with_complete_costs_warns_..., we found the
source warning too terse ("No year with complete costs for 'X'; skipping.")
to be useful in a real run. Replaced it with a message that names the failure
mode and the likely cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same audit pass as the test_network_expansion.py cleanup, applied to the three
new-format integration tests:

- Drop redundant `assert "..." in result` lines from _new_format; the
  immediately-following `result["..."]` access raises KeyError with the same
  diagnostic and is consistent with the other two integration tests.
- Drop three module-level column constants (_FP_AUG_COST_COLS,
  _REZ_AUG_OPTION_COLS, _REZ_AUG_COST_COLS) and the
  pd.DataFrame([(...)], columns=...) input pattern that depended on them;
  inline as csv_str_to_df instead. Keep _FP_AUG_OPTION_COLS, which still
  pulls two private constants from the source module.

Left the `set(expansion["expansion_id"]) == {...}` content checks in the
nem_regions and single_region integration tests in place: they pin the
intersection of granularity, REZ remapping, and augmentation filtering at
the orchestrator level — worth the slight maintenance cost over the strict
"presence + columns + row count" rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The unit test for _new_parallel_path_rows in test_network_expansion.py
exercised the helper's content but no integration test triggered the
un-suffixed-corridor branch via create_ispypsa_inputs_template. A refactor
that dropped the _append_new_parallel_paths(...) call from create_template.py
would have passed all tests.

Extend test_create_ispypsa_inputs_template_new_format with two suffixed
siblings (CNSW-SNW (NTH), (STH)) in flow_path_transfer_capability plus an
un-suffixed CNSW-SNW augmentation. The new
`assert "CNSW-SNW" in set(paths["path_id"])` is the load-bearing check —
its comment names _append_new_parallel_paths so a regression failure
points straight at the broken wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

_log_fuzzy_match emitted one log line per row of the input series, so
when callers passed e.g. one cost row per year for the same option name,
each name-matching decision was logged N times. In a real run with several
years × dozens of expansions, this produced hundreds of redundant lines
that masked the actually-distinct decisions.

Dedup with sorted(set(zip(...))) so each (original, match) decision
appears exactly once. The CLAUDE.md exception that lets fuzzy matching
log per-decision (rather than as a summary) is preserved — one line per
distinct decision is the audit unit, not one line per row. Sorted output
also gives stable ordering across runs.
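
A minimal sketch of the dedup pattern described above. The function name and the log message wording are hypothetical; only the sorted(set(zip(...))) step comes from the commit message.

```python
import logging

logger = logging.getLogger(__name__)


def log_fuzzy_matches(originals, matches):
    """Log each distinct (original, match) decision once, in stable order."""
    decisions = sorted(set(zip(originals, matches)))
    for original, match in decisions:
        # Message text is illustrative, not the project's actual log line.
        logger.info("'%s' matched to '%s'", original, match)
    return decisions


# Three cost rows for two option names yield two log lines, not three.
log_fuzzy_matches(
    ["Opt 1", "Opt 1", "Opt 2"],
    ["Option 1", "Option 1", "Option 2"],
)
```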

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous wording ("aggregate individual row contents into a sorted
list") was being read as a blanket rule. The actual concern was redundant
firings of one logical event (e.g. once per year per option), not
per-decision logs where each line is a distinct audit point. Recast the
rule around that, with the existing fuzzy-match log promoted from
"exception" to canonical example of the per-decision pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two issues from a code-review pass.

The parallel-path append (for augmentation corridors with no matching
existing path, e.g. CNSW-SNW alongside CNSW-SNW_NTH/_STH) used to live
in create_template.py, where its position in the call sequence was an
implicit contract: if reordered, those corridors would silently
misclassify as constraint groups in the expansion output. Moved the
append into _template_network_transmission so the contract is enforced
where the paths are built. Pulled the design rationale (corridor-keyed
augmentations, why a synthetic third Link, why explicit zero capacity)
into the docstring of _append_new_parallel_paths in its new home.
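
To make the corridor rule concrete, here is a sketch of how un-suffixed corridors could be detected and given explicit zero capacity. The sibling test (an existing path id starting with the corridor id plus "_"), the column names, and the function name are assumptions; the actual logic lives in _append_new_parallel_paths.

```python
def zero_capacity_parallel_paths(augmentation_ids, existing_path_ids):
    """Hypothetical sketch: rows for corridors that only have suffixed siblings."""
    existing = set(existing_path_ids)
    rows = []
    for corridor in sorted(set(augmentation_ids) - existing):
        has_sibling = any(p.startswith(corridor + "_") for p in existing)
        if has_sibling:
            # Explicit zeros, not NaN: the new path starts with no capacity.
            rows.append(
                {"path_id": corridor, "forward_mw": 0.0, "reverse_mw": 0.0}
            )
    return rows
```

An id such as SWQLD1, which has no suffixed sibling among the paths, gets no row here and so still falls through to the constraint-group classification.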

_build_geo_from_to_path_id_map collapsed duplicate geo_from values for
subregions, relying on the implicit guarantee that REZ option tables
never contain subregion IDs — a hidden precondition. Threaded rez_ids
through _template_network_expansion so the map is built from REZ rows
only. The collision becomes structurally impossible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A multi-agent docstring sweep flagged four examples that had drifted
from the code or under-documented their outputs:

- _aggregate_to_nem_regions: the example dropped the N1-NSW NaN row
  from the returned limits, even though _remap_limit_path_ids keeps any
  row whose path_id is in the rename map.
- _append_new_parallel_paths: the rationale spends a paragraph on why
  limits are explicit zeros, not NaN — but the example only showed the
  paths half of the returned tuple. Added the limits side so the
  zero-capacity rows are visible alongside the rationale.
- _template_network_expansion: the example didn't include the new
  required rez_ids input, so a reader couldn't reproduce it.
- _aggregate_flow_path_augmentations_to_nem_regions: shown only as set
  notation over keys, hiding the dict-of-DataFrames structure and the
  "Flow path" column rewrite. Aligned with the wrapper's format.

Also fixed a stray double-space in the def line of the same function.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds direct tests for utilities and branches that were only covered
end-to-end (or not at all): the IASR-prefix typo absorption, the
no-numeric-capacity INFO log, em-dash alignment, earliest-complete-year
selection, the unknown-granularity ValueError, and the small parsing
helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The use_new_table_format=true path had no test driving it via the CLI,
which is the surface most likely to regress when AEMO updates the 7.5
workbook or the templater logic shifts. Added a parameterised CLI test
over all three regional granularities that asserts row counts derived
from named structural quantities (flow paths, REZs, parallel-path
injections, REZs without limits, etc.) plus referential integrity
between paths/limits and options/costs.

To keep CI off the workbook binary, the input fixture is committed as
parsed CSVs at tests/test_workbook_table_cache/7.5/. The existing 6.0
truncated fixture was moved into a sibling 6.0/ subdir so the two
versions sit alongside each other and serve different purposes
(truncated unit-test inputs vs full e2e inputs) without aliasing.

Flag flips for subprocess CLI tests need to cross the process boundary,
so feature_flags.py now honours an ISPYPSA_USE_NEW_TABLE_FORMAT env
override on top of the YAML default.
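
One way such an override can work, as a sketch. The env-var name follows the one quoted above; the accepted string values and the resolve_flag function are assumptions, not the code in feature_flags.py.

```python
import os

_TRUE = {"1", "true", "yes", "on"}    # assumed accepted spellings
_FALSE = {"0", "false", "no", "off"}


def resolve_flag(name, yaml_default, environ=None):
    """Hypothetical sketch: env var wins over the YAML default when set."""
    environ = os.environ if environ is None else environ
    raw = environ.get(f"ISPYPSA_{name.upper()}")
    if raw is None:
        return yaml_default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"Cannot parse ISPYPSA_{name.upper()}={raw!r} as a boolean")
```

A subprocess test would then pass env={**os.environ, "ISPYPSA_USE_NEW_TABLE_FORMAT": "true"} to subprocess.run, so the flag flips inside the child process without editing the YAML.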

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The two tests had identical inputs and exercised the same code path;
collapsing them follows the combined output-plus-log pattern already
used elsewhere in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

list_templater_output_files returned the old-format file names regardless
of feature flag, so for new-format runs doit's target list pointed at
files that don't exist. The task therefore re-ran on every invocation,
silently masking the cache-skipping behaviour. Added a feature-flag
branch returning _NEW_FORMAT_TEMPLATE_OUTPUTS for new-format runs.

The bug was discovered by extending the new-format CLI test to do a
second invocation and assert up-to-date detection. That assertion lives
in a new mechanism test sibling to the existing 6.0 mechanism test —
test_create_ispypsa_inputs_task_new_format — covering the same fresh-run
/ up-to-date / config_changed / extensive-trigger flow against the
new-format CLI path.

The new-format coverage is split into a parallel test file rather than
parameterising the existing tests. Trades some duplication during the
transition for a cleaner handover when 6.0 is dropped: the legacy file
is deleted and the new-format file is renamed in place, no diffs inside
test bodies. Helpers are split the same way — format-agnostic
infrastructure (run_cli_command, build_mock_config, etc.) stays shared
in cli_test_helpers.py; the 7.5-specific fixtures live in
cli_test_helpers_new_table_formats.py. Step-by-step handover plan is
documented in the new test file's module docstring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 96.38554% with 9 lines in your changes missing coverage. Please review.

Files with missing lines                         Patch %   Lines
src/ispypsa/feature_flags.py                     66.66%    1 Missing and 1 partial ⚠️
src/ispypsa/iasr_table_caching/local_cache.py    84.61%    2 Missing ⚠️
src/ispypsa/templater/create_template.py         86.66%    1 Missing and 1 partial ⚠️
src/ispypsa/templater/network_expansion.py       98.93%    1 Missing and 1 partial ⚠️
src/ispypsa/templater/transmission.py            95.65%    0 Missing and 1 partial ⚠️

Files with missing lines                         Coverage Δ
src/ispypsa/templater/helpers.py                 98.13% <100.00%> (+0.01%) ⬆️
src/ispypsa/templater/transmission.py            98.73% <95.65%> (+0.18%) ⬆️
src/ispypsa/feature_flags.py                     81.81% <66.66%> (-18.19%) ⬇️
src/ispypsa/iasr_table_caching/local_cache.py    74.28% <84.61%> (+5.32%) ⬆️
src/ispypsa/templater/create_template.py         89.18% <86.66%> (-1.44%) ⬇️
src/ispypsa/templater/network_expansion.py       98.93% <98.93%> (ø)

Adds FEATURE_FLAG_CLEANUP[use_new_table_format] markers at every site that
will need attention when the feature flag is retired. A single grep across
the repo will surface the full removal checklist instead of relying on
recall of where the gating lives.
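
The same checklist can be produced from Python as well as from grep. The sketch below takes (file name, text) pairs so it stays independent of the repository layout; the function name is hypothetical.

```python
def find_cleanup_markers(files, flag):
    """Return (file, line number, line) for each cleanup marker of one flag."""
    marker = f"FEATURE_FLAG_CLEANUP[{flag}]"
    hits = []
    for name, text in files:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((name, lineno, line.strip()))
    return hits


example = [
    ("a.py", "x = 1\n# FEATURE_FLAG_CLEANUP[use_new_table_format]: drop branch\n"),
    ("b.py", "y = 2\n"),
]
print(find_cleanup_markers(example, "use_new_table_format"))
```

To scan a checkout, the pairs can come from something like ((str(p), p.read_text()) for p in pathlib.Path("src").rglob("*.py")).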

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nick-gorman nick-gorman requested a review from EllieKallmier May 13, 2026 05:07