Add initial set of generation and storage schemas by EllieKallmier · Pull Request #93 · Open-ISP/ISPyPSA

EllieKallmier · 2026-04-14T07:33:15Z

Adds YAML validation schemas for generation and storage input tables to the initial set of ISPyPSA input table schemas.

Schemas added:

generators_existing_planned.yaml - Existing and planned generator characteristics
generators_new_entrant.yaml - New entrant generator technology options
storage_existing_planned.yaml - Existing and planned storage unit characteristics
storage_new_entrant.yaml - New entrant storage technology options (battery, PHES)
costs_connection.yaml - Connection cost data
costs_fuel_prices.yaml - Fuel price projections
costs_new_entrant_build.yaml - Build cost projections for new entrants
emissions_reduction.yaml - Emissions reduction targets/constraints

Still to be added: policy table schemas.

Schemas follow structure as outlined in #85 and mostly follow the draft table structures given in the alternative templater review, with tweaks to match the 2026 (v7.5) IASR workbook data.

codecov · 2026-04-14T07:36:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
see 6 files with indirect coverage changes

nick-gorman

This looks great Ellie, just a few comments / questions.

I made a bunch of comments on just one table, but they might apply in other cases. But this should be obvious.
In the inline comments I suggested validating the fuel types in the fuel cost table off the generator tables. But it could also make sense to do this in reverse: if the fuel cost table exists, require that the generator table values exist in the fuel cost table. But then this raises the question of what happens when the fuel cost table isn't provided. Another option would be to have a canonical values yaml (or csv) and we validate off it for things like fuel types and technology types etc. This is a broader issue than fuel types, but just using it as an example.
Do you think we should have a standard way of separating description notes as pertaining to source / IASR vs ISPyPSA behaviour?

nick-gorman · 2026-04-16T22:15:18Z

+      If costs are given by region in source data, create separate rows for each
+      geo_id in the region with the same cost.


Great idea to have this note. Do you think we should consistently use a heading like "Source notes" or "Data preparation notes" to preface notes like this? Anyway not a big one, just a thought.

nick-gorman · 2026-04-16T22:18:33Z

+      Standardised technology name mapping to new entrant technologies in
+      `generators_new_entrant` and/or `storage_new_entrant` tables.
+
+      If blank: treat as geo_id-level VRE connection cost.


Should this be more explicit? Maybe, "If blank: the cost is applied to all new entrant technologies in geo_id which have not been provided a technology specific connection cost for the applicable year."
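The fallback rule suggested above could be sketched as the lookup below. This is only an illustration, assuming connection cost rows are available as plain dicts; the `year` and `cost` keys and the `lookup_connection_cost` helper are hypothetical names, not taken from the schema, and the technology names are toy values.

```python
def lookup_connection_cost(rows, geo_id, technology, year):
    """Return the technology-specific cost if one exists for this geo_id and
    year, otherwise the cost from the row whose technology is blank (None)."""
    fallback = None
    for row in rows:
        if row["geo_id"] != geo_id or row["year"] != year:
            continue
        if row["technology"] == technology:
            # A technology-specific row always wins over the blank row.
            return row["cost"]
        if row["technology"] is None:
            fallback = row["cost"]
    return fallback


# Toy data: one blank-technology row and one technology-specific row.
rows = [
    {"geo_id": "A", "technology": None, "year": 2030, "cost": 100.0},
    {"geo_id": "A", "technology": "battery", "year": 2030, "cost": 40.0},
]

specific = lookup_connection_cost(rows, "A", "battery", 2030)
generic = lookup_connection_cost(rows, "A", "wind", 2030)
missing = lookup_connection_cost(rows, "B", "wind", 2030)
```

Here `specific` picks up the battery row, `generic` falls back to the blank-technology row, and `missing` is `None` because no row exists for that geo_id.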

nick-gorman · 2026-04-16T22:20:14Z

+    type: int
+    required: true
+    description: >
+      Financial year in which this cost applies.


Maybe just "Year" to leave open the cost being applied on calendar year basis.

Yep I hear this - maybe also though I'd add a note (but maybe in the description?) that says something along the lines of "Year type is either financial or calendar year based on config" (phrased better).

Side thought: I'm not sure how we want to handle converting data that's given as FY into calendar (or vice versa) when it's just a single value for each year? I don't think a big deal but good to take a standard approach

On the side thought, agree, but I would kind of leave this up to the user. I.e. the templater doesn't have a calendar year option, but if the user wants to fill out the ISPyPSA tables with their own inputs they are free to specify them as calendar years.

nick-gorman · 2026-04-16T22:22:50Z

+  If absent: no dynamic marginal costs calculated. Requires later user input of
+  fixed or dynamic marginal cost.


Should we be more specific about where these inputs would need to be provided if they aren't given here?

nick-gorman · 2026-04-16T22:36:38Z

+
+      Should match fuel_type values in `generators_existing_planned` or


Should these be validated?

nick-gorman · 2026-04-16T22:47:32Z

+  fuel_type:
+    type: string
+    required: true
+    description: Fuel type used by the generator or storage unit.


Should we note here the model behaviour if the fuel_type (or maybe even fuel_type price_node combo) for a generator is missing? Is this allowed? Maybe it's the same behaviour as if the whole table is missing but just applied to a single generator. Should this be stated at the table description level, noting that the if absent behaviour applies to the whole table and on a row basis? And I guess this comment applies to other tables as well.

I have also been wondering about this, and I think it's still for me a bit of a question around whether what I've sort of proposed for the missing table case is the best way forward. But yes as it stands I think it makes sense to add some cross-validation here to the summary tables and as you say add a table-level note re: if absent behaviour.

nick-gorman · 2026-04-16T22:54:31Z

+      Note: for all power stations except for Kogan Gas, this should exactly match
+      the `power_station` column.
+
+      If absent: assume no dynamic marginal price is calculated for this model run.


Does this mean zero cost or a fixed cost?

I was thinking it would mean a fixed price (user to supply), or that users would need to provide dynamic prices (if that's desired). I can clarify this wording to be more specifically about the calculation of dynamic prices and the consequence/required action.

I'm also not totally convinced this is the best way to handle this scenario, I just think that this is a case with an obvious set of options that might be desirable to users AND can offer a simplification. I would be very happy to chat more about this case and whether there's a nicer way to implement!

Maybe if fuel_price_node isn't supplied then the user needs to add a column called fuel_price, which would just specify one fixed price. Or you could allow numeric values in fuel_price_node which override the dynamic values, and just not allow NaN. The user can set to zero if that's what they want.

nick-gorman · 2026-04-16T23:04:30Z

+    required: false
+    units: '%'
+    description: >
+      Maximum allowed state of charge (%). Must be between 0.0 and 100.0,


validation?

yes for sure - at the time I was thinking it might be easier to define column-level validation rules (in the schema) once we had a more solid plan for how the validation will be handled (i.e. if there's an existing package or smth we want to use). But yeah it's probably just as easy to define clearly now and update that as we go. I might in that case use the custom_validation attribute at the column level and start defining a suite of standard validations. Lmk if you have other ideas or preferences!

also - could use allowed_values but atm that's defined as a list of permitted values, which doesn't translate well here; we could instead update the definition of that attribute.

Ok, yep, I agree it could make sense to wait till we know more about how we will be doing the validation.

nick-gorman · 2026-04-16T23:04:58Z

+    type: date
+    required: false
+    description: >
+      Date when the storage unit begins operation. Format: %d/%m/%Y


Validation?

EllieKallmier · 2026-05-07T02:29:31Z

Collection of my (edited) thoughts and updates:

New schema additions for validation metadata

Super open to not doing this/changing stuff around - particularly as validation “enforcement” gets set up and might require certain structures etc. But to summarise: shift away from embedding validation constraints as prose in description fields (e.g. "All values should be >= $0.0") and towards explicit fields at the column level. The new fields I’m testing out are:

| Field | Meaning |
| --- | --- |
| `gte` | Greater than or equal to (inclusive lower bound) |
| `gt` | Strictly greater than (exclusive lower bound) |
| `lte` | Less than or equal to (inclusive upper bound) |
| `lt` | Strictly less than (exclusive upper bound) |
| `format` | Format string for date or string pattern validation (e.g. `"%d/%m/%Y"`) |
| `allowed_values` | Explicit list of permitted values |
| `allowed_values_from` | Reference to a column in another table whose values define the permitted set |
| `nan_fill` | Value applied to null/NaN cells at the validation enforcement step, if `required: false` for this column |

The intent is that these fields can be read directly by validation enforcement code without parsing prose. Table-level custom_validation is reserved for more complex cross-table/cross-column conditions that can't be expressed this way, and allowed_values stays as a static list (discrete). The idea being that these fields only exist as needed per column.
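As a rough sketch of what "read directly by validation enforcement code" might look like, the snippet below checks the bound fields and a date `format` against a column schema held as a plain dict. The function names and the example column dicts are assumptions for illustration only; the actual enforcement design is still undecided in this PR.

```python
import operator
from datetime import datetime

# Hypothetical mapping from schema field names to comparison operators.
BOUND_CHECKS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def check_bounds(column_schema, values):
    """Return (index, value, field, bound) for every value violating a bound.
    Null (None) cells are skipped here; they are a separate concern."""
    failures = []
    for field, compare in BOUND_CHECKS.items():
        if field not in column_schema:
            continue
        bound = column_schema[field]
        for i, value in enumerate(values):
            if value is not None and not compare(value, bound):
                failures.append((i, value, field, bound))
    return failures


def check_date_format(column_schema, values):
    """Return (index, value) for every non-null string that does not parse
    with the column's strptime-style `format`."""
    fmt = column_schema.get("format")
    if fmt is None:
        return []
    failures = []
    for i, value in enumerate(values):
        if value is None:
            continue
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            failures.append((i, value))
    return failures


# Illustrative column schemas (assumed, not copied from the PR's YAML files).
soc_schema = {"type": "float", "required": False, "units": "%", "gte": 0.0, "lte": 100.0}
date_schema = {"type": "date", "required": False, "format": "%d/%m/%Y"}

bound_failures = check_bounds(soc_schema, [50.0, 120.0, None, -5.0])
format_failures = check_date_format(date_schema, ["01/07/2030", "2030-07-01", None])
```

Because each field is looked up only if present, a column that declares no bounds or format simply produces no failures, which matches the idea that these fields exist only as needed per column.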

Clarified semantics of `required` vs `nan_fill`

The column-level `required` field defines whether a column can contain null values at validation time (without raising an error).

For columns that are required: false but have a sensible default value, nan_fill specifies the value that will be applied to any null/NaN cells during the validation enforcement step. For example:

    lcf_build:
      type: float
      required: false
      units: '%'
      gte: 0.0
      nan_fill: 100.0

This says: the column is optional; if cells are null, fill them with 100.0 (i.e., no locational scaling); validate that all resulting values are >= 0.
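The fill-then-validate order described here could be sketched roughly as follows. This assumes the column schema is loaded into a dict and the column is a list of values; `is_null` and `apply_nan_fill` are hypothetical helper names, not part of ISPyPSA.

```python
import math


def is_null(value):
    """Treat both None and float NaN as null cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_nan_fill(column_schema, values):
    """Apply the required / nan_fill semantics to one column.

    - required: true  -> any null cell raises an error
    - required: false -> null cells are replaced by nan_fill, if one is given
    """
    if column_schema.get("required", False):
        if any(is_null(v) for v in values):
            raise ValueError("required column contains null values")
        return list(values)
    if "nan_fill" not in column_schema:
        return list(values)
    fill = column_schema["nan_fill"]
    return [fill if is_null(v) else v for v in values]


lcf_build = {"type": "float", "required": False, "units": "%", "gte": 0.0, "nan_fill": 100.0}

# Fill first, then validate the resulting values against the lower bound.
filled = apply_nan_fill(lcf_build, [95.0, None, float("nan")])
all_valid = all(v >= lcf_build["gte"] for v in filled)
```

Filling before the bound check means the default itself is validated too, so a `nan_fill` that contradicts the column's own bounds would be caught rather than silently applied.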

Standardised description sub-headers

Description fields now use consistent sub-headers to distinguish different types of content:

Source tables: — the IASR workbook tables drawn on by the templater to produce this output table
Source notes: — templater transformation notes (what has already happened to get to this point; not enforced by the validator)
If absent: — behavioural consequence when the entire table or column is not present
If absent (or empty): — used where both the column being absent and individual null cells have the same consequence

I’m not locked in on having the source notes stuff here, maybe it’s more useful for me at this stage during the templater refactor but will become redundant/doubled as the templater documentation gets filled out? But the idea being that Source notes: defines stuff that has happened before reaching validation, in part documenting the templater behaviour but also in particular where new rows have been added that don’t exist in the IASR workbook. Happy to lose this subheading if it feels like a double up/not that useful!

Simplified dynamic marginal cost data requirements

Basically - I was getting stuck on how to imply this kind of flexibly required if/then structure in the schema without having a more concrete picture of the validation and where it sits in the flow, and if/how we required users to manually fill some values at different points etc - so I made an executive decision to simplify and remove this optionality/flexibility for the moment so I can just keep moving!

Previously, the generator tables used an optional_sets field to indicate that fuel_price_node (now fuel_price_mapping), vom, and heat_rate were all required together or all absent. This approach was removed. These columns — along with fom — are now simply required: true in both new entrant generator and storage tables.

The same change applies to costs_fuel_prices, which has been made required: true at the table level (was previously optional). This is noted in the schema with a comment explaining the intent to revisit optionality of SRMC-related data once the validation enforcement design is clearer.

I do think this is important to revisit just with a more serious think about the user interaction piece (where and how to enforce)!

Column identifier rename: `fuel_price_node` → `fuel_price_mapping`

I renamed the column fuel_price_node in the generator tables and costs_fuel_prices back to fuel_price_mapping, because in the IASR v7.5 workbook structure fuel prices are not specifically linked to a geographic node: for many generators they are identified by a mapping ID that doesn't correspond to a specific location. There’s no region/subregion/rez/location column in fuel price tables anymore basically!

Cross-table references: new entrant technology validation

The technology column in both generators_new_entrant and storage_new_entrant now has:

    allowed_values_from:
      costs_new_entrant_build: technology

This enforces that any technology defined in the new entrant asset tables also has a build cost entry.

Open question for discussion: This validates that new entrant assets have build costs, but it doesn't validate that all entries in costs_new_entrant_build correspond to technologies actually defined in the asset tables (which in my mind would be about catching stale or misspelled entries).
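To make the two directions of this check concrete, here is a minimal sketch assuming each table is a list of row dicts keyed by table name. The helper names and the toy technology values (including the deliberately misspelled `"wnd"`) are made up for illustration; how referential integrity will actually be enforced is still open.

```python
def check_allowed_values_from(tables, table_name, column, reference):
    """Forward check: values used in `table_name.column` that are missing from
    the referenced column. `reference` is a single {ref_table: ref_column} pair."""
    ((ref_table, ref_column),) = reference.items()
    permitted = {row[ref_column] for row in tables[ref_table]}
    used = {row[column] for row in tables[table_name]}
    return sorted(used - permitted)


def find_unreferenced(tables, ref_table, ref_column, referencing):
    """Reverse check: values defined in `ref_table.ref_column` that no
    referencing (table, column) pair uses, e.g. stale or misspelled entries."""
    used = set()
    for table_name, column in referencing:
        used |= {row[column] for row in tables[table_name]}
    defined = {row[ref_column] for row in tables[ref_table]}
    return sorted(defined - used)


# Toy tables; "wnd" stands in for a misspelled build cost entry.
tables = {
    "generators_new_entrant": [{"technology": "solar"}, {"technology": "wind"}],
    "storage_new_entrant": [{"technology": "battery"}],
    "costs_new_entrant_build": [
        {"technology": "solar"},
        {"technology": "wind"},
        {"technology": "battery"},
        {"technology": "wnd"},
    ],
}

reference = {"costs_new_entrant_build": "technology"}
missing_costs = check_allowed_values_from(
    tables, "generators_new_entrant", "technology", reference
)
stale_costs = find_unreferenced(
    tables,
    "costs_new_entrant_build",
    "technology",
    [("generators_new_entrant", "technology"), ("storage_new_entrant", "technology")],
)
```

The forward check passes for this data, while the reverse check flags `"wnd"`, which is the kind of entry the open question is about.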

nick-gorman

Hey Ellie,

This looks really good, big fan.

I think there are some broader questions about how we implement referential integrity, which we haven't quite answered, similar to your question on costs_new_entrant_build. But I think that is something we can keep thinking through and doesn't need to block this PR by any means. I might open a discussion on referential integrity.

nick-gorman · 2026-04-17T01:03:54Z

+      Note: for all power stations except for Kogan Gas, this should exactly match
+      the `power_station` column.
+
+      If absent: assume no dynamic marginal price is calculated for this model run.


Maybe if fuel_price_node isn't supplied then the user needs to add a column called fuel_price, which would just specify one fixed price. Or you could allow numeric values in fuel_price_node which override the dynamic values, and just not allow NaN. The user can set to zero if that's what they want.

nick-gorman · 2026-04-20T22:34:31Z

+    required: false
+    units: '%'
+    description: >
+      Maximum allowed state of charge (%). Must be between 0.0 and 100.0,


Ok, yep, I agree it could make sense to wait till we know more about how we will be doing the validation.

nick-gorman · 2026-05-13T01:20:40Z

+  fuel_type:
+    type: string
+    required: true
+    allowed_values: ["Black Coal", "Brown Coal", "Liquid Fuel", "Gas", "Water", "Solar", "Wind", "Biomass", "Biomethane"]


I did something similar in network_geography.yaml, but I wonder if hard coding like this is actually bad as it prevents people defining new fuel costs. Thinking again of a toy example type case where I might just want to define whatever fuels I like: Coal, Hydrogen, etc.

Another way of doing this might be to force the generation tables to only have fuel types that exist in costs_fuel_prices.

yeahh I thought about this too, and after the chat this morning about the role of validator totally agree hard-coding isn't ideal here. And yep I like that option - will implement!

add initial set of generation and storage schemas

6352d3e

EllieKallmier added this to the Technical debt clearing house milestone Apr 14, 2026

EllieKallmier requested a review from nick-gorman April 14, 2026 07:33

EllieKallmier added the "type: feature" (New feature or request) and "category: data-validation" (Relates to data validation practices across any module - e.g tables, schema or enforcement) labels Apr 14, 2026

nick-gorman approved these changes Apr 16, 2026

EllieKallmier added 2 commits May 7, 2026 11:49

tighten generation, storage, and cost schemas - add value constraints…

2963704

…, cross-table references, and validation rules

tidy: remove custom validation from build costs as covered by allowed…

246a5a7

…_values_from in new entrants tables

EllieKallmier requested a review from nick-gorman May 11, 2026 23:50

nick-gorman approved these changes May 13, 2026

EllieKallmier and others added 2 commits May 13, 2026 12:30

Apply suggestion from @nick-gorman

8e23088

Co-authored-by: nick-gorman <40549624+nick-gorman@users.noreply.github.com>

replace hard-coded allowed_values with allowed_values_from table/colu…

c1ec748

…mn references
