Improve station filtering with quality evaluation pipeline #74
Merged
Import all stations from NOAA/TICON, then perform quality assessment as a second step.
Pull request overview
This PR replaces ad-hoc, source-specific station filtering with a single quality evaluation pipeline that generates an auditable quality.json used for runtime filtering, and updates the repository’s station data set accordingly.
Changes:
- Adds many new TICON station JSON records to the dataset.
- Updates Prettier configuration to ignore the generated `quality.json`.
- (Per description) Introduces a quality evaluation pipeline and runtime filtering based on `quality.json`.
Reviewed changes
Copilot reviewed 70 out of 1842 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| data/ticon/atka-9461710-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/astoria-9439040-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/arun_platform-arn-gbr-cco.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/artificial_is-8537614-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/armitage_island-9449932-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/arko-2545-swe-smhi.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/arena_cove-9416841-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/apra_harbor-1630000-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/apalachicola-8728690-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/anzio-anz-ita-ispra.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/annapolis-8575512-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/anklam-9660001-deu-wsv.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/andenes-anx-nor-nhs.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/anchorage-9455920-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/american_falls_ny-9063010-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/althagen-9650024-deu-wsv.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alpena_mi-9075065-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alliance-8761727-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alitak-9457804-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alicante_i-ali-esp-da_mm.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alexandria-807-egy-uhslc_fd.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alblasserdam-albsdm-nld-rws_hist.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/albany-8518995-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/alameda-9414750-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/aguadilla-9759412-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/adak_island-9461380-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/acapulcoapi_radar-41-mex-unam.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/acapulcoapi_nivelcbs-41-mex-unam.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/acajutla-082-slv-uhslc_fd.json | Adds a new station record (inputs to quality evaluation). |
| data/ticon/aberdeen-9441187-usa-noaa.json | Adds a new station record (inputs to quality evaluation). |
| .prettierignore | Ignores generated quality.json to avoid formatting churn. |
Comments suppressed due to low confidence (5)
data/ticon/arun_platform-arn-gbr-cco.json:27
- These datum values appear internally inconsistent: `MHW` is greater than `MHHW`, and `MLLW` is greater than `MLW`. If your quality pipeline rejects datum ordering violations (as described in the PR), this station will be auto-rejected. Suggest verifying the sourced datum values and correcting them or, if these values are intentionally preserved from upstream despite being invalid, updating the station metadata to clearly indicate the known datum ordering issue.
```json
"MHHW": 2.122,
"MHW": 2.124,
"MSL": 0,
"MTL": 0.055,
"MLW": -2.013,
"MLLW": -2.012,
```
data/ticon/anklam-9660001-deu-wsv.json:23
- `MHW` is greater than `MHHW`, which is a datum ordering violation. If the new evaluation pipeline hard-rejects datum ordering problems, this record will be rejected; please verify or correct the datum values, or flag this explicitly as a known upstream issue.
```json
"MHHW": 0.002,
"MHW": 0.003,
```
data/ticon/alliance-8761727-usa-noaa.json:23
- `MHHW` is lower than `MHW`, which is a datum ordering violation. This will likely trigger the pipeline's datum hard gate; please verify or correct the datum values (or annotate this as a known upstream issue if keeping the original values is intentional).
```json
"MHHW": -0.004,
"MHW": 0.005,
```
data/ticon/arun_platform-arn-gbr-cco.json:20
- This record's `datums` show clear ordering issues (e.g., `MHW > MHHW`, `MLLW > MLW`), so the disclaimer "No obvious issues" is misleading. Suggest updating the disclaimer to reflect the known issue if keeping the values as-is.
```json
"disclaimers": "No obvious issues",
```
data/ticon/american_falls_ny-9063010-usa-noaa.json:2
- The station name casing/abbreviation looks inconsistent with typical place naming (e.g., the state abbreviation). Consider normalizing to a consistent convention used elsewhere in the dataset (for example, "American Falls, NY" or "American Falls (NY)") to improve readability.
```json
"name": "American Falls Ny",
```
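The datum comments above hinge on which ordering violations the pipeline treats as hard gates. Below is a minimal TypeScript sketch of such a check. It assumes that the five conditions listed under the `datum` rejection reason (MHW ≤ MSL, MSL ≤ MLW, MLW < LAT, MHW < MLLW, HAT < MHHW) are the only hard gates, and that adjacent-pair inversions such as MHHW < MHW are only recorded as issues (the rejected `noaa/8762483` example lists "MLW (-0.04) < MLLW (-0.039)" as an issue while its `reason` is `constituents`). The names `checkDatums`, `HARD_RULES`, and `SOFT_RULES` are hypothetical and are not the actual code in `tools/evaluate-quality.ts`.

```typescript
// Datum names as they appear in the TICON station JSON records (values in metres).
type DatumName = "HAT" | "MHHW" | "MHW" | "MSL" | "MTL" | "MLW" | "MLLW" | "LAT";
type Datums = Partial<Record<DatumName, number>>;

// A rule describes a *violation*: it fires when `lhs op rhs` holds.
type Rule = [lhs: DatumName, op: "<" | "<=", rhs: DatumName];

// Hard gates, copied from the PR's `datum` rejection reason.
const HARD_RULES: Rule[] = [
  ["MHW", "<=", "MSL"],
  ["MSL", "<=", "MLW"],
  ["MLW", "<", "LAT"],
  ["MHW", "<", "MLLW"],
  ["HAT", "<", "MHHW"],
];

// Assumption: adjacent-pair inversions are reported as issues but do not reject.
const SOFT_RULES: Rule[] = [
  ["MHHW", "<", "MHW"],
  ["MLW", "<", "MLLW"],
];

// True if the violation condition holds; rules with a missing datum are skipped.
function holds(d: Datums, [lhs, op, rhs]: Rule): boolean {
  const a = d[lhs];
  const b = d[rhs];
  if (a === undefined || b === undefined) return false;
  return op === "<" ? a < b : a <= b;
}

// Formats an issue in the same style as quality.json, e.g. "MLW (-0.04) < MLLW (-0.039)".
function formatIssue(d: Datums, [lhs, op, rhs]: Rule): string {
  return `${lhs} (${d[lhs]}) ${op} ${rhs} (${d[rhs]})`;
}

// Hypothetical helper: rejects on any hard violation and lists every violation as an issue.
function checkDatums(d: Datums): { rejected: boolean; issues: string[] } {
  const hard = HARD_RULES.filter((rule) => holds(d, rule));
  const soft = SOFT_RULES.filter((rule) => holds(d, rule));
  return {
    rejected: hard.length > 0,
    issues: [...hard, ...soft].map((rule) => formatIssue(d, rule)),
  };
}

// The six arun_platform datums quoted in the first review comment.
const arun = checkDatums({ MHHW: 2.122, MHW: 2.124, MSL: 0, MTL: 0.055, MLW: -2.013, MLLW: -2.012 });
console.log(arun.rejected, arun.issues); // false, with two soft issues (MHHW < MHW, MLW < MLLW)
```

Under these assumptions, the `arun_platform` values would be flagged but not rejected; if the real pipeline also gates on adjacent pairs, the station would instead be rejected with reason `datum`, as the review comment expects.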
The previous approach filtered TICON stations during import using ad-hoc rules for proximity to other stations. This replaces that with a single, auditable quality evaluation pipeline that treats all stations uniformly.
Quality evaluation pipeline
`tools/evaluate-quality.ts` evaluates all stations and produces `quality.json`, which is checked in and used as the source of truth for filtering stations at runtime. The evaluation:
- … `noaa`/`rws_hist` suffixes in TICON), and missing or zero-amplitude essential constituents (M2, S2, K1, O1).
- … `0000-0000`) receive benefit-of-the-doubt scores; subordinate stations inherit epoch/recency from their reference; NOAA reference stations with subordinate dependents are protected from dedup loss.

Quality result
The evaluation produces `quality.json` (generated, checked in), which contains a record for each station with its `accepted` status, `score` (0–100), contributing `factors`, `issues` found, `reason` for rejection if applicable, and whether it was marked `redundant` in deduplication.

Rejection breakdown:
- `superseded` (630): TICON stations duplicating NOAA/RWS data we have from the primary source
- `duplicate` (955): deemed a duplicate of another station based on proximity and similarity, with the lower-scoring station rejected
- `datum` (421): definitional datum ordering violations (MHW ≤ MSL, MSL ≤ MLW, MLW < LAT, MHW < MLLW, HAT < MHHW)
- `range` (169): negligible tidal range (< 2 cm)
- `constituents` (40): missing essential constituents, zero-amplitude essentials, or P1 > K1 (physically impossible)

The `issues` array provides detailed flags for each station, which can be used for debugging or further analysis. Here is an example record for an accepted and a rejected station:
```json
[
  {
    "id": "noaa/8725649",
    "accepted": true,
    "score": 80,
    "factors": { "epoch": 1, "recency": 0.75, "source": 1, "quality": 1, "amplitude": 0.75, "coverage": 0.181 },
    "issues": ["K2 amplitude (0.0360) exceeds S2 (0.0200)"]
  },
  {
    "id": "noaa/8762483",
    "accepted": false,
    "score": 0,
    "factors": { "epoch": 1, "recency": 0.75, "source": 1, "quality": 1, "amplitude": 0, "coverage": 0 },
    "issues": [
      "MLW (-0.04) < MLLW (-0.039)",
      "Missing constituents for prediction: M2, S2, K1, O1"
    ],
    "reason": "constituents"
  }
]
```

Runtime filtering
`quality.json` is loaded at runtime. The npm module now exports both `stations` (filtered) and `allStations` (unfiltered):