feat: investigate using happyGISCO for improved NUTS estimation #45

@bk86a

Context

Eurostat maintains happyGISCO, a Python client for GISCO web services. It provides two relevant capabilities:

  • place2coord() — geocodes a place name to coordinates
  • coord2nuts() — returns the NUTS region for given coordinates

This project's API works with country + postal code only (no coordinates). happyGISCO could bridge that gap: take a postal code that is missing from TERCET, geocode it to coordinates, and then look up the NUTS region, all via Eurostat's own services.

Potential improvements

1. Fallback for unknown postal codes

When the API receives a postal code not found in TERCET or the estimates table, it currently returns 404. With happyGISCO, it could instead:

  1. Geocode the postal code + country via place2coord()
  2. Pass the resulting coordinates to coord2nuts()
  3. Return the NUTS region as an approximate match

This would reduce 404s without requiring the postal code to be pre-registered in any data file.
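The three steps above could be wired together roughly as follows. This is a minimal sketch, not the project's code: `lookup_nuts_with_fallback`, `known_codes`, `geocode` and `nuts_for_coord` are hypothetical names, and the two callables stand in for thin wrappers around place2coord() and coord2nuts(), whose exact signatures and return types still need to be confirmed against happyGISCO.

```python
from typing import Callable, Optional, Tuple

Coord = Tuple[float, float]  # assumed (latitude, longitude) order


def lookup_nuts_with_fallback(
    country: str,
    postal_code: str,
    known_codes: dict,
    geocode: Callable[[str, str], Optional[Coord]],
    nuts_for_coord: Callable[[Coord], Optional[str]],
) -> Optional[dict]:
    """Return a NUTS match, falling back to geocoding for unknown codes."""
    country = country.upper()
    postal_code = postal_code.strip()

    # Existing behaviour: answer from TERCET / estimates data if present.
    nuts3 = known_codes.get((country, postal_code))
    if nuts3 is not None:
        return {"nuts3": nuts3, "match": "exact"}

    # Step 1: geocode postal code + country (hypothetical place2coord() wrapper).
    coord = geocode(postal_code, country)
    if coord is None:
        return None  # caller keeps returning 404

    # Step 2: resolve coordinates to a region (hypothetical coord2nuts() wrapper).
    nuts3 = nuts_for_coord(coord)
    if nuts3 is None:
        return None

    # Step 3: mark the result so clients can tell it is not a TERCET match.
    return {"nuts3": nuts3, "match": "approximate"}
```

Passing the two lookups in as callables keeps the fallback testable without network access and leaves open whether they end up backed by happyGISCO or by direct GISCO API calls.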

2. Better NUTS estimation in the monitor

The postal code monitor currently estimates NUTS for missing codes by querying neighboring postal codes (±1, ±2, etc.) — a rough heuristic. Instead, it could geocode the postal code via place2coord() and then use coord2nuts() for a more authoritative NUTS lookup, without relying on Nominatim coordinates.
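For comparison, the neighbor heuristic might look something like the sketch below. The monitor's real implementation is not shown in this issue, so the function name, the `known` mapping (postal code to NUTS code for a single country) and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def estimate_from_neighbours(
    postal_code: str, known: dict, max_offset: int = 2
) -> Optional[str]:
    """Guess a NUTS code from numerically adjacent postal codes."""
    if not postal_code.isdigit():
        return None  # numeric offsets are meaningless for alphanumeric codes

    width = len(postal_code)
    base = int(postal_code)
    for offset in range(1, max_offset + 1):
        votes = Counter()
        # Check the lower neighbour first, then the upper one.
        for candidate in (base - offset, base + offset):
            if candidate < 0:
                continue
            nuts = known.get(str(candidate).zfill(width))
            if nuts is not None:
                votes[nuts] += 1
        if votes:
            # The nearest offset with any hit wins; on a tie the lower
            # neighbour wins because it was counted first.
            return votes.most_common(1)[0][0]
    return None
```

The weakness is visible in the code: it assumes numeric adjacency implies geographic adjacency, which does not hold near region boundaries. A coordinate-based lookup avoids that assumption.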

3. Validation of existing estimates

place2coord() + coord2nuts() could cross-validate entries in tercet_missing_codes.csv — geocode each postal code via GISCO's own geocoder, look up the NUTS region, and flag any where the estimated NUTS3 disagrees.
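A rough sketch of such a validation pass follows. The column names (`country`, `postal_code`, `nuts3`) are assumed and may not match the actual layout of tercet_missing_codes.csv, and `resolve_nuts3` is a hypothetical callable that would chain place2coord() and coord2nuts().

```python
import csv
import io
from typing import Callable, Iterable, Iterator, Optional


def load_rows(csv_text: str) -> list:
    """Parse CSV text into dict rows (column names are assumed)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def find_disagreements(
    rows: Iterable[dict],
    resolve_nuts3: Callable[[str, str], Optional[str]],
) -> Iterator[dict]:
    """Yield rows whose estimated NUTS3 differs from the resolved one."""
    for row in rows:
        resolved = resolve_nuts3(row["country"], row["postal_code"])
        if resolved is None:
            continue  # geocoder had no answer, so there is nothing to compare
        if resolved != row["nuts3"]:
            # Keep the original row and attach the conflicting value.
            yield {**row, "gisco_nuts3": resolved}
```

Rows the geocoder cannot resolve are skipped rather than flagged, so a flagged row always means two concrete answers disagree. Whether unresolved rows deserve their own report is an open design choice.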

Questions to investigate

  • Does place2coord() reliably resolve European postal codes (e.g. place2coord("1010", country="AT"))?
  • Is GISCO's find-nuts service rate-limited? What throughput can we expect for batch validation?
  • How does it handle edge cases (coastal areas, border regions, overseas territories)?
  • Would adding happyGISCO as a dependency be appropriate, or should we call the GISCO API directly?
  • Is the licensing compatible? (happyGISCO is EUPL-licensed, the same as this project.)
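To put numbers on the throughput question, a small timing harness along these lines could be pointed at whichever lookup is chosen. `measure_throughput` and its parameters are hypothetical names; `call` would be a wrapper around the service request, and the injectable `clock` exists only so the harness can be tested without real delays.

```python
import time
from typing import Any, Callable, Iterable


def measure_throughput(
    call: Callable[[Any], Any],
    inputs: Iterable[Any],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Run `call` over `inputs` sequentially and report basic timing."""
    ok = failed = 0
    start = clock()
    for item in inputs:
        try:
            call(item)
            ok += 1
        except Exception:
            # Count failures (timeouts, HTTP errors) instead of aborting the run.
            failed += 1
    elapsed = clock() - start
    total = ok + failed
    return {
        "ok": ok,
        "failed": failed,
        "seconds": elapsed,
        "per_second": total / elapsed if elapsed > 0 else None,
    }
```

A sudden rise in `failed` partway through a run would be one hint that rate limiting is in effect, though the service's response codes would need to be inspected to confirm it.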


Labels: enhancement
