Fix/gfs preprocessing #83

Open
AoufNihed wants to merge 5 commits into openclimatefix:main from AoufNihed:fix/gfs-preprocessing

@AoufNihed AoufNihed commented Apr 1, 2025

GFS Preprocessing Fix: Longitude Range and Dimension Naming

Problem

The GFS data preprocessing had several issues:

  1. Longitude range mismatch (-180° to 180° vs 0° to 360°)
  2. Dimension naming conflict between 'variable' and 'channel'
  3. Inefficient chunking strategy affecting performance

Solution

Implemented fixes in gfs_preprocessing.py and gfs_dataset.py:

  1. Longitude Range Standardization:

    • Convert all longitudes to [0°, 360°) range
    • Added sorting to ensure consistent ordering
    • Updated UK region selection to handle wrap-around (350° to 2°)
  2. Dimension Structure:

    • Standardized on 'channel' dimension name
    • Properly stacked variables into channel dimension
    • Added validation checks for required dimensions
  3. Performance Optimization:

    • Implemented efficient chunking strategy
    • Set optimal chunk sizes for different dimensions
    • Added cleanup of existing chunk encoding
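The longitude and dimension changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the data is an `xarray.Dataset` with a `longitude` coordinate, and the helper names `normalise_longitude`, `select_lon_range` and `stack_channels` are hypothetical.

```python
import numpy as np
import xarray as xr


def normalise_longitude(ds: xr.Dataset, lon_name: str = "longitude") -> xr.Dataset:
    """Map longitudes onto [0, 360) and sort them in ascending order."""
    wrapped = ds[lon_name] % 360  # e.g. -10 becomes 350, 5 stays 5
    return ds.assign_coords({lon_name: wrapped}).sortby(lon_name)


def select_lon_range(
    ds: xr.Dataset,
    lon_min: float = 350.0,  # assumed UK bounds, taken from the description above
    lon_max: float = 2.0,
    lon_name: str = "longitude",
) -> xr.Dataset:
    """Select a longitude band that may cross the 0/360 seam."""
    lon = ds[lon_name].values
    if lon_min <= lon_max:
        mask = (lon >= lon_min) & (lon <= lon_max)
    else:
        # The band wraps around, so keep both the high end and the low end.
        mask = (lon >= lon_min) | (lon <= lon_max)
    return ds.isel({lon_name: np.flatnonzero(mask)})


def stack_channels(ds: xr.Dataset) -> xr.DataArray:
    """Stack all data variables along a leading 'channel' dimension."""
    return ds.to_array(dim="channel")
```

Note that after sorting, a wrap-around selection returns the low longitudes (near 0°) before the high ones (near 350°), so the result is ordered by coordinate value rather than west to east.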

Testing

The changes can be verified by:

  1. Loading preprocessed data and checking longitude range
  2. Confirming dimension names and structure
  3. Verifying UK region selection
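A check along these lines could cover the first two steps. It is only a sketch under assumptions: `verify_gfs` is a hypothetical name (the verification function added in this PR may look different), and the coordinate and dimension names are taken from the description above.

```python
import numpy as np
import xarray as xr


def verify_gfs(data, lon_name: str = "longitude", channel_dim: str = "channel") -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if channel_dim not in data.dims:
        problems.append(f"missing '{channel_dim}' dimension")
    if "variable" in data.dims:
        problems.append("legacy 'variable' dimension still present")
    if lon_name not in data.coords:
        problems.append(f"missing '{lon_name}' coordinate")
        return problems
    lon = np.asarray(data[lon_name].values, dtype=float)
    if lon.min() < 0 or lon.max() >= 360:
        problems.append("longitudes fall outside [0, 360)")
    if np.any(np.diff(lon) <= 0):
        problems.append("longitudes are not strictly increasing")
    return problems
```

Returning a list of messages rather than raising on the first failure makes it possible to see every problem in a preprocessed file in one pass.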

- Add dedicated GFS processing module
- Fix longitude range to [0, 360)
- Use 'channel' dimension consistently
- Add verification function
- Update CLI and documentation

@AoufNihed AoufNihed (Author) commented

Hey @peterdudfield
GFS Preprocessing: Fix Longitude Range and Dimension Structure

  • Fixed longitude range to [0°, 360°) for consistent UK region selection
  • Standardized dimension naming to use 'channel' instead of 'variable'
  • Optimized chunking for better performance
  • Added data validation checks

Tested with sample GFS data and verified UK region selection works correctly.
