
Added Germany PV & GFS data download pipelines#124

Draft
Sharkyii wants to merge 10 commits into openclimatefix:main from Sharkyii:data/Germany

Conversation

@Sharkyii

Description

This PR adds automated data download pipelines for Germany, including:

  • Solar PV generation data using the Bundesnetzagentur SMARD API (15-minute resolution)
  • GFS meteorological data download notebook for Germany using NOAA public datasets
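For context, the SMARD download step can be sketched roughly as below. This is a minimal stdlib-only sketch based on the public SMARD `chart_data` endpoint, not code from this PR; the filter ID 4068 (realised photovoltaic generation for Germany) and the `{"series": [[timestamp_ms, value], ...]}` payload shape are assumptions.

```python
import json
from urllib.request import urlopen

SMARD_BASE = "https://www.smard.de/app/chart_data"

def smard_url(filter_id: int, region: str, resolution: str, ts_ms: int) -> str:
    # One SMARD data block is addressed by filter, region, resolution,
    # and the block's starting timestamp in milliseconds.
    return f"{SMARD_BASE}/{filter_id}/{region}/{filter_id}_{region}_{resolution}_{ts_ms}.json"

def fetch_pv_block(ts_ms: int) -> list:
    # 4068 = realised photovoltaic generation for Germany (assumed filter ID).
    url = smard_url(4068, "DE", "quarterhour", ts_ms)
    with urlopen(url, timeout=30) as resp:  # network call; add retries in a real pipeline
        payload = json.load(resp)
    # Assumed payload shape: {"series": [[timestamp_ms, value], ...]}
    return payload["series"]
```

A full pipeline would iterate `fetch_pv_block` over the block timestamps returned by SMARD's index endpoint and concatenate the 15-minute series before writing it out.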

I will proceed with Step 5 (Model Training Pipeline Integration).

Relates #121

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@Sharkyii Sharkyii marked this pull request as ready for review January 22, 2026 17:44
@Sharkyii Sharkyii marked this pull request as draft January 22, 2026 17:44
@Sharkyii
Author

@peterdudfield So far I have downloaded one year of data and uploaded the generation data for Germany here:
Hugging Face: https://huggingface.co/datasets/Shark26/germany_pv_data
S3: s3://germany_pv_data
I will test the data soon by training the model.
Could you please take a look, @siddharth7113?

Removed hardcoded zarr paths for GSP and GFS data in Germany configuration.
@Sharkyii
Author

Sharkyii commented Feb 3, 2026

@peterdudfield @siddharth7113
I added scripts to download the data, but I am still not able to run save_samples.py and train the model because of this error:

python save_samples.py +datamodule.sample_output_dir="./samples" +datamodule.num_train_samples=10 +datamodule.num_val_samples=5
CONFIG
├── trainer
│ └── _target_: lightning.pytorch.trainer.trainer.Trainer
│ accelerator: auto
│ devices: auto
│ min_epochs: null
│ max_epochs: null
│ reload_dataloaders_every_n_epochs: 0
│ num_sanity_val_steps: 8
│ fast_dev_run: false
│ accumulate_grad_batches: 4
│ log_every_n_steps: 50

├── model
│ └── _target_: pvnet.models.multimodal.multimodal.Model
│ output_quantiles:
│ - 0.02
│ - 0.1
│ - 0.25
│ - 0.5
│ - 0.75
│ - 0.9
│ - 0.98
│ nwp_encoders_dict:
│ gfs:
│ _target_: pvnet.models.multimodal.encoders.encoders3d.ResConv3DNet2
│ _partial_: true
│ in_channels: 14
│ out_features: 32
│ n_res_blocks: 1
│ hidden_channels: 6
│ image_size_pixels: 2
│ output_network:
│ _target_: pvnet.models.multimodal.linear_networks.networks.ResFCNet2
│ _partial_: true
│ fc_hidden_features: 128
│ n_res_blocks: 6
│ res_block_layers: 2
│ dropout_frac: 0.0
│ embedding_dim: 16
│ include_sun: true
│ include_gsp_yield_history: false
│ include_site_yield_history: false
│ forecast_minutes: 480
│ history_minutes: 60
│ nwp_history_minutes:
│ gfs: 180
│ nwp_forecast_minutes:
│ gfs: 540
│ nwp_interval_minutes:
│ gfs: 180
│ optimizer:
│ _target_: pvnet.optimizers.EmbAdamWReduceLROnPlateau
│ lr: 0.0001
│ weight_decay: 0.01
│ amsgrad: true
│ patience: 5
│ factor: 0.1
│ threshold: 0.002

├── datamodule
│ └── _target_: pvnet.data.DataModule
│ configuration: C:\Users\SNEH\open-data-pvnet\src\open_data_pvnet\configs\PVNet_configs\datamodule\configuration\germany_configuration.yaml
│ num_workers: 8
│ prefetch_factor: 2
│ batch_size: 8
│ train_period:
│ - null
│ - '2023-06-30'
│ val_period:
│ - '2023-07-01'
│ - '2023-12-31'
│ sample_output_dir: ./samples
│ num_train_samples: 10
│ num_val_samples: 5

├── callbacks
│ └── early_stopping:
│ _target_: lightning.pytorch.callbacks.EarlyStopping
│ monitor: ${resolve_monitor_loss:${model.output_quantiles}}
│ mode: min
│ patience: 10
│ min_delta: 0
│ learning_rate_monitor:
│ _target_: lightning.pytorch.callbacks.LearningRateMonitor
│ logging_interval: epoch
│ model_summary:
│ _target_: lightning.pytorch.callbacks.ModelSummary
│ max_depth: 3
│ model_checkpoint:
│ _target_: lightning.pytorch.callbacks.ModelCheckpoint
│ monitor: ${resolve_monitor_loss:${model.output_quantiles}}
│ mode: min
│ save_top_k: 1
│ save_last: true
│ every_n_epochs: 1
│ verbose: false
│ filename: epoch={epoch}-step={step}
│ dirpath: PLACEHOLDER/${model_name}
│ auto_insert_metric_name: false
│ save_on_train_epoch_end: false

├── logger
│ └── wandb:
│ _target_: lightning.pytorch.loggers.wandb.WandbLogger
│ project: GFS_TEST_RUN
│ name: ${model_name}
│ save_dir: PLACEHOLDER
│ offline: false
│ id: ${oc.env:WANDB_RUN_ID}
│ log_model: true
│ prefix: ''
│ job_type: train
│ group: ''
│ tags: []

└── seed
└── 2727831
----- Saving val samples -----
Error executing job with overrides: ['+datamodule.sample_output_dir=./samples', '+datamodule.num_train_samples=10', '+datamodule.num_val_samples=5']
Traceback (most recent call last):
File "C:\Users\open-data-pvnet\src\open_data_pvnet\scripts\save_samples.py", line 171, in main
val_dataset = get_dataset(
^^^^^^^^^^^^
File "C:\Users\open-data-pvnet\src\open_data_pvnet\scripts\save_samples.py", line 106, in get_dataset
return dataset_cls(config_path, start_time=start_time, end_time=end_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\torch_datasets\datasets\pvnet_uk.py", line 254, in __init__
super().__init__(config_filename, start_time, end_time, gsp_ids)
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\torch_datasets\datasets\pvnet_uk.py", line 100, in __init__
datasets_dict = get_dataset_dict(config.input_data, gsp_ids=gsp_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\load\load_dataset.py", line 24, in get_dataset_dict
da_gsp = open_gsp(
^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\load\gsp.py", line 60, in open_gsp
raise ValueError(
ValueError: Some GSP IDs in the GSP generation data are not available in the locations file.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
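The ValueError above means the GSP generation data contains `gsp_id` values that are absent from the locations file. A quick way to see which IDs are the problem is to diff the two ID sets; this is a stdlib-only sketch, and the file names and the `gsp_id` column/coordinate shown in the comments are placeholders, not this repo's actual layout:

```python
def missing_gsp_ids(generation_ids, location_ids):
    """Return GSP IDs present in the generation data but absent from the locations file."""
    return sorted(set(generation_ids) - set(location_ids))

# In practice the two ID lists would come from the data files, e.g. (assumed paths/names):
#   import xarray as xr
#   gen_ids = xr.open_zarr("germany_pv_data.zarr")["gsp_id"].values
#   loc_ids = pd.read_csv("gsp_locations.csv")["gsp_id"]
# Then either drop the extra IDs from the generation data or add matching rows
# to the locations file before re-running save_samples.py.
```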

Updated zarr_path to an empty string for flexibility.
Refactor process_grib function to handle multiple levels and improve error handling.