
Added Germany PV & GFS data download pipelines#124

Draft
Sharkyii wants to merge 10 commits into openclimatefix:main from Sharkyii:data/Germany

Conversation

@Sharkyii

Description

This PR adds automated data download pipelines for Germany, including:

  • Solar PV generation data using the Bundesnetzagentur SMARD API (15-minute resolution)
  • GFS meteorological data download notebook for Germany using NOAA public datasets
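For context, the SMARD download step can be sketched roughly as below. This is a minimal stdlib-only sketch based on the public SMARD `chart_data` endpoint, not code from this PR; the filter ID 4068 (realised photovoltaic generation for Germany) and the `{"series": [[timestamp_ms, value], ...]}` payload shape are assumptions.

```python
import json
from urllib.request import urlopen

SMARD_BASE = "https://www.smard.de/app/chart_data"

def smard_url(filter_id: int, region: str, resolution: str, ts_ms: int) -> str:
    # One SMARD data block is addressed by filter, region, resolution,
    # and the block's starting timestamp in milliseconds.
    return f"{SMARD_BASE}/{filter_id}/{region}/{filter_id}_{region}_{resolution}_{ts_ms}.json"

def fetch_pv_block(ts_ms: int) -> list:
    # 4068 = realised photovoltaic generation for Germany (assumed filter ID).
    url = smard_url(4068, "DE", "quarterhour", ts_ms)
    with urlopen(url, timeout=30) as resp:  # network call; add retries in a real pipeline
        payload = json.load(resp)
    # Assumed payload shape: {"series": [[timestamp_ms, value], ...]}
    return payload["series"]
```

A full pipeline would iterate `fetch_pv_block` over the block timestamps returned by SMARD's index endpoint and concatenate the 15-minute series before writing it out.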

I will proceed with Step 5 (Model Training Pipeline Integration).

Relates #121

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@Sharkyii Sharkyii marked this pull request as ready for review January 22, 2026 17:44
@Sharkyii Sharkyii marked this pull request as draft January 22, 2026 17:44
@Sharkyii
Author

@peterdudfield So far I have downloaded one year of data and uploaded the generation data for Germany here:
Hugging Face: https://huggingface.co/datasets/Shark26/germany_pv_data
S3: s3://germany_pv_data
I will test the data soon by training the model.
Could you please take a look, @siddharth7113?

Removed hardcoded zarr paths for GSP and GFS data in Germany configuration.
@Sharkyii
Author

Sharkyii commented Feb 3, 2026

@peterdudfield @siddharth7113
I added scripts to download the data, but I am still not able to run save_samples.py and train the model because of this error:

python save_samples.py +datamodule.sample_output_dir="./samples" +datamodule.num_train_samples=10 +datamodule.num_val_samples=5
CONFIG
├── trainer
│ └── _target_: lightning.pytorch.trainer.trainer.Trainer
│ accelerator: auto
│ devices: auto
│ min_epochs: null
│ max_epochs: null
│ reload_dataloaders_every_n_epochs: 0
│ num_sanity_val_steps: 8
│ fast_dev_run: false
│ accumulate_grad_batches: 4
│ log_every_n_steps: 50

├── model
│ └── _target_: pvnet.models.multimodal.multimodal.Model
│ output_quantiles:
│ - 0.02
│ - 0.1
│ - 0.25
│ - 0.5
│ - 0.75
│ - 0.9
│ - 0.98
│ nwp_encoders_dict:
│ gfs:
│ _target_: pvnet.models.multimodal.encoders.encoders3d.ResConv3DNet2
│ _partial_: true
│ in_channels: 14
│ out_features: 32
│ n_res_blocks: 1
│ hidden_channels: 6
│ image_size_pixels: 2
│ output_network:
│ _target_: pvnet.models.multimodal.linear_networks.networks.ResFCNet2
│ _partial_: true
│ fc_hidden_features: 128
│ n_res_blocks: 6
│ res_block_layers: 2
│ dropout_frac: 0.0
│ embedding_dim: 16
│ include_sun: true
│ include_gsp_yield_history: false
│ include_site_yield_history: false
│ forecast_minutes: 480
│ history_minutes: 60
│ nwp_history_minutes:
│ gfs: 180
│ nwp_forecast_minutes:
│ gfs: 540
│ nwp_interval_minutes:
│ gfs: 180
│ optimizer:
│ _target_: pvnet.optimizers.EmbAdamWReduceLROnPlateau
│ lr: 0.0001
│ weight_decay: 0.01
│ amsgrad: true
│ patience: 5
│ factor: 0.1
│ threshold: 0.002

├── datamodule
│ └── _target_: pvnet.data.DataModule
│ configuration: C:\Users\SNEH\open-data-pvnet\src\open_data_pvnet\configs\PVNet_configs\datamodule\configuration\germany_configuration.yaml
│ num_workers: 8
│ prefetch_factor: 2
│ batch_size: 8
│ train_period:
│ - null
│ - '2023-06-30'
│ val_period:
│ - '2023-07-01'
│ - '2023-12-31'
│ sample_output_dir: ./samples
│ num_train_samples: 10
│ num_val_samples: 5

├── callbacks
│ └── early_stopping:
│ _target_: lightning.pytorch.callbacks.EarlyStopping
│ monitor: ${resolve_monitor_loss:${model.output_quantiles}}
│ mode: min
│ patience: 10
│ min_delta: 0
│ learning_rate_monitor:
│ _target_: lightning.pytorch.callbacks.LearningRateMonitor
│ logging_interval: epoch
│ model_summary:
│ _target_: lightning.pytorch.callbacks.ModelSummary
│ max_depth: 3
│ model_checkpoint:
│ _target_: lightning.pytorch.callbacks.ModelCheckpoint
│ monitor: ${resolve_monitor_loss:${model.output_quantiles}}
│ mode: min
│ save_top_k: 1
│ save_last: true
│ every_n_epochs: 1
│ verbose: false
│ filename: epoch={epoch}-step={step}
│ dirpath: PLACEHOLDER/${model_name}
│ auto_insert_metric_name: false
│ save_on_train_epoch_end: false

├── logger
│ └── wandb:
│ _target_: lightning.pytorch.loggers.wandb.WandbLogger
│ project: GFS_TEST_RUN
│ name: ${model_name}
│ save_dir: PLACEHOLDER
│ offline: false
│ id: ${oc.env:WANDB_RUN_ID}
│ log_model: true
│ prefix: ''
│ job_type: train
│ group: ''
│ tags: []

└── seed
└── 2727831
----- Saving val samples -----
Error executing job with overrides: ['+datamodule.sample_output_dir=./samples', '+datamodule.num_train_samples=10', '+datamodule.num_val_samples=5']
Traceback (most recent call last):
File "C:\Users\open-data-pvnet\src\open_data_pvnet\scripts\save_samples.py", line 171, in main
val_dataset = get_dataset(
^^^^^^^^^^^^
File "C:\Users\open-data-pvnet\src\open_data_pvnet\scripts\save_samples.py", line 106, in get_dataset
return dataset_cls(config_path, start_time=start_time, end_time=end_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\torch_datasets\datasets\pvnet_uk.py", line 254, in __init__
super().__init__(config_filename, start_time, end_time, gsp_ids)
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\torch_datasets\datasets\pvnet_uk.py", line 100, in __init__
datasets_dict = get_dataset_dict(config.input_data, gsp_ids=gsp_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\load\load_dataset.py", line 24, in get_dataset_dict
da_gsp = open_gsp(
^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\load\gsp.py", line 60, in open_gsp
raise ValueError(
ValueError: Some GSP IDs in the GSP generation data are not available in the locations file.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
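The ValueError above means the GSP generation data contains `gsp_id` values that are absent from the locations file. A quick way to see which IDs are the problem is to diff the two ID sets; this is a stdlib-only sketch, and the file names and the `gsp_id` column/coordinate shown in the comments are placeholders, not this repo's actual layout:

```python
def missing_gsp_ids(generation_ids, location_ids):
    """Return GSP IDs present in the generation data but absent from the locations file."""
    return sorted(set(generation_ids) - set(location_ids))

# In practice the two ID lists would come from the data files, e.g. (assumed paths/names):
#   import xarray as xr
#   gen_ids = xr.open_zarr("germany_pv_data.zarr")["gsp_id"].values
#   loc_ids = pd.read_csv("gsp_locations.csv")["gsp_id"]
# Then either drop the extra IDs from the generation data or add matching rows
# to the locations file before re-running save_samples.py.
```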

Updated zarr_path to an empty string for flexibility.
Refactor process_grib function to handle multiple levels and improve error handling.