hamsci_LSTID_detection

Automated detection of Large Scale Traveling Ionospheric Disturbances (LSTIDs) from amateur radio spot data (RBN, PSKReporter, WSPRNet) stored in Madrigal HDF5 format.

Developed by the HamSCI NASA Space Weather Operations to Research (SWO2R) Team:

Nathaniel Frissell W2NAF
Nicholas Callahan
Diego Sanchez KD2RLM
Bill Engelke AB4EJ
Mary Lou West KC2NMC

Requirements

Python 3.11+ on Linux (tested on Ubuntu 22.04 / WSL2).

cartopy==0.24.1
dask==2024.8.2
h5py==3.11.0
matplotlib==3.10.8
numpy==1.26.4
pandas==2.2.2
pillow==12.1.1
polars==1.35.1
pyarrow==16.1.0
pysolar==0.13
scipy==1.13.1
statsmodels==0.14.2

Data

Input data is Madrigal daily HDF5 files named rsd{YYYY-MM-DD}.01.hdf5, downloaded from the Cedar Madrigal database (instrument 8308).

A download script is provided in download_madrigal/:

Install the Madrigal web API client:
```
pip install madrigalWeb
```
Edit download_madrigal/download_madrigal_daily_hdf5.sh — set startDate, endDate, and your user_email / user_fullname / user_affiliation.

Run:

chmod +x download_madrigal/download_madrigal_daily_hdf5.sh
./download_madrigal/download_madrigal_daily_hdf5.sh

Files will be downloaded to data/madrigal/.

Instructions

Clone the repository and install:
```
pip install -e .
```
Place Madrigal HDF5 data files into the directory specified by data_dir in your config.
- Files must be named rsd{YYYY-MM-DD}.01.hdf5.
Create a config file in the config/ directory. Each config file defines a complete experiment (date range, region, frequency band, data paths, and all algorithm parameters). Use config/config_test.json as a starting template.

Run the pipeline, passing your config:

python run_LSTID_detection.py -p config/config_test.json

The config/ directory is intended to hold one JSON file per experiment or parameter set, making runs fully reproducible and easy to track. Key config options: data_dir, cache_dir, region_name (see scripts/regions.py), freq (MHz), distance_range, use_cache, n_workers.

Algorithm Description

1. Data Loading (`scripts/hdf5_loader.py`)

Reads Madrigal HDF5 files in chunks, applies geographic, frequency, and distance filters using Polars, and bins spots into a 2D histogram (10 km × 1 min bins). Results are cached as Parquet files.

2. Preprocessing (`scripts/heatmap_preprocess.py`)

Trims to daylight hours, applies column-wise MAD normalisation, a 2D Gaussian filter (σ = 4.2), and rescales to uint8.

3. Edge Detection (`scripts/edge_detect.py`)

Builds quantile threshold stacks (q = 0.4, 0.5, 0.6), smooths with LOWESS, removes outliers by absolute deviation, and selects the most stable edge.

4. Sinusoidal Fitting (`scripts/sinusoid_fitting.py`)

Computes a 15-minute rolling coefficient of variation (CV) on the detected edge.
Selects the largest contiguous stable region (CV < 0.05) between 1300–2300 UTC.
Fits a 2nd-degree polynomial to detrend the edge.
Applies an optional 1–4.5 hr Butterworth bandpass filter.
Runs curve_fit for period guesses T = [1, 1.5, 2, 2.5, 3, 3.5, 4] hr; best R² selected.

Processing is multiprocessed — each day runs as an independent worker via ProcessPoolExecutor.

Plotting

Plotting is handled by modules in scripts/. The general structure has two layers:

Panel functions (plot_subplots.py) — individual plot panels that draw onto axes passed in explicitly. These are the building blocks.
Composite plot functions (plot_lstid_paper.py, dfs_thesis.py) — assemble one or more panels into a full figure and save it to disk. Each function takes fit_result, optionally df, and **plot_params (unpacked from the plotting block in your config).

To enable or add a plot, edit the plotting block inside process_one_day() in run_LSTID_detection.py, immediately after the sin_fit() call:

fit_result = sin_fit(edge_result, **fit_params)

# ← add or uncomment plot calls here
plot_all_panels_individual(fit_result, df, **plot_params)
thesis_plot_all_panels_full(fit_result, df, **plot_params)

Any composite function imported from the plotting modules can be dropped in at this point. The **plot_params dict supplies output_dir, ylim, and cb_pad from the config automatically.

Synthetic Data Generation

Generates Madrigal-format HDF5 files compatible with the pipeline for ML training. Dates use 2030+ to avoid collision with real data.

# Generate synthetic HDF5 files
python synthetic_data_gen/gen_synthetic_hdf5.py --n_days 200 --output_dir data/synthetic_hdf5

# Build unified ML training manifest (synthetic + real pipeline output)
python synthetic_data_gen/build_ml_manifest.py \
    --synthetic_manifest data/synthetic_hdf5/manifest.csv \
    --real_csv_dir output/summary_csv \
    --real_hdf5_dir data/madrigal \
    --output data/ml_manifest.csv --skip_missing_hdf5

Each synthetic day embeds a sinusoidal LSTID of randomised amplitude (0–100 km) and period (1–4.5 hr) on a diurnal quadratic baseline. 20% of days are flat (A = 0), 20% are high-noise, and 30% contain random data gaps. The manifest records T_hr and amplitude_km as regression targets alongside a curriculum_phase label (1 = clear, 2 = ambiguous). Use --phase1_max_amp and --phase1_min_amp to control the phase boundary.

Tests

python -m pytest tests/ -v

Acknowledgments

This work was supported by NASA Grants 80NSSC21K1772, 80NSSC23K0848 and United States National Science Foundation (NSF) Grant AGS-2045755.

Name		Name	Last commit message	Last commit date
Latest commit History 224 Commits
config		config
download_madrigal		download_madrigal
scripts		scripts
synthetic_data_gen		synthetic_data_gen
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
run_LSTID_detection.py		run_LSTID_detection.py
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

hamsci_LSTID_detection

Requirements

Data

Instructions

Algorithm Description

1. Data Loading (`scripts/hdf5_loader.py`)

2. Preprocessing (`scripts/heatmap_preprocess.py`)

3. Edge Detection (`scripts/edge_detect.py`)

4. Sinusoidal Fitting (`scripts/sinusoid_fitting.py`)

Plotting

Synthetic Data Generation

Tests

Acknowledgments

About

Uh oh!

Releases 2

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

hamsci_LSTID_detection

Requirements

Data

Instructions

Algorithm Description

1. Data Loading (scripts/hdf5_loader.py)

2. Preprocessing (scripts/heatmap_preprocess.py)

3. Edge Detection (scripts/edge_detect.py)

4. Sinusoidal Fitting (scripts/sinusoid_fitting.py)

Plotting

Synthetic Data Generation

Tests

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

1. Data Loading (`scripts/hdf5_loader.py`)

2. Preprocessing (`scripts/heatmap_preprocess.py`)

3. Edge Detection (`scripts/edge_detect.py`)

4. Sinusoidal Fitting (`scripts/sinusoid_fitting.py`)

Packages