Merged
80 changes: 80 additions & 0 deletions .github/workflows/ci-backend.yml
@@ -0,0 +1,80 @@
# CI for this repository (Rust + optional WASM UI build).
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  rust:
    runs-on: ubuntu-latest

    # Set the working-directory for all steps to the repo root
    defaults:
      run:
        working-directory: .

    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Set up Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: stable
          components: clippy, rustfmt # Install clippy (lint) and rustfmt (format)

      - name: Cache Cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            target/
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

      - name: Check Formatting (cargo fmt)
        run: cargo fmt -- --check

      - name: Run Lint (cargo clippy)
        run: cargo clippy --all-targets --all-features

      - name: Run Build (cargo build)
        run: cargo build --all-targets --all-features --verbose

      - name: Run Tests (cargo test)
        run: cargo test --all-features --verbose

  ui:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Set up Rust (stable + wasm target)
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: stable
          targets: wasm32-unknown-unknown

      - name: Cache Cargo
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            target/
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

      - name: Install trunk
        run: cargo install trunk --locked

      - name: Build Yew UI
        run: trunk build --release
        working-directory: inference-ui
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,6 +1,11 @@
/target
data/
malaria-model/
.idea/
/data/Parasitized
/data/plasmodium-phonecamera
/data/tuberculosis-phonecamera
/plasmodium-images
/mpidb_crops/
/data/Uninfected
inference-ui/target/
inference-ui/dist/
49 changes: 25 additions & 24 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -1,5 +1,5 @@
[package]
-name = "Burn_model"
+name = "malaria_model"
version = "0.1.0"
edition = "2021"

@@ -14,6 +14,7 @@ burn-train = "0.19.0"
image = "0.25.9"
rand = "0.9.2"
serde = { version = "1.0", features = ["derive"] }
csv = "1.3"
anyhow = "1.0"
rayon = "1.10.0"

151 changes: 151 additions & 0 deletions DEV_GUIDE.md
@@ -0,0 +1,151 @@
# Developer Guide

## Crop Generation Strategy (MP-IDB + Uninfected)

This project trains on **image crops** rather than full microscopy frames.
The crops are generated by the `mpidb_prep` utility:

```bash
cargo run --bin mpidb_prep -- <data_root> <out_root> <crop_size> <min_mask_area>
```

Defaults:

- `data_root`: `data`
- `out_root`: `mpidb_crops`
- `crop_size`: `128`
- `min_mask_area`: `25`
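
As a rough illustration of how the four positional arguments could fall back to these defaults, here is a minimal sketch. The `PrepArgs` struct and `parse_args` function are hypothetical names chosen for this example; the actual parsing in `mpidb_prep` may differ.

```rust
/// Hypothetical holder for the four documented positional arguments.
#[derive(Debug, PartialEq)]
struct PrepArgs {
    data_root: String,
    out_root: String,
    crop_size: u32,
    min_mask_area: usize,
}

/// Parse positional arguments (program name already stripped).
/// Any missing trailing argument falls back to the documented default.
fn parse_args(args: &[String]) -> Result<PrepArgs, std::num::ParseIntError> {
    // Return the i-th argument, or the default when it was not supplied.
    let get = |i: usize, default: &str| {
        args.get(i).cloned().unwrap_or_else(|| default.to_string())
    };
    Ok(PrepArgs {
        data_root: get(0, "data"),
        out_root: get(1, "mpidb_crops"),
        crop_size: get(2, "128").parse()?,
        min_mask_area: get(3, "25").parse()?,
    })
}
```

In a binary this could be called as `parse_args(&std::env::args().skip(1).collect::<Vec<_>>())`.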

The output is:

- A directory of saved crop images (`.png`)
- A `manifest.csv` describing labels for each crop

## Data Sources

### Infected (MP-IDB)
For each malaria species directory under `data/` (e.g. `data/Falciparum`, `data/Malariae`, ...), the tool expects:

- `data/<Species>/img/` : RGB microscopy images
- `data/<Species>/gt/` : corresponding binary masks (ground truth)

The tool matches files by **exact filename** (e.g. `gt/XYZ.jpg` corresponds to `img/XYZ.jpg`).

### Uninfected (negative samples)
The tool also reads:

- `data/Uninfected/`

These images are used as **negative examples** (uninfected / no malaria).

## Key Design Choice

### Why crops?
The MP-IDB infected dataset provides **segmentation masks** that localize parasites. Using these masks lets us generate **parasite-centered crops**.
This increases the signal-to-noise ratio during training compared to training on full images.

### Weak stage labels
MP-IDB stage labels are **image-level**: they are inferred from filename tokens (`R`, `T`, `S`, `G`), so every crop inherits the stages of its source image.
That makes stage labels *weak* with respect to individual parasite crops.
To keep them usable, we train stage prediction as a **multi-label presence probability per crop** and treat it explicitly as weak supervision.

## Infected Crop Algorithm (from masks)

For each infected `gt` mask image:

1. **Load image + mask**
   - Image is read as RGB (`RgbImage`)
   - Mask is read as grayscale (`GrayImage`)

2. **Connected components**
   - We scan the mask and run a BFS connected-components search using **4-neighborhood** (`left/right/up/down`).
   - Each connected component is assumed to correspond to one parasite region.

3. **Filter tiny components**
   - Components with `area < min_mask_area` are discarded.
   - This removes small artifacts/noise in masks.

4. **Bounding box extraction**
   - For each component we compute its bounding box `(min_x, min_y, max_x, max_y)`.

5. **Context padding**
   - We expand the bounding box by a fraction of its size.
   - Current padding fraction is fixed in code:
     - `pad_frac = 0.25` (25% padding)

6. **Square padding**
   - The padded crop rectangle is converted to a **square** by taking:
     - `side = max(crop_w, crop_h)`
   - The original rectangular crop is centered into the square canvas.

7. **Resize to training size**
   - Final crop is resized to `crop_size x crop_size` (default `128x128`) using `FilterType::Triangle`.

8. **Save crop**
   - Output path:
     - `mpidb_crops/<Species>/<source_image_id>_<component_index>.png`
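
Steps 2 to 4 can be sketched with the standard library alone. This is an illustrative version, not the code in `mpidb_prep.rs`: the `Component` struct and `find_components` function are hypothetical names, the mask is assumed to be a row-major byte buffer, and any non-zero pixel is assumed to count as foreground.

```rust
use std::collections::VecDeque;

/// Bounding box (inclusive) and pixel count of one connected component.
#[derive(Debug, PartialEq)]
struct Component {
    min_x: usize,
    min_y: usize,
    max_x: usize,
    max_y: usize,
    area: usize,
}

/// Find 4-connected foreground regions in a row-major mask and keep those
/// with `area >= min_mask_area`. Assumption: non-zero bytes are foreground.
fn find_components(
    mask: &[u8],
    width: usize,
    height: usize,
    min_mask_area: usize,
) -> Vec<Component> {
    assert_eq!(mask.len(), width * height);
    let mut visited = vec![false; mask.len()];
    let mut out = Vec::new();
    for start in 0..mask.len() {
        if mask[start] == 0 || visited[start] {
            continue;
        }
        visited[start] = true;
        let mut queue = VecDeque::from([start]);
        let (sx, sy) = (start % width, start / width);
        let mut c = Component { min_x: sx, min_y: sy, max_x: sx, max_y: sy, area: 0 };
        // Breadth-first search over the region containing `start`.
        while let Some(idx) = queue.pop_front() {
            let (x, y) = (idx % width, idx / width);
            c.area += 1;
            c.min_x = c.min_x.min(x);
            c.min_y = c.min_y.min(y);
            c.max_x = c.max_x.max(x);
            c.max_y = c.max_y.max(y);
            // Left/right/up/down only: diagonal pixels are not neighbours.
            let mut neighbours = Vec::with_capacity(4);
            if x > 0 { neighbours.push(idx - 1); }
            if x + 1 < width { neighbours.push(idx + 1); }
            if y > 0 { neighbours.push(idx - width); }
            if y + 1 < height { neighbours.push(idx + width); }
            for n in neighbours {
                if mask[n] != 0 && !visited[n] {
                    visited[n] = true;
                    queue.push_back(n);
                }
            }
        }
        // Step 3: drop components smaller than the area threshold.
        if c.area >= min_mask_area {
            out.push(c);
        }
    }
    out
}
```

Because only the 4-neighborhood is used, two regions that touch only at a corner are reported as separate components, which can yield more (and smaller) crops than an 8-neighborhood search would.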

## Uninfected Crop Algorithm

Uninfected images do not have masks, so we generate a single crop per image:

1. **Load image** as RGB
2. **Center square crop**
   - Take the largest centered square from the image (uses the smaller of width/height).
3. **Resize to `crop_size`** (default `128x128`)
4. **Save crop**
   - Output path:
     - `mpidb_crops/Uninfected/<source_image_id>_0.png`
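
The geometry of the center square crop in step 2 is small enough to show directly. The function name `center_square` is hypothetical and only illustrates the arithmetic; it does not touch image data.

```rust
/// Top-left corner `(x, y)` and side length of the largest centered square
/// inside a `width x height` image.
fn center_square(width: u32, height: u32) -> (u32, u32, u32) {
    // The square side is the smaller of the two dimensions.
    let side = width.min(height);
    // Split the leftover pixels evenly on both sides of the longer axis.
    ((width - side) / 2, (height - side) / 2, side)
}
```

With the `image` crate, the resulting `(x, y, side)` could be passed to `image::imageops::crop_imm` and the result resized with `image::imageops::resize`; whether the tool uses exactly these calls is not confirmed here.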

## Stage Label Inference

Stages are inferred from tokens in the `source_image_id` (filename stem).
A stage flag is set to `1` if its token is present:

- `R` => ring
- `T` => trophozoite
- `S` => schizont
- `G` => gametocyte

Tokenization splits on `-`, `_`, and spaces.

Important: this is **not** a per-parasite stage ground truth.
It is treated as **multi-label “presence” supervision**.
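
A minimal sketch of this token-based inference follows. `StageFlags` and `infer_stages` are hypothetical names, and the sketch assumes tokens must match the uppercase letter exactly (so a combined token such as `RT` would not set any flag); the real tool may be more lenient.

```rust
/// Multi-label stage flags for one source image (hypothetical struct).
#[derive(Debug, Default, PartialEq)]
struct StageFlags {
    r: bool, // ring
    t: bool, // trophozoite
    s: bool, // schizont
    g: bool, // gametocyte
}

/// Split the filename stem on `-`, `_`, and spaces, then set a flag for each
/// token that is exactly `R`, `T`, `S`, or `G`.
fn infer_stages(source_image_id: &str) -> StageFlags {
    let mut flags = StageFlags::default();
    for token in source_image_id.split(|c: char| c == '-' || c == '_' || c == ' ') {
        match token {
            "R" => flags.r = true,
            "T" => flags.t = true,
            "S" => flags.s = true,
            "G" => flags.g = true,
            _ => {} // numeric IDs and other tokens carry no stage information
        }
    }
    flags
}
```

For an illustrative stem such as `1305121398-0012-R_S`, this sets the ring and schizont flags and leaves the other two unset.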

## Manifest Schema

The tool writes `mpidb_crops/manifest.csv` with columns:

- `crop_path` : absolute/relative path string written by the tool
- `infected` : `1` for infected crops, `0` for uninfected crops
- `species` : one of `Falciparum|Malariae|Ovale|Vivax|Uninfected`
- `stage_r` : `0/1`
- `stage_t` : `0/1`
- `stage_s` : `0/1`
- `stage_g` : `0/1`
- `source_image_id` : the stem of the original image filename (used for splitting)
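
To make the column order concrete, the sketch below models one manifest row and converts it to and from a CSV line using only the standard library. `ManifestRow` and its methods are hypothetical; the project depends on `serde` and `csv`, so the real loader most likely derives (de)serialization instead. The sketch also assumes no field contains a comma.

```rust
/// Header line matching the documented column order.
const HEADER: &str =
    "crop_path,infected,species,stage_r,stage_t,stage_s,stage_g,source_image_id";

/// One row of `manifest.csv` (hypothetical struct; flags are 0 or 1).
#[derive(Debug, Clone, PartialEq)]
struct ManifestRow {
    crop_path: String,
    infected: u8,
    species: String,
    stage_r: u8,
    stage_t: u8,
    stage_s: u8,
    stage_g: u8,
    source_image_id: String,
}

impl ManifestRow {
    /// Format the row in header order. Assumes fields contain no commas.
    fn to_csv_line(&self) -> String {
        format!(
            "{},{},{},{},{},{},{},{}",
            self.crop_path, self.infected, self.species, self.stage_r,
            self.stage_t, self.stage_s, self.stage_g, self.source_image_id
        )
    }

    /// Parse one data line; returns `None` on a wrong column count or a
    /// flag that is not `0` or `1`.
    fn from_csv_line(line: &str) -> Option<ManifestRow> {
        let f: Vec<&str> = line.trim_end().split(',').collect();
        if f.len() != 8 {
            return None;
        }
        let flag = |s: &str| s.parse::<u8>().ok().filter(|v| *v <= 1);
        Some(ManifestRow {
            crop_path: f[0].to_string(),
            infected: flag(f[1])?,
            species: f[2].to_string(),
            stage_r: flag(f[3])?,
            stage_t: flag(f[4])?,
            stage_s: flag(f[5])?,
            stage_g: flag(f[6])?,
            source_image_id: f[7].to_string(),
        })
    }
}
```

Rejecting flag values other than `0` and `1` at parse time catches a shifted or corrupted column early, before it reaches training.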

## Leakage-Safe Splitting

When training, the dataset is split using `source_image_id` so that:

- All crops derived from the same original image stay in the same split

This prevents leakage where near-identical parasite crops from one image appear in both train and validation.
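
One way to get this property is to derive the split from a deterministic hash of `source_image_id`, so every crop of an image lands in the same split without any shared state. The sketch below uses an inline FNV-1a hash; the names `fnv1a`, `Split`, and `assign_split` are hypothetical, and the actual loader in `src/data.rs` may instead shuffle the unique IDs with a seed.

```rust
/// 64-bit FNV-1a hash: deterministic across runs, platforms, and compilers.
fn fnv1a(s: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in s.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Split {
    Train,
    Valid,
}

/// Assign a split from the source image ID alone, so all crops that share
/// an ID always receive the same split. `valid_percent` is in `0..=100`.
fn assign_split(source_image_id: &str, valid_percent: u64) -> Split {
    if fnv1a(source_image_id) % 100 < valid_percent {
        Split::Valid
    } else {
        Split::Train
    }
}
```

A hash-based split only approximates the requested validation fraction, especially with few source images; a seeded shuffle over unique IDs gives exact proportions at the cost of needing the full ID list up front.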

## Practical Notes / Debugging

- If you get **zero crops for a species**, check:
  - `data/<Species>/gt` exists and contains masks
  - `data/<Species>/img` contains the same filenames
  - masks are not empty and have non-zero pixels

- If you see **too many tiny crops**, increase `min_mask_area`.

- If crops cut off parasite context, increase the padding fraction in `crop_and_square_pad` (currently `0.25`).
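
To see how the padding fraction affects the crop rectangle, here is a sketch of the padding and squaring arithmetic. It is not the body of `crop_and_square_pad`: `Rect`, `pad_bbox`, and `square_layout` are hypothetical names, and the sketch assumes padding is applied per side and clamped to the image bounds.

```rust
/// Rectangle in pixel coordinates: inclusive `x0/y0`, exclusive `x1/y1`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x0: u32,
    y0: u32,
    x1: u32,
    y1: u32,
}

/// Expand an inclusive bounding box by `pad_frac` of its width/height on
/// each side. Assumption: the result is clamped to the image bounds.
fn pad_bbox(
    min_x: u32,
    min_y: u32,
    max_x: u32,
    max_y: u32,
    pad_frac: f32,
    img_w: u32,
    img_h: u32,
) -> Rect {
    let w = max_x - min_x + 1;
    let h = max_y - min_y + 1;
    let pad_x = (w as f32 * pad_frac).round() as u32;
    let pad_y = (h as f32 * pad_frac).round() as u32;
    Rect {
        x0: min_x.saturating_sub(pad_x),
        y0: min_y.saturating_sub(pad_y),
        x1: (max_x + 1 + pad_x).min(img_w),
        y1: (max_y + 1 + pad_y).min(img_h),
    }
}

/// Square canvas side plus the `(x, y)` offset at which the rectangular
/// crop is pasted so that it sits centered on the canvas.
fn square_layout(r: Rect) -> (u32, u32, u32) {
    let (w, h) = (r.x1 - r.x0, r.y1 - r.y0);
    let side = w.max(h);
    (side, (side - w) / 2, (side - h) / 2)
}
```

For a 20x40 bounding box, raising `pad_frac` from `0.25` to `0.5` grows the padded rectangle from 30x60 to 40x80 when the image is large enough, so more surrounding context survives the final resize.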

## Where the Code Lives

- Crop tool: `src/bin/mpidb_prep.rs`
- Manifest-based dataset loader: `src/data.rs` (`MpIdbDataset`)
- Training entry point: `src/training.rs`