Rust-Cameroon · Christiantyemele · Jan 15, 2026
diff --git a/.github/workflows/ci-backend.yml b/.github/workflows/ci-backend.yml
@@ -0,0 +1,80 @@
+# CI for this repository (Rust checks + WASM UI build).
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  rust:
+    runs-on: ubuntu-latest
+
+    # Run every `run:` step from the repo root
+    defaults:
+      run:
+        working-directory: .
+
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v4
+
+      - name: Set up Rust
+        uses: dtolnay/rust-toolchain@stable
+        with:
+          toolchain: stable
+          components: clippy, rustfmt # Install clippy (lint) and rustfmt (format)
+
+      - name: Cache Cargo
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cargo/bin/
+            ~/.cargo/registry/index/
+            ~/.cargo/registry/cache/
+            ~/.cargo/git/db/
+            target/
+          key: ${{ runner.os }}-cargo-rust-${{ hashFiles('**/Cargo.lock') }}
+
+      - name: Check Formatting (cargo fmt)
+        run: cargo fmt -- --check
+
+      - name: Run Lint (cargo clippy)
+        run: cargo clippy --all-targets --all-features -- -D warnings
+
+      - name: Run Build (cargo build)
+        run: cargo build --all-targets --all-features --verbose
+
+      - name: Run Tests (cargo test)
+        run: cargo test --all-features --verbose
+
+  ui:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v4
+
+      - name: Set up Rust (stable + wasm target)
+        uses: dtolnay/rust-toolchain@stable
+        with:
+          toolchain: stable
+          targets: wasm32-unknown-unknown
+
+      - name: Cache Cargo
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cargo/bin/
+            ~/.cargo/registry/index/
+            ~/.cargo/registry/cache/
+            ~/.cargo/git/db/
+            target/
+            inference-ui/target/
+          key: ${{ runner.os }}-cargo-ui-${{ hashFiles('**/Cargo.lock') }}
+
+      - name: Install trunk
+        run: cargo install trunk --locked
+
+      - name: Build Yew UI
+        run: trunk build --release
+        working-directory: inference-ui
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,11 @@
 /target
-data/
 malaria-model/
 .idea/
+/data/Parasitized
+/data/plasmodium-phonecamera
+/data/tuberculosis-phonecamera
+/plasmodium-images
+/mpidb_crops/
+/data/Uninfected
 inference-ui/target/
 inference-ui/dist/
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,5 +1,5 @@
 [package]
-name = "Burn_model"
+name = "malaria_model"
 version = "0.1.0"
 edition = "2021"
 
@@ -14,6 +14,7 @@ burn-train = "0.19.0"
 image = "0.25.9"
 rand = "0.9.2"
 serde = { version = "1.0", features = ["derive"] }
+csv = "1.3"
 anyhow = "1.0"
 rayon = "1.10.0"
 

diff --git a/DEV_GUIDE.md b/DEV_GUIDE.md
@@ -0,0 +1,151 @@
+# Developer Guide
+
+## Crop Generation Strategy (MP-IDB + Uninfected)
+
+This project trains on **image crops** rather than full microscopy frames.
+The crops are generated by the `mpidb_prep` utility:
+
+```bash
+cargo run --bin mpidb_prep -- <data_root> <out_root> <crop_size> <min_mask_area>
+```
+
+Defaults:
+
+- `data_root`: `data`
+- `out_root`: `mpidb_crops`
+- `crop_size`: `128`
+- `min_mask_area`: `25`
+
+The output is:
+
+- A directory of saved crop images (`.png`)
+- A `manifest.csv` describing labels for each crop
+
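+The positional arguments and their defaults can be parsed with the standard library alone. This is a minimal sketch, not the actual `mpidb_prep` code: `PrepArgs` and `parse_args` are hypothetical names, and it assumes trailing arguments may be omitted to fall back to the documented defaults.
+
+```rust
+use std::path::PathBuf;
+
+/// Hypothetical config mirroring `<data_root> <out_root> <crop_size> <min_mask_area>`.
+#[derive(Debug, PartialEq)]
+struct PrepArgs {
+    data_root: PathBuf,
+    out_root: PathBuf,
+    crop_size: u32,
+    min_mask_area: usize,
+}
+
+/// Reads the positional arguments in order; any omitted trailing argument
+/// falls back to the default listed above. Invalid numbers are reported.
+fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<PrepArgs, String> {
+    let mut it = args.into_iter();
+    let data_root = it.next().unwrap_or_else(|| "data".to_string());
+    let out_root = it.next().unwrap_or_else(|| "mpidb_crops".to_string());
+    let crop_size = match it.next() {
+        Some(s) => s
+            .parse::<u32>()
+            .map_err(|e| format!("invalid crop_size {s:?}: {e}"))?,
+        None => 128,
+    };
+    let min_mask_area = match it.next() {
+        Some(s) => s
+            .parse::<usize>()
+            .map_err(|e| format!("invalid min_mask_area {s:?}: {e}"))?,
+        None => 25,
+    };
+    Ok(PrepArgs {
+        data_root: PathBuf::from(data_root),
+        out_root: PathBuf::from(out_root),
+        crop_size,
+        min_mask_area,
+    })
+}
+```
+
+In a `main` function this would be called as `parse_args(std::env::args().skip(1))`, skipping the program name.
+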
+## Data Sources
+
+### Infected (MP-IDB)
+For each malaria species directory under `data/` (e.g. `data/Falciparum`, `data/Malariae`, ...), the tool expects:
+
+- `data/<Species>/img/` : RGB microscopy images
+- `data/<Species>/gt/` : corresponding binary masks (ground truth)
+
+The tool matches files by **exact filename** (e.g. `gt/XYZ.jpg` corresponds to `img/XYZ.jpg`).
+
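+The exact-filename matching can be sketched as follows. `pair_masks_with_images` is a hypothetical helper, not the tool's API; it returns masks without a matching image separately, which is one way to surface the "zero crops" situation described under Practical Notes.
+
+```rust
+use std::fs;
+use std::io;
+use std::path::{Path, PathBuf};
+
+/// Pairs every file in `<species_dir>/gt/` with the file of the *same name*
+/// in `<species_dir>/img/`. Returns (image, mask) pairs plus unmatched masks.
+fn pair_masks_with_images(
+    species_dir: &Path,
+) -> io::Result<(Vec<(PathBuf, PathBuf)>, Vec<PathBuf>)> {
+    let gt_dir = species_dir.join("gt");
+    let img_dir = species_dir.join("img");
+    let mut pairs = Vec::new();
+    let mut unmatched = Vec::new();
+    for entry in fs::read_dir(&gt_dir)? {
+        let mask_path = entry?.path();
+        if !mask_path.is_file() {
+            continue;
+        }
+        // Exact filename match: gt/XYZ.jpg <-> img/XYZ.jpg
+        let Some(name) = mask_path.file_name() else { continue };
+        let img_path = img_dir.join(name);
+        if img_path.is_file() {
+            pairs.push((img_path, mask_path));
+        } else {
+            unmatched.push(mask_path);
+        }
+    }
+    // read_dir order is platform-dependent; sort for reproducible output.
+    pairs.sort();
+    unmatched.sort();
+    Ok((pairs, unmatched))
+}
+```
+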
+### Uninfected (negative samples)
+The tool also reads:
+
+- `data/Uninfected/`
+
+These images are used as **negative examples** (uninfected / no malaria).
+
+## Key Design Choice
+
+### Why crops?
+The MP-IDB infected dataset provides **segmentation masks** that localize parasites. Using these masks lets us generate **parasite-centered crops**.
+Compared with training on full images, where parasites cover only a small fraction of the frame, this increases the signal-to-noise ratio of each training sample.
+
+### Weak stage labels
+MP-IDB stage labels are **image-level** (inferred from filename tokens like `R/T/S/G`).
+That means stage labels are *weak* with respect to individual parasite crops.
+To make these labels usable, we train stage prediction as a per-crop **presence probability** for each stage (multi-label), accepting that this is weak supervision.
+
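+A common way to train multi-label presence is an independent sigmoid per stage with binary cross-entropy, so the four probabilities need not sum to 1. The sketch below assumes that formulation in plain Rust (no Burn tensors); whether `src/training.rs` uses exactly this loss is an assumption, and `stage_presence_loss` and `present_stages` are hypothetical names.
+
+```rust
+/// Stage order matches the manifest columns stage_r, stage_t, stage_s, stage_g.
+const STAGES: [&str; 4] = ["ring", "trophozoite", "schizont", "gametocyte"];
+
+fn sigmoid(x: f32) -> f32 {
+    1.0 / (1.0 + (-x).exp())
+}
+
+/// Mean binary cross-entropy over the four stages. Each stage is an
+/// independent "present in the source image" target (weak label).
+fn stage_presence_loss(logits: [f32; 4], targets: [u8; 4]) -> f32 {
+    let eps = 1e-7_f32; // keeps ln() away from 0
+    let mut total = 0.0;
+    for (z, y) in logits.iter().zip(targets.iter()) {
+        let p = sigmoid(*z).clamp(eps, 1.0 - eps);
+        let y = *y as f32;
+        total += -(y * p.ln() + (1.0 - y) * (1.0 - p).ln());
+    }
+    total / logits.len() as f32
+}
+
+/// Stages whose predicted presence probability reaches `threshold`.
+fn present_stages(logits: [f32; 4], threshold: f32) -> Vec<&'static str> {
+    STAGES
+        .iter()
+        .zip(logits.iter())
+        .filter(|(_, z)| sigmoid(**z) >= threshold)
+        .map(|(name, _)| *name)
+        .collect()
+}
+```
+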
+## Infected Crop Algorithm (from masks)
+
+For each infected `gt` mask image:
+
+1. **Load image + mask**
+   - Image is read as RGB (`RgbImage`)
+   - Mask is read as grayscale (`GrayImage`)
+
+2. **Connected components**
+   - We scan the mask and run a BFS connected-components search using **4-neighborhood** (`left/right/up/down`).
+   - Each connected component is assumed to correspond to one parasite region.
+
+3. **Filter tiny components**
+   - Components with `area < min_mask_area` are discarded.
+   - This removes small artifacts/noise in masks.
+
+4. **Bounding box extraction**
+   - For each component we compute its bounding box `(min_x, min_y, max_x, max_y)`.
+
+5. **Context padding**
+   - We expand the bounding box by a fraction of its size.
+   - Current padding fraction is fixed in code:
+     - `pad_frac = 0.25` (25% padding)
+
+6. **Square padding**
+   - The padded crop rectangle is converted to a **square** with side length:
+     - `side = max(crop_w, crop_h)`
+   - The padded rectangular crop is then centered on a square canvas of that size.
+
+7. **Resize to training size**
+   - Final crop is resized to `crop_size x crop_size` (default `128x128`) using `FilterType::Triangle`.
+
+8. **Save crop**
+   - Output path:
+     - `mpidb_crops/<Species>/<source_image_id>_<component_index>.png`
+
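+Steps 2 to 4 (connected components, area filter, bounding boxes) can be sketched over a raw row-major mask buffer. This is an illustration, not the code in `src/bin/mpidb_prep.rs`: it assumes any non-zero mask pixel is foreground, and `Component` and `mask_components` are hypothetical names. With the `image` crate, `GrayImage::as_raw()`, `width()` and `height()` would supply the inputs.
+
+```rust
+use std::collections::VecDeque;
+
+/// One connected foreground region with its inclusive bounding box.
+#[derive(Debug, Clone, PartialEq)]
+struct Component {
+    area: usize,
+    min_x: u32,
+    min_y: u32,
+    max_x: u32,
+    max_y: u32,
+}
+
+/// BFS connected components using the 4-neighbourhood (left/right/up/down).
+/// Components with `area < min_mask_area` are discarded.
+fn mask_components(mask: &[u8], width: u32, height: u32, min_mask_area: usize) -> Vec<Component> {
+    let (w, h) = (width as usize, height as usize);
+    assert_eq!(mask.len(), w * h, "mask buffer does not match width*height");
+    let mut visited = vec![false; w * h];
+    let mut out = Vec::new();
+    let mut queue = VecDeque::new();
+
+    for start in 0..w * h {
+        if mask[start] == 0 || visited[start] {
+            continue;
+        }
+        visited[start] = true;
+        queue.push_back(start);
+        let mut c = Component { area: 0, min_x: u32::MAX, min_y: u32::MAX, max_x: 0, max_y: 0 };
+
+        while let Some(idx) = queue.pop_front() {
+            let (x, y) = (idx % w, idx / w);
+            c.area += 1;
+            c.min_x = c.min_x.min(x as u32);
+            c.min_y = c.min_y.min(y as u32);
+            c.max_x = c.max_x.max(x as u32);
+            c.max_y = c.max_y.max(y as u32);
+            // Lazy `then` so `idx - 1` / `idx - w` are only computed in bounds.
+            let neighbours = [
+                (x > 0).then(|| idx - 1),
+                (x + 1 < w).then(|| idx + 1),
+                (y > 0).then(|| idx - w),
+                (y + 1 < h).then(|| idx + w),
+            ];
+            for n in neighbours.into_iter().flatten() {
+                if mask[n] != 0 && !visited[n] {
+                    visited[n] = true;
+                    queue.push_back(n);
+                }
+            }
+        }
+        if c.area >= min_mask_area {
+            out.push(c);
+        }
+    }
+    out
+}
+```
+
+Note that diagonally touching pixels form separate components under the 4-neighbourhood, which can split one irregular parasite mask into several crops.
+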
+## Uninfected Crop Algorithm
+
+Uninfected images do not have masks, so we generate a single crop per image:
+
+1. **Load image** as RGB
+2. **Center square crop**
+   - Take the largest centered square from the image; its side equals the smaller of the width and height.
+3. **Resize to `crop_size`** (default `128x128`)
+4. **Save crop**
+   - Output path:
+     - `mpidb_crops/Uninfected/<source_image_id>_0.png`
+
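+The crop geometry (infected steps 5 and 6, and the uninfected center crop) reduces to integer arithmetic. A minimal sketch, assuming `pad_frac` is applied to each side of the bounding box and the result is clamped to the image; the text above only says "a fraction of its size". `Rect`, `padded_rect`, `square_canvas` and `center_square` are hypothetical names.
+
+```rust
+/// Axis-aligned rectangle; (x, y) is the top-left corner.
+#[derive(Debug, Clone, Copy, PartialEq)]
+struct Rect {
+    x: u32,
+    y: u32,
+    w: u32,
+    h: u32,
+}
+
+/// Expands an inclusive bbox by `pad_frac` of its width/height on each side
+/// (ASSUMPTION), clamped to a non-empty `img_w` x `img_h` image.
+fn padded_rect(min_x: u32, min_y: u32, max_x: u32, max_y: u32, pad_frac: f32, img_w: u32, img_h: u32) -> Rect {
+    let (bw, bh) = (max_x - min_x + 1, max_y - min_y + 1);
+    let pad_x = (bw as f32 * pad_frac).round() as u32;
+    let pad_y = (bh as f32 * pad_frac).round() as u32;
+    let x0 = min_x.saturating_sub(pad_x);
+    let y0 = min_y.saturating_sub(pad_y);
+    let x1 = (max_x + pad_x).min(img_w - 1);
+    let y1 = (max_y + pad_y).min(img_h - 1);
+    Rect { x: x0, y: y0, w: x1 - x0 + 1, h: y1 - y0 + 1 }
+}
+
+/// Square padding: returns (side, offset_x, offset_y) for pasting the
+/// rectangular crop centered on a `side` x `side` canvas.
+fn square_canvas(crop: Rect) -> (u32, u32, u32) {
+    let side = crop.w.max(crop.h);
+    (side, (side - crop.w) / 2, (side - crop.h) / 2)
+}
+
+/// Uninfected images: largest centered square, side = min(width, height).
+fn center_square(img_w: u32, img_h: u32) -> Rect {
+    let side = img_w.min(img_h);
+    Rect { x: (img_w - side) / 2, y: (img_h - side) / 2, w: side, h: side }
+}
+```
+
+The pixel work (copying onto the canvas, then resizing) would go through the `image` crate, for example `image::imageops::resize(&canvas, crop_size, crop_size, FilterType::Triangle)`.
+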
+## Stage Label Inference
+
+Stages are inferred from tokens in the `source_image_id` (filename stem).
+A stage flag is set to `1` if the corresponding token is present:
+
+- `R` => ring
+- `T` => trophozoite
+- `S` => schizont
+- `G` => gametocyte
+
+Tokenization splits on `-`, `_`, and spaces.
+
+Important: this is **not** a per-parasite stage ground truth.
+It is treated as **multi-label “presence” supervision**.
+
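+The token rule can be sketched as below. It assumes exact, case-sensitive token matches (so `RT` or `r` sets nothing); `stage_flags` is a hypothetical name and the ids used to exercise it are made up.
+
+```rust
+/// Stage flags in manifest column order: [stage_r, stage_t, stage_s, stage_g].
+fn stage_flags(source_image_id: &str) -> [u8; 4] {
+    let mut flags = [0u8; 4];
+    // Split on '-', '_' and ' '; only a whole token equal to the letter counts.
+    for token in source_image_id.split(|c: char| c == '-' || c == '_' || c == ' ') {
+        match token {
+            "R" => flags[0] = 1, // ring
+            "T" => flags[1] = 1, // trophozoite
+            "S" => flags[2] = 1, // schizont
+            "G" => flags[3] = 1, // gametocyte
+            _ => {}
+        }
+    }
+    flags
+}
+```
+
+Every crop cut from the same source image receives the same flags, which is exactly why these labels are only image-level "presence" supervision.
+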
+## Manifest Schema
+
+The tool writes `mpidb_crops/manifest.csv` with columns:
+
+- `crop_path` : path to the crop image as written by the tool (absolute or relative)
+- `infected` : `1` for infected crops, `0` for uninfected crops
+- `species` : one of `Falciparum|Malariae|Ovale|Vivax|Uninfected`
+- `stage_r` : `0/1`
+- `stage_t` : `0/1`
+- `stage_s` : `0/1`
+- `stage_g` : `0/1`
+- `source_image_id` : the stem of the original image filename (used for splitting)
+
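+To sanity-check a generated manifest, a row can be parsed into a typed struct. This is a hypothetical sketch assuming the columns appear in exactly the order listed and that no field contains a comma or quotes; the `csv` dependency added in `Cargo.toml` handles quoting properly and is the more robust choice in real code.
+
+```rust
+/// One row of `manifest.csv` (hypothetical type, column order as above).
+#[derive(Debug, PartialEq)]
+struct ManifestRow {
+    crop_path: String,
+    infected: bool,
+    species: String,
+    stages: [bool; 4], // r, t, s, g
+    source_image_id: String,
+}
+
+fn parse_flag(s: &str, col: &str) -> Result<bool, String> {
+    match s.trim() {
+        "0" => Ok(false),
+        "1" => Ok(true),
+        other => Err(format!("column {col}: expected 0/1, got {other:?}")),
+    }
+}
+
+/// Naive comma split of a single data line (not the header).
+fn parse_manifest_line(line: &str) -> Result<ManifestRow, String> {
+    let f: Vec<&str> = line.trim_end().split(',').collect();
+    if f.len() != 8 {
+        return Err(format!("expected 8 columns, got {}", f.len()));
+    }
+    Ok(ManifestRow {
+        crop_path: f[0].to_string(),
+        infected: parse_flag(f[1], "infected")?,
+        species: f[2].to_string(),
+        stages: [
+            parse_flag(f[3], "stage_r")?,
+            parse_flag(f[4], "stage_t")?,
+            parse_flag(f[5], "stage_s")?,
+            parse_flag(f[6], "stage_g")?,
+        ],
+        source_image_id: f[7].to_string(),
+    })
+}
+```
+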
+## Leakage-Safe Splitting
+
+When training, the dataset is split by `source_image_id`, so all crops derived from the same original image stay in the same split.
+
+This prevents leakage where near-identical parasite crops from one image appear in both train and validation.
+
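+One way to implement a leakage-safe split is to decide train/validation from a stable hash of `source_image_id` alone. This is a sketch, not necessarily how `MpIdbDataset` does it (it may shuffle ids with `rand` instead); `fnv1a` and `split_by_source` are hypothetical names.
+
+```rust
+/// 64-bit FNV-1a: small and stable across Rust releases (std's
+/// DefaultHasher makes no such guarantee).
+fn fnv1a(s: &str) -> u64 {
+    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
+    for b in s.bytes() {
+        h ^= b as u64;
+        h = h.wrapping_mul(0x0000_0100_0000_01b3);
+    }
+    h
+}
+
+/// Splits (crop_path, source_image_id) pairs into (train, val) paths.
+/// Roughly `val_percent`% of *source images* go to validation.
+fn split_by_source<'a>(crops: &[(&'a str, &'a str)], val_percent: u64) -> (Vec<&'a str>, Vec<&'a str>) {
+    let mut train = Vec::new();
+    let mut val = Vec::new();
+    for &(crop_path, source_id) in crops {
+        // The decision depends only on source_image_id, never on the crop,
+        // so all crops from one image share a split.
+        if fnv1a(source_id) % 100 < val_percent {
+            val.push(crop_path);
+        } else {
+            train.push(crop_path);
+        }
+    }
+    (train, val)
+}
+```
+
+Because the decision never looks at the crop itself, leakage is impossible by construction and the split is reproducible without a seed. The trade-off is that the validation fraction only approximates `val_percent` when there are few source images.
+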
+## Practical Notes / Debugging
+
+- If you get **zero crops for a species**, check:
+  - `data/<Species>/gt` exists and contains masks
+  - `data/<Species>/img` contains the same filenames
+  - masks are not empty (they contain non-zero pixels)
+
+- If you see **too many tiny crops**, increase `min_mask_area`.
+
+- If crops cut off parasite context, increase the padding fraction in `crop_and_square_pad` (currently `0.25`).
+
+## Where the Code Lives
+
+- Crop tool: `src/bin/mpidb_prep.rs`
+- Manifest-based dataset loader: `src/data.rs` (`MpIdbDataset`)
+- Training entry point: `src/training.rs`