From 94c7495641ea7e30ee8ea3d88c2265333096e2b4 Mon Sep 17 00:00:00 2001
From: Joe Futrelle
Date: Fri, 29 May 2026 09:06:33 -0400
Subject: [PATCH 1/3] README with context and usage

---
 README.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 96 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 792d765..4cb4bbc 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,97 @@
 # ifcb-features
-Python implementation of IFCB segmentation and feature extraction code
+
+A Python implementation of segmentation and feature extraction for
+Imaging FlowCytobot (IFCB) imagery.
+
+IFCB is a submersible imaging flow cytometer that captures images of individual
+plankton cells and other particles. Each raw IFCB sample (a "bin") contains many
+regions of interest (ROIs) — small grayscale images, one per imaged particle.
+This library takes those ROIs and, for each one:
+
+1. **Segments** the particle from the background, producing a binary "blob" mask.
+2. **Extracts features** from the blob and the original ROI — morphological
+   measurements (area, biovolume, axis lengths, perimeter, convexity, Feret
+   diameters, …).
+
+The implementation is designed to reproduce the numerical output of the original
+MATLAB IFCB feature extraction code as closely as possible, so that features
+computed here are comparable with historical IFCB datasets.
+
+## Installation
+
+The package targets Python 3.7+ and depends on `numpy`, `scipy`,
+`scikit-image`, and `scikit-learn`, plus two WHOI packages installed directly
+from GitHub ([`pyifcb`](https://github.com/joefutrelle/pyifcb) for reading IFCB
+data and [`phasepack`](https://github.com/WHOIGit/phasepack) for phase
+congruency used during segmentation).
+
+```bash
+pip install git+https://github.com/WHOIGit/ifcb-features.git
+```
+
+Or, for local development:
+
+```bash
+git clone https://github.com/WHOIGit/ifcb-features.git
+cd ifcb-features
+pip install -e .
+```
+
+## Usage
+
+The main entry point is [`extract_slim_features.py`](extract_slim_features.py).
+It reads whole IFCB bins (via `pyifcb`), computes the per-ROI feature set, and
+writes the results to disk:
+
+```bash
+python extract_slim_features.py <data_directory> <output_directory> [--bins BIN1 BIN2 ...]
+```
+
+- `data_directory` — directory of IFCB data (read via `pyifcb`).
+- `output_directory` — where the outputs are written.
+- `--bins` — optional list of bin names (e.g. `D20240423T115846_IFCB127`) to
+  process; if omitted, every bin in the data directory is processed.
+
+For each sample this produces two files in the output directory:
+
+- `_features_v4.csv` — one row per ROI, with a `roi_number` column and one
+  column per feature.
+- `_blobs_v4.zip` — the segmented blob masks, one 1-bit PNG per ROI.
+
+### Docker
+
+A container image is built and published to the GitHub Container Registry, with
+the batch extractor as its entry point:
+
+```bash
+docker run --rm \
+  -v /path/to/ifcb/data:/data \
+  -v /path/to/output:/output \
+  ghcr.io/whoigit/ifcb-features:latest \
+  /data /output --bins D20240423T115846_IFCB127
+```
+
+You can also build it locally:
+
+```bash
+docker build -t ifcb-features .
+```
+
+## License
+
+MIT — see [LICENSE](LICENSE).
+
+---
+
+## Note: deprecated "non-slim" features
+
+Earlier versions of this code also computed a larger set of features — among
+them Histogram of Oriented Gradients (HOG), ring/wedge power spectra, invariant
+moments, and texture and symmetry statistics. These are **deprecated** and
+retained only for historical reasons; they are not part of the output of
+`extract_slim_features.py`.
+
+The underlying machinery still exists on the `RoiFeatures` and `BlobFeatures`
+classes in [`ifcb_features/all.py`](ifcb_features/all.py) for anyone who needs
+to reproduce older results, but new work should rely on the slim feature set
+produced by the batch extractor.
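
Reviewer note on patch 1/3: the README says each bin yields a features CSV with a `roi_number` column plus one column per feature. The sketch below shows one way a consumer might load such a file using only the standard library. It is an illustration under stated assumptions, not part of the package: `load_features` is a hypothetical helper, the sample data is made up, and it assumes `roi_number` is written as an integer and every feature value parses as a float (Python's `float` also accepts `NaN`).

```python
import csv
import io


def load_features(fileobj):
    """Return {roi_number: {feature_name: float}} from a features CSV.

    Hypothetical helper: assumes a `roi_number` column holding integers
    and numeric values in every other column.
    """
    features = {}
    for row in csv.DictReader(fileobj):
        roi_number = int(row.pop("roi_number"))
        features[roi_number] = {name: float(value) for name, value in row.items()}
    return features


# Made-up two-ROI sample standing in for a real features file.
sample = io.StringIO(
    "roi_number,Area,Perimeter\n"
    "1,120,44.5\n"
    "2,300,71.0\n"
)
table = load_features(sample)
print(table[2]["Area"])
```

For a file on disk, the same function can be called on `open(path, newline="")`, which is the mode the `csv` module documentation recommends.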
From 828645fd63d0cd02e8224b1d5ceaff4b139e3341 Mon Sep 17 00:00:00 2001
From: Joe Futrelle
Date: Fri, 29 May 2026 09:14:41 -0400
Subject: [PATCH 2/3] feature description

---
 FEATURES.md | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md | 3 ++-
 2 files changed, 68 insertions(+), 1 deletion(-)
 create mode 100644 FEATURES.md

diff --git a/FEATURES.md b/FEATURES.md
new file mode 100644
index 0000000..46115df
--- /dev/null
+++ b/FEATURES.md
@@ -0,0 +1,66 @@
+# Slim features
+
+This document describes the per-ROI features produced by
+[`extract_slim_features.py`](extract_slim_features.py) — the columns of each
+`_features_v4.csv` file (alongside the `roi_number` column that identifies
+the ROI within the bin).
+
+A few conventions apply throughout:
+
+- Each ROI is segmented into one or more **blobs** (connected regions). The
+  blobs are ordered largest-area first.
+- Features whose names are *not* prefixed with `summed` describe the **largest
+  blob** in the ROI.
+- Features prefixed with `summed` are aggregated across **all** blobs in the ROI.
+- All lengths and areas are in **pixels** (or pixels² / pixels³ as appropriate);
+  no physical scaling is applied here.
+- Definitions are chosen to match the original MATLAB IFCB feature code so that
+  values are comparable with historical datasets.
+
+## Largest-blob features
+
+| Feature | Description |
+| --- | --- |
+| `Area` | Number of pixels in the blob. |
+| `Biovolume` | Estimated 3-D volume of the particle (pixels³), computed with the Moberg & Sosik algorithm — either a solid-of-revolution model or a distance-map model is selected automatically based on the blob's shape. |
+| `BoundingBox_xwidth` | Width (in x) of the blob's axis-aligned bounding box. |
+| `BoundingBox_ywidth` | Height (in y) of the blob's axis-aligned bounding box. |
+| `ConvexArea` | Area (pixels²) enclosed by the blob's convex hull. |
+| `ConvexPerimeter` | Perimeter length of the blob's convex hull. |
+| `Eccentricity` | Eccentricity of the ellipse with the same second moments as the blob (0 = circle, approaching 1 = elongated). |
+| `EquivDiameter` | Diameter of a circle with the same area as the blob. |
+| `Extent` | Fraction of the bounding box filled by the blob (area ÷ bounding-box area). |
+| `MajorAxisLength` | Length of the major axis of the best-fit ellipse. |
+| `MinorAxisLength` | Length of the minor axis of the best-fit ellipse. |
+| `Orientation` | Angle (degrees) of the ellipse's major axis relative to the horizontal. |
+| `Perimeter` | Perimeter length of the blob boundary (Benkrid perimeter estimate). |
+| `RepresentativeWidth` | Representative width of the particle from the Moberg & Sosik biovolume model. |
+| `Solidity` | Area ÷ convex area; how completely the blob fills its convex hull (1 = fully convex). |
+| `SurfaceArea` | Estimated 3-D surface area (pixels²) from the Moberg & Sosik biovolume model. |
+| `maxFeretDiameter` | Maximum Feret (caliper) diameter — the largest distance across the blob over all orientations. |
+| `minFeretDiameter` | Minimum Feret (caliper) diameter — the smallest such distance. |
+
+## Whole-ROI and summed features
+
+| Feature | Description |
+| --- | --- |
+| `numBlobs` | Number of blobs segmented from the ROI. |
+| `summedArea` | Sum of `Area` over all blobs. |
+| `summedBiovolume` | Sum of `Biovolume` over all blobs. |
+| `summedConvexArea` | Sum of `ConvexArea` over all blobs. |
+| `summedConvexPerimeter` | Sum of `ConvexPerimeter` over all blobs. |
+| `summedMajorAxisLength` | Sum of `MajorAxisLength` over all blobs. |
+| `summedMinorAxisLength` | Sum of `MinorAxisLength` over all blobs. |
+| `summedPerimeter` | Sum of `Perimeter` over all blobs. |
+| `summedSurfaceArea` | Sum of `SurfaceArea` over all blobs. |
+
+## Derived ratios
+
+These are computed from the features above. Each is set to `NaN` when its
+denominator is zero (e.g. an empty ROI with no blobs).
+
+| Feature | Description |
+| --- | --- |
+| `Area_over_PerimeterSquared` | `Area` ÷ `Perimeter²` — a dimensionless compactness measure (largest blob). |
+| `Area_over_Perimeter` | `Area` ÷ `Perimeter` (largest blob). |
+| `summedConvexPerimeter_over_Perimeter` | `summedConvexPerimeter` ÷ `summedPerimeter`, aggregated across all blobs — a measure of overall boundary roughness. |
\ No newline at end of file
diff --git a/README.md b/README.md
index 4cb4bbc..28cdcac 100644
--- a/README.md
+++ b/README.md
@@ -55,7 +55,8 @@ python extract_slim_features.py <data_directory> <output_directory> [--bins BIN1
 For each sample this produces two files in the output directory:
 
 - `_features_v4.csv` — one row per ROI, with a `roi_number` column and one
-  column per feature.
+  column per feature. See [FEATURES.md](FEATURES.md) for a description of each
+  feature.
 - `_blobs_v4.zip` — the segmented blob masks, one 1-bit PNG per ROI.
 
 ### Docker
From ee8d3d535f2c3b749c5f3ebab1a516cce8c46012 Mon Sep 17 00:00:00 2001
From: Joe Futrelle
Date: Fri, 29 May 2026 09:35:11 -0400
Subject: [PATCH 3/3] Changing targeted Python version note

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 28cdcac..a20e89c 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ computed here are comparable with historical IFCB datasets.
 
 ## Installation
 
-The package targets Python 3.7+ and depends on `numpy`, `scipy`,
+The package targets Python 3.10+ and depends on `numpy`, `scipy`,
 `scikit-image`, and `scikit-learn`, plus two WHOI packages installed directly
 from GitHub ([`pyifcb`](https://github.com/joefutrelle/pyifcb) for reading IFCB
 data and [`phasepack`](https://github.com/WHOIGit/phasepack) for phase
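
Reviewer note on patch 2/3: the "Derived ratios" section of FEATURES.md defines three ratios and says each becomes `NaN` when its denominator is zero. The sketch below restates that rule in code so the definitions can be checked by hand. It is a reading of the documented definitions only, not the package's implementation: `safe_ratio` and `derived_ratios` are hypothetical names, and the input numbers are invented.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else math.nan


def derived_ratios(area, perimeter, summed_convex_perimeter, summed_perimeter):
    """Compute the three documented ratios from already-extracted features.

    `area` and `perimeter` refer to the largest blob; the `summed_*`
    arguments are totals over all blobs in the ROI.
    """
    return {
        "Area_over_PerimeterSquared": safe_ratio(area, perimeter ** 2),
        "Area_over_Perimeter": safe_ratio(area, perimeter),
        "summedConvexPerimeter_over_Perimeter": safe_ratio(
            summed_convex_perimeter, summed_perimeter
        ),
    }


# Invented values for a single-blob ROI.
ratios = derived_ratios(
    area=100.0, perimeter=40.0, summed_convex_perimeter=38.0, summed_perimeter=40.0
)

# An empty ROI (no blobs) has zero denominators, so every ratio is NaN.
empty = derived_ratios(0.0, 0.0, 0.0, 0.0)
```

Returning `NaN` rather than raising keeps one row per ROI in the output table, which matches the "one row per ROI" layout the README describes.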