66 changes: 66 additions & 0 deletions FEATURES.md
@@ -0,0 +1,66 @@
# Slim features

This document describes the per-ROI features produced by
[`extract_slim_features.py`](extract_slim_features.py) — the columns of each
`<bin>_features_v4.csv` file (alongside the `roi_number` column that identifies
the ROI within the bin).

A few conventions apply throughout:

- Each ROI is segmented into one or more **blobs** (connected regions). The
blobs are ordered largest-area first.
- Features whose names are *not* prefixed with `summed` describe the **largest
blob** in the ROI.
- Features prefixed with `summed` are aggregated across **all** blobs in the ROI.
- All lengths and areas are in **pixels** (or pixels² / pixels³ as appropriate);
no physical scaling is applied here.
- Definitions are chosen to match the original MATLAB IFCB feature code so that
values are comparable with historical datasets.
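
The blob conventions above can be illustrated with a small, dependency-free sketch. It is not the segmentation used by `extract_slim_features.py`: the `label_blobs` helper is hypothetical, the mask is a toy example, and 8-connectivity is an assumption. It only shows how "largest blob" and `summed` quantities relate once blobs are ordered largest-area first.

```python
from collections import deque


def label_blobs(mask):
    """Return blobs as lists of (row, col) pixels, ordered largest-area first.

    `mask` is a list of rows of 0/1 values. 8-connectivity is an assumption
    made for this illustration.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    blobs = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(pixels)
    # Largest area first; sorted() is stable, so ties keep scan order.
    return sorted(blobs, key=len, reverse=True)


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
blobs = label_blobs(mask)
areas = [len(b) for b in blobs]
largest_area = areas[0]   # analogous to `Area` (largest blob only)
summed_area = sum(areas)  # analogous to `summedArea` (all blobs)
num_blobs = len(blobs)    # analogous to `numBlobs`
```

Here the toy mask contains three blobs, so the unprefixed value describes only the 2×2 block while the summed value covers every foreground pixel.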

## Largest-blob features

| Feature | Description |
| --- | --- |
| `Area` | Number of pixels in the blob. |
| `Biovolume` | Estimated 3-D volume of the particle (pixels³), computed with the Moberg & Sosik algorithm — either a solid-of-revolution model or a distance-map model is selected automatically based on the blob's shape. |
| `BoundingBox_xwidth` | Width (in x) of the blob's axis-aligned bounding box. |
| `BoundingBox_ywidth` | Height (in y) of the blob's axis-aligned bounding box. |
| `ConvexArea` | Area (pixels²) enclosed by the blob's convex hull. |
| `ConvexPerimeter` | Perimeter length of the blob's convex hull. |
| `Eccentricity` | Eccentricity of the ellipse with the same second moments as the blob (0 = circle, approaching 1 = elongated). |
| `EquivDiameter` | Diameter of a circle with the same area as the blob. |
| `Extent` | Fraction of the bounding box filled by the blob (area ÷ bounding-box area). |
| `MajorAxisLength` | Length of the major axis of the best-fit ellipse. |
| `MinorAxisLength` | Length of the minor axis of the best-fit ellipse. |
| `Orientation` | Angle (degrees) of the ellipse's major axis relative to the horizontal. |
| `Perimeter` | Perimeter length of the blob boundary (Benkrid perimeter estimate). |
| `RepresentativeWidth` | Representative width of the particle from the Moberg & Sosik biovolume model. |
| `Solidity` | Area ÷ convex area; how completely the blob fills its convex hull (1 = fully convex). |
| `SurfaceArea` | Estimated 3-D surface area (pixels²) from the Moberg & Sosik biovolume model. |
| `maxFeretDiameter` | Maximum Feret (caliper) diameter — the largest distance across the blob over all orientations. |
| `minFeretDiameter` | Minimum Feret (caliper) diameter — the smallest such distance. |
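
Several of the descriptions above reduce to short closed-form expressions. The sketch below writes them out using the textbook definitions that match the table wording. The function names are illustrative rather than part of this package, and the package's own code may differ in detail, for example in how the ellipse axis lengths are obtained.

```python
import math


def equiv_diameter(area):
    """Diameter of the circle whose area equals `area`."""
    return math.sqrt(4.0 * area / math.pi)


def extent(area, bbox_xwidth, bbox_ywidth):
    """Fraction of the axis-aligned bounding box covered by the blob."""
    return area / (bbox_xwidth * bbox_ywidth)


def solidity(area, convex_area):
    """Fraction of the convex hull covered by the blob."""
    return area / convex_area


def eccentricity(major_axis_length, minor_axis_length):
    """Ellipse eccentricity from its axis lengths: sqrt(1 - (b/a)^2)."""
    ratio = minor_axis_length / major_axis_length
    return math.sqrt(1.0 - ratio * ratio)


# A filled 10 x 10 pixel square, used purely as a worked example.
square_extent = extent(100, 10, 10)
square_equiv_diameter = equiv_diameter(100)
round_ecc = eccentricity(10.0, 10.0)
elongated_ecc = eccentricity(10.0, 6.0)
```

For the square, `Extent` is exactly 1 because the blob fills its bounding box, and an ellipse with equal axes has eccentricity 0, matching the "0 = circle" note in the table.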

## Whole-ROI and summed features

| Feature | Description |
| --- | --- |
| `numBlobs` | Number of blobs segmented from the ROI. |
| `summedArea` | Sum of `Area` over all blobs. |
| `summedBiovolume` | Sum of `Biovolume` over all blobs. |
| `summedConvexArea` | Sum of `ConvexArea` over all blobs. |
| `summedConvexPerimeter` | Sum of `ConvexPerimeter` over all blobs. |
| `summedMajorAxisLength` | Sum of `MajorAxisLength` over all blobs. |
| `summedMinorAxisLength` | Sum of `MinorAxisLength` over all blobs. |
| `summedPerimeter` | Sum of `Perimeter` over all blobs. |
| `summedSurfaceArea` | Sum of `SurfaceArea` over all blobs. |
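
As a rough illustration of how the whole-ROI columns relate to per-blob measurements, the sketch below aggregates a list of per-blob records into the `numBlobs` and `summed*` values. The `roi_row` helper and the dictionary layout are assumptions made for this example, not the package's internal data structures, and the numbers are placeholders.

```python
# Per-blob features that have a `summed` counterpart in the table above.
SUMMED = (
    "Area", "Biovolume", "ConvexArea", "ConvexPerimeter",
    "MajorAxisLength", "MinorAxisLength", "Perimeter", "SurfaceArea",
)


def roi_row(blob_records):
    """Build `numBlobs` and the `summed*` columns from per-blob feature dicts."""
    row = {"numBlobs": len(blob_records)}
    for name in SUMMED:
        # sum() over an empty list is 0, so an ROI with no blobs yields zeros.
        row["summed" + name] = sum(rec[name] for rec in blob_records)
    return row


# Two made-up blobs with placeholder values, for illustration only.
blob_a = dict.fromkeys(SUMMED, 3.0)
blob_b = dict.fromkeys(SUMMED, 1.0)
row = roi_row([blob_a, blob_b])
empty_row = roi_row([])
```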

## Derived ratios

These are computed from the features above. Each is set to `NaN` when its
denominator is zero (e.g. an empty ROI with no blobs).

| Feature | Description |
| --- | --- |
| `Area_over_PerimeterSquared` | `Area` ÷ `Perimeter²` — a dimensionless compactness measure (largest blob). |
| `Area_over_Perimeter` | `Area` ÷ `Perimeter` (largest blob). |
| `summedConvexPerimeter_over_Perimeter` | `summedConvexPerimeter` ÷ `summedPerimeter`, aggregated across all blobs; a measure of overall boundary roughness (close to 1 for smooth, convex outlines, lower for rough or concave ones). |
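
The zero-denominator rule can be sketched as follows. The `safe_ratio` and `derived_ratios` helpers are hypothetical names used for illustration; only the column names and the `NaN` behaviour come from this document.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        return math.nan
    return numerator / denominator


def derived_ratios(row):
    """Compute the three derived ratio columns from a feature row (a dict)."""
    return {
        "Area_over_PerimeterSquared": safe_ratio(
            row["Area"], row["Perimeter"] ** 2),
        "Area_over_Perimeter": safe_ratio(row["Area"], row["Perimeter"]),
        "summedConvexPerimeter_over_Perimeter": safe_ratio(
            row["summedConvexPerimeter"], row["summedPerimeter"]),
    }


# An idealised circle of radius 10 pixels (continuous geometry, not a
# rasterised blob), used as a worked example.
radius = 10.0
circle = {
    "Area": math.pi * radius ** 2,
    "Perimeter": 2 * math.pi * radius,
    "summedConvexPerimeter": 2 * math.pi * radius,
    "summedPerimeter": 2 * math.pi * radius,
}
circle_ratios = derived_ratios(circle)

# An empty ROI: every denominator is zero, so every ratio is NaN.
empty_ratios = derived_ratios(dict.fromkeys(circle, 0.0))
```

For the ideal circle, `Area_over_PerimeterSquared` equals 1/(4π) regardless of radius, which is why it works as a scale-free compactness measure, while `Area_over_Perimeter` equals half the radius and therefore grows with size.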
98 changes: 97 additions & 1 deletion README.md
@@ -1,2 +1,98 @@
# ifcb-features
Python implementation of IFCB segmentation and feature extraction code

A Python implementation of segmentation and feature extraction for
Imaging FlowCytobot (IFCB) imagery.

IFCB is a submersible imaging flow cytometer that captures images of individual
plankton cells and other particles. Each raw IFCB sample (a "bin") contains many
regions of interest (ROIs) — small grayscale images, one per imaged particle.
This library takes those ROIs and, for each one:

1. **Segments** the particle from the background, producing a binary "blob" mask.
2. **Extracts features** from the blob and the original ROI — morphological
measurements (area, biovolume, axis lengths, perimeter, convexity, Feret
diameters, …).

The implementation is designed to reproduce the numerical output of the original
MATLAB IFCB feature extraction code as closely as possible, so that features
computed here are comparable with historical IFCB datasets.

## Installation

The package targets Python 3.10+ and depends on `numpy`, `scipy`,
`scikit-image`, and `scikit-learn`, plus two WHOI packages installed directly
from GitHub ([`pyifcb`](https://github.com/joefutrelle/pyifcb) for reading IFCB
data and [`phasepack`](https://github.com/WHOIGit/phasepack) for phase
congruency used during segmentation).

```bash
pip install git+https://github.com/WHOIGit/ifcb-features.git
```

Or, for local development:

```bash
git clone https://github.com/WHOIGit/ifcb-features.git
cd ifcb-features
pip install -e .
```

## Usage

The main entry point is [`extract_slim_features.py`](extract_slim_features.py).
It reads whole IFCB bins (via `pyifcb`), computes the per-ROI feature set, and
writes the results to disk:

```bash
python extract_slim_features.py <data_directory> <output_directory> [--bins BIN1 BIN2 ...]
```

- `data_directory` — directory of IFCB data (read via `pyifcb`).
- `output_directory` — where the outputs are written.
- `--bins` — optional list of bin names (e.g. `D20240423T115846_IFCB127`) to
process; if omitted, every bin in the data directory is processed.
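
If the extractor needs to be driven from another Python program, one option is to assemble the same command line and hand it to `subprocess`. The `build_command` helper below is a hypothetical convenience wrapper, not part of the package; it uses only the arguments documented above and assumes the script is run from the repository root.

```python
import sys


def build_command(data_directory, output_directory, bins=None):
    """Assemble the argv list for extract_slim_features.py."""
    cmd = [
        sys.executable,
        "extract_slim_features.py",
        str(data_directory),
        str(output_directory),
    ]
    if bins:
        # --bins takes one or more bin names; omit it to process every bin.
        cmd += ["--bins", *bins]
    return cmd


cmd = build_command("/data", "/output", bins=["D20240423T115846_IFCB127"])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit status raises an exception.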

For each bin this produces two files in the output directory:

- `<bin>_features_v4.csv` — one row per ROI, with a `roi_number` column and one
column per feature. See [FEATURES.md](FEATURES.md) for a description of each
feature.
- `<bin>_blobs_v4.zip` — the segmented blob masks, one 1-bit PNG per ROI.
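
A minimal sketch of reading these outputs back with the standard library is shown below. The `load_features` and `list_blob_masks` helpers are illustrative names. The code assumes that `roi_number` is written as an integer and that a missing value (such as a `NaN` ratio) may appear as an empty cell. The layout of member names inside the zip archive is not documented here, so the sketch only lists them. The demo file name and values are made up.

```python
import csv
import math
import os
import tempfile
import zipfile


def load_features(csv_path):
    """Read a features CSV into {roi_number: {feature_name: float}}."""
    features = {}
    with open(csv_path, newline="") as f:
        for record in csv.DictReader(f):
            roi_number = int(record.pop("roi_number"))
            features[roi_number] = {
                # Assumption: an empty cell stands for a missing (NaN) value.
                name: float(value) if value != "" else math.nan
                for name, value in record.items()
            }
    return features


def list_blob_masks(zip_path):
    """Return the member names stored in a blobs zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


# Round-trip demo on a tiny made-up file (values are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_features_v4.csv")
    with open(path, "w", newline="") as f:
        f.write("roi_number,Area,Area_over_Perimeter\n2,150,3.5\n3,0,\n")
    features = load_features(path)
```

Keying the result by `roi_number` makes it straightforward to pair a feature row with the corresponding mask once the archive's naming scheme is known.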

### Docker

A container image is built and published to the GitHub Container Registry, with
the batch extractor as its entry point:

```bash
docker run --rm \
  -v /path/to/ifcb/data:/data \
  -v /path/to/output:/output \
  ghcr.io/whoigit/ifcb-features:latest \
  /data /output --bins D20240423T115846_IFCB127
```

You can also build it locally:

```bash
docker build -t ifcb-features .
```

## License

MIT — see [LICENSE](LICENSE).

---

## Note: deprecated "non-slim" features

Earlier versions of this code also computed a larger set of features — among
them Histogram of Oriented Gradients (HOG), ring/wedge power spectra, invariant
moments, and texture and symmetry statistics. These are **deprecated** and
retained only for historical reasons; they are not part of the output of
`extract_slim_features.py`.

The underlying machinery still exists on the `RoiFeatures` and `BlobFeatures`
classes in [`ifcb_features/all.py`](ifcb_features/all.py) for anyone who needs
to reproduce older results, but new work should rely on the slim feature set
produced by the batch extractor.