| 1 | +# ContactSampleOp Design |
| 2 | + |
| 3 | +## Goal |
| 4 | +Generate training samples for a segment-merge classifier from raw volumes. |
| 5 | +Output: Arrow/Feather files (cross-language: Python + TypeScript). |
| 6 | + |
| 7 | +## Inputs |
| 8 | +- `candidate_layer`: Candidate segmentation (also used for meshes) |
| 9 | +- `reference_layer`: Proofread reference segmentation |
| 10 | +- `affinity_layer`: 3-channel affinity volume (X, Y, Z axes) |
| 11 | + |
| 12 | +## Output Schema (Feather) |
| 13 | +| Column | Type | Description | |
| 14 | +|--------|------|-------------| |
| 15 | +| `seg_a`, `seg_b` | int64 | Segment pair IDs | |
| 16 | +| `should_merge` | int64 | 1=merge, 0=no merge | |
| 17 | +| `n_contacts` | int64 | Number of real (unpadded) contacts | |
| 18 | +| `contacts` | list[list[float64]] | (max_contact_vx, 4) - rows of [x, y, z, aff]; x, y, z in nm, aff is the affinity value; padded to `max_contact_vx` rows | |
| 19 | +| `pointcloud_a`, `pointcloud_b` | list[list[float64]] | (n_points, 3) surface points in nm | |
| 20 | +| `chunk_coord` | list[int64] | Chunk start coordinates (voxels) | |
| 21 | +| `chunk_size` | list[int64] | Chunk dimensions (voxels) | |
| 22 | +| `crop_pad` | list[int64] | Padding used (voxels) | |
| 23 | +| `candidate_path` | string | Candidate segmentation path | |
| 24 | +| `reference_path` | string | Reference segmentation path | |
| 25 | +| `affinity_path` | string | Affinity volume path | |
| 26 | + |
| 27 | +## Processing Steps |
| 28 | + |
| 29 | +1. **Read volumes** (parallel) - candidate, reference (proofread), and affinity, each with padding |
| 30 | + |
| 31 | +2. **Compute overlaps** - Between candidate segments and proofread connected components |
| 32 | + |
| 33 | +3. **Filter bad segments** (BEFORE contact detection): |
| 34 | + - **Small**: total segment size < `min_seg_size_vx` |
| 35 | + - **Mergers**: overlap 2+ proofread CCs with >= `min_overlap_vx` each |
| 36 | + - **Unclaimed**: no proofread CC overlap >= `min_overlap_vx` |
| 37 | + |
| 38 | +4. **Blackout** excluded segments (set to 0) |
| 39 | + |
| 40 | +5. **Find contacts** - Detect voxel boundaries between remaining segments |
| 41 | + - Check X, Y, Z axes separately, use axis-specific affinity |
| 42 | + - Average affinities when voxel touches neighbor on multiple axes |
| 43 | +   - Keep only contacts in the kernel region (the chunk interior, inside the padding) |
| 44 | + |
| 45 | +6. **Filter contact pairs**: |
| 46 | + - Low count (< `min_contact_vx`) |
| 47 | + - High count (> `max_contact_vx`) |
| 48 | + |
| 49 | +7. **Download meshes** - Only for segments in valid pairs, clip to bbox |
| 50 | + |
| 51 | +8. **Generate samples** per valid pair: |
| 52 | + - Compute affinity-weighted center of mass (COM) |
| 53 | +   - Crop mesh points to a sphere around the COM (radius = `min(crop_pad * resolution)`, i.e. the smallest per-axis padding in nm) |
| 54 | +   - Sample `n_pointcloud_points` from each mesh (seed=42) |
| 55 | +   - Label: 1 if both segments overlap the same proofread CC, else 0 |
| 56 | + - Pad contacts to fixed size |
| 57 | + |
| 58 | +9. **Write feather** - Empty chunks produce files with 0 rows |
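
Steps 2 through 6 can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions, not the op's implementation: the function names are hypothetical, ID 0 is treated as background in both segmentations, `aff[axis]` at a voxel is taken to describe the edge to its lower-index neighbour along that axis, and each contact is recorded at the lower-index voxel of the face.

```python
from collections import defaultdict

import numpy as np


def filter_segments(candidate, reference_cc, min_seg_size_vx=2000, min_overlap_vx=1000):
    """Steps 2-4: black out small, merger, and unclaimed candidate segments."""
    seg_ids, seg_sizes = np.unique(candidate[candidate != 0], return_counts=True)
    size = dict(zip(seg_ids.tolist(), seg_sizes.tolist()))

    # Count voxels shared by each (segment, CC) pair, ignoring background.
    both = (candidate != 0) & (reference_cc != 0)
    claims = defaultdict(list)  # segment -> CCs it overlaps by >= min_overlap_vx
    if both.any():
        pairs = np.stack([candidate[both], reference_cc[both]], axis=1)
        uniq, counts = np.unique(pairs, axis=0, return_counts=True)
        for (seg, cc), n in zip(uniq.tolist(), counts.tolist()):
            if n >= min_overlap_vx:
                claims[seg].append(cc)

    # Keep a segment only if it is large enough and claimed by exactly one CC.
    seg_to_cc = {seg: claims[seg][0] for seg, n in size.items()
                 if n >= min_seg_size_vx and len(claims[seg]) == 1}
    kept = np.isin(candidate, list(seg_to_cc))
    return np.where(kept, candidate, 0), seg_to_cc


def find_contacts(seg, aff, resolution_nm=(1.0, 1.0, 1.0), crop_pad=(0, 0, 0)):
    """Step 5: per-pair contact rows [x, y, z, aff], with x, y, z in nm."""
    lower = np.asarray(crop_pad)
    upper = np.asarray(seg.shape) - lower
    sums = defaultdict(lambda: [0.0, 0])  # (pair, voxel) -> [affinity sum, face count]
    for axis in range(3):
        lo, hi = [slice(None)] * 3, [slice(None)] * 3
        lo[axis], hi[axis] = slice(None, -1), slice(1, None)
        a, b = seg[tuple(lo)], seg[tuple(hi)]
        edge_aff = aff[axis][tuple(hi)]
        face = (a != b) & (a != 0) & (b != 0)
        for idx in map(tuple, np.argwhere(face).tolist()):
            # Drop contacts that fall in the padding rather than the kernel.
            if not all(p <= i < u for i, p, u in zip(idx, lower, upper)):
                continue
            pair = tuple(sorted((int(a[idx]), int(b[idx]))))
            entry = sums[(pair, idx)]
            entry[0] += float(edge_aff[idx])
            entry[1] += 1
    contacts = defaultdict(list)
    for (pair, (x, y, z)), (total, n) in sums.items():
        rx, ry, rz = resolution_nm
        # Average over axes when one voxel touches the same pair more than once.
        contacts[pair].append([x * rx, y * ry, z * rz, total / n])
    return {pair: np.array(rows) for pair, rows in contacts.items()}


def filter_pairs(contacts, min_contact_vx=5, max_contact_vx=2048):
    """Step 6: drop pairs with too few or too many contacts."""
    return {pair: rows for pair, rows in contacts.items()
            if min_contact_vx <= len(rows) <= max_contact_vx}
```

With the `seg_to_cc` mapping returned by `filter_segments`, the step 8 label for a pair `(a, b)` reduces to `int(seg_to_cc[a] == seg_to_cc[b])`. The Python loop over faces keeps the sketch readable; a production version would likely vectorise it.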
| 59 | + |
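Step 8 might look like the following for a single pair. Again this is a hedged sketch: `make_sample` is a hypothetical name, mesh points are assumed to be vertex coordinates in nm, and three behaviours the design leaves open are filled in as labelled assumptions (zero padding for contacts, sampling with replacement when too few points survive the crop, and an empty point cloud when none do).

```python
import numpy as np


def make_sample(contacts, points_a, points_b, crop_pad, resolution_nm,
                n_pointcloud_points=2048, max_contact_vx=2048, seed=42):
    """Build the per-pair arrays from (n, 4) contacts and (m, 3) mesh points."""
    xyz, aff = contacts[:, :3], contacts[:, 3]
    # Affinity-weighted centre of mass; plain mean if all weights are zero (assumption).
    com = np.average(xyz, axis=0, weights=aff) if aff.sum() > 0 else xyz.mean(axis=0)
    # Sphere radius: smallest per-axis padding converted to nm.
    radius = float(np.min(np.asarray(crop_pad) * np.asarray(resolution_nm)))
    rng = np.random.default_rng(seed)

    def crop_and_sample(points):
        inside = points[np.linalg.norm(points - com, axis=1) <= radius]
        if len(inside) == 0:
            return np.empty((0, 3))  # assumption: no points left after cropping
        # Assumption: sample with replacement only when too few points remain.
        replace = len(inside) < n_pointcloud_points
        idx = rng.choice(len(inside), size=n_pointcloud_points, replace=replace)
        return inside[idx]

    # Assumption: pad unused contact rows with zeros.
    padded = np.zeros((max_contact_vx, 4))
    padded[:len(contacts)] = contacts
    return {
        "n_contacts": len(contacts),
        "contacts": padded,
        "pointcloud_a": crop_and_sample(points_a),
        "pointcloud_b": crop_and_sample(points_b),
    }
```

Because `n_contacts` is stored alongside the padded array, a reader can slice `contacts[:n_contacts]` to recover the real rows regardless of the pad value.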
| 60 | +## Parameters |
| 61 | +| Parameter | Default | Description | |
| 62 | +|-----------|---------|-------------| |
| 63 | +| `output_path` | required | Output directory for feather files | |
| 64 | +| `crop_pad` | (0,0,0) | Padding in voxels | |
| 65 | +| `min_seg_size_vx` | 2000 | Min total segment size (voxels); smaller segments are excluded | |
| 66 | +| `min_overlap_vx` | 1000 | Min overlap (voxels) with a proofread CC for that CC to count toward the label | |
| 67 | +| `min_contact_vx` | 5 | Min contacts per pair | |
| 68 | +| `max_contact_vx` | 2048 | Max contacts (array size) | |
| 69 | +| `n_pointcloud_points` | 2048 | Points per mesh | |
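
One way to carry these parameters in code is a small frozen dataclass. The class name `ContactSampleParams` and the validation rules are assumptions added for illustration; only the field names and defaults come from the table above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContactSampleParams:
    """Hypothetical container for the parameters in the table above."""
    output_path: str
    crop_pad: Tuple[int, int, int] = (0, 0, 0)
    min_seg_size_vx: int = 2000
    min_overlap_vx: int = 1000
    min_contact_vx: int = 5
    max_contact_vx: int = 2048
    n_pointcloud_points: int = 2048

    def __post_init__(self):
        # Sanity checks (assumed, not specified by the design).
        if len(self.crop_pad) != 3 or any(p < 0 for p in self.crop_pad):
            raise ValueError("crop_pad must be three non-negative voxel counts")
        if self.min_contact_vx > self.max_contact_vx:
            raise ValueError("min_contact_vx must not exceed max_contact_vx")
```

Note that with the default `crop_pad` of `(0, 0, 0)` the crop radius in step 8 is 0 nm, so callers would normally set a non-zero padding.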