Block-matching motion estimation for image sequences. Takes a source/reference image and a target image, splits the target into NxN blocks, and finds the best matching block for each in the source using a three-step search. Outputs motion vectors and residuals; can reconstruct the target from source + vectors + residuals.
Video codecs (MPEG-1, H.261, H.264) avoid storing every frame independently. Instead they encode the difference between a frame and a prediction from a previously-decoded frame. This prediction is improved by compensating for motion — each block in the current frame is predicted by shifting a block from the reference frame by some (dx, dy) called a motion vector. The difference after compensation is the residual.
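The relationship between prediction, motion vector, and residual can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the helper names (`predict_block`, `residual`, `reconstruct`) are hypothetical, frames are nested lists of grey values, and the sign convention (the block at `(top, left)` is predicted from the reference block at `(top + dy, left + dx)`) is an assumption.

```python
def predict_block(reference, top, left, dx, dy, size):
    """Copy the size x size reference block displaced by the motion vector (dx, dy).

    Assumed convention: the prediction for the block at (top, left)
    is read from (top + dy, left + dx) in the reference frame.
    """
    return [row[left + dx:left + dx + size]
            for row in reference[top + dy:top + dy + size]]


def residual(target_block, predicted_block):
    """Per-pixel difference that remains after motion compensation."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target_block, predicted_block)]


def reconstruct(predicted_block, residual_block):
    """Invert the residual step: prediction + residual gives the target block."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted_block, residual_block)]


# A 4x4 reference frame (synthetic values).
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
# The target block at (0, 0) shows the reference content from one pixel to the
# right, with one pixel slightly brighter.
target_block = [[20, 30], [60, 72]]

predicted = predict_block(reference, top=0, left=0, dx=1, dy=0, size=2)
leftover = residual(target_block, predicted)
print(leftover)                          # [[0, 0], [0, 2]]
print(reconstruct(predicted, leftover))  # [[20, 30], [60, 72]]
```

Only the vector `(1, 0)` and the mostly-zero residual need to be stored to recover the target block exactly.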
This project implements that motion estimation step for two images:
- Pad both images so dimensions are divisible by `block_size`
- For each `block_size`×`block_size` block in the target:
  - Extract a search window (±`search_padding` px) from the source
  - Convert both to YCrCb; matching is done on the Y (luminance) channel only since the human eye is less sensitive to colour detail
  - Run a three-step search to find the best-matching block:
    - Start at the centre of the search window, check 9 positions spaced by `step`
    - Move the centre to the best match, halve `step`, repeat until `step < 1`
  - Compute the motion vector and the residual (pixel differences)
  - If MSE < threshold, the match is good enough: store zero residual instead
- Reconstruct: for each block, take source block at motion-vector offset, add its residual
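The search loop from the steps above can be sketched as follows. This is a simplified illustration in plain Python, not the code in `block_matcher.py`: the function names, the nested-list image format, the initial step of `(search_padding + 1) // 2`, and searching directly in the full source frame (rather than in an extracted window) are all assumptions.

```python
def get_block(image, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def block_mse(block_a, block_b):
    """Mean squared error between two equally sized blocks."""
    count = len(block_a) * len(block_a[0])
    total = sum((a - b) ** 2
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    return total / count


def three_step_search(source, target_block, top, left, search_padding=7):
    """Return ((dx, dy), mse) for the target block located at (top, left)."""
    size = len(target_block)
    height, width = len(source), len(source[0])
    step = max(1, (search_padding + 1) // 2)  # assumed starting step
    centre_x, centre_y = left, top            # start at zero motion
    best_err = block_mse(get_block(source, top, left, size), target_block)

    while step >= 1:
        best_x, best_y = centre_x, centre_y
        # Check the 3x3 grid of candidates spaced by `step` around the centre.
        for off_y in (-step, 0, step):
            for off_x in (-step, 0, step):
                x, y = centre_x + off_x, centre_y + off_y
                inside_frame = (0 <= x <= width - size
                                and 0 <= y <= height - size)
                inside_window = (abs(x - left) <= search_padding
                                 and abs(y - top) <= search_padding)
                if not (inside_frame and inside_window):
                    continue
                err = block_mse(get_block(source, y, x, size), target_block)
                if err < best_err:
                    best_err, best_x, best_y = err, x, y
        centre_x, centre_y = best_x, best_y  # move the centre to the best match
        step //= 2                           # halve the step; stop once it is < 1
    return (centre_x - left, centre_y - top), best_err


# Synthetic smooth 24x24 image (values are not 8-bit pixels).
source = [[x * x + 3 * y * y for x in range(24)] for y in range(24)]
# The target block at (top=8, left=8) holds source content from (top=4, left=12).
target_block = get_block(source, 4, 12, 4)
vector, err = three_step_search(source, target_block, top=8, left=8)
print(vector, err)  # (4, -4) 0.0
```

The demo displacement was chosen to coincide with one of the first-round candidates, so the exact match is found. For other displacements the search may settle on a nearby local minimum.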
The three-step search evaluates O(log n) candidate positions per block, where n is the search range in pixels, instead of the O(n²) positions an exhaustive search needs. It does not guarantee the global minimum MSE, but it works well for typical motion magnitudes.
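To put numbers on that comparison, the small sketch below counts candidate evaluations for both strategies. It assumes 9 positions are checked per round (the centre is re-checked each time) and that the step starts at a power of two. Both function names are hypothetical.

```python
def three_step_positions(initial_step):
    """Positions checked when the step is halved until it drops below 1."""
    count, step = 0, initial_step
    while step >= 1:
        count += 9   # 3x3 grid of candidates per round
        step //= 2
    return count


def exhaustive_positions(search_range):
    """Every integer offset in [-search_range, +search_range] on both axes."""
    return (2 * search_range + 1) ** 2


# Steps 4, 2, 1 can reach offsets up to 4 + 2 + 1 = 7 pixels.
print(three_step_positions(4), exhaustive_positions(7))   # 27 225
# Steps 8, 4, 2, 1 can reach offsets up to 15 pixels.
print(three_step_positions(8), exhaustive_positions(15))  # 36 961
```

Doubling the search range adds one round of 9 checks to the three-step search but roughly quadruples the exhaustive count.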
| Mode | Match metric | Residual stored on | Use case |
|---|---|---|---|
| `ycbcr` (default) | Y-channel MSE | YCrCb difference | Standard, good compression |
| `bgr` | Y-channel MSE | BGR difference | Full-color reconstruction |
| `y_only` | Y-channel MSE | Y-channel only | Grayscale, most efficient |
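All three modes share the same match metric, so a rough sketch of Y-channel MSE may help. It uses the BT.601 luma weights that OpenCV documents for its BGR to YCrCb conversion, but keeps the values as floats instead of rounding to 8 bits. The function names are hypothetical and this is not the project's `mse.py`.

```python
def luma(bgr_pixel):
    """Approximate Y (luminance) of one pixel given in BGR order."""
    blue, green, red = bgr_pixel
    return 0.299 * red + 0.587 * green + 0.114 * blue


def y_channel_mse(block_a, block_b):
    """Mean squared error between two BGR blocks, measured on luma only."""
    errors = [(luma(p) - luma(q)) ** 2
              for row_a, row_b in zip(block_a, block_b)
              for p, q in zip(row_a, row_b)]
    return sum(errors) / len(errors)


# Two flat 2x2 grey blocks that differ by 10 levels in every channel.
grey = [[(128, 128, 128)] * 2 for _ in range(2)]
brighter = [[(138, 138, 138)] * 2 for _ in range(2)]
print(round(y_channel_mse(grey, brighter), 6))  # 100.0
```

Because the weights sum to 1, a uniform brightness change of 10 levels gives a luma difference of 10 and an MSE of about 100.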
```bash
uv run python main.py
```

Drag & drop images onto the drop zones, or click to browse. Configure block size, threshold, and mode in the controls bar.
```bash
# Single frame pair
uv run python -m visioncompressor.cli single source.png target.png -o output/

# Batch process numbered frames (0.png, 1.png, 2.png ...)
uv run python -m visioncompressor.cli batch input_folder/ -o output/

# Create video from reconstructed frames
uv run python -m visioncompressor.cli video output/reconstructed/ -o video.avi

# Options
uv run python -m visioncompressor.cli single --mode bgr --block-size 16 --threshold 24 a.png b.png
```

```
VisionCompressor/
├── main.py                      # GUI entry point
├── visioncompressor/
│   ├── cli.py                   # CLI entry point
│   ├── core/
│   │   ├── block_matcher.py     # Three-step search, MSE matching
│   │   ├── reconstructor.py     # Frame reconstruction, batch, video
│   │   └── types.py             # Default constants
│   ├── gui/
│   │   ├── main_window.py       # PyQt6 GUI
│   │   └── workers.py           # QThread workers
│   └── utils/
│       └── mse.py               # Mean squared error
└── tests/
    └── test_block_matcher.py
```
Requires Python ≥ 3.13 and uv.
```bash
git clone https://github.com/YounesBensafia/VisionCompressor.git
cd VisionCompressor
uv sync
```

This creates a virtual environment and installs all dependencies (opencv-python, matplotlib, PyQt6) pinned from `uv.lock`. All commands in this README should be run with `uv run python ...` or after `source .venv/bin/activate`.