Discovering hidden AI watermark patterns through signal analysis
This project reverse-engineers AI watermarking technologies by analyzing AI-generated and AI-edited images. We use signal processing techniques to discover watermark structures without access to proprietary neural network encoders/decoders.
| Analysis | Images | Detection Rate | Key Finding |
|---|---|---|---|
| Nano-150k Investigation | 123,268 | 99.9% | Multi-layer frequency + spatial watermarking |
| SynthID Analysis | 250 | 84% | Spread-spectrum phase encoding |
Analysis of 123,268 AI-edited image pairs from the Nano-150k dataset to detect and characterize embedded watermarks.
AI-edited images contain multi-layer watermarks using both frequency domain (DCT/DFT) and spatial domain (color shifts) embedding techniques. The watermarks are invisible to humans but detectable via statistical analysis.
| Metric | Rate | Description |
|---|---|---|
| Frequency Domain Modifications | 100.0% | All images show spectral changes |
| Significant Color Shifts | 95.3% | Mean shift > 1.0 (0-255 scale) in at least one RGB channel |
| Perceptual Hash Changes | 66.0% | Invisible modifications detected |
| LSB Anomalies | 10.2% | Anomalous least significant bit (LSB) distributions |
| 2+ Watermark Indicators | 99.9% | Multi-layer evidence |
| 3+ Watermark Indicators | 69.2% | Strong multi-layer evidence |

Watermark indicator distribution (123,268 pairs):

```
0 indicators:       0 ( 0.0%)
1 indicator:      122 ( 0.1%)
2 indicators:  37,832 (30.7%) ███████████████
3 indicators:  74,525 (60.5%) ██████████████████████████████
4 indicators:  10,789 ( 8.8%) ████
```
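The "2+" and "3+" rates in the summary table follow directly from the indicator counts above. A quick arithmetic check (counts copied from the distribution; nothing else is assumed):

```python
# Indicator-count histogram from the Nano-150k run
counts = {0: 0, 1: 122, 2: 37_832, 3: 74_525, 4: 10_789}
total = sum(counts.values())  # number of image pairs

for k, n in counts.items():
    print(f"{k} indicators: {n:>7,} ({100 * n / total:4.1f}%)")

# Cumulative rates reported in the summary table
at_least_2 = sum(n for k, n in counts.items() if k >= 2) / total
at_least_3 = sum(n for k, n in counts.items() if k >= 3) / total
print(f"2+ indicators: {100 * at_least_2:.1f}%")  # 99.9%
print(f"3+ indicators: {100 * at_least_3:.1f}%")  # 69.2%
```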
| Category | Image Pairs | Avg Freq Diff | Watermark Strength |
|---|---|---|---|
| hairstyle | 16,012 | 1.786 | High |
| sweet_headshot | 16,008 | 1.759 | High |
| black_headshot | 17,700 | 1.735 | High |
| background | 32,765 | 1.037 | Medium |
| time-change | 18,178 | 1.028 | Medium |
| action | 22,605 | 1.013 | Medium |

Processing statistics:

- Total Processing Time: 170.2 minutes
- Processing Rate: 12.1 pairs/second
- Success Rate: 100% (0 failed loads)
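The per-category numbers are consistent with the summary figures elsewhere in this README. Assuming "Avg Freq Diff" is a per-pair mean, weighting it by pair count reproduces the ~1.32 mean frequency difference listed under Signal Strength, and the processing rate follows from the run time. A small sketch of that arithmetic:

```python
# (pairs, avg_freq_diff) per category, copied from the category table
categories = {
    "hairstyle": (16_012, 1.786),
    "sweet_headshot": (16_008, 1.759),
    "black_headshot": (17_700, 1.735),
    "background": (32_765, 1.037),
    "time-change": (18_178, 1.028),
    "action": (22_605, 1.013),
}

total_pairs = sum(n for n, _ in categories.values())
# Pair-weighted mean of the per-category averages (assumes they are per-pair means)
weighted_mean = sum(n * d for n, d in categories.values()) / total_pairs

minutes = 170.2
rate = total_pairs / (minutes * 60)  # pairs per second

print(total_pairs)              # 123268
print(round(weighted_mean, 2))  # 1.32
print(round(rate, 1))           # 12.1
```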
Analysis of 250 AI-generated images from Google Gemini to reverse-engineer SynthID watermarking.
SynthID uses spread-spectrum phase encoding in the frequency domain, not LSB replacement or simple noise addition. The watermark embeds information through precise phase relationships at specific carrier frequencies.
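To make "phase relationships at specific carrier frequencies" concrete, here is a toy sketch, not SynthID's actual embedder: it adds a faint cosine at one frequency bin, (14, 14) borrowed from the carrier table, with a chosen phase, then reads that phase back from the 2D FFT. The image size, noise level and amplitude are assumptions; the amplitude is picked from the ~0.1-0.15 pixel-value range reported for SynthID in this README.

```python
import numpy as np

N = 512  # assumed image size
rng = np.random.default_rng(0)

# Near-flat bright image (cf. the pure_white dataset) with mild sensor-like noise
image = 240.0 + rng.normal(0.0, 1.0, size=(N, N))

# Toy carrier: faint 2D cosine at FFT bin (u, v) = (14, 14) with a chosen phase
u, v = 14, 14
phase = 1.0        # radians: the "payload" of this toy watermark
amplitude = 0.15   # pixel values
y, x = np.mgrid[0:N, 0:N]
carrier = amplitude * np.cos(2 * np.pi * (u * y + v * x) / N + phase)
watermarked = image + carrier  # kept as float; no 8-bit quantization here

# The angle of FFT bin (u, v) recovers the embedded phase (up to noise)
spectrum = np.fft.fft2(watermarked)
recovered = np.angle(spectrum[u, v])
print(f"embedded {phase:.3f} rad, recovered {recovered:.3f} rad")
```

The carrier is invisible in pixel space (well under one gray level), yet it concentrates into a single FFT bin, which is why a phase measured there is stable even in the presence of noise.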
| Carrier Frequency | Phase Coherence | Description |
|---|---|---|
| (±14, ±14) | 99.99% | Primary diagonal carrier |
| (±126, ±14) | 99.97% | Secondary horizontal |
| (±98, ±14) | 99.94% | Tertiary carrier |
| (±128, ±128) | 99.92% | Center frequency |
| (±210, ±14) | 99.77% | Extended carrier |
| (±238, ±14) | 99.71% | Edge carrier |
- Noise Correlation: ~0.218 between watermarked images
- Structure Ratio: ~1.32
- Detection Threshold: correlation > 0.179
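The README does not define "phase coherence". A common definition, assumed here, is the mean resultant length |mean(exp(iφ))| of the FFT phase at one bin across many images. The sketch below applies it to synthetic images that all share one faint carrier; the function name, image size and carrier parameters are hypothetical.

```python
import numpy as np


def phase_coherence(images, u, v):
    """Mean resultant length of the FFT phase at bin (u, v) across images.

    1.0 means every image has exactly the same phase there; independent
    random phases give values on the order of 1/sqrt(len(images)).
    """
    phases = np.array([np.angle(np.fft.fft2(img)[u, v]) for img in images])
    return float(np.abs(np.mean(np.exp(1j * phases))))


rng = np.random.default_rng(1)
N, n_images = 128, 250  # 250 matches the Gemini sample size; N is assumed
y, x = np.mgrid[0:N, 0:N]
# The same faint carrier (phase 0.7 rad at bin (14, 14)) is added to every image
carrier = 0.15 * np.cos(2 * np.pi * (14 * y + 14 * x) / N + 0.7)
images = [240.0 + rng.normal(0.0, 1.0, (N, N)) + carrier for _ in range(n_images)]

print(phase_coherence(images, 14, 14))  # close to 1: phase-locked carrier
print(phase_coherence(images, 37, 5))   # small: this bin holds only noise
```

For a real-valued image, the bin at (-u, -v) is the complex conjugate of the bin at (u, v), so every carrier necessarily appears as a mirrored pair with negated phase.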

Project structure:

```
reverse-SynthID/
├── README.md                              # This file
├── requirements.txt                       # Python dependencies
│
├── watermark_investigation/               # Nano-150k Analysis (NEW)
│   ├── WATERMARK_EXTRACTED.png            # Final extracted watermark
│   ├── WATERMARK_FINAL_ANALYSIS.png       # Comprehensive visualization
│   ├── WATERMARK_enhanced_difference.png  # Enhanced pattern
│   ├── WATERMARK_frequency_spectrum.png   # Frequency domain
│   ├── WATERMARK_signed_pattern.png       # Signed watermark
│   ├── watermark_FULL_123k_results.json   # Complete results
│   ├── watermark_evidence/                # Visual evidence
│   └── *.py                               # Analysis scripts
│
├── src/
│   ├── analysis/
│   │   ├── synthid_codebook_finder.py     # Pattern discovery
│   │   └── deep_synthid_analysis.py       # Frequency analysis
│   └── extraction/
│       └── synthid_codebook_extractor.py  # Codebook extraction & detection
│
├── artifacts/
│   ├── codebook/
│   │   ├── synthid_codebook.pkl           # Extracted codebook (9 MB)
│   │   └── synthid_codebook_meta.json     # Carrier frequencies
│   └── visualizations/                    # Watermark images
│
├── data/
│   └── pure_white/                        # 250 Gemini AI images
│
├── docs/
│   └── SYNTHID_CODEBOOK_ANALYSIS.md       # Technical documentation
│
└── assets/
    └── synthid-watermark.jpeg             # Cover image
```

Installation:

```bash
git clone https://github.com/yourusername/reverse-SynthID.git
cd reverse-SynthID

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Nano-150k analysis:

```bash
# Full analysis on all 123k pairs (takes ~3 hours)
python watermark_investigation/watermark_full_123k_analysis.py

# Extract final watermark visualization
python watermark_investigation/extract_final_watermark.py

# Quick sample analysis (1,000 pairs)
python watermark_investigation/watermark_full_analysis.py
```

SynthID detection:

```bash
python src/extraction/synthid_codebook_extractor.py detect "path/to/image.png" \
  --codebook "artifacts/codebook/synthid_codebook.pkl"
```

Output:

```
Detection Results:
Watermarked: True
Confidence: 1.0000
Correlation: 0.5355
Phase Match: 0.9571
Structure Ratio: 1.2753
```
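As a sanity check, the reported metrics clear all three thresholds used in the SynthID detection pseudocode further down (correlation > 0.179, phase match > 0.5, 0.8 < structure ratio < 1.8). The function name below is hypothetical, and the Confidence value is not reproduced because its formula is not documented in this README.

```python
def passes_thresholds(correlation, phase_match, structure_ratio):
    """Decision rule copied from the SynthID detection pseudocode in this README."""
    return (
        correlation > 0.179
        and phase_match > 0.5
        and 0.8 < structure_ratio < 1.8
    )


# Metrics from the example output above
print(passes_thresholds(0.5355, 0.9571, 1.2753))  # True
# A hypothetical image whose correlation falls below 0.179 is not flagged
print(passes_thresholds(0.150, 0.9571, 1.2753))   # False
```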

Codebook extraction:

```bash
python src/extraction/synthid_codebook_extractor.py extract "data/pure_white/" \
  --output "./my_codebook.pkl"
```

Analysis scripts:

```bash
# Comprehensive pattern discovery
python src/analysis/synthid_codebook_finder.py

# Deep frequency analysis
python src/analysis/deep_synthid_analysis.py
```

Nano-150k methodology:

- Frequency Domain Analysis: Compute FFT differences between original and edited images
- LSB Pattern Detection: Analyze least significant bit distributions for anomalies
- Color Shift Measurement: Detect systematic RGB channel modifications
- Perceptual Hashing: Compare perceptual hashes to find invisible changes
- Multi-Indicator Scoring: Combine multiple detection methods for confidence
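The helper functions called by the Nano-150k detection pseudocode are not shown in this README. Below is a minimal NumPy sketch of three of them, assuming H×W×3 uint8 RGB arrays. These definitions are guesses at reasonable metrics, not the project's implementation, so their outputs are not calibrated to the thresholds (0.5, 1.0, 0.02) used there.

```python
import numpy as np


def compute_fft_difference(original, edited):
    """Mean absolute difference of log-magnitude spectra of the grayscale images."""
    def log_mag(img):
        gray = img.astype(np.float64).mean(axis=2)  # average RGB to grayscale
        return np.log1p(np.abs(np.fft.fft2(gray)))
    return float(np.mean(np.abs(log_mag(original) - log_mag(edited))))


def compute_color_shift(original, edited):
    """Per-channel mean shift (edited minus original) in 0-255 pixel units."""
    diff = edited.astype(np.float64) - original.astype(np.float64)
    return diff.mean(axis=(0, 1))  # one value per R, G, B channel


def compute_lsb_deviation(image):
    """Per-channel |fraction of LSBs equal to 1 - 0.5|; often near 0 for noisy images."""
    lsb = image.astype(np.uint8) & 1
    return np.abs(lsb.mean(axis=(0, 1)) - 0.5)


rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
# Simulated edit: brighten every channel by 2 gray levels (clipped at 255)
edited = np.clip(original.astype(np.int16) + 2, 0, 255).astype(np.uint8)

print(compute_fft_difference(original, edited))
print(compute_color_shift(original, edited))  # just under 2.0 per channel due to clipping
print(compute_lsb_deviation(edited))
```

The fourth indicator, perceptual-hash distance, is omitted to keep the sketch NumPy-only; the third-party `imagehash` package's `imagehash.phash` is one option, where subtracting two hashes gives their Hamming distance.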

SynthID methodology:

- Pattern Discovery: Analyze noise patterns across multiple images to find consistent structures
- Frequency Analysis: Use the FFT to identify carrier frequencies with phase modulation
- Phase Coherence: Measure phase consistency at carrier frequencies
- Codebook Extraction: Build reference patterns from averaged signals
- Detection: Compare a test image against the codebook using correlation metrics
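The codebook-extraction and detection steps can be sketched with NumPy under several assumptions, all hypothetical: a 3×3 box blur stands in for the unspecified denoiser, the "reference" is simply the mean noise residual of the training images, and correlation is the normalized (Pearson) correlation of residuals. The toy watermark's amplitude (2.0 gray levels) is deliberately exaggerated so a small demo separates clearly; none of the numbers are comparable to the README's 0.179 threshold.

```python
import numpy as np


def box_blur(img, k=3):
    """k x k mean filter used as a stand-in denoiser (edges wrap around)."""
    out = np.zeros_like(img, dtype=np.float64)
    r = k // 2
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out / (k * k)


def noise_residual(img):
    """Noise pattern = image minus its denoised version."""
    img = img.astype(np.float64)
    return img - box_blur(img)


def build_reference(images):
    """Toy codebook reference: average residual, so shared structure survives."""
    return np.mean([noise_residual(im) for im in images], axis=0)


def normalized_correlation(a, b):
    """Pearson correlation between two equally shaped arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(2)
N = 128
y, x = np.mgrid[0:N, 0:N]
pattern = 2.0 * np.cos(2 * np.pi * (14 * y + 14 * x) / N + 0.7)  # toy watermark

train = [240.0 + rng.normal(0, 1, (N, N)) + pattern for _ in range(100)]
reference = build_reference(train)

marked = 240.0 + rng.normal(0, 1, (N, N)) + pattern
clean = 240.0 + rng.normal(0, 1, (N, N))
score_marked = normalized_correlation(noise_residual(marked), reference)
score_clean = normalized_correlation(noise_residual(clean), reference)
print(score_marked, score_clean)  # watermarked image correlates clearly higher
```

Averaging residuals across many images suppresses image-specific noise roughly as 1/sqrt(n) while the shared pattern stays put, which is the intuition behind building a reference from averaged signals.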

Nano-150k watermark characteristics:

- Embedding Domains: Frequency (DCT/DFT) + spatial (color shifts)
- Detection Methods: FFT analysis, LSB statistics, perceptual hashing
- Signal Strength: Mean frequency difference ~1.32; color shifts of 32-35 pixel values
- Robustness: Survives JPEG compression; consistent across edit types
- Categories Analyzed: background, action, time-change, headshot (sweet_headshot, black_headshot), hairstyle

SynthID watermark characteristics:

- Embedding Domain: Frequency (FFT phase)
- Signal Strength: ~0.1-0.15 pixel values
- Carrier Count: 100+ frequency locations
- Robustness: Survives moderate compression

Nano-150k Multi-Indicator Detection:

```python
def detect_watermark(original, edited):
    # The compute_* helpers are not shown in this README
    indicators = 0

    # 1. Frequency domain analysis
    freq_diff = compute_fft_difference(original, edited)
    if freq_diff > 0.5:
        indicators += 1

    # 2. Color shift detection (any RGB channel)
    color_shift = compute_color_shift(original, edited)
    if any(abs(shift) > 1.0 for shift in color_shift):
        indicators += 1

    # 3. LSB anomaly detection
    lsb_deviation = compute_lsb_deviation(edited)
    if any(dev > 0.02 for dev in lsb_deviation):
        indicators += 1

    # 4. Perceptual hash comparison: changed, but not drastically
    phash_dist = compute_phash_distance(original, edited)
    if 5 < phash_dist <= 30:
        indicators += 1

    # Two or more indicators count as a watermark detection
    return indicators >= 2, indicators
```

SynthID Detection:
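```python
import numpy as np

def detect_synthid(image, codebook):
    # denoise, check_phases and correlate are not shown in this README;
    # compute_structure_ratio and compute_confidence are hypothetical
    # placeholder names for steps the original sketch left undefined.

    # 1. Extract noise pattern (image minus its denoised version)
    noise = image - denoise(image)

    # 2. Check carrier phase coherence
    fft = np.fft.fft2(noise)
    phase_match = check_phases(fft, codebook.carriers)

    # 3. Correlate with reference
    correlation = correlate(noise, codebook.reference)
    structure_ratio = compute_structure_ratio(noise)  # hypothetical helper

    # 4. Apply decision thresholds
    is_watermarked = (
        correlation > 0.179 and
        phase_match > 0.5 and
        0.8 < structure_ratio < 1.8
    )
    confidence = compute_confidence(correlation, phase_match, structure_ratio)  # hypothetical helper
    return is_watermarked, confidence
```

References:

- SynthID: Identifying AI-generated images
- arXiv paper: SynthID-Image: Image watermarking at internet scale

This project is for research and educational purposes only. SynthID is proprietary technology owned by Google DeepMind. The extracted patterns and detection methods are intended for: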
- Academic research on watermarking techniques
- Security analysis of AI-generated content identification
- Understanding spread-spectrum encoding methods
Research and educational use only. See LICENSE for details.
Made with 🔬 by reverse engineering enthusiasts








