aloshdenny/reverse-SynthID


SynthID Watermark Analysis

🔍 AI Watermark Reverse Engineering

Discovering hidden AI watermark patterns through signal analysis


🎯 Overview

This project reverse-engineers AI watermarking technologies by analyzing AI-generated and AI-edited images. We use signal processing techniques to discover watermark structures without access to proprietary neural network encoders/decoders.

Projects

| Analysis | Images | Detection Rate | Key Finding |
| --- | --- | --- | --- |
| Nano-150k Investigation | 123,268 | 99.9% | Multi-layer frequency + spatial watermarking |
| SynthID Analysis | 250 | 84% | Spread-spectrum phase encoding |

🔬 Nano-150k Watermark Investigation

Analysis of 123,268 AI-edited image pairs from the Nano-150k dataset to detect and characterize embedded watermarks.

Key Discovery

AI-edited images contain multi-layer watermarks using both frequency domain (DCT/DFT) and spatial domain (color shifts) embedding techniques. The watermarks are invisible to humans but detectable via statistical analysis.

Detection Results

| Metric | Rate | Description |
| --- | --- | --- |
| Frequency Domain Modifications | 100.0% | All images show spectral changes |
| Significant Color Shifts | 95.3% | Mean shift > 1.0 in RGB channels |
| Perceptual Hash Changes | 66.0% | Invisible modifications detected |
| LSB Anomalies | 10.2% | Least significant bit patterns |
| 2+ Watermark Indicators | 99.9% | Multi-layer evidence |
| 3+ Watermark Indicators | 69.2% | Strong multi-layer evidence |

Watermark Confidence Distribution

0 indicators:     0 (  0.0%)
1 indicator:    122 (  0.1%)
2 indicators: 37,832 (30.7%) ███████████████
3 indicators: 74,525 (60.5%) ██████████████████████████████
4 indicators: 10,789 ( 8.8%) ████
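
The "2+" and "3+" rates in the Detection Results table follow directly from this distribution. The short sketch below recomputes them from the counts listed above; the helper name `share_at_least` is illustrative and is not part of the repository.

```python
# Counts per number of triggered indicators, copied from the distribution above.
counts = {0: 0, 1: 122, 2: 37_832, 3: 74_525, 4: 10_789}
total = sum(counts.values())

def share_at_least(k: int) -> float:
    """Percentage of image pairs with k or more indicators triggered."""
    return 100.0 * sum(n for i, n in counts.items() if i >= k) / total

print(total)                        # 123268
print(f"{share_at_least(2):.1f}%")  # 99.9%
print(f"{share_at_least(3):.1f}%")  # 69.2%
```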

Extracted Watermark Visualizations

Extracted Watermark Pattern

Comprehensive Analysis

Frequency Spectrum

Enhanced Difference Pattern

Analysis by Edit Category

| Category | Image Pairs | Avg Freq Diff | Watermark Strength |
| --- | --- | --- | --- |
| hairstyle | 16,012 | 1.786 | High |
| sweet_headshot | 16,008 | 1.759 | High |
| black_headshot | 17,700 | 1.735 | High |
| background | 32,765 | 1.037 | Medium |
| time-change | 18,178 | 1.028 | Medium |
| action | 22,605 | 1.013 | Medium |
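
As a consistency check, the per-category averages can be combined into one pair-weighted mean. The sketch below uses only the numbers in the table above; the result lands near the ~1.32 mean frequency difference quoted under Technical Details. Variable names are illustrative.

```python
# (image pairs, average frequency difference) per category, from the table above.
categories = {
    "hairstyle": (16_012, 1.786),
    "sweet_headshot": (16_008, 1.759),
    "black_headshot": (17_700, 1.735),
    "background": (32_765, 1.037),
    "time-change": (18_178, 1.028),
    "action": (22_605, 1.013),
}

total_pairs = sum(n for n, _ in categories.values())
# Weight each category's average by the number of pairs it contains.
weighted_mean = sum(n * d for n, d in categories.values()) / total_pairs

print(total_pairs)              # 123268
print(round(weighted_mean, 2))  # 1.32
```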

Processing Statistics

  • Total Processing Time: 170.2 minutes
  • Processing Rate: 12.1 pairs/second
  • Success Rate: 100% (0 failed loads)

🔬 SynthID (Google Gemini) Analysis

Analysis of 250 AI-generated images from Google Gemini to reverse-engineer SynthID watermarking.

Key Discovery

SynthID uses spread-spectrum phase encoding in the frequency domain, not LSB replacement or simple noise addition. The watermark embeds information through precise phase relationships at specific carrier frequencies.

🔬 Discovered Patterns

| Carrier Frequency | Phase Coherence | Description |
| --- | --- | --- |
| (±14, ±14) | 99.99% | Primary diagonal carrier |
| (±126, ±14) | 99.97% | Secondary horizontal |
| (±98, ±14) | 99.94% | Tertiary carrier |
| (±128, ±128) | 99.92% | Center frequency |
| (±210, ±14) | 99.77% | Extended carrier |
| (±238, ±14) | 99.71% | Edge carrier |
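
One common way to quantify phase coherence at a frequency bin is the length of the mean unit phasor across many images. The sketch below illustrates that measure on synthetic data. It is an assumption about how such a score could be computed, not the repository's actual implementation, and `phase_coherence` is a hypothetical name.

```python
import numpy as np

def phase_coherence(images, carrier):
    """Length of the mean unit phasor at one FFT bin across a stack of images.

    images:  float array of shape (N, H, W)
    carrier: (row, col) index into the 2-D FFT
    Returns a value in [0, 1]; 1 means every image has the same phase there.
    """
    spectra = np.fft.fft2(images, axes=(-2, -1))
    coeffs = spectra[:, carrier[0], carrier[1]]
    phasors = coeffs / (np.abs(coeffs) + 1e-12)  # keep phase, drop magnitude
    return float(np.abs(phasors.mean()))

# Synthetic check: 50 noise images that all share one sinusoid at bin (14, 14).
rng = np.random.default_rng(0)
y, x = np.mgrid[0:64, 0:64]
pattern = np.cos(2 * np.pi * (14 * y + 14 * x) / 64)
stack = rng.normal(size=(50, 64, 64)) + pattern

carrier_score = phase_coherence(stack, (14, 14))  # shared phase, close to 1
control_score = phase_coherence(stack, (5, 9))    # random phases, much lower
print(carrier_score, control_score)
```

A bin that carries no shared signal has effectively random phase in each image, so its score shrinks toward zero as more images are averaged.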

Detection Metrics

  • Noise Correlation: ~0.218 between watermarked images
  • Structure Ratio: ~1.32
  • Detection Threshold: correlation > 0.179

🖼️ Extracted Watermark Visualizations

Enhanced Visualization (500x Amplification)

Frequency Domain Carriers

False Color (HSV Encoding)

Phase Encoding Pattern

📁 Project Structure

reverse-SynthID/
├── 📄 README.md                    # This file
├── 📋 requirements.txt             # Python dependencies
│
├── 🔍 watermark_investigation/     # Nano-150k Analysis (NEW)
│   ├── WATERMARK_EXTRACTED.png           # Final extracted watermark
│   ├── WATERMARK_FINAL_ANALYSIS.png      # Comprehensive visualization
│   ├── WATERMARK_enhanced_difference.png # Enhanced pattern
│   ├── WATERMARK_frequency_spectrum.png  # Frequency domain
│   ├── WATERMARK_signed_pattern.png      # Signed watermark
│   ├── watermark_FULL_123k_results.json  # Complete results
│   ├── watermark_evidence/               # Visual evidence
│   └── *.py                              # Analysis scripts
│
├── 💻 src/
│   ├── analysis/
│   │   ├── synthid_codebook_finder.py    # Pattern discovery
│   │   └── deep_synthid_analysis.py      # Frequency analysis
│   └── extraction/
│       └── synthid_codebook_extractor.py # Codebook extraction & detection
│
├── 🎯 artifacts/
│   ├── codebook/
│   │   ├── synthid_codebook.pkl          # Extracted codebook (9 MB)
│   │   └── synthid_codebook_meta.json    # Carrier frequencies
│   └── visualizations/                   # Watermark images
│
├── 📂 data/
│   └── pure_white/                       # 250 Gemini AI images
│
├── 📚 docs/
│   └── SYNTHID_CODEBOOK_ANALYSIS.md      # Technical documentation
│
└── 🖼️ assets/
    └── synthid-watermark.jpeg            # Cover image

🚀 Quick Start

Installation

git clone https://github.com/aloshdenny/reverse-SynthID.git
cd reverse-SynthID

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Run Nano-150k Watermark Analysis

# Full analysis on all 123k pairs (takes ~3 hours)
python watermark_investigation/watermark_full_123k_analysis.py

# Extract final watermark visualization
python watermark_investigation/extract_final_watermark.py

# Quick sample analysis (1000 pairs)
python watermark_investigation/watermark_full_analysis.py

Detect SynthID Watermark

python src/extraction/synthid_codebook_extractor.py detect "path/to/image.png" \
    --codebook "artifacts/codebook/synthid_codebook.pkl"

Output:

Detection Results:
  Watermarked: True
  Confidence: 1.0000
  Correlation: 0.5355
  Phase Match: 0.9571
  Structure Ratio: 1.2753

Extract New Codebook

python src/extraction/synthid_codebook_extractor.py extract "data/pure_white/" \
    --output "./my_codebook.pkl"

Run Analysis

# Comprehensive pattern discovery
python src/analysis/synthid_codebook_finder.py

# Deep frequency analysis
python src/analysis/deep_synthid_analysis.py

🧠 How It Works

Nano-150k Watermark Detection

  1. Frequency Domain Analysis: Compute FFT differences between original and edited images
  2. LSB Pattern Detection: Analyze least significant bit distributions for anomalies
  3. Color Shift Measurement: Detect systematic RGB channel modifications
  4. Perceptual Hashing: Compare perceptual hashes to find invisible changes
  5. Multi-Indicator Scoring: Combine multiple detection methods for confidence
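
Steps 1 and 3 above can be sketched with NumPy alone. The function names mirror the helpers called in the detection pseudocode later in this README, but the bodies here are assumptions for illustration: the repository's scripts may use a different spectral distance or color statistic.

```python
import numpy as np

def compute_fft_difference(original, edited):
    """Mean absolute difference between log-magnitude spectra (grayscale)."""
    def log_spectrum(img):
        gray = img.astype(np.float64).mean(axis=-1)  # average the RGB channels
        return np.log1p(np.abs(np.fft.fft2(gray)))
    return float(np.mean(np.abs(log_spectrum(original) - log_spectrum(edited))))

def compute_color_shift(original, edited):
    """Per-channel mean of (edited - original), in pixel values."""
    diff = edited.astype(np.float64) - original.astype(np.float64)
    return diff.reshape(-1, diff.shape[-1]).mean(axis=0)

# Synthetic pair: the "edit" shifts R by +2 and B by -3, leaving G untouched.
rng = np.random.default_rng(4)
original = rng.integers(10, 240, size=(32, 32, 3), dtype=np.uint8)
shift = np.array([2, 0, -3])
edited = (original.astype(np.int16) + shift).astype(np.uint8)

same_diff = compute_fft_difference(original, original)
pair_diff = compute_fft_difference(original, edited)
color_shift = compute_color_shift(original, edited)
print(same_diff, pair_diff, color_shift)
```

Both inputs are assumed to be H x W x 3 `uint8` arrays of the same size; real image pairs would need to be loaded and aligned first.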

SynthID Detection

  1. Pattern Discovery: Analyze noise patterns across multiple images to find consistent structures
  2. Frequency Analysis: Use FFT to identify carrier frequencies with phase modulation
  3. Phase Coherence: Measure phase consistency at carrier frequencies
  4. Codebook Extraction: Build reference patterns from averaged signals
  5. Detection: Compare test image against codebook using correlation metrics
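
Steps 1, 2, and 4 above can be illustrated on synthetic data: average the noise residuals of many images so that image content cancels and a shared pattern remains, then read the strongest FFT bins as candidate carriers. This is a minimal sketch under stated assumptions. The box-blur denoiser, the function names, and the test pattern are all hypothetical stand-ins, not the repository's code.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k mean filter with edge padding (stand-in denoiser)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def build_reference(images):
    """Average the noise residuals (image minus blurred image) over a stack."""
    residuals = [img - box_blur(img) for img in images]
    return np.mean(residuals, axis=0)

def top_carriers(reference, n=4):
    """Return the n strongest FFT bins (row, col) of the reference pattern."""
    mag = np.abs(np.fft.fft2(reference))
    mag[0, 0] = 0.0  # ignore the DC term
    flat = np.argsort(mag, axis=None)[::-1][:n]
    return [tuple(int(v) for v in np.unravel_index(i, mag.shape)) for i in flat]

# Synthetic stack: 40 noise images sharing a weak sinusoid at bin (14, 14).
rng = np.random.default_rng(2)
y, x = np.mgrid[0:64, 0:64]
mark = 0.5 * np.cos(2 * np.pi * (14 * y + 14 * x) / 64)
images = [rng.normal(size=(64, 64)) + mark for _ in range(40)]

reference = build_reference(images)
carriers = top_carriers(reference, n=2)
print(carriers)  # the carrier bin and its conjugate mirror
```

Because the input is real-valued, every carrier appears twice in the spectrum, at (r, c) and at its mirror (H - r, W - c), which is consistent with the ± notation used in the carrier table.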

📊 Technical Details

Nano-150k Watermark Characteristics

  • Embedding Domains: Frequency (DCT/DFT) + Spatial (color shifts)
  • Detection Methods: FFT analysis, LSB statistics, perceptual hashing
  • Signal Strength: Mean freq diff ~1.32, color shifts 32-35 pixel values
  • Robustness: Survives JPEG compression, consistent across edit types
  • Categories Analyzed: background, action, time-change, headshot, hairstyle

SynthID Watermark Characteristics

  • Embedding Domain: Frequency (FFT phase)
  • Signal Strength: ~0.1-0.15 pixel values
  • Carrier Count: 100+ frequency locations
  • Robustness: Survives moderate compression

Detection Algorithms

Nano-150k Multi-Indicator Detection:

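def detect_watermark(original, edited):
    """Count how many of the four indicators fire for an (original, edited) pair.

    The compute_* helpers are not defined in this snippet.
    Returns (is_watermarked, indicators); two or more indicators count as a detection.
    """
    indicators = 0

    # 1. Frequency domain analysis
    freq_diff = compute_fft_difference(original, edited)
    if freq_diff > 0.5:
        indicators += 1

    # 2. Color shift detection (one value per RGB channel)
    color_shift = compute_color_shift(original, edited)
    if any(abs(shift) > 1.0 for shift in color_shift):
        indicators += 1

    # 3. LSB anomaly detection (one value per RGB channel)
    lsb_deviation = compute_lsb_deviation(edited)
    if any(dev > 0.02 for dev in lsb_deviation):
        indicators += 1

    # 4. Perceptual hash comparison: changed, but not a completely different image
    phash_dist = compute_phash_distance(original, edited)
    if 5 < phash_dist <= 30:
        indicators += 1

    return indicators >= 2, indicators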
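
The LSB and perceptual-hash helpers called above could look roughly like the following. This is a hedged sketch: the bodies are assumptions, the pHash is a simplified DCT-based variant written with NumPy only, and the repository may instead rely on an image-hashing library.

```python
import numpy as np

def compute_lsb_deviation(img):
    """Per-channel |mean(LSB) - 0.5|; near 0 for natural-looking bit noise."""
    lsb = img & 1
    return [abs(float(lsb[..., c].mean()) - 0.5) for c in range(img.shape[-1])]

def _shrink(gray, size=32):
    """Block-average a grayscale image to size x size (crops any remainder)."""
    bh, bw = gray.shape[0] // size, gray.shape[1] // size
    cropped = gray[:bh * size, :bw * size]
    return cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))

def _dct_matrix(n):
    """Unnormalized DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * i + 1) * k / (2 * n))

def phash_bits(img, size=32, keep=8):
    """64-bit hash: low-frequency DCT coefficients thresholded at their median."""
    gray = img.astype(np.float64).mean(axis=-1)
    d = _dct_matrix(size)
    coeffs = d @ _shrink(gray, size) @ d.T
    low = coeffs[:keep, :keep]
    return (low > np.median(low)).ravel()

def compute_phash_distance(original, edited):
    """Hamming distance between the two perceptual hashes (0 to 64)."""
    return int(np.count_nonzero(phash_bits(original) != phash_bits(edited)))

rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
even_only = original & 0xFE  # clear every least significant bit

random_dev = compute_lsb_deviation(original)
cleared_dev = compute_lsb_deviation(even_only)
self_distance = compute_phash_distance(original, original)
print(random_dev, cleared_dev, self_distance)
```

Inputs are assumed to be H x W x 3 `uint8` arrays at least 32 pixels on each side.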

SynthID Detection:

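def detect_synthid(image, codebook):
    # denoise, fft2, check_phases, correlate, compute_structure_ratio and
    # compute_confidence are placeholder helpers; this snippet does not define them.

    # 1. Extract noise pattern
    noise = image - denoise(image)

    # 2. Check carrier phase coherence
    spectrum = fft2(noise)
    phase_match = check_phases(spectrum, codebook.carriers)

    # 3. Correlate with reference
    correlation = correlate(noise, codebook.reference)

    # 4. Measure the structure ratio (hypothetical helper name)
    structure_ratio = compute_structure_ratio(noise)

    # 5. Apply decision thresholds
    is_watermarked = (
        correlation > 0.179 and
        phase_match > 0.5 and
        0.8 < structure_ratio < 1.8
    )

    # 6. Summarize the evidence as a confidence score (hypothetical helper name)
    confidence = compute_confidence(correlation, phase_match, structure_ratio)

    return is_watermarked, confidence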
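
The correlation and phase checks can be made concrete on synthetic data. The sketch below is an illustration under stated assumptions: it takes an already extracted noise residual, derives the expected carrier phases from the reference pattern, and reuses the 0.179 and 0.5 thresholds from this README. All function names are hypothetical, and the structure ratio is left out because its definition is not given here.

```python
import numpy as np

def normalized_correlation(a, b):
    """Pearson-style correlation between two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def phase_match(noise, carriers, expected_phases, tol=np.pi / 4):
    """Fraction of carrier bins whose phase is within tol of the expected one."""
    spectrum = np.fft.fft2(noise)
    hits = 0
    for (r, c), expected in zip(carriers, expected_phases):
        delta = np.angle(spectrum[r, c] * np.exp(-1j * expected))  # wrapped difference
        hits += int(abs(delta) < tol)
    return hits / len(carriers)

def detect(noise, reference, carriers, corr_threshold=0.179, phase_threshold=0.5):
    """Return (is_watermarked, correlation, phase_match) for one noise residual."""
    ref_spectrum = np.fft.fft2(reference)
    expected = [np.angle(ref_spectrum[r, c]) for r, c in carriers]
    corr = normalized_correlation(noise, reference)
    match = phase_match(noise, carriers, expected)
    return bool(corr > corr_threshold and match > phase_threshold), corr, match

# Synthetic reference with two carriers, then one marked and one clean residual.
rng = np.random.default_rng(3)
y, x = np.mgrid[0:64, 0:64]
reference = (np.cos(2 * np.pi * (14 * y + 14 * x) / 64)
             + np.cos(2 * np.pi * (14 * y + 30 * x) / 64 + 1.0))
carriers = [(14, 14), (14, 30)]
marked = 0.5 * reference + rng.normal(size=(64, 64))
clean = rng.normal(size=(64, 64))

marked_result = detect(marked, reference, carriers)
clean_result = detect(clean, reference, carriers)
print(marked_result, clean_result)
```

Requiring both conditions makes the decision stricter than either test alone: a clean residual can match a few phases by chance, but it is unlikely to also correlate with the full reference pattern.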

⚠️ Disclaimer

This project is for research and educational purposes only. SynthID is proprietary technology owned by Google DeepMind. The extracted patterns and detection methods are intended for:

  • Academic research on watermarking techniques
  • Security analysis of AI-generated content identification
  • Understanding spread-spectrum encoding methods

📄 License

Research and educational use only. See LICENSE for details.


Made with 🔬 by reverse engineering enthusiasts
