This repository was archived by the owner on Apr 27, 2026. It is now read-only.

jakedev796/szurubooru-scripts


Repository Archived

This repository has been archived and is no longer actively maintained.

All future development, updates, bug fixes, and new releases will be handled in the new repository:

Szurubooru Media Manager v3.0

A Python script for automating media upload and AI auto-tagging for Szurubooru image boards.

Performance Highlights

  • 10-20x Faster: Process 100k images in hours, not days
  • 15-25 files/sec: Upload speed (vs. ~1 file/sec with traditional sequential uploading)
  • 8-15 files/sec: AI tagging with GPU batching
  • Parallel Architecture: True concurrent processing

Important change logs and feature updates are documented at the bottom of this README.

Expected Performance

| Hardware Setup      | Upload Rate     | Tagging Rate    | Overall Rate    |
|---------------------|-----------------|-----------------|-----------------|
| i9 + RTX 4080 Super | 20-25 files/sec | 12-15 files/sec | 15-20 files/sec |
| i7 + RTX 3080       | 15-20 files/sec | 8-12 files/sec  | 10-15 files/sec |
| i5 + RTX 3060       | 10-15 files/sec | 6-10 files/sec  | 8-12 files/sec  |
| CPU Only            | 8-12 files/sec  | 2-4 files/sec   | 5-8 files/sec   |
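To turn the table into wall-clock time, divide the file count by the overall rate. A minimal sketch; `estimate_hours` is a hypothetical helper, not part of the script, and the rates plugged in are the table's own figures:

```python
def estimate_hours(num_files: int, files_per_sec: float) -> float:
    """Return the wall-clock hours needed to process num_files at a steady rate."""
    if files_per_sec <= 0:
        raise ValueError("files_per_sec must be positive")
    return num_files / files_per_sec / 3600


if __name__ == "__main__":
    # 100k images at the i7 + RTX 3080 overall rate (10-15 files/sec)
    # versus the ~1 file/sec sequential baseline.
    for label, rate in [("slow end", 10), ("fast end", 15), ("sequential", 1)]:
        print(f"{label:>10}: {estimate_hours(100_000, rate):5.1f} h")
```

At 10-15 files/sec, 100,000 images take roughly 1.9 to 2.8 hours, compared with about 27.8 hours at 1 file/sec.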

Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended for maximum performance)
  • Szurubooru instance running and accessible
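The first two prerequisites can be checked from Python before installing anything else. A hypothetical helper (`check_prerequisites` is not part of the repository); PyTorch is treated as optional so the check still runs before `pip install`:

```python
import importlib.util
import sys


def check_prerequisites() -> dict:
    """Report whether the local Python and (optional) PyTorch meet the listed prerequisites."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch installed

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

Whether the Szurubooru instance is reachable is covered separately by `--test-connection`.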

Quick Install

# Clone the repository
git clone https://github.com/jakedev796/szurubooru-scripts.git
cd szurubooru-scripts

# Install dependencies
pip install -r requirements.txt

# For maximum GPU performance (recommended)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# If you encounter PyTorch/transformers compatibility issues, try:
# pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/cu118

Quick Start

# 1. Create optimized configuration
python szurubooru_manager.py --create-config

# 2. Edit config.json with your settings
# 3. Test connection
python szurubooru_manager.py --test-connection

# 4. Run high-performance processing
python szurubooru_manager.py --mode optimized
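Step 3 uses the script's own `--test-connection` check. For debugging outside the script, a rough equivalent against the Szurubooru REST API might look like this. It assumes the API is served under `/api` on the `szurubooru_url` from config.json (as in a default Docker setup) and uses Szurubooru's token authentication, which sends `username:token` base64-encoded after the `Token` keyword. The function names are hypothetical:

```python
import base64
import json
import urllib.request


def build_auth_header(username: str, api_token: str) -> str:
    """Szurubooru token auth: base64-encode 'username:token' after the 'Token' keyword."""
    credentials = f"{username}:{api_token}".encode("utf-8")
    return "Token " + base64.b64encode(credentials).decode("ascii")


def check_connection(base_url: str, username: str, api_token: str, timeout: float = 10.0) -> dict:
    """GET /api/info and return the decoded JSON; raises on HTTP or network errors."""
    # Assumption: the API lives under /api on the configured szurubooru_url.
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/info",
        headers={
            "Accept": "application/json",
            "Authorization": build_auth_header(username, api_token),
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    with open("config.json", encoding="utf-8") as f:
        cfg = json.load(f)
    info = check_connection(cfg["szurubooru_url"], cfg["username"], cfg["api_token"])
    print("Connected. Post count:", info.get("postCount"))
```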

Docker Quick Start

For GPU users:

docker-compose up -d
docker-compose logs -f

For CPU-only users:

docker-compose -f docker-compose.cpu.yml up -d
docker-compose -f docker-compose.cpu.yml logs -f

Docker Configuration

You can customize the container behavior using environment variables in your docker-compose file:

environment:
  - MODE=optimized                    # Mode: optimized, upload, tag, untagged, add-characters
  - SCHEDULE_ENABLED=true             # Enable/disable scheduling: true, false
  - SCHEDULE_TIME=*/30 * * * *        # Cron schedule (every 30 minutes)

Examples:

# Run once in upload mode (no scheduling)
- MODE=upload
- SCHEDULE_ENABLED=false

# Run every hour in tag mode
- MODE=tag
- SCHEDULE_ENABLED=true
- SCHEDULE_TIME=0 * * * *

# Run daily at 2 AM in optimized mode
- MODE=optimized
- SCHEDULE_ENABLED=true
- SCHEDULE_TIME=0 2 * * *
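These variables correspond to the script's `--mode` and `--schedule` flags shown under Usage. A hypothetical sketch of how a container entrypoint could translate them; `build_command` is illustrative, not the image's actual entrypoint, and the fallbacks for unset variables are assumptions:

```python
import os
import shlex

VALID_MODES = {"optimized", "upload", "tag", "untagged", "add-characters"}


def build_command(env: dict) -> list:
    """Translate MODE / SCHEDULE_ENABLED / SCHEDULE_TIME into an argv list."""
    mode = env.get("MODE", "optimized")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown MODE: {mode!r}")
    argv = ["python", "szurubooru_manager.py", "--mode", mode]
    # Assumption: scheduling is off unless SCHEDULE_ENABLED is exactly "true" (case-insensitive).
    if env.get("SCHEDULE_ENABLED", "false").strip().lower() == "true":
        # Assumed fallback when SCHEDULE_TIME is unset: every 30 minutes.
        argv += ["--schedule", env.get("SCHEDULE_TIME", "*/30 * * * *")]
    return argv


if __name__ == "__main__":
    print(shlex.join(build_command(dict(os.environ))))
```

Passing an argv list (rather than one shell string) keeps the spaces and asterisks in the cron expression from being split or glob-expanded.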

Operation Modes

optimized (Default)

Full pipeline: Upload + AI tagging with maximum performance

  • Uploads new files with appropriate tags (tagme for images, video for videos)
  • Processes images with WD14 Tagger
  • Skips AI tagging for video files

upload

Upload-only mode for maximum speed

  • Uploads files without AI tagging
  • Videos get the video tag; images get the tagme tag
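The upload rule (videos get `video`, everything else gets `tagme`, decided by file extension) can be sketched as follows. The extension set comes from the "Supported Formats" list in the v2.0 notes; `initial_tag` is a hypothetical name, and the tag names default to the `tagme_tag` and `video_tag` config values:

```python
from pathlib import Path

# Video extensions from the "Supported Formats" list in the v2.0 notes.
VIDEO_EXTENSIONS = {"mp4", "webm", "avi", "mov", "mkv", "flv", "wmv", "m4v", "3gp", "ogv"}


def initial_tag(filename: str, tagme_tag: str = "tagme", video_tag: str = "video") -> str:
    """Return the tag an upload starts with: video_tag for videos, tagme_tag otherwise."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return video_tag if extension in VIDEO_EXTENSIONS else tagme_tag


if __name__ == "__main__":
    for name in ["cat.jpg", "clip.MP4", "loop.webm", "anim.gif"]:
        print(f"{name:>10} -> {initial_tag(name)}")
```

Lower-casing the suffix means `clip.MP4` is treated the same as `clip.mp4`.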

tag

Comprehensive tagging for all posts needing tags

  • Processes both tagme posts AND completely untagged posts
  • Keeps processing until no posts need tagging
  • Video support with automatic video tag assignment

untagged

Process posts with no tags at all

  • Finds posts using the tag-count:0 search query
  • Adds video tag to untagged videos
  • AI tags untagged images

add-characters

Brute-force character tagging for your collection

  • Processes posts in your Szurubooru instance
  • Only extracts and adds character tags from WD14 Tagger
  • Preserves all existing tags; only missing character tags are added
  • Range support: Use --start-post and --end-post to process specific ranges
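The add-characters rules (keep every existing tag, add only character tags the post lacks, optionally restrict to a post ID range) amount to an ordered set merge. A sketch under assumptions: `predicted_characters` stands in for whatever character names the WD14 Tagger returns, the range bounds are treated as inclusive, and the function names are hypothetical:

```python
from typing import Iterable, List, Optional


def in_range(post_id: int, start_post: Optional[int] = None, end_post: Optional[int] = None) -> bool:
    """Mirror --start-post / --end-post: both bounds inclusive (assumed), either may be omitted."""
    return (start_post is None or post_id >= start_post) and (end_post is None or post_id <= end_post)


def merge_character_tags(existing: List[str], predicted_characters: Iterable[str]) -> List[str]:
    """Keep existing tags in order and append only character tags the post does not already have."""
    merged = list(existing)
    seen = set(existing)
    for name in predicted_characters:
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return merged


if __name__ == "__main__":
    print(in_range(42, start_post=1, end_post=70000))
    print(merge_character_tags(["1girl", "hatsune_miku"], ["hatsune_miku", "kagamine_rin"]))
```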

Configuration

Generate Optimized Config

python szurubooru_manager.py --create-config

Creates an optimized config.json with high-performance defaults:

{
  "szurubooru_url": "http://localhost:8080",
  "username": "your_username", 
  "api_token": "your_api_token_here",
  "upload_directory": "./uploads",
  "supported_extensions": ["jpg", "jpeg", "png", "gif", "webm", "mp4", "webp"],
  "tagme_tag": "tagme",
  "video_tag": "video",
  
  "max_concurrent_uploads": 12,
  "gpu_batch_size": 8,
  "upload_workers": 8,
  "tagging_workers": 2,
  "pipeline_enabled": true,
  "connection_pool_size": 20,
  "upload_timeout": 30.0,
  "tagging_timeout": 60.0,
  
  "batch_size": 0,
  "gpu_enabled": true,
  "confidence_threshold": 0.5,
  "max_tags_per_image": 20,
  "delete_after_upload": true,
  "retry_attempts": 3,
  "retry_delay": 1.0
}
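A hand-edited or older config.json may lack some of these keys. One way to read it defensively is to overlay the file on a set of defaults taken from the values above. `load_config` is a hypothetical helper, and its range checks are illustrative assumptions rather than rules the script is known to enforce:

```python
import json

# A subset of the generated defaults shown above.
DEFAULTS = {
    "szurubooru_url": "http://localhost:8080",
    "upload_directory": "./uploads",
    "max_concurrent_uploads": 12,
    "gpu_batch_size": 8,
    "gpu_enabled": True,
    "confidence_threshold": 0.5,
    "max_tags_per_image": 20,
    "retry_attempts": 3,
    "retry_delay": 1.0,
}


def load_config(text: str) -> dict:
    """Overlay user settings from a JSON string on DEFAULTS and sanity-check a few values."""
    config = {**DEFAULTS, **json.loads(text)}
    if not 0.0 <= config["confidence_threshold"] <= 1.0:
        raise ValueError("confidence_threshold must be between 0 and 1")
    for key in ("max_concurrent_uploads", "gpu_batch_size"):
        if config[key] < 1:
            raise ValueError(f"{key} must be at least 1")
    return config


if __name__ == "__main__":
    with open("config.json", encoding="utf-8") as f:
        cfg = load_config(f.read())
    print(cfg["szurubooru_url"], cfg["gpu_batch_size"])
```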

Usage

Core Modes

Optimized Mode (Recommended)

python szurubooru_manager.py --mode optimized

Upload-Only Mode (Maximum Speed)

python szurubooru_manager.py --mode upload

Tagging-Only Mode

python szurubooru_manager.py --mode tag  

Character-Only Mode

# Add characters to all posts
python szurubooru_manager.py --mode add-characters

# Add characters to posts 1-70000
python szurubooru_manager.py --mode add-characters --start-post 1 --end-post 70000

Advanced Usage

Custom Configuration:

python szurubooru_manager.py --config custom.json --mode optimized

Scheduled Processing:

# Every 30 minutes with high performance
python szurubooru_manager.py --schedule "*/30 * * * *" --mode optimized

Performance Benchmarking:

python szurubooru_manager.py --mode optimized --benchmark

Tag Synchronization Manager

The tag_sync_manager.py script helps maintain your Szurubooru tag database by synchronizing tags with popular aliases from external sources like Danbooru.

Features

  • CSV Import: Import tag data from CSV files (e.g., Danbooru tag exports)
  • Category Mapping: Automatically assign proper categories (default, copyright, character)
  • Alias Management: Add popular aliases to existing tags
  • Smart Cleanup: Remove incorrect suggestions and unused tags
  • Batch Processing: Efficiently handle large tag databases
  • Dry Run Mode: Preview changes before applying them

Basic Usage

# Test connection and preview changes
python tag_sync_manager.py --dry-run --sample

# Sync tags from CSV file
python tag_sync_manager.py --csv danbooru_tags.csv

# Clean up suggestions and unused tags
python tag_sync_manager.py --cleanup-unused --no-create

# Update only categories, skip aliases
python tag_sync_manager.py --no-aliases

CSV File Format

The script expects a CSV file with columns:

  • tag: Tag name
  • category: Category number (0=default, 3=copyright, 4=character)
  • count: Usage count
  • alias: Comma-separated list of aliases
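A minimal reader for this layout using Python's csv module. It assumes the four columns appear in the order listed, skips an optional header row, and expects the alias column to be quoted when it contains commas, as standard CSV requires. `read_tag_rows` and the category-name mapping are illustrative, not the script's internals:

```python
import csv
import io
from typing import Iterator, List, NamedTuple, TextIO

CATEGORY_NAMES = {0: "default", 3: "copyright", 4: "character"}


class TagRow(NamedTuple):
    name: str
    category: str
    count: int
    aliases: List[str]


def read_tag_rows(stream: TextIO) -> Iterator[TagRow]:
    """Yield one TagRow per CSV line of tag,category,count,alias."""
    for row in csv.reader(stream):
        if not row or row[0] == "tag":  # skip blank lines and an optional header row
            continue
        name, category, count = row[0], int(row[1]), int(row[2])
        alias_field = row[3] if len(row) > 3 else ""
        aliases = [a.strip() for a in alias_field.split(",") if a.strip()]
        # Assumption: categories other than 0/3/4 fall back to "default" in this sketch.
        yield TagRow(name, CATEGORY_NAMES.get(category, "default"), count, aliases)


if __name__ == "__main__":
    sample = 'example_character,4,100,"alias_one,alias_two"\nexample_series,3,50,\n'
    for tag in read_tag_rows(io.StringIO(sample)):
        print(tag)
```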

Command Line Options

  • --dry-run: Preview changes without applying them
  • --sample: Process only one batch for testing
  • --no-create: Don't create missing tags
  • --no-categories: Don't update categories
  • --no-aliases: Don't update aliases
  • --cleanup-unused: Delete tags with 0 usage
  • --batch-size N: Set batch size for API calls

Troubleshooting

Performance Issues

  • Check GPU utilization: nvidia-smi
  • Increase max_concurrent_uploads gradually
  • Monitor server response times
  • Keep image files on SSD storage rather than an HDD

GPU Issues

GPU not detected?

python -c "import torch; print(torch.cuda.is_available())"

Out of VRAM?

  • Reduce gpu_batch_size to 4
  • Close other GPU applications
  • Use CPU mode: set "gpu_enabled": false in config.json
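Reducing `gpu_batch_size` by hand can also be automated by retrying a failed batch at half the size until it fits. A framework-agnostic sketch: `run_batch` is a hypothetical callable standing in for the tagger, and `OOM_ERRORS` is an assumption (with PyTorch 1.13 or later you would typically add `torch.cuda.OutOfMemoryError`):

```python
from typing import Callable, List, Sequence, Tuple, Type

# Assumption: exceptions that signal "batch too large". MemoryError is a stand-in;
# with PyTorch >= 1.13 you would typically list torch.cuda.OutOfMemoryError here.
OOM_ERRORS: Tuple[Type[BaseException], ...] = (MemoryError,)


def tag_with_backoff(items: Sequence, run_batch: Callable[[Sequence], List], batch_size: int = 8) -> List:
    """Process items in batches, halving batch_size whenever a batch runs out of memory."""
    results: List = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(run_batch(batch))
        except OOM_ERRORS:
            if batch_size == 1:
                raise  # even a single image does not fit; give up
            batch_size = max(1, batch_size // 2)
            continue  # retry the same start position with the smaller batch
        start += len(batch)
    return results


if __name__ == "__main__":
    def fake_tagger(batch):
        if len(batch) > 3:
            raise MemoryError("simulated out-of-VRAM")
        return [f"tags for {item}" for item in batch]

    print(tag_with_backoff(list(range(10)), fake_tagger, batch_size=8))
```

Once reduced, the batch size stays small for the rest of the run, which avoids repeatedly hitting the memory limit.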

Network Issues

  • Increase upload_timeout and tagging_timeout
  • Reduce max_concurrent_uploads
  • Check server capacity and network stability
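The `retry_attempts` and `retry_delay` config keys suggest a retry loop around each request. A generic sketch with exponential backoff; the doubling schedule, the meaning of `retry_attempts` as extra tries, and the `with_retries` name are assumptions, not documented behavior of the script:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_attempts: int = 3,
    retry_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call operation(); after a retryable error, sleep and retry with a doubled delay."""
    if retry_attempts < 0:
        raise ValueError("retry_attempts must be >= 0")
    delay = retry_delay
    for attempt in range(retry_attempts + 1):  # assumption: 1 initial try + retry_attempts retries
        try:
            return operation()
        except retry_on:
            if attempt == retry_attempts:
                raise  # out of retries: surface the last error
            time.sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")
```

Adjust `retry_on` to the HTTP client in use; for example, urllib raises `urllib.error.URLError`, which is not a `ConnectionError` subclass.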

Video File Issues

"Unhandled file type: application/octet-stream" error?

  1. Check if the file is actually a valid video: --check-video "file.mp4"
  2. The script automatically tries fallback methods for problematic videos
  3. Try re-encoding the video with a different tool
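As far as we know, Szurubooru's server identifies uploads by their leading bytes and reports application/octet-stream when no known signature matches, so an MP4 with an unusual major brand can fail even though it plays fine. A hypothetical pre-flight check (not the script's `--check-video` implementation) that reports the container signature; it only recognizes the MP4 family and Matroska/WebM:

```python
from typing import Optional

EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # Matroska/WebM files start with this EBML header


def describe_video_header(header: bytes) -> Optional[str]:
    """Return a short description of the container signature, or None if unrecognised."""
    if header[4:8] == b"ftyp":
        # ISO base media (MP4/MOV/M4V): 'ftyp' box at offset 4, major brand in bytes 8-12.
        brand = header[8:12].decode("ascii", errors="replace")
        return f"ISO BMFF (MP4-family), major brand {brand!r}"
    if header.startswith(EBML_MAGIC):
        return "Matroska/WebM (EBML)"
    return None


def check_video(path: str) -> Optional[str]:
    """Read the first 16 bytes of path and describe its container signature."""
    with open(path, "rb") as f:
        return describe_video_header(f.read(16))
```

A file that returns None, or an MP4 with an uncommon brand, is a good candidate for re-encoding.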

Security Best Practices

  • Use API Tokens: More secure than passwords
  • HTTPS Only: Serve Szurubooru over HTTPS in production deployments
  • File Permissions: Restrict config file access (chmod 600 config.json)
  • Network Security: Use VPN for remote servers
  • Regular Updates: Keep dependencies current
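The `chmod 600` advice can be checked and applied from Python on POSIX systems. A minimal sketch; `config_is_private` and `make_private` are hypothetical helpers:

```python
import os
import stat


def config_is_private(path: str) -> bool:
    """True if no group or other permission bits are set on path (POSIX only)."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def make_private(path: str) -> None:
    """Equivalent of `chmod 600 path`: owner read/write only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    if not config_is_private("config.json"):
        print("config.json is accessible to other users; tightening to 600")
        make_private("config.json")
```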

What's New in v3.0

Read this commit for the full details.

What's New in v2.0

Architectural Overhaul

  • True Parallel Processing: Concurrent uploads instead of sequential ones
  • GPU Batch Processing: Process multiple images simultaneously on GPU
  • Pipeline Architecture: Upload and tagging phases run independently
  • Connection Pooling: Multiple concurrent API connections
  • Async Everything: Non-blocking I/O operations throughout
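Together these ideas describe a two-stage pipeline: a concurrency cap on uploads, a queue between the upload and tagging stages, and GPU-sized batches. An illustrative asyncio model, not the script's code: `upload` and `tag_batch` are hypothetical stand-ins, and the parameter names only mirror the config keys `max_concurrent_uploads` and `gpu_batch_size`:

```python
import asyncio
import itertools
from typing import Awaitable, Callable, List, Optional, Sequence


async def run_pipeline(
    files: Sequence[str],
    upload: Callable[[str], Awaitable[int]],
    tag_batch: Callable[[List[int]], Awaitable[None]],
    max_concurrent_uploads: int = 12,
    gpu_batch_size: int = 8,
) -> None:
    """Upload files concurrently and tag the resulting post IDs in GPU-sized batches."""
    queue: "asyncio.Queue[Optional[int]]" = asyncio.Queue()
    limit = asyncio.Semaphore(max_concurrent_uploads)

    async def upload_one(path: str) -> None:
        async with limit:  # at most max_concurrent_uploads uploads in flight
            post_id = await upload(path)
        await queue.put(post_id)  # hand the post to the tagging stage right away

    async def tagger() -> None:
        batch: List[int] = []
        while True:
            post_id = await queue.get()
            if post_id is None:  # sentinel: every upload has finished
                break
            batch.append(post_id)
            if len(batch) >= gpu_batch_size:
                await tag_batch(batch)
                batch = []
        if batch:  # flush the final partial batch
            await tag_batch(batch)

    tagging = asyncio.create_task(tagger())  # tagging runs while uploads continue
    await asyncio.gather(*(upload_one(path) for path in files))
    await queue.put(None)
    await tagging


if __name__ == "__main__":
    post_ids = itertools.count(1)
    tagged: List[int] = []

    async def fake_upload(path: str) -> int:
        await asyncio.sleep(0.01)  # simulate network latency
        return next(post_ids)  # pretend the server assigned this post ID

    async def fake_tag_batch(ids: List[int]) -> None:
        tagged.extend(ids)  # a real tagger would run WD14 on the whole batch here

    files = [f"img{i}.jpg" for i in range(20)]
    asyncio.run(run_pipeline(files, fake_upload, fake_tag_batch, max_concurrent_uploads=4, gpu_batch_size=8))
    print(len(tagged), "posts tagged")
```

The semaphore bounds load on the server, while the queue lets the GPU start on early uploads instead of waiting for the whole upload phase to finish.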

Video File Support

  • Smart Video Detection: Automatically identifies video files by extension
  • Video Tagging: Adds 'video' tag to video files instead of 'tagme'
  • AI Tagging Skip: Videos skip WD14 processing (which doesn't support videos)
  • Supported Formats: MP4, WebM, AVI, MOV, MKV, FLV, WMV, M4V, 3GP, OGV

Smart Tag Category Assignment

  • Automatic Categorization: Tags are automatically assigned to appropriate categories
  • Meta Tags: tagme, video, animated, gif, nsfw, etc. → meta category
  • Character Tags: WD14-detected character names → character category
  • General Tags: All other AI-detected tags → default category
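These rules amount to a small lookup. A sketch assuming the meta tags form a fixed set (the list above ends in "etc.", so this set contains only the named examples) and that the tagger reports which predictions are characters; `assign_category` is hypothetical:

```python
from typing import Set

# Only the meta tags named in this README; the real list is longer ("etc.").
META_TAGS: Set[str] = {"tagme", "video", "animated", "gif", "nsfw"}


def assign_category(tag: str, is_character: bool = False) -> str:
    """Map a tag to a Szurubooru category name: meta, character, or default."""
    if tag in META_TAGS:
        return "meta"
    if is_character:  # e.g. the prediction came from WD14's character label group
        return "character"
    return "default"


if __name__ == "__main__":
    predictions = [("1girl", False), ("hatsune_miku", True), ("animated", False)]
    for tag, is_char in predictions:
        print(tag, "->", assign_category(tag, is_char))
```

Checking meta tags first means a tag such as `video` always lands in the meta category, whatever the tagger says about it.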

Untagged Posts Processing

  • Find Untagged Posts: Uses the tag-count:0 search query to find posts with no tags
  • AI Tagging: Processes untagged images with WD14 Tagger
  • Batch Processing: Efficiently handles large numbers of untagged posts

License

Open source - see LICENSE file for details.
