A WordPress plugin that scrapes images from websites, either directly (Simple Mode) or through the Firecrawl API, and adds them to your WordPress media library.
- Dual Scraping Methods: Choose between Simple Mode (free, direct HTML) and the Firecrawl API (advanced, for JavaScript-heavy sites)
- Automatic Image Import: Import scraped images directly into the WordPress media library
- Per-Image Customization: Edit settings for each image individually (filename, alt text, title, format, dimensions)
- Bulk Options: Apply global settings to all images or customize each one individually
- Image Processing: Convert formats (WebP, JPEG, PNG), resize to a maximum width, compress to a maximum file size
- Selective Import: Choose which images to import with checkboxes
- CSS Class Targeting: Optionally scrape only images with specific CSS classes
- Security First: Follows WordPress security best practices (nonces, sanitization, escaping)
- Clean Architecture: Object-oriented design with proper namespacing
- Responsive UI: Works on desktop and mobile devices
- Clone or download this plugin to `wp-content/plugins/image-scraper/`
- Activate the plugin through the WordPress admin panel
- Navigate to "Image Scraper" → "Settings" in the admin menu
- Choose your scraping method:
  - Simple Mode (default): Free, works for most websites, no API key needed
  - Firecrawl API: For JavaScript-heavy sites, requires an API key from firecrawl.dev
- Navigate to "Image Scraper" in the WordPress admin menu
- Enter the URL of the webpage containing the images
- Optional: Target a specific CSS class to scrape only certain images
- Click "Start Scraping" to fetch the images
- Review Preview: See all found images in a grid
- Customize Images (optional):
  - Click "Edit Settings" on any image to customize it individually
  - Or use the global options at the bottom to apply settings to all images
- Select Images: Use the checkboxes to choose which images to import
- Configure Options:
  - Convert format (WebP, JPEG, PNG)
  - Set maximum width (larger images are resized)
  - Set maximum file size (images are compressed if needed)
  - Add a filename prefix
  - Set alt text and title
- Click "Add to Media Library" to import the selected images
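The filename options above (prefix, custom filename, format conversion) could combine roughly as follows. This is a minimal sketch, not the plugin's code: the helper name `image_scraper_build_filename()` is hypothetical, and its character filter is a simplified stand-in for WordPress's `sanitize_file_name()`.

```php
<?php
/**
 * Hypothetical helper: build the media library filename for one image.
 * The character filter is a simplified stand-in for WordPress's
 * sanitize_file_name().
 */
function image_scraper_build_filename(string $source_url, string $prefix = '', string $custom = '', string $format = ''): string
{
    $path     = (string) parse_url($source_url, PHP_URL_PATH);
    $original = pathinfo($path, PATHINFO_FILENAME);
    $ext      = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    // A custom filename replaces the original base name; the prefix applies to either.
    $base = $custom !== '' ? $custom : $original;
    $base = strtolower(trim((string) preg_replace('/[^A-Za-z0-9]+/', '-', $prefix . $base), '-'));
    if ($base === '') {
        $base = 'image';
    }

    // Converting the format changes the extension; otherwise keep the source's.
    $ext = $format !== '' ? strtolower($format) : ($ext !== '' ? $ext : 'jpg');
    return $base . '.' . ($ext === 'jpeg' ? 'jpg' : $ext);
}
```

For example, a source of `.../Hero_Shot.PNG` with prefix `site-` and format `webp` yields `site-hero-shot.webp`.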
Each image can have individual settings that override global defaults:
- Filename: Custom filename for this specific image
- Alt Text: SEO-friendly alt text
- Title: Image title in media library
- Format: Convert to WebP, JPEG, or PNG
- Max Width: Resize if wider than specified (maintains aspect ratio)
- Max Size: Compress to stay under file size limit (in KB)
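One way to express "individual settings that override global defaults" in code: per-image values replace global ones unless left empty, and the max-width rule scales the height by the same ratio to keep the aspect ratio. This is a minimal plain-PHP sketch; both function names and the settings keys (`format`, `max_width`, `max_size_kb`, `alt`) are hypothetical, not the plugin's actual `Media_Importer` API.

```php
<?php
/**
 * Hypothetical: per-image values win over global defaults. Empty strings
 * and nulls in the per-image array mean "not set, use the global value".
 */
function image_scraper_resolve_settings(array $global, array $per_image): array
{
    $overrides = array_filter(
        $per_image,
        static fn($value) => $value !== null && $value !== ''
    );
    return array_merge($global, $overrides); // later (per-image) keys win
}

/**
 * Hypothetical: scale down to $max_width, keeping the aspect ratio.
 * Images already narrower than the limit (or a limit of 0) are unchanged.
 */
function image_scraper_target_size(int $width, int $height, int $max_width): array
{
    if ($max_width <= 0 || $width <= $max_width) {
        return [$width, $height];
    }
    $ratio = $max_width / $width;
    return [$max_width, max(1, (int) round($height * $ratio))];
}

$global = ['format' => 'webp', 'max_width' => 1200, 'max_size_kb' => 300, 'alt' => ''];
$image  = ['alt' => 'Red bicycle', 'max_width' => 800, 'format' => ''];

$settings = image_scraper_resolve_settings($global, $image);
[$w, $h]  = image_scraper_target_size(2400, 1600, $settings['max_width']);
echo "{$settings['format']} {$w}x{$h} \"{$settings['alt']}\"\n";
// webp 800x533 "Red bicycle"
```

Treating empty values as "unset" lets a per-image form submit every field without clobbering the global defaults.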
Simple Mode:
- ✅ Free - no API costs
- ✅ Fast - direct HTTP requests
- ✅ No API key required
- ✅ Works for most standard websites
- ❌ Cannot handle JavaScript-rendered content
- ❌ May be blocked by anti-bot protections

Firecrawl API:
- ✅ Handles JavaScript-heavy sites (React, Vue, Angular)
- ✅ Can get past many anti-bot protections
- ✅ More reliable for protected content
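Simple Mode's main limitation follows from how it works: it parses the HTML returned by a plain HTTP request, so images that JavaScript inserts later never appear in that HTML. A minimal sketch of the parsing step, assuming PHP 8 with the DOM extension; the function name, the CSS-class filter, and the lazy-load attribute list (`data-src`, `data-lazy-src`) are illustrative assumptions, not the plugin's `Html_Scraper` implementation.

```php
<?php
/**
 * Hypothetical sketch of Simple Mode image extraction (PHP 8, ext-dom).
 */
function image_scraper_extract_images(string $html, string $base_url, string $css_class = ''): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world markup is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($base_url);
    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Optional CSS class targeting: keep only <img> tags carrying the class.
        if ($css_class !== '') {
            $classes = preg_split('/\s+/', trim($img->getAttribute('class')));
            if (!in_array($css_class, $classes, true)) {
                continue;
            }
        }

        // Lazy-loaded images often keep the real URL in a data-* attribute
        // and put a placeholder in src, so check those attributes first.
        $src = '';
        foreach (['data-src', 'data-lazy-src', 'src'] as $attr) {
            if ($img->getAttribute($attr) !== '') {
                $src = trim($img->getAttribute($attr));
                break;
            }
        }
        if ($src === '' || str_starts_with($src, 'data:')) {
            continue; // no URL, or an inline placeholder
        }

        // Resolve protocol-relative and root-relative URLs (port omitted for
        // brevity); other relative forms such as "img/a.png" are left as-is.
        if (str_starts_with($src, '//')) {
            $src = ($base['scheme'] ?? 'https') . ':' . $src;
        } elseif (str_starts_with($src, '/')) {
            $src = ($base['scheme'] ?? 'https') . '://' . ($base['host'] ?? '') . $src;
        }

        $urls[$src] = true; // array keys de-duplicate repeated images
    }

    return array_keys($urls);
}
```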
```text
image-scraper/
├── image-scraper.php            # Main plugin file (bootstrap)
├── includes/                    # Core plugin classes
│   ├── class-core.php           # Main orchestrator
│   ├── class-loader.php         # Hooks/filters manager
│   ├── class-activator.php      # Activation hooks
│   ├── class-deactivator.php    # Deactivation hooks
│   ├── class-i18n.php           # Internationalization
│   ├── class-firecrawl-api.php  # Firecrawl API integration
│   ├── class-html-scraper.php   # Simple Mode HTML scraper
│   └── class-media-importer.php # Image processing & import
├── admin/                       # Admin-specific functionality
│   ├── class-admin.php          # Admin menu and pages
│   ├── class-settings.php       # Settings API integration
│   ├── class-ajax-handler.php   # AJAX request handlers
│   ├── css/
│   │   └── admin.css            # Admin styles
│   ├── js/
│   │   └── admin.js             # Admin JavaScript (AJAX)
│   └── partials/                # View templates
│       ├── settings-display.php # Settings page UI
│       └── scraper-display.php  # Main scraper page UI
└── .github/
    └── copilot-instructions.md  # AI coding assistant guide
```
All classes use the `Image_Scraper` namespace to avoid conflicts:
- `Image_Scraper\Core` - Main plugin orchestrator, coordinates all components
- `Image_Scraper\Loader` - Manages WordPress hooks/filters registration
- `Image_Scraper\Firecrawl_Api` - Firecrawl API integration for advanced scraping
- `Image_Scraper\Html_Scraper` - Simple Mode HTML scraper (no API needed)
- `Image_Scraper\Media_Importer` - Image processing, format conversion, and media library import
- `Image_Scraper\Admin\Admin` - Handles admin menu, pages, and asset loading
- `Image_Scraper\Admin\Settings` - Settings API registration and sanitization
- `Image_Scraper\Admin\Ajax_Handler` - AJAX request handlers for scraping and importing
- `Image_Scraper\Activator` - Plugin activation logic
- `Image_Scraper\Deactivator` - Plugin deactivation logic
- `Image_Scraper\I18n` - Translation/localization support
- Autoloading: A PSR-4-style autoloader converts namespaced class names to file paths
- Separation of Concerns: Admin and core logic live in separate directories (`admin/` and `includes/`)
- Hook Abstraction: The `Loader` class centralizes all WordPress hook registration
- Settings API: Full WordPress Settings API integration with validation
- Security First: All inputs sanitized, all outputs escaped, nonces on all forms and AJAX requests
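A sketch of how the class names listed above could map onto the file tree. Strict PSR-4 would expect `Core.php`; the tree instead uses WordPress-style `class-core.php` names, so this mapping lowercases and hyphenates the short class name. The function name is hypothetical and this is an illustration, not the plugin's actual autoloader.

```php
<?php
/**
 * Hypothetical mapping from namespaced class names to the files in the tree.
 */
function image_scraper_class_to_file(string $class): ?string
{
    $prefix = 'Image_Scraper\\';
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null; // not a plugin class; leave it to other autoloaders
    }

    // "Admin\Ajax_Handler" -> ["Admin"] + "Ajax_Handler"
    $parts = explode('\\', substr($class, strlen($prefix)));
    $short = array_pop($parts);

    // Sub-namespaces become directories (Admin -> admin/); top-level
    // classes live in includes/.
    $dir = $parts === [] ? 'includes' : strtolower(implode('/', $parts));

    // WordPress-style file name: Firecrawl_Api -> class-firecrawl-api.php
    return $dir . '/class-' . strtolower(str_replace('_', '-', $short)) . '.php';
}

// Registering it: paths resolve relative to this file (the plugin root
// when this code lives in image-scraper.php).
spl_autoload_register(static function (string $class): void {
    $relative = image_scraper_class_to_file($class);
    if ($relative !== null && is_readable(__DIR__ . '/' . $relative)) {
        require_once __DIR__ . '/' . $relative;
    }
});
```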
The plugin stores its settings in a single option: `image_scraper_settings`
Available settings:
- `scraping_method` (string) - Scraping method: `'simple'` (default) or `'firecrawl'`
- `firecrawl_api_key` (string) - Your Firecrawl API key (only needed for Firecrawl mode)
- `max_images` (int) - Maximum images per scrape (1-500, default: 50)
- `timeout` (int) - API request timeout in seconds (5-300, default: 30)
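A sketch of a sanitize callback that enforces the ranges and defaults listed above. In WordPress, a function like this is passed as the `sanitize_callback` argument of `register_setting()`; here `trim()` stands in for `sanitize_text_field()` so the example runs outside WordPress. The function name and the choice to clamp out-of-range numbers (rather than reject them) are assumptions.

```php
<?php
/**
 * Hypothetical sanitize callback for the image_scraper_settings option.
 */
function image_scraper_sanitize_settings(array $input): array
{
    // Non-numeric input falls back to the default; numbers are clamped.
    $clamp = static function ($value, int $min, int $max, int $default): int {
        if (!is_numeric($value)) {
            return $default;
        }
        return max($min, min($max, (int) $value));
    };

    $method = $input['scraping_method'] ?? 'simple';

    return [
        'scraping_method'   => in_array($method, ['simple', 'firecrawl'], true) ? $method : 'simple',
        'firecrawl_api_key' => trim((string) ($input['firecrawl_api_key'] ?? '')),
        'max_images'        => $clamp($input['max_images'] ?? null, 1, 500, 50),
        'timeout'           => $clamp($input['timeout'] ?? null, 5, 300, 30),
    ];
}
```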
- Batch processing for multiple URLs
- Background processing with WordPress cron for large scrapes
- Schedule recurring scrapes
- Custom taxonomy for scraped images
- Import/export settings
- WP-CLI commands for automation
- Srcset support for responsive images
- Image gallery creation from scraped images
- Auto-detection of lazy-loaded images (already partially supported)
- API for third-party integrations
- … conversion, resizing, compression
- Returns success/error count
- `image_scraper_validate_api` - Test Firecrawl API key
  - Only available in Firecrawl mode
  - Validates API connectivity
- Create Firecrawl API Service Class
  - Location: `includes/class-firecrawl-api.php`
  - Methods: `scrape_url()`, `validate_api_key()`, `get_images()`
  - Handle API authentication and error responses
- Create AJAX Handler
  - Add AJAX action: `wp_ajax_image_scraper_scrape`
  - Validate nonce and capabilities
  - Call Firecrawl API service
  - Return JSON response
- Create Media Library Importer
  - Location: `includes/class-media-importer.php`
  - Use `media_sideload_image()` or a custom implementation
  - Handle duplicate detection
  - Set proper image metadata (alt text, title, caption)
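One possible approach to the duplicate-detection step: derive a stable key from each source URL and skip URLs whose key is already known. This is a sketch under assumptions: both function names are hypothetical, and it assumes the importer would store the key on each attachment (for example as post meta under a hypothetical `_image_scraper_source` key) and load the known keys before importing. Only trivial URL differences (letter case of scheme and host, the `#fragment`) are normalized.

```php
<?php
/**
 * Hypothetical: stable duplicate-detection key for an image URL.
 */
function image_scraper_source_key(string $url): string
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['host'])) {
        return md5(trim($url)); // not an absolute URL; hash the raw string
    }
    $scheme = strtolower($parts['scheme'] ?? 'https');
    $host   = strtolower($parts['host']);
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path   = $parts['path'] ?? '/';
    $query  = isset($parts['query']) ? '?' . $parts['query'] : '';
    // The #fragment never reaches the server, so it is dropped.
    return md5("{$scheme}://{$host}{$port}{$path}{$query}");
}

/**
 * Hypothetical: keep only URLs not already imported, and not repeated
 * within this batch.
 */
function image_scraper_filter_new(array $urls, array $known_keys): array
{
    $seen = array_flip($known_keys);
    $new  = [];
    foreach ($urls as $url) {
        $key = image_scraper_source_key($url);
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $new[]      = $url;
        }
    }
    return $new;
}
```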
The plugin can optionally use the Firecrawl API for advanced web scraping:
- When to use: JavaScript-heavy sites, SPAs, protected content
- Authentication: API key in request headers
- Documentation: https://docs.firecrawl.dev
- Getting started: Sign up at https://firecrawl.dev
The plugin works without Firecrawl: Simple Mode handles standard HTML websites.
- Batch processing for multiple URLs
- Background processing with WordPress cron
- Image optimization before import
- Custom taxonomy for scraped images
- Export/import settings
- WP-CLI commands
- Unit tests with PHPUnit
In Firecrawl mode, the plugin uses the Firecrawl API for web scraping. Key details:
- Authentication: API key in the `Authorization` header
- Scrape endpoint: POST to scrape URLs
- Rate limits: Vary by plan (handle rate-limit errors gracefully)

Documentation: https://docs.firecrawl.dev
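A sketch of the request the Firecrawl service class might build, plus a simple retry policy for rate limits. Assumptions, to be checked against https://docs.firecrawl.dev: the endpoint URL and the `formats` field follow Firecrawl's v1 scrape API as we understand it, and rate limiting is signalled with HTTP 429. The returned `args` use key names accepted by WordPress's `wp_remote_post()`, but nothing here calls WordPress. Both function names are hypothetical.

```php
<?php
/**
 * Hypothetical: build a Firecrawl scrape request. The endpoint URL is an
 * assumption based on Firecrawl's v1 API.
 */
function image_scraper_firecrawl_request(string $page_url, string $api_key, int $timeout = 30): array
{
    return [
        'endpoint' => 'https://api.firecrawl.dev/v1/scrape', // assumed endpoint
        'args'     => [
            'timeout' => $timeout,
            'headers' => [
                'Authorization' => 'Bearer ' . $api_key,
                'Content-Type'  => 'application/json',
            ],
            // Ask for rendered HTML so <img> tags can be parsed afterwards.
            'body' => json_encode(['url' => $page_url, 'formats' => ['html']]),
        ],
    ];
}

/**
 * Hypothetical rate-limit policy: on HTTP 429, honour a numeric Retry-After
 * header if present, otherwise back off exponentially (1s, 2s, 4s for
 * attempts 0, 1, 2). Returns seconds to wait, or null to stop retrying.
 */
function image_scraper_retry_delay(int $status, ?string $retry_after, int $attempt, int $max_attempts = 3): ?int
{
    if ($status !== 429 || $attempt >= $max_attempts) {
        return null;
    }
    if ($retry_after !== null && ctype_digit($retry_after)) {
        return (int) $retry_after;
    }
    return 2 ** $attempt;
}
```

Inside WordPress, the pair would be sent with `wp_remote_post($request['endpoint'], $request['args'])`.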
This plugin follows WordPress coding standards:

Security:
- Nonce verification on all form submissions
- Capability checks (`manage_options`)
- Input sanitization (`sanitize_text_field()`, `absint()`)
- Output escaping (`esc_html()`, `esc_attr()`, `esc_url()`)
Naming Conventions:
- Classes: `Class_Name` inside the `Image_Scraper` namespace (e.g. `Image_Scraper\Media_Importer`)
- Functions: `image_scraper_function_name()`
- Hooks: `image_scraper_hook_name`
Best Practices:
- Prefer WordPress functions over plain PHP alternatives
- Proper enqueueing of scripts/styles
- Translation-ready strings
- Direct file access prevention
```bash
# Activate plugin via WP-CLI
lando wp plugin activate image-scraper

# Deactivate plugin
lando wp plugin deactivate image-scraper

# Check plugin status
lando wp plugin list

# View plugin options
lando wp option get image_scraper_settings

# Update API key via CLI
lando wp option patch update image_scraper_settings firecrawl_api_key "your-api-key"
```

License: GPL v2 or later
James Welbes - https://jameswelbes.com
Your Name (customize in image-scraper.php)