Target Product Data Scraper - Selenium (Python)

A production-ready Python scraper for extracting product data from Target using Selenium. This scraper efficiently extracts aggregateRating, availability, brand, and related data from Target product pages.


What This Scraper Extracts

  • Product Information:
    • aggregateRating: Aggregated rating information
    • Availability: Product availability status
    • Brand: Brand name
    • Currency: Currency code (e.g., USD)
    • Features: Product features
  • Category Information: Category name, ID, URL, description, and banner image

Quick Start

Prerequisites

  • Python 3.7 or higher
  • pip package manager

Installation

  1. Install required dependencies:

pip install selenium beautifulsoup4 requests

  2. Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-builder

  3. Update the API key in the scraper:

API_KEY = "YOUR-API-KEY"  # Replace with your ScrapeOps API key

Running the Scraper

  1. Navigate to the scraper directory:

cd python/selenium/product_data

  2. Edit the URLs in scraper/target.com_scraper_product_v1.py:

if __name__ == "__main__":
    urls = [
        "https://www.target.com/p/gioberti-men-s-long-sleeve-brushed-flannel-plaid-checkered-shirt-with-corduroy-contrast/-/A-93271805?preselect=93271827#lnk=sametab",
    ]

  3. Run the scraper:

python scraper/target.com_scraper_product_v1.py

The scraper will generate a timestamped JSONL file (e.g., target_com_product_data_page_scraper_data_20260114_120000.jsonl) containing all extracted data.
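Every line of that JSONL file is an independent JSON object, so it can be parsed line by line. A minimal sketch of consuming the output (the two sample lines below are illustrative, not real scraper output):

```python
import json

# Stand-ins for lines read from the timestamped JSONL file.
sample_lines = [
    '{"productId": "93271827", "name": "Flannel Shirt", "price": 23.99}',
    '{"productId": "12345678", "name": "Corduroy Shirt", "price": 19.99}',
]

# Parse each non-empty line independently.
products = [json.loads(line) for line in sample_lines if line.strip()]
print(products[0]["price"])  # 23.99
```

In practice you would iterate over `open(path)` instead of a list; the pattern is the same.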

Example Output

See example/product.json for a sample of the extracted data structure.

Supported URLs

This scraper supports Target product page URLs:

  • https://www.target.com
  • https://www.target.com/p/gioberti-men-s-long-sleeve-brushed-flannel-plaid-checkered-shirt-with-corduroy-contrast/-/A-93271805?preselect=93271827#lnk=sametab

Configuration

Scraper Parameters

The scraper supports several configuration options. See the scraper code for available parameters.

ScrapeOps Configuration

The scraper can use ScrapeOps for anti-bot protection and request optimization:

API_KEY = "YOUR-API-KEY"  # Your ScrapeOps API key

payload = {
    "api_key": API_KEY,
    "url": url,
    "optimize_request": True,  # Enables request optimization
}

ScrapeOps Features:

  • Proxy rotation (may help reduce IP blocking)
  • Request header optimization (can help reduce detection)
  • Rate limiting management
  • Note: CAPTCHA challenges may occur depending on site behavior and cannot be guaranteed to be resolved automatically
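With the payload above, the scraper routes each request through the ScrapeOps proxy endpoint by passing the API key and target URL as query parameters. A minimal stdlib sketch of composing that request URL (the endpoint shown matches ScrapeOps' documented proxy API, but verify against their docs; `build_proxy_url` is an illustrative helper, not the scraper's actual function):

```python
from urllib.parse import urlencode

API_KEY = "YOUR-API-KEY"  # Your ScrapeOps API key

def build_proxy_url(url: str, optimize_request: bool = True) -> str:
    """Compose a ScrapeOps proxy request URL for the given target URL."""
    params = {
        "api_key": API_KEY,
        "url": url,
        "optimize_request": str(optimize_request).lower(),
    }
    return "https://proxy.scrapeops.io/v1/?" + urlencode(params)

proxy_url = build_proxy_url("https://www.target.com/p/example/-/A-93271805")
# The actual fetch would then be, e.g.:
# response = requests.get(proxy_url)
print("proxy.scrapeops.io" in proxy_url)  # True
```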

Output Schema

The scraper outputs data in JSONL format (one JSON object per line). Each object contains:

Field | Type | Description | Example
aggregateRating | object | Aggregated rating information | Object with 4 fields
availability | string | Product availability status | in_stock
brand | string | Brand name | GIOBERTI
category | string | Category information | Casual Button Down Shirts
currency | string | Currency code (e.g., USD) | USD
description | string | Description or details | Shop Gioberti Men's 100% Cotton Brushed Flannel Pl...
features | array | Product features | True to Size
images | array | Image URLs | Array of objects (see example)
name | string | Name or title | Gioberti Men's 100% Cotton Brushed Flannel Plaid C...
preDiscountPrice | number | Original price before discount | 49.99
price | number | Current price | 23.99
productId | string | Unique product identifier | 93271827
reviews | array | Review data | []
seller | object | Seller information | Object with 3 fields
serialNumbers | array | Serial number information | Array of objects (see example)
specifications | array | Product specifications | []
url | string | URL or link to the resource | https://www.target.com/p/gioberti-men-s-100-cotton...
videos | array | Video information | Array of objects (see example)
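A quick type check against this schema can catch parsing regressions early. The sketch below mirrors a few of the fields from the table above (the `EXPECTED_TYPES` map and `validate_record` helper are illustrative, not part of the scraper):

```python
# Expected Python types for a subset of the schema fields above.
EXPECTED_TYPES = {
    "availability": str,
    "brand": str,
    "price": (int, float),
    "productId": str,
    "features": list,
    "aggregateRating": dict,
}

def validate_record(record: dict) -> list:
    """Return a list of field-level problems; an empty list means the record looks sane."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type: {field}")
    return problems

record = {
    "availability": "in_stock",
    "brand": "GIOBERTI",
    "price": 23.99,
    "productId": "93271827",
    "features": ["True to Size"],
    "aggregateRating": {"ratingValue": 4.5},
}
print(validate_record(record))  # []
```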

Field Descriptions

Each object contains the fields listed in the table above. See example/product.json for a complete example.

Product/Listing Fields:

  • aggregateRating (object): Aggregated rating information
  • availability (string): Product availability status
  • brand (string): Brand name
  • currency (string): Currency code (e.g., USD)
  • features (array): Product features
  • images (array): Image URLs
  • preDiscountPrice (number): Original price before discount
  • price (number): Current price
  • productId (string): Unique product identifier
  • reviews (array): Review data
  • seller (object): Seller information
  • specifications (array): Product specifications
  • videos (array): Video information

Category Fields:

  • category (string): Category information

Metadata Fields:

  • description (string): Description or details
  • url (string): URL or link to the resource

Other Fields:

  • name (string): Name or title
  • serialNumbers (array): Serial number information

Anti-Bot Protection

This scraper can integrate with ScrapeOps to help handle Target's anti-bot measures.

Why ScrapeOps?

Target may employ various anti-scraping measures, including:

  • Rate limiting and IP blocking
  • Browser fingerprinting
  • CAPTCHA challenges (may occur depending on site behavior)
  • JavaScript rendering requirements
  • Request pattern analysis

ScrapeOps Integration

The scraper can use the ScrapeOps proxy service, which may provide:

  1. Proxy Rotation: May help distribute requests across multiple IP addresses
  2. Request Optimization: May optimize headers and request patterns to reduce detection
  3. Retry Logic: Built-in retry mechanism with exponential backoff

Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.

Getting Started with ScrapeOps

  1. Sign up for a free account at https://scrapeops.io/app/register/ai-builder
  2. Get your API key from the dashboard
  3. Replace YOUR-API-KEY in the scraper code
  4. The scraper can use ScrapeOps for requests (if configured)

Free Tier: ScrapeOps offers a generous free tier suitable for testing and small-scale scraping.

How It Works

The scraper uses Selenium to navigate to Target product pages in a browser, waits for content to load, and extracts structured data using CSS selectors and DOM parsing. The extracted data is normalized and saved in JSONL format for efficient processing.
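Many retail product pages, Target's included, also embed structured data as JSON-LD script tags, which can complement CSS-selector extraction. As an illustrative sketch (not the repo's actual method), parsing such blocks out of `driver.page_source` needs only the standard library; the `sample` HTML below stands in for a real page:

```python
import json
import re

def extract_json_ld(page_source: str) -> list:
    """Pull all application/ld+json blocks from raw HTML."""
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL,
    )
    blocks = []
    for match in pattern.findall(page_source):
        try:
            blocks.append(json.loads(match))
        except json.JSONDecodeError:
            continue  # skip malformed blocks
    return blocks

# Illustrative fragment standing in for driver.page_source:
sample = '''<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Flannel Shirt", "brand": {"name": "GIOBERTI"}}
</script>
</head></html>'''

products = [b for b in extract_json_ld(sample) if b.get("@type") == "Product"]
print(products[0]["name"])  # Flannel Shirt
```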

Error Handling & Troubleshooting

1. No Data Extracted

Symptoms: Scraper runs but produces empty output files.

Solutions:

  • Verify the URL format is correct
  • Check if the page requires JavaScript rendering
  • Ensure your ScrapeOps API key is valid
  • Check network connectivity

2. Rate Limiting / Blocked Requests

Symptoms: HTTP 429 errors or empty responses.

Solutions:

  • Reduce concurrency settings
  • Increase the delay between requests
  • Verify your ScrapeOps API key has sufficient credits

3. Parsing Errors

Symptoms: Errors in extraction logic or missing fields.

Solutions:

  • The site may have updated its HTML structure
  • Check whether selectors need updating
  • Review the actual HTML structure of the page

Debugging

Enable detailed logging:

import logging
logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG

This will show:

  • Request URLs and responses
  • Extraction steps
  • Parsing errors
  • Retry attempts

Retry Logic

The scraper includes retry logic with configurable retry attempts and exponential backoff.
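The pattern fits in a few lines. In this sketch, `fetch`, `max_retries`, and `base_delay` are illustrative names, not the scraper's actual parameters:

```python
import random
import time

def fetch_with_retries(fetch, max_retries=3, base_delay=1.0):
    """Call fetch(); on failure, wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

With `base_delay=1.0`, the waits grow roughly 1 s, 2 s, 4 s; the random jitter helps avoid many workers retrying in lockstep.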

Alternative Implementations

This repository provides multiple implementations for scraping Target product data pages:

Python Implementations

Node.js Implementations

Choosing the Right Implementation

Use BeautifulSoup/Cheerio when:

  • You need fast, lightweight scraping
  • JavaScript rendering is not required
  • You want minimal dependencies
  • You're scraping simple HTML pages

Use Playwright or Selenium when:

  • Pages require JavaScript rendering
  • You need to interact with dynamic content
  • You need to handle complex anti-bot measures
  • You want to simulate real browser behavior

Performance Considerations

Concurrency

The scraper supports concurrent requests. See the scraper code for configuration options.

Recommendations:

  • Start with minimal concurrency for testing
  • Gradually increase based on your ScrapeOps plan limits
  • Monitor for rate limiting or blocking
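A bounded worker pool is one common way to cap concurrency. This sketch uses Python's ThreadPoolExecutor with a hypothetical scrape_url function standing in for the real fetch-and-parse logic:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_url(url: str) -> dict:
    # Placeholder for the real fetch-and-parse logic.
    return {"url": url, "status": "ok"}

urls = [f"https://www.target.com/p/example-{i}" for i in range(4)]

# max_workers caps the number of simultaneous requests; start small.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(scrape_url, urls))

print(len(results))  # 4
```

`pool.map` preserves input order, so results line up with the URL list even when requests finish out of order.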

Output Format

Data is saved in JSONL format (one JSON object per line):

  • Efficient for large datasets
  • Easy to process line by line
  • Can be imported into databases or data processing tools
  • Each line is a complete, valid JSON object

Memory Usage

The scraper processes data incrementally:

  • Products are written to file immediately after extraction
  • No need to load the entire dataset into memory
  • Suitable for scraping large pages
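Appending each record to the file as soon as it is extracted is what keeps memory usage flat. A minimal sketch of the pattern (`write_record` is an illustrative helper, not the scraper's API; the demo writes to a temp file):

```python
import json
import os
import tempfile

def write_record(path: str, record: dict) -> None:
    """Append one record to the file as a single JSONL line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

path = os.path.join(tempfile.gettempdir(), "target_products_demo.jsonl")
open(path, "w").close()  # start fresh for the demo

# Each record hits disk immediately; nothing accumulates in memory.
write_record(path, {"productId": "93271827", "price": 23.99})
write_record(path, {"productId": "12345678", "price": 9.99})

with open(path, encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))  # 2
```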

Best Practices

  1. Respect Rate Limits: Use appropriate delays and concurrency settings
  2. Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
  3. Handle Errors Gracefully: Implement proper error handling and logging
  4. Validate URLs: Ensure URLs are valid Target pages before scraping
  5. Update Selectors: Target may change its HTML structure; update selectors as needed
  6. Test Regularly: Test scrapers regularly to catch breaking changes early

Support & Resources

  • ScrapeOps Documentation: https://scrapeops.io/docs
  • Framework Documentation: See framework-specific documentation
  • Example Output: See example/product.json for sample data structure
  • Scraper Code: See scraper/target.com_scraper_product_v1.py for implementation details

License

This scraper is provided as-is for educational and commercial use. Please ensure compliance with Target's Terms of Service and robots.txt when using this scraper.