Teepublic Scraper lets you collect structured product data from any Teepublic product URL at scale. It turns raw product pages into clean JSON containing titles, prices, and all associated images. Use it to power market research, inventory sync, or pricing intelligence wherever accurate Teepublic product data is needed.
Designed for reliability and high throughput, this Teepublic scraper handles large batches of URLs with smart chunking and concurrency controls, so you can focus on analysis instead of manual copying.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
Teepublic Scraper is a lightweight service that accepts a list of Teepublic product URLs and returns a structured JSON array of product details. It removes the need for manual copy-paste or brittle one-off scripts and replaces them with a repeatable, production-ready workflow.
This project is ideal for:
- Developers who need a simple, URL-based API to fetch Teepublic product data.
- Marketers and analysts who want to monitor product performance, pricing, and visual assets.
- Businesses that rely on print-on-demand product data for catalog management, competitive tracking, or automation.
By focusing on a narrow, well-defined task—scraping Teepublic product pages—this scraper delivers stable performance, predictable memory usage, and clear, documented output.
- Accepts a JSON list of Teepublic product URLs as the only required input.
- Extracts key product information such as title, price, and all available image URLs.
- Supports configurable concurrency and chunk size to balance speed and memory usage.
- Scales from a handful of URLs to thousands in a single run with measurable performance.
- Provides a consistent JSON schema that can plug directly into dashboards, pipelines, or databases.
| Feature | Description |
|---|---|
| Simple URL-based input | Provide a JSON array of Teepublic product URLs and let the scraper handle everything else. |
| Detailed product output | Collects product titles, prices, and all associated images for each product page. |
| Batch processing support | Handles large lists of URLs using chunking and concurrency for efficient processing. |
| Memory tuning | Configure memory limits via query parameters to support heavy workloads when needed. |
| Concurrency control | Adjust how many URLs are processed in parallel to match your infrastructure capacity. |
| Robust and durable | Built specifically for Teepublic product pages, making it more reliable than generic scrapers. |
| JSON-native workflow | Input and output are both JSON, making it easy to integrate with scripts, APIs, and data tools. |
| Performance-focused defaults | Sensible defaults (4 GB memory, parallel processing) for most scraping scenarios. |
| Field Name | Field Description |
|---|---|
| url | The original Teepublic product URL that was processed. |
| title | Human-readable product title as displayed on the product page. |
| price | The current product price as a formatted string (e.g., "$20.00"). |
| price_numeric | The product price converted into a numeric value for easier calculations. |
| currency | Currency code inferred from the Teepublic page (e.g., "USD"). |
| images | Array of direct image URLs for the product (all main and variant images). |
| thumbnail | Primary thumbnail image URL used as the main visual for the product. |
| product_id | Unique identifier for the product, derived from the URL or page markup. |
| tags | List of tags or keywords associated with the design, when available. |
| category | Product category or type (e.g., "T-Shirt", "Hoodie"), if present. |
| scraped_at | Timestamp (ISO 8601) indicating when the product was scraped. |
| raw_html_snapshot | Optional field containing a minimal snapshot or reference for debugging (can be disabled). |
Example:
[
  {
    "url": "https://www.teepublic.com/t-shirt/12345678",
    "title": "Funny Cat Meme T-Shirt - Perfect Gift for Cat Lovers",
    "images": [
      "https://images.teepublic.com/t-shirt-12345678-1.jpg",
      "https://images.teepublic.com/t-shirt-12345678-2.jpg"
    ],
    "thumbnail": "https://images.teepublic.com/t-shirt-12345678-1.jpg",
    "price": "$20.00",
    "price_numeric": 20.0,
    "currency": "USD",
    "product_id": "12345678",
    "tags": [
      "cat",
      "meme",
      "funny",
      "gift"
    ],
    "category": "T-Shirt",
    "scraped_at": "2025-01-10T12:34:56.000Z"
  }
]
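
The record shape above can be written down as a TypeScript type, together with the two derived fields (`price_numeric` and `product_id`). This is a minimal sketch, not the project's actual code: `ProductRecord`, `parsePrice`, and `productIdFromUrl` are hypothetical names, the price parser assumes US-style formatting such as "$20.00", and the ID helper assumes the ID is the leading run of digits in the last path segment.

```typescript
// Hypothetical type mirroring the documented output fields.
interface ProductRecord {
  url: string;
  title: string;
  price: string; // formatted, e.g. "$20.00"
  price_numeric: number; // e.g. 20.0
  currency: string; // e.g. "USD"
  images: string[];
  thumbnail: string;
  product_id: string;
  tags?: string[]; // only when available
  category?: string; // only when present
  scraped_at: string; // ISO 8601 timestamp
  raw_html_snapshot?: string; // optional debugging aid
}

// Assumption: "." is the decimal separator and "," the thousands separator.
// Returns null when no usable number can be recovered.
function parsePrice(formatted: string): number | null {
  const cleaned = formatted.replace(/[^0-9.]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

// Assumption: the product ID is the leading digits of the last path segment.
function productIdFromUrl(productUrl: string): string | null {
  try {
    const segments = new URL(productUrl).pathname.split("/").filter(Boolean);
    const last = segments[segments.length - 1] ?? "";
    const match = /^\d+/.exec(last);
    return match ? match[0] : null;
  } catch {
    return null; // not a parseable URL
  }
}
```

Returning `null` instead of throwing lets a caller decide whether a missing price or ID should fail the whole record or just leave the field out.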
teepublic-scraper/
├── src/
│ ├── main.ts
│ ├── crawler/
│ │ ├── teepublicClient.ts
│ │ ├── pageFetcher.ts
│ │ └── rateLimiter.ts
│ ├── parsers/
│ │ ├── productParser.ts
│ │ └── priceNormalizer.ts
│ ├── utils/
│ │ ├── logger.ts
│ │ ├── chunker.ts
│ │ └── validation.ts
│ └── config/
│ ├── defaults.ts
│ └── schema.json
├── data/
│ ├── input.sample.json
│ └── output.sample.json
├── tests/
│ ├── main.test.ts
│ └── productParser.test.ts
├── Dockerfile
├── package.json
├── tsconfig.json
├── .env.example
└── README.md
- E-commerce analysts use it to collect Teepublic product titles, prices, and images, so they can track competitors and identify trending designs across categories.
- Print-on-demand store owners use it to enrich their internal catalogs with Teepublic product data, so they can compare pricing and positioning against similar designs.
- Market researchers use it to pull large samples of Teepublic products, so they can analyze themes, niches, and pricing strategies over time.
- Automation engineers integrate it into data pipelines, so they can keep product snapshots updated on a schedule without manual scraping.
- Content creators and agencies use it to quickly review visual assets and product details, so they can curate inspiration boards or proposal decks.
Q1: What input format does the Teepublic Scraper require?
The scraper expects a JSON object with a single required field named urls, which should be an array of Teepublic product URLs. For example:
{
  "urls": [
    "https://www.teepublic.com/t-shirt/12345678",
    "https://www.teepublic.com/t-shirt/98765432"
  ],
  "concurrency": 20,
  "chunk": 200
}
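
One way to check this input before any scraping starts is sketched below. It is an illustration under assumptions, not the project's validation module: `ScraperInput`, `validateInput`, `isTeepublicUrl`, and `positiveInt` are hypothetical names, and the rules (HTTPS only, a `teepublic.com` hostname, positive integers for the optional fields) are assumed rather than documented.

```typescript
// Hypothetical input type: `urls` is required, the rest is optional.
interface ScraperInput {
  urls: string[];
  concurrency?: number;
  chunk?: number;
}

// Assumption: only HTTPS URLs on teepublic.com or a subdomain are accepted.
function isTeepublicUrl(value: string): boolean {
  try {
    const { protocol, hostname } = new URL(value);
    return (
      protocol === "https:" &&
      (hostname === "teepublic.com" || hostname.endsWith(".teepublic.com"))
    );
  } catch {
    return false; // not a parseable URL
  }
}

function positiveInt(value: unknown, name: string): number {
  if (typeof value !== "number" || !Number.isInteger(value) || value < 1) {
    throw new Error(`${name} must be a positive integer`);
  }
  return value;
}

// Throws with a descriptive message on the first problem found.
function validateInput(raw: unknown): ScraperInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Input must be a JSON object");
  }
  const { urls, concurrency, chunk } = raw as Record<string, unknown>;
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error("urls must be a non-empty array");
  }
  for (const candidate of urls) {
    if (typeof candidate !== "string" || !isTeepublicUrl(candidate)) {
      throw new Error(`Not a Teepublic URL: ${String(candidate)}`);
    }
  }
  const input: ScraperInput = { urls: urls as string[] };
  if (concurrency !== undefined) input.concurrency = positiveInt(concurrency, "concurrency");
  if (chunk !== undefined) input.chunk = positiveInt(chunk, "chunk");
  return input;
}
```

Failing fast on malformed input keeps a typo in one URL from surfacing only after part of a large batch has already run.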
Q2: How do concurrency and chunk settings affect performance?
concurrency controls how many URLs are processed at the same time within a single chunk, while chunk defines how many URLs are grouped together into a batch. Higher values increase throughput but also raise memory usage and potential strain on your infrastructure. For most workloads, a concurrency of 10–20 and chunks of 100–200 URLs provide a solid balance.
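
The interaction between the two settings can be sketched as follows. This is an illustrative sketch, not the scraper's actual scheduler: `chunkArray`, `mapWithConcurrency`, and `processInChunks` are hypothetical names, and it assumes chunks run one after another while URLs inside a chunk run in parallel up to the concurrency limit.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunkArray<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the order of `items`. A rejected worker rejects the whole call.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane repeatedly claims the next unprocessed index until none remain.
  const runLane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  };
  const lanes = Array.from({ length: Math.min(limit, items.length) }, runLane);
  await Promise.all(lanes);
  return results;
}

// Assumption: chunks are processed sequentially, so at most one chunk's
// worth of in-progress work is held in memory at a time.
async function processInChunks<T, R>(
  items: readonly T[],
  chunkSize: number,
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const all: R[] = [];
  for (const batch of chunkArray(items, chunkSize)) {
    all.push(...(await mapWithConcurrency(batch, concurrency, worker)));
  }
  return all;
}
```

Under this model, `chunk` bounds how much work is staged at once and `concurrency` bounds how many requests are open at once, which is why raising either one trades memory for throughput.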
Q3: Can I scrape more fields than title, images, and price?
Yes. The core schema focuses on title, price, and images, but the parser is designed to be extendable. You can customize it to extract additional fields (such as tags, description, category, or color/size variants) by modifying the parser layer without changing the input contract.
Q4: What happens if a URL is invalid or a product no longer exists?
Invalid or unavailable URLs are handled gracefully. The scraper records an error for that entry (including the URL and a short message) while continuing with the rest of the batch. This ensures that a single problematic URL does not interrupt the entire scraping run.
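
The "record the error and keep going" behavior can be sketched by wrapping each URL's work so that a failure becomes data rather than an exception. This is a sketch under assumptions: `ScrapeOutcome`, `scrapeSafely`, and `scrapeAll` are hypothetical names, and the exact shape of the scraper's real error entries is not documented here beyond "URL plus a short message".

```typescript
// Hypothetical result type: either the scraped data or an error entry.
type ScrapeOutcome<T> =
  | { ok: true; url: string; data: T }
  | { ok: false; url: string; error: string };

// Never rejects: any failure is converted into an error entry for that URL.
async function scrapeSafely<T>(
  url: string,
  scrape: (url: string) => Promise<T>,
): Promise<ScrapeOutcome<T>> {
  try {
    return { ok: true, url, data: await scrape(url) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, url, error: message };
  }
}

// Because scrapeSafely never rejects, Promise.all cannot short-circuit here.
async function scrapeAll<T>(
  urls: readonly string[],
  scrape: (url: string) => Promise<T>,
): Promise<ScrapeOutcome<T>[]> {
  return Promise.all(urls.map((url) => scrapeSafely(url, scrape)));
}
```

A caller can then split the outcomes on `ok` to separate usable records from entries worth re-running.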
Primary Metric: On a typical configuration with 4 GB of memory, processing 1,000 Teepublic product URLs completes in roughly 60 seconds, assuming stable network conditions and default concurrency settings.
Reliability Metric: Across large test batches, the scraper maintains a success rate of around 98–99% for reachable and valid product URLs, with retries applied to transient network errors.
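
Retrying transient network errors can be sketched as a small wrapper with exponential backoff. The scraper's real retry policy (attempt count, delays, which errors count as transient) is not stated here, so everything below is an assumption for illustration: `RetryOptions`, `withRetry`, and `defaultSleep` are hypothetical names.

```typescript
interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // wait before the second try; doubles after each failure
  isTransient: (err: unknown) => boolean; // decides whether a retry is worthwhile
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task`, retrying only when the error is transient and tries remain.
// Rethrows the last error once retries are exhausted or the error is permanent.
async function withRetry<T>(task: () => Promise<T>, options: RetryOptions): Promise<T> {
  if (!Number.isInteger(options.attempts) || options.attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      const isLastAttempt = attempt === options.attempts;
      if (isLastAttempt || !options.isTransient(err)) break;
      await sleep(options.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

Keeping `isTransient` as a caller-supplied predicate means a permanent failure such as a removed product is reported immediately instead of being retried.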
Efficiency Metric: With concurrency: 20 and chunk: 200, memory usage averages around 4.4 GB for 1,000 URLs. Scaling to larger batches is primarily a matter of increasing memory and adjusting chunk size to fit available resources.
Quality Metric: For well-formed Teepublic product pages, the scraper consistently retrieves 100% of main product images and nearly all visible pricing information, providing clean, deduplicated JSON suitable for downstream analytics and storage.
