This actor gives you a clean, ready-to-extend TypeScript template for building high-speed web scrapers using Crawlee's CheerioCrawler. It's built for developers who want a lightweight, predictable foundation for crawling websites and extracting structured data.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for TypeScript-Crawlee-CheerioCrawler, you've just found your team. Let's Chat!
The Mosaddik Actor is essentially a starter kit: it wires together Crawlee, Cheerio, Apify's storage, and an input schema so you can focus on scraping logic instead of boilerplate. You supply the URLs. The crawler loads them, parses the HTML with Cheerio, and stores whatever data you extract (titles by default, but customizable to anything on the page).
It's simple, fast, and TypeScript-first. The actor handles request routing, crawling limits, dataset outputs, and error logs so you can scale up without rewriting core logic.
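The wiring in src/main.ts looks roughly like the sketch below. The input field names (startUrls, maxPagesPerCrawl) are assumptions here; match them to the actual input schema in config/input.schema.json.

```typescript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

// Input field names are illustrative; align them with config/input.schema.json.
const input = (await Actor.getInput()) as { startUrls?: string[]; maxPagesPerCrawl?: number } | null;
const startUrls = input?.startUrls ?? ['https://example.com'];
const maxPagesPerCrawl = input?.maxPagesPerCrawl ?? 10;

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: maxPagesPerCrawl,
    async requestHandler({ request, $ }) {
        // Default extraction: page title and URL. Swap in your own selectors here.
        await Actor.pushData({
            title: $('title').text().trim(),
            url: request.loadedUrl ?? request.url,
        });
    },
});

await crawler.run(startUrls);
await Actor.exit();
```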
| Feature | Description |
|---|---|
| Crawlee + CheerioCrawler | Efficient crawling with HTML parsing built in. |
| TypeScript Architecture | Strong typing and maintainable structure. |
| Modular Input Schema | Easily define start URLs and limits. |
| Dataset Integration | Outputs clean, normalized records. |
| Extendable Logic | Swap "scrape title" for any custom selectors. |
Default behavior captures page titles and URLs, but the template is flexible enough for any structure:
| Field Name | Field Description |
|---|---|
| title | Content of the `<title>` tag or any selectable heading. |
| url | The crawled page URL. |
| metadata | Optional added fields depending on your custom selectors. |
You can expand it to scrape paragraphs, tables, images, product specs, contact info, listings, or even category pagination.
```json
[
  {
    "title": "Welcome to Example",
    "url": "https://example.com"
  },
  {
    "title": "About Us – Example",
    "url": "https://example.com/about"
  }
]
```
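To capture more than titles, replace the default handler with your own selectors. A minimal sketch of a custom requestHandler follows; the h1 and meta-description selectors are only examples and should be adapted to the target site.

```typescript
import { Actor } from 'apify';
import type { CheerioCrawlingContext } from 'crawlee';

// Drop-in replacement for the default requestHandler; the extra selectors are illustrative.
export async function requestHandler({ request, $ }: CheerioCrawlingContext): Promise<void> {
    await Actor.pushData({
        title: $('title').text().trim(),
        url: request.loadedUrl ?? request.url,
        metadata: {
            h1: $('h1').first().text().trim(),
            description: $('meta[name="description"]').attr('content') ?? null,
        },
    });
}
```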
```
Mosaddik Actor/
├── src/
│   ├── main.ts
│   ├── crawler/
│   │   └── cheerio_crawler.ts
│   ├── handlers/
│   │   └── request_handler.ts
│   ├── utils/
│   │   └── logger.ts
│   └── config/
│       └── input.schema.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── package.json
└── README.md
```
- Lead generation scrapers that collect names, emails, or business details.
- SEO tools extracting headings, metadata, and page structure.
- Content aggregation for blogs, news, or research portals.
- Product discovery scraping catalogs or listing pages.
- Academic and research crawlers collecting public-domain documents.
Is this actor ready to scrape dynamic pages?
CheerioCrawler handles static HTML. For dynamic pages, you can switch to PlaywrightCrawler within the same structure.
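A rough sketch of that swap (it assumes you add the playwright package; the rest of the actor stays the same):

```typescript
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page }) {
        // page is a Playwright Page, so JavaScript-rendered content is available here.
        await Actor.pushData({
            title: await page.title(),
            url: request.loadedUrl ?? request.url,
        });
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();
```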
How many pages can I crawl?
The maxPagesPerCrawl input controls the limit; set it high for large sites or low for quick tests.
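An illustrative input (field names and shapes are assumptions; check data/sample_input.json and the input schema for the exact format):

```json
{
  "startUrls": ["https://example.com"],
  "maxPagesPerCrawl": 100
}
```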
Can I add login or cookies?
Yes. Crawlee supports custom headers, sessions, and authentication flows.
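For example, per-session cookie persistence plus a per-request cookie header could look like the sketch below; the cookie value is a placeholder you would obtain yourself.

```typescript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

const crawler = new CheerioCrawler({
    // Keep cookies set by the server tied to the session that handles each request.
    useSessionPool: true,
    persistCookiesPerSession: true,
    async requestHandler({ request, $ }) {
        await Actor.pushData({
            title: $('title').text().trim(),
            url: request.loadedUrl ?? request.url,
        });
    },
});

// Attach headers (e.g. a cookie you already hold) directly to a request.
await crawler.run([
    {
        url: 'https://example.com/account',
        headers: { cookie: 'sessionid=YOUR_SESSION_COOKIE' },
    },
]);

await Actor.exit();
```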
Does it support pagination?
You can enqueue new links inside the requestHandler by pushing them into the crawler's RequestQueue.
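A sketch using the enqueueLinks helper, which adds matched links to the RequestQueue for you (the next-page selector is site-specific and purely illustrative):

```typescript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks }) {
        await Actor.pushData({
            title: $('title').text().trim(),
            url: request.loadedUrl ?? request.url,
        });
        // Follow pagination links; 'a.next-page' is an example selector.
        await enqueueLinks({ selector: 'a.next-page' });
    },
});

await crawler.run(['https://example.com/listing']);
await Actor.exit();
```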
- Primary Metric: Fast HTML parsing suitable for high-volume crawling.
- Reliability Metric: Consistent extraction across simple and moderately complex HTML layouts.
- Efficiency Metric: Low memory footprint due to Cheerio's lightweight parsing.
- Quality Metric: Delivers predictable, structured datasets once selectors are tuned.
