A TypeScript-based web scraper designed to crawl and extract music data from various music websites for the Melodyo platform.
Melodyo Scrapper is a monorepo project that provides both CLI and automated cron-based scraping capabilities. It's built with TypeScript and uses a modular architecture with shared packages for API communication and utility functions.
This is a monorepo managed with pnpm workspaces and Turborepo, organized as follows:
```
melodyo-scrapper/
├── apps/
│   ├── cli/            # Command-line interface application
│   └── cron/           # Automated cron job scraper
├── packages/
│   ├── api/            # API client and communication layer
│   ├── eslint/         # Shared ESLint configuration
│   └── utils/          # Shared utility functions
├── docker/             # Docker configuration files
└── [config files]      # Root-level configuration
```
- CLI Application: Interactive command-line tool for manual scraping operations
- Cron Jobs: Automated scheduled scraping tasks
- Modular Architecture: Shared packages for reusability
- Docker Support: Containerized deployment with Docker and Docker Compose
- TypeScript: Fully typed codebase for better developer experience
- Testing: Jest-based testing framework
- Linting: ESLint configuration for code quality
- Runtime: Node.js >= 14.0.0
- Language: TypeScript
- Package Manager: pnpm
- Build Tool: Turborepo
- CLI Framework: oclif
- Testing: Jest
- Archive Handling: 7zip, unrar (for extracting downloaded files)
- Node.js >= 14.0.0
- pnpm (automatically installed if using Docker)
1. Clone the repository:

   ```bash
   git clone https://github.com/melodyo/melodyo-scrapper.git
   cd melodyo-scrapper
   ```

2. Install dependencies:

   ```bash
   pnpm install
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   ```

4. Configure your `.env` file with the required values:
   ```env
   API_URL=           # Your API endpoint URL
   COOKIE=            # Authentication cookie
   ACCESS_TOKEN=      # API access token
   CRON_END_PAGE=     # End page number for scraping
   CRON_START_PAGE=   # Start page number for scraping
   ```

Build all packages:
pnpm buildRun in development mode:
pnpm devRun tests across all packages:
pnpm testCheck code quality:
pnpm lintThe CLI application provides interactive commands for scraping operations. After building:
```bash
pnpm --filter @scrapper/cli build
# Use the scrapper CLI tool
```

The cron application runs automated scraping tasks on a schedule:

```bash
pnpm --filter @scrapper/cron build
pnpm --filter @scrapper/cron start
```
1. Ensure your `.env` file is configured
2. Build and start the container:

   ```bash
   docker-compose up -d
   ```

3. View logs:

   ```bash
   docker-compose logs -f
   ```

4. Stop the container:

   ```bash
   docker-compose down
   ```

- Base Image: Node.js 18 slim
- Archive Tools: Includes p7zip-full and unrar for extracting compressed files
- Persistent Storage: Downloads are stored in a Docker volume at `/app/apps/cron/downloads`
- Production Optimized: Multi-stage build for smaller image size
Command-line interface built with oclif framework for manual scraping operations.
Automated scheduler using node-cron for periodic scraping tasks.
Shared API client for communicating with the Melodyo backend.
Common utility functions and helpers used across applications.
Shared ESLint configuration for consistent code style.
- TypeScript Config: `tsconfig.json` at root with shared settings
- Jest Config: `jest.config.ts` for testing configuration
- Turbo Config: `turbo.json` for build pipeline optimization
- Editor Config: `.editorconfig` for consistent code formatting
The project includes a `.gitlab-ci.yml` configuration for GitLab CI/CD pipelines.
ISC
HamidNE
- Email: hamidne@mail.ru
- GitHub: @hamidne
- Repository: https://github.com/melodyo/melodyo-scrapper
- Docker Registry: registry.gitlab.com/melodyo/melodyo-scrapper
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Current Version: 1.13.1
Note: This scraper is designed specifically for the Melodyo platform. Make sure you have proper authorization and comply with the terms of service of the websites you're scraping.