Melodyo Scrapper

A TypeScript-based web scraper designed to crawl and extract music data from various music websites for the Melodyo platform.

📋 Description

Melodyo Scrapper is a monorepo project that provides both CLI and automated cron-based scraping capabilities. It's built with TypeScript and uses a modular architecture with shared packages for API communication and utility functions.

πŸ—οΈ Project Structure

This is a monorepo managed with pnpm workspaces and Turborepo, organized as follows:

melodyo-scrapper/
β”œβ”€β”€ apps/
β”‚   β”œβ”€β”€ cli/          # Command-line interface application
β”‚   └── cron/         # Automated cron job scraper
β”œβ”€β”€ packages/
β”‚   β”œβ”€β”€ api/          # API client and communication layer
β”‚   β”œβ”€β”€ eslint/       # Shared ESLint configuration
β”‚   └── utils/        # Shared utility functions
β”œβ”€β”€ docker/           # Docker configuration files
└── [config files]    # Root-level configuration

✨ Features

  • CLI Application: Interactive command-line tool for manual scraping operations
  • Cron Jobs: Automated scheduled scraping tasks
  • Modular Architecture: Shared packages for reusability
  • Docker Support: Containerized deployment with Docker and Docker Compose
  • TypeScript: Fully typed codebase for better developer experience
  • Testing: Jest-based testing framework
  • Linting: ESLint configuration for code quality

🔧 Tech Stack

  • Runtime: Node.js >= 14.0.0
  • Language: TypeScript
  • Package Manager: pnpm
  • Build Tool: Turborepo
  • CLI Framework: oclif
  • Testing: Jest
  • Archive Handling: 7zip, unrar (for extracting downloaded files)

📦 Installation

Prerequisites

  • Node.js >= 14.0.0
  • pnpm (installed automatically when building with Docker)

Local Setup

  1. Clone the repository:
git clone https://github.com/melodyo/melodyo-scrapper.git
cd melodyo-scrapper
  2. Install dependencies:
pnpm install
  3. Set up environment variables:
cp .env.example .env
  4. Configure your .env file with the required values:
API_URL=                  # Your API endpoint URL
COOKIE=                   # Authentication cookie
ACCESS_TOKEN=             # API access token
CRON_START_PAGE=          # Start page number for scraping
CRON_END_PAGE=            # End page number for scraping
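
Because missing or malformed values usually surface only deep inside a scraping run, it can help to validate them at startup. The helpers below are a minimal sketch of that idea, they are illustrative and not part of the actual codebase; only the variable names come from the list above.

```typescript
// Hypothetical startup validation for the environment variables listed
// above; requireEnv and parsePageRange are illustrative helpers, not
// code from the repository.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// CRON_START_PAGE / CRON_END_PAGE arrive as strings; parse and sanity-check.
function parsePageRange(start: string, end: string): { start: number; end: number } {
  const s = Number.parseInt(start, 10);
  const e = Number.parseInt(end, 10);
  if (Number.isNaN(s) || Number.isNaN(e) || s < 1 || e < s) {
    throw new Error(`Invalid page range: ${start}..${end}`);
  }
  return { start: s, end: e };
}
```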

🚀 Usage

Development

Build all packages:

pnpm build

Run in development mode:

pnpm dev

Testing

Run tests across all packages:

pnpm test

Linting

Check code quality:

pnpm lint

CLI Usage

The CLI application provides interactive commands for scraping operations. After building:

pnpm --filter @scrapper/cli build
# Use the scrapper CLI tool

Cron Application

The cron application runs automated scraping tasks on a schedule:

pnpm --filter @scrapper/cron build
pnpm --filter @scrapper/cron start

🐳 Docker Deployment

Build and Run with Docker Compose

  1. Ensure your .env file is configured
  2. Build and start the container:
docker-compose up -d
  3. View logs:
docker-compose logs -f
  4. Stop the container:
docker-compose down
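
For orientation, a compose file for this setup might look roughly like the following. This is a sketch, not the repository's actual file: the service name and build context are assumptions, while the volume path matches the download location described under Docker Image Features.

```yaml
# Hypothetical minimal docker-compose.yml; the real file may differ.
services:
  scrapper:
    build: .            # assumes the Dockerfile lives at the repo root
    env_file: .env      # injects API_URL, COOKIE, ACCESS_TOKEN, etc.
    volumes:
      - downloads:/app/apps/cron/downloads
volumes:
  downloads:            # persists downloaded archives across restarts
```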

Docker Image Features

  • Base Image: Node.js 18 slim
  • Archive Tools: Includes p7zip-full and unrar for extracting compressed files
  • Persistent Storage: Downloads are stored in a Docker volume at /app/apps/cron/downloads
  • Production Optimized: Multi-stage build for smaller image size

πŸ“ Packages

@scrapper/cli

Command-line interface built with oclif framework for manual scraping operations.

@scrapper/cron

Automated scheduler using node-cron for periodic scraping tasks.
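
node-cron schedules are standard five-field cron expressions (minute, hour, day of month, month, day of week). As a rough illustration of that format, here is a simple validator; it is not code from @scrapper/cron and only handles `*`, single values, and `*/n` steps.

```typescript
// Illustrative validator for a five-field cron expression; not from the
// codebase. Fields: minute, hour, day-of-month, month, day-of-week.
const FIELD_RANGES: Array<[number, number]> = [
  [0, 59], [0, 23], [1, 31], [1, 12], [0, 6],
];

function isValidCronExpression(expr: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  return fields.every((field, i) => {
    const [min, max] = FIELD_RANGES[i];
    if (field === "*") return true;
    // Simple "*/n" step values.
    const step = field.match(/^\*\/(\d+)$/);
    if (step) return Number(step[1]) >= 1;
    // Single numeric values must fall inside the field's range.
    const n = Number(field);
    return Number.isInteger(n) && n >= min && n <= max;
  });
}
```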

@scrapper/api

Shared API client for communicating with the Melodyo backend.
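
Scraping traffic tends to hit transient failures, so an API client layer often wraps requests in retries. The helper below is a generic sketch of that pattern with exponential backoff; it is not taken from @scrapper/api.

```typescript
// Hypothetical retry helper of the kind an API client might use;
// purely illustrative, not code from @scrapper/api.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: base, 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```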

@scrapper/utils

Common utility functions and helpers used across applications.
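
As an example of the kind of helper such a package might expose, a scraper that downloads tracks typically needs to turn titles into safe filenames. The function below is purely illustrative, not an actual export of @scrapper/utils.

```typescript
// Illustrative utility: sanitize a track title into a filesystem-safe
// filename. Not an actual export of @scrapper/utils.
function toSafeFilename(title: string): string {
  return title
    .trim()
    .replace(/[\\/:*?"<>|]/g, "") // strip characters invalid on most filesystems
    .replace(/\s+/g, "_")         // collapse runs of whitespace to underscores
    .toLowerCase();
}
```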

@scrapper/eslint-config

Shared ESLint configuration for consistent code style.

πŸ› οΈ Development

Project Configuration

  • TypeScript Config: tsconfig.json at root with shared settings
  • Jest Config: jest.config.ts for testing configuration
  • Turbo Config: turbo.json for build pipeline optimization
  • Editor Config: .editorconfig for consistent code formatting

CI/CD

The project includes a .gitlab-ci.yml configuration for GitLab CI/CD pipelines.

πŸ“ License

ISC

👤 Author

HamidNE

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📊 Version

Current Version: 1.13.1


Note: This scraper is designed specifically for the Melodyo platform. Make sure you have proper authorization and comply with the terms of service of the websites you're scraping.
