Melodyo Scrapper

A TypeScript-based web scraper designed to crawl and extract music data from various music websites for the Melodyo platform.

📋 Description

Melodyo Scrapper is a monorepo project that provides both CLI and automated cron-based scraping capabilities. It's built with TypeScript and uses a modular architecture with shared packages for API communication and utility functions.

πŸ—οΈ Project Structure

This is a monorepo managed with pnpm workspaces and Turborepo, organized as follows:

melodyo-scrapper/
β”œβ”€β”€ apps/
β”‚   β”œβ”€β”€ cli/          # Command-line interface application
β”‚   └── cron/         # Automated cron job scraper
β”œβ”€β”€ packages/
β”‚   β”œβ”€β”€ api/          # API client and communication layer
β”‚   β”œβ”€β”€ eslint/       # Shared ESLint configuration
β”‚   └── utils/        # Shared utility functions
β”œβ”€β”€ docker/           # Docker configuration files
└── [config files]    # Root-level configuration

✨ Features

  • CLI Application: Interactive command-line tool for manual scraping operations
  • Cron Jobs: Automated scheduled scraping tasks
  • Modular Architecture: Shared packages for reusability
  • Docker Support: Containerized deployment with Docker and Docker Compose
  • TypeScript: Fully typed codebase for better developer experience
  • Testing: Jest-based testing framework
  • Linting: ESLint configuration for code quality

🔧 Tech Stack

  • Runtime: Node.js >= 14.0.0
  • Language: TypeScript
  • Package Manager: pnpm
  • Build Tool: Turborepo
  • CLI Framework: oclif
  • Testing: Jest
  • Archive Handling: 7zip, unrar (for extracting downloaded files)

📦 Installation

Prerequisites

  • Node.js >= 14.0.0
  • pnpm (installed automatically when building with Docker)

Local Setup

  1. Clone the repository:
git clone https://github.com/melodyo/melodyo-scrapper.git
cd melodyo-scrapper
  2. Install dependencies:
pnpm install
  3. Set up environment variables:
cp .env.example .env
  4. Configure your .env file with the required values:
API_URL=                  # Your API endpoint URL
COOKIE=                   # Authentication cookie
ACCESS_TOKEN=             # API access token
CRON_START_PAGE=          # Start page number for scraping
CRON_END_PAGE=            # End page number for scraping
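
Because missing or malformed values usually surface only deep inside a scraping run, it can help to validate them at startup. The helpers below are a minimal sketch of that idea, they are illustrative and not part of the actual codebase; only the variable names come from the list above.

```typescript
// Hypothetical startup validation for the environment variables listed
// above; requireEnv and parsePageRange are illustrative helpers, not
// code from the repository.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// CRON_START_PAGE / CRON_END_PAGE arrive as strings; parse and sanity-check.
function parsePageRange(start: string, end: string): { start: number; end: number } {
  const s = Number.parseInt(start, 10);
  const e = Number.parseInt(end, 10);
  if (Number.isNaN(s) || Number.isNaN(e) || s < 1 || e < s) {
    throw new Error(`Invalid page range: ${start}..${end}`);
  }
  return { start: s, end: e };
}
```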

🚀 Usage

Development

Build all packages:

pnpm build

Run in development mode:

pnpm dev

Testing

Run tests across all packages:

pnpm test

Linting

Check code quality:

pnpm lint

CLI Usage

The CLI application provides interactive commands for scraping operations. After building:

pnpm --filter @scrapper/cli build
# Use the scrapper CLI tool

Cron Application

The cron application runs automated scraping tasks on a schedule:

pnpm --filter @scrapper/cron build
pnpm --filter @scrapper/cron start

🐳 Docker Deployment

Build and Run with Docker Compose

  1. Ensure your .env file is configured
  2. Build and start the container:
docker-compose up -d
  3. View logs:
docker-compose logs -f
  4. Stop the container:
docker-compose down
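
For orientation, a compose file for this setup might look roughly like the following. This is a sketch, not the repository's actual file: the service name and build context are assumptions, while the volume path matches the download location described under Docker Image Features.

```yaml
# Hypothetical minimal docker-compose.yml; the real file may differ.
services:
  scrapper:
    build: .            # assumes the Dockerfile lives at the repo root
    env_file: .env      # injects API_URL, COOKIE, ACCESS_TOKEN, etc.
    volumes:
      - downloads:/app/apps/cron/downloads
volumes:
  downloads:            # persists downloaded archives across restarts
```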

Docker Image Features

  • Base Image: Node.js 18 slim
  • Archive Tools: Includes p7zip-full and unrar for extracting compressed files
  • Persistent Storage: Downloads are stored in a Docker volume at /app/apps/cron/downloads
  • Production Optimized: Multi-stage build for smaller image size

πŸ“ Packages

@scrapper/cli

Command-line interface built with oclif framework for manual scraping operations.

@scrapper/cron

Automated scheduler using node-cron for periodic scraping tasks.
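
node-cron schedules are standard five-field cron expressions (minute, hour, day of month, month, day of week). As a rough illustration of that format, here is a simple validator; it is not code from @scrapper/cron and only handles `*`, single values, and `*/n` steps.

```typescript
// Illustrative validator for a five-field cron expression; not from the
// codebase. Fields: minute, hour, day-of-month, month, day-of-week.
const FIELD_RANGES: Array<[number, number]> = [
  [0, 59], [0, 23], [1, 31], [1, 12], [0, 6],
];

function isValidCronExpression(expr: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  return fields.every((field, i) => {
    const [min, max] = FIELD_RANGES[i];
    if (field === "*") return true;
    // Simple "*/n" step values.
    const step = field.match(/^\*\/(\d+)$/);
    if (step) return Number(step[1]) >= 1;
    // Single numeric values must fall inside the field's range.
    const n = Number(field);
    return Number.isInteger(n) && n >= min && n <= max;
  });
}
```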

@scrapper/api

Shared API client for communicating with the Melodyo backend.
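
Scraping traffic tends to hit transient failures, so an API client layer often wraps requests in retries. The helper below is a generic sketch of that pattern with exponential backoff; it is not taken from @scrapper/api.

```typescript
// Hypothetical retry helper of the kind an API client might use;
// purely illustrative, not code from @scrapper/api.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: base, 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```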

@scrapper/utils

Common utility functions and helpers used across applications.
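
As an example of the kind of helper such a package might expose, a scraper that downloads tracks typically needs to turn titles into safe filenames. The function below is purely illustrative, not an actual export of @scrapper/utils.

```typescript
// Illustrative utility: sanitize a track title into a filesystem-safe
// filename. Not an actual export of @scrapper/utils.
function toSafeFilename(title: string): string {
  return title
    .trim()
    .replace(/[\\/:*?"<>|]/g, "") // strip characters invalid on most filesystems
    .replace(/\s+/g, "_")         // collapse runs of whitespace to underscores
    .toLowerCase();
}
```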

@scrapper/eslint-config

Shared ESLint configuration for consistent code style.

πŸ› οΈ Development

Project Configuration

  • TypeScript Config: tsconfig.json at root with shared settings
  • Jest Config: jest.config.ts for testing configuration
  • Turbo Config: turbo.json for build pipeline optimization
  • Editor Config: .editorconfig for consistent code formatting

CI/CD

The project includes a .gitlab-ci.yml configuration for GitLab CI/CD pipelines.

πŸ“ License

ISC

👤 Author

HamidNE

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📊 Version

Current Version: 1.13.1


Note: This scraper is designed specifically for the Melodyo platform. Make sure you have proper authorization and comply with the terms of service of the websites you're scraping.
