This repository contains a simple Python project focused on web scraping. It includes three scripts (task1.py, task2.py, task3.py) that demonstrate basic scraping techniques and data handling.
- task1.py - Scrapes IMDb's compact list page for top movies, extracting title, ranking, year, rating, and link. Saves the results to movies.json as a cache to avoid repeated downloads.
- task2.py - Imports scrape_top_list from task1 and groups the scraped movies by release year. Prints a dictionary where each key is a year and the value is the list of movies released that year.
- task3.py - Uses the same movie data and organizes it by decade. It computes decade boundaries from the minimum and maximum years and prints movies grouped under each ten-year span.
- movies.json - JSON file used to store the list of movie dictionaries returned by scrape_top_list. This file acts as both input (when already present) and output of the scraping process.
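The grouping steps described for task2.py and task3.py can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `group_by_year` and `group_by_decade`, the `"title"` and `"year"` keys, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the real keys in movies.json may differ.
movies = [
    {"title": "Movie A", "year": 1994},
    {"title": "Movie B", "year": 1999},
    {"title": "Movie C", "year": 2008},
    {"title": "Movie D", "year": 1994},
]


def group_by_year(movies):
    """Map each release year to the list of movies released that year."""
    by_year = defaultdict(list)
    for movie in movies:
        by_year[movie["year"]].append(movie)
    return dict(by_year)


def group_by_decade(movies):
    """Map each decade start (1990, 2000, ...) to the movies in that span."""
    if not movies:
        return {}
    years = [movie["year"] for movie in movies]
    # Round the minimum and maximum years down to their decade boundaries.
    first = min(years) // 10 * 10
    last = max(years) // 10 * 10
    # Create every decade in the range, including ones with no movies.
    by_decade = {start: [] for start in range(first, last + 10, 10)}
    for movie in movies:
        by_decade[movie["year"] // 10 * 10].append(movie)
    return by_decade


if __name__ == "__main__":
    print(group_by_year(movies))
    print(group_by_decade(movies))
```

Creating every decade up front means that a ten-year span with no movies still appears with an empty list; whether task3.py behaves this way is not stated here, so treat it as a design choice of the sketch.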
Prerequisites
- Python 3.x installed on your system
- Recommended to use a virtual environment
Installation
    python -m venv venv
    .\venv\Scripts\activate  # On Windows
    pip install -r requirements.txt  # if you have dependencies
Usage
Run the scripts individually to perform different scraping tasks:
    python task1.py
    python task2.py
    python task3.py
Data
The movies.json file contains sample data used or generated by the scripts.
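Because movies.json serves as both input and output, the caching behaviour can be pictured as a "load if present, otherwise scrape and save" helper. The sketch below is only an assumption about how that might look: `load_or_scrape` is a hypothetical name, and the scraper is passed in as a plain callable because the real signature of scrape_top_list is not documented here.

```python
import json
import tempfile
from pathlib import Path


def load_or_scrape(cache_path, scrape):
    """Return cached movies if the file exists, else scrape and cache them.

    `scrape` is any zero-argument callable returning a list of dicts
    (a stand-in for the project's scraper; its real signature may differ).
    """
    path = Path(cache_path)
    if path.exists():
        # Cache hit: read the stored list instead of downloading again.
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    # Cache miss: run the scraper and store its result for next time.
    movies = scrape()
    with path.open("w", encoding="utf-8") as f:
        json.dump(movies, f, indent=2)
    return movies


if __name__ == "__main__":
    # Demonstration with a fake scraper and a throwaway directory.
    def fake_scrape():
        return [{"title": "Movie A", "year": 1994}]

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "movies.json"
        print(load_or_scrape(cache, fake_scrape))  # scrapes and writes the file
        print(load_or_scrape(cache, fake_scrape))  # reads the cached file
```

Deleting movies.json forces a fresh scrape on the next run under this pattern.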
Feel free to fork the repository and submit pull requests for improvements or additional examples.
This project is provided under the MIT License. See LICENSE for details (not included by default).