Skip to content

manuelandersen/padel-scrapy

Repository files navigation

padel-scrapy

This is a project to scrap data from the International Padel Federation page. It collects players, tourneys and games information into JSON files.

Installation

  1. Clone the repository
git clone https://github.com/manuelandersen/padel-scrapy.git
cd padel-scrapy
  1. Create a virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate
  1. Install the dependencies:
pip install -r requirements.txt

Running the spiders

# you need to be inside the padelscraper directory
cd padelscraper

# to run player spider
scrapy crawl playerspider 

# to run tournament spider
scrapy crawl tournamentspider

# to run games spider you need to give it a url and the numbers of days played
# this info can be obtained from the tournamentspider results
scrapy crawl gamespider -a start_url="the_star_url" -a days_played=days_played

# if you want to store the json file 
scrapy crawl playerspider -O path_to_file.json

If you prefer not to create a virtual environment, you can use Docker instead.

# to build the containe
docker build -t scrapy-project .

# to run one of the spiders
docker run scrapy-project scrapy crawl tournamentspider

Examples

Examples of the way the data look for each spider can be found in the examples folder.

Contributions

We welcome contributions to improve and expand this project! Whether you're fixing a bug, adding a new feature, or improving documentation, your help is appreciated.

About

🕸️ Collects data from padelfip website

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors