CLI tool for extracting property listings from Puerto Rico real estate portals — Playwright + BeautifulSoup

Edioff/CLI-Web-Scraper

CLI Web Scraper — Puerto Rico Real Estate

Command-line tool for extracting property listings from Puerto Rican real estate portals. Playwright renders the JavaScript-driven pages and BeautifulSoup parses the resulting HTML.

Overview

Scrapes property listings (houses and apartments) from Reality Realty PR, extracting detailed property information and saving it as structured JSON. Designed as a CLI tool with configurable parameters for property type, pagination, and output.

Features

  • CLI interface — Simple command-line usage with configurable parameters
  • Property types — Houses and apartments
  • Structured output — Clean JSON with URL, title, city, price, description, images
  • Playwright rendering — Handles JavaScript-rendered content
  • Flyer links — Includes a link to each property's printable flyer

Output Format

[
  {
    "url": "https://www.realityrealtypr.com/compra-venta/casa/puerto-rico/cayey/...",
    "title": "Bo. Toita",
    "city": "Cayey",
    "price": "Venta: $140,000",
    "description": "Propiedad de 2 niveles con 4 habitaciones...",
    "images": [
      "https://s3.amazonaws.com/app-propiedades/166209/1_large.jpg"
    ],
    "flyer": "https://www.realityrealtypr.com/properties/print/id:166209/"
  }
]
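As a minimal sketch of consuming this output, the snippet below reads records with the fields shown above and converts the `price` string into an integer. It assumes prices follow the `Venta: $140,000` pattern from the sample and that flyer URLs end in `id:<digits>/`; the source guarantees neither. `parse_price`, `listing_id`, and `summarize` are hypothetical helper names, not part of the tool.

```python
import json
import re
from typing import Optional


def parse_price(price: str) -> Optional[int]:
    """Return the dollar amount in a string like 'Venta: $140,000', or None."""
    # Assumption: a '$' followed by digits with optional thousands separators.
    match = re.search(r"\$\s*(\d[\d,]*)", price)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def listing_id(flyer_url: str) -> Optional[str]:
    """Extract the numeric id from a flyer URL containing 'id:<digits>'."""
    match = re.search(r"id:(\d+)", flyer_url)
    return match.group(1) if match else None


def summarize(raw_json: str) -> list:
    """Parse scraper output and return (city, title, price_usd, id) tuples."""
    return [
        (
            item["city"],
            item["title"],
            parse_price(item["price"]),
            listing_id(item["flyer"]),
        )
        for item in json.loads(raw_json)
    ]
```

To use it on a real run, read the output file (for example `houses.json`) as text and pass the contents to `summarize`. Prices that do not match the assumed pattern come back as `None` rather than raising.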

Tech Stack

  • Playwright — Browser automation for JS-rendered pages
  • BeautifulSoup + lxml — HTML parsing
  • Rich — Terminal UI formatting
  • Poetry — Dependency management
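The division of labor between the first two libraries can be sketched roughly as follows: Playwright loads the page and returns the HTML after scripts have run, then BeautifulSoup extracts fields from it. This is an illustration only, not the repository's code. The function names and every CSS selector (`h1.title`, `.price`, `img.photo`) are hypothetical placeholders, and the sketch uses the built-in `html.parser` so that it runs without lxml.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chromium and return the HTML after JS has run."""
    # Imported lazily so the parsing helper below works without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def parse_listing(html: str) -> dict:
    """Pull a few fields out of rendered HTML.

    The selectors are hypothetical; the real site markup will differ.
    """
    soup = BeautifulSoup(html, "html.parser")  # "lxml" also works if installed
    title = soup.select_one("h1.title")
    price = soup.select_one(".price")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        # Only keep <img> tags that actually carry a src attribute.
        "images": [img["src"] for img in soup.select("img.photo[src]")],
    }
```

Keeping the fetch and parse steps separate means the parsing logic can be exercised against saved HTML without launching a browser.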

Installation

git clone https://github.com/Edioff/CLI-Web-Scraper.git
cd CLI-Web-Scraper
poetry install
poetry run playwright install --with-deps

Usage

poetry run python scraper.py <HOUSE|APARTMENT> <page_number> <output_file>

Examples

# Scrape first page of houses
poetry run python scraper.py HOUSE 0 houses.json

# Scrape apartments page 3
poetry run python scraper.py APARTMENT 3 apartments.json
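For readers curious how three positional arguments like these can be validated, here is a minimal sketch using `argparse` from the standard library. It is not taken from `scraper.py`, whose actual argument handling is not shown here. The names `build_parser` and `non_negative_int` are hypothetical, and the rule that page numbers start at 0 is inferred from the first example above.

```python
import argparse

PROPERTY_TYPES = ("HOUSE", "APARTMENT")


def non_negative_int(value: str) -> int:
    """Convert a CLI string to an int, rejecting negative page numbers."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number < 0:
        raise argparse.ArgumentTypeError("page number must be 0 or greater")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: <HOUSE|APARTMENT> <page_number> <output_file>."""
    parser = argparse.ArgumentParser(
        prog="scraper.py",
        description="Scrape one page of property listings to a JSON file.",
    )
    parser.add_argument("property_type", choices=PROPERTY_TYPES,
                        help="kind of listing to scrape")
    parser.add_argument("page_number", type=non_negative_int,
                        help="results page to scrape (assumed to start at 0)")
    parser.add_argument("output_file",
                        help="path of the JSON file to write")
    return parser
```

Calling `build_parser().parse_args()` inside a script reads `sys.argv`; an unknown property type or a non-numeric page makes `argparse` print a usage message and exit.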

Notes

  • Designed for educational purposes
  • Scrapes one page per execution
  • Respect the target site's Terms of Service
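Because each run handles a single page, collecting several pages means invoking the tool repeatedly. The sketch below shows one possible wrapper, assuming it is run from the repository root with Poetry available. The helper names, the per-page file naming, and the 5-second pause between requests are all assumptions made for illustration; none of them come from the project.

```python
import json
import subprocess
import time
from pathlib import Path


def build_command(property_type: str, page: int, output_file: str) -> list:
    """Return the documented CLI invocation as an argument list."""
    return ["poetry", "run", "python", "scraper.py",
            property_type, str(page), output_file]


def merge_outputs(paths) -> list:
    """Concatenate the JSON arrays stored in several output files."""
    merged = []
    for path in paths:
        merged.extend(json.loads(Path(path).read_text(encoding="utf-8")))
    return merged


def scrape_pages(property_type: str, pages, delay_seconds: float = 5.0) -> list:
    """Run the scraper once per page, pausing between runs, then merge."""
    outputs = []
    for page in pages:
        output_file = f"{property_type.lower()}_{page}.json"  # hypothetical naming
        # check=True raises CalledProcessError if a run fails.
        subprocess.run(build_command(property_type, page, output_file), check=True)
        outputs.append(output_file)
        time.sleep(delay_seconds)  # arbitrary pause to avoid hammering the site
    return merge_outputs(outputs)
```

For example, `scrape_pages("HOUSE", range(3))` would run the tool for pages 0 through 2 and return one combined list. The pause keeps the request rate low, in line with the note above about the site's Terms of Service.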

Author

Johan Cruz — Data Engineer & Web Scraping Specialist

  • GitHub: @Edioff
  • Available for freelance projects

License

MIT
