Command-line tool for extracting property listings from Puerto Rican real estate portals. Built with Playwright and BeautifulSoup for reliable data extraction.
Scrapes property listings (houses and apartments) from Reality Realty PR, extracting detailed property information and saving it as structured JSON. Designed as a CLI tool with configurable parameters for property type, pagination, and output.
- Command-line interface: three positional arguments (property type, page number, output file)
- Property types: houses and apartments
- Structured output: clean JSON with URL, title, city, price, description, images, and flyer link
- Playwright rendering: handles JavaScript-rendered content
- Flyer links: each record includes a link to the property's flyer
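As an illustration of the configurable parameters, here is a minimal sketch of how the three positional arguments could be validated with `argparse`. It is not taken from `scraper.py`; the names `build_parser` and `non_negative_int` are hypothetical, and the real script may parse its arguments differently.

```python
import argparse


def non_negative_int(value: str) -> int:
    """Convert a CLI string to an int and reject negative page numbers."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not an integer: {value!r}")
    if number < 0:
        raise argparse.ArgumentTypeError("page number must be 0 or greater")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three positional arguments."""
    parser = argparse.ArgumentParser(
        prog="scraper.py",
        description="Scrape one page of property listings.",
    )
    parser.add_argument(
        "property_type",
        choices=["HOUSE", "APARTMENT"],
        help="kind of listing to scrape",
    )
    parser.add_argument(
        "page_number",
        type=non_negative_int,
        help="results page to scrape (0 is the first page)",
    )
    parser.add_argument("output_file", help="path of the JSON file to write")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["APARTMENT", "3", "apartments.json"])
print(args.property_type, args.page_number, args.output_file)
```

Restricting `property_type` with `choices` makes `argparse` reject anything other than `HOUSE` or `APARTMENT` before any browser is launched.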
[
{
"url": "https://www.realityrealtypr.com/compra-venta/casa/puerto-rico/cayey/...",
"title": "Bo. Toita",
"city": "Cayey",
"price": "Venta: $140,000",
"description": "Propiedad de 2 niveles con 4 habitaciones...",
"images": [
"https://s3.amazonaws.com/app-propiedades/166209/1_large.jpg"
],
"flyer": "https://www.realityrealtypr.com/properties/print/id:166209/"
}
]

- Playwright: browser automation for JS-rendered pages
- BeautifulSoup + lxml: HTML parsing
- Rich: terminal UI formatting
- Poetry: dependency management
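Because the output is plain JSON, it can be post-processed with the standard library alone. The sketch below uses the sample record shown earlier; the helpers `parse_price` and `flyer_id` are hypothetical and not part of the scraper. They assume that prices look like `Venta: $140,000` and that flyer URLs contain an `id:<digits>` segment, as in the sample.

```python
import json
import re
from typing import Optional

# Record copied from the sample output (elided parts kept as-is).
SAMPLE = """
[
  {
    "url": "https://www.realityrealtypr.com/compra-venta/casa/puerto-rico/cayey/...",
    "title": "Bo. Toita",
    "city": "Cayey",
    "price": "Venta: $140,000",
    "description": "Propiedad de 2 niveles con 4 habitaciones...",
    "images": ["https://s3.amazonaws.com/app-propiedades/166209/1_large.jpg"],
    "flyer": "https://www.realityrealtypr.com/properties/print/id:166209/"
  }
]
"""


def parse_price(price: str) -> Optional[int]:
    """Return the dollar amount as an int, or None if no amount is found."""
    match = re.search(r"\$\s*(\d[\d,]*)", price)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def flyer_id(flyer_url: str) -> Optional[str]:
    """Return the numeric id embedded in a flyer URL, or None if absent."""
    match = re.search(r"/id:(\d+)", flyer_url)
    return match.group(1) if match else None


for listing in json.loads(SAMPLE):
    print(listing["city"], parse_price(listing["price"]), flyer_id(listing["flyer"]))
# prints: Cayey 140000 166209
```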
git clone https://github.com/Edioff/CLI-Web-Scraper.git
cd CLI-Web-Scraper
poetry install
poetry run playwright install --with-deps

poetry run python scraper.py <HOUSE|APARTMENT> <page_number> <output_file>

# Scrape first page of houses
poetry run python scraper.py HOUSE 0 houses.json
# Scrape apartments page 3
poetry run python scraper.py APARTMENT 3 apartments.json

- Designed for educational purposes
- Scrapes one page per execution
- Respect the target site's Terms of Service
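Since each execution scrapes a single page, collecting several pages means running the tool repeatedly. The following is a rough sketch of a wrapper that does so and merges the results. The function names, the per-page file naming, and the delay between runs are all assumptions made for illustration. It also assumes each output file is a UTF-8 JSON array like the sample output.

```python
import json
import subprocess
import time
from pathlib import Path


def page_command(property_type: str, page: int, output_file: str) -> list[str]:
    """Build the documented CLI invocation for one page."""
    return ["poetry", "run", "python", "scraper.py",
            property_type, str(page), output_file]


def merge_listings(pages: list[list[dict]]) -> list[dict]:
    """Concatenate per-page lists, keeping the first record seen for each URL."""
    seen = set()
    merged = []
    for page in pages:
        for listing in page:
            url = listing.get("url")
            if url in seen:
                continue
            seen.add(url)
            merged.append(listing)
    return merged


def scrape_pages(property_type: str, first_page: int, last_page: int,
                 output_file: str, delay_seconds: float = 5.0) -> None:
    """Run the scraper once per page, then write one merged JSON file."""
    pages = []
    for page in range(first_page, last_page + 1):
        page_file = f"{property_type.lower()}_page{page}.json"  # hypothetical naming
        subprocess.run(page_command(property_type, page, page_file), check=True)
        pages.append(json.loads(Path(page_file).read_text(encoding="utf-8")))
        time.sleep(delay_seconds)  # pause between runs to limit load on the site
    Path(output_file).write_text(
        json.dumps(merge_listings(pages), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

For example, `scrape_pages("HOUSE", 0, 2, "houses_all.json")` would run the scraper for pages 0 through 2 from the project directory and write the combined listings to `houses_all.json`.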
Johan Cruz — Data Engineer & Web Scraping Specialist
- GitHub: @Edioff
- Available for freelance projects
License: MIT