matte-realize/nutcs-data-service

NUTCS Data Service

This is the scraper for the Northeastern University Transfer Credits website. Using Selenium, it scrapes the site into JSON, separating institutions that have course data from institutions that do not. The JSON is then normalized with pandas and loaded into PostgreSQL with SQLAlchemy. Docker is used to containerize the database, and SQL files are provided to query the data.
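As a rough illustration of the split described above, the following sketch partitions scraped records into institutions with and without course data. The record shape (the `name` and `courses` keys) and the function name `partition_institutions` are assumptions made for this example, not the scraper's actual schema.

```python
import json


def partition_institutions(records):
    """Split records into (with_courses, without_courses).

    Assumes each record is a dict with a hypothetical "courses" list.
    """
    with_courses, without_courses = [], []
    for record in records:
        # A missing or empty course list counts as "without course data".
        if record.get("courses"):
            with_courses.append(record)
        else:
            without_courses.append(record)
    return with_courses, without_courses


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "Example College", "courses": [{"code": "MATH 101"}]},
    {"name": "Sample University", "courses": []},
]

with_courses, without_courses = partition_institutions(sample)
print(json.dumps({"with": with_courses, "without": without_courses}, indent=2))
```

Keeping the two groups separate lets the later conversion step skip institutions that have nothing to load into a courses table.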

Scraper

To run the scraper from a terminal:

python pipeline/scraper.py

Alternatively, in PyCharm, open pipeline/scraper.py and click the play button to run the scraper.

Docker setup and SQL conversion

Before setting up the Docker containers, edit the parameters in .env.example to match your PostgreSQL database. To start the containers, run the following command in a terminal:

docker compose up -d

To run the conversion, run the following command in a terminal:

python pipeline/sql_conv.py
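To give a sense of what normalizing the JSON involves, here is a minimal sketch that flattens nested institution records into two row lists linked by an ID. It uses plain Python in place of the pandas and SQLAlchemy steps, and the field and column names (`name`, `code`, `title`, `institution_id`) are hypothetical rather than taken from sql_conv.py.

```python
def normalize(records):
    """Flatten nested records into (institution_rows, course_rows)."""
    institutions, courses = [], []
    for institution_id, record in enumerate(records, start=1):
        # One row per institution, keyed by a generated integer ID.
        institutions.append({"id": institution_id, "name": record["name"]})
        # One row per course, pointing back at its institution.
        for course in record.get("courses", []):
            courses.append({
                "institution_id": institution_id,
                "code": course["code"],
                "title": course["title"],
            })
    return institutions, courses


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "Example College",
     "courses": [{"code": "MATH 101", "title": "Calculus I"}]},
    {"name": "Sample University", "courses": []},
]

institution_rows, course_rows = normalize(sample)
print(institution_rows)
print(course_rows)
```

Each row list maps onto one table, so institutions without course data still get an institution row but contribute no course rows.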

Connect to the database

To connect to the database, use the following settings:

Host: localhost
Port: your_local_host_port
Database: your_database_name
User: postgres
Password: your_password

Make sure the public schema is selected (checked).
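If you connect from code rather than a database client, the same settings can be combined into a connection URL. The sketch below assumes hypothetical environment variable names (`DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`); check your own .env file for the names actually used. The fallback port 5432 is PostgreSQL's default and may differ from your host port.

```python
import os
from urllib.parse import quote


def build_postgres_url(host, port, database, user, password):
    """Build a postgresql:// URL from individual connection settings."""
    # Percent-encode the credentials so characters such as "@" or "/"
    # cannot be mistaken for URL delimiters.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database}"


# Environment variable names here are assumptions for this example.
url = build_postgres_url(
    host=os.environ.get("DB_HOST", "localhost"),
    port=os.environ.get("DB_PORT", "5432"),
    database=os.environ.get("DB_NAME", "your_database_name"),
    user=os.environ.get("DB_USER", "postgres"),
    password=os.environ.get("DB_PASSWORD", "your_password"),
)
```

The resulting string is in the URL form that SQLAlchemy's `create_engine` accepts. Avoid printing or logging it, since it contains the password.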

Debugging

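To completely remove the data from the database, run the following commands in a terminal. Note that the two prune commands apply to all unused Docker resources on the machine, not only this project's.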

docker compose down -v
docker system prune -f 
docker volume prune -f

About

A data service that scrapes Northeastern University's transfer credit courses into JSON, which is then converted into SQL data for the backend of NUTCS.
