This project was built as a job application assignment for Adivare BV.
- Run pip install -r requirements.txt to install all the necessary packages
-
redis queue #1:
docker run -d -p 6379:6379 --name redis-redisjson redislabs/rejson:latest
-
redis queue #2:
docker run -d -p 6380:6379 --name redis-redisjson2 redislabs/rejson:latest
-
redis queue #3:
docker run -d -p 6381:6379 --name redis-redisjson3 redislabs/rejson:latest
-
PostgreSQL database:
docker run -p 5432:5432 -e POSTGRES_DB=postgres_database -e POSTGRES_USER=postgres_user -e POSTGRES_PASSWORD=postgres_password -d postgres
-
inject_data.py
- creates fake data
- injects the data on the pre-processing queue
- the user enters 1 for complete data or 0 for incomplete data (the incomplete data is used later to test the de-duplication process)
- logs the data creation process
- the generated data is then available on Redis Queue #1
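
The steps above can be illustrated with a minimal sketch. It is not the actual inject_data.py: the field names, the `make_record` and `inject` helpers, and the in-memory `deque` standing in for Redis Queue #1 are all assumptions made so the example runs without a Redis server.

```python
import json
import random
import uuid
from collections import deque

# Hypothetical field names; the real schema used by inject_data.py may differ.
FIELDS = ("name", "email", "city")


def make_record(complete, rng):
    """Build one fake record; an incomplete record has one field set to None."""
    record = {
        "id": str(uuid.UUID(int=rng.getrandbits(128))),
        "name": f"user{rng.randint(1, 9999)}",
        "email": f"user{rng.randint(1, 9999)}@example.com",
        "city": rng.choice(["Amsterdam", "Utrecht", "Rotterdam"]),
    }
    if not complete:
        record[rng.choice(FIELDS)] = None
    return record


def inject(queue, count, complete, seed=0):
    """Create `count` fake records and append them to the queue as JSON strings."""
    rng = random.Random(seed)
    for _ in range(count):
        record = make_record(complete, rng)
        # Stand-in for pushing onto Redis Queue #1.
        queue.append(json.dumps(record))
        print(f"queued record {record['id']} (complete={complete})")
    return len(queue)


pre_processing_queue = deque()
inject(pre_processing_queue, count=3, complete=True)           # user entered 1
inject(pre_processing_queue, count=2, complete=False, seed=1)  # user entered 0
```

In a real run, the `queue.append` call would presumably be replaced by a push to the Redis instance listening on port 6379.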
-
check_data.py
- reads data from the previous queue
- validates and grades the data
- if necessary, de-duplicates data
- grades and pre-processes the data (handling duplicated items as well)
- stores the data in Redis Queue #2
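
A rough sketch of the validate, grade, and de-duplicate flow described above. The grading rule (fraction of filled fields), the use of the email address as the duplicate key, and the `deque` objects standing in for the Redis queues are assumptions for illustration; check_data.py may work differently.

```python
import json
from collections import deque

REQUIRED_FIELDS = ("name", "email", "city")  # hypothetical schema


def grade(record):
    """Return the fraction of required fields that are filled in (0.0 to 1.0)."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)


def check(input_queue, output_queue):
    """Drain input_queue, keep the best-graded record per email, push the result."""
    best = {}  # de-duplication key -> best-graded record seen so far
    while input_queue:
        raw = input_queue.popleft()
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed item: skipped in this sketch
        if not isinstance(record, dict) or not isinstance(record.get("email"), str):
            continue  # cannot be keyed for de-duplication: skipped in this sketch
        record["grade"] = grade(record)
        key = record["email"].lower()
        if key not in best or record["grade"] > best[key]["grade"]:
            best[key] = record
    for record in best.values():
        output_queue.append(json.dumps(record))  # stand-in for Redis Queue #2
    return len(best)
```

Keeping only the highest-graded copy of each duplicate is one possible policy; merging fields from several partial copies would be another.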
-
makemigrations
-
migrate
-
store_data
- reads data from Redis Queue #2
- creates the models (while validating the data once again at the same time)
- stores them in the PostgreSQL Database
-
createsuperuser
- required in order to get admin privileges
-
- The database models are listed on the Django admin site
- Upon clicking on one of them, the user is greeted with the list of database entries of that model
- The entry can be modified or deleted
-
http://localhost:8000/grading_queue/
- Displays the items currently in the Grading Queue as HTML tables
-
http://localhost:8000/dedup_queue/
- Displays the items currently in the De-duplication Queue as HTML tables
-
http://localhost:8000/poison_queue/
- Displays the items currently in the Poison Queue as HTML tables
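
As an illustration of how queue items might be turned into an HTML table, here is a small sketch using only the standard library. The `render_table` helper is hypothetical; the actual views may well use Django templates instead.

```python
from html import escape


def render_table(items):
    """Render a list of dicts as an HTML table, one row per queue item."""
    if not items:
        return "<p>Queue is empty.</p>"
    # Use the union of all keys as columns, sorted for a stable layout.
    columns = sorted({key for item in items for key in item})
    head = "".join(f"<th>{escape(str(col))}</th>" for col in columns)
    rows = []
    for item in items:
        cells = "".join(
            f"<td>{escape(str(item.get(col, '')))}</td>" for col in columns
        )
        rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{head}</tr>{''.join(rows)}</table>"
```

Values are escaped before being placed in the markup, so queue items containing characters such as `<` cannot break the page.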
-
manage.py runserver: starts the Django server on the default port, localhost:8000/
-
manage.py read_json "file_name": reads the JSON file and outputs the formatted data
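
The formatting step of read_json could look roughly like the sketch below. The `format_json_text` helper is hypothetical and the exact output format of the real command is not documented here; this only shows one common way to pretty-print JSON.

```python
import json


def format_json_text(text):
    """Parse a JSON string and return it indented, with keys sorted."""
    data = json.loads(text)
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)


print(format_json_text('{"b": 1, "a": [1, 2]}'))
```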
-
manage.py read_xml "file_name": reads the XML file and outputs the formatted data
- user: postgres_user
- password: postgres_password
- host: localhost
- port: 5432
- name: postgres_database
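
For reference, the connection values above would typically map onto the Django settings as follows. This is a sketch of a standard `DATABASES` entry, not a copy of the project's settings.py, which may be organised differently.

```python
# Assumed to live in the project's settings.py; values taken from the list above.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "postgres_database",
        "USER": "postgres_user",
        "PASSWORD": "postgres_password",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```

These values match the environment variables passed to the PostgreSQL container in the docker run command.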