The task has three parts -- data collection, data exploration / algorithm development, and finally predictive modeling.
We want you to query the Wikipedia API and collect all of the articles under the following Wikipedia categories:
We want your code to be modular enough that any valid Wikipedia category can be queried.
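One way to keep the download step modular is to separate building the API request from performing it. The sketch below uses the MediaWiki Action API's `list=categorymembers` endpoint (real parameters); the function names and the param-builder split are our own choices, and the network fetch loop itself is left to the caller.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"

def categorymembers_params(category, cmtype="page", cmcontinue=None):
    """Build query parameters for the MediaWiki categorymembers list.

    cmtype="page" lists articles; cmtype="subcat" lists nested
    sub-categories, which is how deeper nesting levels can be walked.
    cmcontinue carries the API's continuation token between pages.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": cmtype,
        "cmlimit": "500",  # maximum page size for anonymous clients
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params

def request_url(category, **kwargs):
    """Full GET URL for one page of category members."""
    query = urllib.parse.urlencode(categorymembers_params(category, **kwargs))
    return API_URL + "?" + query
```

A fetch loop would GET `request_url(...)`, read `continue.cmcontinue` from the JSON response, and repeat until the API stops returning a continuation token.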
The results of the query should be written to two PostgreSQL tables, page and category. You will also need to build some sort of reference between the pages and categories. Keep in mind that a page can have many categories and a category can have many pages, so a straight foreign-key arrangement will not work.
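The usual answer to this many-to-many constraint is a junction table. The sketch below uses an in-memory SQLite database as a portable stand-in for PostgreSQL, and the table and column names are our own assumptions, not a required schema:

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the DDL is kept portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    id    INTEGER PRIMARY KEY,
    title TEXT UNIQUE NOT NULL
);
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- Junction table: resolves the many-to-many relationship that a
-- single foreign key on either side cannot express.
CREATE TABLE page_category (
    page_id     INTEGER NOT NULL REFERENCES page(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (page_id, category_id)
);
""")

# One page linked to two categories, via two junction rows.
conn.execute("INSERT INTO page (id, title) VALUES (1, 'Logistic regression')")
conn.execute("INSERT INTO category (id, name) VALUES (1, 'Machine learning')")
conn.execute("INSERT INTO category (id, name) VALUES (2, 'Statistics')")
conn.executemany(
    "INSERT INTO page_category (page_id, category_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)
rows = conn.execute("""
    SELECT c.name FROM category c
    JOIN page_category pc ON pc.category_id = c.id
    WHERE pc.page_id = 1 ORDER BY c.name
""").fetchall()
```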
Optional: make it so that your code can be run via a Python script, e.g.

$ docker run -v `pwd`:/src python -m download #SOME_CATEGORY#
Optional: make it so that your code can query nested sub-categories, e.g.

$ docker run -v `pwd`:/src python -m download #SOME_CATEGORY# #NESTING_LEVEL#

Use Latent Semantic Analysis to search your pages. Given a search query, find the top 5 articles related to the search query.
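In practice scikit-learn's TfidfVectorizer plus TruncatedSVD would be the natural tools for this step; the NumPy-only sketch below just shows the LSA mechanics on raw term counts (whitespace tokenisation, no tf-idf weighting, stemming, or stop-word removal -- all simplifying assumptions):

```python
import numpy as np

def lsa_search(docs, query, k=2, top_n=5):
    """Rank docs against query in a k-dimensional latent semantic space."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix A (terms x docs).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    # Truncated SVD: A ~ U_k diag(S_k) Vt_k.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, S_k, Vt_k = U[:, :k], S[:k], Vt[:k]
    # Fold the query into the same latent space: q_hat = diag(S_k)^-1 U_k^T q.
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1
    q_hat = (U_k.T @ q) / S_k
    # Cosine similarity between the folded-in query and each document.
    sims = []
    for j in range(len(docs)):
        v = Vt_k[:, j]
        denom = np.linalg.norm(v) * np.linalg.norm(q_hat)
        sims.append((float(q_hat @ v) / denom if denom else 0.0, j))
    return [j for _, j in sorted(sims, reverse=True)[:top_n]]
```

On the real corpus, `docs` would be the article texts pulled from the page table, and the document-side SVD factors would be computed once and cached rather than recomputed per query.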
Optional: make it so that your code can be run via a Python script, e.g.
$ docker run -v `pwd`:/src python -m search #SOME_TERM#

In this part, we want you to build a predictive model from the data you've just indexed. Specifically, when a new article from Wikipedia comes along, we would like to be able to predict what category the article should fall into. We expect a training script of some sort that is runnable and will estimate a model.
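Many classifiers would fit here; as one concrete baseline, the sketch below estimates a multinomial naive Bayes model with Laplace smoothing in pure Python. In the real project the texts and labels would come out of the page/category tables; here they are passed in directly, and all function names are our own:

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels):
    """Estimate multinomial naive Bayes statistics from labelled texts."""
    word_counts = defaultdict(Counter)  # label -> term counts
    label_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return {"word_counts": word_counts, "label_counts": label_counts,
            "vocab": vocab, "n_docs": len(texts)}

def predict_nb(model, text):
    """Return (best_label, probability) for a new document."""
    V = len(model["vocab"])
    log_post = {}
    for label, n_label in model["label_counts"].items():
        # Log prior plus Laplace-smoothed log likelihood of each token.
        lp = math.log(n_label / model["n_docs"])
        counts = model["word_counts"][label]
        total = sum(counts.values())
        for w in text.lower().split():
            lp += math.log((counts[w] + 1) / (total + V))
        log_post[label] = lp
    # Normalise with log-sum-exp to get a proper posterior probability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    best = max(log_post, key=log_post.get)
    return best, math.exp(log_post[best] - m) / z
```

A `python -m train` entry point would then just read the corpus from PostgreSQL, call `train_nb`, and pickle the returned model.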
Optional: make it so that your code can be run via a Python script, e.g.
$ docker run -v `pwd`:/src python -m train

Finally, you should be able to pass the URL of a Wikipedia page and your code will generate a prediction for the best category for that page, along with a probability of that being the correct category.
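The prediction step first has to turn the URL into article text. A small sketch of that front end, assuming the standard `/wiki/<Title>` URL form and the TextExtracts `prop=extracts` API (real, though its availability is a deployment detail); the function names are our own:

```python
import urllib.parse

def title_from_url(url):
    """Extract the page title from a Wikipedia article URL.

    Undoes percent-encoding and underscores so the title matches
    the form the MediaWiki API expects.
    """
    path = urllib.parse.urlparse(url).path
    if not path.startswith("/wiki/"):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    raw = path[len("/wiki/"):]
    return urllib.parse.unquote(raw).replace("_", " ")

def extract_params(url):
    """MediaWiki params that fetch the article's plain-text extract."""
    return {
        "action": "query",
        "prop": "extracts",
        "explaintext": "1",
        "titles": title_from_url(url),
        "format": "json",
    }
```

A `python -m predict` entry point would fetch the extract with these params, feed the text to the trained model, and print the predicted category with its probability.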
Optional: make it so that your code can be run via a Python script, e.g.
$ docker run -v `pwd`:/src python -m predict #URL#

You may use the included docker-compose.yml file to build your project.