GitHub - vrachnis/OpSysII: Repo for the Operating Systems II project

This is a MapReduce program for Hadoop that calculates the TF-IDF value
of every word in a set of input text files.
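
For readers unfamiliar with the metric, here is a minimal single-machine sketch of one common textbook definition of TF-IDF: term frequency as a word's count divided by the document length, times the natural log of (number of documents / number of documents containing the word). This is an assumption for illustration only; the README does not say which variant the Hadoop jobs use, and the class `TfIdfSketch` and its methods are hypothetical names that are not part of this repository.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT code from this repository.
// Assumes the plain textbook variant: tf = count / docLength, idf = ln(N / df).
class TfIdfSketch {

    // Term frequency: occurrences of `word` in `doc` divided by the document length.
    static double tf(String word, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(word::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / df), where df is the number of
    // documents that contain `word`. Returns 0 for words seen in no document.
    static double idf(String word, List<List<String>> docs) {
        long df = docs.stream().filter(d -> d.contains(word)).count();
        if (df == 0) {
            return 0.0;
        }
        return Math.log((double) docs.size() / df);
    }

    // TF-IDF of `word` in `doc`, relative to the whole collection `docs`.
    static double tfIdf(String word, List<String> doc, List<List<String>> docs) {
        return tf(word, doc) * idf(word, docs);
    }
}
```

With two documents `[a, b, a]` and `[b, c]`, the word `b` appears in both, so its IDF is ln(2/2) = 0 and its TF-IDF is 0 in either document, while `a` gets a positive score in the first document only.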

It was developed as part of a school project.

After running `make jar`, create the inverted index with:
hadoop jar TfIdf.jar gr.upatras.ceid.romo.Index <input> <output> <title>

To create the TF-IDF metrics, run:
hadoop jar TfIdf.jar gr.upatras.ceid.romo.Tf <input> <output> <title>

The number of reducers is hardcoded to 5, which suited my system.
You might want to change it to suit yours.