EmailAuthorPrediction

For the task of predicting the author of an email, we used a unigram language model. We began by identifying the features that would help model the problem. The features that appeared most important were:

• N-grams of the email
• Frequency of each N-gram
• Out-of-vocabulary words (spelling mistakes)

Together, the first two features describe how a particular author chooses words when writing. They can therefore be treated as the author's signature, since every writer tends to draw on a characteristic subset of the vocabulary. The out-of-vocabulary words, which are mostly the author's spelling mistakes, reflect their writing style and are therefore also an important part of the solution. The problem thus reduces to computing, for each candidate author, the total probability that the author wrote the N-grams in the email, and choosing the author with the highest probability.
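The scoring idea above can be illustrated with a minimal sketch. It is not the repository's own code (email.py and the task scripts are not reproduced here). It assumes lowercase whitespace tokenization and add-one (Laplace) smoothing, and the function names and toy training data are hypothetical. Each author gets a unigram count table, and an email is scored by the sum of log probabilities of its words under that author's model. The sum is the log of the total probability, computed in log space to avoid numeric underflow on long emails.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the original tokenizer is not described).
    return text.lower().split()


def train_unigram_models(emails_by_author):
    """Build one unigram count table per author from {author: [email_text, ...]}."""
    models = {}
    vocab = set()
    for author, emails in emails_by_author.items():
        counts = Counter()
        for email in emails:
            counts.update(tokenize(email))
        models[author] = counts
        vocab.update(counts)
    return models, vocab


def log_prob(email, counts, vocab_size):
    """Sum of log P(word | author) with add-one smoothing (an assumed smoothing choice)."""
    total = sum(counts.values())
    score = 0.0
    for word in tokenize(email):
        # Counter returns 0 for unseen words, so smoothing keeps every probability > 0.
        score += math.log((counts[word] + 1) / (total + vocab_size))
    return score


def predict_author(email, models, vocab):
    # The extra +1 reserves probability mass for words unseen in all training emails.
    vocab_size = len(vocab) + 1
    return max(models, key=lambda author: log_prob(email, models[author], vocab_size))


# Hypothetical toy data: "recieve" stands in for an author-specific spelling mistake.
training = {
    "alice": ["the meeting is on monday", "please recieve the attached report"],
    "bob": ["gas prices went up again", "trading desk closes early today"],
}
models, vocab = train_unigram_models(training)
print(predict_author("did you recieve the report", models, vocab))  # alice
```

Because an author's misspellings appear in that author's own training counts, they raise that author's score whenever the same misspelling recurs, which is one simple way the out-of-vocabulary feature can enter a unigram model.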

rahularora/EmailAuthorPrediction