Example datasets
- Topic Models Email List Archive
- Coursera PGM Video Transcripts
- AP Articles
- NSF Grants
- New York Times
# Topic Models Email List Archive
This dataset consists of 1887 email messages from the Topic-models mailing list archive, sent between September 2006 and May 2012. Quoted text in reply emails has been (mostly) scrubbed by removing all lines that begin with '>'. Signatures have also been (mostly) removed by deleting all text that follows a sequence of dashes, e.g.:

    ---
    John Smith
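The two cleanup rules above can be approximated in a few lines. The following is a minimal sketch, not the script actually used to build this dataset. It assumes that a "sequence of dashes" means a line containing only two or more dashes. The names `scrub_email` and `SIGNATURE_DELIMITER` are hypothetical.

```python
import re

# Assumption: a signature delimiter is a line made up only of two or more dashes
# (optionally surrounded by whitespace). The original rule may have differed.
SIGNATURE_DELIMITER = re.compile(r"^\s*-{2,}\s*$")


def scrub_email(body: str) -> str:
    """Hypothetical helper approximating the quote/signature scrubbing described above."""
    kept = []
    for line in body.splitlines():
        # Treat everything after the first dash-only line as a signature and stop.
        if SIGNATURE_DELIMITER.match(line):
            break
        # Drop quoted reply text, i.e. lines that begin with '>'.
        if line.startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept)


message = "Thanks for the pointer.\n> Have you tried Gibbs sampling?\nIt worked.\n---\nJohn Smith"
print(scrub_email(message))
```

Because both rules are purely line-based, they miss quotes that are not prefixed with '>' and signatures that lack a dash delimiter. This is consistent with the "(mostly)" caveats above.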
# Coursera PGM Video Transcripts
This dataset consists of the 92 video transcripts from Coursera's free Probabilistic Graphical Models course.
# AP Articles
This dataset consists of 1085 Associated Press articles, randomly selected from the 2046 AP articles provided as sample data with Dave Blei's LDA implementation.
# NSF Grants
This dataset consists of a random subset of 1166 NSF grant abstracts from the NSF Research Awards Corpus, covering 2000 to 2003.
# New York Times
This dataset consists of 845 semi-processed New York Times articles, taken from David Newman's Topic Modeling Tool. This dataset has been removed from the online version.