Sylvia9628/nbdev_testing

nbdev_testing

This is a project to test out nbdev.

Preprocess.lemmatize

Preprocess.lemmatize()

Returns the stemmed or lemmatized documents with punctuation and stopwords removed.
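
As a rough illustration of the punctuation and stopword removal step only, here is a minimal standard-library sketch. It does not lemmatize or stem, and it is not the package's implementation: the function name `strip_punct_and_stopwords` and the tiny `STOPWORDS` set are assumptions made for this example.

```python
import string

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real pipeline would use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "at", "we", "of", "and"}


def strip_punct_and_stopwords(documents):
    """Split on whitespace, strip surrounding punctuation, drop stopwords.

    No stemming or lemmatization is performed here.
    """
    cleaned = []
    for doc in documents:
        tokens = []
        for raw in doc.split():
            # Remove punctuation attached to either end of the token.
            token = raw.strip(string.punctuation)
            # Keep non-empty tokens that are not stopwords (case-insensitive check).
            if token and token.lower() not in STOPWORDS:
                tokens.append(token)
        cleaned.append(tokens)
    return cleaned


docs = ["Hello, world!", "NLP is fun", "We work at the bank"]
print(strip_punct_and_stopwords(docs))
```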

get_freq

get_freq(preprocessed_documents)

Returns a list of per-document vocabulary frequencies and a vocabulary list.
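
One way such a function could work is sketched below with `collections.Counter`. The name `sketch_get_freq` is hypothetical, and sorting the vocabulary is an assumption made here for reproducible output; the package's own ordering may differ.

```python
from collections import Counter


def sketch_get_freq(preprocessed_documents):
    """Return (per-document term counts, vocabulary) for tokenized documents."""
    # One Counter per document: term -> number of occurrences in that document.
    doc_freq = [Counter(tokens) for tokens in preprocessed_documents]
    # The vocabulary is the union of all terms; sorted here only for determinism.
    vocabulary = sorted(set().union(*doc_freq))
    return doc_freq, vocabulary


docs = [["hello", "world"], ["NLP", "fun"], ["-PRON-", "work", "bank"]]
doc_freq, vocabulary = sketch_get_freq(docs)
print(doc_freq)
print(vocabulary)
```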

form_matrix

form_matrix(doc_freq, vocabulary)

Returns a matrix of tf-idf vectors.
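
The README does not say which tf-idf variant is used, so the sketch below assumes a common one: term frequency is the count divided by the document length, and inverse document frequency is log(N / df). The name `sketch_form_matrix` and the nested-list return type are assumptions for illustration.

```python
import math
from collections import Counter


def sketch_form_matrix(doc_freq, vocabulary):
    """Build a documents x vocabulary tf-idf matrix as nested lists."""
    n_docs = len(doc_freq)
    # Document frequency: in how many documents does each term appear?
    df = {term: sum(1 for counts in doc_freq if term in counts) for term in vocabulary}
    matrix = []
    for counts in doc_freq:
        total = sum(counts.values())  # document length in tokens
        row = []
        for term in vocabulary:
            tf = counts[term] / total if total else 0.0
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            row.append(tf * idf)
        matrix.append(row)
    return matrix


doc_freq = [Counter({"bank": 1, "work": 1}), Counter({"work": 2})]
print(sketch_form_matrix(doc_freq, ["bank", "work"]))
```

With this variant, a term that occurs in every document gets an idf of zero, so it contributes nothing to any vector.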

get_query_vec

get_query_vec(preprocessed_query, vocab, doc_freq)

Returns the tf-idf vector of the input query.
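
A query can be projected into the same space as the documents by reusing the corpus vocabulary and document frequencies. The sketch below assumes the same tf-idf variant as a plain count / length term frequency with log(N / df) weighting, and it ignores query terms that are not in the vocabulary. The name `sketch_query_vec` is hypothetical.

```python
import math
from collections import Counter


def sketch_query_vec(preprocessed_query, vocab, doc_freq):
    """Return the tf-idf vector of a tokenized query over the corpus vocabulary."""
    n_docs = len(doc_freq)
    counts = Counter(preprocessed_query)
    total = len(preprocessed_query)
    vec = []
    for term in vocab:
        # Document frequency comes from the corpus, not from the query.
        df = sum(1 for c in doc_freq if term in c)
        tf = counts[term] / total if total else 0.0
        idf = math.log(n_docs / df) if df else 0.0
        vec.append(tf * idf)
    return vec


doc_freq = [Counter({"bank": 1, "work": 1}), Counter({"fun": 1, "work": 1})]
vocab = ["bank", "fun", "work"]
print(sketch_query_vec(["bank", "loan"], vocab, doc_freq))
```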

get_cos_sim

get_cos_sim(matrix, vector)

Returns the 10 documents most similar to the query, ranked by cosine similarity between each document vector and the query vector.
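
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below ranks documents by that score and keeps the top results. The name `sketch_cos_sim`, the `top_k` parameter, and the `(index, score)` return format are assumptions for illustration; the package's actual return value is not documented here.

```python
import math


def sketch_cos_sim(matrix, vector, top_k=10):
    """Return up to top_k (document index, cosine similarity) pairs, best first."""

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    q_norm = norm(vector)
    scores = []
    for idx, row in enumerate(matrix):
        denom = norm(row) * q_norm
        dot = sum(a * b for a, b in zip(row, vector))
        # An all-zero document or query vector gets similarity 0 instead of dividing by zero.
        scores.append((idx, dot / denom if denom else 0.0))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(sketch_cos_sim(matrix, [1.0, 0.0]))
```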

Install

pip install nbdev_testing

How to use

Preprocess

Lemmatize

documents =  ["Hello world", "NLP is fun", "We work at the bank"]
text = Preprocess(documents)
preprocessed = text.lemmatize()
preprocessed
[['hello', 'world'], ['NLP', 'fun'], ['-PRON-', 'work', 'bank']]

TF-IDF tool

Vocabulary frequency

document_frequency, vocabulary = get_freq(preprocessed)
document_frequency
[Counter({'hello': 1, 'world': 1}),
 Counter({'NLP': 1, 'fun': 1}),
 Counter({'-PRON-': 1, 'work': 1, 'bank': 1})]
vocabulary
['NLP', 'world', 'fun', 'work', 'bank', 'hello', '-PRON-']

More functions

Generated from fastai/nbdev_template