Sentence boundary detection à la NLTK

NLTK's [Punkt tokenizer](https://www.nltk.org/_modules/nltk/tokenize/punkt.html) implements a language-independent algorithm for [unsupervised sentence boundary detection](https://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.2006.32.4.485). This algorithm should be ported to Penelope, along with hooks for easy retraining on a corpus and an optional list of language-specific special-case regexes.
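
To make the requested hooks concrete, here is a rough Python sketch of the shape they might take: one function that learns abbreviations from a raw corpus, and one that splits text while honoring both the learned abbreviations and caller-supplied special-case regexes. All names here (`train_abbreviations`, `split_sentences`, `special_cases`, the thresholds) are hypothetical and are neither NLTK's nor Penelope's API. The frequency heuristic is a deliberately crude stand-in for the likelihood-ratio statistics Punkt actually uses, so treat it as an interface illustration rather than a port.

```python
import re
from collections import Counter
from typing import Iterable, List, Pattern, Sequence, Set


def train_abbreviations(
    corpus: Iterable[str],
    min_count: int = 2,
    min_ratio: float = 0.8,
    max_len: int = 4,
) -> Set[str]:
    """Hypothetical retraining hook: learn likely abbreviations from raw text.

    A token counts as an abbreviation if it is short, occurs often enough,
    and almost always carries a trailing period. This is a simplification,
    not Punkt's actual statistical test.
    """
    total: Counter = Counter()
    with_period: Counter = Counter()
    for text in corpus:
        for token in text.split():
            word = token.rstrip(".").lower()
            if not word:
                continue
            total[word] += 1
            if token.endswith("."):
                with_period[word] += 1
    return {
        word
        for word, count in total.items()
        if count >= min_count
        and len(word) <= max_len
        and with_period[word] / count >= min_ratio
    }


def split_sentences(
    text: str,
    abbreviations: Set[str],
    special_cases: Sequence[Pattern[str]] = (),
) -> List[str]:
    """Split on ., !, ? followed by whitespace, unless the preceding token
    is a learned abbreviation or fully matches a special-case regex."""
    sentences: List[str] = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1]
        if last_token.rstrip(".").lower() in abbreviations:
            continue  # learned abbreviation: not a boundary
        if any(pattern.fullmatch(last_token) for pattern in special_cases):
            continue  # language-specific exception: not a boundary
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


# Usage: retrain on a tiny corpus, then split with one English special case.
corpus = ["Dr. Smith met Dr. Jones.", "Ask Dr. Lee about it.", "Dr. Who arrived."]
abbreviations = train_abbreviations(corpus)
dotted_initials = re.compile(r"(?:[a-z]\.){2,}")  # e.g. "p.m.", "e.g."
result = split_sentences(
    "Dr. Smith arrived at 5 p.m. sharp. He left early.",
    abbreviations,
    special_cases=[dotted_initials],
)
```

Keeping the learned state (a plain set) separate from the splitter means a retrained model can be swapped in without touching the splitting logic, and the special-case list stays an optional, per-language argument as the issue asks.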

Sentence boundary detection à la NLTK #30

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Sentence boundary detection à la NLTK #30

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions