
Sentence boundary detection à la NLTK #30

@space-pope

NLTK's Punkt tokenizer implements a language-independent algorithm for unsupervised sentence boundary detection. This algorithm should be ported to Penelope, along with hooks for easy retraining on a corpus and an optional list of language-specific special-case regexes.
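To make the requested hooks concrete, here is a minimal sketch of their possible shape: a retrain-from-corpus method and an optional list of special-case regexes. It is written in Python because NLTK is Python; Penelope's language isn't stated in this issue. Everything in it is hypothetical: `SimpleBoundaryDetector`, `train`, `split`, `special_cases`, and the thresholds are invented names. It is also deliberately *not* Punkt. Punkt scores abbreviation candidates with likelihood-ratio statistics and adds collocation and capitalization evidence. This sketch substitutes a crude rule: "short word types that almost always carry a trailing period are abbreviations".

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Pattern, Set


class SimpleBoundaryDetector:
    """Hypothetical, Punkt-inspired sentence splitter (NOT NLTK's Punkt algorithm).

    Learns abbreviation types from an unannotated corpus and treats a period
    after a known abbreviation (or after a token matching a special-case regex)
    as a non-boundary.
    """

    def __init__(
        self,
        special_cases: Optional[Iterable[str]] = None,
        min_count: int = 2,
        min_ratio: float = 0.9,
        max_len: int = 4,
    ) -> None:
        # Optional language-specific regexes; a token that fully matches one
        # never ends a sentence (e.g. r"\d+\." for German ordinals like "3.").
        self.special_cases: List[Pattern[str]] = [re.compile(p) for p in (special_cases or [])]
        self.min_count = min_count  # a type must be seen this often with a period
        self.min_ratio = min_ratio  # share of its occurrences that carry a period
        self.max_len = max_len      # abbreviations tend to be short
        self.abbrevs: Set[str] = set()

    def train(self, corpus: str) -> None:
        """(Re)learn abbreviation types from raw text; replaces earlier results."""
        with_period: Counter = Counter()
        total: Counter = Counter()
        for tok in corpus.split():
            word = tok.lower()
            bare = word.rstrip(".")
            if not bare:
                continue
            total[bare] += 1
            if word.endswith("."):
                with_period[bare] += 1
        # Crude stand-in for Punkt's likelihood-ratio test: short types that
        # almost always carry a trailing period are assumed to be abbreviations.
        # A short word that repeatedly ends sentences can be misclassified.
        self.abbrevs = {
            w
            for w, n in with_period.items()
            if n >= self.min_count
            and n / total[w] >= self.min_ratio
            and len(w.replace(".", "")) <= self.max_len
        }

    def split(self, text: str) -> List[str]:
        """Split at tokens ending in ., ? or ! unless they are abbreviations or special cases."""
        sentences: List[str] = []
        start = 0
        for m in re.finditer(r"\S+", text):
            tok = m.group()
            if not tok.endswith((".", "?", "!")):
                continue
            if tok.endswith(".") and tok.lower().rstrip(".") in self.abbrevs:
                continue
            if any(p.fullmatch(tok) for p in self.special_cases):
                continue
            sentences.append(text[start:m.end()].strip())
            start = m.end()
        tail = text[start:].strip()
        if tail:
            sentences.append(tail)
        return sentences


detector = SimpleBoundaryDetector()
detector.train("Dr. Smith met Dr. Jones. Mr. Brown and Mr. Lee came later. The meeting ended. Everyone left.")
print(detector.split("Dr. Watson arrived. He sat down."))
# prints ['Dr. Watson arrived.', 'He sat down.']
```

Retraining is just another `train()` call on a new corpus. Language-specific rules are passed separately, e.g. `SimpleBoundaryDetector(special_cases=[r"\d+\."])` keeps "Am 3. Mai kam er." together. Separating the two lets corpus statistics and hand-written exceptions evolve independently.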
