TurkuNLP/htr-annotations

htr-annotations

Manually annotated images for handwritten text recognition of Finnish migration records, including annotations for page de-skew, table structure, cell type classification, text recognition, and year recognition.

Source images: The source images for the annotations (JPEG format) can be downloaded from https://zenodo.org/records/15836012.

Annotations: Annotations are in PageXML format (schema version 2013-07-15). File names can be used to pair each annotation with its corresponding source image.
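The sketch below shows one way to read the annotations with only the Python standard library. It assumes that an annotation and its image share the same file stem (for example `X.xml` and `X.jpg`), and that line transcriptions sit in the line-level `TextEquiv/Unicode` element of each `TextLine`, as in the PAGE 2013-07-15 schema. The function names `read_text_lines` and `pair_by_stem` are hypothetical helpers, not part of this repository.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the PAGE schema, version 2013-07-15.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"
NS = {"pc": PAGE_NS}


def read_text_lines(xml_text: str) -> list[str]:
    """Return the transcription of every TextLine, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter(f"{{{PAGE_NS}}}TextLine"):
        # Only the line-level TextEquiv (a direct child of TextLine) is read;
        # Word/Glyph-level transcriptions, if any, are ignored.
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines


def pair_by_stem(xml_names, image_names) -> dict[str, str]:
    """Map each annotation file name to the image with the same stem.

    Assumption: annotation and image differ only in their extension.
    Annotations without a matching image are left out.
    """
    images = {Path(name).stem: name for name in image_names}
    return {x: images[Path(x).stem] for x in xml_names if Path(x).stem in images}
```

Each `Page` element also carries an `imageFilename` attribute, which can serve as a cross-check when pairing files by name.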

Train/Dev/Test split: The data is divided into training, development and test directories. In addition, the pielavesi directory contains further annotations. Note that the Pielavesi annotations are not randomly sampled: they all come from the Pielavesi parish. We suggest using them as additional training data.
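Following that suggestion, the Pielavesi annotations can be folded into the training set while development and test stay as distributed. The sketch below assumes the subdirectories are named `train`, `dev`, `test` and `pielavesi` and contain `*.xml` files. These directory names and the helpers `collect_xml` and `make_splits` are hypothetical, so adjust them to the actual repository layout.

```python
from pathlib import Path


def collect_xml(root, subdirs) -> list[Path]:
    """Return PageXML paths from the given subdirectories of root (hypothetical helper).

    Files are sorted within each subdirectory; missing subdirectories yield nothing.
    """
    files = []
    for sub in subdirs:
        files.extend(sorted((Path(root) / sub).glob("*.xml")))
    return files


def make_splits(root) -> dict[str, list[Path]]:
    """Build train/dev/test file lists (assumed directory names)."""
    return {
        # Pielavesi pages all come from one parish, so they are used only as
        # extra training data and never mixed into dev or test.
        "train": collect_xml(root, ["train", "pielavesi"]),
        "dev": collect_xml(root, ["dev"]),
        "test": collect_xml(root, ["test"]),
    }
```

Keeping the non-random Pielavesi pages out of the development and test sets means evaluation scores still reflect the randomly sampled data.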

Annotation guidelines

See https://github.com/TurkuNLP/finnish-migration-data for more information about the project.

License

Annotations: CC-BY

Source images: See the license information in https://zenodo.org/records/15836012.

About

Handwritten text recognition annotations
