Lute Lemma

A simple Python 3 script that generates child-parent lemma mappings for import into Lute.

This uses spaCy-stanza, a wrapper around Stanza (formerly StanfordNLP), to find lemmas. Stanza has models for over 60 languages; see this page for the complete list.

For example, given the following Spanish input file (in demo/es_input.txt):

perros
perro
perras
vives
vivimos
vivieron
muchacho
muchacha
coches
vez

The script generates a CSV file containing the following (Lute term import CSV), with the root form in the parent column, and the child form from the input file under term:

language,term,translation,parent,tags,pronunciation
Spanish,coche,,coches,,
Spanish,perra,,perras,,
Spanish,perro,,perros,,
Spanish,vivir,,vives,,
Spanish,vivir,,vivieron,,
Spanish,vivir,,vivimos,,

Only terms whose lemma differs from the original form are included, so the output omits terms such as muchacho and vez.

(If you know Spanish, you'll see that some of the above aren't really useful ... but the spaCy pipeline is often very, very good.)
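The filtering rule described above can be sketched in a few lines. This is a minimal illustration, not the script's actual code: `lemma_pairs`, `stub_lemmatize`, and the `STUB_LEMMAS` table are hypothetical names, and the lookup table stands in for the real spaCy-stanza pipeline so the example runs without downloading a model.

```python
from typing import Callable, Iterable


def lemma_pairs(
    terms: Iterable[str], lemmatize: Callable[[str], str]
) -> list[tuple[str, str]]:
    """Return sorted (original form, lemma) pairs, skipping unchanged terms."""
    pairs = set()
    for term in terms:
        term = term.strip()
        if not term:
            continue  # ignore blank lines in the input file
        lemma = lemmatize(term)
        if lemma != term:  # keep only terms whose lemma differs
            pairs.add((term, lemma))
    return sorted(pairs)


# Hypothetical lookup table standing in for the real lemmatizer (assumption).
STUB_LEMMAS = {
    "perros": "perro",
    "perras": "perra",
    "vives": "vivir",
    "vivimos": "vivir",
    "vivieron": "vivir",
    "coches": "coche",
}


def stub_lemmatize(term: str) -> str:
    """Return the stub lemma, or the term itself when it has no entry."""
    return STUB_LEMMAS.get(term, term)


terms = [
    "perros", "perro", "perras", "vives", "vivimos",
    "vivieron", "muchacho", "muchacha", "coches", "vez",
]
pairs = lemma_pairs(terms, stub_lemmatize)
for form, lemma in pairs:
    print(f"{form} -> {lemma}")
```

Passing the lemmatizer in as a function keeps the filter independent of the NLP library, so a real pipeline could be swapped in for the stub.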

Requirements

python3 (I've only tested this with Python 3.11; it may work with earlier versions)

Installation

Note that this uses python3.11 explicitly.

python3.11 -m venv .env
source .env/bin/activate
pip3.11 install -r requirements.txt

Usage:

$ source .env/bin/activate

# Ignore warnings with -W ignore
#
# 1st arg: LUTE language name
# 2nd arg: Stanza language code of the terms (link below)
# 3rd arg: path to input file
# 4th arg: path to the output file
#
$ python -W ignore main.py Spanish es demo/es_input.txt output.txt
Opening library ...
Done.
Loading pipeline ...
Done.  Processing 1 batches.
  1 of 1

File generated: output_1.txt

Please remove any mappings you don't want from this file before importing it.
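Unwanted mappings can be removed by hand in any text editor. For larger files, a small helper along these lines may be handy. It is only a sketch: `drop_mappings` is a hypothetical function, not part of this repository, and it assumes the generated file uses the header shown in the example output above.

```python
import csv
import io


def drop_mappings(csv_text: str, unwanted: set[str]) -> str:
    """Return csv_text without rows whose term or parent is in unwanted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Skip the row if either side of the mapping is unwanted.
        if row["term"] in unwanted or row["parent"] in unwanted:
            continue
        writer.writerow(row)
    return out.getvalue()


# Sample rows copied from the example output earlier in this README.
sample = (
    "language,term,translation,parent,tags,pronunciation\n"
    "Spanish,coche,,coches,,\n"
    "Spanish,perra,,perras,,\n"
    "Spanish,perro,,perros,,\n"
)
cleaned = drop_mappings(sample, {"perras"})
print(cleaned)
```

The helper checks both the term and parent columns, so it does not matter which side of the mapping you list as unwanted.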

Stanza language codes: Stanza has models for over 60 languages; see this page for the complete list.
