This repository contains code and analysis for fine-grained Part-of-Speech (POS) tagging using Large Language Models (LLMs), specifically focused on the Universal Dependencies (UD) framework. The project explores how LLMs can perform complex linguistic annotation tasks and addresses the challenges of tokenization and POS tagging in a unified pipeline.
- Universal Dependencies POS tagging with Google Gemini 2.0 Flash Lite
- Segmentation pipeline that follows UD tokenization guidelines
- Comprehensive error analysis comparing LLM performance against traditional methods
- Evaluation framework for both tokenization and tagging accuracy
- Integrated pipeline that handles both segmentation and tagging in one flow
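The integrated pipeline sends the LLM a single prompt covering both segmentation and tagging. The repository's real prompts live in prompts.py; the sketch below is illustrative only, and the helper name `build_tagging_prompt` is hypothetical:

```python
# The 17 Universal POS tags defined by the UD guidelines.
UPOS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]

def build_tagging_prompt(sentence):
    """Build a prompt asking an LLM to segment and tag a sentence per UD."""
    return (
        "Segment the sentence into Universal Dependencies tokens, then assign "
        f"each token one UPOS tag from: {', '.join(UPOS_TAGS)}.\n"
        "Return one 'token<TAB>TAG' pair per line.\n"
        f"Sentence: {sentence}"
    )
```

The actual prompts in prompts.py include more detailed guideline excerpts and few-shot examples than this minimal version.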
- Python (> 3.11)
- Git
- uv (https://docs.astral.sh/uv/getting-started/)
- Visual Studio Code
- Create a folder for the assignment:

  mkdir hw1; cd hw1
- Retrieve the dataset we will use and the code from this repo:

  git clone https://github.com/UniversalDependencies/UD_English-EWT.git
  git clone https://github.com/melhadad/nlp-with-llms-2025-hw1.git
- Load the required Python libraries:

  cd nlp-with-llms-2025-hw1; uv sync
- Define your API keys in either gemini_key.ini or grok_key.ini

  For Grok:

  # Unix-like
  source grok_key.ini
  export GROK_API_KEY=$GROK_API_KEY

  For Google Gemini:

  export GOOGLE_API_KEY="your-api-key"  # On Windows: set GOOGLE_API_KEY=your-api-key
- Activate the project virtual env:

  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Open ud_pos_tagger_sklearn.ipynb in VS Code and verify you can execute the cells.
The project uses the Universal Dependencies English-EWT dataset. The code expects the dataset directory structure as follows:
UD_English-EWT/
├── en_ewt-ud-dev.conllu
├── en_ewt-ud-test.conllu
└── en_ewt-ud-train.conllu
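Each .conllu file stores one token per line with tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with blank lines separating sentences. A minimal stdlib-only sketch of extracting (form, UPOS) pairs is shown below; the repository's own loader in utils.py may differ, and the helper name `read_conllu_sentences` is illustrative:

```python
def read_conllu_sentences(text):
    """Parse CoNLL-U text into sentences of (form, upos) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level metadata comment
            continue
        fields = line.split("\t")
        token_id, form, upos = fields[0], fields[1], fields[3]
        if "-" in token_id or "." in token_id:  # skip multiword ranges / empty nodes
            continue
        current.append((form, upos))
    if current:
        sentences.append(current)
    return sentences

sample = ("# text = I left.\n"
          "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
          "2\tleft\tleave\tVERB\tVBD\t_\t0\troot\t_\t_\n"
          "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n")
```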
# Run the basic POS tagger
uv run ud_pos_tagger_gemini.py

# Run the segmentation model
uv run ud_pos_llm_segmentor_gemini.py

# Run the improved LLM tagger with integrated segmentation
uv run ud_pos_improved_llm_tagger.py

To analyze the results and view the visualizations:

jupyter notebook ud_pos_tagger_gemini.ipynb

- ud_pos_tagger_gemini.py: Main POS tagging implementation using Gemini
- ud_pos_llm_segmentor_gemini.py: Specialized tokenization model for UD guidelines
- ud_pos_improved_llm_tagger.py: Integrated pipeline for segmentation and tagging
- utils.py: Helper functions for data processing and evaluation
- schema.py: Data structures and type definitions
- prompts.py: LLM prompts for tagging and segmentation
- ud_pos_tagger_gemini.ipynb: Jupyter notebook with detailed analysis and visualizations
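The evaluation framework scores both tokenization and tagging. As a reference point, token-level tagging accuracy over aligned gold/predicted sequences can be computed as follows (an illustrative sketch, not the exact code in utils.py):

```python
def tagging_accuracy(gold, pred):
    """Token-level POS accuracy over aligned gold/predicted tag sequences.

    gold and pred are lists of sentences, each a list of UPOS tags.
    """
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    if not pairs:
        return 0.0
    return sum(g == p for g, p in pairs) / len(pairs)
```

Note that this assumes the predicted segmentation already matches the gold tokenization; when segmentations differ, the tokens must be aligned first before tags can be compared.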
The LLM tagger achieves strong performance on Universal Dependencies POS tagging, with the following key findings:
- Strong overall accuracy, with particular strengths in:
- Adpositions (ADP)
- Proper nouns (PROPN)
- Common nouns (NOUN)
- Verbs (VERB)
- Improvement areas compared to traditional machine learning approaches:
- Pronouns (PRON) recognition
- Distinguishing determiners (DET) from pronouns
- Particle (PART) vs. adverb (ADV) disambiguation
Our analysis of tokenization shows:
- 38.7% average error reduction when using proper UD tokenization
- Most significant improvements on sentences with hyphenated compounds and punctuation
- Most challenging segmentation cases: hyphenated terms, contractions, and special punctuation
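To illustrate the contraction cases above: UD segments English contractions into syntactic words, e.g. "don't" becomes "do" + "n't" and "it's" becomes "it" + "'s". A toy regex-based clitic splitter is sketched below; it is a deliberate simplification of the actual UD guidelines (which also cover hyphenated compounds and special punctuation), and `split_contraction` is a hypothetical helper, not repository code:

```python
import re

# Common English clitics that UD treats as separate syntactic words.
CLITIC = re.compile(r"^(?P<host>\w+)(?P<clitic>n't|'s|'re|'ve|'ll|'d|'m)$",
                    re.IGNORECASE)

def split_contraction(word):
    """Split an English contraction into UD-style syntactic words."""
    m = CLITIC.match(word)
    if not m:
        return [word]  # not a recognized contraction; keep whole
    return [m.group("host"), m.group("clitic")]
```

Note the "can't" case: UD splits it as "ca" + "n't", which the regex reproduces because the host consumes only the word characters before the clitic.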
The LLM tagger struggles most with:
- Deictic words that can be pronoun or determiner - "this/that/these/those"
- Discourse pronoun "there" vs. locative adverb
- Subordinating conjunction vs. adposition - words like "for", "in", "to"
- Verb-particle/adverb vs. preposition - "up", "out", "in", "off", "on"
- Possessive pronouns classified as determiners
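Error patterns like these can be surfaced by counting (gold, predicted) tag disagreements; the sketch below shows the idea (a hypothetical helper, not the repository's analysis code):

```python
from collections import Counter

def confusion_pairs(gold, pred):
    """Count (gold_tag, predicted_tag) disagreements, most frequent first."""
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return errors.most_common()

gold = ["PRON", "DET", "PART", "PRON"]
pred = ["DET",  "DET", "ADV",  "DET"]
# Two PRON->DET confusions and one PART->ADV confusion.
```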
- Experiment with fine-tuning approaches for the LLM
- Explore parameter-efficient adaptation for specialized linguistic domains
- Implement additional languages from Universal Dependencies
- Create a more robust evaluation framework for cross-linguistic performance
- Develop a web interface for interactive POS tagging demonstrations
- Universal Dependencies project for the dataset and guidelines
- Google for providing access to the Gemini API
- The NLP community for benchmarks and evaluation methodologies
- BGU CS Course 'NLP with LLMs' - Spring 2025 - Michael Elhadad
Authors: Gil Barel and Daniel Ohayon