mshamrai/deep-language-geometry


Deep Language Geometry

arXiv 🤗 Dataset: language-metric-data 🤗 Space: language-metric-analysis

This repository contains the code for
“Deep Language Geometry: Constructing a Metric Space from LLM Weights.”

We construct binary “language vectors” from LLM weight importance (via OBS-style estimates), then measure inter-language distances (e.g., Hamming), enabling analysis, visualization, and downstream transfer heuristics.
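As a rough illustration of the idea above, the following sketch binarizes per-weight importance scores and compares two languages with the Hamming distance. The top-fraction thresholding rule, the helper names `binarize_top_fraction` and `hamming`, and the toy scores are assumptions made for this example; they are not taken from the repository code or the paper.

```python
from typing import List, Sequence


def binarize_top_fraction(scores: Sequence[float], keep: float = 0.5) -> List[int]:
    """Mark the `keep` fraction of highest-scoring weights with 1 (assumed rule)."""
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must be in (0, 1]")
    k = max(1, round(len(scores) * keep))
    # Indices of the k largest scores; ties go to the earlier index.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up importance scores for the same six weights under two languages.
scores_lang_a = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
scores_lang_b = [0.9, 0.8, 0.1, 0.2, 0.7, 0.3]

vec_a = binarize_top_fraction(scores_lang_a)
vec_b = binarize_top_fraction(scores_lang_b)
distance = hamming(vec_a, vec_b)
print(vec_a, vec_b, distance)
```

In this toy case the two vectors disagree on two of six positions. Dividing the count by the vector length would give a normalized distance in [0, 1].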

📰 News

  • Accepted for a long presentation at RANLP 2025! 🎉

Usage

To compute and save a binary vector for a given model and dataset, run:

python main.py --model <your model> --dataset <your dataset>

The arguments:

  • --model: The identifier for the model from Hugging Face model hub.
  • --dataset: Calibration dataset name.
  • --seed: Seed for sampling the calibration data.
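For readers who want to wrap or script the command, here is a minimal sketch of how these three flags could be parsed with `argparse`. It is not the actual contents of `main.py`: the function name `build_parser`, the integer type and default of `--seed`, and the placeholder model and dataset values are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        description="Compute a binary language vector (illustrative sketch)."
    )
    parser.add_argument("--model", required=True,
                        help="Model identifier on the Hugging Face model hub.")
    parser.add_argument("--dataset", required=True,
                        help="Calibration dataset name.")
    # Assumption: the seed is an integer; the default of 0 is a placeholder.
    parser.add_argument("--seed", type=int, default=0,
                        help="Seed for sampling the calibration data.")
    return parser


# Placeholder values, not real model or dataset names.
args = build_parser().parse_args(
    ["--model", "my-org/my-model", "--dataset", "my-dataset", "--seed", "42"]
)
print(args.model, args.dataset, args.seed)
```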

For more usage examples, see launch.sh.

Supplementary materials

The computed binary vectors, Euclidean vectors, and distances are published as a Hugging Face dataset: mshamrai/language-metric-data.

The Gradio analysis tool is published as a Hugging Face Space: mshamrai/language-metric-analysis.

Citation

If you use this repo, dataset, or space, please cite:

@article{shamrai2025deep,
  title   = {Deep Language Geometry: Constructing a Metric Space from LLM Weights}, 
  author  = {Maksym Shamrai and Vladyslav Hamolia},
  journal = {arXiv preprint arXiv:2508.11676},
  year    = {2025},
  url     = {https://arxiv.org/abs/2508.11676}
}

License

This project is licensed under the MIT License. Feel free to use and modify the code for academic and research purposes.
