This directory contains the Python scripts used throughout the project described in Sebastiano Vecellio Salto, Camilla Casula, Alessio Palmero Aprosio, and Sara Tonelli (2026), "University Speaking for Everyone: Assessing Changes in Italian Higher Education Statutes Toward Gender-Inclusive Language," LREC.
The folders 'data', 'db', 'db-rewrite', and 'results' store all datasets, outputs, and results associated with the tasks described in our analysis.
The 'statutes' folder includes the PDF versions of the statutes of all Italian universities, as well as the XML and JSON files corresponding to the institutions included in our analysis. The XML files are provided in both annotated and non-annotated versions. The JSON files are divided into two subsets: one containing only gender-related paragraphs and another comprising all paragraphs from the statutes.
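As a starting point for exploring the JSON files, the sketch below loads every JSON file found under a given folder. It is only an illustration: the helper name `load_json_files` is hypothetical and not part of the project scripts, and nothing is assumed about the subfolder layout inside 'statutes' or about the structure of the JSON content.

```python
import json
from pathlib import Path


def load_json_files(folder):
    """Load every *.json file found under `folder` (searched recursively).

    Returns a dict mapping each file's path (relative to `folder`,
    POSIX-style) to its parsed content. No schema is assumed.
    """
    root = Path(folder)
    loaded = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.relative_to(root).as_posix()] = json.load(handle)
    return loaded


if __name__ == "__main__":
    # 'statutes' is the folder named above; its internal layout is an assumption.
    for name, content in load_json_files("statutes").items():
        print(name, type(content).__name__)
```

Printing only the top-level type of each file is a cautious first step for checking how the two JSON subsets are organized before writing any code that depends on specific fields.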
If you use our data, please cite this paper:
@inproceedings{vecelliosalto2026university,
title = {University Speaking for Everyone: Assessing Changes in Italian Higher Education Statutes Toward Gender-Inclusive Language},
author = {Vecellio Salto, Sebastiano and Casula, Camilla and Palmero Aprosio, Alessio and Tonelli, Sara},
booktitle = {Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026)},
year = {2026},
address = {},
publisher = {European Language Resources Association (ELRA)},
note = {To appear}
}