Description
Problem
Is your proposal tackling an existing problem or limitation?
- No, it's an addition
Proposal
Add the MELO Benchmark datasets [*, **] as ranking tasks in WorkRB. The implementation would be similar to that of the new JobTitleSimilarityRanking task proposed in #24.
Architectural consideration: dataset indexing within each task
In the current WorkRB architecture, each task contains one or more datasets indexed by language, which limits a task to at most one dataset per language. This constraint is baked into the data-loading, evaluation, and result-aggregation code. MELO datasets, however, are identified by (country, query_language, corpus_languages) tuples, so multiple datasets can share the same query language. We propose generalizing the index keys from Language to arbitrary string identifiers, which would let WorkRB fully support MELO and accommodate future tasks that need arbitrary dataset indexing.
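To make the proposed generalization concrete, here is a minimal sketch. All names in it (`Dataset`, `RankingTask`, the `datasets` mapping, `melo_dataset_id`) are hypothetical stand-ins, not WorkRB's actual classes; it only assumes datasets are currently stored in a per-task mapping keyed by `Language`:

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical placeholder for a task dataset (queries + corpus).
    queries: list[str] = field(default_factory=list)
    corpus: list[str] = field(default_factory=list)


@dataclass
class RankingTask:
    # Before: dict[Language, Dataset] -- at most one dataset per language.
    # After: arbitrary string keys, so MELO's
    # (country, query_language, corpus_languages) tuples fit naturally.
    datasets: dict[str, Dataset] = field(default_factory=dict)


def melo_dataset_id(country: str, query_lang: str, corpus_langs: list[str]) -> str:
    """Flatten a MELO (country, query_language, corpus_languages) tuple
    into a unique string key, e.g. 'de_de_en-de' (naming scheme is illustrative)."""
    return f"{country}_{query_lang}_{'-'.join(corpus_langs)}"


task = RankingTask()
# Two datasets with the same query language, which the Language-keyed
# design cannot represent, become two distinct string keys:
task.datasets[melo_dataset_id("de", "de", ["en", "de"])] = Dataset()
task.datasets[melo_dataset_id("de", "de", ["en"])] = Dataset()
```

The existing single-language tasks would keep working unchanged by using the language code itself as the string key.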
@Mattdl Thanks again for inviting us to contribute! The codebase is clean and well-designed. This proposal does add some complexity, but if you think this makes sense, I would be happy to open a separate issue to discuss the refactor. Once aligned, I can submit a PR for the refactor first, and then implement the MELO task on top of those changes.
Type:
- New Ontology (data source for multiple tasks)
- New Task(s)
- New Model(s)
- New Metric(s)
- Other
Area(s) of code: paths, modules, or APIs you expect to touch
src/workrb/tasks/__init__.py
src/workrb/tasks/abstract/ranking_base.py
src/workrb/tasks/ranking/__init__.py
src/workrb/tasks/ranking/melo.py
tests/test_task_loading.py
Additional Context
Dataset source:
- HuggingFace: https://huggingface.co/datasets/Avature/MELO-Benchmark
- GitHub: https://github.com/avature/melo-benchmark
Publication:
Retyk et al. (2024) introduced the MELO Benchmark in "MELO: Multilingual Entity Linking of Occupations" (RecSys in HR 2024).
https://ceur-ws.org/Vol-3788/RecSysHR2024-paper_2.pdf
Dataset statistics:
Full statistics for all 48 datasets are available in the HuggingFace dataset card.
Task characteristics:
- Task type: Ranking
- Label type: Multi-label (each query maps to one ESCO occupation, but occupations have multiple surface forms)
- Query input type: Job titles
- Target input type: Job titles (ESCO occupation surface forms)
- Evaluation metrics:
MRR, Hit@1, Hit@5, and Hit@10, as used by Retyk et al. (2024).
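For reference, these metrics are simple to compute from ranked result lists. The sketch below is independent of WorkRB's actual metric implementations and uses hypothetical function names:

```python
def mrr(ranked_ids: list[list[str]], relevant: list[set[str]]) -> float:
    """Mean Reciprocal Rank: average over queries of 1/rank of the first
    relevant item retrieved (0 for a query if none is retrieved)."""
    total = 0.0
    for ids, rel in zip(ranked_ids, relevant):
        for rank, doc_id in enumerate(ids, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)


def hit_at_k(ranked_ids: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Hit@k: fraction of queries with at least one relevant item in the top k."""
    hits = sum(1 for ids, rel in zip(ranked_ids, relevant) if rel & set(ids[:k]))
    return hits / len(ranked_ids)


# Two toy queries: the relevant item is at rank 2 for the first, rank 1 for the second.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
gold = [{"b"}, {"x"}]
print(mrr(rankings, gold))        # (1/2 + 1/1) / 2 = 0.75
print(hit_at_k(rankings, gold, 1))  # only the second query hits at k=1 -> 0.5
```

Multi-label handling (multiple surface forms per ESCO occupation) comes for free here, since `relevant` is a set per query.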
Potential future addition:
We have unpublished equivalent datasets for skill entity linking to the ESCO Skills taxonomy (~8 datasets) [***]. These follow the same structure as MELO. We can discuss adding these as a separate task if you are interested!
Implementation
- I plan to implement this in a PR
- I am proposing the idea and would like someone else to pick it up