@jonleinena

Pull Request: Add CodeXGLUE Code-to-Code Translation Task with CodeBLEU Metric

Summary

This PR adds support for the CodeXGLUE Code-to-Code Translation benchmark, enabling evaluation of models on translating code between Java and C# using the CodeBLEU metric.

Motivation

CodeBLEU is the recommended metric for code translation tasks as it goes beyond standard BLEU by considering:

  • N-gram matching (like BLEU)
  • Weighted n-gram matching, which gives language keywords a higher weight
  • AST-based syntax matching
  • Dataflow matching for semantic similarity

This makes it particularly suited for evaluating code generation and translation quality.
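As a rough illustration of how the four components listed above are typically combined, the sketch below computes a weighted sum of the component scores. The function name `combine_codebleu` is hypothetical, and the equal weights are an assumption for illustration; this PR description does not state which weights the task uses.

```python
def combine_codebleu(ngram, weighted_ngram, syntax, dataflow,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine four component scores (each in [0, 1]) into one score.

    Hypothetical helper: the equal default weights are an assumption,
    not something taken from this PR.
    """
    components = (ngram, weighted_ngram, syntax, dataflow)
    if len(weights) != len(components):
        raise ValueError("expected exactly four weights")
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"component score out of range: {value}")
    # Weighted sum of n-gram, weighted n-gram, AST and dataflow matches.
    return sum(w * c for w, c in zip(weights, components))


# A translation with perfect structure but only half of the tokens matching.
score = combine_codebleu(ngram=0.5, weighted_ngram=0.5, syntax=1.0, dataflow=1.0)
print(score)
```

With this combination, a translation that preserves the AST and dataflow but renames many tokens still scores well above its plain n-gram match.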

Changes

New Files

  • bigcode_eval/tasks/codexglue_code_to_code_trans.py - Task implementation
  • bigcode_eval/tasks/few_shot_examples/codexglue_code_to_code_trans_few_shot_prompts.json - Few-shot examples
  • requirements-codebleu.txt - CodeBLEU dependency

Modified Files

  • bigcode_eval/tasks/__init__.py - Register the new task
  • requirements.txt - Add tree-sitter dependencies
  • README.md - Document the new task and installation instructions

New Tasks

Task Name                                Description
codexglue_code_to_code_trans-java_cs     Java → C# translation
codexglue_code_to_code_trans-cs_java     C# → Java translation
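The task names encode the translation direction as a `<source>_<target>` suffix. A minimal sketch of how such a name could be split into its language pair follows; `parse_task_name` and the `LANGS` mapping are hypothetical helpers for illustration and are not taken from the task implementation.

```python
TASK_PREFIX = "codexglue_code_to_code_trans"
# Short codes used in the task names, mapped to display names (assumed).
LANGS = {"java": "Java", "cs": "C#"}


def parse_task_name(name):
    """Return (source, target) language codes for a task name."""
    prefix, sep, pair = name.partition("-")
    if prefix != TASK_PREFIX or not sep:
        raise ValueError(f"not a code-to-code translation task: {name}")
    source, sep, target = pair.partition("_")
    if source not in LANGS or target not in LANGS or source == target:
        raise ValueError(f"unsupported language pair: {pair}")
    return source, target


source, target = parse_task_name("codexglue_code_to_code_trans-java_cs")
print(f"{LANGS[source]} -> {LANGS[target]}")
```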

Installation

Due to a dependency conflict between codebleu and newer tree-sitter versions, install with:

pip install -r requirements-codebleu.txt --no-deps
pip install -r requirements.txt

This installs codebleu without its dependencies first (bypassing the tree-sitter>=0.22.0,<0.23.0 constraint), then installs the compatible tree-sitter==0.25.2 packages required by the language parsers.
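Because the two-step install bypasses pip's dependency resolution, it can help to confirm afterwards which versions actually ended up in the environment. The following is a small sketch using only the standard library; `installed_version` is a hypothetical helper, and the distribution names checked are assumed to be `codebleu` and `tree-sitter`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions based on the install notes above.
for name in ("codebleu", "tree-sitter"):
    print(name, installed_version(name) or "not installed")
```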

Usage

accelerate launch main.py \
  --model <MODEL_NAME> \
  --tasks codexglue_code_to_code_trans-java_cs \
  --max_length_generation 512 \
  --n_samples 1 \
  --batch_size 1 \
  --save_generations
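To evaluate both translation directions, the same command can be generated once per task. The sketch below only builds the command strings from the flags shown above; `build_command` is a hypothetical helper and `my-org/my-model` is a placeholder model name, not one used in this PR.

```python
import shlex

TASKS = (
    "codexglue_code_to_code_trans-java_cs",
    "codexglue_code_to_code_trans-cs_java",
)


def build_command(model, task):
    """Build the evaluation command for one task as a shell-safe string."""
    args = [
        "accelerate", "launch", "main.py",
        "--model", model,
        "--tasks", task,
        "--max_length_generation", "512",
        "--n_samples", "1",
        "--batch_size", "1",
        "--save_generations",
    ]
    return shlex.join(args)


# Placeholder model name; substitute the model to evaluate.
for task in TASKS:
    print(build_command("my-org/my-model", task))
```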

Dataset

References

Checklist

  • The title is a summary of the contribution
  • All public methods have informative docstrings
  • Existing tests pass
