CheliM7/AutoGen-TraceKit

AutoGen-TraceKit — Agentic Evaluation Toolkit

AutoGen-TraceKit is a research-oriented toolkit for evaluating agentic AI systems and autonomous problem solvers. The repository collects dataset examples, evaluation scripts, and tooling to run reproducible experiments, produce evaluation metrics, and generate visualizations for analysis.

Key features:

  • Dataset loading and preprocessing for evaluation tasks
  • Modular evaluator and solver components for running model-based experiments
  • Utilities for reproducible experiments (seeded runs, configurable sampling temperatures)
  • Built-in analysis and visualization scripts for results and summaries
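
As a rough illustration of what a seeded run with a configurable temperature can look like, here is a minimal sketch. The names `RunConfig` and `pick_examples` are hypothetical and are not taken from this repository; the actual utilities in src/ may be organized differently.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings that define one run."""
    seed: int = 0
    # Sampling temperature; in a real run this would be passed to the model call.
    temperature: float = 0.0


def pick_examples(config: RunConfig, dataset_size: int, k: int) -> List[int]:
    """Choose k distinct dataset indices, determined entirely by config.seed."""
    # A private Random instance avoids touching the global random state,
    # so other code cannot change which examples are selected.
    rng = random.Random(config.seed)
    return rng.sample(range(dataset_size), k)


if __name__ == "__main__":
    config = RunConfig(seed=42, temperature=0.2)
    first = pick_examples(config, dataset_size=120, k=5)
    second = pick_examples(config, dataset_size=120, k=5)
    # The same seed yields the same selection on every call.
    print(first == second)
```

Recording the seed and temperature together with the results makes it possible to repeat a run later with the same example selection.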

See the docs/ directory for the research proposal, methodology, and literature review that motivated this work.

Project Structure

├── README.md                    # This file
├── docs/
│   ├── research_proposal.md     # Research proposal
│   ├── literature_review.md     # Literature review and references
│   ├── methodology.md           # Detailed methodology
│   └── progress_reports/        # Weekly progress reports
├── data/                        # Datasets and data files
├── experiments/                 # Experiment scripts and configs
│   └── logs/
├── results/                     # Experimental results
├── src/                         # Source code
│   ├── evaluator/
│   ├── model/
│   ├── sanity-checks/
│   └── utils/
├── visualizations/
│   └── visualizations.py
├── .gitignore
├── .env
├── config.py
├── requirements.txt             # Project dependencies
└── run.py

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)

Setup Instructions

  1. Clone the Repository

    git clone https://github.com/CheliM7/AutoGen-TraceKit.git
    cd AutoGen-TraceKit
  2. Create a Virtual Environment

    python -m venv env
  3. Activate the Virtual Environment

    • On Windows:

      .\env\Scripts\activate
    • On macOS/Linux:

      source env/bin/activate
  4. Create a .env File

    In the root directory, create a file named .env and add the following values:

    GROQ_API_KEY=
    MODEL_ID=
    DATA_PATH=data/math_easy_int_120.jsonl
  5. Install Dependencies

    pip install -r requirements.txt
  6. Run the Project

    For initial testing, only the first five rows of the dataset are processed. To process the entire dataset, modify run.py.

    python run.py
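  7. Generate Visualizations

    python src/visualizations/visualizations.py

    Note: the project structure above lists visualizations.py under visualizations/ at the repository root. If the path above is not found, run python visualizations/visualizations.py instead.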
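
The .env values from step 4 and the five-row limit from step 6 can be sketched as follows. This is only an illustration using the standard library: `parse_env_text`, `iter_jsonl`, and `load_first_rows` are hypothetical helpers, not functions from this repository, and config.py and run.py may read the settings differently (for example through a dotenv package). The sketch assumes only that the dataset is a JSON Lines file with one JSON object per line.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def iter_jsonl(lines: Iterable[str], limit: Optional[int] = None) -> Iterator[dict]:
    """Return an iterator over parsed rows, stopping after `limit` rows if given."""
    rows = (json.loads(line) for line in lines if line.strip())
    # islice with limit=None passes every row through unchanged.
    return islice(rows, limit)


def load_first_rows(env_path: str = ".env", limit: Optional[int] = 5) -> List[dict]:
    """Read DATA_PATH from the .env file and load at most `limit` rows from it."""
    env = parse_env_text(Path(env_path).read_text(encoding="utf-8"))
    with open(env["DATA_PATH"], encoding="utf-8") as handle:
        return list(iter_jsonl(handle, limit=limit))
```

Calling `load_first_rows()` from the repository root would return the first five rows of the file named by DATA_PATH, and passing `limit=None` would return all of them.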

About

An improved approach to evaluating AutoGen math agents that focuses on the reasoning process.
