AutoGen-TraceKit — Agentic Evaluation Toolkit

AutoGen-TraceKit is a research-oriented toolkit for evaluating agentic AI systems and autonomous problem solvers. The repository collects dataset examples, evaluation scripts, and tooling to run reproducible experiments, produce evaluation metrics, and generate visualizations for analysis.

Key features:

- Dataset loading and preprocessing for evaluation tasks
- Modular evaluator and solver components for running model-based experiments
- Utilities for reproducible experiments (seeded runs, configurable temperatures)
- Built-in analysis and visualization scripts for results and summaries
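
As a rough illustration of what a seeded run with a configurable temperature can look like, here is a minimal sketch. `RunConfig` and `sample_indices` are hypothetical names invented for this example and are not part of the toolkit's code; the dataset size of 120 is likewise only illustrative.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical settings for one experiment run (not the toolkit's own class)."""
    seed: int = 0
    temperature: float = 0.0  # would be passed on to the model call


def sample_indices(config: RunConfig, dataset_size: int, k: int) -> List[int]:
    """Pick k example indices using a generator seeded from the config."""
    rng = random.Random(config.seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(range(dataset_size), k))


if __name__ == "__main__":
    config = RunConfig(seed=42, temperature=0.2)
    first = sample_indices(config, dataset_size=120, k=5)
    second = sample_indices(config, dataset_size=120, k=5)
    print(first == second)  # the same seed selects the same examples
```

Keeping the seed and temperature together in one immutable object makes it straightforward to log the exact settings next to each result.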

See the docs/ directory for the research proposal, methodology, and literature review that motivated this work.

Project Structure

├── README.md                    # This file
├── docs/
│   ├── research_proposal.md     # Research proposal
│   ├── literature_review.md     # Literature review and references
│   ├── methodology.md           # Detailed methodology
│   └── progress_reports/        # Weekly progress reports
├── data/                        # Datasets and data files
├── experiments/                 # Experiment scripts and configs
│   └── logs/
├── results/                     # Experimental results
├── src/                         # Source code
│   ├── evaluator/
│   ├── model/
│   ├── sanity-checks/
│   └── utils/
├── visualizations/
│   └── visualizations.py
├── .gitignore
├── .env
├── config.py
├── requirements.txt             # Project dependencies
└── run.py

Prerequisites

- Python 3.8 or higher
- pip (Python package installer)

Setup Instructions

Clone the Repository

git clone https://github.com/CheliM7/AutoGen-TraceKit.git

Create a Virtual Environment
```
python -m venv env
```
Activate the Virtual Environment
- On Windows:
```
.\env\Scripts\activate
```
- On macOS/Linux:
```
source env/bin/activate
```
Create a .env File
In the root directory, create a file named .env and add the following values:
```
GROQ_API_KEY=
MODEL_ID=
DATA_PATH=data/math_easy_int_120.jsonl
```
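To check that the three values are filled in before running anything, a small standard-library check along these lines may help. This is only a sketch: how `config.py` actually reads these settings is not shown here, and `read_env_file` and `missing_keys` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List

# Keys listed in the .env template above.
REQUIRED_KEYS = ("GROQ_API_KEY", "MODEL_ID", "DATA_PATH")


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    print("Missing settings:", missing_keys(read_env_file()))
```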
Install Dependencies
```
pip install -r requirements.txt
```
Run the Project
For initial testing, only the first five rows of the dataset are processed. Modify run.py to process the entire dataset.
```
python run.py
```
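
The row limit mentioned above could be made adjustable with something like the following sketch. It assumes the dataset is a JSON Lines file (one JSON object per line), as the `.jsonl` extension of `DATA_PATH` suggests; `iter_rows` is a hypothetical helper, and the actual logic in run.py may differ.

```python
import json
from itertools import islice
from typing import Dict, Iterator, Optional


def iter_rows(path: str, limit: Optional[int] = 5) -> Iterator[Dict]:
    """Yield parsed JSON objects from a JSONL file, stopping after `limit` rows.

    Pass limit=None to read the whole file.
    """
    with open(path, encoding="utf-8") as handle:
        non_empty = (line for line in handle if line.strip())  # skip blank lines
        for line in islice(non_empty, limit):
            yield json.loads(line)
```

Calling `list(iter_rows(path))` keeps the five-row default for quick tests, while `iter_rows(path, limit=None)` walks the full dataset.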

Generate Visualizations

python src/visualizations/visualizations.py
