AutoGen-TraceKit is a research-oriented toolkit for evaluating agentic AI systems and autonomous problem solvers. The repository collects dataset examples, evaluation scripts, and tooling to run reproducible experiments, produce evaluation metrics, and generate visualizations for analysis.
Key features:
- Dataset loading and preprocessing for evaluation tasks
- Modular evaluator and solver components for running model-based experiments
- Utilities for reproducible experiments (seeded runs, configurable temperatures)
- Built-in analysis and visualization scripts for results and summaries
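As a rough illustration of what "seeded runs, configurable temperatures" can look like in practice, the sketch below builds a grid of run settings and draws a repeatable sample of dataset rows from each seed. The names `RunConfig`, `make_run_configs`, and `sample_indices` are hypothetical and are not taken from this repository; the population size of 120 is likewise only an assumption for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings of one experiment run."""
    seed: int
    temperature: float


def make_run_configs(seeds, temperatures):
    """Return one RunConfig per (seed, temperature) combination."""
    return [RunConfig(seed=s, temperature=t) for s in seeds for t in temperatures]


def sample_indices(config, population_size, k):
    """Pick k distinct row indices; the choice depends only on config.seed."""
    rng = random.Random(config.seed)  # local RNG, leaves global state untouched
    return rng.sample(range(population_size), k)


if __name__ == "__main__":
    for cfg in make_run_configs(seeds=[0, 1], temperatures=[0.0, 0.7]):
        # population_size=120 is an assumed dataset size for illustration only
        print(cfg, sample_indices(cfg, population_size=120, k=5))
```

Using a per-run `random.Random` instance rather than the global generator means two runs with the same seed select the same rows regardless of the temperature setting or of what other code has consumed random numbers.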
See the docs/ directory for the research proposal, methodology, and literature review that motivated this work.
├── README.md                    # This file
├── docs/
│   ├── research_proposal.md     # Research proposal
│   ├── literature_review.md     # Literature review and references
│   ├── methodology.md           # Detailed methodology
│   └── progress_reports/        # Weekly progress reports
├── data/                        # Datasets and data files
├── experiments/                 # Experiment scripts and configs
│   └── logs/
├── results/                     # Experimental results
├── src/                         # Source code
│   ├── evaluator/
│   ├── model/
│   ├── sanity-checks/
│   └── utils/
├── visualizations/
│   └── visualizations.py
├── .gitignore
├── .env
├── config.py
├── requirements.txt             # Project dependencies
└── run.py
- Python 3.8 or higher
- pip (Python package installer)
- Clone the Repository
    git clone https://github.com/CheliM7/AutoGen-TraceKit.git
- Create a Virtual Environment
    python -m venv env
- Activate the Virtual Environment
  - On Windows:
      .\env\Scripts\activate
  - On macOS/Linux:
      source env/bin/activate
- Create a .env File
  In the root directory, create a file named .env and add the following values:
    GROQ_API_KEY=
    MODEL_ID=
    DATA_PATH=data/math_easy_int_120.jsonl
- Install Dependencies
    pip install -r requirements.txt
- Run the Project
  For initial testing, only the first five rows of the dataset will be processed. Modify run.py to handle the entire dataset as needed.
    python run.py
- Generate Visualizations
    python src/visualizations/visualizations.py
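To make the configuration and "first five rows" steps above more concrete, here is a minimal standard-library sketch of reading `KEY=VALUE` pairs from a `.env` file and loading a limited number of records from a JSONL dataset. It does not reproduce what `config.py` or `run.py` actually do; the function names `load_env_file` and `read_first_rows` are hypothetical, and the project may well rely on a dedicated package for `.env` handling instead.

```python
import itertools
import json
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_first_rows(jsonl_path, limit=5):
    """Return up to `limit` parsed records; limit=None reads the whole file."""
    with open(jsonl_path, encoding="utf-8") as handle:
        non_empty = (line for line in handle if line.strip())
        return [json.loads(line) for line in itertools.islice(non_empty, limit)]


if __name__ == "__main__":
    settings = load_env_file()
    # Fall back to the dataset path shown in the setup steps if DATA_PATH is unset.
    data_path = settings.get("DATA_PATH") or "data/math_easy_int_120.jsonl"
    if Path(data_path).exists():
        for row in read_first_rows(data_path, limit=5):
            print(row)
```

Passing `limit=None` is one way such a helper could switch from the five-row smoke test to the full dataset without any other code changes.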