Empirical study on the adoption and usage of software engineering agents on GitHub. Code and initial data for the paper: "Agentic Much? Adoption of Coding Agents on GitHub" (https://arxiv.org/abs/2601.18341). The data produced by the mining, used in the paper, is available on Zenodo (https://zenodo.org/records/19256968).
- code contains the code and its configuration; see AGENTS.md there for an overview
- data contains projects lists to kick start the analysis
- config contains the patterns we look for, your GitHub tokens, and the GitHub Linguist YAML language definitions
- a temp directory is generated to hold all the produced data
The recommended way to set up the project is using uv:
cd code
uv sync

This will install all dependencies from pyproject.toml.
For manual setup with pip:
pip install -r ../requirements.txt

Requires Python 3.12+. Key dependencies include:
- pandas
- pyyaml
- requests
- seaborn
- tqdm
- upsetplot
- cliffs_delta
- adjustText
- emoji
- statsmodels
- scikit-learn
- plotly
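After a manual install, the presence of the key dependencies can be sanity-checked with a small sketch. Note that the pip-to-import name mapping below is an assumption (e.g. pyyaml imports as yaml, scikit-learn as sklearn):

```python
import importlib.util

# Import names for the pip packages listed above (mapping is an assumption:
# pyyaml -> yaml, scikit-learn -> sklearn).
REQUIRED = [
    "pandas", "yaml", "requests", "seaborn", "tqdm", "upsetplot",
    "cliffs_delta", "adjustText", "emoji", "statsmodels", "sklearn", "plotly",
]

def missing_modules(names):
    """Return the modules from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All key dependencies found.")
```

If anything is reported missing, re-run the uv or pip setup step above.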
- add one or more GitHub tokens in config/tokens.ini, following config/tokens.ini.template
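The authoritative layout of tokens.ini is given by config/tokens.ini.template; as a purely hypothetical illustration (the section and key names here are assumptions, check the template):

```ini
; Hypothetical example only -- follow config/tokens.ini.template
[tokens]
token1 = ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
token2 = ghp_yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
```

Using several tokens helps spread the mining over multiple GitHub API rate-limit budgets.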
The recommended way to run the full pipeline (analysis + reports):
cd code
./full_reproduction.sh ../data/<project csv> ../temp/my_experiment

This runs both run.py (data gathering) and run_analysis.sh (report generation) in sequence. When the number of repositories is large (e.g. 130k repositories), the data gathering phase can be very long and consume a lot of space (more than one terabyte).
Options:
./full_reproduction.sh ../data/<project csv> ../temp/my_experiment --num-workers 24 --analysis-date 2025-01-01

The data directory has several datasets to run on:
- data/projects-29-08.csv: dataset used in the paper
- data/projects-aug-25-feb-26.csv: a more recent sample of projects
- data/new_600.csv: first 600 lines of the previous, for a quicker check
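A quick-check subset like new_600.csv (the first 600 lines of the larger CSV) can be reproduced with a few lines of Python. The helper below is illustrative, not part of the repo:

```python
from pathlib import Path

def head_csv(src: str, dst: str, n_lines: int = 600) -> int:
    """Copy the first n_lines lines of src into dst; this mirrors how
    new_600.csv relates to projects-aug-25-feb-26.csv.
    Returns the number of lines written."""
    lines = Path(src).read_text().splitlines(keepends=True)[:n_lines]
    Path(dst).write_text("".join(lines))
    return len(lines)

# e.g. head_csv("../data/projects-aug-25-feb-26.csv", "../data/my_subset.csv", 600)
```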
This is possible if you already have some data, for instance if you downloaded the data from Zenodo (https://zenodo.org/records/19256968).
- in the code directory, run the following for a small test (20 projects, a couple of minutes):
python run.py ../data/claude-test.csv ../data/claude-test.csv git-test
- the first argument is the list of projects to analyze; the second is the list of projects to sample; the third is the name of the output directory in ../temp
- to run on all the data (overnight, so maybe start early; ~300 GB):
python run.py ../data/projects-29-08.csv ../data/non_adopters_10k.csv <dir_name>
- or for full analysis of all projects (no sampling):
python run.py ../data/projects.csv <dir_name> --analyze-all
- this generates a lot of data:
- an overall analysis report with the main metrics
- metrics about the pull requests gathered for each project
- the most important computed commit ratios
- in addition, each project has its own data directory, located under one of the following subdirectories:
- tool_only_adopters
- commit_only_adopters
- tool_commit_adopters
- non_adopters
- in each project directory, you will find:
- file lists
- files matching a file heuristic, checked out
- pull request data
- commit data
- commit statistics
- computed metrics and commit ratios
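For a quick overview of an experiment directory, a sketch that counts projects per adopter category. The category names come from the list above; any temp-directory layout beyond that is an assumption:

```python
from pathlib import Path

# Adopter categories as listed above.
CATEGORIES = [
    "tool_only_adopters",
    "commit_only_adopters",
    "tool_commit_adopters",
    "non_adopters",
]

def count_projects(experiment_dir: str) -> dict:
    """Count project subdirectories under each adopter category
    (assumes one subdirectory per project under each category)."""
    root = Path(experiment_dir)
    return {
        cat: (sum(1 for p in (root / cat).iterdir() if p.is_dir())
              if (root / cat).is_dir() else 0)
        for cat in CATEGORIES
    }

# e.g. count_projects("../temp/my_experiment")
```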