A structured, version-controlled dataset of all activity on the
quarto-dev/quarto-cli repository: issues, pull
requests, and discussions, together with every comment, reply, review, and author.
The dataset is user-agnostic.
Any contributor can be analysed or compared; mcanouil is simply one query against it.
- GitHub CLI authenticated via gh auth login (read access to the target repository).
- jq for JSON processing.
- make to drive the workflow.
- typst to build the reports.
# Full crawl of every issue, pull request, and discussion, then build the CSVs.
make backfill
# Later: fetch only items updated since the last run, then rebuild the CSVs.
make fetch
# Rebuild the derived CSVs without fetching.
make summary
The target repository is configured in config.sh (OWNER, REPO).
Enumeration walks the repository's GraphQL object connections
(repository.issues, repository.pullRequests, repository.discussions) rather than the
search API.
The search API caps results at 1000 per query, whereas the connections page through the full
history without that limit and consume only GraphQL rate-limit points.
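As a rough illustration of connection-based enumeration, the sketch below pages through repository.issues with gh api graphql --paginate. It is not the project's actual query or script: the QUERY text, the selected fields, and the enumerate_issues function name are assumptions made for this example. The only requirement taken from the GitHub CLI is that a paginated GraphQL query accepts an $endCursor variable and selects pageInfo { hasNextPage endCursor }.

```shell
# Hypothetical sketch: not the repository's real query or fetch script.
OWNER="quarto-dev"
REPO="quarto-cli"

# Walk repository.issues, most recently updated first.
# $endCursor and pageInfo are what `gh api graphql --paginate` uses to page.
QUERY='
query($owner: String!, $name: String!, $endCursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $endCursor,
           orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes { number title updatedAt author { login } }
      pageInfo { hasNextPage endCursor }
    }
  }
}'

# Print every issue node as JSON on stdout (requires an authenticated gh).
enumerate_issues() {
  gh api graphql --paginate \
    -f owner="$OWNER" -f name="$REPO" -f query="$QUERY" \
    --jq '.data.repository.issues.nodes[]'
}
```

Calling enumerate_issues would start a full crawl, so the sketch only defines the function. The same shape should apply to repository.pullRequests and repository.discussions by swapping the connection name and fields.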
Comments are fetched inline during enumeration.
Threads whose comment or reply counts exceed the inline page are flagged and completed by
fetch-long-threads.sh, so per-author counts stay accurate.
Fetching is incremental and resumable.
Pages are ordered by UPDATED_AT descending, and state/sync-state.json records the most
recent updatedAt seen per type, so a later make fetch stops as soon as it reaches
already-synced items.
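The stop condition can be sketched in a few lines. This is a minimal illustration, not the project's fetch script: take_new_items is a hypothetical helper, the input lines are made-up sample data, and the sketch assumes an item counts as already synced when its updatedAt is less than or equal to the stored timestamp. It relies on ISO 8601 UTC timestamps of the same format sorting correctly as plain strings.

```shell
# Hypothetical helper: read "<updatedAt> <number>" lines, newest first,
# and print the numbers of items updated after the last synced timestamp.
take_new_items() {
  # awk compares the timestamps as strings; the first item that is not
  # newer than "last" means everything after it is already synced.
  awk -v last="$1" '$1 <= last { exit } { print $2 }'
}

# Made-up page of results, ordered by updatedAt descending.
printf '%s\n' \
  '2024-05-03T09:00:00Z 103' \
  '2024-05-02T18:30:00Z 101' \
  '2024-05-01T12:00:00Z 102' \
  '2024-04-28T08:15:00Z 100' |
  take_new_items '2024-05-01T12:00:00Z'
# prints:
# 103
# 101
```

Because the pages arrive newest first, nothing below the first already-synced item needs to be requested, which is what makes a later run cheap.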
config.sh # OWNER / REPO of the target repository
Makefile # backfill, fetch, summary, long-threads, lint, clean
queries/ # GraphQL queries
scripts/ # fetch and summarise scripts (Bash + gh + jq)
data/
raw/ # one NDJSON line per thread, comments embedded
issues.ndjson
pull-requests.ndjson
discussions.ndjson
derived/ # analysis-ready CSVs
threads.csv
comments.csv
users.csv
summary.csv
activity-monthly.csv
state/sync-state.json # last synced timestamp per type
One JSON object per line.
Common fields: type, number, title, author, createdAt, updatedAt, comments_total,
and an embedded comments array of { author, createdAt }.
Issues and pull requests add state, closedAt, and labels; pull requests also add merged,
mergedAt, and a reviews array.
Discussions add category, isAnswered, answer_author, and nested replies within each
comment.
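To make the record layout concrete, here is a small sketch that writes two invented issue records using the common fields listed above and counts embedded comments per author with jq. The records, the temporary file, and the comments_per_author function are illustrative assumptions, and the sketch treats author as a plain login string, as the query examples in this README do.

```shell
# Two hypothetical records following the common field list (not real data).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
{"type":"issue","number":1,"title":"Example A","author":"alice","createdAt":"2024-01-01T00:00:00Z","updatedAt":"2024-01-02T00:00:00Z","comments_total":2,"comments":[{"author":"bob","createdAt":"2024-01-01T01:00:00Z"},{"author":"alice","createdAt":"2024-01-01T02:00:00Z"}]}
{"type":"issue","number":2,"title":"Example B","author":"bob","createdAt":"2024-01-03T00:00:00Z","updatedAt":"2024-01-03T05:00:00Z","comments_total":1,"comments":[{"author":"bob","createdAt":"2024-01-03T04:00:00Z"}]}
EOF

# Count top-level embedded comments per author: "<login> <count>", busiest first.
comments_per_author() {
  jq -r '.comments[].author' "$1" |
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ print $2, $1 }'
}

comments_per_author "$sample"
# prints:
# bob 2
# alice 1
```

This only reads the top-level comments array. Replies nested inside discussion comments and pull request reviews would need their own traversal.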
- threads.csv: one row per thread, with state, author, category, answered flag, labels, and comment and reply counts.
- comments.csv: one row per comment, reply, and review, flagged by is_reply and is_review.
- users.csv: per-user totals (issues, pull requests, and discussions opened; comments; replies; answers; reviews), sorted by total activity.
- summary.csv: repo-wide headline metrics as metric,value pairs.
- activity-monthly.csv: tidy month,metric,count time series.
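Because activity-monthly.csv is in tidy month,metric,count form, simple aggregations need nothing beyond awk. The sketch below totals each metric across all months. The sample rows and metric names are invented, totals_per_metric is a hypothetical helper, and the sketch assumes these three columns contain no embedded commas, so quotes can be stripped before splitting.

```shell
# Hypothetical sample in the month,metric,count layout (not real data).
monthly="$(mktemp)"
cat > "$monthly" <<'EOF'
"month","metric","count"
"2024-01","issues_opened",12
"2024-01","comments",40
"2024-02","issues_opened",9
"2024-02","comments",35
EOF

# Sum the count column per metric, skipping the header row.
totals_per_metric() {
  awk -F, 'NR > 1 { gsub(/"/, ""); total[$2] += $3 }
           END { for (m in total) print m, total[m] }' "$1" | sort
}

totals_per_metric "$monthly"
# prints:
# comments 75
# issues_opened 21
```

For columns that can contain commas or quotes, such as titles in threads.csv, a real CSV parser is a safer choice than awk field splitting.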
# Every issue mcanouil opened.
jq -c 'select(.author == "mcanouil")' data/raw/issues.ndjson
# mcanouil's row in the per-user leaderboard.
grep '^"mcanouil"' data/derived/users.csv
# Top 10 most active users.
head -n 11 data/derived/users.csv
This project is licensed under the MIT License. See the LICENSE file for details.
