Quarto CLI Activity

A structured, version-controlled dataset of all activity on the quarto-dev/quarto-cli repository: issues, pull requests, and discussions, together with every comment, reply, review, and author.

The dataset is user-agnostic. Any contributor can be analysed or compared; mcanouil is simply one query against it.

Prerequisites

GitHub CLI authenticated via gh auth login (read access to the target repository).
jq for JSON processing.
make to drive the workflow.
typst to build the reports.

Usage

# Full crawl of every issue, pull request, and discussion, then build the CSVs.
make backfill

# Later: fetch only items updated since the last run, then rebuild the CSVs.
make fetch

# Rebuild the derived CSVs without fetching.
make summary

The target repository is configured in config.sh (OWNER, REPO).

How it works

Enumeration walks the repository's GraphQL connections (repository.issues, repository.pullRequests, repository.discussions) rather than the search API. The search API caps results at 1,000 per query, whereas the connections page through the full history without that limit and consume only GraphQL rate-limit points.
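A minimal sketch of connection-based enumeration, assuming the pagination support built into gh api graphql. The query text, the selected fields, and the helper name enumerate_issues are illustrative and are not taken from the queries/ or scripts/ directories.

```shell
#!/usr/bin/env bash
# Illustrative query: walk repository.issues, most recently updated first.
# gh's --paginate expects an $endCursor variable and a pageInfo selection.
ISSUES_QUERY='
query($owner: String!, $repo: String!, $endCursor: String) {
  repository(owner: $owner, name: $repo) {
    issues(first: 100, after: $endCursor, orderBy: {field: UPDATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes { number title updatedAt author { login } }
    }
  }
}'

# Hypothetical helper: stream every issue node, one page after another.
enumerate_issues() {
  local owner="$1" repo="$2"
  gh api graphql --paginate \
    -f query="$ISSUES_QUERY" \
    -f owner="$owner" -f repo="$repo" \
    --jq '.data.repository.issues.nodes[]'
}
```

Calling enumerate_issues quarto-dev quarto-cli would need an authenticated gh session. The same shape should apply to pullRequests and discussions with their own field selections.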

Comments are fetched inline during enumeration. Threads whose comment or reply counts exceed the inline page size are flagged and completed by fetch-long-threads.sh, so per-author counts stay accurate.
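One way to spot incomplete threads, sketched here under the assumption that a thread is incomplete when comments_total exceeds the number of embedded comments. The helper name flag_long_threads and the sample records are made up for illustration; fetch-long-threads.sh may detect them differently.

```shell
#!/usr/bin/env bash
# Illustrative records only; real lines live in data/raw/*.ndjson.
sample='{"type":"issue","number":1,"comments_total":2,"comments":[{"author":"a"},{"author":"b"}]}
{"type":"issue","number":2,"comments_total":150,"comments":[{"author":"a"}]}'

# Hypothetical helper: print type and number (tab-separated) of every
# thread that embeds fewer comments than comments_total reports.
flag_long_threads() {
  jq -r 'select(.comments_total > (.comments | length))
         | "\(.type)\t\(.number)"'
}

# Only issue 2 is reported: it embeds 1 of its 150 comments.
printf '%s\n' "$sample" | flag_long_threads
```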

Fetching is incremental and resumable. Pages are ordered by UPDATED_AT descending, and state/sync-state.json records the most recent updatedAt seen per type, so a later make fetch stops as soon as it reaches already-synced items.
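The early-stop rule can be sketched as follows, assuming items arrive newest first as tab-separated updatedAt and number pairs. The helper name take_until_synced and the input layout are hypothetical. The comparison relies on ISO 8601 UTC timestamps of equal length sorting correctly as plain strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "updatedAt<TAB>number" lines, newest first,
# print the numbers that are newer than the last synced timestamp, and
# stop at the first item that is already synced.
take_until_synced() {
  awk -F '\t' -v last="$1" '$1 <= last { exit } { print $2 }'
}

# Illustrative input; prints 103 and 102, then stops at 101.
printf '%s\t%s\n' \
  2024-05-03T10:00:00Z 103 \
  2024-05-02T09:30:00Z 102 \
  2024-05-01T08:00:00Z 101 \
  2024-04-30T07:00:00Z 100 |
  take_until_synced 2024-05-01T08:00:00Z
```

Because the pages are ordered by UPDATED_AT descending, everything after the first already-synced item is older still, so stopping there loses nothing.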

Layout

config.sh                 # OWNER / REPO of the target repository
Makefile                  # backfill, fetch, summary, long-threads, lint, clean
queries/                  # GraphQL queries
scripts/                  # fetch and summarise scripts (Bash + gh + jq)
data/
  raw/                    # one NDJSON line per thread, comments embedded
    issues.ndjson
    pull-requests.ndjson
    discussions.ndjson
  derived/                # analysis-ready CSVs
    threads.csv
    comments.csv
    users.csv
    summary.csv
    activity-monthly.csv
state/sync-state.json     # last synced timestamp per type

Data schema

`data/raw/*.ndjson`

One JSON object per line. Common fields: type, number, title, author, createdAt, updatedAt, comments_total, and an embedded comments array of { author, createdAt }. Issues and pull requests add state, closedAt, and labels; pull requests also add merged, mergedAt, and a reviews array. Discussions add category, isAnswered, answer_author, and nested replies within each comment.
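As an illustration of walking this structure, the sketch below counts comments and replies per author. It assumes the nested replies live in a replies array on each comment and that author is a plain login string; the helper name count_comment_authors and the sample records are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative records only; real lines live in data/raw/*.ndjson.
sample='{"type":"discussion","number":7,"author":"alice","comments":[{"author":"bob","replies":[{"author":"alice"},{"author":"bob"}]},{"author":"carol"}]}
{"type":"issue","number":8,"author":"bob","comments":[{"author":"alice"}]}'

# Hypothetical helper: emit "author<TAB>count" over all comments and
# replies. The "?" keeps threads without replies from raising errors.
count_comment_authors() {
  jq -rn '[inputs | .comments[]? | .author, .replies[]?.author]
          | group_by(.)
          | map("\(.[0])\t\(length)")
          | .[]'
}

# Prints alice 2, bob 2, carol 1 (tab-separated, sorted by author).
printf '%s\n' "$sample" | count_comment_authors
```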

`data/derived/*.csv`

threads.csv: one row per thread, with state, author, category, answered flag, labels, and comment and reply counts.
comments.csv: one row per comment, reply, and review, flagged by is_reply and is_review.
users.csv: per-user totals (issues, pull requests, and discussions opened; comments; replies; answers; reviews), sorted by total activity.
summary.csv: repo-wide headline metrics as metric,value pairs.
activity-monthly.csv: tidy month,metric,count time series.
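The tidy layout of activity-monthly.csv lends itself to simple aggregation. The sketch below sums one metric per year, assuming a header row, a month column that starts with a four-digit year, and no commas inside fields. The helper name yearly_total and the metric name issues_opened are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum one metric per year from month,metric,count rows.
# Double quotes are stripped so quoted and unquoted CSV fields both work.
yearly_total() {
  awk -F ',' -v metric="$1" '
    { gsub(/"/, "") }
    NR > 1 && $2 == metric { total[substr($1, 1, 4)] += $3 }
    END { for (year in total) print year "," total[year] }
  ' | sort
}

# Illustrative rows; prints 2023,75 then 2024,50.
printf '%s\n' \
  'month,metric,count' \
  '2023-11,issues_opened,40' \
  '2023-12,issues_opened,35' \
  '2024-01,issues_opened,50' \
  '2024-01,comments,900' |
  yearly_total issues_opened
```

Against the real file this would be invoked as yearly_total followed by a metric name, reading data/derived/activity-monthly.csv on standard input.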

Per-user query recipes

# Every thread (issue, pull request, or discussion) mcanouil opened.
jq -c 'select(.author == "mcanouil")' data/raw/*.ndjson

# mcanouil's row in the per-user leaderboard.
grep '^"mcanouil"' data/derived/users.csv

# Top 10 most active users (header row plus ten data rows).
head -n 11 data/derived/users.csv

License

This project is licensed under the MIT License. See the LICENSE file for details.
