mcanouil/quarto-cli-activity

Quarto CLI Activity

A structured, version-controlled dataset of all activity on the quarto-dev/quarto-cli repository: issues, pull requests, and discussions, together with every comment, reply, review, and author.

The dataset is user-agnostic. Any contributor can be analysed or compared; mcanouil is simply one query against it.

mcanouil's activity report

Prerequisites

  • GitHub CLI authenticated via gh auth login (read access to the target repository).
  • jq for JSON processing.
  • make to drive the workflow.
  • typst to build the reports.

Usage

# Full crawl of every issue, pull request, and discussion, then build the CSVs.
make backfill

# Later: fetch only items updated since the last run, then rebuild the CSVs.
make fetch

# Rebuild the derived CSVs without fetching.
make summary

The target repository is configured in config.sh (OWNER, REPO).

How it works

Enumeration walks the connections on the GraphQL repository object (repository.issues, repository.pullRequests, repository.discussions) rather than the search API. The search API caps results at 1,000 per query, whereas the connections page through the full history without that limit and consume only GraphQL rate-limit points.
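
As a rough illustration of connection-based enumeration, the sketch below pages through repository.issues with gh api graphql --paginate. The query text, the selected fields, and the list_issues function name are assumptions made for this example; the actual queries live in queries/ and may differ.

```shell
#!/usr/bin/env bash
# Sketch only: the query shape and function name are illustrative assumptions,
# not the contents of queries/ or scripts/.

# Normally read from config.sh; hard-coded here to keep the sketch self-contained.
OWNER="quarto-dev"
REPO="quarto-cli"

# `gh api graphql --paginate` needs the query to accept $endCursor and to
# select pageInfo { hasNextPage endCursor } on the connection being paged.
QUERY='
query($owner: String!, $name: String!, $endCursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $endCursor,
           orderBy: {field: UPDATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes { number title createdAt updatedAt author { login } }
    }
  }
}'

# Emit one JSON object per issue, following the connection cursor to the end.
list_issues() {
  gh api graphql --paginate \
    -f owner="$OWNER" -f name="$REPO" \
    -f query="$QUERY" \
    --jq '.data.repository.issues.nodes[]'
}
```

Calling list_issues requires an authenticated gh session. The same pattern would presumably apply to pullRequests and discussions with their own field selections.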

Comments are fetched inline during enumeration. Threads whose comment or reply counts exceed the inline page are flagged and completed by fetch-long-threads.sh, so per-author counts stay accurate.
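
One way to sanity-check that no thread is left truncated is to compare comments_total with the length of the embedded comments array. This is a minimal sketch using the field names from the data schema; the function name find_truncated_threads is hypothetical, and fetch-long-threads.sh may well detect long threads differently. It does not look at reply counts inside discussions.

```shell
# Sketch only: lists threads whose embedded comments array is shorter than
# comments_total. The function name is an illustrative assumption.
find_truncated_threads() {
  jq -c '
    select(.comments_total > (.comments | length))
    | {type, number, missing: (.comments_total - (.comments | length))}
  ' "$@"
}
```

For example, find_truncated_threads data/raw/*.ndjson should print nothing once every long thread has been completed.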

Fetching is incremental and resumable. Pages are ordered by UPDATED_AT descending, and state/sync-state.json records the most recent updatedAt seen per type, so a later make fetch stops as soon as it reaches already-synced items.
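
The early-stop rule can be sketched as a filter over items already sorted by updatedAt descending. The function name new_items_since is hypothetical, and the timestamp is passed as an argument because the exact shape of state/sync-state.json is not described here. The real scripts presumably stop requesting further pages rather than filtering a stream, so treat this as an illustration of the comparison only.

```shell
# Sketch only: reads NDJSON items (newest first) on stdin and prints those
# updated after the given timestamp, stopping at the first item that is not
# newer. ISO 8601 UTC timestamps in a single format compare correctly as strings.
new_items_since() {
  local since="$1"
  jq -nc --arg since "$since" '
    label $done
    | inputs
    | if .updatedAt > $since then . else break $done end
  '
}
```

Because the input is ordered newest first, everything after the first already-synced item can be skipped without being inspected.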

Layout

config.sh                 # OWNER / REPO of the target repository
Makefile                  # backfill, fetch, summary, long-threads, lint, clean
queries/                  # GraphQL queries
scripts/                  # fetch and summarise scripts (Bash + gh + jq)
data/
  raw/                    # one NDJSON line per thread, comments embedded
    issues.ndjson
    pull-requests.ndjson
    discussions.ndjson
  derived/                # analysis-ready CSVs
    threads.csv
    comments.csv
    users.csv
    summary.csv
    activity-monthly.csv
state/sync-state.json     # last synced timestamp per type

Data schema

data/raw/*.ndjson

One JSON object per line. Common fields: type, number, title, author, createdAt, updatedAt, comments_total, and an embedded comments array of { author, createdAt }. Issues and pull requests add state, closedAt, and labels; pull requests also add merged, mergedAt, and a reviews array. Discussions add category, isAnswered, answer_author, and nested replies within each comment.
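
Because comments are embedded in each thread, per-author counts can be computed directly from the raw files. The following is a minimal sketch that relies only on the comments array and its author field as described above; the function name comments_per_author is hypothetical. It counts top-level comments only and ignores nested discussion replies and pull request reviews.

```shell
# Sketch only: counts embedded top-level comments per author across the given
# NDJSON files and prints "count<TAB>author", most active first.
comments_per_author() {
  jq -rs '
    [ .[].comments[]?.author ]
    | group_by(.)
    | map({author: .[0], n: length})
    | sort_by(-.n)
    | .[]
    | "\(.n)\t\(.author)"
  ' "$@"
}
```

For example, comments_per_author data/raw/*.ndjson | head would list the most frequent commenters across all three thread types.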

data/derived/*.csv

  • threads.csv: one row per thread, with state, author, category, answered flag, labels, and comment and reply counts.
  • comments.csv: one row per comment, reply, and review, flagged by is_reply and is_review.
  • users.csv: per-user totals (issues, pull requests, and discussions opened; comments; replies; answers; reviews), sorted by total activity.
  • summary.csv: repo-wide headline metrics as metric,value pairs.
  • activity-monthly.csv: tidy month,metric,count time series.
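
The tidy layout of activity-monthly.csv lends itself to simple aggregation. The sketch below sums one metric per calendar year. It assumes a header row, a month column that starts with a four-digit year, and fields without embedded commas; none of these details are confirmed here, and the function name yearly_total is hypothetical.

```shell
# Sketch only: sums the count column per year for one metric.
# Usage: yearly_total <metric> <csv-file>
yearly_total() {
  local metric="$1" file="$2"
  awk -F',' -v metric="$metric" '
    NR == 1 { next }                 # skip the assumed header row
    { gsub(/"/, "") }                # drop any field quoting
    $2 == metric { total[substr($1, 1, 4)] += $3 }
    END { for (y in total) print y "," total[y] }
  ' "$file" | sort
}
```

The metric argument must match a value that actually appears in the metric column of the file.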

Per-user query recipes

# Every issue mcanouil opened (use pull-requests.ndjson or discussions.ndjson for the other thread types).
jq -c 'select(.author == "mcanouil")' data/raw/issues.ndjson

# mcanouil's row in the per-user leaderboard.
grep '^"mcanouil"' data/derived/users.csv

# Top 10 most active users (header row plus ten data rows).
head -n 11 data/derived/users.csv

License

This project is licensed under the MIT License. See the LICENSE file for details.
