feat(github): events, contributions (multi-year), authored history + profile pinned/orgs/achievements#75

Open
volod-vana wants to merge 3 commits into main from github-events-contributions

Conversation

@volod-vana
Member

Summary

Extends the GitHub connector from 3 scopes to 6 so that a Personal Server holds an actually useful slice of the user's GitHub history. The auth flow is unchanged (Playwright session login): no new credential prompts, no PAT.

The motivating gap: `github.repositories` only sees a user's owned repos (5 for the test account). For most engineers — including ours — daily activity happens in org-owned repos, which were entirely invisible to PS. After this PR, a typical Vana team member's PS holds ~3 years of authored PRs across all orgs, plus a full per-year contribution graph.

Scopes added

  • `github.events` — public activity feed via Events API (push/PR/issue/branch/release/comment/fork/star, ~90-day rolling window, 300-event cap, anonymous)
  • `github.contributions` — contribution graph scraped per-year for 4 years (current + 3 prior), not the default rolling 12-month window. Adds `yearTotals[]`, per-month aggregates, top day of the year
  • `github.history` — full lifetime of authored PRs + issues via Search API (up to 1000 per type, ~6.5s gap between pages to stay under the 10 req/min anonymous limit)

Scope extended

  • `github.profile` now also captures pinned repositories, org memberships, achievement badges, and the year-counter "N contributions in the last year"

Connector version

`1.2.0` → `1.4.0` (additive only — existing scope shapes unchanged).

Validations run locally

  • `scripts/validate-manifests.mjs` ✅ 21 manifests, 0 errors
  • `scripts/normalize-manifests.mjs --check` ✅ no drift
  • `scripts/validate-scope-schemas.mjs` ✅ all 18 registered manifests have local schemas
  • `scripts/check-additive-schemas.mjs` ✅ 48 schemas additive
  • `scripts/check-source-id-stability.mjs` ✅ source_id stable
  • `scripts/check-page-api-additive.mjs` ✅ no new page API methods
  • `scripts/schema-health-check.mjs` against dev gateway ✅ none of the new scopes show as missing (the 9 blocking issues it does report are pre-existing or from other in-flight PRs: `instagram.following`, `steam.*`, `icloud_notes.*`, `slack.*`, plus a `linkedin.profile` metadata mismatch)

`generate-connector-index.mjs --check` fails locally on macOS (BSD tar missing `--sort=name`); should pass on Ubuntu CI.

Schema registration (dev gateway)

All three new scopes already registered on `dev.data-gateway.vana.org` against the project's builder key:

| Scope | schemaId |
| --- | --- |
| `github.events` | `0x83bd91820a6b553210324d9df44e94c31096482039b0fd38ec729e16700cb9df` |
| `github.contributions` | `0xe60fdf594b68be1682ed78f1c805f21b1e33214e59d8a4ec31c3a00375dc9e81` |
| `github.history` | `0xb3c738188de0bcd829663b44d5a6d30766785b523e2d5b440101ec764c67a5c1` |

Their current `definitionUrl` points at temporary gists. Post-merge action: re-register with canonical URLs at `raw.githubusercontent.com/vana-com/data-connectors/main/connectors/github/schemas/{scope}.json`. I can do this in a follow-up. The schemaId will change because the URL is part of the EIP-712 payload, so we'll need to coordinate with anyone already pointing at the old ids.

Real-world impact (test account `@volod-vana`)

| Metric | Before | After |
| --- | --- | --- |
| PS scopes returning data | 3 (snapshots only) | 6 |
| GitHub timeline moments | 14 (profile snapshots) | 327 |
| Date span | ~1 week | ~3 years (Aug 2023 → May 2026) |
| Authored PRs visible | 0 | 167 (across vana-com/vana-framework, data-connect, vana-dlp-chatgpt, data-connectors, vana-sdk, …) |
| Contribution heatmap | last 12 months | 4 full years (2026=725, 2025=766, 2024=1658, 2023=312) |

Test plan

  • CI guardrails green
  • Run connector against own GitHub account in DataConnect desktop — confirm all 6 scopes complete without errors
  • Verify Personal Server accepts all 6 scope writes (requires schemas registered on whichever gateway the desktop is pointed at)
  • Check timeline coverage in a downstream consumer (e.g. Memory app) — should now show org-repo PRs and multi-year contribution data

…evements

Adds two new scopes to the GitHub connector and extends profile.

github.events
  - GET api.github.com/users/{u}/events/public (anonymous, 60/hr — fine
    for daily sync)
  - Normalizes 13 event types (PushEvent, PullRequestEvent,
    PullRequestReviewEvent, PullRequestReviewCommentEvent, IssuesEvent,
    IssueCommentEvent, CreateEvent, DeleteEvent, ForkEvent, WatchEvent,
    ReleaseEvent, GollumEvent, CommitCommentEvent) to a uniform shape
    {id, type, createdAt, repo, action, title, body, url, branch, commits}.
  - Up to 300 events / ~90 days (GitHub events API ceiling).
  - Crucially: includes activity in organization-owned repos that
    github.repositories cannot see (it only scrapes ?tab=repositories,
    which lists owned + forks).
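
The normalization step described above could look roughly like the sketch below. It is illustrative only, not the connector's actual code: the helper name `normalizeEvent` is hypothetical, and the choice of which payload object supplies `title`, `body`, and `url` is an assumption.

```javascript
// Hypothetical helper: maps one raw Events API item to the uniform shape
// {id, type, createdAt, repo, action, title, body, url, branch, commits}.
// Which payload object feeds title/body/url is an assumption of this sketch.
function normalizeEvent(raw) {
  const p = raw.payload || {};
  // Pick the most specific object the payload carries, if any.
  const subject = p.pull_request || p.issue || p.release || p.comment || null;
  return {
    id: raw.id,
    type: raw.type,
    createdAt: raw.created_at,
    repo: raw.repo ? raw.repo.name : null,
    action: p.action || null,
    title: subject ? subject.title || subject.name || null : null,
    body: subject ? subject.body || null : null,
    url: subject ? subject.html_url || null : null,
    // PushEvent carries a full ref ("refs/heads/main"); strip the prefix if present.
    branch: p.ref ? p.ref.replace(/^refs\/heads\//, "") : null,
    // Tolerate payloads that carry no commits array.
    commits: Array.isArray(p.commits)
      ? p.commits.map((c) => ({ sha: c.sha, message: c.message }))
      : [],
  };
}
```

Events without a matching payload object (for example a plain ForkEvent) simply get `null` for `title`, `body`, and `url` in this sketch.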

github.contributions
  - Scrapes the contribution-graph cells from /{u} (both old "rect.day"
    and new "td.ContributionCalendar-day" markups).
  - Returns: days[{date, count, level}], monthlyTotals[], topDay,
    totalContributionsLastYear.
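
As a rough illustration of how `monthlyTotals[]` and `topDay` can be derived from the scraped `days[]`, here is a minimal sketch. The function name `summarizeDays`, the `{month, count}` entry shape, and the `total` field are assumptions, not the connector's real schema.

```javascript
// Hypothetical aggregation over days[{date, count, level}] with ISO dates
// ("YYYY-MM-DD"). Output field names are assumptions of this sketch.
function summarizeDays(days) {
  const byMonth = new Map();
  let topDay = null;
  let total = 0;
  for (const day of days) {
    const month = day.date.slice(0, 7); // "YYYY-MM"
    byMonth.set(month, (byMonth.get(month) || 0) + day.count);
    total += day.count;
    // Strict ">" keeps the earliest day when several days tie.
    if (topDay === null || day.count > topDay.count) topDay = day;
  }
  const monthlyTotals = [...byMonth.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([month, count]) => ({ month, count }));
  return { monthlyTotals, topDay, total };
}
```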

github.profile (extended)
  - pinnedRepositories — what the user curates on their profile
  - organizations — visible org-avatar group
  - achievements — badge names + icon URLs
  - contributionsLastYear — quick-glance counter from the profile heading

github.history (NEW)
  - Up to 1000 authored PRs + 1000 authored issues via Search API
  - Full lifetime (not limited to the 90-day events window)
  - Includes any repo the user has touched: org-owned, forks, anything
  - Anonymous Search API (no token / no flow changes), 6.5s gap
    between pages to stay under the 10 req/min rate limit
  - Each item carries title, body, dates, repo, labels, state, url
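
The paced pagination described above can be sketched as follows. This is a simplified illustration, not the connector's implementation: `fetchAllPages`, `fetchPage`, and the option names are hypothetical, and the page fetcher is injected so the pacing logic can be exercised without network access.

```javascript
// Default delay helper; callers may inject their own (e.g. in tests).
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical paced paginator. fetchPage(page, perPage) is assumed to
// resolve to an array of items for that page.
async function fetchAllPages(
  fetchPage,
  { perPage = 100, maxItems = 1000, gapMs = 6500, sleep = defaultSleep } = {}
) {
  const items = [];
  for (let page = 1; items.length < maxItems; page += 1) {
    // Wait between requests, but not before the first one.
    // 60 s / 6.5 s is about 9.2 requests per minute, under a 10 req/min limit.
    if (page > 1) await sleep(gapMs);
    const batch = await fetchPage(page, perPage);
    items.push(...batch);
    // A short page means there is nothing further to fetch.
    if (batch.length < perPage) break;
  }
  return items.slice(0, maxItems);
}
```

In a real run, `fetchPage` would presumably wrap a request to the Search API's `/search/issues` endpoint with an `author:` qualifier plus `per_page` and `page` parameters; that wiring is an assumption and is left out here.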

github.contributions (extended)
  - Now scrapes 4 calendar years (current + 3 prior) instead of the
    rolling 12-month window
  - New yearTotals[] field aggregates per-year contribution counts
  - De-dupes overlapping days across year windows
  - First-year scrape miss is fatal; later-year misses tolerable
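
One way to de-dupe overlapping days and build `yearTotals[]` is sketched below. The function name `mergeYears`, the input shape (one `{days}` object per scraped window), and the `{year, total}` entry shape are assumptions made for illustration.

```javascript
// Hypothetical merge of several scraped windows into one day list.
// Each window is assumed to look like {days: [{date, count, level}]}.
function mergeYears(yearScrapes) {
  const byDate = new Map();
  for (const { days } of yearScrapes) {
    // Keyed by ISO date, so a day present in two windows is kept once
    // (the later window's copy wins).
    for (const day of days) byDate.set(day.date, day);
  }
  const days = [...byDate.values()].sort((a, b) => a.date.localeCompare(b.date));

  const totals = new Map();
  for (const day of days) {
    const year = Number(day.date.slice(0, 4));
    totals.set(year, (totals.get(year) || 0) + day.count);
  }
  // Newest year first.
  const yearTotals = [...totals.entries()]
    .sort(([a], [b]) => b - a)
    .map(([year, total]) => ({ year, total }));
  return { days, yearTotals };
}
```

Summing after de-duplication matters: totalling each window separately would count a boundary day twice whenever two windows both include it.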

Bumps connector version to 1.4.0.
@github-actions

github-actions Bot commented May 26, 2026

Schema Health Check — Non-blocking inherited issues

44/50 scopes consistent | 6 inherited Gateway gap(s) | no new blocking issues in this PR

- registry.json: bump github-playwright 1.2.0 → 1.4.0, refresh script
  and metadata sha256s, update description
- connectors/github/github-playwright.js: sync VERSION const to 1.4.0
- artifacts/github-playwright/github-playwright-1.4.0.tgz: new bundle
- connector-index.json: append 1.4.0 entry, refresh sourceTag /
  sourceCommit / artifactUrl on all entries to match this branch's HEAD
  (standard regen side-effect)

The 16 other byte-different artifacts are a side effect of the gtar
version used when running scripts/generate-connector-index.mjs on macOS
(BSD tar lacks --sort=name; we use gtar via a PATH shim). They are
functionally identical to main's artifacts: same bundle contents,
different gzip framing.