30 changes: 30 additions & 0 deletions .github/workflows/buildtest.yml
@@ -64,3 +64,33 @@ jobs:
echo ""
exit 1
fi

  validate-webhooks:

Review comment (Collaborator, Author):

I added this job to validate that the optimized webhooks JSON is equivalent to the full webhooks JSON file.

See the changes to the file eventPayloads.test.ts in this PR.

    runs-on: ubuntu-latest
    name: Validate webhook optimization

    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js 22.x
        uses: actions/setup-node@v4
        with:
          node-version: 22.x
          cache: 'npm'
          registry-url: 'https://npm.pkg.github.com'
      - run: npm ci
        env:
          NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Build workspaces
        run: npm run build -ws
      - name: Generate full webhooks file
        run: cd languageservice && npm run update-webhooks
      - name: Run optimization validation tests
        run: cd languageservice && npm test -- --testPathPattern=eventPayloads
      - name: Verify validation tests ran
        run: |
          if [ ! -f languageservice/src/context-providers/events/webhooks.full.validation-complete ]; then
Review comment by @ericsciple (Collaborator, Author), Dec 6, 2025:

Since the test is normally skipped when the full webhooks file doesn't exist, I made the test write a marker file. With the marker file, we can be sure the test actually ran.

The full (unoptimized) webhooks file is written above by the script `npm run update-webhooks`.

            echo "ERROR: Validation tests did not run!"
            echo "The webhooks.full.validation-complete marker file was not created."
            exit 1
          fi
          echo "Validation tests completed at: $(cat languageservice/src/context-providers/events/webhooks.full.validation-complete)"
9 changes: 5 additions & 4 deletions .gitignore
@@ -7,7 +7,8 @@ node_modules
# Minified JSON (generated at build time)
*.min.json

# Intermediate JSON for size comparison (generated by update-webhooks --all)
*.all.json
*.drop.json
*.strip.json
# Full webhooks source (generated by update-webhooks, used for validation tests)
*.full.json

# Validation marker (generated by tests)
*.validation-complete
115 changes: 91 additions & 24 deletions docs/json-data-files.md
@@ -6,8 +6,9 @@ This document describes the JSON data files used by the language service package

The language service uses several JSON files containing schema definitions, webhook payloads, and other metadata. To reduce bundle size, these files are:

1. **Optimized at generation time** — unused events are dropped, unused fields are stripped
2. **Minified at build time** — whitespace is removed to produce `.min.json` files
1. **Optimized at generation time** — unused events are dropped, unused fields are stripped, shared objects are deduplicated, property names are interned
2. **Compacted using a space-efficient format** — params use type-based dispatch arrays instead of objects
3. **Minified at build time** — whitespace is removed to produce `.min.json` files

The source `.json` files are human-readable and checked into the repository. The `.min.json` files are generated during build and gitignored.

@@ -19,6 +20,7 @@ The source `.json` files are human-readable and checked into the repository. The
|------|-------------|
| `src/context-providers/events/webhooks.json` | Webhook event payload schemas for autocompletion |
| `src/context-providers/events/objects.json` | Deduplicated shared object definitions referenced by webhooks |
| `src/context-providers/events/strings.json` | Interned property names shared by webhooks and objects |
| `src/context-providers/events/schedule.json` | Schedule event context data |
| `src/context-providers/events/workflow_call.json` | Reusable workflow call context data |
| `src/context-providers/descriptions.json` | Context variable descriptions for hover |
@@ -33,7 +35,7 @@ The source `.json` files are human-readable and checked into the repository. The

### Webhooks and Objects

The `webhooks.json` and `objects.json` files are generated from the [GitHub REST API description](https://github.com/github/rest-api-description):
The `webhooks.json`, `objects.json`, and `strings.json` files are generated from the [GitHub REST API description](https://github.com/github/rest-api-description):

```bash
cd languageservice
@@ -44,9 +46,10 @@ This script:
1. Fetches webhook schemas from the GitHub API description
2. **Validates** all events are categorized (fails if new events are found)
3. **Drops** events that aren't valid workflow triggers (see [Dropped Events](#dropped-events))
4. **Strips** unused fields like `description` and `summary` (see [Stripped Fields](#stripped-fields))
4. **Compacts** params into a space-efficient array format, keeping only `name`, `description`, and `childParamsGroups` (see [Compact Format](#compact-format))
5. **Deduplicates** shared object definitions into `objects.json`
6. Writes the optimized, pretty-printed JSON files
6. **Interns** duplicate property names into `strings.json` (see [String Interning](#string-interning))
7. Writes the optimized, pretty-printed JSON files

### Handling New Webhook Events

@@ -67,9 +70,9 @@ Action required:

1. Check [Events that trigger workflows](https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows)

2. Edit `languageservice/script/webhooks/index.ts`:
- Add to `KEPT_EVENTS` if it's a valid workflow trigger
- Add to `DROPPED_EVENTS` if it's GitHub App or API-only
2. Edit `languageservice/src/context-providers/events/event-filters.json`:
- Add to `kept` array if it's a valid workflow trigger
- Add to `dropped` array if it's GitHub App or API-only

3. Run `npm run update-webhooks` and commit the changes

@@ -101,13 +104,15 @@ The code imports the minified versions:

```ts
import webhooks from "./events/webhooks.min.json"
import objects from "./events/objects.min.json"
import strings from "./events/strings.min.json"
```

## CI Verification

CI verifies that generated source files are up-to-date:

1. Runs `npm run update-webhooks` to regenerate webhooks.json and objects.json
1. Runs `npm run update-webhooks` to regenerate webhooks.json, objects.json, and strings.json
2. Checks for uncommitted changes with `git diff --exit-code`

The `.min.json` files are generated at build time and are not committed to the repository.
@@ -118,33 +123,95 @@ If the build fails, run `cd languageservice && npm run update-webhooks` locally

Webhook events that aren't valid workflow `on:` triggers are dropped (e.g., `installation`, `ping`, `member`, etc.). These are GitHub App or API-only events.

See `DROPPED_EVENTS` in `script/webhooks/index.ts` for the full list.
See the `dropped` array in `src/context-providers/events/event-filters.json` for the full list.
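
The categorization check described in this document (fail when an event is in neither list, then keep only valid workflow triggers) can be pictured roughly as follows. This is a minimal sketch, not the script's actual code: it assumes `event-filters.json` has the shape `{ kept: string[], dropped: string[] }`, and `EventFilters`, `findUncategorized`, and `selectKeptEvents` are hypothetical names introduced only for illustration.

```ts
// Assumed shape of event-filters.json (hypothetical type name).
interface EventFilters {
  kept: string[];
  dropped: string[];
}

// Returns the event names that appear in neither the kept nor the dropped list.
function findUncategorized(eventNames: string[], filters: EventFilters): string[] {
  const known = new Set<string>([...filters.kept, ...filters.dropped]);
  return eventNames.filter((name) => !known.has(name));
}

// Throws if any event is uncategorized; otherwise returns only the kept events.
function selectKeptEvents(eventNames: string[], filters: EventFilters): string[] {
  const unknown = findUncategorized(eventNames, filters);
  if (unknown.length > 0) {
    throw new Error(`Uncategorized webhook events: ${unknown.join(", ")}`);
  }
  const kept = new Set<string>(filters.kept);
  return eventNames.filter((name) => kept.has(name));
}

// Usage: "ping" is categorized as dropped, so only "push" survives.
const exampleFilters: EventFilters = { kept: ["push"], dropped: ["ping"] };
console.log(selectKeptEvents(["push", "ping"], exampleFilters));
```

Failing loudly on an unknown name, rather than silently dropping it, is what forces a new event to be reviewed against the list of valid workflow triggers.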

## Stripped Fields
## Compact Format

Unused fields are stripped to reduce bundle size. For example:
Params are converted from verbose objects into compact arrays, keeping only the fields needed for autocompletion and hover docs (`name`, `description`, `childParamsGroups`). Unused fields like `type`, `in`, `isRequired`, `enum`, and `default` are discarded.

| Format | Meaning |
|--------|---------|
| `"name"` | Name only (no description, no children) |
| `[name, desc]` | Name + description (arr[1] is a string) |
| `[name, children]` | Name + children (arr[1] is an array) |
| `[name, desc, children]` | Name + description + children |

The reader uses `typeof arr[1]` to determine the format: if the result is `"string"`, `arr[1]` is a description; otherwise `arr[1]` is an array of children.

**Example:**

```json
// Before (from webhooks.all.json)
// Before (object format)
{
"type": "object",
"name": "issue",
"in": "body",
"description": "The issue itself.",
"isRequired": true,
"childParamsGroups": [...]
"childParamsGroups": [
{ "name": "id" },
{ "name": "title", "description": "Issue title" }
]
}

// After (webhooks.json)
// After (compact format)
["issue", "The issue itself.", [
"id",
["title", "Issue title"]
]]
```
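
As a rough illustration of the reader side, the sketch below expands a compact param back into object form using the `typeof arr[1]` check. It assumes names are literal strings (string interning is ignored here), and `CompactParam`, `Param`, and `expandParam` are hypothetical names, not identifiers from the repository.

```ts
// Hypothetical types for illustration only.
// A compact param is "name", [name, desc], [name, children], or [name, desc, children].
type CompactParam = string | [string, (string | CompactParam[])?, CompactParam[]?];

interface Param {
  name: string;
  description?: string;
  childParamsGroups?: Param[];
}

function expandParam(compact: CompactParam): Param {
  // Name-only form: a bare string.
  if (typeof compact === "string") {
    return { name: compact };
  }
  const [name, second, third] = compact;
  const param: Param = { name };
  // A string in position 1 is the description.
  if (typeof second === "string") {
    param.description = second;
  }
  // Children sit in position 1 (no description) or position 2 (after a description).
  const children = Array.isArray(second) ? second : third;
  if (children !== undefined) {
    param.childParamsGroups = children.map((child) => expandParam(child));
  }
  return param;
}

// Usage with the compact example from this section.
const issueParam = expandParam(["issue", "The issue itself.", ["id", ["title", "Issue title"]]]);
console.log(JSON.stringify(issueParam, null, 2));
```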

## String Interning

Property names that appear 2+ times are "interned" into a shared string table (`strings.json`). In the compact arrays, these names are replaced with numeric indices:

```json
// strings.json
["url", "id", "name", ...] // Index 0 = "url", 1 = "id", 2 = "name"

// webhooks.json - uses indices instead of strings
["push", [
[0, "The URL..."], // 0 = "url" from string table
[1, "Unique ID"], // 1 = "id"
2 // 2 = "name" (name-only, no description)
]]
```

Review comment by Copilot AI, Dec 6, 2025, on lines +170 to +174:

The example structure is misleading. The code shows ["push", [...]], which looks like a compact param array, but "push" is actually an event name (an object key), not part of a param structure. Consider revising the example to show the actual structure, { "push": { "default": { "p": [0, 1, 2] } } }, or clarifying that it is pseudo-code illustrating how params would look within the p array rather than showing "push" as part of a param.

Suggested change, from:

["push", [
  [0, "The URL..."], // 0 = "url" from string table
  [1, "Unique ID"], // 1 = "id"
  2 // 2 = "name" (name-only, no description)
]]

to:

{
  "push": {
    "default": {
      "p": [
        [0, "The URL..."], // 0 = "url" from string table
        [1, "Unique ID"], // 1 = "id"
        2 // 2 = "name" (name-only, no description)
      ]
    }
  }
}

**How to distinguish indices from other values:**

- **Top-level numbers in `p` arrays** → Object indices (references into `objects.json`)
- **Nested numbers inside compact arrays** → String indices (references into `strings.json`)
- **Literal strings** → Singletons (names appearing only once, not interned)

Singletons are kept as literal strings for readability and to avoid the overhead of adding rarely-used names to the string table.
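
A minimal sketch of how a reader might resolve interned names follows. It assumes the caller has already handled top-level object indices in the `p` array, so every number seen here is a string index. The type and function names (`InternedParam`, `resolveName`, `resolveParam`) are hypothetical and not taken from the repository.

```ts
// Hypothetical types for illustration only.
type InternedName = string | number; // literal singleton, or index into strings.json
type InternedParam = InternedName | [InternedName, (string | InternedParam[])?, InternedParam[]?];
type ResolvedParam = string | [string, (string | ResolvedParam[])?, ResolvedParam[]?];

// Looks a name up in the string table; literal strings (singletons) pass through.
function resolveName(name: InternedName, strings: readonly string[]): string {
  if (typeof name === "string") {
    return name;
  }
  const resolved = strings[name];
  if (resolved === undefined) {
    throw new RangeError(`No string at index ${name}`);
  }
  return resolved;
}

// Recursively replaces string indices with names, keeping the compact shape.
function resolveParam(param: InternedParam, strings: readonly string[]): ResolvedParam {
  if (!Array.isArray(param)) {
    return resolveName(param, strings);
  }
  const [name, second, third] = param;
  const resolvedName = resolveName(name, strings);
  const resolveAll = (children: InternedParam[]): ResolvedParam[] =>
    children.map((child) => resolveParam(child, strings));

  if (typeof second === "string") {
    if (third === undefined) {
      return [resolvedName, second];
    }
    return [resolvedName, second, resolveAll(third)];
  }
  if (second !== undefined) {
    return [resolvedName, resolveAll(second)];
  }
  return resolvedName;
}

// Usage with the string table from the example above.
const stringTable = ["url", "id", "name"];
console.log(JSON.stringify(resolveParam([0, "The URL..."], stringTable)));
```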

## Deduplication

Shared object definitions are extracted into `objects.json` and referenced by index:

```json
// objects.json
[
["url", "The URL"], // Index 0
["id", "Unique identifier"], // Index 1
[...]
]

// webhooks.json - top-level numbers reference objects
{
"name": "issue",
"description": "The issue itself.",
"childParamsGroups": [...]
"push": {
"default": {
"p": [0, 1, ["ref", "The git ref"]] // 0 and 1 are object indices
}
}
}
```

Only `name`, `description`, and `childParamsGroups` are kept — these are used for autocompletion and hover docs.
This reduces duplication when the same object structure appears in multiple events (e.g., `repository`, `sender`, `organization`).
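
The lookup side of deduplication can be sketched as below: top-level numbers in a `p` array are replaced with the shared definition at that index in `objects.json`, and inline params pass through unchanged. This is an illustrative sketch under simplified types, and `InlineParam`, `ParamOrObjectIndex`, and `inlineSharedObjects` are hypothetical names rather than the language service's actual API.

```ts
// Hypothetical, simplified types for illustration only.
type InlineParam = string | [string, ...unknown[]];
type ParamOrObjectIndex = InlineParam | number;

// Replaces each top-level object index with the shared definition it points to.
function inlineSharedObjects(
  p: readonly ParamOrObjectIndex[],
  objects: readonly InlineParam[]
): InlineParam[] {
  return p.map((entry) => {
    if (typeof entry !== "number") {
      return entry; // already inline (used only once)
    }
    const shared = objects[entry];
    if (shared === undefined) {
      throw new RangeError(`No shared object at index ${entry}`);
    }
    return shared;
  });
}

// Usage with the data from the example above.
const sharedObjects: InlineParam[] = [
  ["url", "The URL"],
  ["id", "Unique identifier"]
];
const pushParams = inlineSharedObjects([0, 1, ["ref", "The git ref"]], sharedObjects);
console.log(JSON.stringify(pushParams));
```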

## Size Reduction

To compare all fields vs stripped, run `npm run update-webhooks -- --all` and diff the `.all.json` files against the regular ones.
The optimizations reduce the minified file size by approximately 97% (about 93% after gzip):

See `EVENT_ACTION_FIELDS` and `BODY_PARAM_FIELDS` in `script/webhooks/index.ts` to modify what gets stripped.
| Stage | Minified | Gzip |
|-------|----------|------|
| Original (webhooks.full.json) | 6.7 MB | 310 KB |
| After optimization | 209 KB | 22 KB |
| **Reduction** | **97%** | **93%** |
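
The percentages in the table follow directly from the listed sizes. The short check below is only a worked calculation; it treats 6.7 MB as 6700 KB, and `reductionPercent` is a hypothetical helper name.

```ts
// Percentage by which a file shrank, rounded to the nearest whole percent.
function reductionPercent(beforeKb: number, afterKb: number): number {
  return Math.round((1 - afterKb / beforeKb) * 100);
}

console.log(reductionPercent(6700, 209)); // minified: 6.7 MB -> 209 KB, prints 97
console.log(reductionPercent(310, 22)); // gzip: 310 KB -> 22 KB, prints 93
```
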
4 changes: 2 additions & 2 deletions languageservice/package.json
@@ -37,13 +37,13 @@
"format-check": "prettier --check '**/*.ts'",
"lint": "eslint 'src/**/*.ts'",
"lint-fix": "eslint --fix 'src/**/*.ts'",
"minify-json": "node ../script/minify-json.js src/context-providers/descriptions.json src/context-providers/events/webhooks.json src/context-providers/events/objects.json src/context-providers/events/schedule.json src/context-providers/events/workflow_call.json",
"minify-json": "node ../script/minify-json.js src/context-providers/descriptions.json src/context-providers/events/webhooks.json src/context-providers/events/objects.json src/context-providers/events/strings.json src/context-providers/events/schedule.json src/context-providers/events/workflow_call.json",
"prebuild": "npm run minify-json",
"prepublishOnly": "npm run build && npm run test",
"pretest": "npm run minify-json",
"test": "NODE_OPTIONS=\"--experimental-vm-modules\" jest",
"test-watch": "NODE_OPTIONS=\"--experimental-vm-modules\" jest --watch",
"update-webhooks": "npx tsx script/webhooks/index.ts",
"update-webhooks": "npx tsx script/webhooks/update-webhooks.ts",
"watch": "tsc --build tsconfig.build.json --watch"
},
"dependencies": {
56 changes: 46 additions & 10 deletions languageservice/script/webhooks/deduplicate.ts
@@ -1,5 +1,38 @@
import Webhook from "./webhook";

/**
 * Get the name from a param.
 * Formats: "name" (string), [name, ...] (array), or { name, ... } (object)
 */
function getParamName(param: any): string {
  if (typeof param === "string") {
    return param;
  }
  if (Array.isArray(param)) {
    return param[0];
  }
  return param.name;
}

/**
 * Get params from a webhook action.
 * Uses 'p' (short key) if present, falls back to 'bodyParameters'
 */
function getParams(webhook: any): any[] {
  return webhook.p || webhook.bodyParameters || [];
}

/**
 * Set params on a webhook action.
 * Writes to 'p' (short key) if it is already present, otherwise to 'bodyParameters'
 */
function setParams(webhook: any, params: any[]): void {
  if (webhook.p !== undefined) {
    webhook.p = params;
  } else {
    webhook.bodyParameters = params;
  }
}

// Store any repeated body parameters in an array
// and replace them in the webhook with an index in the array
export function deduplicateWebhooks(webhooks: Record<string, Record<string, Webhook>>): any[] {
@@ -10,10 +43,11 @@ export function deduplicateWebhooks(webhooks: Record<string, Record<string, Webh
const objectCount: Record<string, number> = {};

for (const webhook of iterateWebhooks(webhooks)) {
for (const param of webhook.bodyParameters) {
objectsByName[param.name] ||= [];
const index = findOrAdd(param, objectsByName[param.name]);
const key = `${param.name}:${index}`;
for (const param of getParams(webhook)) {
const name = getParamName(param);
objectsByName[name] ||= [];
const index = findOrAdd(param, objectsByName[name]);
const key = `${name}:${index}`;
objectCount[key] ||= 0;
objectCount[key]++;
}
@@ -27,18 +61,19 @@ export function deduplicateWebhooks(webhooks: Record<string, Record<string, Webh

for (const webhook of iterateWebhooks(webhooks)) {
const newParams: any[] = [];
for (const param of webhook.bodyParameters) {
const index = find(param, objectsByName[param.name]);
const key = `${param.name}:${index}`;
for (const param of getParams(webhook)) {
const name = getParamName(param);
const index = find(param, objectsByName[name]);
const key = `${name}:${index}`;
if (objectCount[key] > 1) {
newParams.push(indexForParam(param, index, bodyParamIndexMap, duplicatedBodyParams));
newParams.push(indexForParam(param, name, index, bodyParamIndexMap, duplicatedBodyParams));
} else {
// If an object is only used once, keep it inline
newParams.push(param);
}
}

webhook.bodyParameters = newParams;
setParams(webhook, newParams);
}

return duplicatedBodyParams;
@@ -74,11 +109,12 @@ function find(param: any, objects: any[]): number {

function indexForParam(
param: any,
paramName: string,
paramNameIndex: number,
objectIndexMap: Record<string, number>,
duplicatedBodyParams: any[]
): number {
const key = `${param.name}:${paramNameIndex}`;
const key = `${paramName}:${paramNameIndex}`;

const existingIndex = objectIndexMap[key];
if (existingIndex !== undefined) {