name: CI
on:
  push:
    branches: [master]
  pull_request:
  # Nightly + on-demand FULL run: the tree-sitter job below only generates when tree-sitter/**
  # changed (the materialized grammar is its sole input), so these backstop the one input it can't
  # see in that diff — a tree-sitter-cli bump (lockfile) — and re-verify the "beats official" claim.
  schedule:
    - cron: '0 9 * * *'
  workflow_dispatch:
permissions:
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          submodules: true
      # Node 24+ runs the .ts sources directly (native type stripping) — no build, no tsx.
      - uses: actions/setup-node@v4
        with:
          node-version: 24
      - run: npm ci
      # Regenerate every grammar's artifacts FIRST: the uncommitted ones
      # (*.cst-types.ts / *.cst-match.ts, gitignored) must exist before Typecheck
      # and the gates, which import them. Then fail if any COMMITTED artifact
      # drifts from the regenerated output (someone edited a grammar but forgot
      # to regenerate). Covers all grammars (sources at the repo root) + the
      # tree-sitter packages.
      - name: Generate editor artifacts (committed ones must be in sync)
        run: |
          npm run gen
          git diff --exit-code -- '*.tmLanguage.json' '*.language-configuration.json' '*.monarch.json' '*.contributes.json' tree-sitter
      - name: Typecheck
        run: npx tsc --noEmit
      # Every correctness GATE through ONE runner — sanity / agnostic / conformance /
      # highlighter / vue / yaml / the generative scope≡role check / the gap-ledger selftest
      # + --check (stale KNOWN-GAPS.md fails). One ✓/✗ summary, one exit code. The comparative
      # METRICS (scope-gap / src-coverage) and BENCH tools need the external TS corpus / VS Code
      # grammars and run in the readme-bench workflow, not here. See TESTING.md for the taxonomy.
      - name: Test
        run: npm run check
  # Engine-parity BREADTH guard. The `test` job already runs the three parity gates
  # (emit-parser-verify / emit-reject-messages / emit-lexer-verify) on the in-repo corpus
  # alone, with no external checkout — that is the standing mechanism that forces a gen-parser
  # change to propagate to emit-parser. This job adds the full external TS corpus for breadth,
  # so a divergence on some construct the in-repo corpus does not exercise still gets caught.
  # Gated on parser/grammar changes (like the treesitter job) so it doesn't clone the
  # corpus on doc-only pushes; schedule / workflow_dispatch force the full run.
  emit-parity:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0 # need history to diff against the base for the path gate below
      - name: Did the parser/grammar inputs change?
        id: changed
        run: |
          if [ "${{ github.event_name }}" != "push" ] && [ "${{ github.event_name }}" != "pull_request" ]; then
            echo "value=true" >> "$GITHUB_OUTPUT"; echo "forced full run (${{ github.event_name }})"; exit 0
          fi
          if [ "${{ github.event_name }}" = "pull_request" ]; then base="${{ github.event.pull_request.base.sha }}"; else base="${{ github.event.before }}"; fi
          if [ -z "$base" ] || ! git cat-file -e "$base^{commit}" 2>/dev/null; then
            echo "value=true" >> "$GITHUB_OUTPUT"; echo "no usable base — running the gate"; exit 0
          fi
          if git diff --name-only "$base" HEAD | grep -qE '^src/|^[^/]+\.ts$|^test/emit-'; then
            echo "value=true" >> "$GITHUB_OUTPUT"; echo "parser/grammar changed — running the breadth gate"
          else
            echo "value=false" >> "$GITHUB_OUTPUT"; echo "no parser/grammar change — skipping the corpus clone"
          fi
      - uses: actions/setup-node@v4
        if: steps.changed.outputs.value == 'true'
        with:
          node-version: 24
      - if: steps.changed.outputs.value == 'true'
        run: npm ci
      # Pinned-SHA, shallow, sparse clone of the TS conformance corpus to the fixed path the
      # parity gates auto-detect (same pin + technique as the readme-bench workflow).
      - name: Clone the pinned TS corpus
        if: steps.changed.outputs.value == 'true'
        run: |
          set -euo pipefail
          rm -rf /tmp/ts-repo; mkdir -p /tmp/ts-repo
          git -C /tmp/ts-repo init -q
          git -C /tmp/ts-repo remote add origin https://github.com/microsoft/TypeScript
          git -C /tmp/ts-repo config core.sparseCheckout true
          printf 'tests/cases/\n' > /tmp/ts-repo/.git/info/sparse-checkout
          git -C /tmp/ts-repo fetch -q --depth 1 --filter=blob:none origin 6fbce89821d93a5b761581d9ac540455f38e9acb
          git -C /tmp/ts-repo checkout -q FETCH_HEAD
      - name: Engine-parity over the full corpus
        if: steps.changed.outputs.value == 'true'
        run: |
          node test/emit-parser-verify.ts all
          node test/emit-reject-messages.ts
          node test/emit-lexer-verify.ts
  # The derived tree-sitter highlighter is the strongest thesis proof (a real GLR
  # parser from the same grammar, beating the official hand-written one). Build its
  # wasm and gate the accuracy so the 95.9% is verified, not just claimed. The
  # tree-sitter CLI bundles its own wasm toolchain — no emscripten/docker needed.
  treesitter:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0 # need history to diff against the base for the path gate below
      # `tree-sitter generate` is ~5 min for the TS grammar (issue #46: the state count is at the
      # floor for a unified-grammar-derived parser, so the cost is irreducible) — but the generated
      # parser is a PURE FUNCTION of the committed tree-sitter/** (grammar.js + scanner.c + queries),
      # and the `test` job fails if those drift from the grammar sources, so EVERY grammar change
      # necessarily lands as a tree-sitter/** diff. Re-running generate when nothing under
      # tree-sitter/** changed is pure waste, so gate the expensive steps on it. The job still RUNS
      # (reports success) — only the steps are skipped — so a required status check is never pending.
      # schedule / workflow_dispatch force the full run regardless (the lockfile/cli-bump backstop).
      - name: Did the tree-sitter inputs change?
        id: changed
        run: |
          if [ "${{ github.event_name }}" != "push" ] && [ "${{ github.event_name }}" != "pull_request" ]; then
            echo "value=true" >> "$GITHUB_OUTPUT"; echo "forced full run (${{ github.event_name }})"; exit 0
          fi
          if [ "${{ github.event_name }}" = "pull_request" ]; then base="${{ github.event.pull_request.base.sha }}"; else base="${{ github.event.before }}"; fi
          if [ -z "$base" ] || ! git cat-file -e "$base^{commit}" 2>/dev/null; then
            echo "value=true" >> "$GITHUB_OUTPUT"; echo "no usable base — running the gate"; exit 0
          fi
          if git diff --name-only "$base" HEAD | grep -qE '^tree-sitter/'; then
            echo "value=true" >> "$GITHUB_OUTPUT"; echo "tree-sitter/** changed — running the gate"
          else
            echo "value=false" >> "$GITHUB_OUTPUT"; echo "no tree-sitter/** change — skipping generate/build/bench"
          fi
      - uses: actions/setup-node@v4
        if: steps.changed.outputs.value == 'true'
        with:
          node-version: 24
      - if: steps.changed.outputs.value == 'true'
        run: npm ci
      # Conflict gate: `tree-sitter generate` for every derived grammar IN PARALLEL (was sequential
      # ~12 min; parallel ≈ the slowest single grammar, ts/tsx ~5 min). A conflict introduced by a
      # grammar change is caught even for the dialects whose wasm is not built below (tsx/js/jsx) —
      # exactly the gap that once let an unresolved `type`/`class_heritage` conflict ship. yaml
      # included (issue #3): its indent/scalar externals + C scanner make it generate + build.
      - name: Generate every derived tree-sitter grammar (parallel conflict gate)
        if: steps.changed.outputs.value == 'true'
        run: |
          langs=(typescript typescriptreact javascript javascriptreact html yaml)
          pids=()
          for g in "${langs[@]}"; do
            ( cd "tree-sitter/$g" && npx tree-sitter generate ) >"/tmp/gen-$g.log" 2>&1 &
            pids+=("$!")
          done
          fail=0
          for i in "${!langs[@]}"; do
            if wait "${pids[$i]}"; then
              echo "✓ ${langs[$i]}"
            else
              echo "✗ ${langs[$i]}"
              cat "/tmp/gen-${langs[$i]}.log"
              fail=1
            fi
          done
          exit "$fail"
      # Build the gated wasms FROM the parser.c just generated (no re-generate) and run the accuracy
      # benches: ts must beat official (the thesis proof), html vs parse5. The YAML wasm is built to
      # prove its C indentation scanner compiles + links; its accuracy bench needs the yaml-test-suite
      # checkout, so it runs in the readme-bench workflow.
      - name: Build wasm + accuracy gate (typescript / html / yaml)
        if: steps.changed.outputs.value == 'true'
        run: |
          ( cd tree-sitter/typescript && npx tree-sitter build --wasm . )
          ( cd tree-sitter/html && npx tree-sitter build --wasm . )
          ( cd tree-sitter/yaml && npx tree-sitter build --wasm . )
          node test/treesitter-bench.ts
          node test/html-treesitter.ts