Add acceptance tests using behat by math3usmartins · Pull Request #4 · xphp-lang/language-server

math3usmartins · 2026-06-03T08:18:59Z

Summary

Turns the features/ Gherkin specs into a real, end-to-end Behat acceptance
suite that drives the production language server and runs in CI. The specs are
now executable, living documentation: 58 scenarios / 339 steps covering the
LSP surface across five themes -- Navigate, Edit, Understand, Validate, Find.

Everything runs fully in-memory (no stdio, sockets, or files), so the suite
is isolated and parallel-safe, and a new behat-lsp CI job gates every PR.

What it does

Each scenario drives the real server end-to-end via phpactor's
LanguageServerTester: it builds the production LspDispatcherFactory, runs the
initialize/ServerCapabilities handshake, opens fixtures through
textDocument/didOpen, and routes real JSON-RPC requests through the full
middleware + argument-resolver stack to the actual handlers. There is no
re-derived copy of the wiring, so the tests and production can't drift.
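
As a rough illustration of the message flow described above, the sketch below builds the JSON-RPC 2.0 messages a single scenario would send. It is written in Python for brevity (the harness itself is PHP and goes through LanguageServerTester); the function names, the `php` languageId, and the fixture URI are assumptions made for the example, not part of this PR.

```python
import itertools
from typing import Any


def request(msg_id: int, method: str, params: dict[str, Any]) -> dict[str, Any]:
    """JSON-RPC 2.0 request: carries an id, so a response is expected."""
    return {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}


def notification(method: str, params: dict[str, Any]) -> dict[str, Any]:
    """JSON-RPC 2.0 notification: no id, so the server sends no response."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def scenario_messages(uri: str, source: str, line: int, character: int) -> list[dict[str, Any]]:
    """Handshake, open one fixture in memory, then issue one request."""
    ids = itertools.count(1)
    return [
        request(next(ids), "initialize",
                {"processId": None, "rootUri": None, "capabilities": {}}),
        notification("initialized", {}),
        # The fixture text travels inside the message; nothing touches disk.
        notification("textDocument/didOpen", {
            "textDocument": {
                "uri": uri,
                "languageId": "php",  # assumed value for this sketch
                "version": 1,
                "text": source,
            },
        }),
        request(next(ids), "textDocument/definition", {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        }),
    ]
```

Only the two requests carry an id; `initialized` and `textDocument/didOpen` are notifications, which is why a Given step that opens a fixture has no response to assert on.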

Coverage by theme:

- Navigate -- definition, type-definition, references, implementation,
  document & workspace symbols, document highlight, call hierarchy, type
  hierarchy.
- Edit -- rename, code actions (import / optimize / diagnostic-fix), code
  lens (+resolve), workspace/willRenameFiles.
- Understand -- hover, signature help, inlay hints, folding ranges, semantic
  tokens.
- Validate -- diagnostics: parse, undefined-name, bound violation,
  constructor-arg mismatch (pull-mode, through the real diagnostics handler).
- Find -- completion (type-arg suggestions, scope-aware insert text, prefix
  and bound filtering) and completionItem/resolve.

Assertions are exact and grounded in the existing PHPUnit ground truth:
covered source text for ranges (references, diagnostics underlines, code-action
edits, rename edits, selection ranges, semantic tokens), exact counts and
structure (outline nesting, hint counts), exact labels/kinds/details, and
negative cases (null/empty results where nothing should match). Repetition is
collapsed with Scenario Outlines (document-symbol members, signature
parameters, completion prefixes).
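
To make "covered source text for ranges" concrete, here is a minimal Python sketch of a range-to-text helper in the spirit of the assertions above. The function name and the sample fixture are hypothetical. LSP positions count UTF-16 code units by default, whereas this sketch indexes Python characters, so it is only accurate for fixtures without characters outside the Basic Multilingual Plane.

```python
def text_for_range(source: str, rng: dict) -> str:
    """Return the source text covered by an LSP range (zero-based, end-exclusive)."""
    lines = source.split("\n")
    start, end = rng["start"], rng["end"]
    if start["line"] == end["line"]:
        return lines[start["line"]][start["character"]:end["character"]]
    # Multi-line range: tail of the first line, whole middle lines, head of the last.
    parts = [lines[start["line"]][start["character"]:]]
    parts.extend(lines[start["line"] + 1:end["line"]])
    parts.append(lines[end["line"]][:end["character"]])
    return "\n".join(parts)


# Hypothetical fixture: a diagnostic whose range should underline the typo "nul".
fixture = "<?php\n$x = nul;\n"
underline = {"start": {"line": 1, "character": 5}, "end": {"line": 1, "character": 8}}
covered = text_for_range(fixture, underline)
```

Asserting on the covered text rather than on raw line/character numbers keeps a scenario readable and makes an off-by-one in a range fail with an obvious message.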

How it's wired

- Isolated Behat install under tools/behat/ with its own composer.json --
  Behat 3.x caps symfony/console at ^7 while the root pins ^8 (via
  xphp-lang/xphp), so it can't live in the root require-dev. A files-autoload
  pulls in the root autoloader; tools/behat/bootstrap.php silences the warmer
  chatter.
- Plain context classes, not traits: a World value object holds the
  per-scenario state (the tester, fixtures, last response, helpers) and is
  constructor-injected into each context by a small Behat extension
  (WorldExtension + WorldArgumentResolver), which also resets it before each
  scenario/outline example. ServerContext owns the cross-theme Givens and
  generic request dispatch; one *Context class per theme holds its steps.
- Deferred behavior is written as @todo scenarios (skipped via a gherkin
  tag filter) so the suite stays green on what's expected to work.
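
The shared-World pattern described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the Behat extension itself: the class bodies, `run_scenario`, and the fake "definition" lookup are all hypothetical stand-ins, and only the shape (one World, injected into every context, reset before each scenario) mirrors the description.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class World:
    """Per-scenario state shared by all context classes."""
    fixtures: dict[str, str] = field(default_factory=dict)
    last_response: Any = None

    def reset(self) -> None:
        self.fixtures.clear()
        self.last_response = None


class ServerContext:
    """Cross-theme Givens (hypothetical stand-in)."""
    def __init__(self, world: World) -> None:
        self.world = world

    def given_file(self, uri: str, text: str) -> None:
        self.world.fixtures[uri] = text


class NavigateContext:
    """One theme's steps (hypothetical stand-in)."""
    def __init__(self, world: World) -> None:
        self.world = world

    def request_definition(self, uri: str) -> None:
        # Stand-in for a real LSP request: "resolves" only to open fixtures.
        self.world.last_response = {"uri": uri} if uri in self.world.fixtures else None


def run_scenario(world: World, steps: Callable[[ServerContext, NavigateContext], None]) -> Any:
    world.reset()  # reset happens before the contexts are constructed
    server, navigate = ServerContext(world), NavigateContext(world)
    steps(server, navigate)
    return world.last_response
```

Because both contexts receive the same World instance, a Given in one class is visible to a When in another, while the reset keeps a fixture from one scenario from leaking into the next.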

CI

A new behat-lsp job in .github/workflows/ci-lsp.yml installs the isolated
tooling and runs make test/behat on every PR and push to main, in parallel
with the PHPUnit gate. The Behat command runs with memory_limit=-1 so the first
scenario's worse-reflection stub-map build (~512M) does not run out of memory
on a cold CI cache.

Known gaps (tracked as `@todo`)

- Go-to-definition through a generic method call doesn't yet resolve to the
  method declaration (class/type-arg jumps work).
- The duplicate-template diagnostic is detected by the analyzer but the
  per-file pull provider canonicalizes the edited file, so it surfaces on the
  other file; surfacing it on the edited file needs the roadmap's cross-file
  diagnostic broadcast.

Testing

- make test/behat -- 58 scenarios / 339 steps pass (deterministic across runs).
- make test/behat/parallel -- one process per feature, conflict-free.
- make test/unit -- unchanged and green (889 tests). There are no src/ changes
  in this PR; it is purely additive test tooling + specs.

Notes for reviewers

- No production code was changed; every "expected to work" feature already
  behaved correctly. The work is specs + the in-memory harness + CI.
- The harness models fixtures as open documents, so it exercises the
  open-document resolution path; the filesystem-index path is intentionally out
  of scope here.

Specify expected Language Server behavior for go-to-definition, hover, and inlay hints across files. Each scenario arranges fixture file contents and a warmed FQN index (Given), issues a single LSP request (When), and asserts the response (Then). Resolution is expected to work through the filesystem index regardless of editor open/closed state. These are living specifications, not yet wired to an executable Behat harness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…el-safe)

Wire the features/ specs to the real LSP handlers via a Behat FeatureContext that opens every fixture as an in-memory TextDocumentItem -- nothing is written to disk. Each scenario builds its own workspace + handler stack, so the suite shards across processes with identical, deterministic results (verified).

Behat lives in an isolated tools/behat install rather than the root require-dev: Behat 3.x caps symfony/console at ^7 while the project pins ^8 via xphp-lang/xphp. A files-autoload pulls in the root autoloader so the context resolves XPHP\Lsp\*; psr/log is pinned to 1.1.4 to match the root and a bootstrap.php silences PHP 8.4 deprecations before the root autoloader loads (mirrors test/bootstrap.php).

Specs run STRICT: scenarios are written to desired behavior, so the ones the server doesn't yet satisfy fail by design (2 passed, 7 failed) as an executable backlog. Behat is therefore NOT part of the test/unit gate.

    make test/behat            # sequential
    make test/behat/parallel   # one process per feature (pre-warms shared stub cache)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The previous FeatureContext recorded sources and nulled the workspace on each fixture, deferring all opens to a rebuild. Replace that with a single workspace (created per scenario in the constructor) that each fixture is opened into directly. The handler stack is built once and resolves against the live workspace, so multi-file scenarios -- several files open at once -- are modeled naturally without rebuild/invalidate juggling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extract the shared in-memory world (workspace, full handler stack mirroring LspDispatcherFactory, fixture Givens, position/assertion helpers) into WorldTrait, and split the step definitions into one trait per theme: Navigate, Edit, Understand, Validate, Find. FeatureContext is now a thin aggregator that composes them. Pure refactor -- existing scenarios unchanged (2 passed, 7 failed); unit suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cross-file go-to-definition: jump from a generic instantiation to the class declaration, and from a type-argument to the imported class. The generic-method jump is tagged @todo (not yet resolved). Add a global @todo gherkin filter so deferred scenarios are skipped and the suite stays green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Jump from a variable use to the class of its inferred type via the worse-reflection-backed resolver. Add the typeDefinition dispatch and make the "points to" matcher tolerant of the file:// URIs worse-reflection emits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Find usages of a class across open documents: the declaration, the use import, the instantiation, and a fully-qualified type hint (4 locations). Adds the references/implementation/documentHighlight position dispatch and list-location assertion steps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

List the direct implementers of an interface across open documents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Highlight the class declaration plus both usages in the current file (3 hits). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Outline a class with its constant, properties, constructor and method. Adds a document-level request dispatcher and a recursive outline assertion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Filter project symbols by a case-insensitive substring of the short name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Prepare a call-hierarchy item at a method, then walk incoming calls (callers) and outgoing calls (callees) across open documents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Prepare a type-hierarchy item, then walk supertypes (parent class) and subtypes (interface implementers) across open documents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rename a class and have its declaration plus the use import and instantiation all rewritten (2 files, 3 edits). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Quick-fixes from all three providers: import an unresolved class, optimize (remove) an unused import, and fix an undefined-name typo from a diagnostic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Emit a "Show references" lens above a declaration and lazily resolve it to a usage count via codeLens/resolve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Renaming a file whose basename matches its single class renames the class and updates the importing file -- driven entirely from open documents, no disk. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Hover over a generic instantiation shows the specialized type ("Specializes to:"), and hover over a type parameter explains it and its bound. Replaces the earlier idealized cross_file_hover spec with assertions matching real output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A generic method call ($users->first() where $users is Collection<User>) renders the substituted return type ": ?App\Models\User" after the assignment. Replaces the earlier idealized inlay spec with the real FQN-qualified output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Show a free function's signature with the active parameter index, and advance the active parameter past a comma. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fold the class body and each method body; single-line declarations are not folded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Emit a non-empty, 5-int-aligned token stream that classifies the generic T as a typeParameter token. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Diagnostics produced in-memory over the open workspace: syntax error, undefined-bareword warning, generic bound violation, and constructor argument mismatch. Duplicate-template detection works in the analyzer but is tagged @todo here because the per-file pull provider canonicalizes the edited file -- the duplicate surfaces on the other file, pending cross-file diagnostic broadcast. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Context-aware type-argument completion: suggest workspace classes, choose the fully-qualified vs short insert text by import scope, filter by typed prefix, and filter by a generic bound (Stringable). Adds the Find step trait. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
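
The scope-aware insert text and prefix filtering can be pictured with a small sketch. This is a hypothetical Python model, not the server's completion provider: the item shape, the case-insensitive prefix match, and the exact form of the fully-qualified insert text (shown here without a leading backslash) are assumptions for illustration.

```python
def complete_type_argument(candidates: list[str], imported: set[str], prefix: str = "") -> list[dict]:
    """Suggest classes for a type argument.

    candidates: fully-qualified class names known to the workspace.
    imported:   fully-qualified names already imported in the current file.
    prefix:     what the user has typed so far (matched against the short name).
    """
    items = []
    for fqn in candidates:
        short = fqn.rsplit("\\", 1)[-1]
        if not short.lower().startswith(prefix.lower()):
            continue  # assumed: case-insensitive prefix filter
        # Imported classes can be inserted by short name; others need the FQN.
        insert_text = short if fqn in imported else fqn
        items.append({"label": short, "detail": fqn, "insertText": insert_text})
    return items


# Hypothetical workspace: User is imported in the current file, Plastic is not.
workspace = ["App\\Models\\Plastic", "App\\Models\\User"]
in_scope = {"App\\Models\\User"}
suggestions = complete_type_argument(workspace, in_scope, prefix="Pl")
```

A bound filter (for example, keeping only classes that implement Stringable) would be one more predicate in the loop; it is left out because it needs type information this sketch does not model.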

Resolving a class completion item lazily enriches it with the class docblock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Point the parallel Make target at the theme subdirs (find features -name) and warm the cache via navigate/definition. Rewrite features/README to describe the theme layout, the WorldTrait + per-theme step traits, and the @todo scenarios. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address review finding #1 (the harness hand-wired handlers and bypassed the dispatch layer, risking drift from production).

Replace WorldTrait's ~150-line re-derived handler stack with phpactor's LanguageServerTester, which builds the production LspDispatcherFactory and routes real JSON-RPC through the full middleware + argument-resolver stack. Scenarios now exercise:

- the real initialize / ServerCapabilities handshake
- JSON-RPC routing and middleware
- textDocument/didOpen sync (fixtures opened via the server)
- request-param deserialization (typed *Params, plus the LspObject resolver for codeLens/resolve and completionItem/resolve)
- the real XphpPullDiagnosticsHandler (textDocument/diagnostic, pull mode)

There is now a single source of truth for the wiring (the factory), so the test and production graphs cannot drift. Handler results come back typed and raw (HandlerMethodRunner returns the handler's value unserialized), so the Then assertions are unchanged; only the When steps now dispatch through the tester.

Everything stays in-memory (TestMessageTransmitter buffer; no stdio/sockets/files), so parallel sharding remains conflict-free. bootstrap.php sets XPHP_LSP_QUIET=1 via putenv to silence the warmers' stderr (shell env-prefixes don't propagate through the containerized php proxy).

Full suite: 39 passed (2 @todo skipped); unit suite green. Coverage deepening (negative cases, Scenario Outlines, assertion tightening) remains a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…asses

Per review preference, drop the trait composition (WorldTrait + 5 step traits in one FeatureContext) in favor of plain classes:

- World -- shared per-scenario state + helpers (the tester, request dispatch, position/assertion helpers); not a Context.
- WorldExtension / WorldArgumentResolver -- a small Behat extension that constructor-injects a fresh World into every context (tag context.argument_resolver) and resets it before each scenario/example (subscribes to ScenarioTested/ExampleTested BEFORE). The reset-before-construct ordering is guaranteed by Behat.
- ServerContext -- cross-theme fixture Givens + generic request dispatchers.
- {Navigate,Edit,Understand,Validate,Find}Context -- one class per theme, each `__construct(World $world)` and delegating shared concerns to it.

Pure refactor: no feature files change. Full suite 39 passed (2 @todo skipped), deterministic, parallel conflict-free; unit suite green. Per-scenario isolation verified by content-conflicting scenarios (e.g. references asserts exactly 4 locations; 5 completion scenarios reuse URIs with different content).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…names

Add World::textForRange / decodeSemanticTokens (range-as-text helpers). Navigate now asserts: each reference/implementation/highlight covers the exact source text (not just a uri/count); the document outline's class has exactly 5 nested members with the right kinds and a selectionRange covering the name; workspace search returns exactly one result of kind class; call-hierarchy incoming/outgoing use exact names (App\persist); type-hierarchy entries carry the expected fqn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Code actions now assert kind + the actual edit: import inserts the use statement (refactor.rewrite), optimize removes the unused-use line (source.organizeImports), the typo fix replaces "nul" with "null" (quickfix). Code lens resolves to the exact "2 usages" and carries the showReferences locations. Rename edits each cover the old name; willRename inserts the new class name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ions

Signature label asserted exactly; hover requires the full pinned substring set (specialized FQN, `T`, App\Box, Stringable). Inlay asserts exactly one hint and its character position just after $first. Folding asserts the region kind. Semantic tokens decode to (text,type) and assert a typeParameter token actually covering "T". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
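
For readers unfamiliar with the wire format behind "decode to (text,type)": LSP semantic tokens arrive as a flat integer array, five integers per token (deltaLine, deltaStartChar, length, tokenType, tokenModifiers), with positions encoded relative to the previous token. The Python sketch below decodes such a stream into (text, type) pairs. The function name, the two-entry legend, and the `class Box<T>` fixture are assumptions for the example; the real legend comes from the server's capabilities.

```python
def decode_semantic_tokens(data: list[int], token_types: list[str], source: str) -> list[tuple[str, str]]:
    """Decode an LSP semantic-token stream into (covered text, token type) pairs."""
    if len(data) % 5 != 0:
        raise ValueError("semantic token data must be a multiple of 5 integers")
    lines = source.split("\n")
    line = char = 0
    decoded = []
    for i in range(0, len(data), 5):
        delta_line, delta_char, length, type_index, _modifiers = data[i:i + 5]
        if delta_line:
            # New line: the start character is absolute on that line.
            line += delta_line
            char = delta_char
        else:
            # Same line: the start character is relative to the previous token.
            char += delta_char
        decoded.append((lines[line][char:char + length], token_types[type_index]))
    return decoded


# Hypothetical legend and fixture; two tokens on line 1: "Box" and "T".
legend = ["class", "typeParameter"]
fixture = "<?php\nclass Box<T> {}\n"
stream = [1, 6, 3, 0, 0,
          0, 4, 1, 1, 0]
tokens = decode_semantic_tokens(stream, legend, fixture)
```

Checking the decoded pair ("T", "typeParameter") is stricter than checking that some typeParameter token exists, since it also pins the token's position and length.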

Each diagnostic now asserts the exact source text its range underlines: undefined-name -> "nul", bound violation -> "Box", ctor-arg-mismatch -> "new User()", in addition to the code + message. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…mentation

Completion now asserts the Plastic item's kind (class) and detail (App\Models\Plastic) alongside its exact insertText; completionItem/resolve asserts the documentation equals "A user account." exactly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…line

Negatives: go-to-definition of an undeclared class returns null; an interface with no implementers yields 0 locations; a no-match workspace search is empty. Convert the per-member outline assertions into a Scenario Outline over (kind, member). Adds a shared `the response is null` step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A clean cursor position offers no code actions; renaming a non-symbol position (a literal) returns a null WorkspaceEdit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tline

Negatives: hover over a literal is null; a file with no generic assignment yields zero inlay hints; signature help outside a call is null. Convert the active-parameter checks into a Scenario Outline over (cursor, param). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A well-formed file reports no diagnostics through the pull handler. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert prefix filtering into a Scenario Outline over (prefix, match, other) -- including a parameterized fixture. Negatives: a prefix matching no class suggests none; resolving a class with no docblock adds no documentation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a behat-lsp job to ci-lsp.yml (alongside phpunit-lsp) that installs the isolated tools/behat tooling and runs `make test/behat` on every PR and push to main. The suite drives the real LSP dispatcher fully in-memory; @todo scenarios are skipped via the gherkin tag filter, so the run is green. Also pass -d memory_limit=-1 to the Behat command so the first scenario's worse-reflection stub-map build (~512M, like the PHPUnit handler tests) doesn't OOM on a cold CI cache. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

math3usmartins and others added 30 commits June 2, 2026 23:29

test(navigate): go-to-implementation behavior spec

b7b4ec8

List the direct implementers of an interface across open documents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(navigate): document-highlight behavior spec

2ca815c

Highlight the class declaration plus both usages in the current file (3 hits). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(navigate): document-symbol outline behavior spec

0053f91

Outline a class with its constant, properties, constructor and method. Adds a document-level request dispatcher and a recursive outline assertion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(navigate): workspace-symbol search behavior spec

7349e13

Filter project symbols by a case-insensitive substring of the short name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(navigate): call-hierarchy behavior spec

eba0774

Prepare a call-hierarchy item at a method, then walk incoming calls (callers) and outgoing calls (callees) across open documents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(navigate): type-hierarchy behavior spec

fa875e0

Prepare a type-hierarchy item, then walk supertypes (parent class) and subtypes (interface implementers) across open documents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(edit): rename behavior spec

cd8063f

Rename a class and have its declaration plus the use import and instantiation all rewritten (2 files, 3 edits). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(edit): code-action behavior spec

ed591e9

Quick-fixes from all three providers: import an unresolved class, optimize (remove) an unused import, and fix an undefined-name typo from a diagnostic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(edit): code-lens behavior spec

83c0ada

Emit a "Show references" lens above a declaration and lazily resolve it to a usage count via codeLens/resolve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(edit): workspace/willRenameFiles behavior spec

8b2b1b0

Renaming a file whose basename matches its single class renames the class and updates the importing file -- driven entirely from open documents, no disk. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(understand): signature-help behavior spec

36a04ae

Show a free function's signature with the active parameter index, and advance the active parameter past a comma. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(understand): folding-range behavior spec

0f58384

Fold the class body and each method body; single-line declarations are not folded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(understand): semantic-tokens behavior spec

a043332

Emit a non-empty, 5-int-aligned token stream that classifies the generic T as a typeParameter token. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(find): completion-item resolve behavior spec

7597f47

Resolving a class completion item lazily enriches it with the class docblock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

math3usmartins and others added 9 commits June 3, 2026 06:53

test(edit): add negative cases (no code actions / no rename edit)

b1eec28

A clean cursor position offers no code actions; renaming a non-symbol position (a literal) returns a null WorkspaceEdit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

test(validate): add a clean-file negative (no diagnostics)

386dd51

A well-formed file reports no diagnostics through the pull handler. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

math3usmartins merged commit 6b5feed into main Jun 3, 2026
3 checks passed

math3usmartins merged 39 commits into main from test/behat-executable-specs
