Add acceptance tests using behat #4

Merged
math3usmartins merged 39 commits into main from test/behat-executable-specs
Jun 3, 2026

Conversation

@math3usmartins

Member

Summary

Turns the features/ Gherkin specs into a real, end-to-end Behat acceptance
suite that drives the production language server and runs in CI. The specs are
now executable, living documentation: 58 scenarios / 339 steps covering the
LSP surface across five themes -- Navigate, Edit, Understand, Validate, Find.

Everything runs fully in-memory (no stdio, sockets, or files), so the suite
is isolated and parallel-safe, and a new behat-lsp CI job gates every PR.

What it does

Each scenario drives the real server end-to-end via phpactor's
LanguageServerTester: it builds the production LspDispatcherFactory, runs the
initialize/ServerCapabilities handshake, opens fixtures through
textDocument/didOpen, and routes real JSON-RPC requests through the full
middleware + argument-resolver stack to the actual handlers. There is no
re-derived copy of the wiring, so the tests and production can't drift.
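
As an illustration of the messages involved, here is a minimal sketch of the JSON-RPC 2.0 envelopes for a textDocument/didOpen notification and a textDocument/definition request, built as plain PHP arrays. The helper names, the fixture URI, the languageId value, and the fixture contents are assumptions made for this example; it does not show how LanguageServerTester constructs or sends its messages.

```php
<?php
declare(strict_types=1);

/** Build a JSON-RPC 2.0 notification (no id, so no response is expected). */
function notification(string $method, array $params): array
{
    return ['jsonrpc' => '2.0', 'method' => $method, 'params' => $params];
}

/** Build a JSON-RPC 2.0 request (carries an id that the response echoes). */
function request(int $id, string $method, array $params): array
{
    return ['jsonrpc' => '2.0', 'id' => $id, 'method' => $method, 'params' => $params];
}

// Hypothetical fixture: the URI and contents are illustrative only.
$uri = 'file:///project/src/Box.php';

$didOpen = notification('textDocument/didOpen', [
    'textDocument' => [
        'uri' => $uri,
        'languageId' => 'php', // assumed language id
        'version' => 1,
        'text' => "<?php\nclass Box {}\n",
    ],
]);

$definition = request(1, 'textDocument/definition', [
    'textDocument' => ['uri' => $uri],
    // LSP positions are zero-based: line 1 is the second line of the file.
    'position' => ['line' => 1, 'character' => 6],
]);

echo json_encode($definition, JSON_UNESCAPED_SLASHES), PHP_EOL;
```

In the suite itself these payloads never leave the process; the point of the sketch is only the shape that the middleware and argument resolvers have to deserialize.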

Coverage by theme:

  • Navigate -- definition, type-definition, references, implementation,
    document & workspace symbols, document highlight, call hierarchy, type
    hierarchy.
  • Edit -- rename, code actions (import / optimize / diagnostic-fix), code
    lens (+resolve), workspace/willRenameFiles.
  • Understand -- hover, signature help, inlay hints, folding ranges, semantic
    tokens.
  • Validate -- diagnostics: parse, undefined-name, bound violation,
    constructor-arg mismatch (pull-mode, through the real diagnostics handler).
  • Find -- completion (type-arg suggestions, scope-aware insert text, prefix
    and bound filtering) and completionItem/resolve.
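
For the semantic-tokens coverage, the LSP encodes each token as five integers (deltaLine, deltaStart, length, tokenType, tokenModifiers), with positions relative to the previous token. The sketch below decodes such a stream into absolute positions. The function name, the two-entry legend, and the sample stream are assumptions for illustration; the suite's own decoder may differ.

```php
<?php
declare(strict_types=1);

/**
 * Decode an LSP semantic-token stream into absolute positions.
 *
 * @param list<int>    $data   flat stream, 5 ints per token
 * @param list<string> $legend token type names, indexed by tokenType
 * @return list<array{line:int,char:int,length:int,type:string}>
 */
function decodeTokens(array $data, array $legend): array
{
    if (count($data) % 5 !== 0) {
        throw new InvalidArgumentException('token stream is not 5-int aligned');
    }

    $tokens = [];
    $line = 0;
    $char = 0;
    foreach (array_chunk($data, 5) as [$deltaLine, $deltaStart, $length, $type]) {
        $line += $deltaLine;
        // deltaStart is relative to the previous token only on the same line;
        // on a new line it is an absolute column.
        $char = $deltaLine === 0 ? $char + $deltaStart : $deltaStart;
        $tokens[] = ['line' => $line, 'char' => $char, 'length' => $length, 'type' => $legend[$type]];
    }

    return $tokens;
}

// Hypothetical legend and stream for "class Box<T> {}" on line 1:
// "Box" starts at column 6, "T" at column 10.
$legend = ['class', 'typeParameter'];
$tokens = decodeTokens([1, 6, 3, 0, 0, 0, 4, 1, 1, 0], $legend);
```

Once positions are absolute, an assertion can check both that a typeParameter token exists and which source text it covers.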

Assertions are exact and grounded in the existing PHPUnit ground truth:
covered source text for ranges (references, diagnostics underlines, code-action
edits, rename edits, selection ranges, semantic tokens), exact counts and
structure (outline nesting, hint counts), exact labels/kinds/details, and
negative cases (null/empty results where nothing should match). Repetition is
collapsed with Scenario Outlines (document-symbol members, signature
parameters, completion prefixes).
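
The "covered source text" style of assertion can be sketched as a small range-to-text helper. This is an illustrative stand-in, not the suite's implementation: it assumes ASCII fixtures, so that LSP character offsets equal byte offsets, and "\n" line endings.

```php
<?php
declare(strict_types=1);

/**
 * Return the source text covered by an LSP range.
 * Assumes ASCII content (character offset == byte offset) and "\n" line endings.
 *
 * @param array{start:array{line:int,character:int},end:array{line:int,character:int}} $range
 */
function textForRange(string $source, array $range): string
{
    $lines = explode("\n", $source);
    $start = $range['start'];
    $end = $range['end'];

    if ($start['line'] === $end['line']) {
        return substr($lines[$start['line']], $start['character'], $end['character'] - $start['character']);
    }

    // Multi-line range: tail of the first line, whole middle lines, head of the last line.
    $covered = [substr($lines[$start['line']], $start['character'])];
    for ($i = $start['line'] + 1; $i < $end['line']; $i++) {
        $covered[] = $lines[$i];
    }
    $covered[] = substr($lines[$end['line']], 0, $end['character']);

    return implode("\n", $covered);
}

// Hypothetical fixture with an undefined-name typo on line 1.
$source = "<?php\n\$x = nul;\n";
$range = ['start' => ['line' => 1, 'character' => 5], 'end' => ['line' => 1, 'character' => 8]];
$underlined = textForRange($source, $range);
```

Asserting on the extracted text rather than on raw line/character numbers keeps a scenario readable and makes an off-by-one range fail with an obvious diff.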

How it's wired

  • Isolated Behat install under tools/behat/ with its own composer.json --
    Behat 3.x caps symfony/console at ^7 while the root pins ^8 (via
    xphp-lang/xphp), so it can't live in the root require-dev. A files-autoload
    pulls in the root autoloader; tools/behat/bootstrap.php sets
    XPHP_LSP_QUIET=1 to silence the warmers' stderr output.
  • Plain context classes, not traits: a World value object holds the
    per-scenario state (the tester, fixtures, last response, helpers) and is
    constructor-injected into each context by a small Behat extension
    (WorldExtension + WorldArgumentResolver), which also resets it before each
    scenario/outline example. ServerContext owns the cross-theme Givens and
    generic request dispatch; one *Context class per theme holds its steps.
  • Deferred behavior is written as @todo scenarios (skipped via a gherkin
    tag filter) so the suite stays green on what's expected to work.
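
The shared-state pattern in the second bullet can be sketched in plain PHP without Behat. Everything here is simplified and hypothetical: the World fields, the step wording, and the manual wiring at the bottom stand in for what WorldExtension and WorldArgumentResolver do, and real contexts would implement Behat's Context interface.

```php
<?php
declare(strict_types=1);

/** Per-scenario state shared by all contexts (hypothetical, simplified fields). */
final class World
{
    /** @var array<string,string> fixture contents keyed by URI */
    public array $fixtures = [];
    public mixed $lastResponse = null;

    /** Drop all state so the next scenario starts clean. */
    public function reset(): void
    {
        $this->fixtures = [];
        $this->lastResponse = null;
    }
}

/** Cross-theme Givens; receives the World through its constructor. */
final class ServerContext
{
    public function __construct(private World $world) {}

    /** @Given the file :uri contains :text */
    public function theFileContains(string $uri, string $text): void
    {
        $this->world->fixtures[$uri] = $text;
    }
}

/** A theme context that reads state another context wrote into the World. */
final class NavigateContext
{
    public function __construct(private World $world) {}

    /** @Then :count fixtures are open */
    public function fixturesAreOpen(int $count): void
    {
        if (count($this->world->fixtures) !== $count) {
            throw new RuntimeException('unexpected fixture count');
        }
    }
}

// Manual stand-in for the extension: one World injected into every context...
$world = new World();
$server = new ServerContext($world);
$navigate = new NavigateContext($world);

$server->theFileContains('file:///a.php', '<?php class A {}');
$navigate->fixturesAreOpen(1);

// ...and reset before the next scenario or outline example.
$world->reset();
$navigate->fixturesAreOpen(0);
```

Keeping the state in one injected object, rather than in traits mixed into a single FeatureContext, lets each theme class stay small while still seeing the fixtures that ServerContext opened.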

CI

A new behat-lsp job in .github/workflows/ci-lsp.yml installs the isolated
tooling and runs make test/behat on every PR and push to main, in parallel
with the PHPUnit gate. The Behat command runs with memory_limit=-1 so the first
scenario's worse-reflection stub-map build (~512M) does not run out of memory
on a cold CI cache.

Known gaps (tracked as @todo)

  • Go-to-definition through a generic method call doesn't yet resolve to the
    method declaration (class/type-arg jumps work).
  • The duplicate-template diagnostic is detected by the analyzer but the
    per-file pull provider canonicalizes the edited file, so it surfaces on the
    other file; surfacing it on the edited file needs the roadmap's cross-file
    diagnostic broadcast.

Testing

  • make test/behat -- 58 scenarios / 339 steps pass (deterministic across runs).
  • make test/behat/parallel -- one process per feature, conflict-free.
  • make test/unit -- unchanged and green (889 tests); no src/ changes in this
    PR -- it is purely additive test tooling + specs.

Notes for reviewers

  • No production code was changed; every "expected to work" feature already
    behaved correctly. The work is specs + the in-memory harness + CI.
  • The harness models fixtures as open documents, so it exercises the
    open-document resolution path; the filesystem-index path is intentionally out
    of scope here.

math3usmartins and others added 30 commits June 2, 2026 23:29
Specify expected Language Server behavior for go-to-definition, hover, and
inlay hints across files. Each scenario arranges fixture file contents and a
warmed FQN index (Given), issues a single LSP request (When), and asserts the
response (Then). Resolution is expected to work through the filesystem index
regardless of editor open/closed state.

These are living specifications, not yet wired to an executable Behat harness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…el-safe)

Wire the features/ specs to the real LSP handlers via a Behat FeatureContext
that opens every fixture as an in-memory TextDocumentItem -- nothing is written
to disk. Each scenario builds its own workspace + handler stack, so the suite
shards across processes with identical, deterministic results (verified).

Behat lives in an isolated tools/behat install rather than the root require-dev:
Behat 3.x caps symfony/console at ^7 while the project pins ^8 via xphp-lang/xphp.
A files-autoload pulls in the root autoloader so the context resolves XPHP\Lsp\*;
psr/log is pinned to 1.1.4 to match the root and a bootstrap.php silences PHP 8.4
deprecations before the root autoloader loads (mirrors test/bootstrap.php).

Specs run STRICT: scenarios are written to desired behavior, so the ones the
server doesn't yet satisfy fail by design (2 passed, 7 failed) as an executable
backlog. Behat is therefore NOT part of the test/unit gate.

  make test/behat            # sequential
  make test/behat/parallel   # one process per feature (pre-warms shared stub cache)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous FeatureContext recorded sources and nulled the workspace on each
fixture, deferring all opens to a rebuild. Replace that with a single workspace
(created per scenario in the constructor) that each fixture is opened into
directly. The handler stack is built once and resolves against the live
workspace, so multi-file scenarios -- several files open at once -- are modeled
naturally without rebuild/invalidate juggling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract the shared in-memory world (workspace, full handler stack mirroring
LspDispatcherFactory, fixture Givens, position/assertion helpers) into
WorldTrait, and split the step definitions into one trait per theme: Navigate,
Edit, Understand, Validate, Find. FeatureContext is now a thin aggregator that
composes them. Pure refactor -- existing scenarios unchanged (2 passed, 7
failed); unit suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cross-file go-to-definition: jump from a generic instantiation to the class
declaration, and from a type-argument to the imported class. The generic-method
jump is tagged @todo (not yet resolved). Add a global @todo gherkin filter so
deferred scenarios are skipped and the suite stays green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Jump from a variable use to the class of its inferred type via the
worse-reflection-backed resolver. Add the typeDefinition dispatch and make the
"points to" matcher tolerant of the file:// URIs worse-reflection emits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Find usages of a class across open documents: the declaration, the use import,
the instantiation, and a fully-qualified type hint (4 locations). Adds the
references/implementation/documentHighlight position dispatch and list-location
assertion steps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
List the direct implementers of an interface across open documents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Highlight the class declaration plus both usages in the current file (3 hits).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Outline a class with its constant, properties, constructor and method. Adds a
document-level request dispatcher and a recursive outline assertion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Filter project symbols by a case-insensitive substring of the short name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepare a call-hierarchy item at a method, then walk incoming calls (callers)
and outgoing calls (callees) across open documents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prepare a type-hierarchy item, then walk supertypes (parent class) and subtypes
(interface implementers) across open documents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename a class and have its declaration plus the use import and instantiation
all rewritten (2 files, 3 edits).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Quick-fixes from all three providers: import an unresolved class, optimize
(remove) an unused import, and fix an undefined-name typo from a diagnostic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Emit a "Show references" lens above a declaration and lazily resolve it to a
usage count via codeLens/resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Renaming a file whose basename matches its single class renames the class and
updates the importing file -- driven entirely from open documents, no disk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hover over a generic instantiation shows the specialized type ("Specializes
to:"), and hover over a type parameter explains it and its bound. Replaces the
earlier idealized cross_file_hover spec with assertions matching real output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A generic method call ($users->first() where $users is Collection<User>) renders
the substituted return type ": ?App\Models\User" after the assignment. Replaces
the earlier idealized inlay spec with the real FQN-qualified output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Show a free function's signature with the active parameter index, and advance
the active parameter past a comma.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fold the class body and each method body; single-line declarations are not
folded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Emit a non-empty, 5-int-aligned token stream that classifies the generic T as a
typeParameter token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diagnostics produced in-memory over the open workspace: syntax error,
undefined-bareword warning, generic bound violation, and constructor argument
mismatch. Duplicate-template detection works in the analyzer but is tagged @todo
here because the per-file pull provider canonicalizes the edited file -- the
duplicate surfaces on the other file, pending cross-file diagnostic broadcast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Context-aware type-argument completion: suggest workspace classes, choose the
fully-qualified vs short insert text by import scope, filter by typed prefix,
and filter by a generic bound (Stringable). Adds the Find step trait.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolving a class completion item lazily enriches it with the class docblock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Point the parallel Make target at the theme subdirs (find features -name) and
warm the cache via navigate/definition. Rewrite features/README to describe the
theme layout, the WorldTrait + per-theme step traits, and the @todo scenarios.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review finding #1 (the harness hand-wired handlers and bypassed the
dispatch layer, risking drift from production). Replace WorldTrait's ~150-line
re-derived handler stack with phpactor's LanguageServerTester, which builds the
production LspDispatcherFactory and routes real JSON-RPC through the full
middleware + argument-resolver stack. Scenarios now exercise:

  - the real initialize / ServerCapabilities handshake
  - JSON-RPC routing and middleware
  - textDocument/didOpen sync (fixtures opened via the server)
  - request-param deserialization (typed *Params, plus the LspObject resolver
    for codeLens/resolve and completionItem/resolve)
  - the real XphpPullDiagnosticsHandler (textDocument/diagnostic, pull mode)

There is now a single source of truth for the wiring (the factory), so the test
and production graphs cannot drift. Handler results come back typed and raw
(HandlerMethodRunner returns the handler's value unserialized), so the Then
assertions are unchanged; only the When steps now dispatch through the tester.

Everything stays in-memory (TestMessageTransmitter buffer; no stdio/sockets/
files), so parallel sharding remains conflict-free. bootstrap.php sets
XPHP_LSP_QUIET=1 via putenv to silence the warmers' stderr (shell env-prefixes
don't propagate through the containerized php proxy). Full suite: 39 passed
(2 @todo skipped); unit suite green.

Coverage deepening (negative cases, Scenario Outlines, assertion tightening)
remains a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…asses

Per review preference, drop the trait composition (WorldTrait + 5 step traits in
one FeatureContext) in favor of plain classes:

  - World            -- shared per-scenario state + helpers (the tester, request
                        dispatch, position/assertion helpers); not a Context.
  - WorldExtension /
    WorldArgumentResolver
                     -- a small Behat extension that constructor-injects a fresh
                        World into every context (tag context.argument_resolver)
                        and resets it before each scenario/example (subscribes to
                        ScenarioTested/ExampleTested BEFORE). The reset-before-
                        construct ordering is guaranteed by Behat.
  - ServerContext    -- cross-theme fixture Givens + generic request dispatchers.
  - {Navigate,Edit,Understand,Validate,Find}Context
                     -- one class per theme, each `__construct(World $world)` and
                        delegating shared concerns to it.

Pure refactor: no feature files change. Full suite 39 passed (2 @todo skipped),
deterministic, parallel conflict-free; unit suite green. Per-scenario isolation
verified by content-conflicting scenarios (e.g. references asserts exactly 4
locations; 5 completion scenarios reuse URIs with different content).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…names

Add World::textForRange / decodeSemanticTokens (range-as-text helpers). Navigate
now asserts: each reference/implementation/highlight covers the exact source text
(not just a uri/count); the document outline's class has exactly 5 nested members
with the right kinds and a selectionRange covering the name; workspace search
returns exactly one result of kind class; call-hierarchy incoming/outgoing use
exact names (App\persist); type-hierarchy entries carry the expected fqn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code actions now assert kind + the actual edit: import inserts the use
statement (refactor.rewrite), optimize removes the unused-use line
(source.organizeImports), the typo fix replaces "nul" with "null" (quickfix).
Code lens resolves to the exact "2 usages" and carries the showReferences
locations. Rename edits each cover the old name; willRename inserts the new
class name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
math3usmartins and others added 9 commits June 3, 2026 06:53
…ions

Signature label asserted exactly; hover requires the full pinned substring set
(specialized FQN, `T`, App\Box, Stringable). Inlay asserts exactly one hint and
its character position just after $first. Folding asserts the region kind.
Semantic tokens decode to (text,type) and assert a typeParameter token actually
covering "T".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each diagnostic now asserts the exact source text its range underlines:
undefined-name -> "nul", bound violation -> "Box", ctor-arg-mismatch ->
"new User()", in addition to the code + message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mentation

Completion now asserts the Plastic item's kind (class) and detail
(App\Models\Plastic) alongside its exact insertText; completionItem/resolve
asserts the documentation equals "A user account." exactly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…line

Negatives: go-to-definition of an undeclared class returns null; an interface
with no implementers yields 0 locations; a no-match workspace search is empty.
Convert the per-member outline assertions into a Scenario Outline over (kind,
member). Adds a shared `the response is null` step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A clean cursor position offers no code actions; renaming a non-symbol position
(a literal) returns a null WorkspaceEdit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tline

Negatives: hover over a literal is null; a file with no generic assignment
yields zero inlay hints; signature help outside a call is null. Convert the
active-parameter checks into a Scenario Outline over (cursor, param).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A well-formed file reports no diagnostics through the pull handler.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert prefix filtering into a Scenario Outline over (prefix, match, other) --
including a parameterized fixture. Negatives: a prefix matching no class
suggests none; resolving a class with no docblock adds no documentation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a behat-lsp job to ci-lsp.yml (alongside phpunit-lsp) that installs the
isolated tools/behat tooling and runs `make test/behat` on every PR and push to
main. The suite drives the real LSP dispatcher fully in-memory; @todo scenarios
are skipped via the gherkin tag filter, so the run is green.

Also pass -d memory_limit=-1 to the Behat command so the first scenario's
worse-reflection stub-map build (~512M, like the PHPUnit handler tests) doesn't
OOM on a cold CI cache.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@math3usmartins math3usmartins merged commit 6b5feed into main Jun 3, 2026
3 checks passed
