feat: compute iso-Spectral source ranges for YAML ruleset diagnostics (#32)#33
feat: compute iso-Spectral source ranges for YAML ruleset diagnostics (#32)#33eskenazit wants to merge 3 commits into
Conversation
Document the intrinsic ambiguity of paths that traverse a key containing a
dot: a flat "x-meta.owner" key is indistinguishable from a nested
x-meta: { owner }, both keyed as $.x-meta.owner. The note is added to
JacksonSourceMap and to JsonPathEvaluator.toDotNotation.
Reword the existing dotted-key test to state honestly that it proves the
round-trip coincides for an isolated dotted key, not that the path is
unambiguous. Add a collision test pinning the putIfAbsent "first token wins"
behaviour when a flat dotted key and a nested structure both map to
$.x-meta.owner.
Add an inline comment documenting the putIfAbsent invariant, and document
that RuleExecutor.effectivePath's field argument must be a simple identifier
(not a JSONPath or bracket expression).
Refs #32
Review feedback addressed — commit 504f3f6Thanks for the review. All three findings are now addressed in 🔴 Finding 1 — dotted-key ambiguity (
🟡 Finding 2 — 🔵 Finding 3 — Validation: This PR remains a first step toward #32 (built-in functions only); the polyglot |
Document the intrinsic ambiguity of paths that traverse a key containing a
dot: a flat "x-meta.owner" key is indistinguishable from a nested
x-meta: { owner }, both keyed as $.x-meta.owner. The note is added to
JacksonSourceMap and to JsonPathEvaluator.toDotNotation.
Reword the existing dotted-key test to state honestly that it proves the
round-trip coincides for an isolated dotted key, not that the path is
unambiguous. Add a collision test pinning the putIfAbsent "first token wins"
behaviour when a flat dotted key and a nested structure both map to
$.x-meta.owner.
Add an inline comment documenting the putIfAbsent invariant, and document
that RuleExecutor.effectivePath's field argument must be a simple identifier
(not a JSONPath or bracket expression).
Refs #32
504f3f6 to
9799e8f
Compare
…st (#32) Review feedback (PR #33, @eskenazit): forContentShouldLocateScalarInside ArrayObjectOnYaml only asserted startLine for $.consumes[0].baseUri, while every other golden scalar (unquoted, quoted, escaped, block) asserts both startColumn and endColumn. A regression in column computation for array-nested scalars would have gone undetected. Add startColumn (13, the opening quote), endColumn (35, exclusive past the closing quote), endLine, and an endColumn > startColumn guard so the array-nested case carries the same column coverage as the rest of the golden fixture. Values measured empirically against JacksonSourceMap, iso-Spectral. No production change.
…agnostics (#32) Compute 0-based, end-exclusive source ranges (iso-Spectral) for built-in ruleset diagnostics on YAML capabilities. JacksonSourceMap spans each scalar's full token (plain, quoted, escaped, block, array-nested), and the range is propagated through RulesetValidator to every Diagnostic. Non-YAML formats keep SourceMap.NONE pending #34-#37. Refs #32
…nges (#32) Wire polyglot (JS) ruleset functions into production validation and resolve a source range per violation. The SPI carries a Violation record (message + path), FunctionRegistry resolves functions per ruleset, and PolyglotRuleFunction extracts the JsonPath of each violation so the diagnostic carries an iso-Spectral range. Includes golden and path-extraction tests. Refs #32
d2bf272 to
a45106b
Compare
|
@jlouvel this is now OK for review ! FYI this issue is blocking crafter using Polychro |
jlouvel
left a comment
There was a problem hiding this comment.
Thorough, high-quality PR for issue #32. The iso-Spectral golden fixtures with documented Spectral provenance, the exhaustive YAML end-scan edge-case coverage, the graceful degradation (unparseable content -> SourceMap.NONE, unknown path -> null range), and the backward-compatible FunctionProvider SPI evolution (deprecated functions() with a default delegating to the new context-aware overload) are all excellent.
No blocking issues. The 5 notes below are polish / hardening suggestions only:
- RuleExecutor: evaluatePaths runs a second JSONPath traversal alongside evaluate, doubling cost per rule.
- JacksonSourceMap: content.split is re-run per YAML scalar, risking O(n^2) on large documents.
- JacksonSourceMap: block-scalar end +1 is intentional but easy to misread at the call site.
- RuleExecutor: the synthetic Document Javadoc claims the null format is never read; worth hardening the note.
- PolyglotRuleFunction: the retained extractMessages shim may be unused now that the executor only calls evaluateViolations.
| for (String given : rule.given()) { | ||
| List<JsonNode> matches = evaluator.evaluate(root, given); | ||
| for (JsonNode match : matches) { | ||
| List<String> matchPaths = evaluator.evaluatePaths(root, given); |
There was a problem hiding this comment.
For every given selector, evaluatePaths(root, given) is called immediately after evaluate(root, given), performing a second full JSONPath traversal per rule. On large documents this roughly doubles the JSONPath cost. The correctness relies on Jayway returning the value-list and the AS_PATH_LIST path-list in identical order and cardinality (which the pathAt fallback hints is not guaranteed). Consider deriving both from a single traversal, or add a comment confirming the order/cardinality contract. Not blocking.
| * @return a two-element array {@code [endLine, endColumn]} (both 0-based, end exclusive) | ||
| */ | ||
| static int[] endOfYamlScalar(String content, int startLine, int startColumn) { | ||
| String[] lines = content.split("\n", -1); |
There was a problem hiding this comment.
endOfYamlScalar calls content.split("\n", -1) and is invoked once per YAML scalar via toRange. For a document with many scalars this re-splits the entire content on every call, giving O(n^2) behaviour on large files. Consider splitting the content into lines once in index() and threading the line array down to the end-scan helpers. Minor / perf only.
| trimmed--; | ||
| } | ||
| lastLine = i; | ||
| // include the trailing newline folded into the block scalar (iso-Spectral). |
There was a problem hiding this comment.
lastCol = trimmed + 1 to include the folded trailing newline is correct and iso-Spectral, but the + 1 differs from the plain mono-line scalar end (no +1) and produces an end column one past the physical line length. The class Javadoc explains the intent; a one-line comment right here reaffirming the +1 is the deliberate folded-newline inclusion would make the call site harder to misread. Nit.
| * {@link Document}; production validation goes through {@link #execute(Rule, Document)}. | ||
| * | ||
| * <p>The synthetic {@link Document} carries {@link SourceMap#NONE} precisely so range resolution | ||
| * is skipped: every lookup returns a {@code null} range and the {@code null} format is never |
There was a problem hiding this comment.
The Javadoc states the synthetic Document(root, null, null, SourceMap.NONE, null) is safe because the null format is never read. That holds today only because SourceMap.NONE.resolve(...) short-circuits before any format-dependent logic runs. A brief note that this invariant depends on SourceMap.NONE never triggering format resolution would harden it against a future change to the NONE path. Nit.
|
|
||
| /** | ||
| * Convert a script's result array into the {@code message} strings only, dropping any per-result | ||
| * {@code path}. Retained as a thin shim over {@link #extractViolations(Value)} for callers (and |
There was a problem hiding this comment.
Now that evaluate() delegates to evaluateViolations() and RuleExecutor only calls evaluateViolations, the retained extractMessages(Value) shim appears to have no production caller left. If it is only used by tests, a short note to that effect (or removing it) would avoid the impression that messages and violations are extracted on two separate passes. Confirm no path executes the sandboxed JS twice. Nit.
jlouvel
left a comment
There was a problem hiding this comment.
Add minor comments, not blocking so approving
Related Issue
Closes #32
What does this PR do?
Implements #32 — compute source ranges for ruleset diagnostics (path → SourceRange) — in
full for YAML documents, across both built-in and polyglot (JS/Python/Groovy) functions.
Until now every ruleset diagnostic carried
range = nulland apathequal to the rule'sgivenselector (e.g.$.consumes[*].baseUri), so consumers (CLI, SARIF, SDKs, the Craftereditor extension) could not map a violation back to a precise location.
Built-in functions — real
SourceMap+ concrete pathsSourceMapfor structured documents —JacksonSourceMapbuilds a dot-notationpath → SourceRangeindex by streaming the YAML/JSON content with Jackson and recordingeach token's
JsonLocation.Document.fromString(...)attaches it for structureddocuments (non-structured branch explicitly returns
SourceMap.NONE).end, matching Spectral and the LSP /
vscode.Positionconvention the Crafter extensionconsumes. A quoted scalar starts at the opening quote and ends just past the closing quote.
scalar (plain, single/double-quoted, block
|/>), so a violation underlines the wholevalue. Quoting/escaping (
\",\\,'') and trailing# comment/ whitespace are honoured.JsonPathEvaluatorenumerates the concrete matched path per violation(e.g.
$.consumes[0].baseUri) via JsonPathAS_PATH_LIST, converted to dot notation.RuleExecutor/RulesetValidator— the resolved path looks up theSourceRangein the document'sSourceMapand is used as the diagnosticpath. Anunresolvable path falls back to
range = null, exactly as before — no regression.Polyglot functions — per-violation path carried through the SPI (Layer 1)
RuleFunctionSPI evolved toList<Violation>— a newViolation(message, path)recordand a default
evaluateViolations(...)let custom functions report a per-violation relativepath. The 12 built-ins inherit the default (path
null), so they are unchanged.FunctionRegistrywires the ruleset context — custom providers are discovered viaServiceLoaderand receive the ruleset'sfunctionsDir/functions, so polyglot functionsnow actually execute in production (they were silently ignored before).
PolyglotRuleFunctionreadsmessage+pathfrom each returned item; the path iscombined with the rule base path and resolved against the
SourceMap, so polyglot-functionviolations now carry a real
SourceRangeinstead ofnull.Scope — what is NOT in this PR
NONErange on purpose, tracked under the [epic] feat: compute source ranges for all document formats #38 epic:JSON (feat: compute source ranges for JSON ruleset diagnostics #34), XML (feat: compute source ranges for XML ruleset diagnostics #35), Markdown (feat: compute source ranges for Markdown ruleset diagnostics #36), HTML (feat: compute source ranges for HTML ruleset diagnostics #37). YAML is the only format the Crafter
migration needs first, so it is the only one with full start+end spans here.
Tests
JacksonSourceMapTest) — resolves a known path to the expected 0-based,end-exclusive range for plain, quoted, escaped and block scalars; unknown paths and empty
content return
null. Targets are frozen against a Spectral provenance fixture.JacksonSourceMapEndScanTest) — branch-level coverage of the YAMLend-of-scalar scanners, including bounds guards and a terminal
#(empty trailingcomment, with one/two leading spaces and glued to the value) so the comment-marker scan
reaches the last character, iso-Spectral.
ViolationTest,FunctionRegistryTest,RuleExecutorTest(
combinePath+ a custom-function violation resolving to a concrete path and range),RulesetValidatorTest,PolyglotPathExtractionTest(extractPath/extractViolationsedge cases), and
PolyglotFunctionProviderTest(context-aware loading).PolyglotRulesetEndToEndTestdrives a custom JS rule through the publicvalidator API and asserts the diagnostic carries the exact path
$.consumes[1].namespaceand the precise
SourceRange(3,15,3,18);CliIntegrationTestcheckslint … --format jsonemits a populated
rangeand a concretepath.Full reactor
mvn clean verifyis green; JaCoCo 100% line + branch coverage holds for everytouched module.
Checklist
mainAgent Context (optional)
agent_name: GitHub Copilot
llm: Naftiko Claude Opus 4.8
tool: VS Code
confidence: high
source_event: Crafter VS Code extension migration from Spectral to Polychro (YAML scope of #32)
discovery_method: code_review
files_suspected: