perf: Optimize RST parsing with pattern and instance caching #1288
Open
CybotTM wants to merge 2 commits into phpDocumentor:main
edb847a to 6d2e211
This was referenced Jan 22, 2026
bcc53c1 to b642af7
Add caching optimizations for hot paths in RST parsing:

- InlineParser: reuse a single InlineLexer instance instead of creating a new one per parse call (lexer state is fully reset via setInput())
- InlineLexer: cache the expensive hyperlink pattern built from SUPPORTED_SCHEMAS (5600+ chars) as a static variable
- LineChecker: add static caches for isDirective(), isLink(), and isAnnotation() regex results with proper cache key handling
- Buffer: ensure the unindented flag is reset in all mutators (set, pop, clear) for consistent cache invalidation
- CachableInlineRule: simplify type annotations

Note: Lexer reuse assumes single-threaded parsing. Concurrent parsing would require separate instances.

See https://cybottm.github.io/render-guides/ for benchmark data.
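To make the LineChecker item above more concrete, here is a minimal sketch of a static result cache around a regex check. It is not the code from this PR: the class name `CachedLineChecker`, the simplified directive pattern, and the choice of the raw line as cache key are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a line checker; NOT the actual LineChecker code.
final class CachedLineChecker
{
    /** @var array<string, bool> regex results keyed by the exact line text (assumed key) */
    private static array $directiveCache = [];

    public static function isDirective(string $line): bool
    {
        // isset() is true for a cached `false` as well, so negative results are reused too.
        if (isset(self::$directiveCache[$line])) {
            return self::$directiveCache[$line];
        }

        // Simplified pattern for illustration only: ".. name::" at the start of the line.
        $result = preg_match('/^\.\.\s+[\w-]+::(\s|$)/', $line) === 1;

        return self::$directiveCache[$line] = $result;
    }
}
```

A cache like this grows with the number of distinct lines seen, so a real implementation may want to bound its size or clear it between documents.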
Add SUPPORTED_SCHEMAS_LIST and isSupportedScheme() to ExternalReferenceResolver for O(1) hash set lookup instead of regex matching against 371 IANA schemes. This is ~6x faster than the 5600+ character regex pattern.

InlineLexer now uses ExternalReferenceResolver::isSupportedScheme() to validate URI schemes during tokenization.

Note: This change is also in PR phpDocumentor#1287. When both PRs merge, the conflict is trivially resolved by keeping one version.
b642af7 to 6a14fda
This was referenced Jan 23, 2026
Summary
Optimizes RST parsing with instance reuse and O(1) hash set lookups for hyperlink validation.
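As a rough illustration of the instance reuse mentioned above, the sketch below keeps one lexer per parser and resets it with `setInput()` on every call. `SketchLexer` and `SketchParser` are hypothetical names and the whitespace tokenizer is a placeholder; the real InlineLexer and InlineParser are considerably more involved.

```php
<?php

declare(strict_types=1);

// Hypothetical minimal lexer; a placeholder for the real InlineLexer.
final class SketchLexer
{
    private string $input = '';
    private int $position = 0;

    // Resets ALL state, which is what makes reusing the instance safe.
    public function setInput(string $input): void
    {
        $this->input = $input;
        $this->position = 0;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    public function next(): ?string
    {
        $length = strlen($this->input);
        while ($this->position < $length && ctype_space($this->input[$this->position])) {
            $this->position++;
        }
        if ($this->position >= $length) {
            return null;
        }
        $start = $this->position;
        while ($this->position < $length && !ctype_space($this->input[$this->position])) {
            $this->position++;
        }

        return substr($this->input, $start, $this->position - $start);
    }
}

// Hypothetical parser that creates its lexer once instead of once per parse() call.
final class SketchParser
{
    private SketchLexer $lexer;

    public function __construct()
    {
        $this->lexer = new SketchLexer();
    }

    /** @return list<string> */
    public function parse(string $text): array
    {
        $this->lexer->setInput($text); // reuse: reset state rather than allocate
        $tokens = [];
        while (($token = $this->lexer->next()) !== null) {
            $tokens[] = $token;
        }

        return $tokens;
    }
}
```

Because the lexer is shared mutable state, this pattern only holds if a parser instance is never used by two parses at the same time, which matches the single-threaded assumption stated in the commit message.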
Changes
- ExternalReferenceResolver: add SUPPORTED_SCHEMAS_LIST and isSupportedScheme() for O(1) hash set lookup
- InlineLexer: use ExternalReferenceResolver::isSupportedScheme() for URI scheme validation (~6x faster)

Performance Impact
See Performance Analysis Report for detailed benchmarks.
The hash set optimization for URI schemes provides an approximately 6x speedup over the previous regex-based approach, which matched against the 371 IANA-registered schemes.
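The idea behind the hash set lookup can be sketched as follows. This is not the PR's implementation: `SchemeSet` is a hypothetical class, its constant holds only a tiny illustrative subset rather than the full scheme list, and lowercasing the input is an assumption (URI schemes are case-insensitive, but the PR may normalize differently).

```php
<?php

declare(strict_types=1);

// Hypothetical sketch; NOT the ExternalReferenceResolver code from this PR.
final class SchemeSet
{
    /** Scheme names stored as array keys, so a lookup is a single hash probe. */
    private const SCHEMES = [
        'http' => true,
        'https' => true,
        'ftp' => true,
        'mailto' => true,
    ];

    public static function isSupportedScheme(string $scheme): bool
    {
        // Assumption: normalize to lowercase before the lookup.
        return array_key_exists(strtolower($scheme), self::SCHEMES);
    }
}
```

With the names as keys, lookup cost does not depend on how many schemes are registered, whereas a single alternation regex has to be compiled and matched against a pattern that grows with the list.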
Merge Note
Both this PR and #1287 add the same isSupportedScheme() method to ExternalReferenceResolver. When the second PR merges, the conflict is trivially resolved by keeping the existing code.

Related PRs
All PRs can be merged independently in any order.