Conversation

@mauke (Contributor) commented Nov 12, 2025

... instead of repeatedly scanning backwards for all feature enabling/disabling statements every time we need to know the current feature set. In my tests, this massively speeds up parsing of big documents/source files.

Fixes #309.
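
To make the idea concrete, here is a rough sketch of the two approaches. This is illustrative only; none of these names appear in the actual patch.

use v5.36;

# Old approach: to answer "which features are enabled right now?",
# rescan every feature-changing statement seen so far -- O(n) per query.
#
# New approach (sketched below): keep a stack of feature sets that is
# updated as scopes open and close and as use/no statements go by, so
# each query is O(1).
my @feature_stack = ( {} );

sub enter_scope      { push @feature_stack, { %{ $feature_stack[-1] } } }
sub leave_scope      { pop @feature_stack }
sub set_feature      { $feature_stack[-1]{ $_[0] } = $_[1] }
sub current_features { $feature_stack[-1] }

set_feature( signatures => 1 );   # e.g. after seeing "use v5.36;"
enter_scope();                    # e.g. after seeing "{"
set_feature( signatures => 0 );   # e.g. after seeing "no feature 'signatures';"
leave_scope();                    # e.g. after seeing "}"
say current_features()->{signatures};   # 1 -- back to the outer setting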

Demonstration:

use v5.36;
use PPI::Document;

my $source = "use v5.36;\n";

for my $i (0 .. 399) {
    my $sub = "sub foo_$i(\$x, \$y) {\n";
    for my $j (0 .. 29) {
        $sub .= "    g(\$x->[\$y]" . join('', map ", $_", 1 .. $j) . ");\n";
    }
    $sub .= "}\n";
    $source .= "$sub\n";
}

say "parsing " . length($source) . " chars of code ...";
my $doc = PPI::Document->new(\$source);
say "done";

With PPI 1.284:

$ time perl try.pl
parsing 819901 chars of code ...
done

real    0m37.017s
user    0m36.700s
sys     0m0.310s

With this patch:

$ time perl -I"$HOME/Projects/PPI/lib" try.pl
parsing 819901 chars of code ...
done

real    0m7.387s
user    0m7.170s
sys     0m0.217s

@wchristian (Member) commented:

I'll go over this later, but a quick question for now: What happens when the user uses any of these editing methods? https://metacpan.org/pod/PPI::Element#insert_before-@Elements
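
For concreteness, this is the kind of post-parse edit I mean (a rough, made-up example of using insert_before; the comment token and code are invented for illustration):

use PPI;

# A made-up example of the kind of edit in question: modifying the
# tree after parsing via the PPI::Element editing methods.
my $doc = PPI::Document->new( \"sub foo { 1 }\n" );
my $sub = $doc->find_first('PPI::Statement::Sub');

# Splice a new token in front of the existing sub statement.
$sub->insert_before( PPI::Token::Comment->new("# added post-parse\n") );

print $doc->serialize;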

@mauke (Contributor, Author) commented Nov 12, 2025

@wchristian Honestly, I don't know. Presumably the same thing as before? But I can't tell exactly what these methods are supposed to do (or how they are supposed to interact with the tokenizer) anyway. I am reasonably sure that the documentation is wrong:

this method allows you to insert a single Element, and will perform some basic checking to prevent you inserting something that would be structurally wrong (in PDOM terms)

... because from looking at the code, you can insert multiple elements at once and there are no checks whatsoever.

So I guess my answer is:

  • All the tests are green. (If the tests are incomplete, write more tests or tell me what's missing and I'll try to get it working. 😃 )
  • If someone inserts a PPI::Structure::Signature element into a scope where signatures aren't enabled, the resulting code will break and they get to keep the pieces. ("You need to be very careful when modifying perl code, as it's easy to break things.") But I'm pretty sure that case didn't work before either, because the whole structure would have to be re-tokenized as a single PPI::Token::Prototype or something, depending on the intended semantics of insert_before. (Rough sketch below.)
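
To illustrate that second point, a rough sketch (assuming, as above, that the prototype case shows up as PPI::Token::Prototype and the signature case as PPI::Structure::Signature):

use v5.36;
use PPI;

# The same "($x, $y)" text after a sub name is parsed differently
# depending on whether the signatures feature is in scope at that point.
my $without = PPI::Document->new( \"sub f(\$x, \$y) { }\n" );
my $with    = PPI::Document->new( \"use v5.36;\nsub f(\$x, \$y) { }\n" );

say "prototype token without the feature: ",
    $without->find_first('PPI::Token::Prototype') ? "yes" : "no";
say "signature structure with the feature: ",
    $with->find_first('PPI::Structure::Signature') ? "yes" : "no";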

@mauke (Contributor, Author) commented Nov 14, 2025

Perhaps a better answer: I hadn't thought much about structure-modifying methods like insert_before because this patch does not add any state to elements/nodes/tokens. (Even the presumed_features method is still present in its original form.)

The added state is strictly in PPI::Lexer and PPI::Tokenizer (and the latter is "passive state", fully driven by the actions of PPI::Lexer). The modifications to PPI::Statement::Compound and PPI::Statement::Include by this patch reduce the number of redundant method calls (and in particular, calls to presumed_features). There is no per-element "cache" that would need to be invalidated.
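
For example, presumed_features is still the same on-demand, per-element query (assuming, as I read the docs, that it returns a hash ref of the features presumed to be in effect at that element):

use v5.36;
use PPI;

# Nothing is cached on the element itself; this just computes the
# feature set on demand, exactly as before this patch.
my $doc = PPI::Document->new( \"use v5.36;\nsub f { }\n" );
my $sub = $doc->find_first('PPI::Statement::Sub');
say "signatures in scope" if $sub->presumed_features->{signatures};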

@wchristian (Member) commented:

Thanks @mauke, and first off, sorry for being slow to respond; I'm currently at the tail end of an 800 km apartment move.

Everything you've said makes sense to me so far, and it helps that this is purely a speed-up of the initial parse and irrelevant once that's done. I'll give it another read-through and release it on Wednesday unless I find issues. :)

Though I would like you to answer the question I asked here: https://github.com/Perl-Critic/PPI/pull/318/files/1787646cb412874944cc4e282e546231f5f5a206#diff-f0b7cd24f2b8ce5ab36bffa5ad498650827118f17d0f8369d5633fd5a11944b2R131. In general I don't mind changes for readability, but I'd prefer that changes in a feature commit be limited to what makes that feature happen. Can you please look at that?

@haarg (Collaborator) commented Nov 17, 2025

It isn't a change for readability. It eliminates unneeded calls to presumed_features. Without the change, every call to type on anything that isn't a for loop or a label would call presumed_features, which isn't sped up by this PR.
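
For reference, that call path looks like this from the outside (a minimal made-up example; the point is just that type gets asked for every compound statement in a document):

use v5.36;
use PPI;

# type is queried for every compound statement; before this change,
# most of those queries ended up calling presumed_features.
my $doc = PPI::Document->new( \"if (1) { } while (0) { }\n" );
for my $compound ( @{ $doc->find('PPI::Statement::Compound') || [] } ) {
    say $compound->type;   # "if", then "while"
}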

@wchristian (Member) commented:

Aaaah, thanks. In that case I'll have to add a comment there. :)

Development

Successfully merging this pull request may close these issues.

parsing slowdown with new features in 1.281
