Conversation

@mauke (Contributor) commented Nov 12, 2025

... instead of repeatedly scanning backwards for all feature enabling/disabling statements every time we need to know the current feature set. In my tests, this massively speeds up parsing of big documents/source files.

Fixes #309.
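
To make the idea concrete, here is a rough sketch of the two approaches. This is illustrative only; none of these names appear in the actual patch.

use v5.36;

# Old approach: to answer "which features are enabled right now?",
# rescan every feature-changing statement seen so far -- O(n) per query.
#
# New approach (sketched below): keep a stack of feature sets that is
# updated as scopes open and close and as use/no statements go by, so
# each query is O(1).
my @feature_stack = ( {} );

sub enter_scope      { push @feature_stack, { %{ $feature_stack[-1] } } }
sub leave_scope      { pop @feature_stack }
sub set_feature      { $feature_stack[-1]{ $_[0] } = $_[1] }
sub current_features { $feature_stack[-1] }

set_feature( signatures => 1 );   # e.g. after seeing "use v5.36;"
enter_scope();                    # e.g. after seeing "{"
set_feature( signatures => 0 );   # e.g. after seeing "no feature 'signatures';"
leave_scope();                    # e.g. after seeing "}"
say current_features()->{signatures};   # 1 -- back to the outer setting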

Demonstration:

use v5.36;
use PPI::Document;

my $source = "use v5.36;\n";

for my $i (0 .. 399) {
    my $sub = "sub foo_$i(\$x, \$y) {\n";
    for my $j (0 .. 29) {
        $sub .= "    g(\$x->[\$y]" . join('', map ", $_", 1 .. $j) . ");\n";
    }
    $sub .= "}\n";
    $source .= "$sub\n";
}

say "parsing " . length($source) . " chars of code ...";
my $doc = PPI::Document->new(\$source);
say "done";

With PPI 1.284:

$ time perl try.pl
parsing 819901 chars of code ...
done

real    0m37.017s
user    0m36.700s
sys     0m0.310s

With this patch:

$ time perl -I"$HOME/Projects/PPI/lib" try.pl
parsing 819901 chars of code ...
done

real    0m7.387s
user    0m7.170s
sys     0m0.217s

@wchristian (Member) commented:

I'll go over this later, but a quick question for now: What happens when the user uses any of these editing methods? https://metacpan.org/pod/PPI::Element#insert_before-@Elements
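
For concreteness, this is the kind of post-parse edit I mean (a rough, made-up example of using insert_before; the comment token and code are invented for illustration):

use PPI;

# A made-up example of the kind of edit in question: modifying the
# tree after parsing via the PPI::Element editing methods.
my $doc = PPI::Document->new( \"sub foo { 1 }\n" );
my $sub = $doc->find_first('PPI::Statement::Sub');

# Splice a new token in front of the existing sub statement.
$sub->insert_before( PPI::Token::Comment->new("# added post-parse\n") );

print $doc->serialize;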

@mauke (Contributor, Author) commented Nov 12, 2025

@wchristian Honestly, I don't know. Presumably the same thing as before? But I can't tell exactly what these methods are supposed to do (or how they are supposed to interact with the tokenizer) anyway. I am reasonably sure that the documentation is wrong:

this method allows you to insert a single Element, and will perform some basic checking to prevent you inserting something that would be structurally wrong (in PDOM terms)

... because from looking at the code, you can insert multiple elements at once and there are no checks whatsoever.

So I guess my answer is:

  • All the tests are green. (If the tests are incomplete, write more tests or tell me what's missing and I'll try to get it working. 😃 )
  • If someone inserts a PPI::Structure::Signature element into a scope where signatures aren't enabled, the resulting code will break and they get to keep the pieces. ("You need to be very careful when modifying perl code, as it's easy to break things.") But I'm pretty sure that case didn't work before either, because the whole structure would have to be re-tokenized as a single PPI::Token::Prototype or something, depending on the intended semantics of insert_before. (Rough sketch below.)
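
To illustrate that second point, a rough sketch (assuming, as above, that the prototype case shows up as PPI::Token::Prototype and the signature case as PPI::Structure::Signature):

use v5.36;
use PPI;

# The same "($x, $y)" text after a sub name is parsed differently
# depending on whether the signatures feature is in scope at that point.
my $without = PPI::Document->new( \"sub f(\$x, \$y) { }\n" );
my $with    = PPI::Document->new( \"use v5.36;\nsub f(\$x, \$y) { }\n" );

say "prototype token without the feature: ",
    $without->find_first('PPI::Token::Prototype') ? "yes" : "no";
say "signature structure with the feature: ",
    $with->find_first('PPI::Structure::Signature') ? "yes" : "no";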

@mauke (Contributor, Author) commented Nov 14, 2025

Perhaps a better answer: I hadn't thought much about structure-modifying methods like insert_before because this patch does not add any state to elements/nodes/tokens. (Even the presumed_features method is still present in its original form.)

The added state is strictly in PPI::Lexer and PPI::Tokenizer (and the latter is "passive state", fully driven by the actions of PPI::Lexer). The modifications to PPI::Statement::Compound and PPI::Statement::Include by this patch reduce the number of redundant method calls (and in particular, calls to presumed_features). There is no per-element "cache" that would need to be invalidated.
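
For example, presumed_features is still the same on-demand, per-element query (assuming, as I read the docs, that it returns a hash ref of the features presumed to be in effect at that element):

use v5.36;
use PPI;

# Nothing is cached on the element itself; this just computes the
# feature set on demand, exactly as before this patch.
my $doc = PPI::Document->new( \"use v5.36;\nsub f { }\n" );
my $sub = $doc->find_first('PPI::Statement::Sub');
say "signatures in scope" if $sub->presumed_features->{signatures};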

@wchristian (Member) commented:

Thanks @mauke, and first off, sorry for being slow to respond; I'm currently at the tail end of an 800 km apartment move.

Everything you've said makes sense to me so far, and it helps that this is purely a speed-up of the initial parse and irrelevant once that's done. I'll give it another read-through and release it on Wednesday unless I find issues. :)

Though I would like you to answer the question I asked here: https://github.com/Perl-Critic/PPI/pull/318/files/1787646cb412874944cc4e282e546231f5f5a206#diff-f0b7cd24f2b8ce5ab36bffa5ad498650827118f17d0f8369d5633fd5a11944b2R131. In general I don't mind changes for readability, but I'd prefer that changes in a feature commit be limited to what makes that feature happen. Can you please look at that?

@haarg (Collaborator) commented Nov 17, 2025

It isn't a change for readability. It eliminates unneeded calls to presumed_features. Without the change, every call to type on anything that isn't a for loop or a label would call presumed_features, which isn't sped up by this PR.
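
For reference, that call path looks like this from the outside (a minimal made-up example; the point is just that type gets asked for every compound statement in a document):

use v5.36;
use PPI;

# type is queried for every compound statement; before this change,
# most of those queries ended up calling presumed_features.
my $doc = PPI::Document->new( \"if (1) { } while (0) { }\n" );
for my $compound ( @{ $doc->find('PPI::Statement::Compound') || [] } ) {
    say $compound->type;   # "if", then "while"
}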

@wchristian (Member) commented:

Aaaah, thanks. In that case I'll have to add a comment there. :)

Development

Successfully merging this pull request may close these issues.

parsing slowdown with new features in 1.281
