Experiment - do not pass state and input between steps#81
Experiment - do not pass state and input between steps#81sergei-shabanau wants to merge 5 commits intomasterfrom
Conversation
| class S(var read: Int, var res: Boolean) | ||
|
|
||
| ManyConditional( | ||
| new Mutable[E, Array[Char], Array[Char]] { |
There was a problem hiding this comment.
Try to use simple by storing a bit into the Int for the boolean flag and using the rest of the bits for read.
There was a problem hiding this comment.
Sure, thanks for the hint!
This specific instance is neglected because it has very limited effect on my perf test result.
| final def needsMoreInput(s: State): Boolean = !s.done | ||
|
|
||
| final def feed(s: State, input: Array[Char], i: Int): Int = { | ||
| val last = input.indexWhere(!p(_), i) |
There was a problem hiding this comment.
Be aware that methods of Array are, in some cases, radically slower than Chunk, because many Array extension methods get the default iterable implementation.
There was a problem hiding this comment.
Thanks!
I even tried to use code from Java 🙂
In fact, I'll try Chunk instead of Array now and see if makes any difference.
There was a problem hiding this comment.
Regarding Array methods I use here, Chunk has same implementations 👍
| final def parser[S, E, A](grammar: Grammar[S, S, E, A]): Input => E \/ A = | ||
| input => step(grammar, new ParserState(input, 0, Nil)) | ||
|
|
||
| private def step[S, E, A](grammar: Grammar[S, S, E, A], ps: ParserState[E]): E \/ A = |
There was a problem hiding this comment.
Step should throw E values inside an exception type that does NOT store stack traces, to eliminate the overhead of Either boxing on each step.
There was a problem hiding this comment.
@jdegoes
I did some experiments with this function.
- Throwing exception instead of returning
Either(basically as control flow) is about 2.5 times slower. - Returning
nullorAinstead ofEitheris about 5% faster - hard to measure accurately.
The biggest perf improvement I had with this function so far is by avoiding lambdas and curried function signatures, specifically having a function with one parameters list returning a value (as opposed to several parameter lists and returning a function).
There was a problem hiding this comment.
What type of exception did you throw? With a stack trace or without?
Here's what I would suggest: rewrite step to be a while loop. Then you don't need Either or exceptions. You can use a product encoding of the either: a nullable variable that could be an error; and another nullable variable that could be the success.
There was a problem hiding this comment.
@jdegoes
Exception without stack trace
private object ParserError extends Throwable with NoStackTraceI already tried returning null or value instead of using Either or exception, it is only about 5% difference, not significant.
I don't think while loop is possible since I need to analyze values produced by parts of alt and zip 🤔
But even if it is, what is the advantage? I am basically reimplementing Java call stack, but all the code is the same, based on megamorphic function - the feature removed in FastParse 2. BTW, my current design is similar to FastParse 1 and sacrificing it made ~4x improvements possible in FastParse 2 😉
There was a problem hiding this comment.
I have uploaded the latest code.
There was a problem hiding this comment.
We could try to eliminate the megamorphic functions. To do that we have to eliminate all lambdas and runtime functions such as those appearing in Codec.
Essentially we'd be building a language for constructing data (case class / sealed trait). The language would including filtering, comparisons, basic math, etc. Then we can compile the parser to anything, including bytecode on the JVM.
How fast do you want to go? 😄
| } | ||
| ret | ||
|
|
||
| case zip: Grammar.GADT.ZipL[S, S, E, ta, tb] => |
There was a problem hiding this comment.
Good that you specialized ZipL / ZipR 👍
| else Left(e) | ||
| } | ||
|
|
||
| case zip: Grammar.GADT.Zip[S, S, E, ta, tb] => |
There was a problem hiding this comment.
This should be replaced by ZipAll so we can eliminate intermediate allocations (nested tuples).
There was a problem hiding this comment.
For sure!
I tested it before, and the overhead in this specific perf test is minimal. Currently I'm trying improvements in isolation but eventually I will merge all the work together.
| case Right(_) => step(zip.right, ps) | ||
| } | ||
|
|
||
| case alt: Grammar.GADT.Alt[S, S, E, ta, tb] => |
There was a problem hiding this comment.
This should be replaced by AltAll so we can eliminate intermediate allocations (nested eithers).
| final def parser[S, E, A](grammar: Grammar[S, S, E, A]): Input => E \/ A = | ||
| input => step(grammar, new ParserState(input, 0, Nil)) | ||
|
|
||
| private def step[S, E, A](grammar: Grammar[S, S, E, A], ps: ParserState[E]): E \/ A = |
There was a problem hiding this comment.
Add @tailrec after the deletion of Either return value.
There was a problem hiding this comment.
I don't think it's possible due to alt and zip - I need to examine the returned values of several branches.
Interesting code here:
final def parser[S, E, A]function inParserModule2InOut.scalaThe rest is a copy-paste of existing code, please ignore it.
⚠️
Trying the following:
Array[Char]InputcreatedThe parser signature currently is (ignore the
Sparameter):I tested the code in the perf test for parsing JSON that I use as a base line to compare with FastParse.
I have an improvement over previous design: 1,000,000 runs in ~15 sec vs ~25 sec as before. This is not great as FastParse does it in ~4 sec 😐
What do I have extra compared to FastParse?
Eitherreturned at ever stepgrammarduring executionstepfunction callFrom what I found while working on this part, polymorphic functions did the the biggest contribution to lower performance.