-
Notifications
You must be signed in to change notification settings - Fork 11
Open
Labels
performanceFeatures or improvements that positively affect seq-db performanceFeatures or improvements that positively affect seq-db performance
Description
In #348, we decided to use anchored regular expressions by default.
This change was necessary to enable several optimizations for regular expression matching.
Prefix and Suffix Literals
Before matching all tokens for a field from a search query, we can extract two literals from the regular expression: prefix and suffix literals. Using these two literals, we can apply two key optimizations:
- Reduce the number of token blocks to examine by performing a binary search over token blocks using the
(strings|bytes).HasPrefix()function (as we already do for literals); - For each token we match against, use
(strings|bytes).HasSuffix()before running the regular expression NFA.
Set Lookups
Sometimes we can transform a regular expression into a basic search query. For example, consider this query using the re filter: k8s_pod:re("pod-1|pod-2"). It is easy to see that we can rewrite it as k8s_pod:'pod-1' OR k8s_pod:'pod-2'.
I am pretty sure there are many more optimizations we can implement, so this is a topic worth researching further.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
performanceFeatures or improvements that positively affect seq-db performanceFeatures or improvements that positively affect seq-db performance