Description
I've begun to think in detail about how to describe the rule matching semantics, and in the process I have come to believe that we should no longer treat the virtual output as an optional fallback buffer. Instead, we should have ONE concrete matching path for each "kind" of rule.
On the explicit-rule-chaining branch, there are two distinct kinds of rules: anchor rules and chained rules. An anchor rule is any rule that matches on the triecodes in the buffer alone. A chained rule is a rule that matches on one or more triecodes and then on the match index of a previous match (that is, it chains off the rule that was previously matched). This distinction has always existed conceptually, but now it is explicit in the matching algorithm, and we can decide matching behavior based on it.
In the original system, since the addition of the vout fallback buffer, all rules could match on either the input buffer (literal keypresses) or the vout fallback buffer (if enabled). Priority was given to the literal input buffer so that the fallback wouldn't interfere with chaining rules. At this time, I simply disabled the vout fallback, because mixing it with chained rules was full of traps that would break rules in unexpected ways. Nevertheless, it was very useful for users who didn't rely so heavily on chained rules and wanted to write rules for common suffixes that would work both on words generated by other rules and on words that didn't have rules.
The Problem
With the explicit-rule-chaining branch, it is no longer a concern that rules that work on the vout will interfere with chained rules, because any rule that would interfere will create a warning from the generator, and the interaction can be fixed in a straightforward manner. However, the reverse problem still exists. Anchor rules that are intended to match on the actual output can match in weird ways on the literal input triecodes that triggered chained rules. Here is a somewhat realistic example rule set:
_*ou -> _though
_*or -> _thorough
ou@ -> ough
gh@ -> ghly
And now, typing _*or@ gives _thoroughly and typing _*ou@ gives _thoughgh?! This is a very surprising interaction for users. The ou@ rule was preferred over the gh@ rule here, because ou@ matches on three keys of literal input and the gh@ matches on only two keys of literal input (because the u keypress produced both the g and h).
Proposal
Match overlapping non-chained rules only on the virtual output. Put more explicitly:
- Anchor rules match on the output and may overlap with the output of previously matched rules.
- Chained rules match on one or more literal keypresses following a match on the previous rule in the chain.
For the previous example, that means the ou@ -> ough rule is not allowed to match at all after typing _*ou, because ou@ is an anchor rule, and anchor rules only match on the output. So, gh@ is the only match, and we get the desired _thoughly output. If the gh@ rule didn't exist, we still wouldn't match on ou@. We would instead get something like _thoughn (from my rule h@ -> hn). This isn't useful, but at least it isn't surprising!
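To make the proposed behavior concrete, here is a minimal Python sketch of anchor matching against the output only. It is not the project's implementation. The function name match_anchor, the dictionary representation of rules, the longest-sequence tie-break, and the way the result replaces the matched output text are all assumptions made for illustration.

```python
from typing import Dict, Optional


def match_anchor(output: str, key: str, rules: Dict[str, str]) -> Optional[str]:
    """Try anchor rules against the current output only (hypothetical helper).

    A rule `seq -> result` is a candidate when the pressed key equals the
    last symbol of `seq` and the output ends with the rest of `seq`.
    Assumption: the longest candidate sequence wins.
    Returns the new output, or None when no anchor rule matches.
    """
    best = None
    for sequence, result in rules.items():
        body, last = sequence[:-1], sequence[-1]
        if last == key and output.endswith(body):
            if best is None or len(sequence) > len(best[0]):
                best = (sequence, result)
    if best is None:
        return None
    sequence, result = best
    body = sequence[:-1]
    # Assumption: the matched text at the end of the output is replaced by the result.
    return output[: len(output) - len(body)] + result


rules = {"ou@": "ough", "gh@": "ghly", "h@": "hn"}
with_gh = match_anchor("_though", "@", rules)

rules_without_gh = {"ou@": "ough", "h@": "hn"}
without_gh = match_anchor("_though", "@", rules_without_gh)
```

Under these assumptions, with_gh is _thoughly and without_gh is _thoughn. In both cases ou@ never matches, because the output _though does not end in ou.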
If the user wants to recover the old behavior, they can simply make a proper chain rule for the instances where it is desired. I haven't actually found any instances required yet in my own ruleset, but it would be similar to how I needed to add s*@ -> sks to override the implicit s*@ -> sknow behavior when I enabled the vout buffer originally.
Explanation for Users
Here is a rough draft of how I would explain the rule matching semantics to users:
There are two kinds of rules: anchor rules and chained rules.
- A rule is an anchor rule if no other rule has a sequence that is a prefix of this rule's sequence.
- A rule is a chained rule if one or more rules have sequences that are prefixes of this rule's sequence. For a chained rule, the longest prefix rule is called the sub-rule.
Example: If we have the rules:
_* -> _the
_*t -> _that
_*ts -> _that's
- _* -> _the is an anchor rule, because none of the other rules have sequences that are prefixes of _*
- _*t -> _that is a chained rule, because _* is a prefix of _*t, and _* -> _the is its sub-rule
- _*ts -> _that's is a chained rule, because both of the previous rules have sequences that are prefixes of _*ts, and _*t -> _that is its sub-rule, because _*t is a longer prefix than _*
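The classification above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the generator's actual code. The function name find_sub_rules and the use of a plain dictionary for the rules are hypothetical.

```python
from typing import Dict, Optional


def find_sub_rules(rules: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each rule sequence to the sequence of its sub-rule.

    A value of None means the rule is an anchor rule: no other rule's
    sequence is a prefix of it. Otherwise the rule is a chained rule and
    the value is the longest prefix sequence, i.e. its sub-rule.
    """
    sub_rules = {}
    for sequence in rules:
        prefixes = [
            other
            for other in rules
            if other != sequence and sequence.startswith(other)
        ]
        sub_rules[sequence] = max(prefixes, key=len) if prefixes else None
    return sub_rules


rules = {"_*": "_the", "_*t": "_that", "_*ts": "_that's"}
sub_rules = find_sub_rules(rules)
```

For the three example rules, sub_rules maps _* to None (anchor rule), _*t to _*, and _*ts to _*t.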
Matching Anchor Rules
The sequence of an anchor rule is matched to the output as it is when the last character in the sequence is pressed. This output can come from any combination of regular keypresses and the output of previously matched rules.
<insert examples>
Matching Chained Rules
The sequence of a chained rule has two parts:
- the prefix, which is exactly the sequence of the sub-rule
- the suffix, which is everything that comes after the prefix (but usually only one symbol)
To match, it is first verified that the last keys pressed match the suffix. Then, it is verified that the keypress immediately before the suffix triggered the sub-rule.
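The two checks can be sketched as follows. This is a rough Python illustration under assumptions, not the actual matching code: the KeyPress record, its triggered field (standing in for the match index of a previous match), and the function name matches_chained are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class KeyPress(NamedTuple):
    key: str
    # Sequence of the rule this keypress triggered, or None (hypothetical field).
    triggered: Optional[str]


def matches_chained(history: List[KeyPress], sequence: str, sub_sequence: str) -> bool:
    """Check a chained rule against the keypress history.

    `history` includes the key that was just pressed as its last entry.
    `sub_sequence` is the sequence of the sub-rule (the prefix).
    """
    suffix = sequence[len(sub_sequence):]
    n = len(suffix)
    if len(history) <= n:
        return False
    # Step 1: the last keys pressed must match the suffix.
    recent = "".join(press.key for press in history[-n:])
    if recent != suffix:
        return False
    # Step 2: the keypress immediately before the suffix must have
    # triggered the sub-rule.
    return history[-n - 1].triggered == sub_sequence


# "_", then "*" (which triggered _* -> _the), then "t" has just been pressed.
history = [KeyPress("_", None), KeyPress("*", "_*"), KeyPress("t", None)]
chained_ok = matches_chained(history, "_*t", "_*")

# Same keys, but "*" did not trigger the sub-rule, so the chain is broken.
broken = [KeyPress("_", None), KeyPress("*", None), KeyPress("t", None)]
chained_broken = matches_chained(broken, "_*t", "_*")
```

Here chained_ok is True and chained_broken is False. Literal keys alone are not enough for a chained rule to match. The sub-rule must actually have fired on the keypress before the suffix.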
<insert examples>