[DISCUSSION] Standardizing Rule Matching Semantics #89

@Ikcelaks

I've begun thinking in detail about how to describe the rule matching semantics, and in the process I have come to believe that we should no longer treat the virtual output as an optional fallback buffer and should instead have ONE concrete matching path for each "kind" of rule.

On the explicit-rule-chaining branch, there are two distinct kinds of rules: anchor rules and chained rules. An anchor rule is any rule that matches on the triecodes in the buffer alone, and a chained rule is a rule that matches on one or more triecodes and then on the match index of a previous match (that is, it chains off the previously matched rule). This distinction has always existed conceptually, but now it is explicit in the matching algorithm, and we can decide matching behavior based on it.

In the original system, since the addition of the vout fallback buffer, all rules could match on either the input buffer (literal keypresses) or the vout fallback buffer (if enabled). Priority was given to the literal input buffer so that the fallback wouldn't interfere with chaining rules. At this time, I simply disabled the vout fallback, because mixing it with chained rules was full of traps that would break rules in unexpected ways. Nevertheless, it was very useful for users who didn't rely so heavily on chained rules and wanted to write rules for common suffixes that would work both on words generated by other rules and on words that didn't have rules.

The Problem

With the explicit-rule-chaining branch, it is no longer a concern that rules that work on the vout will interfere with chained rules, because any rule that would interfere will produce a warning from the generator, and the interaction can be fixed in a straightforward manner. However, the reverse problem still exists. Anchor rules that are intended to match on the actual output can match in weird ways on the literal input triecodes that triggered chained rules. Here is a somewhat realistic example rule-set:

_*ou -> _though
_*or -> _thorough
ou@ -> ough
gh@ -> ghly

And now, typing _*or@ gives _thoroughly, but typing _*ou@ gives _thoughgh?! This is a very surprising interaction for users. The ou@ rule was preferred over the gh@ rule here, because ou@ matches on three keys of literal input while gh@ matches on only two keys of literal input (because the u keypress produced both the g and the h).

Proposal

Match overlapping non-chained rules only on the virtual output. Put more explicitly:

  • Anchor rules match on the output and may overlap with the output of previously matched rules.
  • Chained rules match on one or more literal keypresses following a match on the previous rule in the chain.

For the previous example, that means completely preventing the ou@ -> ough rule from matching after typing _*ou, because ou@ is an anchor rule, and anchor rules only match on the output. So, gh@ is the only match, and we get the desired _thoughly output. If the gh@ rule didn't exist, we still wouldn't match on ou@. We would instead get something like _thoughn (from my rule h@ -> hn). This isn't useful, but at least it isn't surprising!
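To make the proposed anchor behavior concrete, here is a minimal Python sketch of matching anchor rules on the output only. It is not the generator's or firmware's actual code. It assumes each character is one key, that the last symbol of a sequence is the triggering key, that the longest matching context wins, and that an unmatched key is simply appended. The name `apply_anchor_rule` is hypothetical.

```python
def apply_anchor_rule(output, key, anchor_rules):
    """Return the new output after `key` is pressed.

    Anchor rules are matched against `output` only (never the literal input).
    Assumption: among matching rules, the one with the longest context wins.
    """
    best = None
    for seq, replacement in anchor_rules.items():
        context, trigger = seq[:-1], seq[-1]
        if trigger == key and output.endswith(context):
            if best is None or len(context) > len(best[0]):
                best = (context, replacement)
    if best is None:
        # Assumption for this sketch: an unmatched key is appended as-is.
        return output + key
    context, replacement = best
    # Replace the matched context at the end of the output.
    return output[:len(output) - len(context)] + replacement


anchor_rules = {"ou@": "ough", "gh@": "ghly", "h@": "hn"}

# "_though" is the output already produced by the chained rule _*ou -> _though.
print(apply_anchor_rule("_though", "@", anchor_rules))    # _thoughly
print(apply_anchor_rule("_thorough", "@", anchor_rules))  # _thoroughly

# Without gh@, ou@ still does not match, because the output does not end in "ou".
without_gh = {"ou@": "ough", "h@": "hn"}
print(apply_anchor_rule("_though", "@", without_gh))      # _thoughn
```

Because the match only ever looks at the output, the literal `o` and `u` keypresses that produced `_though` never give ou@ a chance to fire.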

If the user wants to recover the old behavior, they can simply make a proper chain rule for the instances where it is desired. I haven't actually found any instances required yet in my own ruleset, but it would be similar to how I needed to add s*@ -> sks to override the implicit s*@ -> sknow behavior when I enabled the vout buffer originally.

Explanation for Users

Here is a rough draft of how I would explain the rule matching semantics to users:

There are two kinds of rules: anchor rules and chained rules.

  • A rule is an anchor rule if no other rule has a sequence that is a prefix of this rule's sequence.
  • A rule is a chained rule if one or more rules have sequences that are prefixes of this rule's sequence. For a chained rule, the longest prefix rule is called the sub-rule.

Example: If we have the rules:

_* -> _the
_*t -> _that
_*ts -> _that's

  • _* -> _the is an anchor rule, because none of the other rules have sequences that are prefixes of _*
  • _*t -> _that is a chained rule, because _* is a prefix of _*t, and _* -> _the is its sub-rule
  • _*ts -> _that's is a chained rule, because both of the previous rules have sequences that are prefixes of _*ts, and _*t -> _that is its sub-rule, because _*t is a longer prefix than _*
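The classification above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the generator's implementation. It assumes each character of a sequence is one symbol, and `classify_rules` is a hypothetical name.

```python
def classify_rules(sequences):
    """Map each rule sequence to its sub-rule's sequence, or None for an anchor rule."""
    result = {}
    for seq in sequences:
        # Sequences of other rules that are prefixes of this sequence.
        prefixes = [other for other in sequences
                    if other != seq and seq.startswith(other)]
        # The longest prefix rule is the sub-rule; no prefix rule means anchor.
        result[seq] = max(prefixes, key=len) if prefixes else None
    return result


rules = {"_*": "_the", "_*t": "_that", "_*ts": "_that's"}
kinds = classify_rules(rules)
for seq, sub in kinds.items():
    kind = "anchor rule" if sub is None else f"chained rule, sub-rule {sub}"
    print(f"{seq} -> {rules[seq]}: {kind}")
```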

Matching Anchor Rules

The sequence of an anchor rule is matched against the output as it stands when the last character in the sequence is pressed. This output can come from any combination of regular keypresses and the output of previously matched rules.
<insert examples>

Matching Chained Rules

The sequence of a chained rule has two parts:

  • the prefix, which is exactly the sequence of the sub-rule
  • the suffix, which is everything that comes after the prefix (but usually only one symbol)

To match, it is first verified that the last keys pressed match the suffix. Then, it is verified that the last keypress immediately before the suffix triggered the sub-rule.
<insert examples>
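The two-step check for chained rules could be sketched as follows. This is a hedged illustration, not the actual matching algorithm. It assumes a keypress history that records, for each key, the sequence of the rule that key triggered (or None), which stands in for the match index. The function name and the history layout are hypothetical.

```python
def chained_rule_matches(history, rule_seq, sub_rule_seq):
    """Check whether a chained rule matches on the newest keypresses.

    history: list of (key, triggered_rule_sequence_or_None), oldest first,
             with the newest keypress last.
    """
    suffix = rule_seq[len(sub_rule_seq):]
    if not rule_seq.startswith(sub_rule_seq) or not suffix:
        return False
    if len(history) <= len(suffix):
        return False
    # Step 1: the last keys pressed must spell out the suffix.
    recent_keys = "".join(key for key, _ in history[-len(suffix):])
    if recent_keys != suffix:
        return False
    # Step 2: the keypress immediately before the suffix must have
    # triggered the sub-rule.
    _, triggered = history[-len(suffix) - 1]
    return triggered == sub_rule_seq


# Typing _ * t: the * keypress triggered _* -> _the, so _*t can chain off it.
typed = [("_", None), ("*", "_*"), ("t", None)]
print(chained_rule_matches(typed, "_*t", "_*"))      # True

# An intervening key breaks the chain, even though "t" was pressed last.
broken = [("_", None), ("*", "_*"), ("x", None), ("t", None)]
print(chained_rule_matches(broken, "_*t", "_*"))     # False
```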
