regex adverb :ratchet

Documentation for regex adverb :ratchet assembled from the following types:

regex adverb :ignorecase

From :ignorecase

(:ignorecase) regex adverb :ratchet

The :ratchet or :r adverb causes the regex engine to not backtrack (see backtracking).

Without this adverb, parts of a regex will try different ways to match a string in order to make it possible for other parts of the regex to match. For example, in 'abc' ~~ /\w+ ./, the \w+ first eats up the whole string, abc but then the . fails. Thus \w+ gives up a character, matching only ab, and the . can successfully match the string c. This process of giving up characters (or in the case of alternations, trying a different branch) is known as backtracking.

say so 'abc' ~~ / \w+ . /;        # OUTPUT: «True␤» 
say so 'abc' ~~ / :r \w+ . /;     # OUTPUT: «False␤» 

Ratcheting can be an optimization, because backtracking is costly. But more importantly, it closely corresponds to how humans parse a text. If you have a regex my regex identifier { \w+ } and my regex keyword { if | else | endif }, you intuitively expect the identifier to gobble up a whole word and not have it give up its end to the next rule, if the next rule otherwise fails.

For example, you don't expect the word motif to be parsed as the identifier mot followed by the keyword if. Instead, you expect motif to be parsed as one identifier; and if the parser expects an if afterwards, best that it should fail than have it parse the input in a way you don't expect.

Since ratcheting behavior is often desirable in parsers, there's a shortcut to declaring a ratcheting regex:

my token thing { .... }
# short for 
my regex thing { :r ... }