regex tilde


The ~ operator is a helper for matching nested subrules with a specific terminator as the goal. It is designed to be placed between an opening and closing bracket, like so:

/ '(' ~ ')' <expression> /

However, it mostly ignores the left argument, and operates on the next two atoms (which may be quantified). Its operation on those next two atoms is to "twiddle" them so that they are actually matched in reverse order. Hence the expression above, at first blush, is merely shorthand for:

/ '(' <expression> ')' /

But beyond that, when it rewrites the atoms it also inserts the apparatus that will set up the inner expression to recognize the terminator, and to produce an appropriate error message if the inner expression does not terminate on the required closing atom. So it really does pay attention to the left bracket as well, and it actually rewrites our example to something more like:

$<OPEN> = '(' <SETGOAL: ')'> <expression> [ $GOAL || <FAILGOAL> ]
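On input that does contain the terminator, the tilde form and the plain sequence should match the same text. The following is a minimal sketch of that equivalence, assuming a hypothetical lexical regex named expression that matches only digits (it is not part of the original example):

```raku
# Hypothetical stand-in for <expression>: one or more digits.
my regex expression { \d+ }

# Tilde form: the goal ')' is written before the inner expression.
my $twiddled = '(42)' ~~ / '(' ~ ')' <expression> /;

# Plain form: the atoms are written in the order they are matched.
my $plain = '(42)' ~~ / '(' <expression> ')' /;

say $twiddled.Str;           # OUTPUT: «(42)␤»
say $plain.Str;              # OUTPUT: «(42)␤»
say ~$twiddled<expression>;  # OUTPUT: «42␤»
```

The two forms differ only when the closing atom is missing, which is where the goal-handling apparatus comes into play.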

FAILGOAL is a special method that the user can define. It is called when the inner expression has matched but the required closing atom is not found:

grammar A {
    token TOP { '[' ~ ']' \w+ }
    method FAILGOAL($goal) {
        die "Cannot find $goal near position {self.pos}"
    }
}

say A.parse: '[good]';  # OUTPUT: «「[good]」␤»
A.parse: '[bad';        # will throw FAILGOAL exception
CATCH { default { put .^name, ': ', .Str } };
# OUTPUT: «X::AdHoc: Cannot find ']'  near position 4␤»
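Because the operator is meant for nested subrules, the same pattern can be applied recursively. The sketch below is an illustration only: the grammar name Nested, its tokens list and item, and the wording of the error message are all hypothetical and chosen for this example.

```raku
# Hypothetical grammar: bracketed, comma-separated lists of numbers
# that may contain further bracketed lists.
grammar Nested {
    token TOP  { <list> }
    # ']' is the goal; the bracketed group is the inner expression.
    token list { '[' ~ ']' [ <item>* % ',' ] }
    token item { \d+ | <list> }
    # Called when a list's inner expression matched but ']' is missing.
    method FAILGOAL($goal) {
        die "Missing $goal near position {self.pos}";
    }
}

say so Nested.parse('[1,[2,3],[]]');  # OUTPUT: «True␤»

# An unterminated inner list triggers FAILGOAL; `try` traps the exception.
my $result = try Nested.parse('[1,[2,3');
my $error  = $!.message;
say $result.defined;                  # OUTPUT: «False␤»
say $error;                           # message produced by FAILGOAL above
```

Each level of nesting sets its own goal, so the failure is reported by the innermost list that lacks its closing bracket.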

Note that you can use this construct to set up expectations for a closing construct even when there's no opening bracket:

"3)"  ~~ / <?> ~ ')' \d+ /;  # RESULT: «「3)」» 
"(3)" ~~ / <?> ~ ')' \d+ /;  # RESULT: «「3)」» 

Here <?> successfully matches the null string.

Captures are numbered in the order in which they appear in the source, even though the atoms are matched in reverse order:

"abc" ~~ /a ~ (c) (b)/;
say $0; # OUTPUT: «「c」␤»
say $1; # OUTPUT: «「b」␤»