regex Regex Interpolation

Instead of using a literal pattern for a regex match you can use a variable that holds that pattern.

This variable can then be 'interpolated' which means it is used as though it is the pattern that it holds.

my Str $pattern = 'camelia';
say 'camelia' ~~ / $pattern /;             # OUTPUT: «「camelia」␤» 

If the variable to be interpolated is statically typed as Str or str and is only interpolated literally (like $pattern above), the compiler can optimize the match and it runs much faster.

Sometimes you may want to match a generated string in a regex. This can be done in the following way:

my Str $pattern = 'ailemac';
say 'camelia' ~~ / $($pattern.flip) /;     # OUTPUT: «「camelia」␤» 

There are four ways to interpolate a string into a regex as a pattern: $pattern, $($pattern), <$pattern> and <{$pattern.method}>.

Note that the two examples above interpolate the string lexically, so its contents are matched literally, while <$pattern> and <{$pattern.method}> cause an implicit EVAL, which is a known trap.

Here are examples showing all four ways:

my Str $text = 'camelia';
my Str $pattern0 = 'camelia';
my     $pattern1 = 'ailemac';
my str $pattern2 = '\w+';
 
say $text ~~ / $pattern0 /;                # OUTPUT: «「camelia」␤» 
say $text ~~ / $($pattern0) /;             # OUTPUT: «「camelia」␤» 
say $text ~~ / $($pattern1.flip) /;        # OUTPUT: «「camelia」␤» 
say 'ailemacxflip' ~~ / $pattern1.flip /;  # OUTPUT: «「ailemacxflip」␤» 
say '\w+' ~~ / $pattern2 /;                # OUTPUT: «「\w+」␤» 
say '\w+' ~~ / $($pattern2) /;             # OUTPUT: «「\w+」␤» 
 
say $text ~~ / <{$pattern1.flip}> /;       # OUTPUT: «「camelia」␤» 
# say $text ~~ / <$pattern1.flip> /;       # !!Compile Error!! 
say $text ~~ / <$pattern2> /;              # OUTPUT: «「camelia」␤» 
say $text ~~ / <{$pattern2}> /;            # OUTPUT: «「camelia」␤» 
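
The difference between the literal forms and the EVAL-ing forms matters most when the string contains regex metacharacters, for instance when it comes from outside the program. The following is a minimal sketch; $input is a hypothetical variable name standing in for such a string:

```raku
# $input is a hypothetical stand-in for a string that contains
# a regex metacharacter (the dot).
my $input = 'a.c';

say so 'abc' ~~ / $input /;     # OUTPUT: «False␤»  the dot is matched literally
say so 'a.c' ~~ / $input /;     # OUTPUT: «True␤»
say so 'abc' ~~ / <$input> /;   # OUTPUT: «True␤»   the string is compiled as a regex
```

With the literal forms, the dot only matches a dot. With <$input>, the same string is treated as regex source, so the dot matches any character.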

When an array variable is interpolated into a regex, the regex engine handles it like a | alternative of the regex elements (see the documentation on embedded lists, above). The interpolation rules for individual elements are the same as for scalars, so strings and numbers match literally, and Regex objects match as regexes. Just as with ordinary | interpolation, the longest match succeeds:

my @a = '2', 23, rx/a.+/;
say ('b235' ~~ /  b @a /).Str;      # OUTPUT: «b23» 
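
As a further illustration of the two rules just described, here is a small sketch with string elements only. The array names @words and @literal are hypothetical and used only for this example:

```raku
# Longest alternative wins, regardless of the order of the elements.
my @words = 'cat', 'catalog';
say ~('catalogue' ~~ / @words /);    # OUTPUT: «catalog␤»

# String elements match literally, so the dot is not a wildcard here.
my @literal = 'a.c', 'x';
say so 'abc' ~~ / @literal /;        # OUTPUT: «False␤»
say so 'a.c' ~~ / @literal /;        # OUTPUT: «True␤»
```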

The use of hash variables in regexes is reserved.

Regex boolean condition check

The special operator <?{}> evaluates a boolean expression, which can perform a semantic check of the match so far before the regular expression continues. In other words, part of a regular expression can be checked in a boolean context, and the result can invalidate the whole match (or allow it to continue) even if the match succeeds from a syntactic point of view.

In particular the <?{}> operator requires a True value in order to allow the regular expression to match, while its negated form <!{}> requires a False value.
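
A minimal sketch of both forms, assuming a check for even numbers that is not part of the original text. Inside the block, $/ holds the text matched so far:

```raku
# <?{ }> lets the match continue only if the block returns a true value.
say so '42' ~~ / ^ \d+ <?{ $/.Int %% 2 }> $ /;   # OUTPUT: «True␤»
say so '43' ~~ / ^ \d+ <?{ $/.Int %% 2 }> $ /;   # OUTPUT: «False␤»

# <!{ }> lets the match continue only if the block returns a false value.
say so '43' ~~ / ^ \d+ <!{ $/.Int %% 2 }> $ /;   # OUTPUT: «True␤»
```

Both strings are syntactically valid runs of digits. Only the result of the block decides whether the match is accepted.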

To demonstrate the operator, consider the following example, which matches a simple IPv4 address:

my $localhost = '127.0.0.1';
my regex ipv4-octet { \d ** 1..3 <?{ True }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」] 

The octet regular expression matches a number made of one to three digits. Each match is gated by the result of <?{}>; because the expression here is the constant True, every syntactic match is accepted. As a counter-example, using the constant False invalidates the match even though the regular expression matches from a syntactic point of view:

my $localhost = '127.0.0.1';
my regex ipv4-octet { \d ** 1..3 <?{ False }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: Nil 

From the above examples, it should be clear that it is possible to improve the semantic check, for instance ensuring that each octet is really a valid IPv4 octet:

my $localhost = '127.0.0.1';
my regex ipv4-octet { \d ** 1..3 <?{ $/.Int <= 255 && $/.Int >= 0 }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」] 

Note that the boolean expression does not have to be written inline; an ordinary subroutine can also be called to compute the boolean value:

my $localhost = '127.0.0.1';
sub check-octet ( Int $o ){ $o <= 255 && $o >= 0 }
my regex ipv4-octet { \d ** 1..3 <?{ &check-octet( $/.Int ) }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」] 

Since <!{}> is the negated form of <?{}>, the same boolean check can be rewritten in negated form:

my $localhost = '127.0.0.1';
sub invalid-octet( Int $o ){ $o < 0 || $o > 255 }
my regex ipv4-octet { \d ** 1..3 <!{ &invalid-octet( $/.Int ) }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」]
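
The examples above only use an address that passes the check. The following sketch shows the same octet check rejecting input; the helper name is-ipv4 is hypothetical and introduced only for this illustration:

```raku
# Same octet check as above: one to three digits, value at most 255.
my regex ipv4-octet { \d ** 1..3 <?{ $/.Int <= 255 }> }

# Hypothetical helper: returns True only for four valid octets
# separated by dots.
sub is-ipv4 ( Str $addr ) { so $addr ~~ / ^ <ipv4-octet> ** 4 % "." $ / }

say is-ipv4('127.0.0.1');   # OUTPUT: «True␤»
say is-ipv4('300.0.0.1');   # OUTPUT: «False␤»
say is-ipv4('1.2.3');       # OUTPUT: «False␤»
```

For '300.0.0.1' the engine backtracks to the shorter candidates '30' and '3', which pass the numeric check, but neither is followed by a dot, so the overall match still fails.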