regex Regex Interpolation

If you want to build a regex using a pattern given at runtime, regex interpolation is what you are looking for.

There are four ways to interpolate a string into a regex as a pattern: $pattern, $($pattern), <$pattern> and <{$pattern.method}>.

If the variable to be interpolated is statically typed as Str or str (as $pattern0 and $pattern2 are below) and is only interpolated literally (as in the first example below), then the compiler can optimize the interpolation and it runs much faster.

my Str $text = 'camelia';
my Str $pattern0 = 'camelia';
my     $pattern1 = 'ailemac';
my str $pattern2 = '\w+';
 
say $text ~~ / $pattern0 /;                # OUTPUT: «「camelia」␤» 
say $text ~~ / $($pattern0) /;             # OUTPUT: «「camelia」␤» 
say $text ~~ / $($pattern1.flip) /;        # OUTPUT: «「camelia」␤» 
say 'ailemacxflip' ~~ / $pattern1.flip /;  # OUTPUT: «「ailemacxflip」␤» 
say '\w+' ~~ / $pattern2 /;                # OUTPUT: «「\w+」␤» 
say '\w+' ~~ / $($pattern2) /;             # OUTPUT: «「\w+」␤» 
 
say $text ~~ / <{$pattern1.flip}> /;       # OUTPUT: «「camelia」␤» 
# say $text ~~ / <$pattern1.flip> /;       # !!Compile Error!! 
say $text ~~ / <$pattern2> /;              # OUTPUT: «「camelia」␤» 
say $text ~~ / <{$pattern2}> /;            # OUTPUT: «「camelia」␤» 

Note that the first two forms interpolate the string as literal text, while <$pattern> and <{$pattern.method}> cause an implicit EVAL of the string as regex source, which is a known trap.
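As a minimal sketch of why the distinction matters, assume a hypothetical variable $input holding a pattern received at runtime. Interpolated as $input, the dot is matched as a literal character; interpolated as <$input>, the string is compiled as a regex, so the dot matches any character:

```raku
# $input is a hypothetical stand-in for a string received at runtime.
my Str $input = 'a.c';

say ('a.c' ~~ / ^ $input $ /).so;    # OUTPUT: «True␤»   literal match
say ('abc' ~~ / ^ $input $ /).so;    # OUTPUT: «False␤»  '.' is not a wildcard here
say ('abc' ~~ / ^ <$input> $ /).so;  # OUTPUT: «True␤»   string compiled as a regex
```

When the string comes from a source you do not control, the literal forms are the safer default.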

When an array variable is interpolated into a regex, the regex engine handles it like a | alternation of the array's elements (see the documentation on embedded lists, above). The interpolation rules for individual elements are the same as for scalars, so strings and numbers match literally, and Regex objects match as regexes. Just as with ordinary | alternation, the longest match succeeds:

my @a = '2', 23, rx/a.+/;
say ('b235' ~~ /  b @a /).Str;      # OUTPUT: «b23» 
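
The longest-match rule can also be seen with literal strings only. The following is a small sketch in which @prefixes is a hypothetical list; the order of the elements does not decide which one wins:

```raku
# @prefixes is a hypothetical list of literal alternatives.
# 'k' comes first, but the longest matching element is chosen.
my @prefixes = 'k', 'kilo', 'kil';
say ('kilobyte' ~~ / ^ @prefixes /).Str;   # OUTPUT: «kilo␤»
```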

The use of hash variables in regexes is reserved.

Regex Boolean condition check

The special operator <?{}> evaluates a boolean expression, which makes it possible to run a semantic check on the match before the regular expression continues. In other words, part of a regular expression can be checked in boolean context, invalidating the whole match (or allowing it to continue) even if the match succeeds from a syntactic point of view.

In particular, <?{}> requires a True value for the regular expression to match, while its negated form <!{}> requires a False value.
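
As a quick sketch of the two forms side by side, the hypothetical tokens even-number and odd-number below run the same divisibility test, once through <?{}> and once through <!{}>:

```raku
# even-number and odd-number are hypothetical names used for illustration.
my token even-number { \d+ <?{ $/.Int %% 2 }> }   # block must be True
my token odd-number  { \d+ <!{ $/.Int %% 2 }> }   # block must be False

say ?('42' ~~ / ^ <even-number> $ /);   # OUTPUT: «True␤»
say ?('43' ~~ / ^ <even-number> $ /);   # OUTPUT: «False␤»
say ?('43' ~~ / ^ <odd-number>  $ /);   # OUTPUT: «True␤»
```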

To demonstrate the operator, consider the following example, which matches a simple IPv4 address:

my $localhost = '127.0.0.1';
my regex ipv4-octet { \d ** 1..3 <?{ True }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」] 

The octet regular expression matches a number made of one to three digits. Each match is driven by the result of <?{}>; since that is the fixed value True, the match is always accepted. As a counter-example, using the constant False invalidates the match even if the regular expression matches from a syntactic point of view:

my $localhost = '127.0.0.1';
my regex ipv4-octet { \d ** 1..3 <?{ False }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: Nil 

The examples above show that the semantic check can be refined, for instance to ensure that each octet is really a valid IPv4 octet:

my $localhost = '127.0.0.1';
my regex ipv4-octet { \d ** 1..3 <?{ $/.Int <= 255 && $/.Int >= 0 }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」] 
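
The range check only shows its value on input that is syntactically plausible but semantically wrong. The sketch below runs the same kind of octet regex over a hypothetical list of candidate strings, one of which contains an out-of-range octet and one of which has too few octets:

```raku
my regex ipv4-octet { \d ** 1..3 <?{ 0 <= $/.Int <= 255 }> }

# The candidate addresses are made up for illustration.
for '127.0.0.1', '256.0.0.1', '10.0.0' -> $candidate {
    say $candidate, ' => ', ?($candidate ~~ / ^ <ipv4-octet> ** 4 % '.' $ /);
}
# OUTPUT: «127.0.0.1 => True␤256.0.0.1 => False␤10.0.0 => False␤»
```

The second candidate fails because no prefix of 256 that passes the check is followed by a dot; the third fails because only three octets are present.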

Note that the boolean expression does not have to be written inline; a regular subroutine can be called to compute the boolean value:

my $localhost = '127.0.0.1';
sub check-octet ( Int $o ){ $o <= 255 && $o >= 0 }
my regex ipv4-octet { \d ** 1..3 <?{ &check-octet( $/.Int ) }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」] 

Since <!{}> is the negated form of <?{}>, the same boolean evaluation can be rewritten in negated form:

my $localhost = '127.0.0.1';
sub invalid-octet( Int $o ){ $o < 0 || $o > 255 }
my regex ipv4-octet { \d ** 1..3 <!{ &invalid-octet( $/.Int ) }> }
$localhost ~~ / ^ <ipv4-octet> ** 4 % "." $ /;
say $/<ipv4-octet>;   # OUTPUT: [「127」 「0」 「0」 「1」]