declarator regex

Documentation for declarator regex, assembled from the following types:

language documentation Grammars

From Grammars

(Grammars) declarator regex

The main ingredient of grammars is named regexes. While the syntax of Perl 6 Regexes is outside the scope of this document, named regexes have a special syntax, similar to subroutine definitions: [1]

my regex number { \d+ [ \. \d+ ]? }

In this case, we have to specify that the regex is lexically scoped using the my keyword, because named regexes are normally used within grammars.

Being named gives us the advantage of being able to easily reuse the regex elsewhere:

say so "32.51" ~~ &number;                         # OUTPUT: «True␤» 
say so "15 + 4.5" ~~ /<number>\s* '+' \s*<number>/ # OUTPUT: «True␤» 

regex isn't the only declarator for named regexes. In fact, it's the least common. Most of the time, the token or rule declarators are used. These are both ratcheting, which means that the match engine won't back up and try again if it fails to match something. This will usually do what you want, but isn't appropriate for all cases:

my regex works-but-slow { .+ q }
my token fails-but-fast { .+ q }
my $s = 'Tokens won\'t backtrack, which makes them fail quicker!';
say so $s ~~ &works-but-slow; # OUTPUT: «True␤» 
say so $s ~~ &fails-but-fast; # OUTPUT: «False␤» 
                              # the entire string gets taken by the .+ 
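
As a minimal sketch of the same idea: a token can still succeed on this input if it is written so that it never needs to back up, and a regex can opt in to the same behavior with the :ratchet adverb. The names up-to-q and ratcheted are hypothetical, chosen only for this illustration.

```raku
my $s = 'Tokens won\'t backtrack, which makes them fail quicker!';

# The negated character class stops right before the first "q",
# so the token matches without ever having to backtrack.
my token up-to-q { <-[q]>+ q }
say so $s ~~ &up-to-q;    # OUTPUT: «True␤»

# A regex with :ratchet switched on behaves like the token
# fails-but-fast above: .+ keeps the whole string.
my regex ratcheted { :ratchet .+ q }
say so $s ~~ &ratcheted;  # OUTPUT: «False␤»
```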

Note that non-backtracking applies to individual terms: as the example below shows, once a term has matched, the engine never backtracks into it. If the match then fails and another alternative has been introduced by | or ||, the engine still tries that alternative.

my token tok-a { .* d  };
my token tok-b { .* d | bd };
say so "bd" ~~ &tok-a;        # OUTPUT: «False␤» 
say so "bd" ~~ &tok-b;        # OUTPUT: «True␤» 

The only difference between the token and rule declarators is that the rule declarator causes :sigspace to go into effect for the Regex:

my token non-space-y { 'once' 'upon' 'a' 'time' }
my rule space-y { 'once' 'upon' 'a' 'time' }
say so 'onceuponatime'    ~~ &non-space-y; # OUTPUT: «True␤» 
say so 'once upon a time' ~~ &non-space-y; # OUTPUT: «False␤» 
say so 'onceuponatime'    ~~ &space-y;     # OUTPUT: «False␤» 
say so 'once upon a time' ~~ &space-y;     # OUTPUT: «True␤» 
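
One way to read this is that a rule behaves like a token with :sigspace turned on, where whitespace in the pattern stands for a call to the built-in ws subrule. The sketch below spells both forms out; the names with-adverb and with-ws are hypothetical and used only for illustration.

```raku
# A token with the :sigspace adverb applied inside its body.
my token with-adverb { :sigspace 'once' 'upon' 'a' 'time' }

# A token that calls the built-in <.ws> subrule explicitly between words.
my token with-ws { 'once' <.ws> 'upon' <.ws> 'a' <.ws> 'time' }

say so 'once upon a time' ~~ &with-adverb;  # OUTPUT: «True␤»
say so 'onceuponatime'    ~~ &with-adverb;  # OUTPUT: «False␤»
say so 'once upon a time' ~~ &with-ws;      # OUTPUT: «True␤»
say so 'onceuponatime'    ~~ &with-ws;      # OUTPUT: «False␤»
```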

language documentation Regexes

From Regexes

(Regexes) declarator regex

Just like you can put pieces of code into subroutines, you can also put pieces of regex into named regexes.

my regex line { \N*\n }
if "abc\ndef" ~~ /<line> def/ {
    say "First line: ", $<line>.chomp;      # OUTPUT: «First line: abc␤» 
}

A named regex can be declared with my regex named-regex { body here }, and called with <named-regex>. At the same time, calling a named regex installs a named capture with the same name.

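To give the capture a different name from the regex, use the syntax <capture-name=named-regex>. If no capture is desired, a leading dot or ampersand will suppress it: <.named-regex> if the regex is a method declared in the same class or grammar, <&named-regex> if it is declared in the same lexical context.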
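
A small sketch of the default and the renamed capture side by side; the regex word and the capture name greeting are hypothetical, introduced only for this example.

```raku
my regex word { \w+ }   # hypothetical helper regex

# By default the capture is named after the regex.
if "hello world" ~~ / <word> / {
    say ~$<word>;       # OUTPUT: «hello␤»
}

# The first call is stored under a different name, the second under the default.
if "hello world" ~~ / <greeting=word> \s+ <word> / {
    say ~$<greeting>;   # OUTPUT: «hello␤»
    say ~$<word>;       # OUTPUT: «world␤»
}
```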

Here's more complete code for parsing ini files:

my regex header { \s* '[' (\w+) ']' \h* \n+ }
my regex identifier  { \w+ }
my regex kvpair { \s* <key=identifier> '=' <value=identifier> \n+ }
my regex section {
    <header>
    <kvpair>*
}
my $contents = q:to/EOI/; 
    [passwords]
        jack=password1
        joy=muchmoresecure123
    [quotas]
        jack=123
        joy=42
EOI
my %config;
if $contents ~~ /<section>*/ {
    for $<section>.list -> $section {
        my %section;
        for $section<kvpair>.list -> $p {
            %section{ $p<key> } = ~$p<value>;
        }
        %config{ $section<header>[0] } = %section;
    }
}
say %config.perl;
# OUTPUT: «{:passwords(${:jack("password1"), :joy("muchmoresecure123")}), 
#           :quotas(${:jack("123"), :joy("42")})}» 

Named regexes can and should be grouped in grammars. The predefined subrules are listed in S05-regex of the design documents.
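
As a rough sketch of what such grouping could look like for the ini example, the regexes can be moved into a grammar and run with its parse method. The grammar name IniSketch and the choice of token over regex are assumptions made for this illustration, not part of the original example.

```raku
# Hypothetical grammar collecting the ini-file regexes in one place.
grammar IniSketch {
    token TOP        { <section>* }
    token section    { <header> <kvpair>* }
    token header     { \s* '[' (\w+) ']' \h* \n+ }
    token identifier { \w+ }
    token kvpair     { \s* <key=identifier> '=' <value=identifier> \n+ }
}

my $m = IniSketch.parse("[quotas]\njack=123\njoy=42\n");
say ~$m<section>[0]<header>[0];         # OUTPUT: «quotas␤»
say ~$m<section>[0]<kvpair>[1]<value>;  # OUTPUT: «42␤»
```

Inside a grammar the regexes are methods, so they no longer need the my keyword, and TOP is the entry point that parse uses by default.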