regex adverb :sigspace

The :sigspace or :s adverb makes whitespace significant in a regex.

say so "I used Photoshop®"   ~~ m:i/   photo shop /;      # OUTPUT: «True␤»
say so "I used a photo shop" ~~ m:i:s/ photo shop /;   # OUTPUT: «True␤»
say so "I used Photoshop®"   ~~ m:i:s/ photo shop /;   # OUTPUT: «False␤»

m:s/ photo shop / acts the same as m/ photo <.ws> shop <.ws> /. By default, <.ws> makes sure that words are separated, so "a b" and "^&" will match <.ws> in the middle, but "ab" won't.
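
To see the default <.ws> behavior in isolation, the following minimal sketch writes <.ws> explicitly instead of relying on :sigspace. The test strings are the ones mentioned above; nothing else is assumed.

```raku
# Default <.ws>: optional whitespace, but never in the middle of a word.
say so "a b" ~~ m/ a <.ws> b /;        # OUTPUT: «True␤»
say so "^&"  ~~ m/ '^' <.ws> '&' /;    # OUTPUT: «True␤»
say so "ab"  ~~ m/ a <.ws> b /;        # OUTPUT: «False␤»
```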

Where whitespace in a regex turns into <.ws> depends on what comes before the whitespace. In the above example, whitespace at the beginning of the regex doesn't turn into <.ws>, but whitespace after characters does. In general, the rule is that if a term might match something, whitespace after it will turn into <.ws>.
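
One way to observe that leading whitespace is not converted is to compare the :s form with a hand-written pattern that does start with <.ws>. This is only a sketch; the input string "xphoto shop" is made up for the illustration.

```raku
# Leading whitespace under :s adds no <.ws>, so the match may start right after "x".
say so "xphoto shop" ~~ m:s/ photo shop /;                    # OUTPUT: «True␤»
# An explicit leading <.ws> cannot match between "x" and "p" (inside a word).
say so "xphoto shop" ~~ m/ <.ws> photo <.ws> shop <.ws> /;    # OUTPUT: «False␤»
```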

In addition, if whitespace comes after a term but before a quantifier (+, *, or ?), <.ws> will be matched after every match of the term. So, foo + becomes [ foo <.ws> ]+. On the other hand, whitespace after a quantifier acts as normal significant whitespace; e.g., "foo+ " becomes foo+ <.ws>.
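
The difference between the two placements can be sketched with a run of digits. The inputs are illustrative only, and the second line spells out the expansion described above for comparison.

```raku
# Whitespace before the quantifier: <.ws> follows every repetition.
say so "1 2 3" ~~ m:s/^ \d + $/;            # OUTPUT: «True␤»
say so "1 2 3" ~~ m/^ [ \d <.ws> ]+ $/;     # OUTPUT: «True␤»
# Whitespace only after the quantifier: a single <.ws> after the whole run.
say so "1 2 3" ~~ m:s/^ \d+ $/;             # OUTPUT: «False␤»
say so "123 "  ~~ m:s/^ \d+ $/;             # OUTPUT: «True␤»
```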

Putting these rules together, this code:

rx :s {
    ^^
    {
        say "No sigspace after this";
    }
    <.assertion_and_then_ws>
    characters_with_ws_after+
    ws_separated_characters *
    [
    | some "stuff" .. .
    | $$
    ]
    :my $foo = "no ws after this";
    $foo
}

Becomes:

rx {
    ^^ <.ws>
    {
        say "No sigspace after this";
    }
    <.assertion_and_then_ws> <.ws>
    characters_with_ws_after+ <.ws>
    [ws_separated_characters <.ws>]* <.ws>
    [
    | some <.ws> "stuff" <.ws> .. <.ws> . <.ws>
    | $$ <.ws>
    ] <.ws>
    :my $foo = "no ws after this";
    $foo <.ws>
}

If a regex is declared with the rule keyword, both the :sigspace and :ratchet adverbs are implied.
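
As a rough illustration, a rule can be compared with a token, which implies :ratchet but not :sigspace. The names spaced and packed are hypothetical and exist only for this sketch.

```raku
my rule  spaced { foo bar }   # :sigspace and :ratchet implied
my token packed { foo bar }   # :ratchet only; whitespace in the pattern is ignored

say so "foo bar" ~~ /^ <spaced> $/;   # OUTPUT: «True␤»
say so "foobar"  ~~ /^ <spaced> $/;   # OUTPUT: «False␤»
say so "foo bar" ~~ /^ <packed> $/;   # OUTPUT: «False␤»
say so "foobar"  ~~ /^ <packed> $/;   # OUTPUT: «True␤»
```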

Grammars provide an easy way to override what <.ws> matches:

grammar Demo {
    token ws {
        <!ww>       # only match when not within a word
        \h*         # only match horizontal whitespace
    }
    rule TOP {      # called by Demo.parse
        a b '.'
    }
}
 
# doesn't parse: <.ws> cannot match inside the word "ab"
say so Demo.parse("ab.");                 # OUTPUT: «False␤»
say so Demo.parse("a b.");                # OUTPUT: «True␤»
say so Demo.parse("a\tb .");              # OUTPUT: «True␤»

# \n is vertical whitespace, so no match
say so Demo.parse("a\tb\n.");             # OUTPUT: «False␤»

When parsing file formats where some whitespace (for example, vertical whitespace) is significant, it's advisable to override ws.
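
For instance, a line-oriented "name = value" format might be sketched as follows. The grammar Settings and its entry, field, and datum rules are hypothetical names invented for this example; only the ws override comes from the text above.

```raku
grammar Settings {
    token ws    { <!ww> \h* }            # horizontal whitespace only
    token TOP   { [ <entry> \n ]* }      # one entry per line
    rule  entry { <field> '=' <datum> }  # :sigspace uses the ws above
    token field { \w+ }
    token datum { \w+ }
}

say so Settings.parse("a = 1\nb=2\n");   # OUTPUT: «True␤»
# A line break between "=" and the value is not skipped, so this fails.
say so Settings.parse("a =\n1\n");       # OUTPUT: «False␤»
```

Because ws no longer consumes newlines, the \n in TOP is the only place where a line break can match, which keeps each entry on a single line.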