In Regexes
The :sigspace or :s adverb changes the behavior of unquoted whitespace in a regex.
Without :sigspace, unquoted whitespace in a regex is generally ignored, to make regexes more readable by programmers. When :sigspace is present, unquoted whitespace may be converted into <.ws> subrule calls depending on where it occurs in the regex.
say so "I used Photoshop®" ~~ m:i/ photo shop /; # OUTPUT: «True»
say so "I used a photo shop" ~~ m:i:s/ photo shop /; # OUTPUT: «True»
say so "I used Photoshop®" ~~ m:i:s/ photo shop /; # OUTPUT: «False»
m:s/ photo shop / acts the same as m/ photo <.ws> shop <.ws> /. By default, <.ws> makes sure that words are separated, so a b and ^& will match <.ws> in the middle, but ab won't:
say so "ab" ~~ m:s/a b/; # OUTPUT: «False»
say so "a b" ~~ m:s/a b/; # OUTPUT: «True»
say so "^&" ~~ m:s/'^' '&'/; # OUTPUT: «True»
The third line matches because ^& is not a word, so no separating whitespace is required between the two characters. For more detail on how the <.ws> rule works, refer to the WS rule description.
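The word-separation behavior can also be observed by calling <.ws> explicitly, without the :sigspace adverb. The following is a minimal sketch that assumes the default <.ws> (no grammar overriding it); the input strings are chosen only for illustration.

```raku
# Explicit <.ws> calls; whitespace in these regexes is otherwise ignored.

# Between two word characters, the default <.ws> refuses to match.
say so "ab"  ~~ / a <.ws> b /;             # OUTPUT: «False»

# Actual whitespace between the words is consumed by <.ws>.
say so "a b" ~~ / a <.ws> b /;             # OUTPUT: «True»

# Next to a non-word character, <.ws> may match zero characters.
say so "a+b" ~~ / a <.ws> '+' <.ws> b /;   # OUTPUT: «True»
```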
Where whitespace in a regex turns into <.ws> depends on what comes before the whitespace. In the above example, whitespace at the beginning of a regex doesn't turn into <.ws>, but whitespace after characters does. In general, the rule is that if a term might match something, whitespace after it will turn into <.ws>.
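The difference between leading whitespace and whitespace after a term can be sketched as follows. This is an illustration with made-up inputs, assuming the default <.ws>:

```raku
# Leading whitespace in the regex is dropped, so 'b' can match inside "ab".
say so "ab" ~~ m:s/ b/;       # OUTPUT: «True»

# An explicit leading <.ws> is a different regex: the default <.ws>
# cannot match between the word characters 'a' and 'b'.
say so "ab" ~~ m/<.ws> b/;    # OUTPUT: «False»

# Whitespace after the term 'a' does become <.ws>.
say so "ab" ~~ m:s/a /;       # OUTPUT: «False»
say so "a!" ~~ m:s/a /;       # OUTPUT: «True»
```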
In addition, if whitespace comes after a term but before a quantifier (+, *, or ?), <.ws> will be matched after every match of the term. So, foo + becomes [ foo <.ws> ]+. On the other hand, whitespace after a quantifier acts as normal significant whitespace; e.g., "foo+ " becomes foo+ <.ws>. However, whitespace between a quantifier and the % or %% quantifier modifier is not significant. Thus foo+ % , does not become foo+ <.ws>% , (which would be invalid anyway); instead, neither of the spaces is significant.
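The two quantifier placements can be compared with a single-character term. This is a small sketch with invented inputs, assuming the default <.ws>:

```raku
# Whitespace BEFORE the quantifier: 'a +' acts like [ a <.ws> ]+,
# so every 'a' must be followed by a word break.
say so "a a a" ~~ m:s/^a +$/;   # OUTPUT: «True»
say so "aaa"   ~~ m:s/^a +$/;   # OUTPUT: «False»

# Whitespace AFTER the quantifier: 'a+ ' acts like a+ <.ws>,
# so <.ws> is only tried once, after the whole run of 'a's.
say so "aaa"   ~~ m:s/^a+ $/;   # OUTPUT: «True»
say so "a a a" ~~ m:s/^a+ $/;   # OUTPUT: «False»
```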
In all, this code:
rx :s {
    ^^
    characters_with_ws_after+
    ws_separated_characters *
    [
    | some "stuff" .. .
    | $$
    ]
    :my $foo = "no ws after this";
    $foo
}
Becomes:
rx {
    ^^ <.ws>
    characters_with_ws_after+ <.ws>
    [ws_separated_characters <.ws>]* <.ws>
    [
    | some <.ws> "stuff" <.ws> .. <.ws> . <.ws>
    | $$ <.ws>
    ] <.ws>
    :my $foo = "no ws after this";
    $foo <.ws>
}
If a regex is declared with the rule keyword, both the :sigspace and :ratchet adverbs are implied.
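As a rough illustration of the implied :sigspace, the same body can be declared once as a rule and once as a token. The names assignment-rule and assignment-token are hypothetical and exist only for this sketch:

```raku
# Hypothetical names, for illustration only.
my rule  assignment-rule  { \w+ '=' \w+ }   # :sigspace and :ratchet implied
my token assignment-token { \w+ '=' \w+ }   # :ratchet only; whitespace is ignored

# The rule tolerates whitespace around '=' because of the implied <.ws> calls.
say so "key = value" ~~ /^ <assignment-rule> $/;    # OUTPUT: «True»

# The token has no <.ws> calls, so the spaces in the input block the match.
say so "key = value" ~~ /^ <assignment-token> $/;   # OUTPUT: «False»
say so "key=value"   ~~ /^ <assignment-token> $/;   # OUTPUT: «True»
```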
Grammars provide an easy way to override what <.ws> matches:
# doesn't parse, whitespace required between a and b
say so Demo.parse("ab."); # OUTPUT: «False»
say so Demo.parse("a b."); # OUTPUT: «True»
say so Demo.parse("a\tb ."); # OUTPUT: «True»
# \n is vertical whitespace, so no match
say so Demo.parse("a\tb\n."); # OUTPUT: «False»
When parsing file formats where some whitespace (for example, vertical whitespace) is significant, it's advisable to override ws.
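The Demo.parse calls above rely on a grammar whose definition does not appear in this section. One possible definition that is consistent with the listed outputs is sketched below. The bodies of ws and TOP are an assumption made for illustration, not necessarily the original code.

```raku
# Assumed definition: a grammar whose ws only accepts horizontal whitespace.
grammar Demo {
    token ws {
        <!ww>   # only match when not within a word
        \h*     # only match horizontal whitespace
    }
    rule TOP {  # entry point used by Demo.parse; whitespace here calls ws
        a b '.'
    }
}

say so Demo.parse("a\tb .");     # OUTPUT: «True»
say so Demo.parse("a\tb\n.");    # OUTPUT: «False»
```

Because TOP is declared with rule, each run of whitespace after a term calls the overridden ws, so a newline between b and the period is no longer skipped.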