In Regexes
The :sigspace or :s adverb changes the behavior of unquoted whitespace in a regex.
Without :sigspace, unquoted whitespace in a regex is generally ignored, to make regexes more readable by programmers. When :sigspace is present, unquoted whitespace may be converted into <.ws> subrule calls depending on where it occurs in the regex.
    say so "I used Photoshop®"     ~~ m:i/   photo shop /; # OUTPUT: «True»
    say so "I used a photo shop"   ~~ m:i:s/ photo shop /; # OUTPUT: «True»
    say so "I used Photoshop®"     ~~ m:i:s/ photo shop /; # OUTPUT: «False»
m:s/ photo shop / acts the same as m/ photo <.ws> shop <.ws> /. By default, <.ws> makes sure that words are separated, so a b and ^& will match <.ws> in the middle, but ab won't:
    say so "ab"  ~~ m:s/a <.ws> b/;     # OUTPUT: «False»
    say so "a b" ~~ m:s/a <.ws> b/;     # OUTPUT: «True»
    say so "^&"  ~~ m:s/'^' <.ws> '&'/; # OUTPUT: «True»
The third line matches because ^& is not a word. For more details on how the <.ws> rule works, refer to the WS rule description.
Where whitespace in a regex turns into <.ws> depends on what comes before the whitespace. In the above example, whitespace at the beginning of a regex doesn't turn into <.ws>, but whitespace after characters does. In general, the rule is that if a term might match something, whitespace after it will turn into <.ws>.
In addition, if whitespace comes after a term but before a quantifier (+, *, or ?), <.ws> will be matched after every match of the term. So, foo + becomes [ foo <.ws> ]+. On the other hand, whitespace after a quantifier acts as normal significant whitespace; e.g., "foo+ " becomes foo+ <.ws>. However, whitespace between a quantifier and the % or %% quantifier modifier is not significant. Thus foo+ % , does not become foo+ <.ws>% , (which would be invalid anyway); instead, neither of the spaces is significant.
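The difference between whitespace before and after a quantifier can be checked with a small sketch. The sample strings and the anchored patterns are illustrative assumptions, not part of the original text; the expected results in the comments follow from the rewriting rules just described together with the default <.ws>.

```raku
# Whitespace BEFORE the quantifier: "a +" is treated like [ a <.ws> ]+
say so "a a a" ~~ m:s/^ a + $/;   # expected: True
say so "aaa"   ~~ m:s/^ a + $/;   # expected: False, <.ws> fails between word characters

# Whitespace AFTER the quantifier: "a+ " is treated like a+ <.ws>
say so "aaa"   ~~ m:s/^ a+ $/;    # expected: True
say so "a a a" ~~ m:s/^ a+ $/;    # expected: False, input remains after a and one <.ws>
```

Placing the space before the quantifier is therefore a compact way to match a whitespace-separated list of items, while placing it after the quantifier only allows whitespace once the repetition has finished.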
In all, this code:
    rx :s {
        ^^
        {
            say "No sigspace after this";
        }
        <.assertion_and_then_ws>
        characters_with_ws_after+
        ws_separated_characters *
        [
        | some "stuff" .. .
        | $$
        ]
        :my $foo = "no ws after this";
        $foo
    }
Becomes:
    rx {
        ^^ <.ws>
        {
            say "No sigspace after this";
        }
        <.assertion_and_then_ws> <.ws>
        characters_with_ws_after+ <.ws>
        [ws_separated_characters <.ws>]* <.ws>
        [
        | some <.ws> "stuff" <.ws> .. <.ws> . <.ws>
        | $$ <.ws>
        ] <.ws>
        :my $foo = "no ws after this";
        $foo <.ws>
    }
If a regex is declared with the rule keyword, both the :sigspace and :ratchet adverbs are implied.
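As a minimal sketch of what the rule keyword implies, the following compares a lexical token and a lexical rule with the same body. The names t and r and the test strings are hypothetical, chosen only for illustration; the expected results assume the default <.ws>.

```raku
my token t { a b }   # no :sigspace: the space between a and b is ignored
my rule  r { a b }   # :sigspace implied: behaves like a <.ws> b <.ws>

say so "ab"  ~~ /^ <t> $/;   # expected: True
say so "a b" ~~ /^ <t> $/;   # expected: False, the token has no way to match the space
say so "a b" ~~ /^ <r> $/;   # expected: True
say so "ab"  ~~ /^ <r> $/;   # expected: False, <.ws> fails inside a word
```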
Grammars provide an easy way to override what <.ws> matches:
    grammar Demo {
        token ws {
            <!ww>       # only match when not within a word
            \h*         # only match horizontal whitespace
        }
        rule TOP {      # called by Demo.parse;
            a b '.'
        }
    }

    # doesn't parse, whitespace required between a and b
    say so Demo.parse("ab.");       # OUTPUT: «False»
    say so Demo.parse("a b.");      # OUTPUT: «True»
    say so Demo.parse("a\tb .");    # OUTPUT: «True»

    # \n is vertical whitespace, so no match
    say so Demo.parse("a\tb\n.");   # OUTPUT: «False»
When parsing file formats where some whitespace (for example, vertical whitespace) is significant, it's advisable to override ws.