regex <[ ]>

Documentation for regex <[ ]> assembled from the following types:

language documentation Regexes

From Regexes


Sometimes the pre-existing wildcards and character classes are not enough. Fortunately, defining your own is fairly simple. Within <[ ]>, you can put any number of single characters and ranges of characters (expressed with two dots between the end points), with or without whitespace.

"abacabadabacaba" ~~ / <[ a .. c 1 2 3 ]>* /;
# Unicode hex codepoint range 
"ÀÁÂÃÄÅÆ" ~~ / <[ \x[00C0] .. \x[00C6] ]>* /;
# Unicode named codepoint range 
"αβγ" ~~ /<[\c[GREEK SMALL LETTER ALPHA]..\c[GREEK SMALL LETTER GAMMA]]>*/;

Within the < >, you can use + and - to add or remove multiple range definitions, and even mix in some of the Unicode categories above. You can also write backslashed character classes, such as \d, between the [ ].

/ <[\d] - [13579]> /;
# starts with \d and removes odd ASCII digits, but not quite the same as 
/ <[02468]> /;
# because the first one also contains "weird" unicodey digits 
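The difference between the two classes can be checked with a non-ASCII digit. The following is a minimal sketch; the variable name $digit and the choice of ARABIC-INDIC DIGIT FOUR are only illustrative assumptions, and any non-ASCII character that \d matches should behave the same way.

```raku
# U+0664 ARABIC-INDIC DIGIT FOUR is matched by \d but is not an ASCII digit.
my $digit = "\c[ARABIC-INDIC DIGIT FOUR]";

say so $digit ~~ / <[\d] - [13579]> /;  # OUTPUT: «True␤»
say so $digit ~~ / <[02468]> /;         # OUTPUT: «False␤»

# Odd ASCII digits are removed from the first class.
say so "3" ~~ / <[\d] - [13579]> /;     # OUTPUT: «False␤»
```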

You can include Unicode properties in the list as well:

/<:Zs + [\x9] - [\xA0]>/;
# Any character with the "Zs" property, or a tab, but not a no-break space (U+00A0)
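As a rough illustration of how the pieces combine, the sketch below stores that class in a variable (the name $class is an assumption made for this example) and tests a few individual characters against it.

```raku
my $class = / <:Zs + [\x9] - [\xA0]> /;

say so " "      ~~ $class;  # OUTPUT: «True␤»   (SPACE has the Zs property)
say so "\t"     ~~ $class;  # OUTPUT: «True␤»   (tab is added with + [\x9])
say so "\x[A0]" ~~ $class;  # OUTPUT: «False␤»  (U+00A0 is removed with - [\xA0])
say so "\n"     ~~ $class;  # OUTPUT: «False␤»  (newline never had the Zs property)
```

The terms are applied from left to right, so a character that is added first can still be removed by a later - term.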

You can use \ to escape characters that would have some meaning in the regular expression:

say "[ hey ]" ~~ /<-[ \] \[ \s ]>+/; # OUTPUT: «「hey」␤»

To negate a character class, put a - after the opening angle bracket:

say 'no quotes' ~~ /  <-[ " ]> + /;  # matches characters except " 

A common pattern for parsing quote-delimited strings involves negated character classes:

say '"in quotes"' ~~ / '"' <-[ " ]> * '"'/;

This first matches a quote, then any characters that aren't quotes, and then a quote again. The meaning of * and + in the examples above is explained in the next section, on quantifiers.
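To pull out just the text between the quotes, the negated class can be wrapped in a capture group. This is a small sketch; the sample string and the variable names $text and $match are made up for illustration.

```raku
my $text  = 'She said "in quotes" and left.';

# The parentheses capture only what the negated class matched.
my $match = $text ~~ / '"' ( <-[ " ]>* ) '"' /;

say ~$match[0] if $match;  # OUTPUT: «in quotes␤»
```

Because the class excludes the quote character itself, the match cannot run past the closing quote.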

Just as you can use the - for both set difference and negation of a single value, you can also explicitly put a + in front:

/ <+[123]> /  # same as <[123]>
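A quick way to see the three forms side by side is to collect every matching character of a sample string with .comb. This is only a sketch; the input string "a1b2c3" is an arbitrary choice.

```raku
my $input = "a1b2c3";

say $input.comb(/ <[123]>  /);  # OUTPUT: «(1 2 3)␤»
say $input.comb(/ <+[123]> /);  # OUTPUT: «(1 2 3)␤»  (explicit +, same class)
say $input.comb(/ <-[123]> /);  # OUTPUT: «(a b c)␤»  (negated class)
```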