regex <[ ]>
Documentation for regex <[ ]> assembled from the following types:
Sometimes the pre-existing wildcards and character classes are not enough. Fortunately, defining your own is fairly simple. Within <[ ]>, you can put any number of single characters and ranges of characters (expressed with two dots between the end points), with or without whitespace.
"abacabadabacaba" ~~ / <[ a .. c 1 2 3 ]>* /;
# Unicode hex codepoint range
"ÀÁÂÃÄÅÆ" ~~ / <[ \x[00C0] .. \x[00C6] ]>* /;
# Unicode named codepoint range
"αβγ" ~~ /<[\c[GREEK SMALL LETTER ALPHA]..\c[GREEK SMALL LETTER GAMMA]]>*/;
Within the < > you can use + and - to add or remove multiple range definitions and even mix in some of the Unicode categories above. You can also write the backslashed forms for character classes between the [ ].
/ <[\d] - [13579]> /;
# starts with \d and removes odd ASCII digits, but not quite the same as
/ <[02468]> /;
# because the first one also contains "weird" unicodey digits
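The difference between subtracting from \d and listing the even ASCII digits can be checked with a short sketch. The test character (ARABIC-INDIC DIGIT FOUR, U+0664) and the variable name are chosen here for illustration only; any non-ASCII decimal digit should behave the same way.

```raku
# Illustrative only: U+0664 is a decimal digit outside the ASCII range.
my $arabic-four = "\x[0664]";

say so $arabic-four ~~ / <[\d] - [13579]> /;  # True:  \d covers Unicode digits
say so $arabic-four ~~ / <[02468]> /;         # False: only ASCII 0 2 4 6 8 are listed
say so '3'          ~~ / <[\d] - [13579]> /;  # False: odd ASCII digits were removed
say so '4'          ~~ / <[\d] - [13579]> /;  # True
```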
You can include Unicode properties in the list as well:
/<:Zs + [\x9] - [\xA]>/
# Any character with "Zs" property, or a tab, but not a newline
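As a rough check of how a property and bracketed sets combine, the following sketch tries a class of this shape against a few single characters. The name $space-or-tab and the sample characters are assumptions made for the example.

```raku
# A sketch: Zs (space separators) plus a tab, minus a newline.
my $space-or-tab = / <:Zs + [\x9] - [\xA]> /;   # illustrative name

say so "\x[2003]" ~~ $space-or-tab;  # EM SPACE has the Zs property: True
say so "\x[9]"    ~~ $space-or-tab;  # tab is added explicitly: True
say so "\x[A]"    ~~ $space-or-tab;  # newline is not in the class: False
say so "a"        ~~ $space-or-tab;  # an ordinary letter: False
```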
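Use \ to escape characters that would otherwise have a special meaning inside the class:
say "[ hey ]" ~~ /<-[ \] \[ \s ]>+/; # OUTPUT: «｢hey｣␤»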
To negate a character class, put a - after the opening angle bracket:
say 'no quotes' ~~ / <-[ " ]> + /; # <-["]> matches any character except "
A common pattern for parsing quote-delimited strings involves negated character classes:
say '"in quotes"' ~~ / '"' <-[ " ]> * '"'/;
This regex first matches a quote, then any characters that aren't quotes, and then a quote again. The meaning of * and + in the examples above is explained in the next section on quantifiers.
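The quote-matching pattern becomes more useful once the text between the quotes is captured. The sketch below wraps the negated class in a positional capture; the input string and variable name are made up for the example.

```raku
# Illustrative input; $input is a hypothetical name.
my $input = 'say "hello world" now';

# Opening quote, then a captured run of non-quote characters, then the closing quote.
if $input ~~ / '"' ( <-[ " ]>* ) '"' / {
    say ~$0;   # hello world
}
```

Because the negated class can never consume a quote, the capture stops at the first closing quote rather than running on to a later one.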
Just as you can use the - for both set difference and negation of a single value, you can also explicitly put a + in front:
/ <+[123]> / # same as <[123]>