regex <:property>

Documentation for regex <:property> assembled from the following types:

language documentation Regexes

From Regexes

(Regexes) regex <:property>

The character classes mentioned so far are mostly for convenience; another approach is to use Unicode character properties. These come in the form <:property>, where property can be a short or long Unicode General Category name. These use pair syntax.

To match against a Unicode property you can use either smartmatch or uniprop:

"a".uniprop('Script');                 # OUTPUT: «Latin␤» 
"a" ~~ / <:Script<Latin>/;           # OUTPUT: «「a」␤» 
"a".uniprop('Block');                  # OUTPUT: «Basic Latin␤» 
"a" ~~ / <:Block('Basic Latin')> /;    # OUTPUT: «「a」␤» 

These are the unicode general categories used for matching:

Short Long
L Letter
LC Cased_Letter
Lu Uppercase_Letter
Ll Lowercase_Letter
Lt Titlecase_Letter
Lm Modifier_Letter
Lo Other_Letter
M Mark
Mn Nonspacing_Mark
Mc Spacing_Mark
Me Enclosing_Mark
N Number
Nd Decimal_Number or digit
Nl Letter_Number
No Other_Number
P Punctuation or punct
Pc Connector_Punctuation
Pd Dash_Punctuation
Ps Open_Punctuation
Pe Close_Punctuation
Pi Initial_Punctuation
Pf Final_Punctuation
Po Other_Punctuation
S Symbol
Sm Math_Symbol
Sc Currency_Symbol
Sk Modifier_Symbol
So Other_Symbol
Z Separator
Zs Space_Separator
Zl Line_Separator
Zp Paragraph_Separator
C Other
Cc Control or cntrl
Cf Format
Cs Surrogate
Co Private_Use
Cn Unassigned

For example, <:Lu> matches a single, upper-case letter.

Its negation is this: <:!property>. So, <:!Lu> matches a single character that is not an upper-case letter.

Categories can be used together, with an infix operator:

Operator Meaning
+ set union
- set difference

To match either a lower-case letter or a number, write <:Ll+:N> or <:Ll+:Number> or <+ :Lowercase_Letter + :Number>.

It's also possible to group categories and sets of categories with parentheses; for example:

say $0 if 'perl6' ~~ /\w+(<:Ll+:N>)/ # OUTPUT: «「6」␤»