Entering Unicode Characters

Input methods for unicode characters in editors and the shell

Perl 6 allows the use of unicode characters as variable names. Many operators are defined with unicode symbols (in particular the set/bag operators) as well as some quoting constructs. Hence it is good to know how to enter these symbols into editors, the Perl 6 shell and the command line, especially if the symbols aren't available as actual characters on a keyboard.

General information about entering unicode under various operating systems and environments can be found on the Wikipedia unicode input page.

XCompose

Xorg includes digraph support using a Compose key. The default of AltGr + Shift can be remapped to something easier such as Capslock. In GNOME 2 and MATE this can be set up under Preferences → Keyboard → Layouts → Options → Position of Compose Key. So, for example, to input »+« you could type CAPSLOCK > > + CAPSLOCK < <.

XCompose allows customising the digraph sequences using a .XCompose file; https://github.com/kragen/xcompose/blob/master/dotXCompose is an extremely complete one. In GNOME, XCompose was overridden and replaced with a hardcoded list, but it is possible to restore XCompose by setting GTK_IM_MODULE=xim in your environment. It might be necessary to install an xim bridge as well, such as uim-xim.

Editors and shells

Vim

In Vim, unicode characters are entered (in insert-mode) by pressing first Ctrl-V (also denoted ^V), then u and then the hexadecimal value of the unicode character to be entered. For example, the Greek letter λ (lambda) is entered via the key combination:

^Vu03BB

You can also use Ctrl-K/^K along with a digraph to type in some characters. So an alternative to the above using digraphs looks like this:

^Kl*

The list of digraphs Vim provides is documented in Vim's built-in help (see :help digraph-table); you can add your own with the :digraph command.

Further information about entering special characters in Vim can be found on the Vim Wikia page about entering special characters.

Emacs

In Emacs, unicode characters are entered by first entering the chord C-x 8 RET at which point the text Unicode (name or hex): appears in the minibuffer. One then enters the unicode code point hexadecimal number followed by the enter key. The unicode character will now appear in the document. Thus, to enter the Greek letter λ (lambda), one uses the following key combination:

C-x 8 RET 3bb RET

Further information about unicode and its entry into Emacs can be found on the Unicode Encoding Emacs wiki page.

You can also use RFC 1345 character mnemonics by typing:

C-x RET C-\ rfc1345 RET

Or C-u C-\ rfc1345 RET.

To type special characters, type & followed by a mnemonic. Emacs will show the possible characters in the echo area. For example, Greek letter λ (lambda) can be entered by typing:

&l*

You can use C-\ to toggle input method.

Another input method you can use to insert special characters is TeX. Select it by typing C-u C-\ TeX RET. You can enter a special character by using a prefix such as \. For example, to enter λ, type:

\lambda

To view characters and sequences provided by an input method, run the describe-input-method command:

C-h I TeX

Unix shell

At the bash shell, one enters unicode characters by pressing Ctrl-Shift-u, then typing the hexadecimal unicode code point value followed by enter. For instance, to enter the character for the element-of operator (∈) use the following key combination (whitespace has been added for clarity):

Ctrl-Shift-u 2208 Enter

This is also the method one would use to enter unicode characters into the perl6 REPL, if one has started the REPL inside a Unix shell.

Unicode characters useful in Perl 6

Guillemets

These characters are used in French and German as quotation marks. In Perl 6 they are also used as quotation marks (in POD as an alternative to the angle brackets and in normal code as an alternative to double quotes) as well as to denote the hyperoperators. The symbols and their unicode hex values are as follows:

symbol  unicode code point  ascii equivalent
«       U+00AB              <<
»       U+00BB              >>

Thus constructs such as these are now possible:

C« fixed-width POD text »
say (1, 2) »+« (3, 4);     # (4 6) - element-wise add
@array »+=» 42;            # add 42 to each element of @array
say «moo»;                 # moo
my $baa = 123; say «$baa»; # (123)
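
As a minimal sketch of the equivalence listed in the table above, the following compares the guillemet and ascii spellings of the same hyperoperator. The list values are arbitrary and chosen only for illustration.

```perl6
# Element-wise addition, written with guillemets and with the ascii form.
say (1, 2, 3) »+« (10, 20, 30);     # (11 22 33)
say (1, 2, 3) >>+<< (10, 20, 30);   # (11 22 33)
```

Either spelling can be used wherever the symbols are hard to type.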

Set/bag operators

The set/bag operators all have set-theory-related symbols; the unicode code points and their ascii equivalents are listed below. To compose such a character, it is merely necessary to enter the character composition chord (e.g. Ctrl-V u in Vim; Ctrl-Shift-u in Bash) then the unicode code point hexadecimal number.

operator  unicode code point  ascii equivalent
∈         U+2208              (elem)
∉         U+2209              !(elem)
∋         U+220B              (cont)
∌         U+220C              !(cont)
⊆         U+2286              (<=)
⊈         U+2288              !(<=)
⊂         U+2282              (<)
⊄         U+2284              !(<)
⊇         U+2287              (>=)
⊉         U+2289              !(>=)
⊃         U+2283              (>)
⊅         U+2285              !(>)
≼         U+227C              (<+)
≽         U+227D              (>+)
∪         U+222A              (|)
∩         U+2229              (&)
∖         U+2216              (-)
⊖         U+2296              (^)
⊍         U+228D              (.)
⊎         U+228E              (+)
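
To illustrate how the unicode operators and their ascii equivalents relate, here is a small sketch using a few entries from the table. The variable name $fruit and its contents are made up for the example.

```perl6
my $fruit = set <apple banana cherry>;

# Membership: the unicode operator and its ascii spelling agree.
say 'apple' ∈ $fruit;               # True
say 'apple' (elem) $fruit;          # True
say 'durian' ∉ $fruit;              # True

# Subset test.
say (<apple banana> ⊆ $fruit);      # True
say (<apple banana> (<=) $fruit);   # True

# Union and intersection; counting elements avoids depending on set ordering.
say (<a b> ∪ <b c>).elems;          # 3
say (<a b> (|) <b c>).elems;        # 3
say (<a b> ∩ <b c>).elems;          # 1
```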

Mathematical symbols

Wikipedia contains a full list of mathematical operators and symbols in unicode as well as links to their mathematical meaning.

Greek characters

Greek characters may be used as variable names. For a list of Greek and Coptic characters and their unicode code points see the Greek in Unicode Wikipedia article.

For example, to assign the value 3 to π, enter the following in Vim (whitespace added to the compose sequences for clarity):

    my $Ctrl-V u 03C0 = 3;  # same as: my $π = 3;
    say $Ctrl-V u 03C0;     # 3    same as: say $π;
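
If a character is already at hand (for instance pasted from a web page) but its code point is not, Perl 6 itself can report the hexadecimal value to type after the compose chord. The following is a rough sketch; the variable name $char is only illustrative.

```perl6
my $char = 'π';

# The hexadecimal code point to type after Ctrl-V u or Ctrl-Shift-u.
say $char.ord.fmt('U+%04X');        # U+03C0

# The official unicode name of the character.
say $char.uniname;                  # GREEK SMALL LETTER PI

# Going the other way: build the character from its code point or name.
say "\x[03C0]";                     # π
say "\c[GREEK SMALL LETTER PI]";    # π
```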

Superscripts and subscripts

A limited set of superscripts and subscripts can be created directly in unicode by using the U+207x, U+208x and (less often) the U+209x ranges. However, to produce a value squared (to the power of 2) or cubed (to the power of 3), one needs to use U+00B2 and U+00B3 since these are defined in the Latin1 supplement Unicode block.

Thus, to write the Taylor series expansion around zero of the function exp(x), one would input the following into, e.g., Vim:

    exp(x) = 1 + x + xCtrl-V u 00B2/2! + xCtrl-V u 00B3/3!
    + ... + xCtrl-V u 207F/n!
    # which would appear as
    exp(x) = 1 + x + x²/2! + x³/3! + ... + xⁿ/n!

Or to specify the elements in a list from 1 up to k:

    ACtrl-V u 2081, ACtrl-V u 2082, ..., ACtrl-V u 2096
    # which would appear as
    A₁, A₂, ..., Aₖ
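
Superscript digits are not only useful in comments and documentation: Perl 6 also accepts them as a spelling of exponentiation. The sketch below assumes a reasonably recent Rakudo; the variable $x is chosen purely for illustration.

```perl6
my $x = 3;

# A superscript digit after a term raises it to that power.
say $x²;     # 9
say $x³;     # 27

# The ascii equivalent uses the ** operator.
say $x ** 2; # 9

# Several superscript digits form a multi-digit exponent.
say 2¹⁰;     # 1024
```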