Sets, Bags, and Mixes

Unordered collections of unique and weighted objects in Perl 6

Often you want to collect objects in a container but you do not care about the order of these objects. For such cases, Perl 6 provides the unordered collection types Set, SetHash, Bag, BagHash, Mix, and MixHash. Being unordered, these containers can be more efficient than Lists for looking up elements or dealing with repeated items.

If you want to get the contained objects (elements) without duplicates and you only care whether an element is in the collection or not, you can use a Set or SetHash. (If you want to get rid of duplicates but still preserve order, take a look at the unique routine for Lists.)

If you want to keep track of the number of times each object appeared, you can use a Bag or BagHash. In these Baggy containers each element has a weight (an unsigned integer) indicating the number of times the same object has been included in the collection. The types Mix and MixHash are similar, but they also allow fractional weights.

Set, Bag, and Mix are immutable types. Use the mutable variants SetHash, BagHash, and MixHash if you want to add or remove elements after the container has been constructed.
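
As a small sketch of the difference (the fruit names are arbitrary example data), the following builds an immutable Set and Bag, then a mutable SetHash that is changed after construction:

```perl6
# Immutable containers: contents are fixed at construction time.
my $set = set <apple pear apple>;   # duplicates collapse
my $bag = bag <apple pear apple>;   # duplicates are counted

say $set.elems;     # 2
say $bag<apple>;    # 2

# Mutable variant: elements can be added or removed later.
my $sh = <apple pear>.SetHash;
$sh<plum>  = True;    # add an element
$sh<apple> = False;   # remove an element
say $sh.elems;        # 2
```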

The six collection classes, Set, SetHash, Bag, BagHash, Mix, and MixHash, all share similar semantics.

For one thing, as far as they are concerned, identical objects refer to the same element – where identity is determined using the WHICH method (i.e. the same way that the === operator checks identity). For value types like Str, this means having the same value; for reference types like Array, it means referring to the same object instance.
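
A brief sketch of what this identity rule means in practice, using arbitrary literal values:

```perl6
# Value types: equal values are the same element.
say "a" === "a";              # True
say set(1, 1, 2).elems;       # 2

# Different types are different elements, even if they look alike.
say set(1, 1.0, "1").elems;   # 3

# Reference types: two separate Arrays are never identical.
say [1, 2] === [1, 2];        # False
```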

Secondly, they provide a Hash-like interface where the actual elements of the collection (which can be objects of any type) are the 'keys', and the associated weights are the 'values':

type of $a     | value of $a{$b} if $b is an element | value of $a{$b} if $b is not an element
Set / SetHash  | True                                | False
Bag / BagHash  | a positive integer                  | 0
Mix / MixHash  | a non-zero real number              | 0
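
The lookups in the table can be tried out as follows. This is only an illustrative sketch; the variable names and weights are made up for the example:

```perl6
my $set = set <a b>;
my $bag = bag <a a b>;
my $mix = ("a" => 1.5, "b" => -2).Mix;

say $set<a>;   # True
say $set<z>;   # False
say $bag<a>;   # 2
say $bag<z>;   # 0
say $mix<a>;   # 1.5
say $mix<z>;   # 0
```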

Set/Bag Operators

There are many infixes devoted to performing common operations on Sets, such as unions and set differences. Other operations include boolean checks, like whether an object is an element of a Set, or whether one Set is a subset of another Set.

These infixes can be written using the Unicode character that represents the function (like ∈ or ∪), or they can be written with an equivalent ASCII version (like (elem) or (|)).

Most of the time, explicitly using Set objects with these infixes is unnecessary. All of the infix operators work on arguments of type Any (e.g., Lists, Arrays, Mixes, etc.) and coerce them to Sets where needed.

In some cases, if the type of an argument is a Bag, the infix operator will behave in a different but analogous way to the way it would behave with only Set arguments.

Operators that return Bool

infix (elem)

multi sub infix:<(elem)>($a, Any $b --> Bool)
multi sub infix:<(elem)>($a, Set $b --> Bool)

Membership operator.

Returns True if $a is an element of $b.

infix ∈

only sub infix:<∈>($a, $b --> Bool)

Membership operator (alternate).

Equivalent to (elem), at codepoint U+2208 (ELEMENT OF).

infix ∉

only sub infix:<∉>($a, $b --> Bool)

Non-membership operator.

Equivalent to !(elem), i.e., returns True if $a is not an element of $b, at codepoint U+2209 (NOT AN ELEMENT OF).
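
A short sketch of the membership operators, with an arbitrary example Set. The last line relies on the right-hand List being coerced to a Set:

```perl6
my $fruits = set <apple pear plum>;

say 'apple' (elem) $fruits;   # True
say 'kiwi'  (elem) $fruits;   # False
say 'apple' ∈ $fruits;        # True
say 'kiwi'  ∉ $fruits;        # True

# A List on the right-hand side is coerced to a Set.
say 2 (elem) (1, 2, 3);       # True
```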

infix (cont)

multi sub infix:<(cont)>(Any $a, $b --> Bool)
multi sub infix:<(cont)>(Set $a, $b --> Bool)

Contains operator.

Returns True if $a contains $b as an element.

infix ∋

only sub infix:<∋>($a, $b --> Bool)

Contains operator (alternate).

Equivalent to (cont), at codepoint U+220B (CONTAINS AS MEMBER).

infix ∌

only sub infix:<∌>($a, $b --> Bool)

Does not contain operator.

Equivalent to !(cont), i.e., returns True if $a does not contain $b, at codepoint U+220C (DOES NOT CONTAIN AS MEMBER).
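
The contains operators mirror the membership operators with the arguments swapped. A minimal sketch with an arbitrary example Set:

```perl6
my $fruits = set <apple pear plum>;

say $fruits (cont) 'pear';   # True
say $fruits (cont) 'kiwi';   # False
say $fruits ∋ 'pear';        # True
say $fruits ∌ 'kiwi';        # True
```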

infix (<=)

multi sub infix:<<(<=)>>(Any $a, Any $b --> Bool)
multi sub infix:<<(<=)>>(Setty $a, Setty $b --> Bool)

Subset of or equal to operator.

Returns True if $a is a subset of or equal to $b, i.e., if all the elements of $a are elements of $b and $a has no more elements than $b.

infix ⊆

only sub infix:<⊆>($a, $b --> Bool)

Subset of or equal to operator (alternate).

Equivalent to (<=), at codepoint U+2286 (SUBSET OF OR EQUAL TO).

infix ⊈

only sub infix:<⊈>($a, $b --> Bool)

Neither subset of nor equal to operator.

Equivalent to !(<=), at codepoint U+2288 (NEITHER A SUBSET OF NOR EQUAL TO).

infix (<)

multi sub infix:<<(<)>>(Any $a, Any $b --> Bool)
multi sub infix:<<(<)>>(Setty $a, Setty $b --> Bool)

Subset of operator.

Returns True if $a is a strict subset of $b, i.e., if all the elements of $a are elements of $b but $a has fewer elements than $b.

infix ⊂

only sub infix:<⊂>($a, $b --> Bool)

Subset of operator (alternate).

Equivalent to (<), at codepoint U+2282 (SUBSET OF).

infix ⊄

only sub infix:<⊄>($a, $b --> Bool)

Not a subset of operator.

Equivalent to !(<), at codepoint U+2284 (NOT A SUBSET OF).
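
The difference between the subset and strict subset operators shows up when both sides are equal. A small sketch, with two made-up Sets:

```perl6
my $small = set <a b>;
my $big   = set <a b c>;

say $small (<=) $big;   # True
say $big   (<=) $big;   # True:  equal sets count as subsets
say $small (<)  $big;   # True
say $big   (<)  $big;   # False: a strict subset must be smaller
say $big   ⊄ $big;      # True
say $big   ⊈ $small;    # True
```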

infix (>=)

multi sub infix:<<(>=)>>(Any $a, Any $b --> Bool)
multi sub infix:<<(>=)>>(Setty $a, Setty $b --> Bool)

Superset of or equal to operator.

Like (<=) with reversed arguments. Returns True if $a is a superset of or equal to $b.

infix ⊇

only sub infix:<⊇>($a, $b --> Bool)

Superset of or equal to operator (alternate).

Equivalent to (>=), at codepoint U+2287 (SUPERSET OF OR EQUAL TO).

infix ⊉

only sub infix:<⊉>($a, $b --> Bool)

Neither a superset of nor equal to operator.

Equivalent to !(>=), at codepoint U+2289 (NEITHER A SUPERSET OF NOR EQUAL TO).

infix (>)

multi sub infix:<<(>)>>(Any $a, Any $b --> Bool)
multi sub infix:<<(>)>>(Setty $a, Setty $b --> Bool)

Superset of operator.

Like (<) with reversed arguments. Returns True if $a is a strict superset of $b.

infix ⊃

only sub infix:<⊃>($a, $b --> Bool)

Superset of operator (alternate).

Equivalent to (>), at codepoint U+2283 (SUPERSET OF).

infix ⊅

only sub infix:<⊅>($a, $b --> Bool)

Not a superset of operator.

Equivalent to !(>), at codepoint U+2285 (NOT A SUPERSET OF).
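
The superset operators behave like their subset counterparts with the arguments reversed. A brief sketch using two made-up Sets:

```perl6
my $small = set <a b>;
my $big   = set <a b c>;

say $big   (>=) $small;   # True
say $big   (>=) $big;     # True:  equal sets count as supersets
say $big   (>)  $small;   # True
say $big   (>)  $big;     # False: a strict superset must be larger
say $small ⊉ $big;        # True
```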

infix (<+)

multi sub infix:<<(<+)>>(Any $a, Any $b --> Bool)
multi sub infix:<<(<+)>>(Baggy $a, Baggy $b --> Bool)

Baggy subset of operator.

Returns True if $a is a Baggy subset of $b, i.e., if all the elements of $a are in $b and each of them is weighted at least as heavily in $b as it is in $a.

infix ≼

only sub infix:<≼>($a, $b --> Bool)

Baggy subset of operator (alternate).

Equivalent to (<+), at codepoint U+227C (PRECEDES OR EQUAL TO).

infix (>+)

multi sub infix:<<(>+)>>(Baggy $a, Baggy $b --> Bool)
multi sub infix:<<(>+)>>(Any $a, Any $b --> Bool)

Baggy superset of operator.

Returns True if $a is a Baggy superset of $b, i.e., if all the elements of $b are in $a and no element of $b is weighted more heavily than it is in $a.

infix ≽

only sub infix:<≽>($a, $b --> Bool)

Baggy superset of operator (alternate).

Equivalent to (>+), at codepoint U+227D (SUCCEEDS OR EQUAL TO).

Operators that return Set or Bag

infix (|)

only sub infix:<(|)>(**@p)

Union operator.

Returns the union of all its arguments. Generally, this creates a new Set that contains all the elements its arguments contain.

<a a b c d> (|) <h g f e d c> (|) <i j> === set <a b c d e f g h i j>

If any of its arguments are Baggy, it creates a new Bag that contains all the elements of the arguments, each weighted by the highest weight that appeared for that element.

bag(<a a b c a>) (|) bag(<a a b c c>) === bag(<a a a b c c>)

infix ∪

only sub infix:<∪>(|p)

Union operator (alternate).

Equivalent to (|), at codepoint U+222A (UNION).

infix (&)

only sub infix:<(&)>(**@p)

Intersection operator.

Returns the intersection of all of its arguments. Generally, this creates a new Set that contains only the elements common to all of the arguments.

<a b c> (&) <b c d> === set <b c>
<a b c d> (&) <b c d e> (&) <c d e f> === set <c d>

If any of the arguments are Baggy, the result is a new Bag containing the common elements, each weighted by the largest common weight (which is the minimum of the weights of that element over all arguments).

bag(<a a b c a>) (&) bag(<a a b c c>) === bag(<a a b c>)

infix ∩

only sub infix:<∩>(|p)

Intersection operator (alternate).

Equivalent to (&), at codepoint U+2229 (INTERSECTION).

infix (-)

only sub infix:<(-)>(**@p)

Set difference operator.

Returns the set difference of all its arguments. Generally, this returns the Set made up of all the elements the first argument has but the rest don't, i.e., of all the elements of the first argument, minus the elements from the other arguments.
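
For plain (non-Baggy) arguments, a minimal sketch with arbitrary example values might look like this:

```perl6
# Elements of the first list that are not in the second.
say (<a b c d> (-) <b d>) === set <a c>;             # True

# With more arguments, each one removes further elements.
say (<a b c d e> (-) <a b> (-) <d>) === set <c e>;   # True
```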

If the first argument is Baggy, this returns a Bag that contains each element of the first argument with its weight reduced by the weight of that element in each of the other arguments.

bag(<a a b c a d>) (-) bag(<a a b c c>) === bag(<a d>)
bag(<a a a a c d d d>) (-) bag(<a b d a>) (-) bag(<d c>) === bag(<a a d>)

infix ∖

only sub infix:<<"\x2216">>(|p)

Set difference operator (alternate).

Equivalent to (-), at codepoint U+2216 (SET MINUS).

infix (^)

multi sub infix:<(^)>(Any $a, Any $b --> Setty)
multi sub infix:<(^)>(Set $a, Set $b --> Setty)

Symmetric set difference operator.

Returns the symmetric set difference of all its arguments, i.e., a Set made up of all the elements that $a has but $b doesn't and all the elements $b has but $a doesn't. Equivalent to ($a ∖ $b) ∪ ($b ∖ $a).
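
As a small sketch with two made-up Sets, the result keeps only the elements that appear in exactly one of them, and matches the expanded form given above:

```perl6
my $x = set <a b c>;
my $y = set <b c d>;

say ($x (^) $y) === set <a d>;                       # True
say ($x (^) $y) === (($x (-) $y) (|) ($y (-) $x));   # True
```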

infix ⊖

only sub infix:<⊖>($a, $b --> Setty)

Symmetric set difference operator (alternate).

Equivalent to (^), at codepoint U+2296 (CIRCLED MINUS).

infix (.)

only sub infix:<(.)>(**@p)

Baggy multiplication operator.

Returns the Baggy multiplication of its arguments, i.e., a Bag that contains each element of the arguments with the weights of the element across the arguments multiplied together to get the new weight.

<a b c> (.) <a b c d> === bag <a b c> # Since 1 * 0 == 0, in the case of 'd' 
bag(<a a b c a d>) (.) bag(<a a b c c>) === ("a"=>6,"c"=>2,"b"=>1).Bag

infix ⊍

only sub infix:<⊍>(|p)

Baggy multiplication operator (alternate).

Equivalent to infix (.), at codepoint U+228D (MULTISET MULTIPLICATION).

infix (+)

only sub infix:<(+)>(**@p)

Baggy addition operator.

Returns the Baggy addition of its arguments, i.e., a Bag that contains each element of the arguments with the weights of the element across the arguments added together to get the new weight.

bag(<a a b c a d>) (+) bag(<a a b c c>) === ("a"=>5,"c"=>3,"b"=>2,"d"=>1).Bag

infix ⊎

only sub infix:<⊎>(|p)

Baggy addition operator (alternate).

Equivalent to (+), at codepoint U+228E (MULTISET UNION).

term ∅

Equivalent to set(), aka the empty set, at codepoint U+2205 (EMPTY SET).