Performance

Measuring and improving run-time or compile-time performance

This page is about anything to do with computer performance in the context of Perl 6.

First, clarify the problem

Make sure you're not wasting time on the wrong code: start by identifying your "critical 3%" by "profiling" as explained below.

Time with now - INIT now

Expressions of the form now - INIT now, where INIT is a phase in the running of a Perl 6 program, provide a great idiom for timing code snippets.

Use the m: your code goes here perl6 channel evalbot to write lines like:

m: say now - INIT now
rakudo-moar abc1234: OUTPUT«0.0018558␤»

The now to the left of INIT runs 0.0018558 seconds later than the now to the right of the INIT because the latter occurs during the INIT phase.
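The same idiom works in a local script. A minimal sketch (the workload below is an arbitrary stand-in):

# the INIT now is evaluated once, during program initialization
my @sorted = (^100_000).pick(*).sort;
my $elapsed = now - INIT now;
say "sorted {+@sorted} numbers in $elapsed seconds";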

Profile locally

When using the MoarVM backend, the Rakudo compiler's --profile command line option writes profile information as an HTML file. However, if the profile is too big it can be slow to open in a browser. In that case, if you use the --profile-filename=file.extension option with an extension of .json, you can use the Qt viewer on the resulting JSON file.

Another option (especially useful for profiles too big even for the Qt viewer) is to use an extension of .sql. This will write the profile data as a series of SQL statements, suitable for opening in SQLite.

# create a profile 
perl6 --profile --profile-filename=demo.sql -e 'say (^20).combinations(3).elems'
 
# create a SQLite database 
sqlite3 demo.sqlite
 
# load the profile data 
sqlite> .read demo.sql
 
# the query below is equivalent to the default view of the "Routines" tab in the HTML profile 
sqlite> select
   ...>   case when r.name = "" then "<anon>" else r.name end,
   ...>   r.file,
   ...>   r.line,
   ...>   sum(entries) as entries,
   ...>   sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time,
   ...>   sum(exclusive_time) as exclusive_time
   ...> from
   ...>   callees c,
   ...>   routines r
   ...> where
   ...>   c.id = r.id
   ...> group by
   ...>   c.id
   ...> order by
   ...>   inclusive_time desc;

To learn how to interpret the profile info, use the prof-m: your code goes here evalbot (explained above) and ask questions on channel.

Profile compiling

The Rakudo compiler's --profile-compile option profiles the time and memory used to compile code.
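For example (a hedged one-liner; the code being compiled is arbitrary):

# profile compilation (rather than execution) of a one-liner
perl6 --profile-compile -e 'say 42'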

Create or view benchmarks

Use perl6-bench.

If you run perl6-bench for multiple compilers (typically versions of Perl 5, Perl 6, or NQP) then results for each are visually overlaid on the same graphs to provide for quick and easy comparison.

Share problems

Once you've used the above techniques to pinpoint code and performance that really matters, you're in a good place to share problems, one at a time.

Solve problems

This bears repeating: make sure you're not wasting time on the wrong code. Start by identifying the "critical 3%" of your code.

Line by line

A quick, fun, productive way to try to improve code line-by-line is to collaborate with others using the perl6 evalbot camelia.

Routine by routine

With multidispatch you can drop in new variants of routines "alongside" existing ones:

# existing code generically matches a two arg foo call: 
multi sub foo(Any $a, Any $b) { ... }
 
# new variant takes over for a foo("quux", 42) call: 
multi sub foo("quux"Int $b{ ... }

The call overhead of having multiple foo definitions is generally insignificant (though see discussion of where below), so if your new definition handles its particular case more quickly/leanly than the previously existing set of definitions then you probably just made your code that much faster/leaner for that case.
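As a small runnable sketch of the idea (the routine and its fast path are invented for illustration):

# generic candidate: works for any Int
multi sub fib(Int $n) { $n < 2 ?? $n !! fib($n - 1) + fib($n - 2) }

# specialized candidate added "alongside": short-circuits a known case
multi sub fib(30) { 832040 }

say fib(10);   # resolved to the generic candidate
say fib(30);   # resolved to the specialized, precomputed candidate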

Speed up type-checks and call resolution

Most where clauses – and thus most subsets – force dynamic (run-time) type checking and call resolution for any call they might match. This is slower, or at least later, than compile-time.

Method calls are generally resolved as late as possible, so dynamically, at run-time, whereas sub calls are generally resolvable statically, at compile-time.
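A small sketch of the tradeoff (the names below are invented for illustration):

# the where clause (and any subset built on one) must run at call time,
# for every call that might bind to this candidate
subset Even of Int where * %% 2;
sub half(Even $n) { $n div 2 }

# a plain nominal type constraint is cheaper to check and easier for
# the compiler to resolve early
sub halve(Int $n) { $n div 2 }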

Choose better algorithms

One of the most reliable techniques for making large performance improvements regardless of language or compiler is to pick an algorithm better suited to your needs.

A classic example is Boyer-Moore. To match a small string in a large string, one obvious way to do it is to compare the first character of the two strings and then, if they match, compare the second characters, or, if they don't match, compare the first character of the small string with the second character in the large string, and so on. In contrast, the Boyer-Moore algorithm starts by comparing the *last* character of the small string with the correspondingly positioned character in the large string. For most strings the Boyer-Moore algorithm is close to N times faster algorithmically, where N is the length of the small string.
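A rough sketch of the closely related Boyer-Moore-Horspool variant, just to show the "compare from the end, then skip ahead" idea (the routine name and code are illustrative, not a library API):

sub bmh-index(Str $haystack, Str $needle --> Int) {
    my @h   = $haystack.comb;
    my @n   = $needle.comb;
    my $m   = @n.elems;
    my $len = @h.elems;
    return -1 if $m == 0 or $m > $len;

    # for each character of the needle (except its last), record how far
    # the window may safely shift on a mismatch
    my %shift = @n[^($m - 1)].kv.map(-> $i, $c { $c => $m - 1 - $i });

    my $pos = 0;
    while $pos <= $len - $m {
        my $j = $m - 1;                    # start comparing at the *last* character
        $j-- while $j >= 0 && @h[$pos + $j] eq @n[$j];
        return $pos if $j < 0;             # full match
        $pos += %shift{@h[$pos + $m - 1]} // $m;   # skip ahead
    }
    return -1;
}

say bmh-index('hello, world', 'world');    # 7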

The next couple of sections discuss two broad categories of algorithmic improvement that are especially easy to accomplish in Perl 6. For more on this general topic, read the Wikipedia page on algorithmic efficiency, especially the See also section near the end.

Change sequential/blocking code to parallel/non-blocking

This is another very important class of algorithmic improvement.

See the slides for Parallelism, Concurrency, and Asynchrony in Perl 6 and/or the matching video.
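As a minimal sketch, not taken from the slides (the work items and slow-fetch routine are stand-ins for real blocking work):

sub slow-fetch($x) { sleep 0.1; "fetched $x" }   # stand-in for real blocking I/O

my @items = <a b c d>;

# sequential: each item waits for the previous one
# my @results = @items.map({ slow-fetch($_) });

# concurrent: start all of them, then wait for all to finish
my @promises = @items.map({ start slow-fetch($_) });
my @results  = await @promises;
say @results;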

Use existing high performance code

Is there an existing high (enough) performance implementation of what you're trying to speed up / slim down?

There are a lot of C libs out there. NativeCall makes it easy to create wrappers for C libs (there's experimental support for C++ libs too) such as Gumbo. (Data marshalling and call handling are somewhat poorly optimized at the time of writing, but for many applications that won't matter.)
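A minimal NativeCall sketch, binding a libc routine rather than one of the libraries mentioned above (assumes a POSIX system where getpid is available in the already loaded C library):

use NativeCall;

# with no library name, the symbol is looked up in the running process
sub getpid(--> int32) is native { * }

say getpid();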

Perl 5's compiler can be treated as a C lib. Mix in Perl 6 types, the MOP, and some hairy programming that someone else has done for you, and the upshot is that you can conveniently use Perl 5 modules in Perl 6.
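This is what the Inline::Perl5 module provides. A hedged sketch of its use (the module must be installed, along with a Perl 5 interpreter):

use Inline::Perl5;

my $p5 = Inline::Perl5.new;
$p5.run(q[print "Hello from Perl 5\n"]);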

More generally, Perl 6 is designed for smooth interop with other languages and there are a number of modules aimed at providing convenient use of libs from other langs.

Make the Rakudo compiler generate faster code

The focus to date (Feb 2016) regarding the compiler has been correctness, not how fast it generates code or, more importantly, how fast or lean the code it generates runs. But that's expected to change somewhat this year and beyond. You can talk to compiler devs on the freenode IRC channels #perl6 and #moarvm about what to expect. Better still, you can contribute yourself.

Still need more ideas?

There are endless performance topics.

Some known current Rakudo performance weaknesses not yet covered in this page include use of gather/take, use of junctions, regexes, and string handling in general.

If you think some topic needs more coverage on this page please submit a PR or tell someone your idea. Thanks. :)

Not getting the results you need/want?

If you've tried everything on this page to no avail, please consider discussing things with a compiler dev on #perl6 so we can learn from your use-case and what you've found out about it so far.

Once you know one of the main devs knows of your plight, allow enough time for an informed response (a few days or weeks depending on the exact nature of your problem and potential solutions).

If that hasn't worked out either, please consider filing an issue discussing your experience at our user experience repo before moving on. Thanks. :)