Measuring and improving run-time or compile-time performance
This page is about anything to do with computer performance in the context of Perl 6.
Make sure you're not wasting time on the wrong code: start by identifying your "critical 3%" by "profiling" as explained below.
Expressions of the form `now - INIT now`, where `INIT` is a phase in the running of a Perl 6 program, provide a great idiom for timing code snippets.
Use the `m: your code goes here` evalbot on the #perl6 channel to write lines like:

```
m: say now - INIT now
rakudo-moar abc1234: OUTPUT«0.0018558␤»
```

The `now` to the left of the `INIT` runs 0.0018558 seconds later than the `now` to the right of the `INIT`, because the latter occurs during the INIT phase.
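For comparison, the same snippet-timing idea can be sketched in Python with `time.perf_counter`; the variable names below are illustrative, not part of any API:

```python
import time

t0 = time.perf_counter()            # plays the role of INIT now

# the code being timed: a stand-in computation
total = sum(i * i for i in range(100_000))

elapsed = time.perf_counter() - t0  # plays the role of now - INIT now
print(f"{elapsed:.7f} seconds")
```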
When using the MoarVM backend, the Rakudo compiler's `--profile` command-line option writes profile information as an HTML file. However, if the profile is too big it can be slow to open in a browser. In that case, use the `--profile-filename=file.extension` option with an extension of `.json`; you can then use the Qt viewer on the resulting JSON file.
Another option (especially useful for profiles too big even for the Qt viewer) is to use an extension of `.sql`. This writes the profile data as a series of SQL statements, suitable for opening in SQLite.
```
# create a profile
perl6 --profile --profile-filename=demo.sql -e 'say (^20).combinations(3).elems'

# create a SQLite database
sqlite3 demo.sqlite

# load the profile data
sqlite> .read demo.sql

# the query below is equivalent to the default view
# of the "Routines" tab in the HTML profile
sqlite> select
   ...>   case when r.name = "" then "<anon>" else r.name end,
   ...>   r.file,
   ...>   r.line,
   ...>   sum(entries) as entries,
   ...>   sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time,
   ...>   sum(exclusive_time) as exclusive_time
   ...> from
   ...>   callees c,
   ...>   routines r
   ...> where
   ...>   c.id = r.id
   ...> group by
   ...>   c.id
   ...> order by
   ...>   inclusive_time desc;
```
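If you prefer to run such queries programmatically, the same SQL works through Python's built-in `sqlite3` module. The tables and sample row below are a minimal stand-in for the profiler's real `.sql` dump (which contains more columns and many rows), just to show the query shape:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# minimal stand-in tables mimicking the shape the profiler's .sql dump creates
con.executescript("""
    create table routines (id integer, name text, file text, line integer);
    create table callees (id integer, entries integer, rec_depth integer,
                          inclusive_time integer, exclusive_time integer);
    insert into routines values (1, 'foo', 'demo.p6', 3);
    insert into callees values (1, 10, 0, 500, 120);
""")

rows = con.execute("""
    select case when r.name = '' then '<anon>' else r.name end as name,
           r.file, r.line,
           sum(entries) as entries,
           sum(case when rec_depth = 0 then inclusive_time else 0 end)
               as inclusive_time,
           sum(exclusive_time) as exclusive_time
    from callees c, routines r
    where c.id = r.id
    group by c.id
    order by inclusive_time desc
""").fetchall()

print(rows)  # [('foo', 'demo.p6', 3, 10, 500, 120)]
```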
To learn how to interpret the profile info, use the `prof-m: your code goes here` evalbot (explained above) and ask questions on the channel.
The Rakudo compiler's `--profile-compile` option profiles the time and memory used to compile code.
If you run perl6-bench for multiple compilers (typically versions of Perl 5, Perl 6, or NQP), then results for each are visually overlaid on the same graphs, providing a quick and easy comparison.
Once you've used the above techniques to pinpoint code and performance that really matters you're in a good place to share problems, one at a time:
For each problem you see, distill it down to a one-liner or short public gist of code that either already includes performance numbers or is small enough that it can be profiled using `prof-m: your code or gist URL goes here`.
Think about the minimum speed increase (or RAM reduction, or whatever) you need or want. What if it took a month for folk to help you achieve that? A year?
Let folk know if your Perl 6 use-case is in a production setting or just for fun.
This bears repeating: make sure you're not wasting time on the wrong code. Start by identifying the "critical 3%" of your code.
With multiple dispatch you can drop in new variants of routines "alongside" existing ones:

```
# existing code generically matches a two arg foo call:
multi sub foo(Any, Any) { ... }

# new variant takes over for a foo("quux", 42) call:
multi sub foo("quux", Int) { ... }
```
The call overhead of having multiple `foo` definitions is generally insignificant (though see the discussion of `where` below), so if your new definition handles its particular case more quickly or more leanly than the previously existing set of definitions, then you probably just made your code that much faster or leaner for that case.
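Perl 6's multiple dispatch is richer than most languages' (it can dispatch on literal values such as `"quux"`, not just on types), but the single-argument core of the idea can be sketched with Python's `functools.singledispatch`; `describe` here is a hypothetical example routine, not part of any library:

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # generic fallback, analogous to a multi that matches Any
    return "something"

@describe.register
def _(x: int):
    # specialized variant dropped in "alongside" the generic one
    return "an integer"

print(describe(42))      # an integer
print(describe("quux"))  # something
```

The dispatcher picks the most specific registered variant for the argument's type, so adding a faster specialized variant never requires touching the generic code.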
Method calls are generally resolved as late as possible (dynamically, at run time), whereas sub calls are generally resolvable statically, at compile time.
One of the most reliable techniques for making large performance improvements regardless of language or compiler is to pick an algorithm better suited to your needs.
A classic example is Boyer-Moore. To match a small string in a large string, one obvious way to do it is to compare the first character of the two strings and then, if they match, compare the second characters, or, if they don't match, compare the first character of the small string with the second character in the large string, and so on. In contrast, the Boyer-Moore algorithm starts by comparing the *last* character of the small string with the correspondingly positioned character in the large string. For most strings the Boyer-Moore algorithm is close to N times faster algorithmically, where N is the length of the small string.
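A minimal sketch of that idea, using the simplified Boyer-Moore-Horspool variant (bad-character rule only), written in Python for illustration; `find` is a hypothetical helper, not a library function:

```python
def find(needle: str, haystack: str) -> int:
    """Return the index of needle in haystack, or -1 if absent."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For each char of the needle (except the last), how far we may
    # safely shift when it appears under the needle's last position.
    shift = {c: m - 1 - i for i, c in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        i = m - 1
        # compare from the *last* character of the needle backwards
        while i >= 0 and haystack[pos + i] == needle[i]:
            i -= 1
        if i < 0:
            return pos                        # full match found
        # shift based on the haystack char aligned with the needle's end;
        # chars not in the needle allow a jump of the full needle length
        pos += shift.get(haystack[pos + m - 1], m)
    return -1

print(find("abc", "xxabcxx"))  # 2
```

The big jumps on mismatching characters are where the roughly N-fold algorithmic speedup comes from.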
The next couple of sections discuss two broad categories of algorithmic improvement that are especially easy to accomplish in Perl 6. For more on this general topic, read the Wikipedia page on algorithmic efficiency, especially the "See also" section near the end.
Using parallelism, concurrency, and asynchrony is another very important class of algorithmic improvement.
See the slides for Parallelism, Concurrency, and Asynchrony in Perl 6 and/or the matching video.
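The core idea — handing independent units of work to multiple workers instead of running them one after another — is language-agnostic. A minimal sketch with Python's `concurrent.futures`, where `expensive` is a hypothetical stand-in for real work:

```python
from concurrent.futures import ThreadPoolExecutor

def expensive(n: int) -> int:
    # stand-in for an expensive computation with no shared state
    return n * n

# farm the independent calls out to a pool of workers
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(expensive, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

This only pays off when the work units really are independent; coordinating shared state can easily eat the gains.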
Is there an existing high (enough) performance implementation of what you're trying to speed up / slim down?
There are a lot of C libs out there. NativeCall makes it easy to create wrappers for C libs (there's experimental support for C++ libs too) such as Gumbo. (Data marshalling and call handling are somewhat poorly optimized at the time of writing, but for many applications that won't matter.)
Perl 5's compiler can be treated as a C lib. Mix in Perl 6 types, the MOP, and some hairy programming that someone else has done for you, and the upshot is that you can conveniently use Perl 5 modules in Perl 6.
More generally, Perl 6 is designed for smooth interop with other languages and there are a number of modules aimed at providing convenient use of libs from other langs.
The focus to date (Feb 2016) regarding the compiler has been correctness, not how fast it generates code or, more importantly, how fast or lean the code it generates runs. But that's expected to change somewhat this year and beyond. You can talk to compiler devs on the freenode IRC channels #perl6 and #moarvm about what to expect. Better still, you can contribute yourself:
Rakudo is largely written in Perl 6. So if you can write Perl 6, then you can hack on the compiler, including optimizing any of the large body of existing high-level code that impacts the speed of your code (and everyone else's).
Most of the rest of the compiler is written in a small language called NQP that's basically a subset of Perl 6. If you can write Perl 6, you can fairly easily learn to use and improve the mid-level NQP code too, at least from a pure language point of view. To dig into NQP and Rakudo's guts, start with the NQP and internals course.
There are endless performance topics.
Some known current Rakudo performance weaknesses not yet covered in this page include use of gather/take, use of junctions, regexes, and string handling in general.
If you think some topic needs more coverage on this page please submit a PR or tell someone your idea. Thanks. :)
If you've tried everything on this page to no avail, please consider discussing things with a compiler dev on #perl6 so we can learn from your use-case and what you've found out about it so far.
Once you know one of the main devs knows of your plight, allow enough time for an informed response (a few days or weeks depending on the exact nature of your problem and potential solutions).
If that hasn't worked out either, please consider filing an issue discussing your experience at our user experience repo before moving on. Thanks. :)