Russian translation available: the post is available at softdroid.net as
"Почему так трудно написать компилятор для Perl 6?" ("Why is it so hard to
write a compiler for Perl 6?").
Today’s deceptively simple question on #perl6: is it harder to write a
compiler for Perl 6 than for any other language?
The answer is simple: yes, it’s harder (and more work) than for many other
languages. The more involved question is: why?
So, let’s take a look. The first point is organizational: Perl 6 isn’t yet
fully explored and formally specified; it’s much more stable than it used to
be, but a less stable target than, say, C89.
But even if you disregard this point, and target the subset that, for
example, the Rakudo Perl 6 compiler implements
right now, or wait a year and target the first Perl 6 language release,
the point remains valid.
So let’s look at some technical aspects.
Static vs. Dynamic
Perl 6 has both static and dynamic corners. For example, lexical lookups
are static, in the sense that they can be resolved at compile time. But
that’s not optional. For a compiler to properly support native types, it must
resolve them at compile time. We also expect the compiler to notify us of
certain errors at compile time, so there must be a fair amount of static
analysis.
On the other hand, type annotations are optional pretty much anywhere, and
methods are late bound. So the compiler must also support features typically
found in dynamic languages.
And even though method calls are late bound, composing roles into classes
is a compile time operation, with mandatory compile time analysis.
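What late binding costs in terms of compile-time checking can be shown with a rough analogy in Python (not Perl 6 code; the `Greeter` class and its methods are made up for illustration). A call to a method that doesn't exist yet compiles fine, fails only when it runs, and the very same compiled code works once the method is added:

```python
class Greeter:
    def hello(self):
        return "hello"

# Compiling a call to a method that does not exist succeeds, because the
# method is only looked up by name when the call is executed.
code = compile("g.goodbye()", "<example>", "eval")

g = Greeter()
try:
    eval(code, {"g": g})
    outcome = "ran"
except AttributeError:
    outcome = "failed at run time"

# The same compiled code starts working once the method is added later.
Greeter.goodbye = lambda self: "goodbye"
late_result = eval(code, {"g": g})

print(outcome)      # failed at run time
print(late_result)  # goodbye
```

A Perl 6 compiler has to live with this kind of call site and still do the mandatory compile-time work for lexicals, native types and role composition.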
The Perl 6 grammar can change
during a parse, for example by newly defined operators, but also through
more invasive operations such as defining slangs or macros. Speaking of slangs:
Perl 6 doesn’t have a single grammar, it switches back and forth between the
“main” language, regexes, character classes inside regexes, quotes, and all
the other dialects you might think of.
Since the grammar extensions are done with, well, Perl 6 grammars, it
forces the parser to be interoperable with Perl 6 regexes and grammars. At
which point you might just as well use them for parsing the whole thing, and
you get some level of minimally required self-hosting.
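To make the "grammar changes during a parse" point concrete, here is a toy sketch in Python (an analogy, not how any Perl 6 compiler works). The mini-language, the `infix SYMBOL NAME` declaration and the `run` function are all invented for this example. The only point is that the parser has to consult a table that earlier lines of the same source can extend:

```python
import operator

def run(source):
    """Evaluate a tiny line-based language whose operator set can grow mid-parse."""
    # Operator table consulted by the parser (hypothetical starting set).
    infix = {"+": operator.add, "*": operator.mul}
    # Functions a declaration may bind to a new operator symbol.
    available = {"max": max, "min": min, "pow": pow}
    results = []
    for line in source.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "infix":
            # Declaration "infix SYMBOL NAME": extends the grammar from here on.
            _, symbol, name = tokens
            infix[symbol] = available[name]
            continue
        # Expression: NUMBER (OP NUMBER)*, evaluated left to right, no precedence.
        value = int(tokens[0])
        for symbol, operand in zip(tokens[1::2], tokens[2::2]):
            if symbol not in infix:
                raise SyntaxError(f"unknown operator {symbol!r} in {line!r}")
            value = infix[symbol](value, int(operand))
        results.append(value)
    return results

program = """
2 + 3 * 4
infix ^ pow
2 ^ 10
"""
results = run(program)
print(results)  # [20, 1024]
```

Using `^` before its declaration is a syntax error, after it the same text parses. Real Perl 6 operators also carry precedence and associativity, and slangs swap out whole grammars, which this sketch ignores.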
In a language like C++, the behavior of the object system is hard-coded
into the language, and so the compiler can work under this assumption, and
optimize the heck out of it.
In Perl 6, the object system is defined by other objects and classes, the
meta objects. So there is
another layer of indirection that must be handled.
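Python's metaclasses give a small taste of that indirection, so here is a hedged analogy (Python, not Perl 6; `CountingMeta` and `Point` are made-up names). What "creating an instance" means for `Point` is decided by another object, its metaclass, so a compiler cannot hard-code it:

```python
class CountingMeta(type):
    """Hypothetical metaclass: it, not the language, defines instantiation."""

    def __call__(cls, *args, **kwargs):
        # Every Point(...) call is routed through the metaobject first.
        cls.instances = getattr(cls, "instances", 0) + 1
        return super().__call__(*args, **kwargs)

class Point(metaclass=CountingMeta):
    def __init__(self, x, y):
        self.x, self.y = x, y

a = Point(1, 2)
b = Point(3, 4)

print(type(Point).__name__)  # CountingMeta
print(Point.instances)       # 2
```

In Perl 6 the same idea reaches further: method dispatch, attribute storage and composition all go through meta objects, and those are ordinary objects that user code can replace.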
Mixing of compilation and run time
Declarations like classes, but also
BEGIN blocks and the
right-hand side of
constant declarations are run as soon as they
are parsed. Which means the compiler must be able to run Perl 6 code while
compiling Perl 6 code. And also the other way round: code that is already
running must be able to invoke the compiler.
More importantly, it must be able to run Perl 6 code before it has finished
compiling the whole compilation unit. That means it hasn’t even fully
constructed the lexical pads, and hasn’t initialized all the variables. So it
needs special “static lexpads” to which compile-time usages of variables can
fall back. Also the object system has to be able to work with types that
haven’t been fully declared yet.
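The following toy "compiler" in Python sketches the idea of a static pad under heavy simplification (it is an analogy, not Rakudo's design; the statement kinds `const` and `my` and both function names are invented here). Constants are evaluated while the unit is still being walked, against whatever the pad contains so far:

```python
def compile_unit(statements):
    """Toy compiler: walks the statements once and runs some of them immediately.

    Hypothetical statement kinds:
      ("const", name, expr)  expr is evaluated at compile time
      ("my", name, expr)     expr is evaluated only at run time
    """
    static_pad = {}   # compile-time view of the scope: only what is known so far
    runtime_ops = []  # deferred work, executed later by run_unit()
    for kind, name, expr in statements:
        if kind == "const":
            # Runs while the rest of the unit is still unprocessed. It sees
            # earlier constants; run-time variables exist but hold no value yet.
            static_pad[name] = eval(expr, {"__builtins__": {}}, dict(static_pad))
        elif kind == "my":
            static_pad.setdefault(name, None)  # declared, not yet initialized
            runtime_ops.append((name, expr))
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return static_pad, runtime_ops

def run_unit(static_pad, runtime_ops):
    pad = dict(static_pad)  # the run-time pad starts out from the static one
    for name, expr in runtime_ops:
        pad[name] = eval(expr, {"__builtins__": {}}, pad)
    return pad

unit = [
    ("const", "WIDTH", "4 * 10"),
    ("my", "x", "WIDTH + 2"),
    ("const", "AREA", "WIDTH * WIDTH"),
]
static_pad, ops = compile_unit(unit)
final_pad = run_unit(static_pad, ops)

print(static_pad)      # {'WIDTH': 40, 'x': None, 'AREA': 1600}
print(final_pad["x"])  # 42
```

Note that `x` is visible but uninitialized when `AREA` is computed. A constant that tried to use `x` would run into exactly the kind of half-built state described above.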
So, lots of trickiness involved.
Types are objects defined through their meta objects. That means that when
you precompile a module (or even just the setting, that is, the mass of
built-ins), the compiler has to serialize the types and their meta
objects. Including closures. Do you have any idea how hard it is to correctly
serialize closures?
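For a feeling of why closures are awkward to serialize, consider this Python sketch (an analogy only; `dump_closure` and `load_closure` are hypothetical helpers written for this example, not part of any library). The standard pickler stores functions by name and so refuses a closure; a manual approach has to save the code and the captured values separately and stitch them back together:

```python
import marshal
import pickle
import types

def make_adder(n):
    def add(x):
        return x + n  # 'n' lives in a closure cell, not in the function's code
    return add

add3 = make_adder(3)

# The standard pickler stores functions by name, so a closure cannot be pickled.
try:
    pickle.dumps(add3)
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    pickled_ok = False

def dump_closure(fn):
    """Serialize code plus captured values (assumes those values pickle cleanly)."""
    cells = [cell.cell_contents for cell in (fn.__closure__ or ())]
    return pickle.dumps((marshal.dumps(fn.__code__), fn.__name__, cells))

def load_closure(data, globals_dict):
    """Rebuild the function from its code object and fresh closure cells."""
    code_bytes, name, cells = pickle.loads(data)
    closure = tuple(types.CellType(value) for value in cells) or None
    return types.FunctionType(marshal.loads(code_bytes), globals_dict, name, None, closure)

restored = load_closure(dump_closure(add3), globals())

print(pickled_ok)   # False
print(restored(5))  # 8
```

Even this only covers the easy case. Cells shared between several closures, captured values that are themselves closures or types, and bytecode that differs between interpreter versions are all left out, and a real compiler has to get those right too.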
But classes are mutable. So another module might load a precompiled
module, and add another method to one of its classes, or otherwise mess with it. Now the
compiler has to serialize the fact that, if the second module is loaded, the
object from the first module is modified. We say that the serialization
context from the second module repossesses the type.
And there are so many ways in which this can go wrong.
One of the many Perl 6 mottos is “torture the implementor on behalf of the
user”. So it demands not only both static and dynamic typing, but also
functional features, continuations, exceptions, lazy lists, a powerful grammar
engine, named arguments, variadic arguments, introspection of call frames,
closures, lexical and dynamic variables, packed types (for direct interfacing
with C libraries, for example), and phasers (code that is automatically run at
different phases of the program).
None of these features is too hard to implement in isolation, but in
combination they are a real killer. And you want it to be fast, right?