Perl 6 By Example: Generating Good Parse Errors from a Parser

This blog post is part of my ongoing project
to write a book about Perl 6.


Good error messages are paramount to the user experience of any product.
Parsers are no exception to this. Consider the difference between the message
“Square bracket [ on line 5 closed by curly bracket } on line 5”, in contrast
to Python’s lazy and generic “SyntaxError: invalid syntax”.

In addition to the textual message, knowing the location of the parse error
helps tremendously in figuring out what’s wrong.

We’ll explore how to generate better parsing error messages from a Perl 6
grammar, using the INI file parser from the previous blog posts as an example.

Failure is Normal

Before we start, it’s important to realize that in a grammar-based parser,
it’s normal for a regex to fail to match, even in an overall successful parse.

Let’s recall a part of the parser:

token block { [<pair> | <comment>]* }
token section { <header> <block> }
token TOP { <block> <section>* }

When this grammar matches against the string

key=value
[header]
other=stuff

then TOP calls block, which calls both pair and comment. The pair
match succeeds, the comment match fails. No big deal. But since there is a
* quantifier in token block, it tries again to match pair or comment.
Neither succeeds, but the overall match of token block still succeeds.
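
To see this in isolation, here is a minimal sketch. The grammar is an assumed reconstruction built around the three tokens quoted above; the key, value, pair, header and comment tokens are simplified stand-ins and not necessarily the exact definitions from the earlier posts. It runs the comment token on its own against the sample input, which fails, and then runs the full parse, which succeeds.

```perl6
# Assumed, simplified stand-in for the INI grammar from the earlier posts.
grammar IniFile {
    token key     { \w+ }
    token value   { <-[\n;]>+ }
    token pair    { <key> \h* '=' \h* <value> \n+ }
    token header  { '[' <-[ \[ \] \n ]>+ ']' \n+ }
    token comment { ';' \N* \n+ }
    token block   { [<pair> | <comment>]* }
    token section { <header> <block> }
    token TOP     { <block> <section>* }
}

my $input = "key=value\n[header]\nother=stuff\n";

# The input does not start with a comment, so this token fails on its own ...
say so IniFile.parse($input, :rule<comment>);   # False
# ... yet the parse as a whole succeeds.
say so IniFile.parse($input);                   # True
```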

A nice way to visualize passed and failed submatches is to install the
Grammar::Tracer module (zef install Grammar::Tracer or panda install
Grammar::Tracer), and simply add the statement use Grammar::Tracer before
the grammar definition. This produces debug output showing which rules matched
and which didn’t:

TOP
|  block
|  |  pair
|  |  |  key
|  |  |  * MATCH "key"
|  |  |  ws
|  |  |  * MATCH ""
|  |  |  ws
|  |  |  * MATCH ""
|  |  |  value
|  |  |  * MATCH "value"
|  |  |  ws
|  |  |  * MATCH ""
|  |  |  ws
|  |  |  * MATCH ""
|  |  * MATCH "key=value\n"
|  |  pair
|  |  |  key
|  |  |  * FAIL
|  |  * FAIL
|  |  comment
|  |  * FAIL
|  * MATCH "key=value\n"
|  section
...

Detecting Harmful Failure

To produce good parsing error messages, you must distinguish between expected
and unexpected parse failures. As explained above, a match failure of a single
regex or token is not generally an indication of a malformed input. But you
can identify points where you know that once the regex engine got this far,
the rest of the match must succeed.

If you recall pair:

rule pair { <key>  '='  <value> \n+ }

we know that if a key was parsed, we really expect the next character to be an
equals sign. If not, the input is malformed.

In code, this looks like this:

rule pair {
    <key>
    [ '=' || <expect('=')> ]
    <value> \n+
}

|| is a sequential alternative, which first tries to match the subregex on
the left-hand side, and only executes the right-hand side if that failed. In
contrast, | notionally tries all alternatives in parallel and takes the
longest match.
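
The difference is easy to see in a small example that is independent of the INI grammar. The sample string and alternatives are made up for illustration:

```perl6
# '|' considers all alternatives and keeps the longest match
say ~('foobar' ~~ / foo | foobar /);     # foobar

# '||' tries the alternatives from left to right and keeps the first that matches
say ~('foobar' ~~ / foo || foobar /);    # foo
```

This left-to-right behavior is what the pattern above relies on: the right-hand side only runs once the left-hand side has failed.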

So now we have to define expect:

method expect($what) {
    die "Cannot parse input as INI file: Expected $what";
}

Yes, you can call methods just like regexes, because regexes really are
methods under the hood. die throws an exception, so now the malformed input
justakey produces the error

Cannot parse input as INI file: Expected =

followed by a backtrace. That’s already better than “invalid syntax”, though
the position is still missing. Inside method expect, we can find the current
parsing position through method pos, a method supplied by the implicit
parent class Grammar that the grammar
declaration brings with it.

We can use that to improve the error message a bit:

method expect($what) {
    die "Cannot parse input as INI file: Expected $what at character {self.pos}";
}

Providing Context

For larger inputs, we really want to print the line number. To calculate that,
we need to get hold of the target string, which is available as method
target:

method expect($what) {
    my $parsed-so-far = self.target.substr(0, self.pos);
    my @lines = $parsed-so-far.lines;
    die "Cannot parse input as INI file: Expected $what at line @lines.elems(), after '@lines[*-1]'";
}

This brings us from the “meh” realm of error messages to quite good.

IniFile.parse(q:to/EOI/);
key=value
[section]
key_without_value
more=key
EOI

now dies with

Cannot parse input as INI file: Expected = at line 3, after 'key_without_value'

You can refine method expect more, for example by providing context both before
and after the position of the parse failure.
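
One possible refinement is sketched below. In addition to the text before the failure position, it reports the rest of the current line after it. The grammar is a trimmed-down stand-in (no sections or comments, and an assumed, simplified value token), and the wording of the message is only a suggestion.

```perl6
# Trimmed-down, assumed stand-in for the INI grammar: key/value pairs only.
grammar IniFile {
    token ws    { \h* }
    token key   { \w+ }
    token value { <-[\n;]>+ }
    rule pair {
        <key>
        [ '=' || <expect('=')> ]
        <value> \n+
    }
    token TOP { <pair>* }

    method expect($what) {
        my $target = self.target;
        my $pos    = self.pos;
        # Lines up to the failure position; the last one is the "before" context.
        my @lines  = $target.substr(0, $pos).lines;
        # The remainder of the current line is the "after" context.
        my $rest   = $target.substr($pos).lines[0] // '';
        die "Cannot parse input as INI file: Expected $what at line @lines.elems(), "
          ~ "after '@lines[*-1]', before '$rest'";
    }
}

# try stores the exception in $! instead of letting it propagate
try IniFile.parse("key value\n");
say $!.message;
# Cannot parse input as INI file: Expected = at line 1, after 'key ', before 'value'
```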

And of course you have to apply the [ thing || <expect('thing')> ] pattern
in more places in the grammar to get better error messages.

Finally, you can provide different kinds of error messages too. For example,
when parsing a section header, once the initial [ is parsed, you likely
don’t want an error message “expected rest of section header”, but rather
“malformed section header, at line …”:

rule pair {
    <key>
    [ '=' || <expect('=')> ]
    [ <value> || <expect('value')> ]
    \n+
}
token header {
    '['
    [ ( <-[ \[ \] \n ]>+ )  ']'
        || <error("malformed section header")> ]
}
...

method expect($what) {
    self.error("expected $what");
}

method error($msg) {
    my $parsed-so-far = self.target.substr(0, self.pos);
    my @lines = $parsed-so-far.lines;
    die "Cannot parse input as INI file: $msg at line @lines.elems(), after '@lines[*-1]'";
}

Since Rakudo Perl 6 uses grammars to parse Perl 6 input, you can use
Rakudo’s own grammar as a source of inspiration for more ways to make error
reporting even better.

Summary

To generate good error messages from a parser, you need to distinguish between
expected and unexpected match failures. The sequential alternative || is a
tool you can use to turn unexpected match failures into error messages by
raising an exception from the second branch of the alternative.
