Gerard Goossen - Changing the Perl 5 optree build process into a Abstract Syntax Tree generation and a code generation step


Title: Changing the Perl 5 optree build process into a Abstract Syntax Tree generation and a code generation step

Name: Gerard Goossen

Grant Manager: Jeff Horwitz

Duration: One month

Started: September 2009

Synopsis:
This project is to change the Perl 5 optree build process into a step where the parser generates an Abstract Syntax Tree (AST) and a step where the executable code in the form of a list of function pointers (probably with arguments in some form) is generated from the AST.

Because this project is refactoring of a major part of the Perl 5 core, a good benchmark of Perl 5 to measure the effect on performance of the changes would be required.

Benefits to the Perl Community:
Changing the Perl 5 optree build process into two steps does have the following advantages:

  • Changing the target for the code generation will be easier. Especially using LLVM could benefit from the more flexible code generation step.
  • The Abstract Syntax Tree will be simpler than the current optree, and could be easier analysed.
  • The separation of the generation of the Abstract Syntax Tree and the code generation increases the maintainability.
  • Having two steps separate steps for the code generation might introduce some overhead, but the increased flexibility should compensate for this and might even improve performance.
  • Optimisations which don't fit very will into the current model of one function per OP would be possible.
  • code generation could be just-in-time, i.e. at the first time a sub is called. Saving the cost of generating code for not-used subs, which is might also make it advantages to do more aggressive optimisations.
  • The change does not require any change to Perl code (except of course some B:: modules) nor would it require changes to XS code.

Deliverables

  • The Perl 5 optree build process is changed to an Abstract Syntax Tree and code generation step.
  • Documentation of the Abstract Syntax Tree.
  • Documentation of the code generation from the Abstract Syntax Tree.
  • A benchmark for Perl 5
  • Investigate/try using LLVM as backend for Perl 5.

Project Details

For example the current optree for program C< for(@ARGV) { print "arg: $_\n"; } > is:

{
1 TYPE = leave ===> DONE
TARG = 1
FLAGS = (VOID,KIDS,PARENS)
PRIVATE = (REFCOUNTED)
REFCNT = 1
{
2 TYPE = enter ===> 3
}
{
3 TYPE = nextstate ===> 4
FLAGS = (VOID)
LINE = 1
PACKAGE = "main"
}
{
20 TYPE = leaveloop ===> 1
FLAGS = (VOID,KIDS)
{
8 TYPE = enteriter ===> 18
FLAGS = (LIST,KIDS,STACKED)
{
TYPE = null ===> (4)
(was pushmark)
FLAGS = (SCALAR)
}
{
TYPE = null ===> (7)
(was list)
FLAGS = (LIST,KIDS,MOD)
{
4 TYPE = pushmark ===> 5
FLAGS = (SCALAR,MOD)
}
{
6 TYPE = rv2av ===> 7
TARG = 1
FLAGS = (SCALAR,KIDS,REF,MOD)
{
5 TYPE = gv ===> 6
FLAGS = (SCALAR)
}
}
}
{
7 TYPE = gv ===> 8
FLAGS = (SCALAR)
}
}
{
TYPE = null ===> (20)
FLAGS = (VOID,KIDS)
{
19 TYPE = and ===> 20
FLAGS = (VOID,KIDS)
OTHER ===> 9
{
18 TYPE = iter ===> 19
FLAGS = (SCALAR)
}
{
21 TYPE = lineseq ===> 0
FLAGS = (VOID,KIDS)
{
9 TYPE = nextstate ===> 10
FLAGS = (VOID)
LINE = 1
PACKAGE = "main"
}
{
16 TYPE = print ===> 17
FLAGS = (VOID,KIDS)
{
10 TYPE = pushmark ===> 11
FLAGS = (SCALAR)
}
{
TYPE = null ===> (16)
(was stringify)
FLAGS = (SCALAR,KIDS)
{
TYPE = null ===> (11)
(was pushmark)
FLAGS = (SCALAR)
}
{
15 TYPE = concat ===> 16
TARG = 3
FLAGS = (SCALAR,KIDS,STACKED)
{
13 TYPE = concat ===> 14
TARG = 2
FLAGS = (SCALAR,KIDS)
{
11 TYPE = const ===> 12
FLAGS = (SCALAR)
SV = PV("arg: "\0)
}
{
TYPE = null ===> (13)
(was rv2sv)
FLAGS = (SCALAR,KIDS)
{
12 TYPE = gvsv ===> 13
FLAGS = (SCALAR)
}
}
}
{
14 TYPE = const ===> 15
FLAGS = (SCALAR)
SV = PV("\n"\0)
}
}
}
}
{
17 TYPE = unstack ===> 18
FLAGS = (VOID)
}
}
}
}
}
}

This would be changed to an Abstract Syntax Tree of:

{
TYPE = block
FLAGS = (VOID)
{
TYPE = forloop
FLAGS = (VOID)
{
TYPE = null
}
{
TYPE = rv2av
FLAGS = (SCALAR,REF,MOD)
{
TYPE = gv
FLAGS = (SCALAR)
REF = *ARGV
}
}
{
TYPE = block
{
TYPE = print
FLAGS = (VOID)
{
TYPE = stringify
FLAGS = (SCALAR)
{
TYPE = const
FLAGS = (SCALAR)
SV = PV("arg: "\0)
}
{
TYPE = rv2sv
FLAGS = (SCALAR,KIDS)
{
TYPE = gvsv ===> 13
FLAGS = (SCALAR)
}
}
{
TYPE = const
FLAGS = (SCALAR)
SV = PV("\n"\0)
}
}
}
}
}
}

Which would be converted to the code:

pp_enter()
pp_nextstate(state)
pp_pushmark()
pp_gv(gv)
pp_rv2av(SCALAR|REF|MOD)
pp_gv(gv)
pp_enteriter()
loop:
pp_iter()
pp_and(&leaveloop)
pp_nextstate(state)
pp_pushmark()
pp_const("arg: ")
pp_gvsv(gvsv)
pp_concat()
pp_const("\n")
pp_concat()
pp_print(VOID)
pp_unstack()
goto loop
leaveloop:
pp_leaveloop()
pp_leave()

But for example the print could be more aggressively compiled into something like:

...
pp_pushmark()
res1 = concat(PV("arg: "), gvsv)
res2 = concat(res1, "\n")
pp_push(res2)
pp_print(VOID)
...

Inch-stones

There is a lot of flexibility in this project, because a lot of time
has to be spent benchmarking/profiling/optimizing the code.

  • Writing a benchmark for Perl 5.
  • Leave the optree building process as it currently is (including linked-list generation and peephole optimisation). But add a code generation step, which just uses the C<< op->op_next >> linked-list to convert them to C<pp_xxx(op)> calls
  • Adjust B modules and other modules which use the hook into runops and similar things.
  • Directly generate the code without the intermediate C<< op->op_next >> linked-list.
  • Move the peephole optimisations and constant folding to the code generation step.
  • Simplify the Abstract Syntax Tree, by moving things the logic to the code generation step, like the C<for> loop in the example above.
  • Optimise everything using the benchmark.
  • Use LLVM as backend.

Project Schedule

One month full-time, starting about a month after acceptance of this proposal.

Bio:
About Two and a halve years ago I start Perl Kurila, an experimental fork of Perl 5, through it I have any about every part of the Perl internals, specifically I have done some optree rewriting. And some optree analysis as part of the Perl5-to-Perl5 converter using the annotated optree generated by Perl with Misc Attribute Decoration (MAD).

Amount Requested:
$5000