Compiling to Slux: Implementation Strategies

Local variables

Register allocation

Each variable can be assigned to its own register, since registers are private to each function frame.

Heap allocation

At entry to a function, you can allocate a static-struct, and store that in a register. Accesses into static structs are encoded efficiently and accessed at very high speeds (potentially without any checking by the terp), although of course this is notably slower than storing variables in registers.

If more than 256 heap-allocated variables are needed, allocate multiple static-structs. (One could try to use an array to get past the 256-item limit, but using an array is implausible, as it would require storing somewhere all the literal integer indices used to index into the array, which is difficult.)

Support for more than 125 local variables

Slux only allows 125 registers; if more than 125 local variables and compiler-generated temporary registers are needed, some of the variables should be heap allocated. Classical register-allocation strategies can be employed to avoid constantly saving/loading all the variables that aren't permanently register allocated.

Continuations

You can define nested blocks of code which can be passed around as functions by simply defining them as their own functions. If the containing function has some values that are accessed by the nested code-- parameters or registers--which are constant for the life of the nested function, these can be shared with the nested function by using currying.

If there are variables in the containing function which can be modified by the the nested function (they "escape"), they can be allocated as heap-allocated variables, and the static-struct or array containing them can be curried in.

For example, consider the following hypothetical language which supports arbitrary iterators using continuations:

 countProperty(object o, name p)
 {
    int n=0;
    o->children(
       func(x)
       {
          if (x->p())
             n = n+1;
       }
    );
    return n;
 }

An implementation can break this up into two different functions:

 countProperty(object o, name p)
 {
    static-struct a = [ 0 ];   // n
    func f = curry(countProperty#nested, a, p);
    o->children(f);
    return a[0];
 }

 countProperty#nested(static-struct a, name p, element x)
 {
    if (x->p())
       a[0] = a[0]+1;
 }

Constants

Literal Pool

Each function can have up to 128 unique constants in its literal pool. Most functions should use fewer than 128 unique constants.

Heap-allocated literals

Additional constants should go in static-structs. Unlike local variables, these should be allocated once, at compile time, and a reference to them placed in the literal pool. It would be very difficult to store extra constants in an array, as constant integers would be needed to index into the array, and those constant integers would need to be stored somewhere, or else computed on the fly from other constant integers.

note for future reference

In fact, the issue here is really more a limitation of the specification than of what an interpreter can reasonably do. It wouldn't be hard to have interpreters have extra opcodes that index into an array using a constant 16-bit integer that's encoded into the bytestream rather than the literal pool. The problem is that there's no way to express this in the specification, even if both ends of the pipeline (the compiler and the terp) are happy with the idea. Two plausible strategies are to add such opcodes to the specification itself, and/or change structset and structget to use 16-bit values (which the terp can recompile down to 8-bit in the common case).

Global Variables

Global Object implementation

The natural implementation of global variables is to store them as (name, value) pairs on the global object, and access them with getpropself and setpropself. This will be fairly efficient on most interpreters. However, it will require the allocation of a number of name objects which may not serve any other purpose. Also, implementations may have a "fast route" for handling names on the global object, but they will still need to update a normal "object"-style representation since such data can also be accessed by calling getprop on the global object, or even by sending a message to the global object.

Static-Struct Implementation

An alternative strategy is to allocate one or more static-structs at compile time. References to these static structs would appear in the functions that referenced the appropriate global variables. This will probably execute faster since there is no need to update the global object's "object representation". (Both implementations require typechecking--structset checks that its structure parameter is of the right type, and setglobalprop checks that its name parameter is really a name; in both of these cases the argument will be a literal so it's possible the typechecking will be suppressed.)