Compiling to Slux: Implementation Strategies
Local variables
Register allocation
Each variable can be assigned to its own register,
since registers are private to each function frame.
Heap allocation
At entry to a function, you can allocate a static-struct,
and store that in a register. Accesses into static structs
are encoded efficiently and accessed at very high speeds
(potentially without any checking by the terp), although
of course this is notably slower than storing variables in
registers.
If more than 256 heap-allocated variables are needed,
allocate multiple static-structs. (One could try to use
an array to get past the 256-item limit, but using an array
is implausible, as it would require storing somewhere all the literal
integer indices used to index into the array, which
is difficult.)
Support for more than 125 local variables
Slux only allows 125 registers; if more than 125 local
variables and compiler-generated temporary registers are
needed, some of the variables should be
heap allocated. Classical register-allocation
strategies can be employed to avoid constantly saving/loading
all the variables that aren't permanently register allocated.
Continuations
You can define nested blocks of code which can be
passed around as functions by simply defining them
as their own functions. If the containing function
has some values that are accessed by the nested code--
parameters or registers--which are constant for the
life of the nested function, these can be shared with the
nested function by using currying.
If there are variables in the containing function which
can be modified by the the nested function (they
"escape"), they can be allocated as heap-allocated
variables, and the static-struct or array containing
them can be curried in.
For example, consider the following hypothetical
language which supports arbitrary iterators using
continuations:
countProperty(object o, name p)
{
int n=0;
o->children(
func(x)
{
if (x->p())
n = n+1;
}
);
return n;
}
An implementation can break this up into two different
functions:
countProperty(object o, name p)
{
static-struct a = [ 0 ]; // n
func f = curry(countProperty#nested, a, p);
o->children(f);
return a[0];
}
countProperty#nested(static-struct a, name p, element x)
{
if (x->p())
a[0] = a[0]+1;
}
Constants
Literal Pool
Each function can have up to 128 unique constants in its
literal pool. Most functions should use fewer than 128
unique constants.
Heap-allocated literals
Additional constants should go in static-structs.
Unlike local variables, these should be allocated once,
at compile time, and a reference to them placed in the
literal pool. It would be very difficult to store extra
constants in an array, as constant integers would be
needed to index into the array, and those constant
integers would need to be stored somewhere, or else
computed on the fly from other constant integers.
note for future reference
In fact, the issue here is really more a limitation of
the specification than of what an interpreter can
reasonably do. It wouldn't be hard to have interpreters
have extra opcodes that index into an array using a
constant 16-bit integer that's encoded into the bytestream
rather than the literal pool. The problem is that there's
no way to express this in the specification, even if both
ends of the pipeline (the compiler and the terp) are happy
with the idea. Two plausible strategies are to add such
opcodes to the specification itself, and/or change
structset and structget to use 16-bit values (which
the terp can recompile down to 8-bit in the common case).
Global Variables
Global Object implementation
The natural implementation of global variables is to store
them as (name, value) pairs on the global object, and access
them with getpropself and setpropself. This will be fairly
efficient on most interpreters. However, it will require the
allocation of a number of name objects which may not serve
any other purpose. Also, implementations may have a "fast route"
for handling names on the global object, but they will still
need to update a normal "object"-style representation since
such data can also be accessed by calling getprop on the
global object, or even by sending a message to the global object.
Static-Struct Implementation
An alternative strategy is to allocate one or more static-structs
at compile time. References to these static structs would appear
in the functions that referenced the appropriate global variables.
This will probably execute faster since there is no need to update
the global object's "object representation". (Both implementations
require typechecking--structset checks that its structure parameter
is of the right type, and setglobalprop checks that its name parameter
is really a name; in both of these cases the argument will be a
literal so it's possible the typechecking will be suppressed.)