Slux

Slux development status

Slux is a "high-level" runtime-typed virtual machine, designed to be both general and an appropriate target for Interactive Fiction languages.

It offers garbage collection; dynamic typing; support for currying and closures; primitive types (31-bit integers, strings, names, objects, fixed-layout structs, and dynamically-sized heterogenous arrays) for speed; function-scoped, object-scoped, global-scoped, and dynamically-scoped variables; opcode-based for performance, but with opcodes cascading to message sends to allow "operator overloading"; pseudoclasses to define methods for the primitive types; virtual primitive types implemented in interpreted code; opcode variations allow targeting by multiple languages with different semantics; compiler-configurable multiple-inheritence lookup rules; efficiently encoded program files; save/restore and save/undo for IF.

Specification

As of Sep 15, 2002:

The preliminary Slux Virtual Machine specification, v. 1.0.0.14

Please report bugs in the specification to buzzard at nothings dot org! I'm fairly certain some of the opcode links are broken, which means the opcodes are undefined, but I'm not really up to clicking on them all. If you see an opcode that you don't know what it is and it's not defined anywhere, tell me.

Compiling to Slux

A guide to compilation strategies for dealing with Slux:

compilation strategies

Rationale

A rationale for many of the design decisions made for Slux.

FastSlux, an implentation

FastSlux is a high-performance implementation of the Slux specification. (There is no 'simple' reference specification, because I don't have the time to do both.)

FastSlux features:

object references stored as direct pointers (no object table), with tag bit
integers stored with tag bit, allow direct computation for add/sub
about 75% of instructions are implemented in the interpreter inner loop for max performance
mark-and-sweep-plus-compaction garbage collection
core engine, to be wrapped in more sophisticated I/O layers
recompiled internal bytecode for performance (still register-based)
some inspiration drawn from the OCaml bytecode interpreter
method-inheritence cache
object/hashes stored as hash tables or unsorted arrays depending on keys
run-time selectable N names stored as direct references on objects
hopefully under 6K lines of (fairly dense) code

Status, as of Jan 20, 2003

Noticed that 'testprop' wasn't working; fixed it. Now *really* finished testing the object/property/plist opcodes.

About 90 opcodes tested.

I've started writing a test codebase to test all the opcodes. A lot of infrastructure (an interpreted routine to print out entities) and I've tested all the string operators so far.

I did some performance testing at Iain Merrick's suggestion, and got around a 8:1 ratio of FastSlux to C code, doing all integer operations which FS was constantly typechecking, and with the inefficiencies in the interpreter dispatch loop described below.

I was testing this function:

int fib(int x)
{
   if (x < 2) return x;
   return fib(x-1) + fib(x-2);
}

which I manually compiled to (destinations on the left, # means a constant):

   jumpge  r0,#2,skip
   return  r0
 skip:
   sub     r0,r0,#1
   func    r1,#$fib,[r0]  ; parameter list is 'r0'
   sub     r0,r0,#1
   func    r0,#$fib,[r0]
   add     r0,r0,r1
   return  r0

A compiler would probably use additional registers to store the temporaries, but wouldn't need any additional instructions.

Looked at disassembly of inner loop under MSVC. Mostly good; pays an unnecessary rangecheck on every switch, as expected; all three of the crucial variables are register allocated as I requested. Here is what gets executed between the time 'noop' gets dispatched and the computed jump of the next instruction.

004024C5   inc         esi
004024C6   jmp         _callEntryPoint+1A80h (00403df0)

00403DF0   xor         eax,eax
00403DF2   mov         al,byte ptr [esi]
00403DF4   cmp         eax,0BBh
00403DF9   jbe         _callEntryPoint+146h (004024b6)

004024B6   mov         edx,0FFFFFFFCh
004024BB   or          ecx,0FFh
004024BE   jmp         dword ptr [eax*4+403E40h]

esi is the instruction pointer; the first instruction is the entire implementation of noop. I have no idea what the edx and ecx stuff at the end there are. ecx and edx elsewhere are temporaries. I was thinking maybe this was a subexpression being pulled up before the switch, but I couldn't find edx or ecx being used anywhere without being overwritten first. And 'or ecx,0ffh' is a bizarre instruction anyway.

Here's the implementation of 'return 0'; rapid call and return is crucial.

004029F2   mov         edi,dword ptr [edi-4]
004029F5   mov         esi,dword ptr [edi-0Ch]
004029F8   mov         ebp,dword ptr [edi-8]
004029FB   mov         dword ptr [esp+10h],ebp
004029FF   movsx       edx,byte ptr [esi-1]
00402A03   mov         dword ptr [edi+edx*4],0
00402A0A   jmp         _callEntryPoint+1A80h (00403df0)

First it restores the VM frame pointer, edi. Then it loads the saved VM ip into esi, and the saved VM literal-pool pointer into ebp, and then saves tha onto the hardware stack even though it's register allocated; I don't know why. Next it fetches, out of the old instruction stream, the destination register that the function call return value is supposed to write to, and puts that in edx. Then it writes a 0 there. Then off to the next instruction.

Building a new stack frame is a bit slower, sadly.

Done features:

begun testing/debugging of opcodes
- string opcodes checked
- array opcodes checked
- integer opcodes checked, including exact following of division/mod specifiction
- comparisons and branches checked, including computed branch
- return checks, including internal return variations
- object/plist get/set/delete opcodes
totally rewrote the encoding of entity types to allow for efficient UNDO processing
fixed all the string and object opcodes to overload and throw errors as appropriate
added dispatch to curried functions, but only for 'func'; decided you can't bind a curried function to an object to create a method, because curried functions have their own 'self' and so it just wouldn't make sense
made all messages dispatch to pseudoclasses if target isn't an object (had this for opcode overloading, but not for regular messages)
fixed bug in opcode decoding in the error handler
renumber registers so function parameters appear in the registers specified in the Slux spec
opcode error handling
opcode overloading, pseudoclass overloading
pseudoclass cacheing
Wrote an assembler using the C-preprocessor and C code, which will make it a lot easier for me to write test code. Still have to manually compute branch offsets, though. Successfully ran a "guess a number from 1 to 100" program.
100% of opcodes from 0x00-0x7F
- struct, copy, destroy, become
- currying
- arithmetic opcodes
- branch and comparison opcodes
- array opcodes
- string opcodes
- property get/set opcodes
- function-call and message-send opcodes
- object opcodes
- trap0..trap7
- super/dsuper opcodes
- console I/O
fast string->name conversion with hash table
multiple object implementations:
- hash table for name/integer indexing
- linear-probed array for arbitrary indexes
- direct-access for name access on global object
method-lookup cache
fast name-lookup on global object
runtime selectable n names that are fast lookup on all objects
file loading
recompilation of Slux bytecodes into internal bytecodes
GC marking

Still TODO:

finish debugging/testing
missing opcodes, 0xE0-0xEF
- time functions
- VM control
- debugging
saveundo/undo
save/restore
#classancestors for configurable multiple-inheritence
safecopy() for passing data around undo/restore
GC sweeping and compaction
'goto label_table[opcode]'-style opcode dispatch under gcc

The current codesize breakdown, in lines of code:

1000 parsing and translating from Slux to internal format
1000 interpreter inner loop; 129 (internal) opcode implementations
850 set/get/find properties on objects; object opcodes; method-lookup cache
550 string & array opcode implementations
150 name allocation, string to name lookup
250 opcode overload handling and error handling
200 garbage collection
100 cacheglobal, cachepseudo handling
100 top level interface
100 console I/O, I/O opcodes