Slux development status
Slux is a "high-level" runtime-typed virtual machine, designed to be
both general and an appropriate target for Interactive Fiction languages.
It offers garbage collection; dynamic typing; support for currying
and closures; primitive types (31-bit integers, strings, names,
objects, fixed-layout structs, and dynamically-sized heterogenous
arrays) for speed; function-scoped, object-scoped, global-scoped, and
dynamically-scoped variables; opcode-based for performance, but with opcodes cascading to message
sends to allow "operator overloading"; pseudoclasses to define methods for
the primitive types; virtual primitive types implemented in interpreted
code; opcode variations allow targeting by multiple languages with
different semantics; compiler-configurable multiple-inheritence lookup rules;
efficiently encoded program files; save/restore and save/undo for IF.
Specification
As of Sep 15, 2002:
- The preliminary Slux Virtual Machine specification,
v. 1.0.0.14
Please report bugs in the specification to buzzard at nothings dot org!
I'm fairly certain some of the opcode links are broken, which means the
opcodes are undefined, but I'm not really up to clicking on them all. If
you see an opcode that you don't know what it is and it's not defined
anywhere, tell me.
Compiling to Slux
A guide to compilation strategies for dealing with Slux:
Rationale
- A rationale for many of the design decisions
made for Slux.
FastSlux, an implentation
FastSlux is a high-performance implementation of the Slux
specification. (There is no 'simple' reference specification,
because I don't have the time to do both.)
FastSlux features:
- object references stored as direct pointers (no object table), with tag bit
- integers stored with tag bit, allow direct computation for add/sub
- about 75% of instructions are implemented in the interpreter inner loop for max performance
- mark-and-sweep-plus-compaction garbage collection
- core engine, to be wrapped in more sophisticated I/O layers
- recompiled internal bytecode for performance (still register-based)
- some inspiration drawn from the OCaml bytecode interpreter
- method-inheritence cache
- object/hashes stored as hash tables or unsorted arrays depending on keys
- run-time selectable N names stored as direct references on objects
- hopefully under 6K lines of (fairly dense) code
Status, as of Jan 20, 2003
Noticed that 'testprop' wasn't working; fixed it. Now
*really* finished testing the object/property/plist opcodes.
About 90 opcodes tested.
I've started writing a test codebase to test all the opcodes. A lot
of infrastructure (an interpreted routine to print out entities) and
I've tested all the string operators so far.
I did some performance testing at Iain Merrick's suggestion,
and got around a 8:1 ratio of FastSlux to C code, doing all
integer operations which FS was constantly typechecking, and
with the inefficiencies in the interpreter dispatch loop described below.
I was testing this function:
int fib(int x)
{
if (x < 2) return x;
return fib(x-1) + fib(x-2);
}
which I manually compiled to (destinations on the left, # means a constant):
jumpge r0,#2,skip
return r0
skip:
sub r0,r0,#1
func r1,#$fib,[r0] ; parameter list is 'r0'
sub r0,r0,#1
func r0,#$fib,[r0]
add r0,r0,r1
return r0
A compiler would probably use additional registers to store
the temporaries, but wouldn't need any additional instructions.
Looked at disassembly of inner loop under MSVC. Mostly good; pays
an unnecessary rangecheck on every switch, as expected; all three of
the crucial variables are register allocated as I requested.
Here is what gets executed between the time 'noop' gets dispatched
and the computed jump of the next instruction.
004024C5 inc esi
004024C6 jmp _callEntryPoint+1A80h (00403df0)
00403DF0 xor eax,eax
00403DF2 mov al,byte ptr [esi]
00403DF4 cmp eax,0BBh
00403DF9 jbe _callEntryPoint+146h (004024b6)
004024B6 mov edx,0FFFFFFFCh
004024BB or ecx,0FFh
004024BE jmp dword ptr [eax*4+403E40h]
esi is the instruction pointer; the first instruction is the
entire implementation of noop.
I have no idea what the edx and ecx stuff at the end there are.
ecx and edx elsewhere are temporaries. I was thinking maybe this
was a subexpression being pulled up before the switch, but I couldn't
find edx or ecx being used anywhere without being overwritten first.
And 'or ecx,0ffh' is a bizarre instruction anyway.
Here's the implementation of 'return 0'; rapid call and return
is crucial.
004029F2 mov edi,dword ptr [edi-4]
004029F5 mov esi,dword ptr [edi-0Ch]
004029F8 mov ebp,dword ptr [edi-8]
004029FB mov dword ptr [esp+10h],ebp
004029FF movsx edx,byte ptr [esi-1]
00402A03 mov dword ptr [edi+edx*4],0
00402A0A jmp _callEntryPoint+1A80h (00403df0)
First it restores the VM frame pointer, edi. Then it loads the
saved VM ip into esi, and the saved VM literal-pool pointer into ebp, and
then saves tha onto the hardware stack even though it's register
allocated; I don't know why. Next it fetches, out of the old instruction
stream, the destination register that the function call return
value is supposed to write to, and puts that in edx. Then it
writes a 0 there. Then off to the next instruction.
Building a new stack frame is a bit slower, sadly.
Done features:
- begun testing/debugging of opcodes
- string opcodes checked
- array opcodes checked
- integer opcodes checked, including exact following of division/mod specifiction
- comparisons and branches checked, including computed branch
- return checks, including internal return variations
- object/plist get/set/delete opcodes
- totally rewrote the encoding of entity types to allow for efficient UNDO processing
- fixed all the string and object opcodes to overload and throw errors as appropriate
- added dispatch to curried functions, but only for 'func'; decided you can't
bind a curried function to an object to create a method, because curried functions
have their own 'self' and so it just wouldn't make sense
- made all messages dispatch to pseudoclasses if target isn't an object (had this
for opcode overloading, but not for regular messages)
- fixed bug in opcode decoding in the error handler
- renumber registers so function parameters appear in the registers specified in the Slux spec
- opcode error handling
- opcode overloading, pseudoclass overloading
- pseudoclass cacheing
- Wrote an assembler using the C-preprocessor and C code, which
will make it a lot easier for me to write test code. Still have
to manually compute branch offsets, though. Successfully ran a
"guess a number from 1 to 100" program.
- 100% of opcodes from 0x00-0x7F
- struct, copy, destroy, become
- currying
- arithmetic opcodes
- branch and comparison opcodes
- array opcodes
- string opcodes
- property get/set opcodes
- function-call and message-send opcodes
- object opcodes
- trap0..trap7
- super/dsuper opcodes
- console I/O
- fast string->name conversion with hash table
- multiple object implementations:
- hash table for name/integer indexing
- linear-probed array for arbitrary indexes
- direct-access for name access on global object
- method-lookup cache
- fast name-lookup on global object
- runtime selectable n names that are fast lookup on all objects
- file loading
- recompilation of Slux bytecodes into internal bytecodes
- GC marking
Still TODO:
- finish debugging/testing
- missing opcodes, 0xE0-0xEF
- time functions
- VM control
- debugging
- saveundo/undo
- save/restore
- #classancestors for configurable multiple-inheritence
- safecopy() for passing data around undo/restore
- GC sweeping and compaction
- 'goto label_table[opcode]'-style opcode dispatch under gcc
The current codesize breakdown, in lines of code:
- 1000 parsing and translating from Slux to internal format
- 1000 interpreter inner loop; 129 (internal) opcode implementations
- 850 set/get/find properties on objects; object opcodes; method-lookup cache
- 550 string & array opcode implementations
- 150 name allocation, string to name lookup
- 250 opcode overload handling and error handling
- 200 garbage collection
- 100 cacheglobal, cachepseudo handling
- 100 top level interface
- 100 console I/O, I/O opcodes