It offers:
- garbage collection
- dynamic typing
- support for currying and closures
- primitive types (31-bit integers, strings, names, objects, fixed-layout structs, and dynamically-sized heterogeneous arrays) for speed
- function-scoped, object-scoped, global-scoped, and dynamically-scoped variables
- opcode-based for performance, but with opcodes cascading to message sends to allow "operator overloading" (a sketch of this follows the list)
- pseudoclasses to define methods for the primitive types
- virtual primitive types implemented in interpreted code
- opcode variations to allow targeting by multiple languages with different semantics
- compiler-configurable multiple-inheritance lookup rules
- efficiently encoded program files
- save/restore and save/undo for IF
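To make the opcode-cascade item concrete, here is a minimal C sketch of how a tagged 31-bit integer plus an opcode that falls back to a message send typically fit together. The names (slux_value, send_message, op_add) and the exact tag scheme are my assumptions, not the actual FastSlux implementation.

/* Hypothetical value representation: a 31-bit integer lives in the upper
 * bits of a machine word with the low bit set as a tag; a word with the
 * low bit clear is assumed to be a pointer to a heap object. */
typedef unsigned int slux_value;

#define IS_INT(v)    ((v) & 1)
#define MAKE_INT(n)  (((slux_value)(n) << 1) | 1)
#define INT_VALUE(v) ((int)(v) >> 1)

/* Assumed fallback for "operator overloading": dispatch a message send
 * when either operand isn't a primitive integer. */
extern slux_value send_message(slux_value receiver, const char *selector,
                               slux_value arg);

/* The 'add' opcode: typechecked fast path for two integers, cascading to
 * a message send otherwise. */
static slux_value op_add(slux_value a, slux_value b)
{
   if (IS_INT(a) && IS_INT(b))
      return MAKE_INT(INT_VALUE(a) + INT_VALUE(b));
   return send_message(a, "+", b);
}

That per-operation integer typecheck is the overhead referred to in the fib timing entry below.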
As of Sep 15, 2002:
Please report bugs in the specification to buzzard at nothings dot org! I'm fairly certain some of the opcode links are broken, which means those opcodes are undefined, but I'm not really up to clicking on them all. If you see an opcode you don't recognize and it's not defined anywhere, tell me.
FastSlux features:
Noticed that 'testprop' wasn't working; fixed it. Now *really* finished testing the object/property/plist opcodes.
About 90 opcodes tested.
I've started writing a test codebase to test all the opcodes. There's a lot of infrastructure so far (an interpreted routine to print out entities), and I've tested all the string operators.
I did some performance testing at Iain Merrick's suggestion and got around an 8:1 ratio of FastSlux to C code, doing all integer operations (which FS was constantly typechecking), and with the inefficiencies in the interpreter dispatch loop described below.
I was testing this function:
int fib(int x)
{
    if (x < 2) return x;
    return fib(x-1) + fib(x-2);
}
which I manually compiled to (destinations on the left, # means a constant):
        jumpge r0,#2,skip
        return r0
skip:   sub    r0,r0,#1
        func   r1,#$fib,[r0]   ; parameter list is 'r0'
        sub    r0,r0,#1
        func   r0,#$fib,[r0]
        add    r0,r0,r1
        return r0

A compiler would probably use additional registers to store the temporaries, but wouldn't need any additional instructions.
Looked at the disassembly of the inner loop under MSVC. Mostly good: it pays an unnecessary range check on every switch, as expected, and all three of the crucial variables are register-allocated as I requested. Here is what gets executed between the time 'noop' gets dispatched and the computed jump of the next instruction.
004024C5   inc   esi
004024C6   jmp   _callEntryPoint+1A80h (00403df0)
00403DF0   xor   eax,eax
00403DF2   mov   al,byte ptr [esi]
00403DF4   cmp   eax,0BBh
00403DF9   jbe   _callEntryPoint+146h (004024b6)
004024B6   mov   edx,0FFFFFFFCh
004024BB   or    ecx,0FFh
004024BE   jmp   dword ptr [eax*4+403E40h]

esi is the instruction pointer; the first instruction is the entire implementation of noop. I have no idea what the edx and ecx stuff at the end there is; ecx and edx elsewhere are temporaries. I was thinking maybe this was a subexpression being pulled up before the switch, but I couldn't find edx or ecx being used anywhere without being overwritten first. And 'or ecx,0ffh' is a bizarre instruction anyway.
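For reference, here is a rough C sketch of the kind of switch-dispatched inner loop that produces code like the above. The names (run, ip, frame, lits) and the opcode numbering are mine, not the real source, but the shape (a byte fetch, the compiler's range check, a computed jump, and one handler per opcode) is what the disassembly shows.

/* Hypothetical opcode numbers; the real encoding is defined in the spec
 * (my guess is ~188 opcodes in all, matching the cmp eax,0BBh above). */
enum { OP_NOOP, OP_ADD, OP_RETURN };

/* Minimal sketch of the dispatch loop, assuming one-byte opcodes and
 * word-sized register slots; ip, frame, and lits are my names for the
 * three register-allocated variables (esi, edi, ebp). */
void run(unsigned char *ip, int *frame, int *lits)
{
   (void)lits;                  /* the literal pool is used by opcodes not shown */
   for (;;) {
      switch (*ip) {            /* the range check and computed jump             */
      case OP_NOOP:
         ip += 1;               /* the entire handler: skip past the opcode      */
         break;
      case OP_ADD:              /* add dest,src1,src2 (the real opcodes          */
                                /* typecheck their operands first)               */
         frame[ip[1]] = frame[ip[2]] + frame[ip[3]];
         ip += 4;
         break;
      case OP_RETURN:
         return;                /* stand-in; the real return opcodes restore the */
      }                         /* caller's frame, as described below            */
   }
}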
Here's the implementation of 'return 0'; rapid call and return is crucial.
004029F2   mov    edi,dword ptr [edi-4]
004029F5   mov    esi,dword ptr [edi-0Ch]
004029F8   mov    ebp,dword ptr [edi-8]
004029FB   mov    dword ptr [esp+10h],ebp
004029FF   movsx  edx,byte ptr [esi-1]
00402A03   mov    dword ptr [edi+edx*4],0
00402A0A   jmp    _callEntryPoint+1A80h (00403df0)
First it restores the VM frame pointer, edi. Then it loads the saved VM ip into esi and the saved VM literal-pool pointer into ebp, and then saves that onto the hardware stack even though it's register allocated; I don't know why. Next it fetches, out of the old instruction stream, the destination register that the function call's return value is supposed to be written to, and puts that in edx. Then it writes a 0 there. Then off to the next instruction.
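To spell those steps out, here is the same handler rendered as C. The VM struct, the slot type, and the function name are mine; the frame-header layout is only what the -4/-8/-0Ch offsets in the disassembly imply, not something taken from the actual source.

#include <stdint.h>

typedef intptr_t slot;           /* one machine word per VM register slot     */

typedef struct {                 /* the three register-allocated VM variables */
   unsigned char *ip;            /* esi                                       */
   slot          *frame;         /* edi                                       */
   slot          *lits;          /* ebp                                       */
} VM;

/* The 'return 0' handler, as C.  Header layout (a guess read off the
 * disassembly, in word slots below the frame pointer): -1 = caller's frame
 * pointer, -2 = saved literal pool, -3 = saved ip. */
static void op_return_0(VM *vm)
{
   vm->frame = (slot *)vm->frame[-1];            /* restore VM frame pointer  */
   vm->ip    = (unsigned char *)vm->frame[-3];   /* reload the saved VM ip    */
   vm->lits  = (slot *)vm->frame[-2];            /* reload literal-pool ptr   */
   vm->frame[(signed char)vm->ip[-1]] = 0;       /* the destination register
                                                    number is the byte just
                                                    before the saved ip; write
                                                    the return value, 0       */
   /* ...then jump back into the dispatch loop */
}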
Building a new stack frame is a bit slower, sadly.
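For comparison, a guess at what the matching call opcode has to do, reusing the VM and slot types from the sketch above. None of this is from the actual source; it just shows why the call side is slower: it has to store all three VM registers into the new frame's header and copy the argument list, rather than merely reloading a few words.

#include <string.h>

/* Hypothetical call opcode, same frame layout as the return sketch above;
 * caller_slots and the argument-passing convention are made up. */
static void op_call(VM *vm, unsigned char *callee_code, slot *callee_lits,
                    int caller_slots, slot *args, int nargs)
{
   slot *nf = vm->frame + caller_slots + 3;   /* new frame, past a 3-slot header */
   nf[-1] = (slot)vm->frame;                  /* save caller's frame pointer     */
   nf[-2] = (slot)vm->lits;                   /* save caller's literal pool      */
   nf[-3] = (slot)vm->ip;                     /* save caller's resume ip         */
   memcpy(nf, args, nargs * sizeof(slot));    /* arguments land in r0..r(n-1)    */
   vm->frame = nf;                            /* then enter the callee           */
   vm->lits  = callee_lits;
   vm->ip    = callee_code;
   /* (no stack-overflow check shown) */
}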
Done features:
Still TODO:
The current codesize breakdown, in lines of code: