SLUX

A High-Level Polymorphic Virtual Machine

Version 1.0.0.14
Sep 15 2002
Sean Barrett - sean at nothings dot org

Introduction

This document defines Slux, a byte-coded virtual machine designed primarily as a target machine for adventure game (Interactive Fiction (IF)) languages and secondarily as a generally targetable dynamically-typed virtual machine.

Distinguishing properties of Slux are:

General

garbage collection
dynamic typing
currying support (from which general closures can be synthesized)
function-scoped, object-scoped, global-scoped, and dynamically-scoped variables
object model
- objects inherit from objects (no classes)
- multiple inheritence
- run-time mutable inheritence hierarchy
- new instance variables can be added to objects on the fly
- objects also serve as general hashes
built-in types that are not objects
- 31-bit integers
- immutable strings
- immutable names (hashed strings)
- immutable functions
- mutable, resizeable heterogenous arrays
- mutable fixed-size structs
pseudoclasses for non-object types
- allow defining methods for non-object types
- allow defining behaviors for opcodes
- allow efficient interpretaion of languages in which all data appears to be an object
definition of new "built-in" types
- pseudoclasses define all standard behaviors with interpreted code
- interpreters can acclerate recognized types with native code
supports linking multiple languages together
multiple interpretation options
- a "naive" interpreter will be small and moderately efficient
- sufficient information is availble for an interpreter to do an easy 1-to-1 translation of the bytecode to more efficient forms
- just-in-time or full-native-code recompilation is possible
- compiler-emitted type assertions can simplify the task of omitting typechecking

IF-specific

memory save/load and save/undo
object model favors "all objects are unique and different" instead of "lots of mostly identical objects"

About this document

This document defines the file format for an executable Slux program and the manner in which it executes, as well as an optional file format for Slux "save games" (snapshots of the executable state). [[Interpreters need not support the save game format to be compliant, but it is recommended, since interoperable save games can be convenient for players.]]

Care has been taken to divorce the "on disk" storage format of Slux from its in-memory representation; multiple implementations of many features of Slux are possible.

[[Comments set off like this are not normative (not part of the specification); they are generally verbose rationale descriptions.]]

Where the term program appears, it normally means "the program as it appears in a file"; since interpreters can implement the virtual machine in any way desired, it doesn't make any sense to talk about how the program is stored when being executed.

The notation 0x100 identifies hexadecimal numbers (i.e. decimal 256); numbers without a prefix are decimal.

The notation $foo refers to an entity of type Name associated with the three-character string "foo".

Data Types

Data types are really only defined by the operations on them, but for the sake of exposition this section defines the data types in very general terms.

Element: Not a data type, but a generic term for an entity which can be any of the following data types.
Integers: Numbers on which arithmetic can be performed. They range from -(2^30) to (2^30)-1 [[ i.e. signed 31-bit integers ]]
String: An ordered sequence of characters which can be printed. A number of transformation operations are provided which produce new strings; a given string object, once created, is immutable.
Name: An ordered sequence of characters whose element is identical for identical sequences. (I.e. a name allows detecting equal strings by comparing "pointers".)
Array: An ordered collection of elements (heterogenous) which can be rapidly randomly accessed by index number. Arrays can also be modified in-place (which means that other references to the array will see the changes).
Object: An object is a dictionary--a collection of pairs of elements which supports rapid random access on a 'key'. Additionally objects are the targets of message sends, and also serve double-duty as classes that form a hierarchy, discussed elsewhere; however, objects can also just be used as a structured data type. Objects are modified in-place.
Function: A function is an executable data element--a piece of code to be interpreted by the interpreter.
Curried Function: A curried function is a function which has had one or more of its parameters bound in advance.
Struct: An ordered collection of elements which can only be accessed with "hardcoded" offsets--run-time random indexing is not possible. A Slux file can define multiple different struct types, each of different sizes.
Static-struct: Like a struct, but a static-struct does not have a fixed size. It is a single type which can be used to represent multiple different struct types which have been statically typechecked.
Null: A null element is not the valid target of any operation; it is used as a placeholder when another item is explicitly destroyed, so that dangling references still point to something.

File Format

Program File

A program file consists of the following elements:

Offset (in bytes) Size (in bytes) Field
0 16 Signature: "sLUx pRogrAm\012\032\015\377"
16 16 Header:
16 4 checksum
20 4 version number
24 4 global-object element #
28 4 num-db-entries
32 32 VM configuration:
32 4 number of VM-classes
36 4 fast hint array element #
40 24 reserved, set to 0 on write, ignore on read
64 program data

Version number

The version number consists of four components, each of which is one byte long, in the following order: the product version number, the major version number, the minor version number, and the release number.

Compilers should encode what version number of this specification they are generating code against. Interpreters can check the version number and determine if that version is supported. To judge whether an interpreter written against some version of this specification can execute a given program file, consider the following rules:

release number: Incremented for every release of the specification. It is the only number which will increase for changes to the specification which do not affect the virtual machine (e.g. for specification changes which are clarifications or other corrections).
minor version number: Incremented on changes to the specification which are basically forwards and backwards compatible. (See discussion for special case when major version number is 0.) Reset when the major version number changes.
major version number: Incremented when a change to the specification is backwards compatible (a new interpreter can run old programs, but an old interpreter will not be able to run new programs); e.g. introduction of a new opcode or datatype. Reset when the product version number changes.
product version number: Incremented when a change to the specification prevents backwards compatibility. Release numbers are reset on such increments.

If the major version number is 0, the product is in prerelease; during this time, increments of minor version numbers may not be either forwards or backwards compatible. If the minor version number is 0, the major version is in prerelease; during this time, versions will only update the release version number, but may not be forwards and backwards compatible.

Release versions of compilers should not output version numbers that include 0's, because the feature sets of such versions are too unpredictable for interpreters to accurately gauge their abilitiy to execute programs written to such versions.

Global object element #

This value is a big-endian 32-bit integer which is interpreted as an element reference (see section on element encoding). It is used to determine the entry points into the program; see the discussion of entry points in the next section.

num-db-entries

This is the number of entities which will appear in the entity database after the VM class names.

VM configuration section

The VM configuration section of the header is used to configure aspects of the VM that must be defined at load time, before any code starts running (and thus cannot be passed using normal opcodes), and perhaps before data is loaded into the VM.

Fast hint array #

This field should either contain an element reference to an array or should be 0. If it is a reference to an array, the VM should take the elements in the array as a hint to accelerate operations on those elements. If the VM cannot accelerate operations on all of the elements, it should favor the lowest-indexed elements in the array. The operations accelerated depends on the type of the element: for a name, accelerate all property-lookups on that name; for an object, accelerate all property-lookups on that object; for an array, accelerate dynamic resizing of the array; for a string, accelerate printing and converting to arrays and names; for a function, accelerate execution.

[[ For example, a VM might hardcode four properties per object in fixed locations, and thus accelerate four names. It might allow a certain number of objects to use fast hash tables instead of slower-but-smaller data structures; it might choose to not keep strings compressed in memory; for a function, it might recompile the function into a faster execution format. Alternately, of course, a VM might totally ignore the hint array, and accelerate everything, or nothing, or perform run-time analysis. ]]

Format of Data

After the header, a Slux program contains a list of VM class names followed by a database stored as a sequence of data items.

VM Class Names

In the header is a list of the number of VM classes in the file. This section of the file lists names for all of the VM classes. Each name appears as a length byte n, followed by n bytes which are interpreted as an ASCII string which is the name of the class. All of the bytes of the string must be >= 32 and < 127. A length byte of 0 indicates the end of the list.

The first name defined in the VM class name section is assigned the internal class ID 1; the second gets internal class ID 2, etc.

The interpreter will recognize some of these names and automatically know how to parse them. All interpreters know the following VM class names:

int
null
name
string
object
array
function
curried
static-struct

Some interpreters may recognize other names as well. Extra names handled by interpreters should be of the format 'x-CONTEXT-TYPENAME', where 'x' is a literal character x, 'CONTEXT' is a string name representing the application, and 'TYPENAME' is the name of the type. [[ For example, an interpreter extended to support Tads3 might include 'x-T3-grammar_object'. ]]

It is perfectly valid for the list of class names to include names the interpreter doesn't recognize. Such classes will have implementations solely through operator overloading defined on the corresponding pseudoclass (assuming there is one). If the interpreter doesn't know the type, and there is no pseudoclass defined for it, then it is treated as a plain 'struct' that is only manipulable with 'struct', 'structget', and 'structset'.

References to Data

References to data in a Slux program are encoded in a 32-bit value which is called an "element". Integers are restricted to only 31 bits (sign-extended), and appear directly, not referenced. The "name" data type uses values in the range 0x40000000-0x7FFFFFFF, and all other data types use values in the range 0x80000001-0xBFFFFFFF:

   Meaning             Hex Range           Bit Value
  positive integer    00000000-3FFFFFFF   00xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  name                40000000-7FFFFFFF   01xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
  other types         80000001-BFFFFFFF   10xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
  negative integer    C0000000-FFFFFFFF   11xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Note that the value 0x80000000 is reserved as an "out of band" value that should never appear in valid data.

[[Because Slux defines integer operations to be on 31-bit values, and because the above encoding correctly sign-extends the 31-bit value to 32-bits, the above encoding is a natural representation for storing elements in an interpreter. However, it is perfectly valid for an interpreter to unify all data types to be indirected, instead. (Furthermore, note that in the final file format, the above data is further compressed, so the above encoding represents an idealized format that an interpreter never need explicitly encode.)]]

Compilers and interpreters are encouraged to favor "other types" and "names" being generated in a compact region of their ranges, starting at the lowest values. [[This maximizes compression and allows interpreters to use a directly indexed table to map coded numbers to pointers, rather than a hash table.]]

Database

After the VM class names comes a sequence of "data items"; there are <b>num-db-entries</b> of these, and then the file ends. Ignoring the use of compression, the sequence of data items appears as follows:

Offset (in bytes) Size (in bytes) Field
0 4 element # to define
4 4 class-id typecode, a small integer stored as an element
8 4 size of uncompressed element data in bytes, stored as an element
12 size ...type-specific-data...

The sequence of data items must be sorted by their first field, the element # defined. The class-id typecode is an index into the class names as described above.

If an element is referenced in type-specific data, then that element must be defined somewhere in the file, or else the program file is invalid. If an element # is defined multiple times in the file, the program file is invalid.

Type-Specific Data

Integers

Because integers are used directly, rather than by reference, integers never appear except as data within other types. It is invalid for the class-id typecode corresponding to 'int' to appear in an element header.

Strings

The first byte of a string is a compression code. The only current defined compression mode is '0', which means no compression; the data following consists of size bytes representing the string in order. All 256 byte values can legally appear in a string. A 0-length string is valid (but will still have the one-byte compression code). [[This needs to be extended to incorporate Unicode support.]]

Name

A name appears as a sequence of size bytes. All 256 byte values can legally appear in a name. The same sequence of bytes for a name cannot appear for more than one name in the file; every name must be unique. EXCEPTION: The 0-length name can appear multiple times, and each such name is considered different. [[This is used to allow names to be used for message dispatch but to allow the textual versions of the names to be stripped from the executable. But note that entry-point names and pseudoclass names must not be stripped.]]

Names whose first byte is 0 are reserved for use by the specification, so in the interest of forwards- and backwards-compatibility, compilers should avoid using such names. All names indicated in this specification with a number-sign prefix actually mean a 0-byte prefix instead of the number-sign.

Array

An array appears as a sequence of size/4 elements (size must be a multiple of four), which sequentially define the 0th element of the array, the 1st element of the array, etc. A 0-length array is valid.

Object

An object has an initial element, called its "class", followed by (size-4)/8 pairs of elements (size must be an odd multiple of four), each of which defines a <key, value> pair, in the order key, then value. The order of pairs is meaningless, and no key can appear more than once. The set of pairs can be empty.

Null

size must be 0.

Function

A function consists of three major subcomponents: the function signature, the function bytecode, and the function literal pool. These components are stored as follows in the type-specific data:

Offset (in bytes) Size (in bytes) Field
0 1 number of parameters/arguments function takes
1 1 number of variables in the function (max 125) [includes parms]
2 1 count of elements in literal pool
3 1 reserved for future use, must be 0
4 4*count elements of literal pool
4+4*count ... bytecode

The executable bytecode is a stream of (uncompressed) bytes which fills out the remaining data. The executable bytecode is defined in a separate section.

Curried Function

A curried function is a function in which some of its arguments have been bound; specifically, count = (size - 12)/4 of them.

Offset (in bytes) Size (in bytes) Field
0 4 function which has been curried
4 4 'self' element
8 4 'class' element
12 4*count bound parameters

Additional types

All further types are interpreted identically to array, as a series of size/4 elements. Of course they create elements of the appropriate type, rather than arrays. The data in these types can only be accessed through the 'structget' and 'structset' operands.

Each of these other types must use a single fixed size for all of its instances. If two entries of the same type have different size, or if the opcode 'struct' appears in a bytestream defining an object of this type of a different size--even if that opcode is never executed--it is an error.

The type static-struct is special in that it does not follow the rules of the previous paragraph. Entries of type static-struct can be of different sizes. VMs are not obligated to range check accesses to static-struct, so this should only be used if the compiler has statically verified the type of the entity.

Compressed File Format

The encoding of all the program data described above consists of one of two things: a 32-bit element reference or an 8-bit byte. (All 32-bit counts are actually restricted to signed 31-bit integers, thus making them valid element references.)

Therefore, the compressed file format simply defines how 32-bit elements are compressed and how 8-bit bytes are compressed.

To maximize compressability using "zip" or other compression programs, the compressed file format leaves all data compressed to byte boundaries. Therefore, 8-bit data is uncompressed (except for compressed strings). 32-bit element data is compressed depending upon its absolute value, ignoring the type field.

First, for clarity, I define how to encode a 32-bit element for storage in the file format:

Compressing element references

Step 1: Normalize

Given a 32-bit number "yzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", rotate it left by two bits, producing "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyz". This shifts the two-bit type code down to the bottom bits.

Now, if and only if the bottom two bits are both 1 (that is, (value & 3) == 3), then xor the number with 0xFFFFFFFC. This takes the one's complement of negative integers.

After this step, "small" numbers and small element references will have all 0's in their top bits.

Step 2: Compress

Let the current number be value. If value < 0x80, then output value as a single byte.

Otherwise, if value < 0x4000, output the 2-byte big-endian number value+0x8000.

Otherwise, if value < 0x200000, output the 3-byte big-endian number value+0xC00000.

Otherwise, if value < 0x1F000000, output the 4-byte big-endian number value+0xE0000000.

Otherwise, output the byte 0xFF, followed by the 4-byte big-endian number value.

Decompressing element references

Lookahead a byte. Look the byte up in the following table, then read the indicated N-byte big-endian number, and subtract the value indicated.

Lookahead value N-byte read Value to subtract
0x00-0x7F 1 0x00
0x80-0xBF 2 0x8000
0xC0-0xDF 3 0xC00000
0xE0-0xEE 4 0xE0000000
0xFF discard lookahead, then read 4 0x00000000

If the bottom two bits of the result are both 1, xor the value with 0xFFFFFFFC. Then, regardless of the bottom two bits, rotate the entire item two bits to the right.

Approximate implementation:

uint32 read_compressed_integer(FILE *f)
{
   unsigned char buffer[4];
   unsigned char c = fgetc(f);
   uint32 val;
   ungetc(c,f);
   buffer[0] = buffer[1] = buffer[2] = buffer[3] = 0;
   if (c < 0x80) fread(buffer+3,1,1,f);
   else if (c < 0xc0) { fread(buffer+2,2,1,f); buffer[2] -= 0x80; }
   else if (c < 0xe0) { fread(buffer+1,3,1,f); buffer[1] -= 0xc0; }
   else if (c < 0xff) { fread(buffer+0,4,1,f); buffer[0] -= 0xe0; }
   else { fgetc(f);     fread(buffer+0,4,1,f); }

   val = (((((buffer[0] << 8) + buffer[1])<<8) + buffer[2])<<8) + buffer[3];
   if ((val & 3) == 3) val ^= 0xfffffffc;
   val = (val >> 2) + (val << 30);
   return val;
}

Delta File ("save game")

A delta file consists of the following elements:

Offset (in bytes) Size (in bytes) Field
0 16 signature: "sLUx eXtenDs\012\032\015\377"
16 16 Header:
16 4 checksum (of program file, not delta file)
20 4 version number
24 4 global object element #
28 4 num-db-entries (in delta file)
32 program data

This is basically the same data as the program file. The delta file version number should be the version number that the interpreter was written against, not the original version number; in other words, if this save file were read by another interpreter, what version must that interpreter implement in order to handle to the delta file?

Program Data

The data in the delta file consists of a series of data items formatted identically to the program file. Any data type can appear at any number; datatypes are encoded using identical class-ids to the program file. As with the program file, it is illegal for the same element to be defined more than once, within this single delta file.

Conceptually, element descriptions in the delta file override those defined in the program file. Because both files contain the data items sorted by element #, a single-pass merge operation can merge the data as it is read in from disk, regardless of the way the data is represented in memory.

In addition to overriding data descriptions with entirely new descriptions, additional typecode values can be used:

Typecode = 0

A typecode of 0 indicates that a data item has been deleted (i.e. garbage collected, not to be confused with being set to a null datatype).

Negative typecodes

A negative typecode indicates that the element is still the same type as before, but its data is itself delta encoded from the description in the program file.

In this version of the specification, the only valid negative typecode is a delta-encoded object.

Delta-encoded object

A delta-encoded object description looks identical to a non-delta-encoded object description, but it is interpreted differently. The class field always overrides the definition in the program file. The <key,value> pairs are used to update the object: if a given key already exists, then the value in the delta file overrides the one in the program file; otherwise, the pair is created. As a special case, if the value is 0x80000000 (which is never a valid element number), the pair defined in the program file is deleted.

Entry Points

The entry points to the program are found by using the global element number as an object reference, and sending a message to it, using a name. Thus, for any entry points the program needs to use, the compiler must create the appropriate name and define a method on the global object which implements it. (See the discussion of message passing for more information on this process.)

The following entry points are defined (here a C-style function declaration is used to indicate parameters; the name is just the identifier, not including anything including or after the left parenthesis); in every case, 'parm' is a parameter this program supplied to the opcode which triggered the entry point (restrictions on the contents of parm are discussed with the opcodes): [[this allows programs to pass data "around" undos and restores]]

#start(): invoked when the program is loaded
#save(error_message, parm): invoked when a deltafile is saved; error_message is either 0, indicating success, or something else, indicating failure; interpreters that return a string should expect the string to be printed
#restore(parm): invoked when a deltafile is loaded successfully
#saveundo(parm): invoked after successfully creating an undo mark
#restoreundo(parm): invoked after a successful undo
#error(error_message, stack_trace, grouping): invoked when there is a program error; error_message is of type string, stack_trace is an array, and grouping indicates the structure of the stack_trace array (every sequential 'grouping' of array elements is a level of the stack). Interpreters may pass in 0 for any of these.

Note that the use of a leading number-sign (#) character is a convention in this document that stands for a prefix of a 0-byte; for example, the name "#start" should be taken to mean the character array { 0, 's', 't', 'a', 'r', 't' }.

Note that if a method has the wrong function signature (takes the wrong number of arguments), the normal error handling will occur (a #doesNotUnderstand message, and if that fails, an error; if the error cannot be successfully dispatched for an entry point, the interpreter should terminate the program).

Additionally, the VM looks for certain certain names to be defined on the global object, and uses the values of them for certain purposes.

#int: The pseudoclass for ints
#name: The pseudoclass for names
other VM class names in the same format: As above
#objectname: If global.#objectname is a name, when debug printing an object, checks for the existence of this name as an instance variable, and if the value of it is a string, prints that string. [[Debug printing is an optional feature of debugging interpreters, so this isn't very important; typically, debug printing will be handled directly by the program.]]

The global should not have a parent class.

Bytecode format

The executable code for a function is represented as a bytecode stream, which is described below. This stream consists of a series of 8-bit bytes which can be parsed into 'instructions'; each instruction conists of an opcode followed by a series of 0 or more operands.

For a program to be valid, all of its functions' bytecode must be strictly parseable. A bytecode stream is strictly parseable if:

Starting at the beginning of the bytestream and parsing out instructions, with each new instruction beginning on the next byte after the previous instruction, eventually reaches the end of the bytestream without encountering any invalid opcodes.
Let the instructions encountered by the above process be the strictly parsed instructions. Then if any strictly parsed instruction is a branch instruction, any target of that branch instruction must itself be a strictly parsed instruction.
All references to literals and frame variables are within the range defined by the function's literal pool and frame size.
The last strictly parsed instruction must be one of the return opcodes or an unconditional branch.

[[The high concept is that the bytecode is well-formed; there is no internal padding with jumps around it; there is no sequence of bytecode which is executed on different passes "out of phase". These restrictions allow VMs to trivially alter or recompile bytecode.]]

Slux bytecode uses a register model [[rather than the traditional stack machine model]]. Each activated function gets its own set of registers [[which are normally mapped 1-to-1 to variables]]. Register #0 always has the value of self, the object which provides the current context for instance-variable references and self-message sends, while register #1 always has the value of class, which provides the context necessary for superclass message sends. Neither of these registers can be written to.

Opcode table

Opcodes! We got opcodes!

Function bytecode data consists of a stream of variable-length instructions. First comes the opcode byte. This is followed by zero or more operand bytes; the opcode determines how many operand words are used, although some opcodes use an operand value to indicate a variable number of operands.

Control flow
0x00: noop 0x01: cmpz 0x02: cmpnz 0x03: cmpe 0x04: cmpne 0x05: cmpg 0x06: cmpge 0x07: cmpgu 0x08: cmpgue 0x09: jump 0x0A: cjump 0x0B: jumpz 0x0C: jumpnz 0x0D: jumpe 0x0E: jumpne 0x0F: jumpg 0x10: jumpge 0x11: jumpgu 0x12: jumpgeu 0x13: return 0x14: return0 0x15: return1 0x16: func 0x17: msg 0x18: msgfunc 0x19: msgself 0x1A: msgselffunc 0x1B: msgglobal 0x1C: super 0x1D: dsuper 0x1E: funcperform 0x1F: perform
Integers & Variables
0x20: add 0x21: sub 0x22: mul 0x23: div 0x24: mod 0x25: modu 0x26: lshift 0x27: rshifts 0x28: rshiftu 0x29: bitand 0x2A: bitor 0x2B: bitxor 0x2C: bitnot 0x2D: neg 0x2E: abs 0x2F: forceint 0x30: move 0x31: getglobalprop 0x32: setglobalprop 0x33: getpropself 0x34: setpropself 0x35: findpropself 0x36: getcontext 0x37: pushcontext 0x38: popcontext 0x39: caller 0x3A: global 0x3B 0x3C 0x3D: type 0x3E: istype 0x3F: typeassert
Strings & Arrays
0x40: strlen 0x41: strget 0x42: strcmp 0x43: strncmp 0x44: strcasecmp 0x45: strextract 0x46: strcat 0x47: strsubs 0x48: lowercase 0x49: uppercase 0x4A: lowerston 0x4B: ston 0x4C: ntos 0x4D: stoi 0x4E: itos 0x4F: stoa 0x50: atos 0x51: newarray 0x52: array 0x53: arrlen 0x54: arrget 0x55: arrset 0x56: arrappend 0x57: arrshrink 0x58: arrresize 0x59: arrfastdel 0x5A: arrsearch 0x5B: arrjoin 0x5C: arrinsarr 0x5D: arrextract 0x5E: arrinsert 0x5F: arrdelete
Objects, Structs, Misc
0x60: create 0x61: plist 0x62: objkeys 0x63: objvals 0x64: otoa 0x65: findprop 0x66: getprop 0x67: testprop 0x68: setprop 0x69: addprop 0x6A: delprop 0x6B: getclass 0x6C: setclass 0x6D: subclass 0x6E 0x6F 0x70: struct 0x71: structget 0x72: structset 0x73: curry 0x74 0x75: copy 0x76: destroy 0x77: become 0x78: trap0 0x79: trap1 0x7A: trap2 0x7B: trap3 0x7C: trap4 0x7D: trap5 0x7E: trap6 0x7F: trap7

Base Input & Output
0xE0: time 0xE1: ctime 0xE2: deltatime 0xE3: randtime 0xE4 0xE5 0xE6: input 0xE7: ouput
VM control
0xE8: cacheglobal 0xE9: cachepseudo 0xEA 0xEB 0xEC 0xED 0xEE 0xEF: gestalt
Debugging
0xF0: compile 0xF1: identifier 0xF2 0xF3 0xF4 0xF5 0xF6 0xF7
VM state
0xF8: restart 0xF9: save 0xFA: restore 0xFB: saveundo 0xFC: restoreundo 0xFD 0xFE: error 0xFF: exit

Each opcode is followed by a series of data values, which represent the location of the source of the data to the operation, and the output from it. The number of data values is defined by the opcode. The data values are 8-bit signed values.

The value of the data item is interpreted as referring to a location as follows:

value < 0: The |value|-1'th item in the literal pool (e.g. value=-1 means the first value defined in the literal pool); for an output (destination), the value -1 means 'discard the output', and all other negative values are invalid
value >= 0: The value'th register

The registers are defined when a function is called: 0=self, 1=class, 2,3,4...n+1 = n function arguments, n+2...n+2+m=local variables.

Only registers can be the targets of operations (but not registers 0 & 1); targets of operations are referred to as 'dest' in the tables below.

Some instructions use a 16-bit signed integer literal operand, which is parsed in a big-endian manner from the bytestream (byte0 * 256 + byte1); these operands are notated with the suffix _i in the tables below.

Many opcodes support 'opcode overloading' so as to provide operator overloading in an underlying language. If an opcode described as supporting overloading is run and would otherwise typecheck, then if operand p0 is an object, a message is generated of this form:

   p0.#opcode(p1,p2,...)

If p0 is not an object, then a message is sent to p0's pseudoclass, with p0 as the first argument:

   getclass(p0).#opcode(p0,p1,p2,...)

(Note that the # prefix stands for a 0-byte, not the number-sign character.)

The result of this message is written to 'dest', if such an operand exists in the original opcode.

Miscellaneous instructions

noop: Has no effect.
move dest p0: Stores p0 to dest.
caller dest: Stores the value of the calling frame's pseudovariable 'self' in dest.
global dest: Stores the value of the global object in dest.
copy dest p0: Creates a copy of p0 and stores it in dest. For arrays and objects, this is a real copy; copy(x)!=x. For strings, names, integers, functions, currieds, and null objects, copy(x)==x.
destroy dest p0 If p0 a mutable type (object, array, struct), turns p0 into an entity of type NULL. If p0 is a system object, informs the system that p0 will no longer be reached, and turns it into an entity of type NULL. typechecks for names, strings, int, function, curried, and static-struct.
type dest p0: Returns the type of p0 as a name: $int, $string, $name, $array, $object, $function, $curried, $null--similarly for other names defined in the header.
not dest p0: If p0 is 0, writes 1 to dest; otherwise, writes 0.
randomseed dest: Stores in dest a random number in the full signed 31-bit range suitable for seeding a pseudorandom generator [[ e.g. it could be the current time in milliseconds; successive values would not be particularly random, but a single value would be ]]

Comparison instructions

Comparison instructions evaluate a condition and write the result to a register. [[These instructions generate C-style booleans, but most of the time the branch instructions should be used.]]

cmpz dest p0: dest = (p0 == 0)
cmpnz dest p0: dest = (p0 != 0)
cmpe dest p0 p1: dest = (p0 == p1)
cmpne dest p0 p1: dest = (p0 != p1)
cmpg dest p0 p1: dest = (p0 > p1)
cmpge dest p0 p1: dest = (p0 >= p1)
cmpgu dest p0 p1: dest = (p0 unsigned> p1)
cmpgeu dest p0 p1: dest = (p0 unsigned>= p1)

Branch instructions

During normal execution, bytecode instructions are executed sequentially; after executing instruction an instruction at location N which was encoded in M words, then next instruction executed is at word offset N+M. Branch instructions change this; if a branch is taken, then the operand data (which is always interpreted as a literal integer) is the offset to the new address to continue execution; if the value is X, then the new exectution address (expressed as a word offset) is N+M+X; in other words, the branch distance is X*2 bytes.

Branch instructions overload by overloading to the corresponding comparison instruction, and using the result of that to determine the branch.

jump offset_i: Always branches to offset location.
jumpz/jumpnz p0 offset_i: Branches if p0 is 0 or is not 0, respectively. Does not typecheck.
jumpe p0 p1 offset_i: If p0 equals p1, branches to offset. Note that this is object equality; two strings are equal only if they are the same string, not if they have the same contents; similarly for other data structures. Integers are the same if they have the same value.
jumpne p0 p1 offset_i: Branches if p0 does not equal p1.
jumpg p0 p1 offset_i: Branches if p0 is greater than p1 if both are integers; otherwise typecheck
jumpge p0 p1 offset_i: Branches if p0 is greater than or equal to p1; both must be integers.
jumpgu/jumpgeu p0 p1 offset_i: Branches based on comparison of p0 to p1 with both numbers considered unsigned [[implementors: note that if using the idealized tagged 32-bit representation, the 31-bit numbers are sign extended; however, a direct-unsigned comparison is still valid without masking off the highest bit]].
cjump p0 numparm_i off0 off1 ... offn-1: off0..offn-1 are signed 16-bit branch offsets. If p0 is not an integer, typecheck. If p0 <= 0, jump to off0. If p0=1, jump to off1. If p0 >= numparm-1, jump to offn-1.

Integer Math

All of these operations support overloading.

add dest p0 p1: Computes the sum of p0 and p1 and stores it to dest. p0 and p1 are added using 32-bit addition, and truncated to 31-bits; the same instruction serves for both signed and unsigned addition.
sub dest p0 p1: Computes p0 minus p1 and stores it in dest.
mul dest p0 p1: Computes p0 multiplied by p1 using a signed 32*32->64-bit multiply, truncated to 31 bits.
div dest p0 p1: Computes p0 divided by p1. Division by 0 generates a domaincheck error. Results are rounded towards.
mod dest p0 p1: Computes the remainder of dividing p0 by p1. The remainder is computed consistently with the divide, so that ((X div Y) mul Y + (X mod Y)) equals X.
modu dest p0 p1: Computes the remainder of dividing p0 by p1 such that the remainder is always non-negative; 0 <= p0 modu p1 < abs(p1).
neg dest p0: Computes the negative of p0 and stores it in dest. Negation is identical to subtraction from 0; therefore (neg 0x40000000) equals 0x400000000.
abs dest p0: Computes the absolute value of p0; equivalent to negating p0 if p0 is negative; therefore (abs 0x40000000) equals 0x40000000.

Integer Bit Operations

All of these operations support overloading.

lshift dest p0 p1: Computes p0 bitwise left-shifted by p1 bits; the shift amount is evaluated by taking the bottom five bits of p1, computing a 32-bit shift, and truncating the result to 31 bits. [[This means the natural effect is obtained for 0..31 bit shifts, but the shift can be implemented without any conditional testing.]]
rshifts dest p0 p1: Computes p0 bitwise right-shifted by p1 bits, with the bit length as described above. The result is sign-extended; that is, the highest bit of p0 is replicated into the top p1 bits.
rshiftu dest p0 p1: Computes p0 bitwise right-shifted by p1 bits as above, but without sign-extension--the result is filled with 0s. [[Note that implementation of this is subtle, since the natural 32-bit representation includes one already sign-extended bit; clearing the high bit before shifting works for every case except a 0-length shift, where it needs to be reconstructed.]]
bitand dest p0 p1: Computes the bitwise-and of p0 and p1: each bit of the destination is 1 if both of the corresponding bits of p0 and p1 were 1; otherwise the result bit is 0. This computation uses a 2's complement representation of p0 and p1 and dest.
bitor/bitxor dest p0 p1: Computes the bitwise-or and bitwise-xor respectively of p0 and p1. Bitwise or produces a 1 bit if either corresponding bit of p0 or p1 is 1; Bitwise xor produces 1 bit if the corresponding bits of p0 and p1 are different.
bitnot dest p0: Inverts all of the bits of p0, stores into dest.

String operations

All opcodes support overloading unless otherwise noted.

output p0 (not overloaded): If p0 is a string, outputs p0 to a display; otherwise does nothing.
input dest (not overloaded): Allows the user to type in text, and stores it in dest as a string.
strlen dest p0: If p0 is not a string, typecheck. Stores the number of characters in the string in dest.
strextract dest p0 p1 p2: If p0 is not a string or if p1 or p2 are not integers, typecheck. If p1 < 0 or p1 > strlen(p0), domaincheck. Otherwise, produces a new string which consists of the characters x where p1 <= x < p1+p2, in sequential order, and stores it in dest. If p1+p2 > strlen(dest), substiute p2 = strlen(dest)-p1.
strcat dest p0 p1: If p0 or p1 are not strings, typecheck. Produces a new string consisting of the characters of p0 followed by the characters of p1.
strget dest p0 p1: If p0 is not a string, typecheck. If p1 < 0 or p1 >= strlen(p0), domaincheck. Stores the p1'th character of p0 into dest. The character is interpreted as an unsigned 8-bit number.
strsubs dest p0 p1 p2: All three values must be strings. Searches for 'p1' in p0, and replaces all copies with p2. Searching occurs left-to-right; after a substitution, searching continues after the replaced string, so only the leftmost of overlapping patterns are substituted, and copies of the target string created by substitution are not themselves substituted. The result is stored in dest.
lowercase/uppercase dest p0: Produces a copy of p0 with all alphabetic characters converted to lowercase/uppercase respectively.
strcmp dest p0 p1: p0 and p1 must be strings. Returns -1 if p0 < p1, 0 if p0 == p1, and 1 if p0 > p1, where comparison is in "dictionary order".
strncmp dest p0 p1 p2: Compares first p2 characters of p0 with those of p1, as above. Identical to strcmp(strextract(p0,0,p2),strextract(p1,0,p2)).
strcasecmp dest p0 p1: Compares p0 and p1 like strcmp, but performs the comparison in a case-insensitive fashion.

String conversions

ntos dest p0: Converts name p0 to a string. An interpreter may return different strings each time; that is, the test (ntos(x) == ntos(x)) can vary between different interpreters.
ston dest p0: Converts string p0 to a name with the identical byte pattern. Any two strings with a given sequence of bytes will generate the same name; ston("foo")==ston(strcat("fo","o")), but it's not required that "foo"==strcat("fo","o"). It is required that ston(ntos(n))==n, but not ntos(ston(s))==s
lowerston dest p0: Converts string to name, converting the characters of the string to lowercase first. lowerston(x) == ston(lowercase(x)); [[Is faster because it avoids creating any temporary data items--significantly faster if the name already exists.]]
stoi dest p0: Converts a string to an integer . If the string is not a number, produces 0. If the number cannot be represented with a signed 31-bit range, produces unpredictable results. Note that the first character of the string must be one of the characters "-0123456789", or else 0 will be produced.
itos dest p0: Converts an integer to a string representation of the base-10 encoding of that integer.
stoa dest p0: Creates an array of length strlen(p0), with each of the elements one character from p0; the values are treated as 8-bit unsigned values. arrayget(stoa(str),i) == strget(str,i).
atos dest p0: If p0 is not an array, or any of its elements are not integers, typecheck. A new string is created, arraylen characters long. The bottom 8 bits of each integer in the array are turned into a character.

Array Processing

Array operations may have to operate on many elements. The opcodes below indicate the performance that compilers (and language users) should expect the opcodes to have (possibly amortized over many such operations); interpreters should attempt to provide such performance; N indicates the number of elements in the array.

Note that arrays are generally modified in place, rather than returning a new array.

All allow opcode overloading except as noted.

arrlen dest p0 O(1): If p0 is an array, then stores the number of elements in p0 into dest. Otherwise, generates an overloaded operator message.
arrget dest p0 p1 O(1): If p0 is an array, then if p1 is not an integer, generates a typecheck error. If p1 < 0 or p1 >= arrlen(p0), generates a domaincheck error; otherwise, stores a copy of the p1'th element of p0 in dest. If p0 is not an array, typecheck.
arrset p0 p1 p2 O(1): If p0 is an array, then typecheck and domaincheck p1 as above. If no error, the p1'th element of p0 is overwritten with p2.
arrappend p0 p1 O(1): If p0 is an array, then grow it by one element, and place p1 in the final slot of the array just created. If p0 is not an array, typecheck.
arrshrink p0 O(1): If p0 is an array, deletes the final element of the array and shrinks it. If p0 is already empty, generate domaincheck error. If p0 is not an array, typecheck.
arrresize p0 p1 O(N): If p0 is an array and p1 is a positive integer, resize the array to length p1, padding with 0s. If p1 is negative, generate domaincheck error. Otherwise, typecheck.
arrinsert p0 p1 p2 O(N): Typecheck if p0 is not an array or p1 is not an integer. Domaincheck if p1 < 0 or p1 > arraylength(p0). [[Note this is different than the standard array domaincheck.]] Otherwise, grow the array p0 by one element, and shift all the elements starting at the p1'th element up to a one-higher location; store p2 into the p1'th slot.
arrdelete p0 p1 O(N): Typecheck if p0 is not an array or p1 is not an integer. Domaincheck if p1 < 0 or p1 >= arraylength(p0). Delete the p1'th element of the array, shifting the elements starting at p1+1 down by one element.
arrsearch dest p0 p1 p2 O(N): Typecheck if p0 is not an array or p1 is not an integer. Domaincheck if p1 < 0. Store in dest the integer location of the lowest-numbered slot >= p1 which contains element p2, or -1 if p2 is not in any such spot in the array.
arrfastdel p0 p1 O(N): Typecheck if p0 is not an array. Let loc be the lowest-numbered slot containing element p1, or -1 if no such slot. Then if loc >= 0, the following occurs. The last element in the array is copied into slot loc, and then the array is shrunk by one. [[Implementation note: a tight sentinel-based linear search, followed by a swap and a shrink.]]
arrjoin dest p0 p1 O(M+N): Typecheck if either p0 or p1 is not an array. Construct a new array whose length is the sum of the length of p0 and p1, whose contents consist of the elements of p0 followed by the elements of p1.
arrinsarr p0 p1 p2 O(M+N): Typecheck if either p0 or p2 is not an array, or p1 is not an integer. Domaincheck if p1 < 0 or p1 > arraylength(p0). Insert the elements of p2 starting at p1, in order. If p1=0, roughly equivalent to arrayjoin(p2,p0); if p1=arraylength(p0), roughly equivalent to arrayjoin(p0,p2); if p2 is a single element array, identical to arrayinsert(p0,p1,arrayget(p2,0)).
newarray p0 dest (no overloading): Create a new array of length p0, filled with 0s. Typecheck if p0 is not an integer. Store in dest.
array num p0 p1 p2 ... p(num-1) dest (no overloading): Create an array with num elements p0,p1, etc.

Object Data Access

No opcode overloading.

testprop dest p0 p1: p0 is an object, p1 is any data type. if p1 is a key in the object, dest is set to 1; otherwise to 0
getprop dest p0 p1: p0 is an object, p1 is any data type. If p1 is a key in the object, dest is set to the corresponding value; otherwise dest is set to 0. [[Use testprop to determine existence.]]
setprop p0 p1 p2: p0 is an object. Set the <key,value> pair <p1,p2> to p0; If p1 is not currently a key of p0, domaincheck.
addprop p0 p1 p2: p0 is an object. Set the <key,value> pair <p1,p2> to p0; if p1 is currently a key of p0, the old value is replaced.
findprop dest p0 p1: p0 is an object, p1 is any data type. If p1 is a key in the object, dest is set to the corresponding value. Otherwise, the class tree starting with getclass(p0) is searched depth-first [[see discussion of overriding search order]] for a corresponding key; if found, the value is put in dest. Otherwise 0 is stored in dest. [[There is no way to determine existence.]]
delprop p0 p1: p0 is an object, p1 is any data type. If p1 is a key in p0, that <key,value> pair is removed from the object.
getpropself dest p0: Identical to "dest = getprop(self(), p0)"; 'self' is a pseudo-variable not accessible in the normal stack frame.
findpropself dest p0: Identical to "dest = findprop(self(), p0)".
setpropself p0 p1: Identical to "setprop(self(), p0, p1)".
getglobalprop dest p0: Identical to "dest = getprop(global(), p0)".
setglobalprop p0 p1: Identical to "setprop(global(), p0, p1)".
objkeys dest p0: Creates an array of all of the keys found in p0. The order will vary with implementation, and the performance is unknown.
objvals dest p0: Creates an array of all of the values found in p0. The order will vary with implementation, and the performance is unknown.
otoa dest p0: Creates an array with all of the <key,value> pairs found in p0, stored in alternating order--all of the keys are in even-numbered slots, all of the values in odd-numbered slots. The overall ordering of keys is implementation dependent.

[[Interpreters should optimize (and compilers should expect to be optimized) getpropself, setpropself, findpropself, getglobalprop when the key is a name, or, ideally, no matter the type of key. In particular, getglobalprop should be as-fast-as-possible O(1) for names, so it can be used as a global namespace. Also, findpropself (and findprop) should presumably use the same system for fast lookup as message->method determination, e.g. a lookup cache, since they have identical semantics.]]

Object manipulation

create dest p0: Creates an empty object with its class set to p0. [[It's up to compilers to synthesize constructors. Also, since this creates an empty property list, a compiler which wants to allow a class to define a set of always-defined instance variables should create an object of that class with the instance variables pre-defined, and copy that object instead of create'ing the class. This can be done by hanging a compiler-private method off the class that caries the relevent object.]] p0 must be an object, an array of objects, or 0, or else a typecheck error will be generated.
plist dest numparm p1 p2 ... pn: Creates an object with instance variables <p1,p2>, <p3,p4>, etc. The object's class is 0.
getclass dest p0: If p0 is an object, reads the class field of the object and stores it in dest. Otherwise, stores p0's pseudoclass in 'getclass'. See discussion of pseudoclasses.
subclass dest p0 p1: p1 must be an object. If p0 is not an object, checks if getclass(p0)==p1, and writes the result in dest (1 if true, 0 if false). If p0 is an object, checks if p1 is an ancestor of p0.
setclass p0 p1: p0 must be an object; p1 must be an object, an array of objects, or 0. None of the objects referenced by p1 can be pseudoclasses. Changes the class field of p0 to p1. [[Note: can be arbitrarily slow.]]

Special operations

become p0 p1: All references to p0 become references to p1, and vice versa. p0 and p1 must be the same datatype, and must be either objects or arrays [[that is, one of the resizeable by-reference datatypes]]. If either is not of those types, typecheck. If the two types are different, domaincheck.
trap0 p0
trap1 p0
trap2 dest p0
trap3 dest p0
trap4 p0 p1
trap5 p0 p1
trap6 dest p0 p1
trap7 dest p0 p1: All trap opcodes produce errors if executed; however, they all support opcode overloading. Thus, they provide a sort of concise global function calls which can be used as a form of compression. [[ For example, an adventure library might use trap0 to encode 'print', and have the overloaded operator call 'output', but allowing client programs to intercept print or do 'stream redirection'. An overloaded trap routine can use 'caller' to determine the value of 'self' at the time of the trap to allow compressed messages to self. ]]

Function calls / Message passing

The details of method lookup are discussed in a separate section. For now, note that there are three pieces of explicitly visible execution context. The simplest two are 'self' and 'class' (accessible via identically named opcodes). 'self' is the object on whose behalf the current code is running; property lookups such as 'getpropself' use this to implement instance variables. 'class' is the class whose method the currently executing function comes from; for example, if class foo is a subclass of bar is a subclass of baz, and I call joe.blah(), if blah is defined in foo, class will be foo; if blah is not defined in foo, but is defined in bar, class will be bar. This context provides the information needed for 'super' calls.

When a message is passed, in the new execution context, 'self' will be the target of the message send, and 'class' will be the class which provided the method. When a function is called, instead of a method, the current 'self' and 'class' are "inherited" from the previous context (that is, they are dynamically scoped). Compilers can disallow "loose" functions they compile from referencing 'self' and 'class' and using functions like 'getpropself' or 'msgself'; however, it is always possible to turn a method into a "loose" function, e.g. by using 'findprop' instead of a message send, so it would be hard for a language built on Slux to not expose this behavior.

The frame variables that are created for each function call are: 0=self, 1=class, 2,3,4...n+1 = n function arguments, n+2...n+2+m=local variables.

In addition to the two pseudovariables self and class, Slux provides for arbitrary compiler-defined dynamically-scoped variables, using an explicit "context stack", controlled by the opcodes 'pushcontext', 'popcontext', and 'getcontext'. These context values are automatically propogated with full dynamic scope. [[ Although the intent is for the scope to match the behavior of the function call stack, the specification does not force this behavior; as a result it is possible for the context scope to be even stranger, although if it is not made to mimic the call stack it will almost certainly over- or underflow. Contexts as defined in Slux do not appear to correspond to anything in mainstream languages, but are a simple generalization of the pseudovariable 'player' which appears in some mud languages. The design of contexts also echoes two different Postscript mechanisms: 'save'/'restore' which has the same stack semantics, but is normally used on a much coarser scale, and the dictionary stack, which is semantically different but is used to achieve similar effects; the dictionary stack would be roughly equivalent to pushcontext pushing an object of bindings onto the context stack. ]]

All message sends are overloaded; if the target is not an object, the equivalent message is sent to the pseudoclass of the target, with the target as an additional leading parameter.

If a function/message invocation does not match the function's signature (i.e. they don't both name the same number of parameters), a domaincheck error is generated. (Actually, for a message, a #doesNotUnderstand is generated.)

If a "method" binding for a message is not of type function, or if the opcode func is called on something not of type function, opcode overloading of func (in both cases) is invoked. [[ For example, Inform-like semantics would have the pseudoclass for strings define func as printing the string, a newline, and returning true. ]]

getcontext dest p1

Store the most recent binding of p1 into dest. p1 must be a name.

pushcontext numbind p1 v1 p2 v2 p3 v3 ... pn vn

Defines 0 or more bindings of names to values; if a given name is already bound, the value is temporarily replaced by the new value, until the corresponding popcontext is seen. p1 ... pn must be names.

popcontext

Undoes the effects of the most recent pushcontext which hasn't already been popcontext'd.

return p0

return0

return1

Returns from the current function/method, writing p0/0/1 into the caller's 'dest' variable location.

msg dest target mess numparm p1 p2 ... pn

Sends a message named 'mess' (which must be a name) to 'target'. The return value is written to 'dest'. msg0 sends a message with 0 parameters; msg1 sends a message with 1 parameter, etc. Note that this is what is meant elsewhere in this document by the notation

 
dest = target.mess(p1,p2, ..., pn)

perform dest target mess paramlist

Like msg, but the parameters are extracted from the paramlist, which must be an array. [[Compiler writers: if you expose perform in your language, you should be careful to make sure your argument passing order corresponds naturally.]]

msgself dest mess numparm p1 p2 ... pn

Like msg, but target=self. Note that 'self' is always an object, so this cannot overload to a pseudoclass.

msgglobal dest mess numparm p1 p2 ... pn

Like msg, but target=global. Note that 'global' is always an object, so this cannot overload to a pseudoclass.

super dest mess numparm p1 p2 ... pn

Traverses the class tree above 'class' and executes the first method found. If no method is found, 0 is written to dest.

dsuper dest target mess numparm p1 p2 ... pn

Like 'super', but specifies the class's method to use as 'target'. If 'target' does not define 'mess', traverses the tree above 'target'. Note that 'target' may not be one of self's ancestors; this is still valid.

func dest func numparm p1 p2 ... pn

'func' must be a function or a closure. Calls 'func' with the specified parameters and writes the return value to 'dest'. 'self' and 'class' are unchanged.

funcperform dest func parmlist

calls the function 'func', but with the parameters derived from parmlist, which must be an array.

msgselffunc dest messfunc numparm p1 p2 ... pn

If 'messfunc' is a function or closure, calls it as in 'func'. If 'messfunc' is a name, does an equivalent 'msgself'. [[This allows a language syntax that is polymorphic between function calls and self-message sends.]]

msgfunc dest msgfunc numparm p1 p2 ... pn

If msgfunc is a function or closure, behaves like 'func'; if msgfunc is a name, does a message send with p1 as the target and one fewer parameters. [[This allows polymorphism given the Sludge language syntax where a message pass is written as message(obj,p1,p2).]]

curry dest func numparm p1 p2 ... pn

Creates a curried function from a function or curried function. If a function, the first two parameters, p1 & p2, are bound to 'self' and 'class' in the curried function (as if they were two hidden parameters to every function). The remaining operands are bound to the firt numparm-2 arguments of the function. If already a curried function, the next N parameters after those already curried are bound.

Caching control

cacheglobal: Certain aspects of the VM rely on specific values found in the global object. Rather than monitor all object writes to check if they are to relevent global elements, the VM is defined as potentially only reacting to those changes if cacheglobal is called (and initially computed after loading the program file [and delta file]--that is, it is as if the first instruction executed before start() or restore() is cacheglobal. The relevent keys stored on the global that must be cached on cacheglobal are: #int, #name, #string, #array, #function, #closure, #objectname. [[#pageprompt does not have cache semantics.]] Implicitly performs a cachepseudo immediately afterwards.
cachepseudo: The pseudoclasses provide the definitions of overloaded operators. The VM may cache the values of all of the overloaded operators from the pseudoclasses. If the methods for the overloaded operators are changed, operators may still use the old definitions (although explicit message passes to the overloaded operators will see the new ones). After a call to cachepseudo, the overloaded operators will be in sync again.; Both of these operators are defined so that implementation-dependent results occur if changes are made and cachefoo is not called.

System operations

Many system operations cause the current function call stack to be cancelled and reenter the global object through a particular entry point. All of these functions allow a "passaround" parameter which is passed into the global entry point. However, in cases where the VM is being restored (restore and restoreundo), this passaround parameter may be copied on the way; this is notated as 'safecopy()'. The result of a safecopy() on each type is:

integer: preserved
name: preserved
string: a copy is made
array: a copy is made, and the elements are safecopy()'d. cyclic array references produce a 0.
object: a copy is made; all of its elements (keys and values) are safecopy()'d; however, any of its elements which are themselves object references are 0'd.
function: produces 0
closure: produces 0
null: produces 0

Note that shared references may be lost under this scenario. VMs can also not copy, or make copies but keep shared references. [[Compilers and program authors should expect performance of safecopy() to be very slow and memory consumptive, and use passarounds as little as necessary.]]

exit: Unconditionally terminates execution of the program. Execution never proceeds after 'exit'.
start: Restarts the program as if it had just been loaded.
save p0 p1: p0 should be 0 or a string. if p0 is 0, the interpreter may prompt the player for a location to put a delta file. if p0 is a string, the interpreter must either use that string as the name of the file, or prompt the player, at its discretion. If the user cancels the save, execution will proceed after this instruction. If the save is attempted, then execution will be stopped and revectored with the call global.save(0, p1) if it succeeds, or global.save(error_string, p1) if it fails. If the interpreter displays sufficient error information to the user, it should simply pass a 1 for error_string; the global save() routine should print the error_string if it's a string, on the assumption that the interpreter did not give the user any feedback. See the gestalt "saverestoreui".
restore p0 p1: p0 should be 0 or a string. if p0 is 0, the interpreter will prompt the player for a delta file to read. If it is a string, the interpreter may use that string as the name of the file, if and only if it allows saving using similar strings. If the user cancels the restore, or if the delta file does not exist or is obviously corrupt, execution will proceed after this instruction. If the restore is succesful, then execution will be stopped and revectored with the call global.restored(safecopy(p1)). If the restore is not succesful, then one of two things will happen: if the interpreter can fully recover the VM state from the execution of the restore, execution will proceed after the restore statement exactly as if the restore was cancelled or trivially failed; if the interpreter cannot fully recover the VM state, then execution of the program terminates.
saveundo p0: Records an 'undo' location, then revectors with global.saveundo(p0).
restoreundo p0 p1: p0 is an integer. Restores the VM state to the p0'th previous 'saveundo', where 0 means the most recent, 1 means the second most recent, etc. If the undo is successful, revectors with global.undo(safecopy(p1)). If the undo is not successful, continues execution.
error p0: Generates an error like normal, revectors with global.error(p0, stacktrace, grouping), unless the interpreter does its own error display, in which case those will be all 0.

OS operations

time dest: Returns the current time in some unknown format. The output may be any type, depending on the interpreter.
ctime dest p0: If p0 is 0, writes the current date and time to a string and stores it in dest. Otherwise, if p0 is of the type returned by time, then computes the date and time represented by p0 and writes that to a string and stores it in dest.
deltatime dest p0 p1 p2: Given p0 and p1 both in the format returned by time, and p2 an integer, computes the difference between time p0 and p1 in units of p2 seconds. In other words, convert p0 and p1 to seconds since January 1, 1970, with infinite integer precision. Compute (p0-p1)/p2. If the result cannot be represented in 31 bits, store the maximum possible value of the correct sign (0x3FFFFFFF if positive, 0x40000000 if negative).; Example: Let x = time() at the beginning of play. Let y = time() at when the player types a command to query how long she's been playing. Compute z = deltatime(y,x,3600); z is the number of hours. Use deltatime(y,x,60) for the number of minutes, deltatime(y,x,1) for the number of seconds, deltatime(y,x,8640000) for the number of days. [[Rather than load down VMs with the code to break it down into days/hours/minutes/seconds, then make libraries turn that back into the desired units, the desired units can be directly computed. It's also possible to compute days/hours/minutes/seconds by using remainders.]]
writefile dest p0 p1: p0 is a string or 0; p1 is a string. Creates a file named 'p0' and writes p1 to it; that is, the file will be of length strlen(p1). If writefile fails, dest will be written with 0. Otherwise, dest will contain a string which can be passed to 'readfile' to reopen the same file. Writefile can fail if there is insufficient diskspace or if there are invalid characters in the filename. Filenames containing only alphanumeric characters and 12 characters or less long should never be invalid names.
readfile dest p0: p0 is a string or 0. Reads a file named 'p0' if p0 is a string, or prompts the player if p0 is 0. If the read succeeds, returns the entire file a string in dest. Otherwise, writes 0 to dest.

[[Writefile should be constrained by interpreters to either never be silent (i.e. always request approval from the player), or else be restricted to a single directory "sandbox", e.g. a fixed directory associated with interpreter, so as to avoid security hazards.]]

VM meta-operations

gestalt dest p1 p2

Performs a generic 'gestalt' test, which allows extended 'opcodes' without specific compiler support. p1 is the gestalt 'selector', which must be of type name. If a VM does not recognize a name, it should output 0. The value of p2 varies depending on the selector. If p2 is not correctly formatted for the given selector, the VM should generate a domaincheck error. If p1 is not a name, it should generate a typecheck.

Currently defined selectors:

p1=$sluxversion

If p2 is 0, returns the highest Slux specification version number supported by the interpreter, formatted as a 4-character string. If p2 >= 1, returns the highest Slux specification version number assuming p2 is a product number. If that product version number is not supported, returns 0.

p1=$interpname

p2 must be 0. Returns a string with the name of the interpreter, suitable for printing in a sentence.

p1=$interpversion

p2 must be 0. Returns a string with the version number of the interpreter, suitable for appending to the interpreter name. E.g.

print gestalt($interpname,0)," version ",gestalt($interpversion,0)

p1=$undo

p2 must be 0. Returns 0 if the interpreter does not support saveundo/restoreundo, that is, if opcode undo will always fail.

p1=$saverestoreui

p2 must be 0. Returns 0 if there is no built-in ui for doing save/restore, or 1 if there is. If this returns 0, then attempts to call save/restore with 0 for the name will always fail.

p1=$cacheglobal

p2 must be 0. Produces an array of all the names that the cachekernel opcode will cache from the kernel.

p1=$compile

p2 must be 0. If the compile opcode is not a noop, returns a human-readable string naming the language which the compile opcode compiles; otherwise returns 0.

p1=$error

p2 must be 0. returns 0 if the interpreter has no built-in error display handling, 1 if the interpreter displays errors automatically.

p1=$writefile

p2 must be 0. returns 0 if writefile is not allowed, 1 if writefile always prompts the player for the file location, and 2 if direct writefile is allowed.

Debugging operations

All interpreters must implement these operations, but they can all be implemented to do nothing at all; those that have a 'dest' parameter should write a 0 to it. A "debugging" interpreter should provide as many of these functions as possible, which will facilitate integrated debugging and interactive development.

compile dest p0 p1: p0 is a string. p0 will be compiled; the result of compilation will be put in dest. That result will be one of: a string encoding a compile error, an object if the input was an object definition, or a function if the input was a function definition. Note that the language defined by 'compile' is implementation-dependent. p1 can be 0, or it can be an object which contains a mapping of names to classes, which is used for the compiler to hook up a class definition to the object hierarchy.
identifier dest p0: Produces an integer identifier for element p0, which should be as unique as possible (given the limit of only 31-bit integers) and as persistent as possible.

Message passing

Certain function-call-style operations are defined as "message passing". Such an operation takes an object and a name, computes an appropriate function to call, and calls it. Computing an appropriate function to call involves looking up the class hierarchy ancestors of the objects for a property with that name.

Method lookup

The exact definition of this operation is as follows:

Search for a property of that name on the object.
Search for a property named #classancestors on the object. If it is defined and is an array, iterate through its elements in order, checking for a property of the message name.
If #classancestors is not defined, search for the property starting from the parent class; if not found there, recurse upwards. If the parent class of a given object is an array, not a single object, then iteratively recurse up each element of the array; that is, perform a depth first search. The first implementation found is used.

Some clarifications:

#classancestors is only checked once; it is not part of the recursive process described in the last section above
#classancestors provides a way to override the lookup order in an arbitrary fashion. There is no requirement that the elements of #classancestors actually be part of the class ancestry of the class which defines it.
If #classancestors does not contain an array, or if the array contains anything other than objects, typecheck.

Changing the value of #classancestors corresponding to an object which has a currently-pending method defined on it (that is, there is an unfinished context in which 'self' contains an object whose parent class is undergoing the #classancestors change) has implementation-dependent results, and so should be avoided. [[This allows implementations to cache data associated with #classancestors, such as the offset within the array for a given stack frame.]]

Super lookup

The message passing mechanism allows classes to inherit partial implementations from one another by means of 'super' calls. The super call invokes a superparent. The search is conducted above the class which defined the current method (which is found in the 'class' pseudovariable).

The search method varies significantly depending on whether the self object (or its parent class) defines #classancestors or not:

If neither self nor the parent class of self define #classancestors, then a normal recursive search up from 'class' is conducted. [[Note that this is very different from "find the next method that the original traversal would have found", since the latter will consider branches of the class hierarchy which are not above the current class.]]
If self or the parent class of self does define #classancestors, then find the current class in the #classancestors array (treating the parent class of self as the first element of the array). Search iteratively through the #classancestors array beyond this point. If the current class is not in the #classancestors array, the message send fails.

Note that there is no requirement that the message sent be the same as the current message, and the semantics in this situation are well-defined, but I don't see the semantics as particularly meaningful, and I recommend avoiding relying on these semantics since they might change.

Directed super

A message can be directed to a particular superclass using the directed super (dsuper) opcodes. These are defined as simply looking up the message on the appropriate object, and doing no traversal. It is the job of the compiler to bind dsupers to the appropriate object up front. (Note, though, that 'super' calls from dsuper calls do use the normal traversal rules.)

[[Directed supers allow the compiler to make super calls go to exactly the right place, but they make it impossible for the super call in a given class to behave differently in different circumstances, which is what the #classancestors with normal supers allows.]]

It is the intention of this specification that between directed supers and #classancestors, it is possible for languages which create static, compile-time-known class hierarchies to experiment with different rules for dealing with inheritence when supporting multiple-inheritence. [[If this is not sufficient, it will change.]]

Method resolution

When a method lookup finds a property value corresponding to the name of the message, the following sequence of events occurs:

If the value is an integer, name, string, array, object, or null, then opcode overloading on func is checked for the appropriate pseudoclass; if not defined, the value itself is immediately "returned" as the result value of the message pass.
If the value is a function or a closure, then compare the function signature with the number of parameters in the message pass. If the number of parameters match, build a frame variable frame with those parameters and execute the function or closure, producing the value from a 'return' opcode as the result of the message. (A function with a non-zero number of closure elements is an unbound closure, and cannot be executed; this will produce a closurecheck error.) If the number of parameters do not match, the message send fails--see below.

findprop

The findprop opcodes use the exact same lookup mechanism, but always simply returns the value, including functions and closures. However, there is no way to do a 'super' findprop. [[One can actually use a 'super' function call if func hasn't been overloaded, since super simply returns the values if the result is not a function.]]

Failed message send

If a message send fails, a new message is sent to the object, #doesNotUnderstand. This message is sent using the same full search routine, but always to the object from scratch (e.g. a failed super call's #doesNotUnderstand starts over from the bottom of the tree).

The #doesNotUnderstand message has two parameters; the first is the name of the message that failed, and the second is an array containing the parameters of the original message.

[[It is possible for an object which defines a message to receive a #doesNotUnderstand for that message, for instance if it attempts to do a 'super' dispatch where the method is not visible.]]

Function execution

If a function has a non-zero closure data size, then the function can not be executed, only instantiated by the closure opcode.

If a function has a closure data size of zero, then the function can be executed by a function call or message send with the correct number of parameters. If the parameters match, then the function is executed by creating a new set of "frame variables" for it; the number of frame variables corresponds to the number defined in the function frame definition. The passed-in parameters are copied into the first N slots of the frame variables, and the others are filled with zeroes.

If a closure is executed with the appropriate number of parameters (according to the function defined in the closure), then a new set of "frame variables" is created, based on the size specified by the function stored in the closure. The N closure variables are copied into the first N slots of the frame variables, followed by the M parameters to the function call; the remainder are filled with zeroes.

Version History

1.0.0.14: Reversed the numbering between names and other types. This means that (a) the out-of-band value is 0x80000000, which is nicer, and (b) that all names will come first in the file, which is theoretically good since it means you can process them fully in the first pass, and process them from the fast hint array earlier.
1.0.0.13: Removed fullcurry opcode; changed curry opcode to allow specifying explicit self & class. Added 'null' datatype to known vmclass names. Added missing description of destroy opcode.
1.0.0.12: Removed displayerror opcode which should be handled with terp UI. Altered the #classancestors spec to be simpler, although it is probably a little trickier to use this way (but the same trickiness was always necessary if an object's parent was an array of classes). Altered arrsearch to allow searching from the middle.
1.0.0.11: Added num-db-entries to header. Added arrshrink opcode and renumbered other opcodes as a result. Removed redundant size in curried-function layout. Clarified file layout transition from vm classes to database.
1.0.0.10: Introduced VM-classes, basically user-configurable data types. Back to 8-bit bytecode, yay! Also added comparison opcodes that correspond to the branch opcodes, which does two things: shrinks bytecode for operations return x < y (and speeds up the more rare z = x < y), and more imporantly provides something which can be overloaded so branches can be overloaded. Removed redundant halt opcode. Added forceint and typeassert opcode to allow optimizing interpreters to assume types without replicating code. Rewrote the introduction to provide information instead of advocating, and to distinguish the IF components from the non-IF components. Renamed 'kernel' to be 'global' so it's more recognizable for targetting other languages. All function dispatches can do tail recursion by specifying a dest of '-2', which means 'return this value'.
1.0.0.9

Offset (in bytes)	Size (in bytes)	Field
0	16	Signature: "sLUx pRogrAm\012\032\015\377"
16	16	Header:
16	4	checksum
20	4	version number
24	4	global-object element #
28	4	num-db-entries
32	32	VM configuration:
32	4	number of VM-classes
36	4	fast hint array element #
40	24	reserved, set to 0 on write, ignore on read
64		program data

Lookahead value	N-byte read	Value to subtract
0x00-0x7F	1	0x00
0x80-0xBF	2	0x8000
0xC0-0xDF	3	0xC00000
0xE0-0xEE	4	0xE0000000
0xFF	discard lookahead, then read 4	0x00000000