SLUX
A High-Level Polymorphic Virtual Machine
Version 1.0.0.14
Sep 15 2002
Sean Barrett - sean at nothings dot org
Introduction
This document defines Slux, a byte-coded virtual machine
designed primarily as a target machine for adventure
game (Interactive Fiction (IF)) languages and secondarily as a
generally targetable dynamically-typed virtual machine.
Distinguishing properties of Slux are:
General
- garbage collection
- dynamic typing
- currying support (from which general closures can be synthesized)
- function-scoped, object-scoped, global-scoped, and dynamically-scoped variables
- object model
- objects inherit from objects (no classes)
- multiple inheritence
- run-time mutable inheritence hierarchy
- new instance variables can be added to objects on the fly
- objects also serve as general hashes
- built-in types that are not objects
- 31-bit integers
- immutable strings
- immutable names (hashed strings)
- immutable functions
- mutable, resizeable heterogenous arrays
- mutable fixed-size structs
- pseudoclasses for non-object types
- allow defining methods for non-object types
- allow defining behaviors for opcodes
- allow efficient interpretaion of languages in which
all data appears to be an object
- definition of new "built-in" types
- pseudoclasses define all standard behaviors with interpreted code
- interpreters can acclerate recognized types with native code
- supports linking multiple languages together
- multiple interpretation options
- a "naive" interpreter will be small and moderately efficient
- sufficient information is availble for an interpreter to
do an easy 1-to-1 translation of the bytecode to more efficient forms
- just-in-time or full-native-code recompilation is possible
- compiler-emitted type assertions can simplify the task of
omitting typechecking
IF-specific
- memory save/load and save/undo
- object model favors "all objects are unique and different"
instead of "lots of mostly identical objects"
About this document
This document defines the file format for an executable
Slux program and the manner in which it executes, as well
as an optional file format for Slux "save games"
(snapshots of the executable state). [[Interpreters
need not support the save game format to be compliant, but
it is recommended, since interoperable save games can
be convenient for players.]]
Care has been taken to divorce the "on disk" storage format
of Slux from its in-memory representation; multiple
implementations of many features of Slux are possible.
[[Comments set off like this are not normative (not part of
the specification); they are generally verbose rationale
descriptions.]]
Where the term program appears, it normally means "the
program as it appears in a file"; since interpreters can
implement the virtual machine in any way desired, it doesn't
make any sense to talk about how the program is stored when
being executed.
The notation 0x100 identifies hexadecimal
numbers (i.e. decimal 256); numbers without a prefix are decimal.
The notation $foo refers to an entity of type Name associated
with the three-character string "foo".
Data Types
Data types are really only defined by the operations on them, but for
the sake of exposition this section defines the data types in very general
terms.
- Element
- Not a data type, but a generic term for an entity which can be
any of the following data types.
- Integers
- Numbers on which arithmetic can be performed. They range
from -(2^30) to (2^30)-1 [[ i.e. signed 31-bit integers ]]
- String
- An ordered sequence of characters which can be printed. A number of
transformation operations are provided which produce new strings;
a given string object, once created, is immutable.
- Name
- An ordered sequence of characters whose element is identical
for identical sequences. (I.e. a name allows detecting equal
strings by comparing "pointers".)
- Array
- An ordered collection of elements (heterogenous) which can be rapidly
randomly accessed by index number. Arrays can also be modified
in-place (which means that other references to the array will see the
changes).
- Object
- An object is a dictionary--a collection of pairs of elements which
supports rapid random access on a 'key'. Additionally objects are the
targets of message sends, and also serve double-duty as classes that
form a hierarchy, discussed elsewhere; however, objects can also just
be used as a structured data type. Objects are modified in-place.
- Function
- A function is an executable data element--a piece of code to be
interpreted by the interpreter.
- Curried Function
- A curried function is a function which has had one or more of its
parameters bound in advance.
- Struct
- An ordered collection of elements which can only be accessed
with "hardcoded" offsets--run-time random indexing is not possible.
A Slux file can define multiple different struct types, each
of different sizes.
- Static-struct
- Like a struct, but a static-struct does not have a fixed size. It
is a single type which can be used to represent multiple different
struct types which have been statically typechecked.
- Null
- A null element is not the valid target of any operation; it is used
as a placeholder when another item is explicitly destroyed, so that
dangling references still point to something.
File Format
Program File
A program file consists of the following elements:
Offset (in bytes) | Size (in bytes) | Field
|
0 | 16 | Signature: "sLUx pRogrAm\012\032\015\377"
|
16 | 16 | Header:
|
16 | 4 | checksum
|
20 | 4 | version number
|
24 | 4 | global-object element #
|
28 | 4 | num-db-entries
|
32 | 32 | VM configuration:
|
32 | 4 | number of VM-classes
|
36 | 4 | fast hint array element #
|
40 | 24 | reserved, set to 0 on write, ignore on read
|
64 | | program data
|
Version number
The version number consists of four components, each of which
is one byte long, in the following order: the product version number,
the major version number, the minor version number, and the release number.
Compilers should encode what version number of this specification
they are generating code against. Interpreters can check the version number
and determine if that version is supported. To judge whether an interpreter
written against some version of this specification can execute a
given program file, consider the following rules:
- release number: Incremented for every release of the specification.
It is the only number which will increase for changes to the specification
which do not affect the virtual machine (e.g. for specification changes which
are clarifications or other corrections).
- minor version number: Incremented on changes to the specification
which are basically forwards and backwards compatible. (See discussion for
special case when major version number is 0.) Reset when the major version
number changes.
- major version number: Incremented when a change to the specification
is backwards compatible (a new interpreter can run old programs, but an
old interpreter will not be able to run new programs); e.g. introduction of
a new opcode or datatype. Reset when the product version number changes.
- product version number: Incremented when a change to the
specification prevents backwards compatibility. Release numbers are
reset on such increments.
If the major version number is 0, the product is in prerelease; during
this time, increments of minor version numbers may not be either forwards
or backwards compatible. If the minor version number is 0, the major
version is in prerelease; during this time, versions will only update
the release version number, but may not be forwards and backwards compatible.
Release versions of compilers should not output version numbers that
include 0's, because the feature sets of such versions are too unpredictable
for interpreters to accurately gauge their abilitiy to execute programs
written to such versions.
Global object element #
This value is a big-endian 32-bit integer which is interpreted
as an element reference (see section on element encoding). It is used
to determine
the entry points into the program; see the discussion of entry points
in the next section.
num-db-entries
This is the number of entities which will appear in the entity database
after the VM class names.
VM configuration section
The VM configuration section of the header is used to configure aspects
of the VM that must be defined at load time, before any code starts
running (and thus cannot be passed using normal opcodes), and perhaps
before data is loaded into the VM.
Fast hint array #
This field should either contain an element reference to an array
or should be 0. If it is a reference to an array, the VM should take
the elements in the array as a hint to accelerate operations on those
elements. If the VM cannot accelerate operations on all of the elements,
it should favor the lowest-indexed elements in the array. The
operations accelerated depends on the type of the element:
for a name, accelerate all property-lookups on that name;
for an object, accelerate all property-lookups on that object;
for an array, accelerate dynamic resizing of the array;
for a string, accelerate printing and converting to arrays and names;
for a function, accelerate execution.
[[ For example, a VM might hardcode four properties per object in fixed
locations, and thus accelerate four names. It might allow a certain number
of objects to use fast hash tables instead of slower-but-smaller data
structures; it might choose to not keep strings compressed in memory;
for a function, it might recompile
the function into a faster execution format. Alternately, of course, a
VM might totally ignore the hint array, and accelerate everything, or
nothing, or perform run-time analysis. ]]
Format of Data
After the header, a Slux program contains a list of VM class names followed
by a database stored as a sequence of data items.
VM Class Names
In the header is a list of the number of VM classes in the file.
This section of the file lists names for all of the VM classes.
Each name appears as a length byte n, followed by n bytes which
are interpreted as an ASCII string which is the name of the class.
All of the bytes of the string must be >= 32 and < 127.
A length byte of 0 indicates the end of the list.
The first name defined in the VM class name section is assigned
the internal class ID 1; the second gets internal class ID 2,
etc.
The interpreter will recognize some of these names and automatically
know how to parse them. All interpreters know the following VM
class names:
- int
- null
- name
- string
- object
- array
- function
- curried
- static-struct
Some interpreters may recognize other names as well. Extra names
handled by interpreters should be of the format 'x-CONTEXT-TYPENAME',
where 'x' is a literal character x, 'CONTEXT' is a string name representing
the application, and 'TYPENAME' is the name of the type. [[ For example,
an interpreter extended to support Tads3 might include 'x-T3-grammar_object'. ]]
It is perfectly valid for the list of class names to include
names the interpreter doesn't recognize. Such classes will
have implementations solely through operator overloading defined
on the corresponding pseudoclass (assuming there is one).
If the interpreter doesn't know the type, and there is no
pseudoclass defined for it, then it is treated as a plain 'struct'
that is only manipulable with 'struct', 'structget', and 'structset'.
References to Data
References to data in a Slux program are encoded in a 32-bit value
which is called an "element". Integers are restricted to only 31 bits
(sign-extended), and appear directly, not referenced. The "name" data
type uses values in the range 0x40000000-0x7FFFFFFF, and all other data
types use values in the range 0x80000001-0xBFFFFFFF:
Meaning Hex Range Bit Value
positive integer 00000000-3FFFFFFF 00xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
name 40000000-7FFFFFFF 01xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
other types 80000001-BFFFFFFF 10xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
negative integer C0000000-FFFFFFFF 11xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Note that the value 0x80000000 is reserved as an "out of band" value
that should never appear in valid data.
[[Because Slux defines integer operations to be on 31-bit values, and
because the above encoding correctly sign-extends the 31-bit value to
32-bits, the above encoding is a natural representation for storing elements
in an interpreter. However, it is perfectly valid for an interpreter to
unify all data types to be indirected, instead. (Furthermore, note that
in the final file format, the above data is further compressed, so the above
encoding represents an idealized format that an interpreter never need
explicitly encode.)]]
Compilers and interpreters are encouraged to favor "other types" and
"names" being generated in a compact region of their ranges, starting
at the lowest values. [[This maximizes compression and allows
interpreters to use a directly indexed table to map coded numbers to
pointers, rather than a hash table.]]
Database
After the VM class names comes a sequence of "data items"; there are
<b>num-db-entries</b> of these, and then the file ends.
Ignoring the use of compression, the sequence of data items appears
as follows:
Offset (in bytes) | Size (in bytes) | Field
|
0 | 4 | element # to define
|
4 | 4 | class-id typecode, a small integer stored as an element
|
8 | 4 | size of uncompressed element data in bytes, stored as an element
|
12 | size | ...type-specific-data...
|
The sequence of data items must be sorted by their first field,
the element # defined. The class-id typecode is an index into the
class names as described above.
If an element is referenced in type-specific data, then that
element must be defined somewhere in the file, or else the
program file is invalid. If an element # is defined multiple
times in the file, the program file is invalid.
Type-Specific Data
Integers
Because integers are used directly, rather than by reference,
integers never appear except as data within other types. It is
invalid for the class-id typecode corresponding to 'int' to
appear in an element header.
Strings
The first byte of a string is a compression code. The only
current defined compression mode is '0', which means no
compression; the data following consists of size bytes
representing the string in order. All 256 byte values
can legally appear in a string. A 0-length string is valid (but
will still have the one-byte compression code).
[[This needs to be extended to incorporate Unicode support.]]
Name
A name appears as a sequence of size bytes. All 256 byte
values can legally appear in a name. The same sequence of
bytes for a name cannot appear for more than one name in the
file; every name must be unique. EXCEPTION: The 0-length
name can appear multiple times, and each such name is considered
different. [[This is used to allow names to be used for message
dispatch but to allow the textual versions of the names to be
stripped from the executable. But note that entry-point names
and pseudoclass names must not be stripped.]]
Names whose first byte is 0 are reserved for use by the specification,
so in the interest of forwards- and backwards-compatibility, compilers
should avoid using such names. All names indicated in this specification
with a number-sign prefix actually mean a 0-byte prefix instead of the
number-sign.
Array
An array appears as a sequence of size/4 elements (size must be
a multiple of four), which sequentially define the 0th element of
the array, the 1st element of the array, etc. A 0-length array is
valid.
Object
An object has an initial element, called its "class", followed by
(size-4)/8 pairs of elements (size must be an odd multiple of
four), each of which defines a <key, value> pair, in the order
key, then value. The order of pairs is meaningless, and no key
can appear more than once. The set of pairs can be empty.
Null
size must be 0.
Function
A function consists of three major subcomponents: the function
signature, the function bytecode, and the function literal pool.
These components are stored as follows in the type-specific data:
Offset (in bytes) | Size (in bytes) | Field
|
0 | 1 | number of parameters/arguments function takes
|
1 | 1 | number of variables in the function (max 125) [includes parms]
|
2 | 1 | count of elements in literal pool
|
3 | 1 | reserved for future use, must be 0
|
4 | 4*count | elements of literal pool
|
4+4*count | ... | bytecode
|
The executable bytecode is a stream of (uncompressed) bytes which fills
out the remaining data. The executable bytecode is defined in
a separate section.
Curried Function
A curried function is a function in which some of its arguments
have been bound; specifically, count = (size - 12)/4 of them.
Offset (in bytes) | Size (in bytes) | Field
|
0 | 4 | function which has been curried
|
4 | 4 | 'self' element
|
8 | 4 | 'class' element
|
12 | 4*count | bound parameters
|
Additional types
All further types are interpreted identically to array, as
a series of size/4 elements. Of course they create elements
of the appropriate type, rather than arrays. The data in these
types can only be accessed through the 'structget' and
'structset' operands.
Each of these other types must use a single fixed size for
all of its instances. If two entries of the same type have
different size, or if the opcode 'struct' appears in a bytestream
defining an object of this type of a different size--even if
that opcode is never executed--it is an error.
The type static-struct is special in that it does not follow
the rules of the previous paragraph. Entries of type static-struct
can be of different sizes. VMs are not obligated to range check
accesses to static-struct, so this should only be used if the
compiler has statically verified the type of the entity.
Compressed File Format
The encoding of all the program data described above consists of
one of two things: a 32-bit element reference or
an 8-bit byte. (All 32-bit counts are actually restricted to
signed 31-bit integers, thus making them valid element references.)
Therefore, the compressed file format simply defines how 32-bit
elements are compressed and how 8-bit bytes are compressed.
To maximize compressability using "zip" or other compression
programs, the compressed file format leaves all data compressed
to byte boundaries. Therefore, 8-bit data is uncompressed (except
for compressed strings).
32-bit element data is compressed depending
upon its absolute value, ignoring the type field.
First, for clarity, I define how to encode a 32-bit element
for storage in the file format:
Compressing element references
Step 1: Normalize
Given a 32-bit number "yzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", rotate
it left by two bits, producing "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyz".
This shifts the two-bit type code down to the bottom bits.
Now, if and only if the bottom two bits are both 1 (that is,
(value & 3) == 3), then xor the number with 0xFFFFFFFC. This
takes the one's complement of negative integers.
After this step, "small" numbers and small element references
will have all 0's in their top bits.
Step 2: Compress
Let the current number be value. If value < 0x80, then output
value as a single byte.
Otherwise, if value < 0x4000, output the 2-byte big-endian
number value+0x8000.
Otherwise, if value < 0x200000, output the 3-byte big-endian
number value+0xC00000.
Otherwise, if value < 0x1F000000, output the 4-byte big-endian
number value+0xE0000000.
Otherwise, output the byte 0xFF, followed by the 4-byte big-endian
number value.
Decompressing element references
Lookahead a byte. Look the byte up in the following table, then
read the indicated N-byte big-endian number, and subtract the value
indicated.
Lookahead value | N-byte read | Value to subtract
|
0x00-0x7F | 1 | 0x00
|
0x80-0xBF | 2 | 0x8000
|
0xC0-0xDF | 3 | 0xC00000
|
0xE0-0xEE | 4 | 0xE0000000
|
0xFF | discard lookahead, then read 4 | 0x00000000
|
If the bottom two bits of the result are both 1, xor the
value with 0xFFFFFFFC. Then, regardless of the bottom two
bits, rotate the entire item two bits to the right.
Approximate implementation:
uint32 read_compressed_integer(FILE *f)
{
unsigned char buffer[4];
unsigned char c = fgetc(f);
uint32 val;
ungetc(c,f);
buffer[0] = buffer[1] = buffer[2] = buffer[3] = 0;
if (c < 0x80) fread(buffer+3,1,1,f);
else if (c < 0xc0) { fread(buffer+2,2,1,f); buffer[2] -= 0x80; }
else if (c < 0xe0) { fread(buffer+1,3,1,f); buffer[1] -= 0xc0; }
else if (c < 0xff) { fread(buffer+0,4,1,f); buffer[0] -= 0xe0; }
else { fgetc(f); fread(buffer+0,4,1,f); }
val = (((((buffer[0] << 8) + buffer[1])<<8) + buffer[2])<<8) + buffer[3];
if ((val & 3) == 3) val ^= 0xfffffffc;
val = (val >> 2) + (val << 30);
return val;
}
Delta File ("save game")
A delta file consists of the following elements:
Offset (in bytes) | Size (in bytes) | Field
|
0 | 16 | signature: "sLUx eXtenDs\012\032\015\377"
|
16 | 16 | Header:
|
16 | 4 | checksum (of program file, not delta file)
|
20 | 4 | version number
|
24 | 4 | global object element #
|
28 | 4 | num-db-entries (in delta file)
|
32 | | program data
|
This is basically the same data as the program file. The delta
file version number should be the version number that the interpreter
was written against, not the original version number; in other words,
if this save file were read by another interpreter, what version must
that interpreter implement in order to handle to the delta file?
Program Data
The data in the delta file consists of a series of data items
formatted identically to the program file. Any data type can
appear at any number; datatypes are encoded using identical
class-ids to the program file. As with the program file, it is
illegal for the same element to be defined more than once, within
this single delta file.
Conceptually, element descriptions in the delta file override
those defined in the program file. Because both files contain
the data items sorted by element #, a single-pass merge operation
can merge the data as it is read in from disk, regardless of the
way the data is represented in memory.
In addition to overriding data descriptions with entirely new
descriptions, additional typecode values can be used:
Typecode = 0
A typecode of 0 indicates that a data item has been deleted
(i.e. garbage collected, not to be confused with being set to
a null datatype).
Negative typecodes
A negative typecode indicates that the element is still the same
type as before, but its data is itself delta encoded from the
description in the program file.
In this version of the specification, the only valid negative
typecode is a delta-encoded object.
Delta-encoded object
A delta-encoded object description looks identical to a non-delta-encoded
object description, but it is interpreted differently. The class field
always overrides the definition in the program file. The <key,value> pairs
are used to update the object: if a given key already exists, then the
value in the delta file overrides the one in the program file; otherwise,
the pair is created. As a special
case, if the value is 0x80000000 (which is never a valid element number),
the pair defined in the program file is deleted.
Entry Points
The entry points to the program are found by using the global element
number as an object reference, and sending a message to it, using a
name. Thus, for any entry points the program needs to use, the compiler
must create the appropriate name and define a method on the global object
which implements it. (See the discussion of message passing for more
information on this process.)
The following entry points are defined (here a C-style function declaration
is used to indicate parameters; the name is just the identifier, not including
anything including or after the left parenthesis); in every case, 'parm' is
a parameter this program supplied to the opcode which triggered the entry
point (restrictions on the contents of parm are discussed with the opcodes):
[[this allows programs to pass data "around" undos and restores]]
- #start()
- invoked when the program is loaded
- #save(error_message, parm)
- invoked when a deltafile is saved; error_message is either 0,
indicating success, or something else, indicating failure; interpreters
that return a string should expect the string to be printed
- #restore(parm)
- invoked when a deltafile is loaded successfully
- #saveundo(parm)
- invoked after successfully creating an undo mark
- #restoreundo(parm)
- invoked after a successful undo
- #error(error_message, stack_trace, grouping)
- invoked when there is a program error; error_message is of type string,
stack_trace is an array, and grouping indicates the structure of the
stack_trace array (every sequential 'grouping' of array elements is a
level of the stack). Interpreters may pass in 0 for any of these.
Note that the use of a leading number-sign (#) character is a convention
in this document that stands for a prefix of a 0-byte; for example, the
name "#start" should be taken to mean the character array
{ 0, 's', 't', 'a', 'r', 't' }.
Note that if a method has the wrong function signature (takes
the wrong number of arguments), the normal error handling will
occur (a #doesNotUnderstand message, and if that fails, an error;
if the error cannot be successfully dispatched for an entry point,
the interpreter should terminate the program).
Additionally, the VM looks for certain certain names to be defined
on the global object, and uses the values of them for certain purposes.
- #int
- The pseudoclass for ints
- #name
- The pseudoclass for names
- other VM class names in the same format
- As above
- #objectname
- If global.#objectname is a name, when debug printing an object, checks
for the existence of this name as an instance variable, and if the value
of it is a string, prints that string. [[Debug printing is an optional
feature of debugging interpreters, so this isn't very important;
typically, debug printing will be handled directly by the program.]]
The global should not have a parent class.
Bytecode format
The executable code for a function is represented as a bytecode
stream, which is described below. This stream consists of a series
of 8-bit bytes which can be parsed into 'instructions'; each
instruction conists of an opcode followed by a series of 0 or more
operands.
For a program to be valid, all of its functions' bytecode must
be strictly parseable. A bytecode stream is strictly parseable
if:
- Starting at the beginning of the bytestream and parsing out
instructions, with each new instruction beginning on the next
byte after the previous instruction, eventually reaches the
end of the bytestream without encountering any invalid opcodes.
- Let the instructions encountered by the above process be the
strictly parsed instructions. Then if any strictly parsed
instruction is a branch instruction, any target of that branch
instruction must itself be a strictly parsed instruction.
- All references to literals and frame variables are within the
range defined by the function's literal pool and frame size.
- The last strictly parsed instruction must be one of the return
opcodes or an unconditional branch.
[[The high concept is that the bytecode is well-formed; there is
no internal padding with jumps around it; there is no sequence
of bytecode which is executed on different passes "out of phase".
These restrictions allow VMs to trivially alter or recompile
bytecode.]]
Slux bytecode uses a register model [[rather than the traditional
stack machine model]]. Each activated function gets its own set
of registers [[which are normally mapped 1-to-1 to variables]].
Register #0 always has the value of self, the object which provides
the current context for instance-variable references and
self-message sends, while register #1 always has the value
of class, which provides the context necessary for superclass
message sends. Neither of these registers can be written to.
Opcode table
Opcodes! We got opcodes!
Function bytecode data consists of a stream of variable-length
instructions. First comes the opcode byte. This is followed
by zero or more operand bytes; the opcode determines how many
operand words are used, although some opcodes use an operand value
to indicate a variable number of operands.
Each opcode is followed by a series of data values, which represent
the location of the source of the data to the operation, and the output
from it. The number of data values is defined by the opcode. The data
values are 8-bit signed values.
The value of the data item is interpreted as referring to a location
as follows:
- value < 0: The |value|-1'th item in the literal pool (e.g.
value=-1 means the first value defined in the literal pool); for an
output (destination), the value -1 means 'discard the output', and
all other negative values are invalid
- value >= 0: The value'th register
The registers are defined when a function is called:
0=self, 1=class, 2,3,4...n+1 = n function arguments,
n+2...n+2+m=local variables.
Only registers can be the targets of operations
(but not registers 0 & 1);
targets of operations are referred to as 'dest' in the tables below.
Some instructions use a 16-bit signed integer literal operand, which
is parsed in a big-endian manner from the bytestream (byte0 * 256 + byte1);
these operands are notated with the suffix _i in the tables below.
Many opcodes support 'opcode overloading' so as to provide operator
overloading in an underlying language. If an opcode described as
supporting overloading is run and would otherwise typecheck,
then if operand p0 is an object, a message is generated of this form:
p0.#opcode(p1,p2,...)
If p0 is not an object, then a message is sent to p0's pseudoclass,
with p0 as the first argument:
getclass(p0).#opcode(p0,p1,p2,...)
(Note that the # prefix stands for a 0-byte, not the number-sign character.)
The result of this message is written to 'dest', if such an operand
exists in the original opcode.
Miscellaneous instructions
- noop
- Has no effect.
- move dest p0
- Stores p0 to dest.
- caller dest
- Stores the value of the calling frame's pseudovariable 'self' in dest.
- global dest
- Stores the value of the global object in dest.
- copy dest p0
- Creates a copy of p0 and stores it in dest. For arrays
and objects, this is a real copy; copy(x)!=x. For strings, names,
integers, functions, currieds, and null objects, copy(x)==x.
- destroy dest p0
If p0 a mutable type (object, array, struct), turns p0 into
an entity of type NULL. If p0 is a system object, informs the
system that p0 will no longer be reached, and turns it into
an entity of type NULL. typechecks for names, strings, int,
function, curried, and static-struct.
- type dest p0
- Returns the type of p0 as a name: $int, $string, $name,
$array, $object, $function, $curried, $null--similarly
for other names defined in the header.
- not dest p0
- If p0 is 0, writes 1 to dest; otherwise, writes 0.
- randomseed dest
- Stores in dest a random number in the full signed 31-bit range
suitable for seeding a pseudorandom generator [[ e.g. it could
be the current time in milliseconds; successive values would
not be particularly random, but a single value would be ]]
Comparison instructions
Comparison instructions evaluate a condition and write the
result to a register. [[These instructions generate C-style
booleans, but most of the time the branch instructions should
be used.]]
- cmpz dest p0
- dest = (p0 == 0)
- cmpnz dest p0
- dest = (p0 != 0)
- cmpe dest p0 p1
- dest = (p0 == p1)
- cmpne dest p0 p1
- dest = (p0 != p1)
- cmpg dest p0 p1
- dest = (p0 > p1)
- cmpge dest p0 p1
- dest = (p0 >= p1)
- cmpgu dest p0 p1
- dest = (p0 unsigned> p1)
- cmpgeu dest p0 p1
- dest = (p0 unsigned>= p1)
Branch instructions
During normal execution, bytecode instructions are executed sequentially;
after executing instruction an instruction at location N which was encoded
in M words, then next instruction executed is at word offset N+M. Branch
instructions change this; if a branch is taken, then the operand data
(which is always interpreted as a literal integer) is the offset to the new
address to continue execution; if the value is X, then the new exectution
address (expressed as a word offset) is N+M+X; in other words, the branch
distance is X*2 bytes.
Branch instructions overload by overloading to the corresponding
comparison instruction, and using the result of that to determine
the branch.
- jump offset_i
- Always branches to offset location.
- jumpz/jumpnz p0 offset_i
- Branches if p0 is 0 or is not 0, respectively. Does not typecheck.
- jumpe p0 p1 offset_i
- If p0 equals p1, branches to offset. Note that this is object equality;
two strings are equal only if they are the same string, not if they have
the same contents; similarly for other data structures. Integers are the
same if they have the same value.
- jumpne p0 p1 offset_i
- Branches if p0 does not equal p1.
- jumpg p0 p1 offset_i
- Branches if p0 is greater than p1 if both are integers; otherwise typecheck
- jumpge p0 p1 offset_i
- Branches if p0 is greater than or equal to p1; both must be integers.
- jumpgu/jumpgeu p0 p1 offset_i
- Branches based on comparison of p0 to p1 with both numbers
considered unsigned [[implementors: note that if using the idealized
tagged 32-bit representation, the 31-bit numbers are sign extended;
however, a direct-unsigned comparison is still valid without masking
off the highest bit]].
- cjump p0 numparm_i off0 off1 ... offn-1
- off0..offn-1 are signed 16-bit branch offsets. If p0 is not an
integer, typecheck. If p0 <= 0, jump to off0. If p0=1, jump
to off1. If p0 >= numparm-1, jump to offn-1.
Integer Math
All of these operations support overloading.
- add dest p0 p1
- Computes the sum of p0 and p1 and stores it to dest. p0 and p1
are added using 32-bit addition, and truncated to 31-bits; the same
instruction serves for both signed and unsigned addition.
- sub dest p0 p1
- Computes p0 minus p1 and stores it in dest.
- mul dest p0 p1
- Computes p0 multiplied by p1 using a signed 32*32->64-bit multiply,
truncated to 31 bits.
- div dest p0 p1
- Computes p0 divided by p1. Division by 0 generates a domaincheck error.
Results are rounded towards.
- mod dest p0 p1
- Computes the remainder of dividing p0 by p1. The remainder is computed
consistently with the divide, so that ((X div Y) mul Y + (X mod Y)) equals X.
- modu dest p0 p1
- Computes the remainder of dividing p0 by p1 such that the remainder is
always non-negative; 0 <= p0 modu p1 < abs(p1).
- neg dest p0
- Computes the negative of p0 and stores it in dest. Negation is identical
to subtraction from 0; therefore (neg 0x40000000) equals 0x400000000.
- abs dest p0
- Computes the absolute value of p0; equivalent to negating p0 if p0 is
negative; therefore (abs 0x40000000) equals 0x40000000.
Integer Bit Operations
All of these operations support overloading.
- lshift dest p0 p1
- Computes p0 bitwise left-shifted by p1 bits; the shift amount is
evaluated by taking the bottom five bits of p1, computing a 32-bit shift,
and truncating the result to 31 bits. [[This means the natural effect
is obtained for 0..31 bit shifts, but the shift can be implemented without
any conditional testing.]]
- rshifts dest p0 p1
- Computes p0 bitwise right-shifted by p1 bits, with the bit length
as described above. The result is sign-extended; that is, the highest
bit of p0 is replicated into the top p1 bits.
- rshiftu dest p0 p1
- Computes p0 bitwise right-shifted by p1 bits as above, but without
sign-extension--the result is filled with 0s. [[Note that implementation
of this is subtle, since the natural 32-bit representation includes one
already sign-extended bit; clearing the high bit before shifting works for
every case except a 0-length shift, where it needs to be reconstructed.]]
- bitand dest p0 p1
- Computes the bitwise-and of p0 and p1: each bit of the destination
is 1 if both of the corresponding bits of p0 and p1 were 1; otherwise the
result bit is 0. This computation uses a 2's complement representation of
p0 and p1 and dest.
- bitor/bitxor dest p0 p1
- Computes the bitwise-or and bitwise-xor respectively of p0 and p1.
Bitwise or produces a 1 bit if either corresponding bit of p0 or p1 is 1;
Bitwise xor produces 1 bit if the corresponding bits of p0 and p1 are different.
- bitnot dest p0
- Inverts all of the bits of p0, stores into dest.
String operations
All opcodes support overloading unless otherwise noted.
- output p0 (not overloaded)
- If p0 is a string, outputs p0 to a display; otherwise does nothing.
- input dest (not overloaded)
- Allows the user to type in text, and stores it in dest as a string.
- strlen dest p0
- If p0 is not a string, typecheck. Stores the number of characters
in the string in dest.
- strextract dest p0 p1 p2
- If p0 is not a string or if p1 or p2 are not integers, typecheck.
If p1 < 0 or p1 > strlen(p0), domaincheck. Otherwise,
produces a new string which consists of the characters x where
p1 <= x < p1+p2, in sequential order, and stores it in dest.
If p1+p2 > strlen(dest), substiute p2 = strlen(dest)-p1.
- strcat dest p0 p1
- If p0 or p1 are not strings, typecheck. Produces a new string
consisting of the characters of p0 followed by the characters of p1.
- strget dest p0 p1
- If p0 is not a string, typecheck. If p1 < 0 or p1 >= strlen(p0),
domaincheck. Stores the p1'th character of p0 into dest. The character
is interpreted as an unsigned 8-bit number.
- strsubs dest p0 p1 p2
- All three values must be strings. Searches for 'p1' in p0, and replaces
all copies with p2. Searching occurs left-to-right; after a substitution,
searching continues after the replaced string, so only the leftmost
of overlapping patterns are substituted, and copies of the target
string created by substitution are not themselves substituted. The
result is stored in dest.
- lowercase/uppercase dest p0
- Produces a copy of p0 with all alphabetic characters converted
to lowercase/uppercase respectively.
- strcmp dest p0 p1
- p0 and p1 must be strings. Returns -1 if p0 < p1, 0 if p0 == p1,
and 1 if p0 > p1, where comparison is in "dictionary order".
- strncmp dest p0 p1 p2
- Compares first p2 characters of p0 with those of p1, as above.
Identical to strcmp(strextract(p0,0,p2),strextract(p1,0,p2)).
- strcasecmp dest p0 p1
- Compares p0 and p1 like strcmp, but performs the comparison
in a case-insensitive fashion.
String conversions
- ntos dest p0
- Converts name p0 to a string. An interpreter may return different
strings each time; that is, the test (ntos(x) == ntos(x)) can vary
between different interpreters.
- ston dest p0
- Converts string p0 to a name with the identical byte pattern.
Any two strings with a given sequence of bytes will generate the
same name; ston("foo")==ston(strcat("fo","o")), but it's not
required that "foo"==strcat("fo","o").
It is required that ston(ntos(n))==n, but not ntos(ston(s))==s
- lowerston dest p0
- Converts string to name, converting the characters of the string
to lowercase first. lowerston(x) == ston(lowercase(x));
[[Is faster because it avoids
creating any temporary data items--significantly faster if
the name already exists.]]
- stoi dest p0
- Converts a string to an integer . If the string is not a number, produces
0. If the number cannot be represented with a signed 31-bit range,
produces unpredictable results. Note that the first character of the
string must be one of the characters "-0123456789", or else 0 will be
produced.
- itos dest p0
- Converts an integer to a string representation of the base-10 encoding
of that integer.
- stoa dest p0
- Creates an array of length strlen(p0), with each of the elements
one character from p0; the values are treated as 8-bit unsigned
values. arrayget(stoa(str),i) == strget(str,i).
- atos dest p0
- If p0 is not an array, or any of its elements are not integers,
typecheck. A new string is created, arraylen characters long.
The bottom 8 bits of each integer in the array are turned into
a character.
Array Processing
Array operations may have to operate on many elements. The opcodes
below indicate the performance that compilers (and language users)
should expect the opcodes to have (possibly amortized over many such
operations); interpreters should attempt to provide such performance;
N indicates the number of elements in the array.
Note that arrays are generally modified in place, rather than returning
a new array.
All allow opcode overloading except as noted.
- arrlen dest p0 O(1)
- If p0 is an array, then stores the number of elements in p0 into dest.
Otherwise, generates an overloaded operator message.
- arrget dest p0 p1 O(1)
- If p0 is an array, then if p1 is not an integer, generates a typecheck
error. If p1 < 0 or p1 >= arrlen(p0), generates a domaincheck
error; otherwise, stores
a copy of the p1'th element of p0 in dest. If p0 is not an array,
typecheck.
- arrset p0 p1 p2 O(1)
- If p0 is an array, then typecheck and domaincheck p1 as above. If
no error, the p1'th element of p0 is overwritten with p2.
- arrappend p0 p1 O(1)
- If p0 is an array, then grow it by one element, and place p1 in the
final slot of the array just created. If p0 is not an array, typecheck.
- arrshrink p0 O(1)
- If p0 is an array, deletes the final element of the array and shrinks it.
If p0 is already empty, generate domaincheck error. If p0 is not an
array, typecheck.
- arrresize p0 p1 O(N)
- If p0 is an array and p1 is a positive integer, resize the array to
length p1, padding with 0s. If p1 is negative, generate domaincheck
error. Otherwise, typecheck.
- arrinsert p0 p1 p2 O(N)
- Typecheck if p0 is not an array or p1 is not an integer. Domaincheck
if p1 < 0 or p1 > arraylength(p0). [[Note this is different than the
standard array domaincheck.]] Otherwise, grow the array p0 by one
element, and shift all the elements starting at the p1'th element
up to a one-higher location; store p2 into the p1'th slot.
- arrdelete p0 p1 O(N)
- Typecheck if p0 is not an array or p1 is not an integer. Domaincheck
if p1 < 0 or p1 >= arraylength(p0). Delete the p1'th element of the
array, shifting the elements starting at p1+1 down by one element.
- arrsearch dest p0 p1 p2 O(N)
- Typecheck if p0 is not an array or p1 is not an integer. Domaincheck
if p1 < 0. Store in dest
the integer location of the lowest-numbered slot >= p1 which contains
element p2, or -1 if p2 is not in any such spot in the array.
- arrfastdel p0 p1 O(N)
- Typecheck if p0 is not an array. Let loc be the lowest-numbered slot
containing element p1, or -1 if no such slot. Then if loc >= 0, the
following occurs. The last element in the array is copied into slot
loc, and then the array is shrunk by one. [[Implementation note:
a tight sentinel-based linear search, followed by a swap and
a shrink.]]
- arrjoin dest p0 p1 O(M+N)
- Typecheck if either p0 or p1 is not an array. Construct a new array
whose length is the sum of the length of p0 and p1, whose contents
consist of the elements of p0 followed by the elements of p1.
- arrinsarr p0 p1 p2 O(M+N)
- Typecheck if either p0 or p2 is not an array, or p1 is not an integer.
Domaincheck if p1 < 0 or p1 > arraylength(p0). Insert the elements of
p2 starting at p1, in order. If p1=0, roughly equivalent to
arrayjoin(p2,p0); if p1=arraylength(p0), roughly equivalent to
arrayjoin(p0,p2); if p2 is a single element array, identical
to arrayinsert(p0,p1,arrayget(p2,0)).
- newarray p0 dest (no overloading)
- Create a new array of length p0, filled with 0s. Typecheck if
p0 is not an integer. Store in dest.
- array num p0 p1 p2 ... p(num-1) dest (no overloading)
- Create an array with num elements p0,p1, etc.
Object Data Access
No opcode overloading.
- testprop dest p0 p1
- p0 is an object, p1 is any data type. if p1 is a key in the
object, dest is set to 1; otherwise to 0
- getprop dest p0 p1
- p0 is an object, p1 is any data type. If p1 is a key in the
object, dest is set to the corresponding value; otherwise
dest is set to 0. [[Use testprop to determine existence.]]
- setprop p0 p1 p2
- p0 is an object. Set the <key,value> pair <p1,p2> to p0;
If p1 is not currently a key of p0, domaincheck.
- addprop p0 p1 p2
- p0 is an object. Set the <key,value> pair <p1,p2> to p0;
if p1 is currently a key of p0, the old value is replaced.
- findprop dest p0 p1
- p0 is an object, p1 is any data type. If p1 is a key in the
object, dest is set to the corresponding value. Otherwise,
the class tree starting with getclass(p0) is searched depth-first
[[see discussion of overriding search order]] for a corresponding
key; if found, the value is put in dest. Otherwise 0 is stored in dest.
[[There is no way to determine existence.]]
- delprop p0 p1
- p0 is an object, p1 is any data type. If p1 is a key in p0,
that <key,value> pair is removed from the object.
- getpropself dest p0
- Identical to "dest = getprop(self(), p0)"; 'self' is a pseudo-variable
not accessible in the normal stack frame.
- findpropself dest p0
- Identical to "dest = findprop(self(), p0)".
- setpropself p0 p1
- Identical to "setprop(self(), p0, p1)".
- getglobalprop dest p0
- Identical to "dest = getprop(global(), p0)".
- setglobalprop p0 p1
- Identical to "setprop(global(), p0, p1)".
- objkeys dest p0
- Creates an array of all of the keys found in p0. The order will
vary with implementation, and the performance is unknown.
- objvals dest p0
- Creates an array of all of the values found in p0. The order will
vary with implementation, and the performance is unknown.
- otoa dest p0
- Creates an array with all of the <key,value> pairs found in p0,
stored in alternating order--all of the keys are in even-numbered
slots, all of the values in odd-numbered slots. The overall
ordering of keys is implementation dependent.
[[Interpreters should optimize (and compilers should expect to
be optimized) getpropself, setpropself, findpropself, getglobalprop
when the key is a name, or, ideally, no matter the type of key.
In particular, getglobalprop should be as-fast-as-possible
O(1) for names, so it can be used as a global namespace.
Also, findpropself (and findprop) should presumably use
the same system for fast lookup as message->method determination,
e.g. a lookup cache, since they have identical semantics.]]
Object manipulation
- create dest p0
- Creates an empty object with its class set to p0. [[It's
up to compilers to synthesize constructors. Also, since this
creates an empty property list, a compiler which wants to
allow a class to define a set of always-defined instance variables
should create an object of that class with the instance variables
pre-defined, and copy that object instead of create'ing the
class. This can be done by hanging a compiler-private method off
the class that caries the relevent object.]] p0 must be an object,
an array of objects, or 0, or else a typecheck error will be
generated.
- plist dest numparm p1 p2 ... pn
- Creates an object with instance variables <p1,p2>, <p3,p4>, etc.
The object's class is 0.
- getclass dest p0
- If p0 is an object, reads the class field of the object and
stores it in dest. Otherwise, stores p0's pseudoclass in 'getclass'.
See discussion of pseudoclasses.
- subclass dest p0 p1
- p1 must be an object. If p0 is not an object, checks if
getclass(p0)==p1, and writes the result in dest (1 if true,
0 if false). If p0 is an object, checks if p1 is an ancestor
of p0.
- setclass p0 p1
- p0 must be an object; p1 must be an object, an array of objects, or 0.
None of the objects referenced by p1 can be pseudoclasses.
Changes the class field of p0 to p1. [[Note: can be arbitrarily
slow.]]
Special operations
- become p0 p1
- All references to p0 become references to p1, and vice versa.
p0 and p1 must be the same datatype, and must be either objects
or arrays [[that is, one of the resizeable by-reference
datatypes]]. If either is not of those types, typecheck.
If the two types are different, domaincheck.
- trap0 p0
- trap1 p0
- trap2 dest p0
- trap3 dest p0
- trap4 p0 p1
- trap5 p0 p1
- trap6 dest p0 p1
- trap7 dest p0 p1
- All trap opcodes produce errors if executed; however, they all
support opcode overloading. Thus, they provide a sort of concise
global function calls which can be used as a form of compression.
[[ For example, an adventure library might use trap0 to encode
'print', and have the overloaded operator call 'output', but
allowing client programs to intercept print or do 'stream
redirection'. An overloaded trap routine can use 'caller' to
determine the value of 'self' at the time of the trap to allow
compressed messages to self. ]]
Function calls / Message passing
The details of method lookup are discussed in a separate section.
For now, note that there are three pieces of explicitly visible
execution context. The simplest two are 'self' and 'class' (accessible
via identically named opcodes). 'self' is the object on whose behalf
the current code is running; property lookups such as 'getpropself' use
this to implement instance variables. 'class' is the class whose method
the currently executing function comes from; for example, if class
foo is a subclass of bar is a subclass of baz, and I call joe.blah(),
if blah is defined in foo, class will be foo; if blah is not defined
in foo, but is defined in bar, class will be bar. This context provides
the information needed for 'super' calls.
When a message is passed, in the new execution context, 'self' will
be the target of the message send, and 'class' will be the class which
provided the method. When a function is called, instead of a method,
the current 'self' and 'class' are "inherited" from the previous context
(that is, they are dynamically scoped). Compilers can disallow "loose"
functions they compile from referencing 'self' and 'class' and using
functions like 'getpropself' or 'msgself'; however, it is always possible
to turn a method into a "loose" function, e.g. by using 'findprop'
instead of a message send, so it would be hard for a language built
on Slux to not expose this behavior.
The frame variables that are created for each function call
are: 0=self, 1=class, 2,3,4...n+1 = n function arguments,
n+2...n+2+m=local variables.
In addition to the two pseudovariables self and class, Slux
provides for arbitrary compiler-defined dynamically-scoped variables,
using an explicit "context stack", controlled by the opcodes 'pushcontext',
'popcontext', and 'getcontext'. These context values are automatically
propogated with full dynamic scope. [[ Although the intent is for the
scope to match the behavior of the function call stack, the specification
does not force this behavior; as a result it is possible for the context
scope to be even stranger, although if it is not made to mimic the call
stack it will almost certainly over- or underflow. Contexts as defined
in Slux do not appear to correspond to anything in mainstream languages,
but are a simple generalization of the pseudovariable 'player' which appears
in some mud languages. The design of contexts also
echoes two different Postscript mechanisms: 'save'/'restore'
which has the same stack semantics, but is normally used on a much coarser
scale, and the dictionary stack, which is semantically different but is
used to achieve similar effects; the dictionary stack would be roughly
equivalent to pushcontext pushing an object of bindings onto the
context stack. ]]
All message sends are overloaded; if the target is not an object,
the equivalent message is sent to the pseudoclass of the target,
with the target as an additional leading parameter.
If a function/message invocation does not match the function's
signature (i.e. they don't both name the same number of parameters),
a domaincheck error is generated. (Actually, for a message, a
#doesNotUnderstand is generated.)
If a "method" binding for a message is not of type function,
or if the opcode func is called on something not of type
function, opcode overloading of func (in both cases) is
invoked. [[ For example, Inform-like semantics would have
the pseudoclass for strings define func as printing the
string, a newline, and returning true. ]]
- getcontext dest p1
- Store the most recent binding of p1 into dest. p1 must be a name.
- pushcontext numbind p1 v1 p2 v2 p3 v3 ... pn vn
- Defines 0 or more bindings of names to values; if a given name
is already bound, the value is temporarily replaced by the new
value, until the corresponding popcontext is seen. p1 ... pn must
be names.
- popcontext
- Undoes the effects of the most recent pushcontext which hasn't
already been popcontext'd.
- return p0
- return0
- return1
- Returns from the current function/method, writing p0/0/1 into
the caller's 'dest' variable location.
- msg dest target mess numparm p1 p2 ... pn
- Sends a message named 'mess' (which must be a name) to 'target'.
The return value is written to 'dest'. msg0 sends a message with
0 parameters; msg1 sends a message with 1 parameter, etc. Note that
this is what is meant elsewhere in this document by the notation
dest = target.mess(p1,p2, ..., pn)
- perform dest target mess paramlist
- Like msg, but the parameters are extracted from the paramlist,
which must be an array. [[Compiler writers: if you expose perform
in your language, you should be careful to make sure your argument
passing order corresponds naturally.]]
- msgself dest mess numparm p1 p2 ... pn
- Like msg, but target=self. Note that 'self' is always an object,
so this cannot overload to a pseudoclass.
- msgglobal dest mess numparm p1 p2 ... pn
- Like msg, but target=global. Note that 'global' is always an object,
so this cannot overload to a pseudoclass.
- super dest mess numparm p1 p2 ... pn
- Traverses the class tree above 'class' and executes the first method
found. If no method is found, 0 is written to dest.
- dsuper dest target mess numparm p1 p2 ... pn
- Like 'super', but specifies the class's method to use as 'target'.
If 'target' does not define 'mess', traverses the tree above 'target'.
Note that 'target' may not be one of self's ancestors; this is still
valid.
- func dest func numparm p1 p2 ... pn
- 'func' must be a function or a closure. Calls 'func' with the
specified parameters and writes the return value to 'dest'.
'self' and 'class' are unchanged.
- funcperform dest func parmlist
- calls the function 'func', but with the parameters derived
from parmlist, which must be an array.
- msgselffunc dest messfunc numparm p1 p2 ... pn
- If 'messfunc' is a function or closure, calls it as in 'func'. If
'messfunc' is a name, does an equivalent 'msgself'. [[This allows
a language syntax that is polymorphic between function calls and
self-message sends.]]
- msgfunc dest msgfunc numparm p1 p2 ... pn
- If msgfunc is a function or closure, behaves like 'func'; if msgfunc
is a name, does a message send with p1 as the target and one fewer
parameters. [[This allows polymorphism given the Sludge language
syntax where a message pass is written as message(obj,p1,p2).]]
- curry dest func numparm p1 p2 ... pn
- Creates a curried function from a function or curried function.
If a function, the first two parameters, p1 & p2, are bound to 'self'
and 'class' in the curried function (as if they were two hidden
parameters to every function). The remaining operands
are bound to the firt numparm-2 arguments of the function.
If already a curried function, the next N parameters after
those already curried are bound.
Caching control
- cacheglobal
- Certain aspects of the VM rely on specific values found in the global
object. Rather than monitor all object writes to check if they are to
relevent global elements, the VM is defined as potentially only reacting
to those changes if cacheglobal is called (and initially computed
after loading the program file [and delta file]--that is, it is as
if the first instruction executed before start() or restore() is
cacheglobal. The relevent keys stored
on the global that must be cached on cacheglobal are:
#int, #name, #string, #array, #function, #closure,
#objectname. [[#pageprompt does not have cache semantics.]]
Implicitly performs a cachepseudo immediately afterwards.
- cachepseudo
- The pseudoclasses provide the definitions of overloaded operators.
The VM may cache the values of all of the overloaded
operators from the pseudoclasses. If the methods for the overloaded
operators are changed, operators may still use the old definitions
(although explicit message passes to the overloaded operators will
see the new ones). After a call to cachepseudo, the overloaded
operators will be in sync again.
- Both of these operators are defined so that implementation-dependent
results occur if changes are made and cachefoo is not called.
System operations
Many system operations cause the current function call stack to
be cancelled and reenter the global object through a particular entry
point. All of these functions allow a "passaround" parameter which
is passed into the global entry point. However, in cases where
the VM is being restored (restore and restoreundo), this passaround
parameter may be copied on the way; this is notated as 'safecopy()'.
The result of a safecopy() on each type is:
- integer: preserved
- name: preserved
- string: a copy is made
- array: a copy is made, and the elements are safecopy()'d.
cyclic array references produce a 0.
- object: a copy is made; all of its elements (keys and values)
are safecopy()'d; however, any of its elements which are themselves
object references are 0'd.
- function: produces 0
- closure: produces 0
- null: produces 0
Note that shared references may be lost under this scenario. VMs can
also not copy, or make copies but keep shared references. [[Compilers
and program authors should expect performance of safecopy() to be very
slow and memory consumptive, and use passarounds as little as necessary.]]
- exit
- Unconditionally terminates execution of the program. Execution never
proceeds after 'exit'.
- start
- Restarts the program as if it had just been loaded.
- save p0 p1
- p0 should be 0 or a string. if p0 is 0, the interpreter may prompt
the player for a location to put a delta file. if p0 is a string, the
interpreter must either use that string as the name of the file, or
prompt the player, at its discretion. If the user cancels the save,
execution will proceed after this instruction. If the save is attempted,
then execution will be stopped and revectored with the call
global.save(0, p1) if it succeeds, or
global.save(error_string, p1)
if it fails. If the interpreter displays sufficient error information
to the user, it should simply pass a 1 for error_string; the global
save() routine should print the error_string if it's a string, on the
assumption that the interpreter did not give the user any feedback.
See the gestalt "saverestoreui".
- restore p0 p1
- p0 should be 0 or a string. if p0 is 0, the interpreter will prompt
the player for a delta file to read. If it is a string, the interpreter
may use that string as the name of the file, if and only if it allows
saving using similar strings. If the user cancels the restore, or if
the delta file does not exist or is obviously corrupt, execution will
proceed after this instruction. If the restore is succesful, then
execution will be stopped and revectored with the call
global.restored(safecopy(p1)). If the restore is not succesful,
then one of two
things will happen: if the interpreter can fully recover the VM state
from the execution of the restore, execution will proceed after the
restore statement exactly as if the restore was cancelled or trivially
failed; if the interpreter cannot fully recover the VM state, then
execution of the program terminates.
- saveundo p0
- Records an 'undo' location, then revectors with
global.saveundo(p0).
- restoreundo p0 p1
- p0 is an integer. Restores the VM state to the p0'th previous 'saveundo',
where 0 means the most recent, 1 means the second most recent, etc.
If the undo is successful, revectors with global.undo(safecopy(p1)).
If the undo is not successful, continues execution.
- error p0
- Generates an error like normal, revectors with
global.error(p0, stacktrace, grouping), unless the interpreter does
its own error display, in which case those will be all 0.
OS operations
- time dest
- Returns the current time in some unknown format. The output
may be any type, depending on the interpreter.
- ctime dest p0
- If p0 is 0, writes the current date and time to a string and
stores it in dest. Otherwise, if p0 is of the type returned by time,
then computes the date and time represented by p0 and writes that
to a string and stores it in dest.
- deltatime dest p0 p1 p2
- Given p0 and p1 both in the format returned by time, and p2 an integer,
computes the difference between time p0 and p1 in units of p2 seconds.
In other words, convert p0 and p1 to seconds since January 1, 1970,
with infinite integer precision. Compute (p0-p1)/p2. If the result
cannot be represented in 31 bits, store the maximum possible value
of the correct sign (0x3FFFFFFF if positive, 0x40000000 if negative).
- Example: Let x = time() at the beginning of play. Let y = time() at
when the player types a command to query how long she's been playing.
Compute z = deltatime(y,x,3600); z is the number of hours.
Use deltatime(y,x,60) for the number of minutes, deltatime(y,x,1)
for the number of seconds, deltatime(y,x,8640000) for the number of
days. [[Rather than load down VMs with the code to break it down
into days/hours/minutes/seconds, then make libraries turn that back
into the desired units, the desired units can be directly computed.
It's also possible to compute days/hours/minutes/seconds by using
remainders.]]
- writefile dest p0 p1
- p0 is a string or 0; p1 is a string. Creates a file named 'p0' and
writes p1 to it; that is, the file will be of length strlen(p1).
If writefile fails, dest will be written with 0. Otherwise, dest
will contain a string which can be passed to 'readfile' to reopen
the same file. Writefile can fail if there is insufficient diskspace
or if there are invalid characters in the filename. Filenames
containing only alphanumeric characters and 12 characters or less
long should never be invalid names.
- readfile dest p0
- p0 is a string or 0. Reads a file named 'p0' if p0 is a string,
or prompts the player if p0 is 0. If the read succeeds, returns
the entire file a string in dest. Otherwise, writes 0 to dest.
[[Writefile should be constrained by interpreters to either never be
silent (i.e. always request approval from the player), or else be
restricted to a single directory "sandbox", e.g. a fixed directory
associated with interpreter, so as to avoid security hazards.]]
VM meta-operations
- gestalt dest p1 p2
- Performs a generic 'gestalt' test, which allows extended 'opcodes'
without specific compiler support. p1 is the gestalt 'selector', which
must be of type name. If a VM does not recognize a name, it should
output 0. The value of p2 varies depending on the selector. If p2 is
not correctly formatted for the given selector, the VM should generate
a domaincheck error. If p1 is not a name, it should generate a
typecheck.
- Currently defined selectors:
- p1=$sluxversion
- If p2 is 0, returns the highest Slux specification version
number supported by the interpreter, formatted as a 4-character
string. If p2 >= 1, returns the highest Slux specification
version number assuming p2 is a product number. If that product
version number is not supported, returns 0.
- p1=$interpname
- p2 must be 0. Returns a string with the name of the interpreter,
suitable for printing in a sentence.
- p1=$interpversion
- p2 must be 0. Returns a string with the version number of the
interpreter, suitable for appending to the interpreter name.
E.g.
print gestalt($interpname,0)," version ",gestalt($interpversion,0)
- p1=$undo
- p2 must be 0. Returns 0 if the interpreter does not support
saveundo/restoreundo, that is, if opcode undo will always fail.
- p1=$saverestoreui
- p2 must be 0. Returns 0 if there is no built-in ui for doing
save/restore, or 1 if there is. If this returns 0, then attempts
to call save/restore with 0 for the name will always fail.
- p1=$cacheglobal
- p2 must be 0. Produces an array of all the names that the
cachekernel opcode will cache from the kernel.
- p1=$compile
- p2 must be 0. If the compile opcode is not a noop, returns
a human-readable string naming the language which the compile
opcode compiles; otherwise returns 0.
- p1=$error
- p2 must be 0. returns 0 if the interpreter has no built-in error
display handling, 1 if the interpreter displays errors
automatically.
- p1=$writefile
- p2 must be 0. returns 0 if writefile is not allowed, 1 if
writefile always prompts the player for the file location,
and 2 if direct writefile is allowed.
Debugging operations
All interpreters must implement these operations, but they can all
be implemented to do nothing at all; those that have a 'dest' parameter
should write a 0 to it. A "debugging" interpreter should provide as many
of these functions as possible, which will facilitate integrated debugging
and interactive development.
- compile dest p0 p1
- p0 is a string. p0 will be compiled; the result of compilation
will be put in dest. That result will be one of: a string encoding
a compile error, an object if the input was an object definition,
or a function if the input was a function definition. Note that
the language defined by 'compile' is implementation-dependent.
p1 can be 0, or it can be an object which contains a mapping of
names to classes, which is used for the compiler to hook up a
class definition to the object hierarchy.
- identifier dest p0
- Produces an integer identifier for element p0, which should be
as unique as possible (given the limit of only 31-bit integers)
and as persistent as possible.
Message passing
Certain function-call-style operations are defined as "message
passing". Such an operation takes an object and a name, computes
an appropriate function to call, and calls it. Computing an appropriate
function to call involves looking up the class hierarchy ancestors
of the objects for a property with that name.
Method lookup
The exact definition of this operation is as follows:
- Search for a property of that name on the object.
- Search for a property named #classancestors on the object.
If it is defined and is an array, iterate through its elements
in order, checking for a property of the message name.
- If #classancestors is not defined, search for the property starting from
the parent class; if not found there, recurse upwards. If the parent
class of a given object is an array, not a single object, then
iteratively recurse up each element of the array; that is, perform
a depth first search. The first implementation found is used.
Some clarifications:
- #classancestors is only checked once; it is not part of the
recursive process described in the last section above
- #classancestors provides a way to override the lookup order in
an arbitrary fashion. There is no requirement that the elements
of #classancestors actually be part of the class ancestry of
the class which defines it.
- If #classancestors does not contain an array, or if the array
contains anything other than objects, typecheck.
Changing the value of #classancestors corresponding to an object which
has a currently-pending method defined on it (that is, there is an unfinished
context in which 'self' contains an object whose parent class is undergoing
the #classancestors change) has implementation-dependent results, and so
should be avoided. [[This allows implementations to cache data associated
with #classancestors, such as the offset within the array for a given
stack frame.]]
Super lookup
The message passing mechanism allows classes to inherit partial
implementations from one another by means of 'super' calls. The
super call invokes a superparent. The search is conducted above
the class which defined the current method (which is found in the
'class' pseudovariable).
The search method varies significantly depending on whether the
self object (or its parent class) defines #classancestors or not:
- If neither self nor the parent class of self define #classancestors,
then a normal recursive search up from 'class' is conducted.
[[Note that this is very different from "find the next method
that the original traversal would have found", since the latter
will consider branches of the class hierarchy which are not
above the current class.]]
- If self or the parent class of self does define #classancestors, then
find the current class in the #classancestors array (treating
the parent class of self as the first element of the array).
Search iteratively through the #classancestors array beyond
this point. If the current class is not in the #classancestors
array, the message send fails.
Note that there is no requirement that the message sent be the
same as the current message, and the semantics in this situation
are well-defined, but I don't see the semantics as particularly
meaningful, and I recommend avoiding relying on these semantics
since they might change.
Directed super
A message can be directed to a particular superclass using the
directed super (dsuper) opcodes. These are defined as simply
looking up the message on the appropriate object, and doing no
traversal. It is the job of the compiler to bind dsupers to the
appropriate object up front. (Note, though, that 'super' calls
from dsuper calls do use the normal traversal rules.)
[[Directed supers allow the compiler to make super calls go to exactly the
right place, but they make it impossible for the super call in
a given class to behave differently in different circumstances,
which is what the #classancestors with normal supers allows.]]
It is the intention of this specification that between directed
supers and #classancestors, it is possible for languages which
create static, compile-time-known class hierarchies to experiment
with different rules for dealing with inheritence when supporting
multiple-inheritence. [[If this is not sufficient, it will change.]]
Method resolution
When a method lookup finds a property value corresponding to
the name of the message, the following sequence of events occurs:
- If the value is an integer, name, string, array, object,
or null, then opcode overloading on func is checked for
the appropriate pseudoclass; if not defined, the value itself
is immediately "returned" as the result value of the message pass.
- If the value is a function or a closure, then compare the
function signature with the number of parameters in the message
pass. If the number of parameters match, build a frame variable
frame with those parameters and execute the function or closure,
producing the value from a 'return' opcode as the result of the message.
(A function with a non-zero number of closure elements is an
unbound closure, and cannot be executed; this will produce a
closurecheck error.) If the number of parameters do not match,
the message send fails--see below.
findprop
The findprop opcodes use the exact same lookup mechanism, but always
simply returns the value, including functions and closures. However,
there is no way to do a 'super' findprop. [[One can actually use a
'super' function call if func hasn't been overloaded, since super
simply returns the values if the result is not a function.]]
Failed message send
If a message send fails, a new message is sent to the object,
#doesNotUnderstand. This message is sent using the same full
search routine, but always to the object from scratch (e.g.
a failed super call's #doesNotUnderstand starts over from the
bottom of the tree).
The #doesNotUnderstand message has two parameters; the first
is the name of the message that failed, and the second is an
array containing the parameters of the original message.
[[It is possible for an object which defines a message to receive
a #doesNotUnderstand for that message, for instance if it attempts
to do a 'super' dispatch where the method is not visible.]]
Function execution
If a function has a non-zero closure data size, then the function
can not be executed, only instantiated by the closure opcode.
If a function has a closure data size of zero, then the function
can be executed by a function call or message send with the correct
number of parameters. If the parameters match, then the function
is executed by creating a new set of "frame variables" for it;
the number of frame variables corresponds to the number defined
in the function frame definition. The passed-in parameters are
copied into the first N slots of the frame variables, and the others
are filled with zeroes.
If a closure is executed with the appropriate number of parameters
(according to the function defined in the closure), then a new set
of "frame variables" is created, based on the size specified by the
function stored in the closure. The N closure variables are copied
into the first N slots of the frame variables, followed by the M
parameters to the function call; the remainder are filled with zeroes.
Version History
- 1.0.0.14
- Reversed the numbering between names and other types. This
means that (a) the out-of-band value is 0x80000000, which is
nicer, and (b) that all names will come first in the file,
which is theoretically good since it means you can process
them fully in the first pass, and process them from the fast
hint array earlier.
- 1.0.0.13
- Removed fullcurry opcode; changed curry opcode to allow
specifying explicit self & class. Added 'null' datatype to
known vmclass names. Added missing description of destroy opcode.
- 1.0.0.12
- Removed displayerror opcode which should be handled with terp UI.
Altered the #classancestors spec to be simpler, although it is probably
a little trickier to use this way (but the same trickiness was always
necessary if an object's parent was an array of classes). Altered
arrsearch to allow searching from the middle.
- 1.0.0.11
- Added num-db-entries to header. Added arrshrink opcode and renumbered
other opcodes as a result.
Removed redundant size in curried-function layout. Clarified file
layout transition from vm classes to database.
- 1.0.0.10
- Introduced VM-classes, basically user-configurable data types.
Back to 8-bit bytecode, yay! Also added comparison opcodes
that correspond to the branch opcodes, which does two things:
shrinks bytecode for operations return x < y (and speeds up
the more rare z = x < y), and more imporantly provides something
which can be overloaded so branches can be overloaded. Removed
redundant halt opcode. Added forceint and typeassert opcode
to allow optimizing interpreters to assume types without replicating
code. Rewrote the introduction to provide information instead
of advocating, and to distinguish the IF components from the
non-IF components. Renamed 'kernel' to be 'global' so it's more
recognizable for targetting other languages. All function dispatches
can do tail recursion by specifying a dest of '-2', which means
'return this value'.
- 1.0.0.9
- Added cjump opcode. Removed class opcode and re-added
copy opcode (had accidentally deleted copy instead of
class).
Renamed classof to getclass. Reorganized opcode numbering for
increased sanity.
- 1.0.0.8
- Fixed the descriptions of msg, func etc. to the correct
opcode names (were funcn etc. from 8-bit opcode spec). More
importantly, defined opcode overloading for 'func' et al so that
e.g. foo.bar() if foo.bar is a string can do things like print
the string. Added links from the opcode table to the opcode
definitions. Removed the self and class opcodes, making them
the first two frame variables, so that they could get referenced
as opcode parameters directly, instead of requiring an intermediate
register.
- 1.0.0.7
- Removed set datatype; added more trap opcodes; added caller;
added context opcodes; move kernel to be 8-bit; allow #classancestors
on objects as well as classes, so that objects can use mixins directly;
removed the random number generator, leaving it to user code, replacing
it with randomseed
- 1.0.0.6
- Removed arrappendx, an old opcode from the 8-bit days.
- 1.0.0.5
- Converted from 8-bit bytecode to 16-bit wordcode for function
data. Added 16-bit data compression to file format. Moved become()
from debugging primitive to true primitive.
- 1.0.0.4
- Added a few more primitives. Wrote description of message lookup
algorithm. This entry is an after-the-fact guess.
- 1.0.0.3
- Added many primitives, compressed string format. This entry is an after-the-fact guess.
- 1.0.0.2
- Added set datatype and numerous small changes. This entry is an after-the-fact guess.
- 1.0.0.1
- Initial revision.