Stu

An early look

Stu is a language I'm developing which is a cross between C, OCaml, and Smalltalk. Because it is syntactically similar to C, I will describe it here in how it is different from C.

Although I have the semantics fairly nailed down in my head, there is a lot of play for the syntactic sugar for it. I welcome suggestions and opinions, e.g. for "where do you need { }" and the "should there be an endif". There's a tradeoff between making it accessible to C programmers and doing the right thing.

A sample Stu routine

fib(x:int)
{
   if x < 2 then
      return x;
   else
      return fib(x-1) + fib(x-2);
}

The biggest difference between this program and the equivalent C program is that the variable and its type are reversed. Also, if() and while() statements are if ... then and while ... do instead.

The return type of fib() is int; this is infered by the type inferencer from the type of x. If no type was given, then x could be any type for which x < 2 and x - 1 are valid.

Garbage Collection

Unlike C, Stu is fully garbage-collected. This simplifies many programming tasks and makes functional-style programming much more effective. Unlike most garbage-collected languages, though, Stu can still be very memory efficient; most objects have only 4 bytes of overhead, and structures can contain structures directly instead of being indirected. See the "about performance" section for other considerations.

Structures

A structure type is defined in global scope in the following manner:

MyFoo
{
   x,y : int;
   a   : MyFoo;
   b   : MyBar;
}

MyBar could appear earlier or later in the file; forward declarations are not required. There is no trailing semicolon after the { ... } pair, ever.

Like Java, all uses of structs are through pointers, but they are hidden from the user--there is no pointer arithmetic or pointer derefencing (in normal usage; see the discussion of "references" for more). For example:

func1(foo:MyFoo)
{
   foo.y = fib(foo.x);
   foo.x = foo.b.something;
}

The syntax -> is never used for field/messaging; only . is ever used.

None

Given a variable definition z:MyFoo, z is never a "null" pointer; it is always safe to "dereference" z, e.g. z.x, z.b. As a result, the structure MyFoo described above is rather useless, since the a field must point to a MyFoo, so there's no way to terminate the list. (You could imagine a cycle of MyFoos, or even one just pointing to itself, but there's no actual way to create one in Stu. The above definition will produce a warning.)

Stu has an equivalent to NULL, which is called none. Any type can be prefixed by the character ? to indicate that it is also allowed to be none. So you can write:

MyBar
{
   x,y :  int;
   a   :  MyFoo;
   b   : ?MyBar;
}

func2(bar:?MyBar)
{
   if bar != none then
      bar.x = fib(bar.y);
}

Often in coding you might have an expression that evalutes to an object of some type or none; you want to compute further on that result but only if it is not none. For example, in C you might have something like this:

result = foo[x].val != NULL ? bar(foo[x].val) : 0;

It would be nice if you didn't have to write 'foo[x].val' twice, and didn't have to introduce an explicit temporary. But there's really no plausible syntax, since you do want to write something like:

result = bar(foo[x].val);

and any "guard" you add to foo[x].val would have to be inside bar() yet express the value assigned to result. (Or it could imply 'actually, don't execute this after all', and abort the assignement entirely, but that would still be a confusing if it's inside bar().)

Instead, Stu provides a special statement type:

foo[x].val ? z -> result = bar(z);

Here ? ... -> is a ternary operator which means, if the first expression is none, then the value of the whole expression is none, otherwise put the first result in z, and execute what's after it.

This is just syntactic sugar for an equivalent match statement:

match foo[x].val with
{
    none ->
    z    -> result = bar(z);
}

Matching will be discussed more later.

Polymorphism

There are two sorts of things that are considered types in Stu. There are value types, which are the types that values take one. 5 is a specific type--it's an integer. none is a specific type (symbolically known as None, a type which has only one value). An instance of MyBar is of the MyBar type, etc.

Then there are what I call variable types, meaning the sorts of types variables can have. Any given variable type is a set of value types--possibly a set of only one element. So x:MyFoo has the variable type { MyFoo }, and x:?MyFoo has the variable type { MyFoo, None }.

You can define arbitrary sets of types and give it a name using the variant keyword:

MyThing variant
{
   int;
   MyFoo;
   MyBar;
}

value of type int, MyFoo, or MyBar. b:?MyThing would be the same, but could also be none.

The variant block is defining a type set, but I've chosen this keyword since I think in practice you won't think about sets of types, but rather you'll imagine that this is like a union but with a tag. (On the other hand, you can also use it with just one type inside to create a type name alias, so maybe a keyword like is or some such would be better. OCaml's use of | to separate alternatives is more consistent with match and to make it clear that we're talking alternatives, not a tuple or some such. Maybe I should say 'union' to be utterly clear.)

However, it is implemented with type sets. So if I also have

MyThingToo variant
{
   MyFoo;
   MyWibble;
   None;
}

MyMasterThing variant
{
   MyThing;
   MyThingToo;
}

then a variable z:MyMasterThing can be of type int, MyFoo, MyBar, MyWibble, or None; if it is of type MyFoo, I won't be able to tell whether it's a MyThing or a MyThingToo. Moreover, no value will ever be of type MyMasterThing, MyThing, or MyThingToo. A structure definition creates a value type; a variant definition creates a variable type.

There is a predefined variable "type" Any, which is the set of all types defined in the program (and the libraries). Variables can be of this type:

MyThingy
{
   x,y :  int;
   a   :  Any;
   b   : ?MyThingy;
}

Downcasting

Given a variable of some set of types, you might want to operate on it assuming it's some particular type. You do this by downcasting to the desired type. The special syntax ? ... -> does exactly that for none. (So does a comparison if x != none then ..., but this is a special case since the compiler knows that none is the only possible None value.)

You can do the same sort of thing by adding a type tag:

func3(a:Any)
{
   a ? f: MyFoo -> f.x = fib(f.y);
   a ? b:?MyBar -> doSomething(b);
}

You can also use a downcast which either produces the desired type or produces none:

   f:?MyFoo = (:MyFoo) a;    // what's a good syntax?

but if the desired type includes None, you won't be able to distinguish whether the downcast failed or whether it was originally null.

Matching

The ?-based matching is a subset of the full matcher, and is inconvenient when match the same expression multiple times. This is what the match statement does, but, like Caml's matcher, it allows rather arbitrary pattern matching.

(Note that the | is nominally a seperator but you can optionally prefix the entire match list with one, much as you can always have a trailing comma in comma-seperated definitions.)

match func(a, b.field) with {
   | z:MyFoo         -> z.x = fib(z.y);
   | z:MyBar         -> z.b = none;
   | z:MyMasterThing -> doSomething(z);
   | none            -> return false;
   | _               -> printError("oops");
}
return true;

Here, _ is a special variable that is thrown away; because it has no type listed, it's of type Any, so it matches anything.

The cases are matched in order top-to-bottom. The none case can never match, because None is a member of MyMasterThing, so none will call doSomething(none). (The compiler will warn about unreachable matches.)

Pattern matching can do much more sophisticated things as well, outside their use for downcasting. Discussed later.

Type Parameters

Suppose we define a generic linked list type:

List
{
   first :  Any;
   rest  : ?List;
}

We could use this to create arbitrary lists, and they could even be arbitrarily heterogenous. But if we want homogenous lists, they're inconvenient; we'd have to add guards around all the operations on them, which would be ugly and would slow the code down:

iterateMyThingList(z:?List)
{
   while z != none do {
      z.first ? a:MyThing -> doSomething(a);
      z = z.rest;
   }
}

Instead, we define a type parameter on List:

List p
{
   first : p;
   rest  : ?List p;
}

Also, it'll be annoying to always have to say ?List to allow for an empty list. We could make a variant:

CoreList p
{
   first : p;
   rest  : ?CoreList p;
}

List p variant
{
   CoreList p;
   None;
}

but for simplicity's sake we can specify a structure definition as being implicitly a variant-with-None:

List ? p
{
   first : p;
   rest  : List p;
}

If in code the type name List MyFoo appears, then it as if there were an equivalently defined type. The syntax List ?MyFoo would define a slightly different type of valid list. (Can this be confused with the guard syntax? I don't think a :typename can ever appear at the end of an expression right now.) An actual type parameter could itself be a parametric type; suppose we have a type DoubleStuff p q, then we could have List DoubleStuff MyFoo MyBar. We could also have DoubleStuff List MyFoo List MyBar. Use variant to give it a new name:

MessyType variant
{
   DoubleStuff (List List MyBar) (DoubleStuff MyFoo (List List MyBar));
   // add parentheses for clarity... but how will the compiler know the
   // parents are type parameter delimiters and not function types?
}

Note that you cannot downcast from a List Any to a List MyFoo. There is no way to tell, given a value of List Any, whether the entire list is homogenous or not. You can only downcast it field by field. (This is only true of recursive types? Although a non-recursive type could still require a bunch of comparison at run time, e.g. MyType p { a,b,c,d,e,f,g,h,i:p } Hmm, maybe not, since they have to all be the same type anyway.)

I guess you could put type parameters on a variant as well:

SomeType p variant
{
   MyFoo;
   List p;
   List ?p;
}

and then you say SomeType MyBar. Although this actual case is problematic (there's no way to distinguish the two Lists propertly), you could always manually write

SomeType_MyBar
{
   MyFoo;
   List MyBar;
   List ?MyBar;
}

so we might as well support the syntax.

You can also omit the paramters, in which case they are treated as any type. So List Any and List are the same.

On the other hand, you can (implicitly) cast in the other direction, so I can say

listLength(n:List)
{
   if n == none then return 0;
   return 1 + listLength(n.rest);
}

myFunc(f:List MyFoo)
{
   n:int = listLength(f);
   ...
}

and it just works.

Object Programming

It should be obvious with this talk of upcasting and downcasting how the type-set scheme is compatible with the typecasting aspects of object-orientation; e.g. the type set for a classname includes the types of all its subclasses.

However, while the above strives for safety, we do want to allow more smalltalk-like operations, and we would like to be able to avoid guards. Therefore, we introduce unsafe operations which can throw exceptions if they're not valid. So I can take an x:Any and send it a message x.myMessage(), and it either runs x's myMessage method or throws an exception if it doesn't have one.

Should we have . statically checked and -> dynamically checked? Or we could just allow a dynamically checked . in the case that the left hand of the operator is of type Any--but, especially with inference, this might happen without you realizing it. Or we could have all field get/set be statically checked, and all messages dynamic. (And the compiler can still warn if it can statically tell a message is bad... although maybe you want it to throw that exception.)