Stu is a language I'm developing which is a cross between C, OCaml, and Smalltalk. Because it is syntactically similar to C, I will describe it here in how it is different from C.
Although I have the semantics fairly nailed down in my head, there is a lot of play for the syntactic sugar for it. I welcome suggestions and opinions, e.g. for "where do you need { }" and the "should there be an endif". There's a tradeoff between making it accessible to C programmers and doing the right thing.
fib(x:int)
{
if x < 2 then
return x;
else
return fib(x-1) + fib(x-2);
}
The biggest difference between this program and the equivalent C program is that the variable and its type are reversed. Also, if() and while() statements are if ... then and while ... do instead.
The return type of fib() is int; this is infered by the type inferencer from the type of x. If no type was given, then x could be any type for which x < 2 and x - 1 are valid.
MyFoo
{
x,y : int;
a : MyFoo;
b : MyBar;
}
MyBar could appear earlier or later in the file; forward declarations are not required. There is no trailing semicolon after the { ... } pair, ever.
Like Java, all uses of structs are through pointers, but they are hidden from the user--there is no pointer arithmetic or pointer derefencing (in normal usage; see the discussion of "references" for more). For example:
func1(foo:MyFoo)
{
foo.y = fib(foo.x);
foo.x = foo.b.something;
}
The syntax -> is never used for field/messaging; only . is ever used.
Given a variable definition z:MyFoo, z is never a "null" pointer; it is always safe to "dereference" z, e.g. z.x, z.b. As a result, the structure MyFoo described above is rather useless, since the a field must point to a MyFoo, so there's no way to terminate the list. (You could imagine a cycle of MyFoos, or even one just pointing to itself, but there's no actual way to create one in Stu. The above definition will produce a warning.)
Stu has an equivalent to NULL, which is called none. Any type can be prefixed by the character ? to indicate that it is also allowed to be none. So you can write:
MyBar
{
x,y : int;
a : MyFoo;
b : ?MyBar;
}
func2(bar:?MyBar)
{
if bar != none then
bar.x = fib(bar.y);
}
Often in coding you might have an expression that evalutes to an object of some type or none; you want to compute further on that result but only if it is not none. For example, in C you might have something like this:
result = foo[x].val != NULL ? bar(foo[x].val) : 0;
It would be nice if you didn't have to write 'foo[x].val' twice, and didn't have to introduce an explicit temporary. But there's really no plausible syntax, since you do want to write something like:
result = bar(foo[x].val);
and any "guard" you add to foo[x].val would have to be inside bar() yet express the value assigned to result. (Or it could imply 'actually, don't execute this after all', and abort the assignement entirely, but that would still be a confusing if it's inside bar().)
Instead, Stu provides a special statement type:
Here ? ... -> is a ternary operator which means, if the first expression is none, then the value of the whole expression is none, otherwise put the first result in z, and execute what's after it.foo[x].val ? z -> result = bar(z);
This is just syntactic sugar for an equivalent match statement:
match foo[x].val with
{
none ->
z -> result = bar(z);
}
Matching will be discussed more later.
Polymorphism
There are two sorts of things that are considered types in Stu.
There are value types, which are the types that values take
one. 5 is a specific type--it's an integer. none is a specific
type (symbolically known as None, a type which has only one
value). An instance of MyBar is of the MyBar type, etc.
Then there are what I call variable types, meaning the sorts
of types variables can have. Any given variable type is a
set of value types--possibly a set of only one element.
So x:MyFoo has the variable type { MyFoo }, and x:?MyFoo
has the variable type { MyFoo, None }.
You can define arbitrary sets of types and give it a name
using the variant keyword:
MyThing variant
{
int;
MyFoo;
MyBar;
}
value of type int, MyFoo, or MyBar. b:?MyThing would
be the same, but could also be none.
The variant block is defining a type set, but I've chosen
this keyword since I think in practice you won't think about
sets of types, but rather you'll imagine that this is like a
union but with a tag. (On the other hand, you can also use it
with just one type inside to create a type name alias, so
maybe a keyword like is or some such would be better. OCaml's
use of | to separate alternatives is more consistent with match
and to make it clear that we're talking alternatives, not a
tuple or some such. Maybe I should say 'union' to be utterly clear.)
However, it is implemented with type sets. So if I also have
MyThingToo variant
{
MyFoo;
MyWibble;
None;
}
MyMasterThing variant
{
MyThing;
MyThingToo;
}
then a variable z:MyMasterThing can be of type int, MyFoo,
MyBar, MyWibble, or None; if it is of type MyFoo, I won't
be able to tell whether it's a MyThing or a MyThingToo.
Moreover, no value will ever be of type MyMasterThing,
MyThing, or MyThingToo. A structure definition creates a
value type; a variant definition creates a variable type.
There is a predefined variable "type" Any, which is the
set of all types defined in the program (and the libraries).
Variables can be of this type:
MyThingy
{
x,y : int;
a : Any;
b : ?MyThingy;
}
Downcasting
Given a variable of some set of types, you might want to
operate on it assuming it's some particular type. You do
this by downcasting to the desired type. The special syntax
? ... -> does exactly that for none. (So does a comparison
if x != none then ..., but this is a special case since
the compiler knows that none is the only possible None value.)
You can do the same sort of thing by adding a type tag:
func3(a:Any)
{
a ? f: MyFoo -> f.x = fib(f.y);
a ? b:?MyBar -> doSomething(b);
}
You can also use a downcast which either produces the
desired type or produces none:
f:?MyFoo = (:MyFoo) a; // what's a good syntax?
but if the desired type includes None, you won't be
able to distinguish whether the downcast failed or
whether it was originally null.
Matching
The ?-based matching is a subset of the full matcher,
and is inconvenient when match the same expression multiple
times. This is what the match statement does, but, like
Caml's matcher, it allows rather arbitrary pattern matching.
(Note that the | is nominally a seperator but you can optionally
prefix the entire match list with one, much as you can always
have a trailing comma in comma-seperated definitions.)
match func(a, b.field) with {
| z:MyFoo -> z.x = fib(z.y);
| z:MyBar -> z.b = none;
| z:MyMasterThing -> doSomething(z);
| none -> return false;
| _ -> printError("oops");
}
return true;
Here, _ is a special variable that is thrown away; because it
has no type listed, it's of type Any, so it matches anything.
The cases are matched in order top-to-bottom. The none case can
never match, because None is a member of MyMasterThing, so none
will call doSomething(none). (The compiler will warn about unreachable
matches.)
Pattern matching can do much more sophisticated things as well,
outside their use for downcasting. Discussed later.
Type Parameters
Suppose we define a generic linked list type:
List
{
first : Any;
rest : ?List;
}
We could use this to create arbitrary lists, and they could
even be arbitrarily heterogenous. But if we want homogenous
lists, they're inconvenient; we'd have to add guards around
all the operations on them, which would be ugly and would
slow the code down:
iterateMyThingList(z:?List)
{
while z != none do {
z.first ? a:MyThing -> doSomething(a);
z = z.rest;
}
}
Instead, we define a type parameter on List:
List p
{
first : p;
rest : ?List p;
}
Also, it'll be annoying to always have to say ?List to allow
for an empty list. We could make a variant:
CoreList p
{
first : p;
rest : ?CoreList p;
}
List p variant
{
CoreList p;
None;
}
but for simplicity's sake we can specify a structure definition
as being implicitly a variant-with-None:
List ? p
{
first : p;
rest : List p;
}
If in code the type name List MyFoo appears, then it as if
there were an equivalently defined type. The syntax List ?MyFoo would
define a slightly different type of valid list.
(Can this be confused with the guard syntax? I don't
think a :typename can ever appear at the end of an expression
right now.) An actual type parameter could itself be a parametric
type; suppose we have a type DoubleStuff p q, then we could
have List DoubleStuff MyFoo MyBar. We could also have
DoubleStuff List MyFoo List MyBar. Use variant to give
it a new name:
MessyType variant
{
DoubleStuff (List List MyBar) (DoubleStuff MyFoo (List List MyBar));
// add parentheses for clarity... but how will the compiler know the
// parents are type parameter delimiters and not function types?
}
Note that you cannot downcast from a List Any to a List MyFoo.
There is no way to tell, given a value of List Any, whether the
entire list is homogenous or not. You can only downcast it field
by field. (This is only true of recursive types? Although a non-recursive
type could still require a bunch of comparison at run time,
e.g. MyType p { a,b,c,d,e,f,g,h,i:p } Hmm, maybe not, since
they have to all be the same type anyway.)
I guess you could put type parameters on a variant as well:
SomeType p variant
{
MyFoo;
List p;
List ?p;
}
and then you say SomeType MyBar. Although this actual case
is problematic (there's no way to distinguish the two Lists
propertly), you could always manually write
SomeType_MyBar
{
MyFoo;
List MyBar;
List ?MyBar;
}
so we might as well support the syntax.
You can also omit the paramters, in which case they are treated
as any type. So List Any and List are the same.
On the other hand, you can (implicitly) cast in the other direction, so
I can say
listLength(n:List)
{
if n == none then return 0;
return 1 + listLength(n.rest);
}
myFunc(f:List MyFoo)
{
n:int = listLength(f);
...
}
and it just works.
Object Programming
It should be obvious with this talk of upcasting and downcasting
how the type-set scheme is compatible with the typecasting aspects
of object-orientation; e.g. the type set for a classname includes the
types of all its subclasses.
However, while the above strives for safety, we do want to
allow more smalltalk-like operations, and we would like to
be able to avoid guards. Therefore, we introduce unsafe
operations which can throw exceptions if they're not valid.
So I can take an x:Any and send it a message x.myMessage(),
and it either runs x's myMessage method or throws an exception
if it doesn't have one.
Should we have . statically checked and -> dynamically
checked? Or we could just allow a dynamically checked . in
the case that the left hand of the operator is of type
Any--but, especially with inference, this might happen
without you realizing it. Or we could have all field get/set
be statically checked, and all messages dynamic. (And the
compiler can still warn if it can statically tell a message
is bad... although maybe you want it to throw that exception.)