Elyra: Building A Programming Language (Pt. 1)

Well, how should I go about it? It’s probably a question people ask themselves, get stuck on, and then give up on. Not to mention the complexity of writing a lexer, analyzing syntax and parsing it into an AST, semantic analysis, code generation, and linking or interpreting. There’s way too much to think about; however, I believe a language is doomed to fail without some idea of what you want it to do.

My first thought, of course, is: what do I want to do? In my case, I want something with a familiar syntax that works as a safe(r) and simpler alternative to C++. I’ve read a lot about Zig and Rust, and I really love Zig. This language will be sort of a love letter to Zig that also incorporates classes. I considered making a C++ transpiler, and maybe that’ll be for another day, but I’d really like to get into LLVM.

With that broad context in place, I need a basic outline of what I want. So let’s go section by section through a hypothetical language, with code samples along the way.

Variables

I admire Rust’s philosophy on variable mutability, but I prefer Zig’s syntax. In my opinion, const and var communicate whether a variable is mutable slightly better than let and let mut do.

Data types are pretty simple in my opinion. C’s (and by extension C++’s) type system sucks! What is an int anyway? The only thing I really like is size_t, which Rust and Zig represent as usize (with isize as its signed counterpart). Beyond this, I find the u8, u16, u32, i8, i16, i32 convention very easy to understand.
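To make the contrast concrete, here’s a C++ sketch of how these names could map onto the fixed-width types in <cstdint>. The aliases are hypothetical (this isn’t Elyra’s implementation), and pairing isize with std::ptrdiff_t is my own assumption of a signed counterpart.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical aliases: Elyra's proposed primitive names mapped onto the
// fixed-width integer types from <cstdint>.
using u8  = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using u64 = std::uint64_t;
using i8  = std::int8_t;
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;

// usize is size_t, as in the text; isize uses ptrdiff_t as an assumed
// signed counterpart.
using usize = std::size_t;
using isize = std::ptrdiff_t;

// Plain `int` is only guaranteed to be at least 16 bits wide; these names
// state their width exactly.
static_assert(sizeof(u8) == 1 && sizeof(u16) == 2 && sizeof(u32) == 4 && sizeof(u64) == 8);
static_assert(sizeof(i8) == 1 && sizeof(i16) == 2 && sizeof(i32) == 4 && sizeof(i64) == 8);
```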

As a side note, Zig has arbitrary bit-width integers (e.g. u53), which I don’t find to be a particularly useful feature. I’d like something like u1, u2, and u4 as a middle ground, so you can be more specific about what you’re working with when manipulating parts of a byte.
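Without something like u4, working with half a byte means masking and shifting by hand. A small C++ sketch of that bookkeeping (get_nibble and set_nibble are hypothetical helper names):

```cpp
#include <cstdint>

// What a u4 type would save you from writing: reading and writing the two
// 4-bit halves of a byte. `index` must be 0 (low nibble) or 1 (high nibble).
constexpr std::uint8_t get_nibble(std::uint8_t byte, unsigned index) {
    return static_cast<std::uint8_t>((byte >> (index * 4)) & 0x0F);
}

constexpr std::uint8_t set_nibble(std::uint8_t byte, unsigned index, std::uint8_t value) {
    const unsigned shift = index * 4;
    const std::uint8_t mask = static_cast<std::uint8_t>(0x0F << shift);
    // Clear the target nibble, then OR in the new value truncated to 4 bits.
    return static_cast<std::uint8_t>((byte & ~mask) | ((value & 0x0F) << shift));
}
```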

Beyond these base types, a few other primitive types should exist: void (specifically for functions), bool for boolean logic, and opaque (essentially void*). Maybe the keyword any could be used instead of opaque; the name is pretty arbitrary.

Pointers

Pointers will probably use the canonical * to mark a pointer, and in my opinion it should precede the type. For example, *u8 is a pointer to arbitrary bytes. Zig distinguishes between *u8, a pointer to a single item, and [*]u8, a pointer to many. I have no strong opinions on this distinction, but overall I like treating a pointer simply as a C pointer.

Should pointers be nullable? No! Any pointer must either be undefined (not yet initialized) or point to a value. If you need a nullable pointer, that’s the role of optionals!
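C++ has no built-in non-null pointer, but the rule can be approximated. Here’s a hypothetical NonNull wrapper (loosely in the spirit of gsl::not_null) with std::optional covering the nullable case. Note that only a literal nullptr is rejected at compile time; anything else is checked with a runtime assert.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical sketch: a pointer wrapper that never holds null, mirroring
// the rule that pointers are non-null unless wrapped in an optional.
template <typename T>
class NonNull {
public:
    explicit NonNull(T* ptr) : ptr_(ptr) {
        assert(ptr != nullptr && "NonNull constructed from null");
    }
    NonNull(std::nullptr_t) = delete;  // reject a literal nullptr at compile time

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }

private:
    T* ptr_;
};

// A nullable pointer has to be spelled out explicitly as an optional.
template <typename T>
using MaybePtr = std::optional<NonNull<T>>;
```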

I also appreciate Zig’s .* dereference syntax, because data.* = 10 sure beats *data = 10. Being able to access a structure’s members through a pointer without the -> operator is a bonus too, because -> is not fun!

Another related concept is slices. I really like Zig’s slice type. If you’re not familiar with it, it’s basically:

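// In C++
template <typename T>
struct Slice {
    T* data;
    std::size_t len;
};

// In Zig (a generic struct is a function returning a type)
fn Slice(comptime T: type) type {
    return struct {
        data: [*]T,
        len: usize,
    };
}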

The syntax for a slice is []u8. Arrays, by contrast, are required to have a size. Slices make no assumptions about the data they point to: a slice may or may not own the data it references. If the slice is not const, it may modify that data. Slices also support range syntax: 0.. runs from index 0 to the end of the slice, and 0..3 is the range [0, 1, 2]. Slicing with a range produces a new slice. Additionally, 0...3 is the inclusive range [0, 1, 2, 3].
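Here’s a fuller C++ sketch of the slice idea, extending the pointer + length struct with the range forms. The method names range, range_inclusive and from are hypothetical stand-ins for the 0..3, 0...3 and 0.. syntax. Each returns a new slice over the same memory, so nothing is copied and no ownership is implied.

```cpp
#include <cassert>
#include <cstddef>

// A minimal slice: a pointer plus a length, with bounds-checked access.
template <typename T>
struct Slice {
    T* data;
    std::size_t len;

    T& operator[](std::size_t i) const {
        assert(i < len && "slice index out of bounds");
        return data[i];
    }

    // s[start..end]: exclusive range, covers indices start .. end-1.
    Slice range(std::size_t start, std::size_t end) const {
        assert(start <= end && end <= len && "slice range out of bounds");
        return Slice{data + start, end - start};
    }

    // s[start..]: from start to the end of the slice.
    Slice from(std::size_t start) const { return range(start, len); }

    // s[start...end]: inclusive range, covers indices start .. end.
    Slice range_inclusive(std::size_t start, std::size_t end) const {
        return range(start, end + 1);
    }
};
```

Because the sub-slices alias the original buffer, writing through any of them modifies the underlying array, which matches the "may or may not own the data" rule above.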

Optionals/Errors

Optionals are a great language feature, and I really like how they look and work in Swift. An optional is just a ? placed at the end of a type, which allows the value to be set to nil. Functionally, this type tells us whether or not something holds a value. This is very useful for pointers: if you call some function allocate and it fails, you are required to check whether it succeeded using the ? operator.

var my_ptr: *u32? = allocate(sizeof(u32));
my_ptr?.* = 25;

if my_ptr |ptr|
{
    ptr.* = 30;
}

var option_value: i64? = nil;

if option_value? == 20 
{
    do_something();
}

You’ll notice the capture in the if statement: inside the block, ptr is the unwrapped value, so you no longer need the ?.
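For comparison, here’s roughly the same flow with C++’s std::optional. allocate_u32, store_thirty and equals_twenty are hypothetical stand-ins for the Elyra sample; instead of a null pointer, the allocator returns an empty optional that callers must check.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Stand-in for `allocate`: an empty optional instead of a null pointer when
// malloc fails. The caller must free a successful result.
std::optional<std::uint32_t*> allocate_u32() {
    void* raw = std::malloc(sizeof(std::uint32_t));
    if (raw == nullptr) return std::nullopt;
    return static_cast<std::uint32_t*>(raw);
}

// Mirrors `if my_ptr |ptr| { ptr.* = 30; }`: the unwrapped pointer is only
// used inside the branch where a value is known to be present.
bool store_thirty(std::optional<std::uint32_t*> maybe) {
    if (maybe) {
        std::uint32_t* ptr = *maybe;
        *ptr = 30;
        return true;
    }
    return false;
}

// Mirrors `if option_value? == 20`: comparing a std::optional with a plain
// value yields false when the optional is empty.
bool equals_twenty(std::optional<std::int64_t> value) {
    return value == 20;
}
```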

I also like Zig’s error handling system built around the ! type, where a return type like !u32 means the function returns either a u32 or an error. One thing I dislike, however, is error unions, at least the way the compiler implements them, along with the anyerror type. I think errors should probably just be a structure, like so:

struct Error {
     size_t kind = 0x1;
};

The compiler can calculate the total set of errors, and then you only need to handle the specific kinds with some kind of match expression, or handle them generically.

So any function that may fail must mark itself as possibly returning an error. To “throw” an error, the function returns it as part of its return type. A try without a catch is equivalent to try x() catch |e| { return e; }, i.e. the error is propagated to the caller.
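A C++ sketch of this design, assuming a std::variant-based Result type to stand in for a fallible return type. The kParseError constant and the parse functions are hypothetical, and the hand-written early returns show what a try without a catch would expand to.

```cpp
#include <cstddef>
#include <variant>

// Error as a plain struct, as proposed above.
struct Error {
    std::size_t kind = 0x1;
};

constexpr std::size_t kParseError = 0x1;  // hypothetical error kind

// A fallible return type: either a T or an Error.
template <typename T>
using Result = std::variant<T, Error>;

// "Throws" by returning an Error as part of the return type.
Result<int> parse_digit(char c) {
    if (c < '0' || c > '9') return Error{kParseError};
    return c - '0';
}

// Hand-written equivalent of `try`: on error, return it to the caller;
// otherwise continue with the unwrapped value.
Result<int> parse_two_digits(char hi, char lo) {
    Result<int> a = parse_digit(hi);
    if (auto* e = std::get_if<Error>(&a)) return *e;
    Result<int> b = parse_digit(lo);
    if (auto* e = std::get_if<Error>(&b)) return *e;
    return std::get<int>(a) * 10 + std::get<int>(b);
}
```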

Functions

Functions use the general syntax [pub] fn, where the optional pub makes the function public. A function has a name and parameters (arguments), declared like so: pub fn main(argc: i32, argv: *[]const u8) -> i32. The return type syntax is the same as Rust’s, and return types can optionally be inferred by the compiler.

Function arguments of base types like i32 or *u8 are not modifiable. A pointer argument, however, can be dereferenced to modify the data it points to, unless it is marked *const u8. User-defined types are passed by reference, so any modifications made to them persist unless the parameter is marked const.
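Here’s roughly how these parameter rules map onto C++ qualifiers. Point and update are hypothetical; the commented-out lines are the ones a C++ compiler rejects, which is the behavior Elyra would give you by default.

```cpp
#include <cstdint>

struct Point { std::int32_t x; std::int32_t y; };  // hypothetical user-defined type

// Base-type argument       -> const copy               (const std::int32_t)
// Pointer argument         -> pointer itself is const  (std::uint8_t* const)
// `*const u8` argument     -> pointee is const too     (const std::uint8_t* const)
// User-defined type        -> reference                (Point& / const Point&)
void update(const std::int32_t delta, std::uint8_t* const out,
            const std::uint8_t* const in, Point& p, const Point& origin) {
    // delta += 1;      // error: base-type arguments are read-only
    // out = nullptr;   // error: the pointer itself can't be reassigned
    *out = *in;         // OK: writing through a non-const pointee
    // *in = 0;         // error: pointee is const
    p.x = origin.x + delta;  // OK: the caller sees this change
    // origin.x = 0;    // error: const reference
}
```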

Comments

// is simple enough. Maybe /// for documentation.

Objects, Interfaces, and Inheritance

Time to be controversial: I don’t think OOP or inheritance is necessarily bad. It’s definitely overused, though, so I want to implement only very simple principles. As in C++, the functional difference between a struct and a class is default visibility. Both structs and classes are considered objects and can inherit from a parent. Interfaces are abstract-only layouts of objects that can be applied to them.

Class methods are static unless they declare a this: self parameter, where self is just a reference to the current class. Instead of writing this.member_var, you can simply write .member_var (an idea borrowed from Jakt).

const Foo = class {
    x: u32,
    pub z : u32 {set},

    Foo() {
        .x = 0;
    }

    pub fn bar() {
        print("Bar!");
    }

    pub fn add(this: self, y: u32) -> void {
        .x += y;
    }
}

One peculiarity here involves public variables: you can mark them get or set, and access to the variable is checked automatically. If you don’t define this block, it’s get + set by default. Beyond that, the code is pretty straightforward. x can only be modified in add because add declares this: self. You would call foo.add(10), since it’s an instance method, but Foo::bar(), since bar is static.
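For readers coming from C++, here’s roughly what Foo would translate to. It’s a sketch: the {set} property is approximated with a private field plus a public setter, and the x() getter is a hypothetical addition so the result can be inspected.

```cpp
#include <cstdint>
#include <cstdio>

class Foo {
public:
    Foo() : x_(0), z_(0) {}

    // No `this: self` parameter -> a static member function.
    static void bar() { std::puts("Bar!"); }

    // `this: self` -> an ordinary member function that can modify fields.
    void add(std::uint32_t y) { x_ += y; }

    // `pub z : u32 {set}` -> write-only access from outside the class.
    void set_z(std::uint32_t value) { z_ = value; }

    // Hypothetical getter, added only for illustration.
    std::uint32_t x() const { return x_; }

private:
    std::uint32_t x_;
    std::uint32_t z_;
};
```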

Interfaces are pretty much objects, but all of their functions are assumed to be virtual unless defined otherwise. Every interface and object can inherit from one other object. The override keyword exists for overriding inherited functions; overloads, however, are not supported. To refer to the base class (the one inherited from), you can use base.method().

const Foo = interface {
    x: u32,

    pub fn plus_one(this: self) {
        .x += 1;
    }

    pub fn plus(this: self) {
        .x += 2;
    }
}

const Bar = class from Foo {
    override fn plus(this: self) {
        .x += 3;
    }
}

The from syntax comes from the phrase “class A derived from B.” The override keyword goes before fn in the declaration. The access specifier doesn’t need to be repeated here, because it’s inherited from the parent.
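A rough C++ rendering of the same hierarchy: the interface becomes a base class with virtual functions, from becomes public single inheritance, and base.method() becomes a qualified call like Foo::plus(). The plus_then_base helper is hypothetical and only exists to show the base call.

```cpp
#include <cstdint>

struct Foo {
    std::uint32_t x = 0;
    virtual ~Foo() = default;
    virtual void plus_one() { x += 1; }
    virtual void plus() { x += 2; }
};

struct Bar : Foo {
    void plus() override { x += 3; }

    // Elyra's `base.plus()` corresponds to the qualified call Foo::plus().
    void plus_then_base() {
        plus();       // Bar's override: +3
        Foo::plus();  // the base version: +2
    }
};
```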

Enum & Match

All enums should have a backing type that they can be effortlessly converted to and from via a keyword like as. Enum tags may have values assigned, and the backing type must be an integer type. Enum values can also be written as literals using just the tag, e.g. .Red instead of Color.Red.

const Color = enum(u8) {
    Red = 0,
    Green,
    Blue = 23
};

var color = get_color();
if color == Color.Red or color == .Red
{
    do_something();
}
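C++’s scoped enums already support a fixed backing type, so the Color example maps over almost directly. In this sketch, static_cast stands in for the proposed as keyword, and to_u8 and from_u8 are hypothetical helper names. Green takes the next value after Red (1), following the C/C++ rule.

```cpp
#include <cstdint>

enum class Color : std::uint8_t { Red = 0, Green, Blue = 23 };

// `color as u8`
constexpr std::uint8_t to_u8(Color c) { return static_cast<std::uint8_t>(c); }

// `value as Color`; note C++ does not check that v names a declared tag.
constexpr Color from_u8(std::uint8_t v) { return static_cast<Color>(v); }
```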

Match is a drop-in replacement for switch statements and uses the => syntax from Rust and Zig.

match color 
{
    .Red => { do_something(); },
    _ => { return Error.NotRed; }
}
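The same dispatch as a C++ switch, as a rough sketch: .Red becomes a case label and _ becomes default. ErrorKind and handle are hypothetical, and an empty optional stands for “no error.” One difference worth keeping in mind: Rust’s match and Zig’s switch must be exhaustive, which a C++ switch does not enforce.

```cpp
#include <cstdint>
#include <optional>

enum class Color : std::uint8_t { Red = 0, Green, Blue = 23 };

// Mirrors Error.NotRed from the sample.
enum class ErrorKind { NotRed };

std::optional<ErrorKind> handle(Color color) {
    switch (color) {
        case Color::Red:
            return std::nullopt;       // do_something() would go here
        default:
            return ErrorKind::NotRed;  // return Error.NotRed
    }
}
```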

Control Flow

I’ve been using if throughout the article, and it’s pretty clear: the parentheses are entirely optional, && becomes the keyword and, and || becomes or. Additionally, captures exist for optional and error types.

Like in Zig, for loops iterate over a slice or array and capture both the value and the index. while loops are like traditional C while loops. Infinite loops use the loop keyword.

var bytes: []u8;

for bytes |*value, index| 
{
    value.* = index;
}

while condition 
{
   if other_condition
       break;
   else 
       continue;
}

var x : i32 = 0;
loop 
{
   x += 1;
}
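And the equivalent loops in C++, sketched with hypothetical function names. The static_cast makes explicit the usize-to-u8 narrowing that value.* = index leaves implicit, and loop becomes for (;;).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// `for bytes |*value, index| { value.* = index; }`
void fill_with_index(std::vector<std::uint8_t>& bytes) {
    for (std::size_t index = 0; index < bytes.size(); ++index) {
        std::uint8_t& value = bytes[index];        // like the |*value| capture
        value = static_cast<std::uint8_t>(index);  // explicit usize -> u8 narrowing
    }
}

// `loop { x += 1; }` with an added exit condition so the example terminates.
std::int32_t count_to(std::int32_t limit) {
    std::int32_t x = 0;
    for (;;) {
        if (x >= limit) break;
        x += 1;
    }
    return x;
}
```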

For Now…

That’s it for now. I’m sure I’ve overlooked something, but I’ve spent my time building up the idea and can’t think of much more. It’s finally time to get into the interesting stuff: code!