Reserved keywords (16):
and asm defer drift
else for if in
lov match or out
ro rw while wo
Many other words are reserved as literal values or primitive types,
or belong to one of the two builtin namespaces: the builtin @
(for literals, function calls) or the builtin dot @.
(for decorators, attributes).
A program entry point is implemented as a container qualified
with a @. parametrized domain argument:
\print @import("std").dbg.print
@.pub \main @.("host") domain {
print("hello world!\n", .{})
}
123,
0xff,
0xFF,
0o743,
0b11110000,
0b1111_0000_11110000,
12_345_678.
123.0e+77,
123.0,
0x103.70p-5,
0x103.70P-5,
0x103.70,
0.000_000_001.
.[0.2, 0.5, 0.2],
.[0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5],
.[0.701, -0.701].
'a',
'n',
'x',
'1',
'\n',
'\0'.
"kotki",
"Smocze Jaja",
"12345678".
true,
false,
null,
nan,
inf,
undefined.
'\a' - bell (BEL)
'\b' - backspace (BS)
'\e' - escape (ESC)
'\f' - form feed (FF)
'\n' - newline
'\r' - carriage return
'\t' - tab
'\v' - vertical tab (VT)
'\\' - backslash
'\"' - double quote
'\'' - single quote
'\NNN' - octal 6-bit character (3 digits)
'\xNN' - hexadecimal 8-bit character (2 digits)
'\uNNNN' - hexadecimal 16-bit Unicode character UTF-8 encoded (4 digits)
'\uNNNNNNNN' - hexadecimal 32-bit Unicode character UTF-8 encoded (8 digits)
Numerical literals are written similarly to other programming languages.
Underscores are allowed for better readability:
1_000_000_000 (one billion).
Scientific notation is allowed for floating point and integer numbers alike,
using the exponent symbols e, E, p or P.
If the exponent carries a + or - sign, e.g. 1e-10,
the number is a floating point literal.
A number that contains a dot is a floating point literal as well:
1.0e9.
Lovage supports 2 types of comments. Normal comments are ignored, but doc comments are used by the compiler to generate the package documentation.
There are no multiline comments in Lovage (e.g. /* */ comments in C).
Because of this property each line of code can be tokenized out of context.
\print @import("std").dbg.print
\main @.("host") domain {
// Comments in Lovage start with "//" and end at the next LF byte (end of line).
// The line below is a comment and won't be executed.
//print("zjadlem muche :3\n", .{})
print("ugryze cie >:(\n", .{})
}
A doc comment is one that begins with exactly three slashes
(i.e. ///, but not ////).
Multiple doc comments in a row are merged together to form a multiline doc comment.
The doc comment documents whatever immediately follows it.
A root doc comment is created by documenting @self
in the scope of a module file.
\cacheline_size @import("builtin").cacheline_size
/// The MPMC ring buffer is based on the multiple-producer, multiple-consumer
/// queue described by Dmitry Vyukov on 1024cores.
\Mpmc (lov T : type) => @.(cacheline_size) align {
/// User provided buffer of type T to hold produced data.
data : [^]rw T,
/// User provided buffer of atomic counters for every cell of data.
sequence : [^]rw @.atomic usize,
/// Basically (cell_count-1) given at runtime during initialization.
mask : usize,
/// Read and write operations on an atomic object are free from data races.
/// However, if one thread writes to it, all cache lines occupied by the object
/// are invalidated. If another thread is reading from an unrelated object that
/// shares the same cache line, it incurs unnecessary overhead. This is called
/// false sharing, and we pad the MPMC structure to avoid that.
@.(cacheline_size) align
enqueue_pos : @.atomic usize,
/// Padding to avoid false sharing (this is a doc comment too).
@.(cacheline_size) align
dequeue_pos : @.atomic usize,
// ...
}
Documentation can be produced with:
lovage -femit-docs main.lov
Unit tests are resolved by a runner provided during the build process.
Any domain qualified with @.("test") is recognized as a unit test target.
The user may return an error value for the runner to interpret the result of a test.
A test domain must be bound to an identifier; a more readable name
can be given by writing the identifier as a @"name" string literal.
/// The standard library exposes some common operations to simplify unit testing.
\test @import("std").test
/// This is the stable identifier of the test program,
/// with a more or less readable user-defined name.
\@"My cool fun amazing unit test"
/// Any entrypoint to a program must be a domain with an explicit execution model,
/// the "test" environment interprets the function's body as a testing program for build-time invocation.
@.("test") domain
{
// The @.domain body holds any runnable code.
@.try test.expect(true == true)
}
To programmatically skip a test, make a test domain return the error
@error.SkipLovageTest and the default test runner will consider the
test as being skipped.
\@"this test will be skipped"
@.("test") domain
{
drift => test.hints.skip_lovage_test
}
A variable is a unit of memory storage. By default all variables are immutable, and lov-known if possible, unless explicitly set otherwise. Variables are never allowed to shadow identifiers from an outer scope.
Interface binding is defined by in for read-only access,
out for write-only access, and in out for both.
Keyword inout is equivalent to writing in out.
For host applications this has implications on the memory model.
For device shaders this directly maps to registers and other GPU-specific resources.
Using out, in out or out in for values passes them by a mutable reference,
instead of by copy.
For arrays or other complex types (e.g. vectors, matrices), because they are always
passed by reference, out just gives mutable access to them too.
Data has access capabilities that come from the memory model, and are mandatory for indirection (e.g. pointers, buffers, images):
rw defines a fully mutable pointee,
so a read-write access pointer.
ro defines a read-only pointee and is basically
an immutable resource view.
wo defines a write-only pointee and is basically
an unreadable resource sink.
The access capabilities can be optionally assigned to any binding declaration to redefine mutability of the binding.
\val rw s32 = undefined
\a ro s32 = 10
val = a // legal read
// a += 1 // illegal, explicit read-only
// a = 1 // illegal, explicit read-only
\b wo s32 = 10
// val = b // illegal, write-only cannot read
// b += 1 // illegal, write-only cannot read self
b = 1 // legal, we can overwrite value of b
\c rw s32 = 10
val = c // legal read
c += 1 // legal assignment, explicit read-write
c = 1 // legal assignment, explicit read-write
The lov qualifier can be put in place of
ro, rw or wo.
In fact, the shorthand declaration grammar is by itself a lov expression
with an inferred type:
\ident expr
// equivalent to:
\ident lov = expr
A destructuring assignment can separate elements of indexable aggregate types (tuples, arrays, vectors, matrices), for example in assignment:
\main @.("host") domain {
// Both `x` and `y` will have the rightmost declared type (for `z`) of `rw u32`.
\x, \y, \z rw u32 = undefined
\tuple .{1, 2, 3}
x, y, z = tuple
\array [_]u32{4, 5, 6}
x, y, z = array
\vector u32x3{7, 8, 9}
x, y, z = vector
// You can use _ to throw away unwanted values:
_, x, _ = tuple
// You can declare new variables while destructuring:
x, \a ro u32, \b rw s32 = tuple
}
A named function can be defined via this pattern:
(<binding> arg1 : data_type, ...) => expr,
where <binding> is one of:
in // read-only
out // write-only
in out // read/write
out in // read/write, equivalent to `in out`
lov // compile-time
If the binding is not explicitly specified, in is implied.
The compile-time binding enforces that a parameter is one of:
A lambda expression is an anonymous function defined via this pattern:
.( <opt_binding> arg1 : opt_data_type, ...) => expr.
In this case, bindings and explicit data types can be omitted, with the condition that the compiler can properly infer the resulting type of a lambda expression. Compile-time bindings are allowed and encouraged for generalized meta-expressions.
Partial application is the natural way to "bind later":
add(5, _) // .(x) => add(5, x)
.(_ + 1) // .(x) => x + 1
Any binary (or unary) operator wrapped in .() becomes a first-class lambda expression:
.(op) // lowers to:
.(a,b) => a op b // binary
.(a) => op a // unary
A function prototype can be declared only with an explicit result type as the expression:
\add_fn (in x1 : s32, in x2 : s32) => s32
Such implementations are directly compatible with the prototype signature:
\add1 (in x1 : s32, in x2 : s32) => s32 { x1 + x2 }
\add2 add_fn { x1 + x2 }
A function pointer type can be declared over a predefined or inline signature:
\add_pfn ^ro (in x1 : s32, in x2 : s32) => s32
\addp1 ^add_fn = &add1
\addp2 add_pfn = &add2
\addp3 &add1
\addp4 ro = &add2
\addp5 ^ro add_fn { &add1 }
\addp6 add_pfn { &add1 }
| Operator | Description |
|---|---|
a=b | Assignment, both initialization and mutation. |
a+b, a+=b | Addition, can cause illegal overflow. |
a+%b, a+%=b | Wrapping addition, where overflow is legal. |
a+|b, a+|=b | Saturating addition, clamped to the value range of the type inferred from this expression. |
a-b, a-=b | Subtraction, can cause illegal overflow. |
a-%b, a-%=b | Wrapping subtraction, where overflow is legal. |
a-|b, a-|=b | Saturating subtraction, clamped to the value range of the type inferred from this expression. |
-a | Negation, can cause illegal overflow for integers, -1 == 0 - 1. |
-%a | Wrapping negation, where overflow is legal. |
a*b, a*=b | Multiplication, can cause illegal overflow. |
a*%b, a*%=b | Wrapping multiplication, overflow is legal. |
a*|b, a*|=b | Saturating multiplication, clamped to the value range of the type inferred from this expression. |
a/b, a/=b | Division, can cause illegal overflow, division by zero. |
a%b, a%=b | Modulo division, can cause illegal overflow, division by zero. |
a%%b, a%%=b | Remainder division, can cause illegal overflow, division by zero. |
a<<b, a<<=b | Bit shift left. b must be compile-time known or have a type whose bit count is log₂ of the bit count of a. |
a<<|b, a<<|=b | Saturating bit shift left. |
a>>b, a>>=b | Bit shift right. b must be compile-time known or have a type whose bit count is log₂ of the bit count of a. |
a&b, a&=b | Bitwise AND. |
a|b, a|=b | Bitwise OR. |
a~b, a~=b | Bitwise XOR. |
a&~b, a&~=b | Bitwise NAND. |
~a | Bitwise NOT (unary ~). |
!a | Logical boolean NOT, !false == true. |
a and b | Logical AND - if a is false, returns false without evaluating b. Otherwise, returns b, false and true == false. |
a or b | Logical OR - if a is true, returns true without evaluating b. Otherwise, returns b, false or true == true. |
&a | Address of a. |
a^ | Dereference of a. |
a?b | Defaulting optional unwrap - returns b if a is null, otherwise returns the unwrapped value of a. |
a{}, a{...} | Container invocation, e.g. struct init. |
a(), a(...) | Function invocation. |
a[], a[0..N] | Array member access, ranged 0..N slice into array. |
a.|b | Parallel composition, places two domains on top of another without dependencies or other connections between them. |
a.>b | Sequential composition, or function pipeline: capture value of expression a and use it as the first in parameter of b - basically b(a). |
a<~b | Recursive composition, output of b feeds back into a. |
a<:b | Structural split, wires outputs of a to multiple inputs of b. |
a:>b | Structural merge, wires multiple outputs of a to inputs of b. |
a=>b | Logical implication used in lambda expressions, destructuring and match arms. The conclusion of b can be drawn from a. |
a==b | Equality - returns true if a and b are equal, otherwise false. |
a!=b | Inequality - returns false if a and b are equal, otherwise true. |
a>b | Greater than - returns true if a is greater than b. |
a>=b | Greater or equal - returns true if a is greater or equal to b. |
a<b | Less than - returns true if a is less than b. |
a<=b | Less or equal - returns true if a is less than or equal to b. |
a++b | Merge error or bit sets into one new set, concatenate compile-time arrays. |
a**b | Multiply compile-time array by b. |
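The three arithmetic families in the table (plain, wrapping with %, saturating with |) can be compared with a small model outside of Lovage. The Python sketch below is only an illustration, assuming an unsigned 8-bit operand type; the helper names are hypothetical and are not part of any Lovage library.

```python
# Illustrative model of wrapping and saturating arithmetic on an unsigned
# integer that is `bits` wide. All helper names are hypothetical.

def wrap_add(a: int, b: int, bits: int = 8) -> int:
    """Model of a +% b: the result wraps around modulo 2**bits."""
    return (a + b) % (1 << bits)

def wrap_sub(a: int, b: int, bits: int = 8) -> int:
    """Model of a -% b: the result wraps around modulo 2**bits."""
    return (a - b) % (1 << bits)

def sat_add(a: int, b: int, bits: int = 8) -> int:
    """Model of a +| b: the result is clamped to the largest value."""
    return min(a + b, (1 << bits) - 1)

def sat_sub(a: int, b: int, bits: int = 8) -> int:
    """Model of a -| b: the result is clamped to zero."""
    return max(a - b, 0)

print(wrap_add(250, 10))  # 4
print(sat_add(250, 10))   # 255
print(wrap_sub(3, 10))    # 249
print(sat_sub(3, 10))     # 0
```

With the plain a+b and a-b forms, the same inputs would be an illegal overflow instead of producing a value.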
x() x[] x.y x^
a!b a?b
x{}
!x -x -%x ~x &x ^x ?x
* / % %% ** *% *|
+ - ++ +% -% +| -|
<< >> <<|
&~ & ~ |
== != < > <= >=
and
or
<~ <: :>
.|
.>
*%= *|= /= %= %%= += +%= +|= -= -%= -|= <<= <<|= >>= &= &~= ~= |= => =
.|
The parallel composition .| forms the disjoint union of two graphs.
Inputs and outputs are concatenated in order:
inputs(A .| B) = inputs(A) ++ inputs(B)
outputs(A .| B) = outputs(A) ++ outputs(B)
No edges are introduced between A and B.
Parallel composition is an associative operation,
so (A .| (B .| C)) and ((A .| B) .| C) are equivalent.
When no parentheses are used (e.g. A.|B.|C.|D), parsing will use
right-associativity and therefore build internally the expression
(A .| (B .| (C .| D))).
This organization is important to know when using pattern matching techniques
on parallel compositions.
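The interface rules of .| can be modelled with a few lines of Python. This is a sketch under assumptions: a graph is reduced to a hypothetical Block record with an input count, an output count and a function returning a tuple; none of these names exist in Lovage.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a graph node: arity plus behaviour."""
    n_in: int
    n_out: int
    fn: Callable[..., Tuple]

def parallel(a: Block, b: Block) -> Block:
    """Model of a .| b: inputs and outputs are concatenated in order,
    and no value flows between a and b."""
    def fn(*args):
        return tuple(a.fn(*args[:a.n_in])) + tuple(b.fn(*args[a.n_in:]))
    return Block(a.n_in + b.n_in, a.n_out + b.n_out, fn)

add = Block(2, 1, lambda x, y: (x + y,))
neg = Block(1, 1, lambda x: (-x,))

both = parallel(add, neg)
print(both.n_in, both.n_out)  # 3 2
print(both.fn(1, 2, 3))       # (3, -3)

# Both groupings expose the same interface and compute the same outputs.
left = parallel(parallel(add, neg), neg)
right = parallel(add, parallel(neg, neg))
print(left.fn(1, 2, 3, 4) == right.fn(1, 2, 3, 4))  # True
```

The two groupings behave identically, but they are different trees, which is why the parse order matters for pattern matching.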
.>
The sequential composition .> connects the outputs of A to the inputs of B.
A composition is valid when:
outputs(A) = inputs(B)
The graph connects each output of A to the corresponding input of B:
A[i] -> [i]B
The resulting graph exposes:
inputs(A .> B) = inputs(A)
outputs(A .> B) = outputs(B)
Sequential composition is an associative operation,
so (A .> (B .> C)) and ((A .> B) .> C) are equivalent.
When no parentheses are used (e.g. A.>B.>C.>D), parsing will use
right-associativity and therefore build internally the expression
(A .> (B .> (C .> D))).
This organization is important to know when using pattern matching techniques
on sequential compositions.
When applied to functions, .> behaves as a forward pipe:
a .> b .> c
// lowers to: c(b(a))
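The forward-pipe reading of .> corresponds to a left-to-right fold over the stages. The Python sketch below is an illustration only; pipe is a hypothetical helper, and the stages are ordinary single-argument functions.

```python
from functools import reduce

def pipe(value, *stages):
    """Model of value .> s1 .> s2 ...: feed the value through each stage
    in order, so pipe(a, b, c) computes c(b(a))."""
    return reduce(lambda acc, stage: stage(acc), stages, value)

def inc(x):
    return x + 1

def double(x):
    return x * 2

print(pipe(3, inc, double))   # 8
print(double(inc(3)))         # 8, the nested form it lowers to
```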
<~
The recursive composition A <~ B introduces a cycle into the graph
and creates a feedback loop.
This may be interpreted as a fixpoint operation:
(A <~ B)(x) = fixpoint(y => B(A(x,y)))
The exact lowering strategy is execution-model dependent.
The use of feedback must guarantee one of the following:
@.inline.
A feedback composition that cannot be lowered to a valid execution model will result in a compile error.
Execution environments that do not support recursion or cyclic evaluation
may impose additional restrictions on the use of <~.
<:
The split composition A <: B distributes outputs across a larger
number of inputs.
For the operation to be valid, the number of inputs of B must be a multiple
of the number of outputs of A:
outputs(A)*k = inputs(B)
Each input i of B is connected to the output
i mod outputs(A) of A:
A[i mod outputs(A)] -> [i]B
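The split wiring can be enumerated explicitly. The Python sketch below assumes that the index wraps around the number of outputs of A, which is what the validity rule outputs(A)*k = inputs(B) implies; split_wiring is a hypothetical helper name.

```python
def split_wiring(n_out_a: int, n_in_b: int):
    """Return (output of A, input of B) pairs for a model of A <: B.
    Assumes input i of B reads output i mod outputs(A) of A."""
    if n_out_a <= 0 or n_in_b % n_out_a != 0:
        raise ValueError("inputs(B) must be a multiple of outputs(A)")
    return [(i % n_out_a, i) for i in range(n_in_b)]

# A has 2 outputs, B has 6 inputs, so k = 3 and every output is used 3 times.
for src, dst in split_wiring(2, 6):
    print(f"A[{src}] -> [{dst}]B")
```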
:>
The merge composition A :> B is the dual of the split composition.
The number of outputs of A must be a multiple of the number of inputs of B.
outputs(A) = k*inputs(B)
Each output i of A is connected to the input
i mod inputs(B) of B:
A[i] -> [i mod inputs(B)]B
For each input slot j of B:
values = { A[j], A[j + inputs(B)], A[j + 2*inputs(B)], ... }
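The grouping of outputs per input slot can be computed directly from the two counts. The Python sketch below only lists which outputs of A arrive at each input of B; how the arriving values are combined is not specified here, so the model does not combine them. merge_groups is a hypothetical helper name.

```python
def merge_groups(n_out_a: int, n_in_b: int):
    """For a model of A :> B, map each input slot j of B to the list of
    outputs of A that are wired to it: j, j + inputs(B), j + 2*inputs(B), ..."""
    if n_in_b <= 0 or n_out_a % n_in_b != 0:
        raise ValueError("outputs(A) must be a multiple of inputs(B)")
    return {j: list(range(j, n_out_a, n_in_b)) for j in range(n_in_b)}

# A has 6 outputs, B has 2 inputs, so k = 3 outputs arrive at each input.
print(merge_groups(6, 2))  # {0: [0, 2, 4], 1: [1, 3, 5]}
```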
Signed integers (sN): s8, s16, s32, s64, s128.
Unsigned integers (uN): u8, u16, u32, u64, u128.
Floating point (fN): f16, f32, f64, f80, f128.
Exponent/mantissa floating point (eNmM): e4m3, e5m2.
bool, void, type, noreturn.
byte, short, ushort, int, uint, long, ulong, longlong, ulonglong.
Tle or Tbe, where T is any 16-bit or larger primitive scalar type.
TxN where T is any primitive scalar type, and N is row count.
TmMxN where T is any primitive scalar type, M is column count and N is row count.
Arbitrary bit-width integers can be referenced by using an identifier of
s or u followed by digits.
For example, the identifier s7 refers to a signed 7-bit integer.
The maximum bit-width of an integer type is 65535 for CPU targets.
The limit for GPU targets is not yet defined and will be explored later.
A two's complement representation may be used for signed bit-width integers:
https://en.wikipedia.org/wiki/Two's_complement
There are scalar types with forced byte-endianness for values larger
than one byte.
They are constructed from a basic type followed by the suffix le for
little-endian or be for big-endian.
The endian qualifier has no effect for 8-bit types.
Valid endian types are:
s16le, s32le, s64le, s128le, s16be, s32be, s64be, s128be,
u16le, u32le, u64le, u128le, u16be, u32be, u64be, u128be,
f16le, f32le, f64le, f128le, f16be, f32be, f64be, f128be
A value can be byte-swapped between native and forced endianness
by casting between a basic type and an endian type via
@endian_cast().
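What a forced-endian type changes is only the byte order in storage, not the numeric value. The Python sketch below illustrates this for a 16-bit value using the standard int.to_bytes and int.from_bytes methods; it does not model @endian_cast() itself, and byte_swap_u16 is a hypothetical helper.

```python
value = 0x1234

# The same number stored as a little-endian and as a big-endian 16-bit value.
as_le = value.to_bytes(2, "little")
as_be = value.to_bytes(2, "big")
print(as_le.hex(), as_be.hex())  # 3412 1234

def byte_swap_u16(x: int) -> int:
    """Reinterpret the little-endian bytes of x as big-endian."""
    return int.from_bytes(x.to_bytes(2, "little"), "big")

print(hex(byte_swap_u16(0x1234)))  # 0x3412

# Reading the bytes back with the matching order recovers the value.
print(int.from_bytes(as_le, "little") == int.from_bytes(as_be, "big"))  # True
```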
An ordered homogeneous collection of two or more scalars:
TxN for N (row count) between 1 and 4,
and where T is a scalar floating-point or integer type.
Examples of vector types are:
u8x2, s16x3, f32x4, u64x4.
A vector type can be lowered to an array with N elements of type T.
Vector types allow compile-time swizzle and shuffle access to their elements.
A swizzle is written as the vector followed by . and a pattern over [xyzw] or [rgba],
and returns the underlying scalar or vector type, e.g.
v.xyz, v.x, v.yx,
v.rgba, v.bgra, v.xxx.
Mixing xyzw and rgba in one pattern is not allowed, for readability.
This allows vector elements to be reordered or duplicated.
Array indexing is also supported: v[i].
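The swizzle rules can be modelled on a plain tuple. The Python sketch below is an illustration only: swizzle is a hypothetical helper, and it resolves the pattern at runtime, whereas Lovage resolves it at compile time.

```python
def swizzle(vec, pattern: str):
    """Model of v.<pattern>: pick elements by name, allowing reordering and
    duplication. A one-letter pattern yields a scalar, longer ones a tuple.
    The two naming sets (xyzw, rgba) must not be mixed in one pattern."""
    if not pattern:
        raise ValueError("empty swizzle pattern")
    for names in ("xyzw", "rgba"):
        if all(c in names for c in pattern):
            picked = tuple(vec[names.index(c)] for c in pattern)
            return picked[0] if len(picked) == 1 else picked
    raise ValueError(f"invalid or mixed swizzle pattern: {pattern!r}")

v = (1.0, 2.0, 3.0, 4.0)
print(swizzle(v, "x"))     # 1.0
print(swizzle(v, "yx"))    # (2.0, 1.0)
print(swizzle(v, "bgra"))  # (3.0, 2.0, 1.0, 4.0)
print(swizzle(v, "xxx"))   # (1.0, 1.0, 1.0)
```

A mixed pattern such as "xg" raises ValueError in this model, mirroring the rule that the two naming sets cannot be combined.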
An ordered homogeneous collection of vectors:
TmMxN for M (column count) and N (row count) between 1 and 4,
and where T is a scalar floating-point or integer type.
Examples of matrix types are:
s32m2x2,
f32m4x3,
f64m3x2,
f32m4x4,
u16m4x2.
A matrix type can be lowered to an array with N elements of TxM vectors.
Matrix types allow a compile-time slice access to their vectors, independent of the target storage layout. The storage layout can be either row-major or column-major, and can be chosen from GPU target-specific packing. By default it is column-major.
Matrix followed by . and an axis [a0|a3]
decides how the access is lowered, then the sliced vector can be accessed:
M.a0 is a slice along axis 0, so the default storage layout access, equivalent to M.
M.a1 is a slice along axis 1, so transpose + access.
The vector (or submatrix) is then accessed via . and
[v0|3] syntax, e.g.
M.v0 a vector,
M.v03 a Tm2xN matrix,
M.v0123 a Tm4xN matrix (itself).
Specific storage layout can be enforced regardless of the GPU target by
@row_major or @column_major.
When targets are ambiguous on what matrix storage layout they prefer,
and the user does not specify one, the compiler defaults to row-major order,
as it is the only portable layout across targets.
Any matrix can be translated from TmMxN to TmNxM
(so between column-major and row-major orders) via @transpose.
[N]T array of N length known at compile-time; arr[i..N]
[N][M]T multidimensional array of lengths N and M (or more) known at compile-time; arr[i..N][j..M]
[N:X]T sentinel-terminated array of length N known at compile-time
[_]T = {...} array of compile-time inferred length from an initializer list
[_:X] = {...} sentinel-terminated array of compile-time inferred length from an initializer list
Defined by ^A T, where T is the pointee type and
A is the memory model applicable to the pointee.
Acceptable A words are:
ro read-only access.
wo write-only access.
rw read/write access.
Pointers are defined as follows; pointers to single objects are distinguished from pointers to arrays of objects.
^A T pointer to an object of type T; Dereference by ptr^.
[^]A T pointer to an array or slice of type T; Dereference by ptr[i].
[^N]A T pointer to an array/slice of N objects; Dereference by ptr[i..N].
[^N:X]A T pointer to a sentinel-terminated array/slice of N objects; Dereference by ptr[i..(N+X)].
In comparison to C, for example:
\ptr rw [^]rw u8 is the equivalent of unsigned char *ptr in C,
\ptr [^]ro u8 is the equivalent of unsigned char const *const ptr in C.
A slice is a view defined by a pointer and a length.
The difference between an array and a slice is that the array's length
is part of the type and known at compile-time, whereas the slice's length is known at runtime.
The length of both can be read through the len field.
The underlying pointer of a slice can be read through the ptr field.
Another clear distinction is that a slice is a view, explicitly stated by its need
for an access capability (ro/rw/wo) under A:
[]A T slice (contains pointer of type [^]A T and a length); slice[start..end].
[:X]A T sentinel-terminated slice.
A pair of brackets {} by itself declares a container.
Containers can be semantically specialized with different kinds of builtin dot @. qualifiers.
A container is a namespace that is used to access compile-time fields.
If it defines runtime storage, it may be used as a non-void data type as well.
The builtin function @self() may be used to refer to the definition
of the innermost container.
A plain (unqualified) container is a POD struct. Field order is preserved, and storage fields define the runtime layout. The container may contain other compile-time declarations (containers, functions, variables) and these are part of the container's namespace.
\Transform {
position : f32x3,
rotation : f32x4,
scale : f32x3,
\identity @self{
.position = .{ 0.0, 0.0, 0.0 },
.rotation = .{ 0.0, 0.0, 0.0, 1.0 },
.scale = .{ 1.0, 1.0, 1.0 },
}
}
Tuples, or anonymous containers, are declared with .{} syntax
and with compile-time known field ordering.
\point ro = .{ 10, 20 }
\color ro = .{ 1.0, 0.5, 0.2, 1.0 }
\x, \y ro s32 = point
Anonymous containers may contain heterogeneous values.
A packed container declared with @.packed has no alignment implications,
and does not conform to the C ABI.
This allows extremely precise control over storage fields,
placing them on a bit-by-bit basis.
The @.packed qualifier accepts an optional argument with a backing integer type.
\MovementState @.packed {
running : bool,
crouching : bool,
jumping : bool,
in_air : bool,
}
Inside packed containers, integers take their bit-width in space
(i.e. a u12 has an @bit_sizeof of 12).
Bools also take up 1 bit, so explicit bit flags can be implemented easily.
\PackedHeader @.(u32) packed {
version : u4,
flags : u12,
size : u16,
}
Packed containers may have target-ABI defined restrictions regarding atomic fields, references to sub-byte fields, target alignment guarantees and pointer aliasing behaviour.
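The bit budget of PackedHeader (4 + 12 + 16 = 32 bits) can be checked with a small model. The Python sketch below assumes that the first declared field occupies the least significant bits of the backing u32; that ordering is an assumption made for illustration and is not stated here. pack_header and unpack_header are hypothetical helpers.

```python
def pack_header(version: int, flags: int, size: int) -> int:
    """Pack version:u4, flags:u12, size:u16 into one 32-bit word.
    Assumed layout: version in bits 0..3, flags in 4..15, size in 16..31."""
    if not (0 <= version < 1 << 4):
        raise ValueError("version does not fit in u4")
    if not (0 <= flags < 1 << 12):
        raise ValueError("flags does not fit in u12")
    if not (0 <= size < 1 << 16):
        raise ValueError("size does not fit in u16")
    return version | (flags << 4) | (size << 16)

def unpack_header(word: int):
    """Inverse of pack_header under the same assumed layout."""
    return word & 0xF, (word >> 4) & 0xFFF, (word >> 16) & 0xFFFF

word = pack_header(1, 0xABC, 0x1234)
print(hex(word))            # 0x1234abc1
print(unpack_header(word))  # (1, 2748, 4660)
```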
A union container declared with @.union defines overlapping runtime storage.
All fields in a union share the same memory region.
\Value @.union {
i : s32,
u : u32,
f : f32,
}
The size of a union is equal to the size of its largest field. The alignment of a union is equal to the strictest alignment requirement among its fields.
Reading from a field different than the one most recently written is target-ABI defined. Unions do not automatically store runtime tag information.
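The size rule and the overlapping storage can be observed with a C-compatible union. The Python sketch below uses the standard ctypes module to mirror the three fields of Value; it shows the behaviour of a C union on the machine running the script, which is one possible target-ABI behaviour and not a statement about Lovage on every target.

```python
import ctypes

class Value(ctypes.Union):
    """C-layout mirror of the Value union: three overlapping 32-bit fields."""
    _fields_ = [
        ("i", ctypes.c_int32),
        ("u", ctypes.c_uint32),
        ("f", ctypes.c_float),
    ]

# The union is as large as its largest field.
print(ctypes.sizeof(Value))  # 4

v = Value()
v.i = -1
print(hex(v.u))  # 0xffffffff, the same bits read as unsigned

v.f = 1.0
print(hex(v.u))  # 0x3f800000, the IEEE 754 bit pattern of 1.0
```

Nothing in the storage records which field was written last, which is the gap that the tagged variant container fills.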
A variant container declared with @.variant is effectively a tagged union.
They hold an additional value that can be exactly one of several variants,
each optionally carrying payload data.
\Shape @.variant {
circle : { c : f32x3, r : f32 },
rect : f32x4,
something : u32,
none,
}
The hidden tag indicates which variant is active.
The size of the variant is determined from the tag plus the same
size/alignment rules that apply to a union.
Direct field access is only valid if the variant is known,
for example via the match expression:
\x Shape.circle{ .c = .{0.0, 0.0, 0.0}, .r = 1.0 }
match x {
.circle => ...,
.rect => ...,
.none => ...,
else => ...,
}
Enums are tag-only unions without payloads, declared with @.enum:
\Color @.enum {
red,
green,
blue,
}
Enums are stored as an integer tag, so exhaustiveness is always decidable in match patterns. Optionally, enums can have an underlying base type:
\CompareOp @.(u8) enum {
Never,
Lesser,
LesserEqual,
Equal,
Greater,
GreaterEqual,
Always,
}
An opaque container declared with @.opaque defines a type with unknown storage layout.
Opaque containers are intended for ABI boundaries, external runtime handles,
incomplete native types and implementation hiding.
\Window @.opaque {}
The size and alignment of an opaque container are unknown at compile-time unless provided externally. Opaque containers cannot be instantiated directly, cannot expose runtime storage fields, and are typically used behind pointers.
\window_handle rw ^rw Window = null
A domain container is a declarative context of execution with an optional execution model argument:
\main @.("host") domain {
@import("std").dbg.print("hello world!\n", .{})
}
The execution model determines runtime environment, calling conventions, resource bindings, memory semantics, synchronization behaviour, and target lowering rules.
The parameter is a string name of the execution model. Such a string parameter is required for the domain to be recognized as a root node in the project build graph, making it a potential compilation unit.
Currently recognized parameters are:
"host" // native host application or library.
"kernel" // GPU kernel, e.g. OpenCL.
"compute" // compute shader stage.
"vertex" // vertex shader stage.
"pixel" // pixel/fragment shader stage.
"geometry" // geometry shader stage.
"tescontrol" // tessellation control shader stage.
"teseval" // tessellation evaluation shader stage.
"amplification" // amplification/task shader stage.
"mesh" // mesh shader stage.
"raygen" // ray generation shader stage.
"intersect" // ray intersection shader stage.
"anyhit" // ray any-hit shader stage.
"closesthit" // ray closest-hit shader stage.
"miss" // ray miss shader stage.
"test" // unit test program.
In the future, additional execution models could be defined by the user from within the build system.
Any domain is allowed to inherit the context of another domain, provided no circular dependencies occur. This includes all global variables, internal definitions in the domain, and especially the descriptor model defined within it. It's a convenient way of isolating global definitions behind a shared execution context that can be inherited by domains serving as program entry points:
\MyContext @.domain {
... TODO descriptor model
}
\vert @.("vertex", MyContext) domain {
...
}
\pix @.("pixel", MyContext) domain {
...
}
Conditional branching is expression-based and returns a value when used. Parentheses are not necessary to hold the condition.
if (cond1) {
expr1
} else if (cond2) {
expr2
} else {
expr_default
}
// equivalent to:
if cond1 expr1 else if cond2 expr2 else expr_default
Condition must evaluate to bool or @unreachable.
Values of number literals are interpreted as value != 0 in conditions.
The first matching branch is executed.
Else is optional, but required when if is used as an expression
and not all branches evaluate to unreachable.
Conditionals can be used to unwrap optional types:
\a rw ?u32 = null
if a \_ @unreachable
They also can be used to unwrap error unions.
else by itself can be used to catch an unwrapped error value and run a fallback branch.
Using if ... else with error unions means that the unwrapped value branch comes before the error fallback.
\MyError @.err { SomeErrorCode }
\may_fail (fail : bool) => s32!MyError {
if fail MyError.SomeErrorCode else 1
}
// `expr else fallback` by itself is equivalent to:
// `if expr \val val else \_ fallback`
\x ro s32 = may_fail(false) else @unreachable
if may_fail(true) \val @unreachable else \e drift => e
// `@.try <expr>` is sugar for `<expr> else \e drift => e`
@.try may_fail(true)
The drift keyword is the primary control-transfer construct.
It replaces traditional return, break and continue
by treating control flow as graph redirection.
A drift transfers the current computation result out of its enclosing scope
and resumes execution at a target boundary.
Its type is noreturn.
The basic form is drift <expression>.
This immediately exits the current invocation and produces the value of
<expression> as the output.
\fun (in ptr : ?[^]rw s32, in at : s32) => s32!MyError {
if ptr \val {
drift => ptr[at]
}
// Last expression of the scope is always the result.
MyError.NullPointer
// implicit: `drift out => MyError.NullPointer`
}
Combining drift with in or out directions
is equivalent to C-like break/continue statements:
drift in // continue
drift out // break
These fine-grained drift operations work on the current scope.
Scopes may be explicitly labeled to allow directed control transfer.
A labeled scope is written as :label { ... }, for example:
:loop {
if done {
drift in:loop => 0
}
drift out:loop => 1
}
Unscoped labels are also legal:
:do_add
val += counter
// later can be jumped into with:
drift :do_add
In this case, in and out have exactly the same behaviour,
and may even be omitted.
A particular feature is the ability to compute drift labels at runtime. Their legality is defined by target capabilities (e.g. limited on GPUs).
For example, this simple VM could use computed drift for indirect dispatch:
\interp (in code : [^]ro u8, in initval : s32) => s32 {
\dispatch [_]^ro void = .{
do_halt,
do_inc,
do_dec,
do_mul2,
do_div2,
do_add7,
do_neg,
}
\pc rw s32 = 0
\val rw s32 = initval
:do_halt
drift => val
:do_inc
val += 1
pc += 1
drift :dispatch[code[pc]]
:do_dec
val -= 1
pc += 1
drift :dispatch[code[pc]]
:do_mul2
val *= 2
pc += 1
drift :dispatch[code[pc]]
:do_div2
val /= 2
pc += 1
drift :dispatch[code[pc]]
:do_add7
val += 7
pc += 1
drift :dispatch[code[pc]]
:do_neg
val = -val
pc += 1
drift :dispatch[code[pc]]
}
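The same table-driven dispatch can be modelled without computed labels by indexing a list of handlers. The Python sketch below is an illustration only: it assumes that execution starts by dispatching on code[0], that opcode 0 halts, and that division truncates toward zero; it does not model s32 overflow.

```python
def interp(code: bytes, initval: int) -> int:
    """Table-driven model of the VM: each opcode indexes a handler,
    mirroring the order of the dispatch table in the Lovage listing."""
    handlers = [
        None,                  # 0: do_halt (handled by the loop below)
        lambda v: v + 1,       # 1: do_inc
        lambda v: v - 1,       # 2: do_dec
        lambda v: v * 2,       # 3: do_mul2
        lambda v: int(v / 2),  # 4: do_div2, truncation is an assumption
        lambda v: v + 7,       # 5: do_add7
        lambda v: -v,          # 6: do_neg
    ]
    pc, val = 0, initval
    while True:
        handler = handlers[code[pc]]  # the indirect dispatch step
        if handler is None:
            return val
        val = handler(val)
        pc += 1

print(interp(bytes([1, 3, 5, 0]), 1))  # 11: ((1 + 1) * 2) + 7
print(interp(bytes([6, 4, 0]), 8))     # -4: (-8) / 2
```

The loop plays the role of the repeated drift :dispatch[code[pc]] at the end of every label; with computed drift each handler jumps straight to the next one instead of returning to a central loop.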
A defer executes another expression unconditionally at scope exit:
\print @import("std").dbg.print
\defer_unwind () => void {
print("\n", .{})
defer { print("1 ", .{}) }
defer print("2 ", .{})
if false {
// defers don't run if they are never encountered
defer { print("3 ", .{}) }
}
} // prints: `2 1`.
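The ordering rule (last registered runs first, unreached defers never run) matches how Python's contextlib.ExitStack unwinds callbacks, which makes it a convenient model. The sketch below is an analogy, not Lovage semantics in full; it collects output in a list instead of printing.

```python
from contextlib import ExitStack

def defer_unwind():
    """Model of the defer example: callbacks registered on the stack
    run in reverse order when the with-block (the scope) exits."""
    out = []
    with ExitStack() as scope:
        scope.callback(out.append, "1")      # defer { print("1 ") }
        scope.callback(out.append, "2")      # defer print("2 ")
        if False:
            # Never reached, so this callback is never registered.
            scope.callback(out.append, "3")
    return out

print(" ".join(defer_unwind()))  # 2 1
```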
Inside a defer expression, invoking a drift is not allowed:
\defer_invalid () => s32! {
defer {
drift => MyError.SomeErrorCode // illegal
}
drift => 0
}
Specializations are available depending on the capture operator.
For example, defer! \err ... can be used to only run the deferred expression
if the current scope result is an error code.
Likewise, defer? ... can be used to only run the deferred expression
if the current scope result is a null literal.
\assert @import("std").dbg.assert
\capture_error (in captured : ^wo ErrorName) => void!ErrorName {
defer! \err {
captured^ = err
}
ErrorName.Failure
}
\foo () => void! {
\captured rw ?ErrorName = null
if capture_error(&captured) @unreachable else \err {
assert(ErrorName.Failure == captured?)
else \captured_value drift => captured_value
assert(ErrorName.Failure == err)
else drift => ErrorName.Failure
@.try assert(captured? == err)
}
}
Match is an expression-based multi-branch control construct that selects
a branch based on pattern compatibility with a scrutinee value.
It replaces switch cases otherwise found in C.
Example shape (compile-time is allowed):
lov match x {
pattern1 => expr1,
pattern2 if cond => expr2,
else => expr_default,
}
The scrutinee is evaluated exactly once. Arms are evaluated top to bottom, and for each arm a pattern match is attempted.
If a pattern matches, its guard (if present) is evaluated and the arm is selected if the guard passes. Otherwise evaluation continues to the next arm.
If no arm matches, and an else arm exists, it will be used.
Otherwise an implicit @unreachable is invoked.
A pattern may include literals, identifiers (bindings), wildcards,
structured patterns (for tuples, enums, errors, etc.),
and possibly lov-evaluable patterns.
The entire match may be lov, allowing one or more branches to be evaluated at compile time.
The compiler verifies if all cases are covered for statically exhaustive scrutinees, such as enums or errors.
Match can be an expression, e.g. to assign a value.
If used as a statement, each arm must evaluate to void or noreturn.
\y match x {
0 => 10,
1 => 20,
z => 69,
else => 30,
}
Variables introduced in patterns are scoped to the arm, are immutable unless explicitly declared otherwise, and are only valid inside the arm expression or block.
For compound types:
match p {
{x, y} => ...
}
or:
match p {
Point {x, y} => ...
}
Semantically, fields are matched structurally and their subcomponents are bound to names.
Guards refine patterns with conditions. A pattern must match first, then the guard is evaluated. Guards must be pure (no side effects assumed):
match x {
n if n > 0 => ...
n if n < 0 => ...
else => ...
}
Unlike C-style switch, match has no fallthrough,
each arm is isolated, and exactly one arm is executed.
Compile-time match must work on patterns that are themselves compile-time evaluable. Match can be used in lov contexts to generate types or literal values, or to eliminate branches from the build process.
Matches can be labeled, and drift instructions can be used for explicit control flow:
:sw match x {
5 => drift in:sw => 4,
2...4 => \val {
if val > 3 {
drift in:sw => 2
} else if val == 3 {
drift out:sw
}
drift in:sw => 1
},
1 => drift,
else => @unreachable,
}
This is especially useful for state machines, e.g. lexers. Semantically this is equivalent to the following loop:
\sw rw s32 = 5
:loop {
match sw {
5 => {
sw = 4
drift in:loop
},
2...4 => \val {
if val > 3 {
sw = 2
drift in:loop
} else if val == 3 {
drift out:loop
}
sw = 1
drift in:loop
},
1 => drift,
else => @unreachable,
}
}
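To make the control flow concrete, here is a Python trace of the same state machine. It is a sketch under two assumptions that the text does not state outright: the range 2...4 includes both ends (which the val > 3 and val == 3 checks suggest), and a bare drift simply leaves the match. The function name run_state_machine is hypothetical.

```python
def run_state_machine(start):
    """Return the sequence of scrutinee values the labeled match visits."""
    visited = []
    sw = start
    while True:                  # models the labeled :loop block
        visited.append(sw)
        if sw == 5:
            sw = 4               # drift in:sw => 4
            continue
        elif 2 <= sw <= 4:       # ASSUMPTION: 2...4 is inclusive
            val = sw
            if val > 3:
                sw = 2           # drift in:sw => 2
                continue
            elif val == 3:
                break            # drift out:sw
            sw = 1               # drift in:sw => 1
            continue
        elif sw == 1:
            break                # ASSUMPTION: bare drift leaves the match
        else:
            raise AssertionError("unreachable")
    return visited

print(run_state_machine(5))  # prints [5, 4, 2, 1]
```

Starting from 5, the match is re-entered three times before it terminates, and starting from 3 it exits immediately.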
A while loop is used to repeatedly execute an expression until some condition
is no longer true.
The syntax while cond \capture : defer_expr {} may be used to always
execute a tail expression after running the block expression,
and before checking the while condition again.
\i, j rw usize = .{0, 100}
while i < 10 and j > 50 : j -= 1 {
i += 1
}
This is semantically equivalent to the following explicit loop:
\i, j rw usize = .{0, 100}
:loop {
if i < 10 and j > 50 {
defer j -= 1
i += 1
drift in:loop
}
}
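The same loop can be traced in Python, with try/finally standing in for defer so that the tail expression runs after the body and before the condition is checked again. This is a sketch of the described behaviour only, and run_loop is a hypothetical name.

```python
def run_loop(i, j):
    while i < 10 and j > 50:
        try:
            i += 1      # loop body
        finally:
            j -= 1      # tail expression, runs before the condition is rechecked
    return i, j

print(run_loop(0, 100))  # prints (10, 90)
```

The body runs ten times, so the tail expression also runs ten times, leaving j at 90.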
When a while loop is labeled, it can be referenced from a
drift statement within a nested loop:
:outer while true {
while true {
drift out:outer // break
}
}
\i rw usize = 0
:outer while i < 10 : i += 1 {
while true {
drift in:outer // continue
}
}
Just like if expressions, while loops can take
an optional or an error union as the condition and capture the payload:
\optional rw ?u32 = 10
while optional \val : val -= 1 {
// ... optional? == val
} else {
// ... null, optional? == @unreachable
}
\error_union u32!SomeErrorCode = 10
while error_union \val {
// ... valid number, no error
} else \err {
// ... error
}
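A rough Python analogue of the optional-capturing loop is shown below. Python's while/else runs the else branch once the condition becomes false, which mirrors the null case described above. The drain function and the next_value callback are hypothetical and only illustrate the shape of the loop.

```python
from typing import Callable, Optional

def drain(next_value: Callable[[], Optional[int]]):
    total = 0
    finished = False
    while (val := next_value()) is not None:   # capture the payload as val
        total += val
    else:
        finished = True                        # condition produced no payload (null)
    return total, finished

items = iter([1, 2, 3])
print(drain(lambda: next(items, None)))  # prints (6, True)
```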
A for loop can be used to repeatedly execute an expression by iterating over a slice or range of literal values:
for 0..10 \i
a[i] = b[i]
Just like while loops, a for loop can be labeled:
\count rw usize = 0
:outer for 1..9 \i {
for 1..9 \j {
count += i + j
drift out:outer // break
}
}
:outer for 1..9 \i {
for 1..9 \j {
count += i + j
drift in:outer // continue
}
}
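Python has no labeled loops, so the sketch below emulates the two drift forms with a flag and a plain break. To avoid guessing whether the range 1..9 includes its upper bound, the functions take an explicit list of values. The names break_outer and continue_outer are hypothetical.

```python
def break_outer(values):
    count = 0
    for i in values:
        stop = False
        for j in values:
            count += i + j
            stop = True      # models drift out:outer
            break
        if stop:
            break            # leave the outer loop as well
    return count

def continue_outer(values):
    count = 0
    for i in values:
        for j in values:
            count += i + j
            break            # models drift in:outer: abandon the inner loop, take the next i
    return count

print(break_outer([1, 2, 3]))     # prints 2
print(continue_outer([1, 2, 3]))  # prints 9
```

With drift out:outer only the first (i, j) pair is visited. With drift in:outer each i is paired with the first j only.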
Lovage places great importance on whether an expression is known at compile-time. This one concept is used in a few places, and it keeps the language small, readable and powerful.
Compile-time parameters are how generics are implemented:
\max (lov T : type, in a : T, in b : T) => T {
if a > b { a } else { b }
}
\bigger_float (in a : f32, in b : f32) => max(f32, a, b)
\bigger_uint64 (in a : u64, in b : u64) => max(u64, a, b)
Types can be assigned to variables, passed as parameters to functions,
and returned from them.
However, they can only be used in expressions which are known at compile-time,
which is why the parameter T in the above snippet must be marked with lov.
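The call shape of max can be mirrored in Python by passing the type as an ordinary first argument. Python has no compile-time parameters, so this sketch checks the type at run time. It illustrates only how the specialised functions are derived, and max_of, bigger_float and bigger_int are hypothetical names.

```python
from functools import partial

def max_of(T, a, b):
    # T stands in for the lov type parameter. Python checks it at run time,
    # whereas the text describes T as known at compile-time.
    if not (isinstance(a, T) and isinstance(b, T)):
        raise TypeError(f"both arguments must be {T.__name__}")
    return a if a > b else b

# Specialisations, analogous to bigger_float and bigger_uint64 above.
bigger_float = partial(max_of, float)
bigger_int = partial(max_of, int)

print(bigger_float(1.5, 2.5))  # prints 2.5
print(bigger_int(7, 3))        # prints 7
```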
A lov parameter means that:
A lov expression may be used to guarantee that the expression
will be evaluated at compile-time.
If this cannot be accomplished, the compiler will emit an error:
\exit @.extern () => noreturn
\foo @.("test") domain {
lov {
exit() // illegal
}
}
It does not make sense that a program could call exit(),
or any other external function, at compile-time.
However, a lov expression does much more than that and may cause compile errors for additional reasons.
Within a lov expression:
lov variables.
if, while, match, lambda expressions, and containers are evaluated at compile-time, or emit a compile error if this is not possible.
drift expressions are invalid (unless the function itself is called at compile-time).
This means the programmer can create a function which is called both at compile-time and run-time with no modification to the function required.
Variables can be labeled as lov.
This guarantees to the compiler that every load and store of the variable
is performed at compile-time.
Any violation of this results in a compile error.
This, combined with the fact that we can @.inline loops or split/merge operations,
allows us to write a function which is partially evaluated at compile-time
and partially at run-time, for example:
\test @import("std").test
\CmdFn {
name : []ro u8,
fn : ^ro (in : s32) => s32,
}
\cmd_fns ro [_]CmdFn = .{
CmdFn{ .name = "one", .fn = one },
CmdFn{ .name = "two", .fn = two },
CmdFn{ .name = "three", .fn = three },
}
\one (in val : s32) => s32 { val + 1 }
\two (in val : s32) => s32 { val + 2 }
\three (in val : s32) => s32 { val + 3 }
\performfn (
lov prefix_char : u8,
in start_value : s32,
) => s32 {
\pipeline lov rw = .(x) => x
@.inline for 0..cmd_fns.len \i
if cmd_fns[i].name[0] == prefix_char
pipeline = pipeline .> cmd_fns[i].fn
start_value .> pipeline
}
\@"perform fns" @.("test") domain {
@.try test.expect(performfn('t', 1) == 6)
@.try test.expect(performfn('o', 0) == 1)
@.try test.expect(performfn('w', 67) == 67)
}
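The split between the two phases can be sketched in Python by building the pipeline first and applying it afterwards. In this model build_pipeline plays the part that the text describes as resolved at compile-time, and applying the result to start_value plays the run-time part. The helper names build_pipeline and CMD_FNS are hypothetical, and Python of course performs both steps at run time.

```python
def one(val):
    return val + 1

def two(val):
    return val + 2

def three(val):
    return val + 3

CMD_FNS = [("one", one), ("two", two), ("three", three)]

def build_pipeline(prefix_char):
    # Selection depends only on prefix_char, which is the lov parameter above.
    selected = [fn for name, fn in CMD_FNS if name[0] == prefix_char]

    def pipeline(x):
        for fn in selected:      # apply the selected functions in table order
            x = fn(x)
        return x

    return pipeline

def performfn(prefix_char, start_value):
    return build_pipeline(prefix_char)(start_value)

print(performfn("t", 1))   # prints 6  (two, then three)
print(performfn("o", 0))   # prints 1  (one)
print(performfn("w", 67))  # prints 67 (nothing selected)
```

The three calls reproduce the expectations of the test domain above.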
When a local variable is immutable, then after initialization the variable's value must not change.
If the initialization value of the immutable variable is lov-known,
then the variable is also lov-known.
The asm keyword emits inline assembly. It is valid for all targets
(C, WASM, SPIR-V and native CPU), so target-specific assembly should be selected with compile-time branches.
Dissecting the syntax:
\syscall1 (in number : usize, in arg1 : usize) =>
@.volatile
asm (
\\syscall
:
[ret]
"={rax}"
(=> usize),
: [number] "{rax}" (number),
[arg1] "{rdi}" (arg1),
: .{ .rcx = true, .r11 = true })
Different qualifiers may be acceptable here.
One clear example is @.volatile.
This tells the compiler that the inline assembly expression has side effects.
Without @.volatile, Lovage is allowed to delete the inline assembly code
if the result is unused.
Inline assembly is an expression which returns a value,
and the asm keyword begins the expression.
The first argument is a lov string containing the assembly code.
Inside this string one may use placeholders such as
%[ret], %[number] or %[arg1]
where a register is expected, to specify the register that Lovage uses
for the argument or return value when register constraint strings are used.
A literal percent sign can be written as %%.
Multiline string syntax is often useful here.
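The placeholder rules can be pictured as a simple textual substitution. The Python sketch below is only a model of that idea, not a description of how the Lovage compiler processes the string. The expand_template function and the register assignment passed to it are hypothetical.

```python
import re

def expand_template(template, registers):
    """Replace %[name] with the register chosen for name, and %% with %."""
    def replace(m):
        if m.group(0) == "%%":
            return "%"                   # literal percent sign
        return registers[m.group(1)]     # symbolic name -> register
    return re.sub(r"%%|%\[(\w+)\]", replace, template)

print(expand_template("mov %[number], %[ret] ; 100%%",
                      {"number": "rax", "ret": "rdi"}))
# prints "mov rax, rdi ; 100%"
```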
The output section follows the first colon. It is allowed for there to be no outputs, in which case this colon would be directly followed by the input section.
The identifier inside square brackets specifies the symbolic name
used by the %[name] syntax within the assembly string.
In the example above, the output name is ret.
The output constraint string specifies how the output value is bound.
In the example, "={rax}" means that the result value
of the inline assembly expression is whatever is stored in the rax register.
After the constraint comes either a value binding or => followed by a type.
The type becomes the result type of the inline assembly expression.
If a value binding is used, then the symbolic output name may be referenced
inside the assembly string.
The input section follows. Input constraints specify which registers contain the values of the provided expressions when the assembly code executes. Any number of input parameters is allowed, including none.
The final section is the clobber list. Clobbers declare registers whose values are not preserved by the assembly code. These do not include input or output registers.
The special clobber value memory indicates that the assembly writes
to arbitrary undeclared memory locations, not only memory reachable through
declared indirect outputs.
In the example above, rcx and r11 are listed because
the kernel syscall ABI does not preserve those registers.