Mojo’s ability to interface directly with the Multi-Level Intermediate Representation opens up a whole new programming paradigm with custom bitwidth integers.

Mojo is a low-level, system programming language specifically focusing on Artificial Intelligence applications. It was invented by Chris Lattner, the creator of LLVM and the Swift programming language.

MLIR - Multi-Level Intermediate Representation

One of the most interesting aspects of Mojo is its direct interface with the Multi-Level Intermediate Representation (MLIR). Both LLVM-Intermediate Representation (LLVM-IR) and MLIR are low-level intermediate representations that kind of read like assembly but are way more human-readable.

MLIR is further sub-categorized into many different ‘dialects’, e.g. arith dialect for compiler operations related to arithmetic. Mojo’s Int type is a wrapper around the index dialect.

One very interesting application of this is that it enables developers to write a type that represents unsigned (it can be signed as well) integers of a custom bitwidth, e.g. u7, representing an unsigned integer of exactly 7 bits.

Why custom bitwidth integers?

In performance-critical applications and especially those using FPGA, one might be interested in having custom bitwidth integers to avoid wasting bits as logic gates are an exceptionally valuable resource in these. 1

Implementation in Mojo

I’ll show an example of an implementation of U24 in Mojo.

Why U24?

An integer that is exactly 24 bits is not within the standard library.

The Mojo’s standard library’s sizeof function returns the size of the type in bytes, hence, choosing 24 bits is to show that indeed the U24 takes up exactly 24 bits = 3 bytes, else, a 7 bits integer will have sizeof function returning 1 byte as its result.

Constructor

@register_passable("trivial")
struct U24[N: Int]:
    var value: __mlir_type.ui24

    alias MAX: Int = 2**24 - 1
    alias ZERO: Int = 0

    alias greater_than_16777215_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with an integer value greater than 16777215"
    alias lesser_than_0_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with a negative integer value"

    @always_inline("nodebug")
    fn __init__() -> Self:
        constrained[N <= Self.MAX, Self.greater_than_16777215_err_msg]()
        constrained[N >= Self.ZERO, Self.lesser_than_0_err_msg]()
        return Self {
            value: __mlir_op.`index.castu`[_type = __mlir_type.ui24](N.__mlir_index__())
        }
@register_passable("trivial")
struct U24[N: Int]:
    var value: __mlir_type.ui24

The decorator @register_passable(“trivial”) makes sure that the value of the type can be easily passed into the machine’s registers. This is an optimization that not even Rust does. 2

The type U24 is defined as a generic struct, with a generic type parameter that is bounded to the in-built Int type (Mojo does not have traits yet, which it will soon).

The var value: __mlir_type.ui24 declares a field belonging to the U24 struct with a name value. The magical part is that a developer can declare the type of value to be __mlir_type.ui24, representing a 24-bit unsigned integer (‘ui’) in the MLIR’s Builtin dialect 3.

alias MAX: Int = 2**24 - 1
alias ZERO: Int = 0

alias greater_than_16777215_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with an integer value greater than 16777215"

alias lesser_than_0_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with a negative integer value"

The next few lines declare compile time constants using the keyword alias. alias is similar to C++’s constexpr and consteval, TypeScript’s infer and Rust’s const keywords. It instructs the Mojo compiler to compute the value of the expression at compile time.

@always_inline("nodebug")
fn __init__() -> Self:
    constrained[N <= Self.MAX, Self.greater_than_16777215_err_msg]()
    constrained[N >= Self.ZERO, Self.lesser_than_0_err_msg]()
    return Self {
        value: __mlir_op.`index.castu`[_type = __mlir_type.ui24](N.__mlir_index__())
    }

The constructor fn __init__() → Self: … is fairly easy to understand. First, we always inline the function using the decorator @always_inline, then within the function body, we use the function constrained to check for the bounds. Mojo has two ‘spaces’, the ‘parameter’ space (which operates during compile time and the ‘value’ space (which operates during run time). Mojo functions can do computations in both spaces, the [] accepts arguments for the ‘parameter’ space, and () accepts arguments for the ‘value’ space 4. Here, we check if the type argument N passed in is larger than 2 ** 24 - 1 i.e. the largest integer value that an unsigned 24-bit integer can represent, if it is indeed larger, we raise a COMPILE TIME error.

This is pretty cool, that we can seemingly be doing compile-time metaprogramming with extremely simple and Pythonic syntax and with great functionality like creating our own custom bitwidth integers.

We might then use the struct U24 like so:

print("Size of `sizeof[U24]`: ", sizeof[U24[0]]()) ## prints '3'
alias c = U24[600]()
print("c = ", c.__into_int__())  ## Should print "c = 600"

If you attempt to construct a value that is larger than what U24 can represent, a compile-time error is raised, like so:

## Compile time error: `constraint failed: Unable to parameterize a value of type `U24` with an integer value greater than 16777215`
## alias e = U24[U24[0].MAX + 1]()

#Adding U24s What’s the point of having numbers if we cannot add them?

@always_inline("nodebug")
fn __add__[M: Int](self: Self, other: U24[M]) -> U24[Self.N + M]:
    return U24[Self.N + M]()

Above is a code snippet showing the implementation of the dunder (short for double underscore) __add__ method. You can think of dunder methods as interface methods to change the runtime behaviour of types, e.g. by implementing __add__, we are telling the Mojo compiler, how to add two U24s.

For this implementation, it is pretty straightforward.

First, the function is generic over the type parameter M, which itself is bounded to the Int type. The function value parameters are self-explanatory. The return type U24[Self.N + M] is what is interesting. We can directly add generic type parameters at compile time. This feature known as compile-time evaluation of const generics is still not present in Rust.

Second, the returned value of the function is simply the concrete type, i.e. U24[Self.N + M], initialized.

Conclusion

Mojo is a language that aims to be almost everything. It is simple enough to understand compile-time metaprogramming, it is a compiled language yet is runnable like Python with a REPL, it also aims to make it memory-safe with future lifetime annotations and allows for both dynamically typed and statically typed codes to run in the same program.

Yet, I am totally mind-blown by Mojo’s ability to allow average developers like myself to make custom bitwidth integer types so easily. To do the same in languages like Rust 5 and C++ is virtually impossible and to do this in newer languages like Zig takes way more than what it took in Mojo 6.

I am definitely looking forward to more developments in Mojo and will keep a very close eye on.

Github: https://github.com/jymchng/varintjo