Introduction to Formality (Part 1) – Victor Maia

Introduction to Formality (Part 1) – Victor Maia –

Formality is a new, massively parallel, minimal “proof”gramming language being developed at the Ethereum Foundation. It has an ambitious goal of redefining how programs, smart-contracts and even scientific papers are written, by combining several academic breakthroughs in a single tool that “just works”.

Right now, Formality still doesn’t have the core libraries required to be a practical, “easy” language. In a future, it’ll include all sorts of tools to make it as friendly as Python, and most developers won’t even know they’re using it. For now, the goal of this post is to introduce it in a more technical sense, explaining its core features and how they can be used to write programs and proofs. As such, this will assume familiarity with those concepts, as well as use Agda for comparisons. If you don’t have that kind of knowledge and just want to learn how to use Formality in practice, please wait while simplified versions are developed!

1. Why it exists?

Formality exists because it differs from popular programming and proof languages like Python, TypeScript and Coq in a few, important aspects:

It is efficient in a precise, technical sense

Is C faster than Python? While most would say “obviously”, others may argue this question doesn’t make sense. Certainly, C’s implementation is faster than Python’s implementation, but is Python inherently slower than C? Both languages are Turing-complete so they’re technically equivalent and should be equally fast given a sufficiently mature compiler, right?

Not exactly. Turing-completeness dictates languages can solve the same set of problems, not that they do so efficiently. Some languages have unavoidable features that do make them inherently slower: garbage collection is a remarkable example. Formality is optimal, in the sense it is able to emulate any (non-quantum) language in the same complexity category. Like C, it needs no garbage collection. Unlike it, it can be executed in massively parallel architectures with near-ideal speedups.

Our current implementation is nowhere near C, but it is evolving fast towards that point (and we’ll soon share exciting news regarding GPU compilation). This makes it perfect for smart-contracts, where we want a future-proof way to write cheap (gas-wise) programs, for rendering, for machine learning and pretty much anything that requires performance.

It is capable of checking mathematical theorems about itself

For most readers, that sounds like an un-interesting feature: “I’ve never proven a theorem about my code and nobody died!”. What they don’t see is that being able to state theorems goes much beyond proving that a + b = b + a, or that certain smart-contracts don’t have bugs. The real win is that you get a whole new tool to work with, a language of specifications, one on which we can state precisely, in a way that a computer can understand, what an algorithm is supposed to do.

This opens doors for use-cases that are only limited by your imagination. Smart-contracts that automatically outsource and check software development labor? Academic publishing with computer-based peer-reviewing? Tools to let AI write programs to us? Operating systems immune to any kind of virus and malware? Safely importing and running untrusted code in non-sandboxed environments? The list goes on.

It is really, really small

Its reference implementation, including parser, stringifier, evaluator and type checker has about 1,000 lines of code. This is because Formality is meant to fit entirely in a small specification document which people can easily implement independently. This is important for 1. portability, i.e., we want Formality to be a library in many languages, just like JSON, 2. trust, i.e., we do not want people to trust a single, monolithic, bug-prone implementation.

Of course, that means that, out of the box, it is very primitive. Features that you take for granted on other languages such as numbers, if-then-else and for-loops simply do not exist. After all, completely specifying just floating point operations would already take more than 1000 lines. For that reason, writing “raw Formality” is not easy. That is compensated by the fact it is extremely expressive and modular. All of those things can be recovered as libraries. With a proper set of libs, Formality should be no harder than popular languages. In fact, a complete Pythonish syntax is planned for a future!

2. Installing

At the moment of this writing, Formality is implemented in JavaScript. This choice was made because it is the most wide-spread, cross-platform language. Implementing it in other places should take no more than a few days of work. If you’re willing to do so, please contact me!

To install Formality, you must have npm. Once you do, just type:

npm i -g formality-lang

And that’s it. Create a intro.fm file with the following contents:

.main (hello world)

And test it with formality -n main (on the same directory). If it outputs:

(hello world)

Then you’re good to go. Optionally, you can also use it as a JS library or, if you don’t want to download anything, you can try our online editor here.

2. Untyped language overview

2a. Definitions, lambdas and applications

I’ll start by introducing Formality’s untyped primitives. Its main one is the lambda. It works almost like Agda functions. Here is its syntax:

pri | Formality      | JavaScript         | Agda
--- | -------------- | ------------------ | ----------------
def | .x val         | x = val            | x = val
lam | [x] body       | (x) => body        | λx → body
app | (f x)          | f(x)               | f x
box | | val          | [val]              | (box val)
dup | [x = val] body | (x = val[0]), body | let (box x) = body

Let’s test it. Save this program as intro.fm:

.id [x] x
.main (id 42)

Then run it with formality -n main. This command reads all .fm files on the current directory and loads the definitions inside them. The -n main bit means it must eva the main definition and output the result. It’ll print 42, which is the result of (id 42). Note that, here, 42 is not an actual number, but an undefined reference. We’ll give meaning to it later.

2b. Strong normalization, explicit duplications

There is one difference between Formality and Agda lambdas. Like Agda, Formality is strongly normalizing language, which means all its programs halt. This is desirable in proof languages because non-terminating programs often allow you to prove logical absurds, “breaking” them. In Agda, this is enforced by a “termination checker” which forbid, for example, non-structural recursion. In Formality, the restriction is much stronger: only affine functions are allowed! This means lambda-bound variables can only appear, at most, once. This, for example, isn’t allowed:

.copy [x] [t] (t x x)
.main (copy foo)

(Note that, since we’re not dealing with types yet, running this program with formality -n main actually works (and outputs [t] (t foo foo)), but you won’t be able to assign a type to it.) That restriction makes the language obviously terminating, but also very limited. To amend this problem, we introduce new primitives for explicit, “boxed” duplications, put and dup:

pri | Formality      | JavaScript         | Agda
--- | -------------- | ------------------ | ----------------
def | .x val         | x = val            | x = val
lam | [x] body       | (x) => body        | λx → body
app | (f x)          | f(x)               | f x
put | | val          | [val]              | (put val)
dup | [x = val] body | (x = val[0]), body | (λ val)

Here,put is equivalent to this Agda constructor:

data Box (A : Set) : Set where
put : A → Box

This primitive allows us to:

Put a value “inside a box” with a pipe (|value).
Copy a boxed value with [x = value] body, consuming the box.

With a few caveats:

Copy-bound variables must have 1 box between definition and occurrence.
Boxed values can’t reference variables bound by outer lambdas.

The cool thing about explicit, boxed duplications is that, with those simple restrictions, we’re able to avoid infinite loops, while, at the same time, keeping the language extremely expressive. Technically, it captures all the programs in O(tow(n,s)), where tow(n,s) = 2^2^2...s (n times). Later on, I’ll explain how to encode for-loops and arbitrary bounded recursion using those primitives. For now, let’s test what we have so far.

Save the following program as intro.fm:

.id [x] x
.main [val = id] (val val)

This program makes a copy of id and applies it to itself. The result should be just id, i.e., [x] x, but, if we run it with formality -n main, it outputs:

[val = [x] x] (val val)

As you can see, the copy doesn’t happen. That’s because id wasn’t boxed. We can fix it by putting it in a box with a pipe (|):

.main [val = | id] (val val)

And now this works as expected! But this is still not legal, though, because there isn’t exactly 1 box between where val is bound and where it is used. This can be easily fixed:

.id [x] x
.main [val = id] | (val val)

And that’s it. Those are all the native features of the untyped fragment of Formality. Lambdas, applications, boxes and duplications. This language is interesting for many reasons. It is terminating even without types. It is compatible with massively parallel runtimes. It can be evaluated without garbage collection. And so on. If you want to explore more of our untyped fragment, please check this repository, which also includes a proof sketch on the normalization argument. A very insightful exercise must be to follow a standard lambda calculus tutorial, except using that language instead.

3. Type system overview

Dependent type systems are the perfect example on how some things can be technically very simple, while at the same time being “difficult” for humans to understand. Formality’s type system is very small, similarly to the Calculus of Constructions, yet very powerful. If you’re familiar with Agda, though, it should be pretty easy to understand. Here is an overview:

3a. Forall

The first type to understand is the type of a function, B. It can be read as “a function that takes an input of type A and returns an output of type B”. It is almost equivalent to, for example, Haskell’s function type, A -> B, with one caveat: its return type may refer to its argument, x.

3b. Box

The second type is the type of a box, !A. This can be read as A is a boxed value of type A. Boxes are, essentially, pairs of one element, with a native operation to duplicate their contents. There isn’t much else to say about them.

3c. Type

The third type is the type of types, written as Type. As in Agda, in Formality, types are first-class values: you can use them inside your programs, return them from functions, make lists of them and so on. The type of a type is always Type, so, for example, the type of B is Type, and the type of !A is Type, and the type ofType is itself is also Type. In most proof languages, this would be problematic. Since Formality’s termination doesn’t rely on types, we can actually have that without introducing non-halting computations.

3d. Self

The last type is the self, written as $x A. This is the type of types that can refer to their own values. It also requires two new terms to introduce and use it, @typ val and ~val respectively. Self types are essential to construct inductive datatypes, in a way that will elaborated later on.

3e. Erased

Formality also includes “erased lambdas”, which are literally erased after type-checking. They’re useful for making polymorphic functions without extra runtime costs. Also, due to Formality’s linearity, they are important for our recursive datatype encodings, since erased functions aren’t affected by the restrictions imposed by the explicit duplication system. They’re respectively written as [-x] body, (f -x) and B.

3f. Annotations

Finally, there is also a syntax for explicit type annotations, : A = t, which means that the term t has type A. Nothing unusual here.

Published at Mon, 22 Apr 2019 22:07:58 +0000