Bril: An Intermediate Language for Teaching Compilers
Adrian Sampson developed Bril, an educational intermediate language for teaching compilers, emphasizing simplicity and extensibility, while prioritizing learning over performance metrics, despite some design challenges.
Read original articleAdrian Sampson from Cornell University developed Bril, an intermediate language designed specifically for teaching compilers. The language aims to provide a hands-on learning experience, allowing students to implement algorithms without the complexity of industrial compiler APIs. Bril is characterized by its simplicity, regular syntax, and ease of use, making it accessible for students with varying levels of familiarity with compilers. It is structured as a JSON document, which facilitates quick integration with various programming languages and tools, promoting collaboration among students.
Bril prioritizes educational goals over traditional compiler performance metrics, such as code size and execution speed. It features a unique design that includes a core opcode for printing, which simplifies example creation, and employs a more A-normal form than typical intermediate languages. This design choice, while verbose, aids in teaching by ensuring consistency in handling constants and variables.
The language is extensible, allowing students to contribute tools and features, fostering an open-source environment. However, the lack of systematic management for these extensions has led to challenges. Bril also includes a non-SSA form, which helps students understand the implications of mutable variables before transitioning to SSA concepts. Despite some design flaws, particularly regarding SSA implementation, Bril has evolved through student contributions, enhancing its functionality and educational value.
Related
'Writing a C Compiler' is a book
The "Writing a C Compiler" series transitions into a book by No Starch Press, focusing on building a C compiler with practical guidance, pseudocode examples, and essential compiler concepts for a broad audience.
Five little languages and how they grew: Dennis Ritchie's talk at HOPL on the
Dennis M. Ritchie's 1993 HOPL conference transcript compares C with languages like Bliss, Pascal, Algol 68, and BCPL. He discusses similarities, unique features, and design challenges, offering insights into historical context and language development.
Writing a C Compiler
Nora Sandler's upcoming book, "Writing a C Compiler," releases in July 2024, guiding beginners through compiler construction with practical techniques, pseudocode algorithms, and additional resources for enhanced learning.
Bril: An Intermediate Language for Teaching Compilers
Adrian Sampson developed Bril, an educational intermediate language for teaching compilers, emphasizing simplicity and extensibility, allowing students to learn algorithms without complex industrial APIs.
Driving Compilers
The article outlines the author's journey learning C and C++, focusing on the compilation process often overlooked in programming literature. It introduces a series to clarify executable creation in a Linux environment.
I've ranted extensively [1] about how badly interpreters (among adjacent topics) are often taught, because so often decisions are just abruptly made without explanation of the reasoning (or, often, even that a decision is being made!).
One thing I just added was that there are (at least) 6 "arities" (which really should not be called that) of bytecode/IR:
* stack output, stack input, stack input - though often a few instructions do take arguments (push, jump)
* accumulator output/input, 1 explicit input (usually a local variable or constant)
* belt output, belt input, 1 explicit input. The first argument is always the output of the previous instruction; the explicit argument (if not a constant) specifies the output of some previous instruction. Backwards jumps will require storing many variables to a prior part of the belt, but this is simple and useful for mini expression evaluators.
* explicit output/input, explicit input - common for machine code
* explicit output, explicit input, explicit input - common for machine code
* implicit output, explicit input, explicit input - an easy way to save space for SSA (though you can also use it with mutability), though there's weirdness for phis.
[1]: https://gist.github.com/o11c/6b08643335388bbab0228db763f9921...
The original ANF is actually looser than this in that it permits anonymous functions as arguments. In practice, there is no canonical variant of ANF that people really refer to, but most people usually mean a variant of ANF that doesn't permit this (which, to my knowledge, was first published in David Tarditi's PhD thesis). See this table from Appel's "Modern Compiler Implementation in ML" for the comparisons made in the functional IR space: https://i.imgur.com/17nfGMI.png.
Usually what people in the functional compiler space mean when they mention ANF is some variant of ANF (with Tarditi's restriction) that retains nested functions in a tree-like structure. The tree structure is important because it practically necessitates the extension of "join point"s within the tree structure (to act as local, second-class, continuations: to avoid duplicating evaluation contexts for multi-way branching constructs, without using functions for this purpose). It just so happens that you hoist ANF into a bunch of function bodies (which were once nested) and blocks (which were once join points), you can easily construct a control flow graph. However, you could also say that lambda calculus would be "in SSA" throughout all of this (as it is originally, then normalised into ANF, and then hoisted into a control flow graph) - it just isn't usually what people mean when they talk about an SSA IR (they tend to imply the existence of specific provisions for the splitting of live ranges at join points, be it a phi-like pseudo-instruction or block arguments).
All this is to say that ANF is very tied to literature about the compilation of functional languages and its related analysis and comparison with CPS (such as whether it's closed under beta-reduction, for example), such that I think we need to be a bit more precise about the expected shape and properties of IRs to differentiate them, rather than just expecting compiler engineers to know what you're talking about - and, indeed, agree with your description - when you describe something as "an ANF language".
The linked MLIR documentation, in turn, credits Swift for that idea, but the earliest occurrence of phis as continuation arguments I know is in MLton. It’d be interesting to know where this idea comes from initially, because standard phis really are incredibly awkward.
I also like the decision to not have the IR always in SSA form. I think it helps you understand SSA form better to have to construct it, and you're going to have to eliminate SSA form anyway for any of the classic techniques for generating machine code.
Now the main choices seem to be CPS (which is seeing a bit of a resurgence!) and SSA.
So why teach ANF?
Education often has needs diametrically opposed to production. For example, the early Pascal compilers seem to have been optimised for throughput, not object codegen: after all, production code is compiled rarely and run often, but student code is compiled often and run almost never.
Frankly, I don’t see much point in a “special IR for teaching” because llvm text format is just really straightforward, and at the same time teaches the “real deal” (sure, normally you’d likely use bindings, but still).
You can still have your students reimplement optimizations that llvm would normally do by themselves (like inductive variable elimination, const propagation, dataflow analysis, using phi instead of alloc etc.).
At least I was really happy I could play with llvm for a university project.
Related
'Writing a C Compiler' is a book
The "Writing a C Compiler" series transitions into a book by No Starch Press, focusing on building a C compiler with practical guidance, pseudocode examples, and essential compiler concepts for a broad audience.
Five little languages and how they grew: Dennis Ritchie's talk at HOPL on the
Dennis M. Ritchie's 1993 HOPL conference transcript compares C with languages like Bliss, Pascal, Algol 68, and BCPL. He discusses similarities, unique features, and design challenges, offering insights into historical context and language development.
Writing a C Compiler
Nora Sandler's upcoming book, "Writing a C Compiler," releases in July 2024, guiding beginners through compiler construction with practical techniques, pseudocode algorithms, and additional resources for enhanced learning.
Bril: An Intermediate Language for Teaching Compilers
Adrian Sampson developed Bril, an educational intermediate language for teaching compilers, emphasizing simplicity and extensibility, allowing students to learn algorithms without complex industrial APIs.
Driving Compilers
The article outlines the author's journey learning C and C++, focusing on the compilation process often overlooked in programming literature. It introduces a series to clarify executable creation in a Linux environment.