July 12th, 2024

Mazeppa: A modern supercompiler for call-by-value functional languages

The Mazeppa project's documentation introduces supercompilation and covers installation, practical examples, algorithm synthesis, metasystem transition, technical choices, release protocols, and frequently asked questions, with references to relevant papers and supercompilation concepts.

Related

Mix-testing: revealing a new class of compiler bugs

A new "mix testing" approach uncovers compiler bugs by compiling test fragments with different compilers. Examples show issues in x86 and Arm architectures, emphasizing the importance of maintaining instruction ordering. Luke Geeson developed a tool to explore compiler combinations, identifying bugs and highlighting the need for clearer guidelines.

Integrated assembler improvements in LLVM 19

LLVM 19 brings significant enhancements to the integrated assembler, focusing on the MC library for assembly, disassembly, and object file formats. Performance improvements include optimized fragment sizes, streamlined symbol handling, and simplified expression evaluation. These changes aim to boost performance, reduce memory usage, and lay the groundwork for future enhancements.

Introduction to Program Synthesis

Program synthesis automates software creation by generating programs from requirements. It leverages Large Language Models (LLMs) like co-pilot and AlphaCode, alongside search-based techniques, for tasks like data manipulation and legacy application modernization.

Refined Input, Degraded Output: The Counterintuitive World of Compiler Behavior

The study delves into compiler behavior when given extra information for program optimization. Surprisingly, more data can sometimes lead to poorer optimization due to intricate compiler interactions. Testing identified 59 cases in popular compilers, emphasizing the need for better understanding.

Beating the Compiler

The blog post discusses optimizing interpreters in assembly to outperform compilers. By enhancing the Uxn CPU interpreter, a 10-20% speedup was achieved through efficient assembly implementations and techniques inspired by LuaJIT.

7 comments
By @batterseapower - 7 months
My PhD was on supercompilation for Haskell (thanks for the cite in the repo :-))

Cool to see that people are still working on it, but I think that the main barrier to practical use of these techniques still remains unsolved. The problem is that it's just so easy for the supercompiler to go into some crazy exponential blowup of function unfolding, making the compilation step take impractically long.
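
A minimal sketch of the kind of program that can trigger this, written here in plain Haskell rather than Mazeppa's own syntax (the names are purely illustrative):

    -- revAcc reverses a list using an accumulating parameter.
    revAcc :: [a] -> [a] -> [a]
    revAcc []       acc = acc
    revAcc (x : xs) acc = revAcc xs (x : acc)

    -- Driving a call such as `revAcc xs []` produces the configurations
    --   revAcc xs  []
    --   revAcc xs1 (x1 : [])
    --   revAcc xs2 (x2 : x1 : [])
    --   ...
    -- The accumulator grows at every step, so no configuration ever ties back
    -- to an earlier one; the whistle (typically a homeomorphic-embedding check)
    -- has to fire and force generalization, and exactly where it fires decides
    -- how large the residual program gets.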

Even if you avoid a literal exponential blowup you can easily end up generating tons of specializations that bloat your code cache but don't reveal any useful optimization opportunities/are infrequently used in practice. Similar performance problems also plague related techniques like trace-based JITs, even though the trace JIT happens at runtime and thus has access to strictly more information about the frequency with which a trace might be used.

You can try to use annotations like the @extract proposed in the article to control these problems, but it can be hard to predict in advance when this is going to occur.

One interesting research direction might be to use deep reinforcement learning to try to guide the generalization/termination decisions, where the reward is based on A) whether the unfolding leads to a tie-back later on, and B) to what extent the unfolding allowed deforestation/beta reduction to take place.

By @parhamn - 7 months
That was such a good Readme. Wonderful explanations for those unfamiliar with the space, full examples, a bit of history/context. Thanks!
By @082349872349872 - 7 months
Is the manual annotation needed because deciding whether or not a graph will blow up upon supercompilation by simply trying it, while checking for blowup during the process, doesn't work?

> In Mazeppa, we have call-by-value functions and call-by-name (call-by-need) constructors, which 1) makes deforestation possible and 2) preserves the original semantics of code.

Are there any other notable differences between this approach and call-by-value constructors? (I guess one might have to carry around the original context in closures for a little while, but presumably even that disappears in residual programs?)
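
As a point of reference for the deforestation claim in the quote, here is a minimal Haskell illustration (not Mazeppa code, and not necessarily the exact residual program Mazeppa would emit), in which the intermediate list built by map is eliminated:

    -- The naive composition builds an intermediate list of squares.
    sumSquares :: [Int] -> Int
    sumSquares xs = sum (map (\x -> x * x) xs)

    -- A deforested residual program traverses the input once,
    -- allocating no intermediate list.
    sumSquares' :: [Int] -> Int
    sumSquares' []       = 0
    sumSquares' (x : xs) = x * x + sumSquares' xs

The quoted design point is that keeping constructors call-by-name is what allows producer and consumer to be fused like this while the rest of the program retains its call-by-value semantics.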

[Stepan Razin lost his black hat during his ride, but the supercompiler has already removed Ivan Mazepa's?]

By @faizshah - 7 months
I wonder what the debugging experience is like; it must be pretty difficult to generate correct source maps.
By @voldacar - 7 months
Cool. How necessary is the type system? Could you make something similar for lispy languages?
By @keiferski - 7 months
The likely origin of the name, if you’re curious:

https://en.wikipedia.org/wiki/Mazeppa_(poem)