August 12th, 2024

How does it feel to test a compiler?

Alexander Zakharenko discusses the unique challenges of compiler testing, emphasizing the importance of automated and exploratory tests, collaboration with developers, and the intricacies of Kotlin/Native's compilation process.

Alexander Zakharenko, a QA engineer on the Kotlin/Native team, shares insights into the unique experience of testing a compiler. With a background in software engineering and extensive experience in backend automation testing, Zakharenko transitioned to compiler testing after joining JetBrains. He explains that a compiler translates source code into machine code and consists of a frontend for analysis and a backend for code generation. Kotlin/Native compiles Kotlin code into native binaries, which suits platforms without a virtual machine.

Unlike typical software testing, compiler testing involves no graphical or network interface; it focuses instead on language constructs, library linking, and compilation parameters. Zakharenko emphasizes the importance of automated tests, including unit, integration, and performance tests, alongside exploratory testing for complex features. He describes his workflow, which involves reviewing tasks, conducting exploratory tests, and collaborating with developers.

Zakharenko provides examples of tasks he has tackled, such as implementing annotations to hide symbols in Objective-C and testing compiler features. He also highlights the challenges of ensuring compatibility with different operating systems and build systems. Overall, the article illustrates the intricate and rewarding nature of compiler testing and the blend of technical skill and problem-solving this niche field requires.

- Compiler testing is distinct from typical software testing due to the absence of graphical and network interfaces.

- Kotlin/Native compiles Kotlin code into native binaries, making it suitable for platforms without virtual machines.

- Automated tests play a crucial role in compiler testing, supplemented by exploratory testing for complex features.

- The testing process involves collaboration with developers and a thorough understanding of the compilation process.

- Zakharenko's experience highlights the technical challenges and rewards of working in compiler testing.

Related

Mix-testing: revealing a new class of compiler bugs

A new "mix testing" approach uncovers compiler bugs by compiling test fragments with different compilers. Examples show issues in x86 and Arm architectures, emphasizing the importance of maintaining instruction ordering. Luke Geeson developed a tool to explore compiler combinations, identifying bugs and highlighting the need for clearer guidelines.

Boosting Compiler Testing by Injecting Real-World Code

The research introduces a method to enhance compiler testing by using real-world code snippets to create diverse test programs. The approach, implemented in the Creal tool, identified and reported 132 bugs in GCC and LLVM, contributing to compiler testing practices.

Driving Compilers

The article outlines the author's journey learning C and C++, focusing on the compilation process often overlooked in programming literature. It introduces a series to clarify executable creation in a Linux environment.

How to Compile Your Language – Guide to implement a modern compiler for language

This guide introduces programming language design and modern compiler implementation, emphasizing language purpose, syntax familiarity, and compiler components, while focusing on frontend development using LLVM, with source code available on GitHub.

Clang vs. Clang

The blog post critiques compiler optimizations in Clang, arguing they often introduce bugs and security vulnerabilities, diminish performance gains, and create timing channels, urging a reevaluation of current practices.

AI: What people are saying
The comments reflect a diverse range of opinions and experiences related to compiler testing, emphasizing both the challenges and strategies involved.
  • Many commenters agree that compiler testing can be complex due to the Oracle Problem, where verifying the correctness of output is non-trivial.
  • Several users highlight the importance of automated testing methods, such as differential testing and fuzzing, to uncover bugs effectively.
  • There is a consensus on the value of collaboration between developers and testers, with some praising JetBrains for treating their testing team as equals.
  • Some commenters share personal experiences with compiler projects, noting the challenges they faced and the lessons learned.
  • Criticism is directed at certain design choices in languages like Kotlin, particularly regarding error handling and testing practices.
15 comments
By @alexvitkov - 6 months
Compilers are one of the easiest and most fun pieces of software to test, because you can test very specific behavior without touching the internals at all.

E.g. if I want to test that `*` has higher precedence than `+`, I would write something like this:

    assert_ast_equals(parse_expr("(a*b)+c"), parse_expr("a*b+c"))
    assert_ast_equals(parse_expr("a+(b*c)"), parse_expr("a+b*c"))

You can rewrite the whole compiler if you want, but as long as you have some notion of a "parser", an "AST" and "two AST nodes being the same" this test will keep working.

This is much more powerful than going into the parser internals and comparing get_operator_precedence('+') with get_operator_precedence('*') which is the default thing you would do if you're told to test every function after writing it.
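A minimal Python sketch of this style of test, assuming a toy grammar with only identifiers, `+`, `*`, and parentheses. The `Var` and `BinOp` node classes and the precedence-climbing parser are illustrative assumptions, not taken from any real compiler; only `parse_expr` and `assert_ast_equals` echo the names used above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


# Binding power of each operator; a higher number binds tighter.
PRECEDENCE = {"+": 1, "*": 2}


def parse_expr(source):
    # Tokens are identifiers, operators and parentheses; whitespace is skipped.
    tokens = re.findall(r"[A-Za-z_]\w*|[+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_atom():
        token = advance()
        if token is None or token in {"+", "*", ")"}:
            raise SyntaxError(f"unexpected token {token!r}")
        if token == "(":
            node = parse_binary(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        return Var(token)

    def parse_binary(min_prec):
        # Precedence climbing: consume only operators binding at least as tightly as min_prec.
        left = parse_atom()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # The +1 makes both operators left-associative.
            right = parse_binary(PRECEDENCE[op] + 1)
            left = BinOp(op, left, right)
        return left

    tree = parse_binary(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def assert_ast_equals(actual, expected):
    # Dataclasses compare structurally, so this checks the shape of the whole tree.
    assert actual == expected, f"{actual} != {expected}"


assert_ast_equals(parse_expr("(a*b)+c"), parse_expr("a*b+c"))
assert_ast_equals(parse_expr("a+(b*c)"), parse_expr("a+b*c"))
```

The assertions depend only on the shape of the tree, so the body of `parse_expr` could be replaced by a different parsing algorithm without touching them.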

By @JonChesterfield - 6 months
Jetbrains have a really solid testing team.

Lots of places have a two tier system where the real developers write the code and those who don't make the cut test the code, with pay delta and an aspiration of being promoted out of testing.

Other places have a mandatory stint in testing for new developers as a way to get some headcount on the task.

Jetbrains don't do that. Or at least they didn't sometime before covid when I met a bunch of their test devs at a conference. The developers mostly doing testing were equals to those mostly doing product work. Possibly with a more extreme bias towards case analysis.

I don't think it's a coincidence that JetBrains treat their test team as peers to the others and that their software seems to mostly not fall over in the field.

By @tgma - 6 months
The article touches on basic testing strategies that apply to general software as well as compilers. I have been involved with compiler testing projects on production LLVM and GCC in my past life. One thing that makes compiler testing specifically more difficult than general software is the Oracle Problem: how do you verify the output is in fact correct? Crashes are relatively easy to find by random fuzzing, but in the general case, proving that the output program is not miscompiled is non-trivial.

There are a couple of effective techniques in the literature that might be useful here:

- Differential testing[1]: generate a bunch of random, correct, deterministic programs; run them under different compilers or under different compilation flags and check if the output of the program is identical

- Equivalence Modulo Inputs[2]: a class of techniques that can be used to transform a program into a distinct program that is supposed to be equivalent to the original for a specific input. (shameless plug)

[1]: https://users.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf

[2]: https://web.cs.ucdavis.edu/~su/publications/emi.pdf
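As a rough illustration of differential testing, the following Python sketch generates random arithmetic expressions and runs each one through two independent implementations, a tree-walking interpreter and a small stack-machine compiler, collecting every program on which they disagree. The expression format, the function names, and the stack machine are all invented for this example; a real setup would use a program generator like the one described in [1] and real compilers or flag sets.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_expr(rng, depth=0):
    """Build a random, always-valid expression tree over small integers."""
    if depth >= 4 or rng.random() < 0.3:
        return rng.randint(-9, 9)
    op = rng.choice(["+", "-", "*"])
    return (op, random_expr(rng, depth + 1), random_expr(rng, depth + 1))


def interpret(expr):
    """First implementation: evaluate the tree directly."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op](interpret(left), interpret(right))


def compile_to_stack(expr, code=None):
    """Second implementation, part 1: emit postfix code for a tiny stack machine."""
    code = [] if code is None else code
    if isinstance(expr, int):
        code.append(("push", expr))
    else:
        op, left, right = expr
        compile_to_stack(left, code)
        compile_to_stack(right, code)
        code.append(("apply", op))
    return code


def run_stack(code):
    """Second implementation, part 2: execute the emitted code."""
    stack = []
    for kind, arg in code:
        if kind == "push":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


def differential_test(impl_a, impl_b, runs=1000, seed=0):
    """Return the generated programs on which the two implementations disagree."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    mismatches = []
    for _ in range(runs):
        expr = random_expr(rng)
        if impl_a(expr) != impl_b(expr):
            mismatches.append(expr)
    return mismatches


if __name__ == "__main__":
    compiled = lambda expr: run_stack(compile_to_stack(expr))
    print(len(differential_test(interpret, compiled)), "mismatches")
```

Neither implementation needs to be trusted as the oracle: a disagreement only says that at least one of them is wrong, which is usually enough to start reducing the failing program.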

By @nj5rq - 6 months
I am making a simple Lisp interpreter, and this is my whole testing stage:

    $ valgrind --leak-check=full --track-origins=yes ./lisp < test/test.lisp
    $ cat test/expected.txt

What can I say, works for me.
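If comparing against `test/expected.txt` by eye ever gets tedious, the comparison could be automated as a golden-file test. The helper below is a hypothetical Python sketch, not part of the commenter's setup: `run_golden_test` is an invented name, and it assumes the interpreter reads the program on stdin and writes its results to stdout.

```python
import subprocess
from pathlib import Path


def run_golden_test(command, input_path, expected_path):
    """Feed input_path to command on stdin and compare its stdout with expected_path.

    Returns (passed, actual_output) so that a caller can print the output on failure.
    """
    with open(input_path, "rb") as stdin_file:
        result = subprocess.run(
            command,
            stdin=stdin_file,
            capture_output=True,
            timeout=60,
        )
    expected = Path(expected_path).read_bytes()
    # A non-zero exit code counts as a failure even if the output matches.
    passed = result.returncode == 0 and result.stdout == expected
    return passed, result.stdout.decode(errors="replace")
```

With the paths from the comment, the call would be `run_golden_test(["./lisp"], "test/test.lisp", "test/expected.txt")`. To keep the memory checks, the command could instead be `["valgrind", "--leak-check=full", "--error-exitcode=1", "./lisp"]`; Valgrind reports to stderr by default, so the stdout comparison is unaffected, and `--error-exitcode` turns detected errors into a failing exit status.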
By @gumby - 6 months
From my years on GCC I can remember that 90% of regular users' bug reports were user error, not a compiler bug.

But as with any code, compilers have bugs too and sometimes they can be quite surprising.

By @lucidguppy - 6 months
Can you imagine being one of the five people in the universe that has confident knowledge of c++ undefined behavior?
By @wslh - 6 months
We recently posted about fuzzing a compiler with success [1]. The article contains the details. There is an error on the Zest link that should point to [2]. The key is how to craft the generator.

[1] https://www.coinfabrik.com/blog/why-the-fuzz-about-fuzzing-c...

[2] https://dl.acm.org/doi/10.1145/3293882.3330576

By @rbanffy - 6 months
It seems difficult to test. Being difficult to test is usually a sign of inconvenient APIs and modules. For instance, the hiding of elements happens in multiple levels - the front-end needs a test to make sure the right intermediate representation is being generated. The backend, in turn, needs to be tested to make sure the intermediate representation with a hidden element generates the proper linker information and the linker itself needs a test to make sure the correctly annotated object code results in the correct artefact lacking external information about the hidden symbol. Testing end-to-end seems very laborious and error prone. Root-cause-analysis should have pinpointed the place the issue originates and one or more possible paths where it propagates downstream to inform future tests.

It also disturbs me that the author mentioned that the order in which sources are compiled matters in the final result. It should never matter.

When we build software we should always make it in such a way that it’s trivial to write tests for it. If writing a test is easy, it indicates using the tool you wrote is also easy.

By @Neywiny - 6 months
I did this for a class on compilers and interpreters where we wrote our own, each week expanding on functionality but keeping the 2 in feature parity. I wrote Python to auto generate test cases. I vaguely recall that the test was if the two operated the same (given the simplicity of the interpreter I viewed it as a "gold standard"). See [1] for how it worked. Among the bugs in my code I found 2 things through that exercise:

1. Functional programmers often write slow code. It turned out that my compiler was spending most of its time in my professor's code which, while I'm sure it was very mathematically pure, was a large consumer of immutable, short-lifetime objects, meaning mallocs under the hood. I should've valgrinded it but I'm certain it would've overflowed the counters (jokes)

2. If a comment spanned multiple lines in the resulting assembly, I could escape the comment and operate outside the bounds the professor set up, letting me use more assembly directives to solve the problems way easier. Ultimately we worked to fix that because usually it just means the student will try to compile part of a comment as assembly and that can be very confusing for the less assembler-error inclined. I used it for having a constant before a variable for type tagging. A 1 line solution. I believe the class's preferred way was putting the tag in a register and yadda yadda something that took a lot more finagling and effort. I did that maybe once before using my knowledge of the comment escape to do the arbitrary code injection.

[1] The interpreter was written in the language we were interpreting, so as long as there were no typos or logic errors, the functionality was perfect vs running the code in the programming language. The compiler would return back a series of objects that wrapped assembly. For example, Add(R2, R2, R3). Usually pretty transparent. The framework we were given would then write out the .s file, I believe it would call some gcc or other thing, and we'd run the binary to make sure it worked.
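The comment escape in point 2 can be sketched in a few lines of Python. The comment does not say how the course framework emitted comments, so everything here is hypothetical: the function names, the `#` comment marker, and the `.quad 1` directive standing in for the type tag.

```python
COMMENT_PREFIX = "# "  # assumed comment marker; the real one depends on the assembler and target


def emit_comment_unsafe(text):
    # Only the first line is prefixed, so any later line reaches the assembler as code.
    return COMMENT_PREFIX + text


def emit_comment_safe(text):
    # Prefix every line so that no part of the text can leave the comment.
    lines = text.splitlines() or [""]
    return "\n".join(COMMENT_PREFIX + line for line in lines)


def code_lines(asm):
    """Return the non-blank lines that do not start with the comment marker."""
    marker = COMMENT_PREFIX.strip()
    return [
        line
        for line in asm.splitlines()
        if line.strip() and not line.lstrip().startswith(marker)
    ]


if __name__ == "__main__":
    # A "comment" whose second line is really a directive, as in the type-tag trick.
    payload = "type tag for x\n.quad 1"
    print(code_lines(emit_comment_unsafe(payload)))
    print(code_lines(emit_comment_safe(payload)))
```

A check like `code_lines` is also the kind of property an auto-generated test can assert on every emitted comment, which would have flagged the escape without anyone reading the `.s` file.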

By @Haugsevje - 6 months
Somehow I came to think of 'Like a rolling stone'..Dylan I believe.
By @fidotron - 6 months
Contrary to the Jetbrains praise in here, I despise Kotlin and the evidence is the tailrec keyword.

“tailrec” is what you mention in function definitions to indicate that function is tail recursive, but it only actually worked when you called the same function itself in the return, and not any other tailrec function. The part of this which was idiocy was this would only manifest as a StackOverflowError at run time. (I found this as my language evaluation involved implementing a state machine idiomatically). If you are going to make tailrec only work as a while loop then have the compiler alert the programmer at build time, but this got all through their design and QA. Not exactly a well thought out process.

They have probably fixed this now, but instead I went off to golang, where the features are few but when they exist they are done properly.

By @danmur - 6 months
Probably better than paying money to test an IDE :P
By @pfdietz - 6 months
I was disappointed they apparently don't do high volume differential testing with randomly generated programs, a kind of property based testing. Each individual program has very little testing value, but when you can crank out millions or even billions of them they can find all sorts of bugs.
By @ultracakebakery - 6 months
nahh I don't buy it. You don't have friends OFTEN asking you how it feels to test a compiler bro stop the cap