September 22nd, 2024

Compiling to Assembly from Scratch [free to read online]

"Compiling to Assembly from Scratch" by Vladimir Keleshev teaches compiler theory and assembly programming, focusing on the ARM 32-bit instruction set, with the text available online and the source code on GitHub.

The book "Compiling to Assembly from Scratch" by Vladimir Keleshev is aimed at readers who want to understand how compilers and programming languages work. It walks through building a compiler from the ground up. The compiler is written in a subset of TypeScript that reads like pseudocode, and it targets the ARM 32-bit instruction set. The content is divided into two main parts: the first covers the baseline compiler, including abstract syntax trees, parser combinators, and code generation, while the second explores compiler extensions, including data types, garbage collection, and static type checking. The book is available in a 207-page hardcover edition and can also be read online for free. It includes illustrations by Katiuska Pino, and the source code is on GitHub, along with Python and OCaml ports of the compiler. Keleshev, a software developer, invites readers to reach out with questions or feedback.
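To give a rough feel for what "abstract syntax tree" and "code generation" mean here, the following is a minimal sketch and not code from the book. The names `Expr`, `emit`, and `compile` are hypothetical, and the only language feature handled is adding integer constants. It walks a tiny tree and emits 32-bit ARM assembly in GNU assembler syntax, leaving each result in `r0`.

```typescript
// Hypothetical toy AST: integer constants and additions only.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "add"; left: Expr; right: Expr };

// Append ARM assembly for `node` to `out`; the result ends up in r0.
function emit(node: Expr, out: string[]): void {
  switch (node.kind) {
    case "num":
      // `ldr r0, =N` is an assembler pseudo-instruction that loads a constant.
      out.push(`  ldr r0, =${node.value}`);
      break;
    case "add":
      emit(node.left, out);
      // Save the left result; ip is pushed as padding so the stack
      // pointer stays 8-byte aligned.
      out.push("  push {r0, ip}");
      emit(node.right, out);
      // Restore the left result into r1, then add: r0 = r1 + r0.
      out.push("  pop {r1, ip}");
      out.push("  add r0, r1, r0");
      break;
  }
}

function compile(node: Expr): string {
  const out: string[] = [];
  emit(node, out);
  return out.join("\n");
}

// The tree for the expression 41 + 1.
const program: Expr = {
  kind: "add",
  left: { kind: "num", value: 41 },
  right: { kind: "num", value: 1 },
};

const asm = compile(program);
console.log(asm);
```

Saving intermediate results on the stack is the simplest strategy and wastes instructions. A real compiler would also need a parser to build the tree from source text, plus function prologues and epilogues around the emitted code.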

- The book teaches compiler theory and assembly programming from scratch.

- It targets the ARM 32-bit instruction set using a TypeScript subset.

- The content is divided into baseline compiler concepts and extensions.

- Source code and additional resources are available on GitHub.

- The hardcover edition is available for purchase, while an online version can be read for free.

9 comments
By @hirvi74 - 7 months
God, the title reminds me of when I took an x86 assembly class in college about a decade ago. Only 6 of the dumbest souls in the CS program dared to take the class that semester. The professor for the class was an ex-NASA computer engineer. Our tests consisted of writing assembly by hand. We were graded for accuracy too. I swear, at that point in time, I could convert between Hex, Dec, Oct, and Binary almost without thinking. I made a D in the class with 40+ hours a week of studying. However, I learned more in that one class than in my entire degree.

Anyway, thank you, OP, for sharing this. I have been looking into picking up ARM as a way to crawl out of burnout from my career. It's been years since I even touched x86 with any seriousness. I will add this book to my list of resources.

By @ggorlen - 7 months
As is often the case, the title is unfortunately overloaded. I initially read this as writing code in the Scratch programming language[1] that compiles to assembly.

[1]: https://en.wikipedia.org/wiki/Scratch_(programming_language)

By @Joker_vD - 7 months
Wait a bloody second. Why does GAS look like this for ARM:

    push {ip, lr}

    ldr r0, =hello
    bl printf

    mov r0, #41
    add r0, r0, #1  // Increment

    pop {ip, lr}
    bx lr

    str r0, [r1]         /* M[r1] = r0; */
    ldr r0, [r1]         /* r0 = M[r1]; */
  
    str r0, [r1, #8]     /* M[r1 + 8] = r0; */
    ldr r0, [r1, #8]     /* r0 = M[r1 + 8]; */
  
    str r0, [r1, -r2]    /* M[r1 - r2] = r0; */
    ldr r0, [r1, -r2]    /* r0 = M[r1 - r2]; */
with the destination on the left-hand side, with no "%" before the register names, and with square brackets for addresses (with relatively sane-looking expressions inside those brackets), basically the same syntax that ARM's own assembler uses, while x86 gets some absolutely unhinged syntax that looks like it just fell out of the sky (or an abyss, for that matter), since it has almost no relation to what's written in Intel's docs? I always assumed that GAS used some "unified" style for all its targets and disregarded the conventions of the CPU manufacturers, but apparently no, it only did that for x86?

By @stonethrowaway - 7 months
Speaking of embedded systems, I have an old board, an offshoot of the Zilog Z80 called Rabbit. I think recently Dave from EEVBlog took apart one of his ancient projects and I was floored to see a Rabbit. Talk about a left hook. Given that this is Hacker News, I suspect someone probably knows what I’m talking about. The language used (called “Dynamic C”) has some unconventional additions: a kind of coroutine mechanism, in addition to chaining functions to be called one after another in groups. It’s mostly C otherwise, so I suspect some macro shenanigans with interrupt vector priority for managing this coroutine stuff.

Anyhow, so I’ve got a bunch of .bin files around for it, no C source code, just straight assembly output ready to be flashed. And the text and data segments are often interwoven, each fetch being resolved to data or instruction in real time by the Rabbit processor. So I’ve been thinking of sitting down, going through the assembly/processor manual for the board, and just writing a board simulator, hoping to get back to source code by black-box reversing, in a sense. I’d have to rummage through JEDEC for the standard used by the EEPROM to figure out what pins it’s using there and the edge-triggering sequences. Once I can execute it and see which registers and GPIOs are written when, I can probably figure out the original code path. Not sure if anyone has tips or guides or suggestions here.

By @pjc50 - 7 months
Was slightly assuming from the title that this would be something like "you have brought up a new machine with no tooling whatsoever; how do you bootstrap up to an assembler and your first real language?" such as the PDP-11 front panel toggles. But this version is probably more useful.

By @pjmlp - 7 months
I got the ebook when it came out, and it is a relatively nice ramp-up into the world of compiler development.

By @evnix - 7 months
I really liked how short and concise these chapters are. What took me months of effort has been condensed into these few chapters, and it's well worth the read.

Though why use 32-bit instead of 64-bit? Why add so much friction for a first-time learner?

By @ok_dad - 7 months
This is a lot of fun to read, but when I got to the point where I need to run the assembly code in the HELLO WORLD example, I could not get GAS to work, due to being on macOS on an M2 and not being smart enough to figure out what to install and what magic incantations to run. Does anyone else know? I tried using brew to install a few of the different packages I could find, but couldn't get anything to work properly.

In any case, I find it interesting to write a compiler and do want to continue with this book, so perhaps I'll just write the code on my MacBook and then transfer it to my Windows gaming PC to run it on WSL or something :)

By @neuroelectron - 7 months
Ctrl-F "bootstrap" zero hits. That seems strange.