September 1st, 2024

The Design and Implementation of the CPython Virtual Machine

The article examines CPython's virtual machine design, focusing on its stack-based architecture, bytecode instruction format, and execution complexities, encouraging compiler engineers to learn from its implementation details.

Read original article

The article "The Design & Implementation of the CPython Virtual Machine" by Abhinav Upadhyay provides an in-depth exploration of CPython's bytecode instruction format and the internals of its execution engine. It discusses the significance of virtual machines (VMs) in programming languages, particularly how they facilitate portability and simplify the compilation process. The article contrasts stack-based and register-based VMs, noting that CPython employs a stack-based architecture. It details the structure of bytecode instructions, including how they are packed and executed within the CPython VM. The author explains the bytecode evaluation loop, which interprets and executes the bytecode, and provides examples of how Python functions are compiled into bytecode. The article also highlights the complexity of real-world VM implementations, which involve additional components like stack frames and execution contexts. The author encourages readers, especially compiler engineers, to learn from the implementation details of CPython to enhance their understanding and performance in other programming contexts. The article is extensive, offering a PDF version for paid subscribers that includes additional insights and a bonus appendix.

- The article explores the design and implementation of the CPython virtual machine.

- It explains the differences between stack-based and register-based virtual machines.

- CPython uses a stack-based architecture for its bytecode execution.

- The bytecode instructions are packed into a two-byte format for execution.

- The article provides insights into the complexities of real-world virtual machine implementations.

Weird things I learned while writing an x86 emulator

The article explores writing an x86 and amd64 emulator for Time Travel Debugging, emphasizing x86 encoding, prefixes, flag behaviors, shift instructions, segment overrides, FS and GS segments, TEB structures, CPU configuration, and segment handling nuances in 32-bit and 64-bit modes.

Mini-JVM: Educational Implementation of a Simplified Java Virtual Machine (JVM)

A GitHub project offers a simplified Java Virtual Machine (JVM) in Java, executing a subset of Java bytecode instructions. It covers frames, call stack, heap, class table, and a custom instruction set. Consider limitations.

Driving Compilers

The article outlines the author's journey learning C and C++, focusing on the compilation process often overlooked in programming literature. It introduces a series to clarify executable creation in a Linux environment.

I don't know how CPUs work so I simulated one in code (2019)

The author simulated an 8-bit CPU to understand computer architecture, learning about bits, ALU functions, and I/O communication, while planning to explore advanced topics like RISC architectures and modern CPU features.

Recent Performance Improvements in Function Calls in CPython

Recent CPython updates have improved function call performance, reducing overhead in loops and built-in functions, with notable speed increases, making Python more efficient for developers.

4 comments

By @nickpsecurity - 5 months

I remember Animats mentioned a naive, execution strategy of another project. Seeing two articles, I wonder if anyone made a list of many implementation strategies for bytecode interpreters with links to examples.

By @acheong08 - 5 months

Very informative read. I wonder if it’s start becoming more popular for people to compile & distribute Python byte code once the JIT is done similar to how Java does it.

By @cchianel - 5 months

Great overview on the structure of the CPython's Virtual Machine (as well as an introduction to stack based virtual machines).

I am unfortunately very familiar with CPython's bytecode [1], and do not recommend trying to do anything with it:

- It is extremely unstable; the same opcodes might do different things between versions. For instance: for the opcode `JUMP_IF_TRUE_OR_POP`, in Python 3.10 and below, its argument is an absolute address. In Python 3.11 and above, it is a relative address.

- It does not have synthetic (compiler-introduced) variables; this means the iterator of a loop must be kept on the stack, and as a result, exception handling must keep some items (namely, the iterators) on the stack.

- It is poorly documented (although it is getting better nowadays); there where numerous times where the stack layout an opcode expected changed between versions but were not yet included in the dis documentation.

One notable difference between CPython and other bytecode interpreters is a lack of multiple compilers. For instance, in Java, there are typically three compilers:

- `javac`, which produce class files that are interpreted by a basic, slow virtual machine.

- C1, which generates instrumented machine code to collect profiling data that C2 will use (still relatively slow).

- C2, which generates optimized machine code from the profiling data of C1 (significantly faster than C1).

CPython shoves the job of all three into its single interpreter. For instance, its CACHE instruction serves the purpose of recording the types on the stack for the subsequent instruction, which the specializing adaptive interpreter [2] uses to specialize an opcode (for instance, BINARY_OP to BINARY_OP_INPLACE_ADD_UNICODE).

All of the above makes it extremely hard to develop against CPython's bytecode, and is probably why there are few libraries for reading or transforming CPython's bytecode. Although most modifications can be done by either modifying the class object directly, wrapping the class/function, or extending the class, some cannot. For example:

- Change calls of function `foo` to function `bar`.

- Compute the automatic differentiation of a function [3].

- Translate python code blocks to another language for interoperability (for instance, to Java or SQL).

The way most library authors do it instead is by reading the AST, which is mostly stable.

[1] https://github.com/TimefoldAI/timefold-solver/tree/main/pyth..., a Python bytecode to Java bytecode translator, written to avoid excessive overhead caused by FFI calls (the new foreign Java API largely removes the overhead).

[2] https://peps.python.org/pep-0659/

[3] https://en.wikipedia.org/wiki/Automatic_differentiation

Weird things I learned while writing an x86 emulator

Mini-JVM: Educational Implementation of a Simplified Java Virtual Machine (JVM)

Driving Compilers

I don't know how CPUs work so I simulated one in code (2019)

Recent Performance Improvements in Function Calls in CPython

Recent CPython updates have improved function call performance, reducing overhead in loops and built-in functions, with notable speed increases, making Python more efficient for developers.

The Design and Implementation of the CPython Virtual Machine

Related

Weird things I learned while writing an x86 emulator

Mini-JVM: Educational Implementation of a Simplified Java Virtual Machine (JVM)

Driving Compilers

I don't know how CPUs work so I simulated one in code (2019)

Recent Performance Improvements in Function Calls in CPython

Related

Weird things I learned while writing an x86 emulator

Mini-JVM: Educational Implementation of a Simplified Java Virtual Machine (JVM)

Driving Compilers

I don't know how CPUs work so I simulated one in code (2019)

Recent Performance Improvements in Function Calls in CPython