September 1st, 2024

The Design and Implementation of the CPython Virtual Machine

The article examines CPython's virtual machine design, focusing on its stack-based architecture, bytecode instruction format, and execution complexities, encouraging compiler engineers to learn from its implementation details.

Read original articleLink Icon
The Design and Implementation of the CPython Virtual Machine

The article "The Design & Implementation of the CPython Virtual Machine" by Abhinav Upadhyay provides an in-depth exploration of CPython's bytecode instruction format and the internals of its execution engine. It discusses the significance of virtual machines (VMs) in programming languages, particularly how they facilitate portability and simplify the compilation process. The article contrasts stack-based and register-based VMs, noting that CPython employs a stack-based architecture. It details the structure of bytecode instructions, including how they are packed and executed within the CPython VM. The author explains the bytecode evaluation loop, which interprets and executes the bytecode, and provides examples of how Python functions are compiled into bytecode. The article also highlights the complexity of real-world VM implementations, which involve additional components like stack frames and execution contexts. The author encourages readers, especially compiler engineers, to learn from the implementation details of CPython to enhance their understanding and performance in other programming contexts. The article is extensive, offering a PDF version for paid subscribers that includes additional insights and a bonus appendix.

- The article explores the design and implementation of the CPython virtual machine.

- It explains the differences between stack-based and register-based virtual machines.

- CPython uses a stack-based architecture for its bytecode execution.

- The bytecode instructions are packed into a two-byte format for execution.

- The article provides insights into the complexities of real-world virtual machine implementations.

Link Icon 4 comments
By @nickpsecurity - 5 months
I remember Animats mentioned a naive, execution strategy of another project. Seeing two articles, I wonder if anyone made a list of many implementation strategies for bytecode interpreters with links to examples.
By @acheong08 - 5 months
Very informative read. I wonder if it’s start becoming more popular for people to compile & distribute Python byte code once the JIT is done similar to how Java does it.
By @cchianel - 5 months
Great overview on the structure of the CPython's Virtual Machine (as well as an introduction to stack based virtual machines).

I am unfortunately very familiar with CPython's bytecode [1], and do not recommend trying to do anything with it:

- It is extremely unstable; the same opcodes might do different things between versions. For instance: for the opcode `JUMP_IF_TRUE_OR_POP`, in Python 3.10 and below, its argument is an absolute address. In Python 3.11 and above, it is a relative address.

- It does not have synthetic (compiler-introduced) variables; this means the iterator of a loop must be kept on the stack, and as a result, exception handling must keep some items (namely, the iterators) on the stack.

- It is poorly documented (although it is getting better nowadays); there where numerous times where the stack layout an opcode expected changed between versions but were not yet included in the dis documentation.

One notable difference between CPython and other bytecode interpreters is a lack of multiple compilers. For instance, in Java, there are typically three compilers:

- `javac`, which produce class files that are interpreted by a basic, slow virtual machine.

- C1, which generates instrumented machine code to collect profiling data that C2 will use (still relatively slow).

- C2, which generates optimized machine code from the profiling data of C1 (significantly faster than C1).

CPython shoves the job of all three into its single interpreter. For instance, its CACHE instruction serves the purpose of recording the types on the stack for the subsequent instruction, which the specializing adaptive interpreter [2] uses to specialize an opcode (for instance, BINARY_OP to BINARY_OP_INPLACE_ADD_UNICODE).

All of the above makes it extremely hard to develop against CPython's bytecode, and is probably why there are few libraries for reading or transforming CPython's bytecode. Although most modifications can be done by either modifying the class object directly, wrapping the class/function, or extending the class, some cannot. For example:

- Change calls of function `foo` to function `bar`.

- Compute the automatic differentiation of a function [3].

- Translate python code blocks to another language for interoperability (for instance, to Java or SQL).

The way most library authors do it instead is by reading the AST, which is mostly stable.

[1] https://github.com/TimefoldAI/timefold-solver/tree/main/pyth..., a Python bytecode to Java bytecode translator, written to avoid excessive overhead caused by FFI calls (the new foreign Java API largely removes the overhead).

[2] https://peps.python.org/pep-0659/

[3] https://en.wikipedia.org/wiki/Automatic_differentiation