October 23rd, 2024

A DSL for peephole transformation rules of integer operations in the PyPy JIT

A new domain-specific language for optimizing integer operations in the PyPy JIT compiler has been developed. It expresses peephole transformation rules declaratively, verifies them for correctness, and collects statistics to monitor their effectiveness.


The blog post discusses the development of a domain-specific language (DSL) for peephole transformation rules aimed at optimizing integer operations within the PyPy Just-In-Time (JIT) compiler. The motivation for this initiative stems from the need for more efficient integer operations, particularly highlighted by the Pydrofoil project. The new DSL allows for a more declarative and less verbose way to express optimization rules, utilizing pattern matching to simplify sequences of integer operations. Each rule consists of a name, a pattern, and a target transformation, with the ability to include checks and compute intermediate results.

The rules are automatically verified for correctness using the Z3 theorem prover during the build process, ensuring that incorrect optimizations are minimized. The post also outlines the structure of the DSL, including examples of simple and complex transformation rules, and discusses the importance of rule coverage and statistics to monitor the effectiveness of the optimizations. Additionally, it highlights the challenges of ensuring termination and confluence in the rewriting process, as well as the necessity of rigorous proof checks to validate the correctness of the rules.
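To make the name/pattern/target structure concrete, here is a minimal Python sketch of how such a rule could be matched and applied. This is not the PyPy DSL itself; the tuple encoding of operations and the `add_zero` rule are illustrative assumptions.

```python
# Sketch only, not the actual PyPy DSL. Operations are encoded as tuples
# like ("int_add", lhs, rhs); strings in a pattern are pattern variables,
# integers match themselves.
def match(pattern, expr, env):
    if isinstance(pattern, str):          # pattern variable: bind or compare
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    if isinstance(pattern, int):          # constant: must match exactly
        return pattern == expr
    return (isinstance(expr, tuple) and expr[0] == pattern[0]
            and len(expr) == len(pattern)
            and all(match(p, e, env)
                    for p, e in zip(pattern[1:], expr[1:])))

# Rule "add_zero": the pattern int_add(x, 0) rewrites to the target x.
def add_zero(expr):
    env = {}
    if match(("int_add", "x", 0), expr, env):
        return env["x"]
    return expr

assert add_zero(("int_add", ("int_mul", "a", "b"), 0)) == ("int_mul", "a", "b")
```

A real rule can additionally carry a `check` clause (a side condition on the bound variables) and compute intermediate results before emitting the target; the sketch omits both.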

- A new DSL for integer operation optimizations in the PyPy JIT has been developed.

- The DSL allows for more declarative and efficient expression of peephole transformation rules.

- Rules are verified for correctness using the Z3 theorem prover to prevent incorrect optimizations.

- The post emphasizes the importance of monitoring rule coverage and effectiveness through statistics.

- Challenges related to termination and confluence in the rewriting process are acknowledged.
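The verification step described above asks Z3 to prove that pattern and target compute the same bitvector for every input. The same idea can be illustrated without Z3 by brute force over a small bit width; the helper below is a stand-in sketch, not the blog post's actual checker (which uses Z3 over full-width bitvectors).

```python
# Illustration of the verification idea: exhaustively check that a rule's
# pattern and target agree for all 8-bit inputs. Z3 does the same job
# symbolically, for 64-bit integers, without enumeration.
BITS = 8
MASK = (1 << BITS) - 1

def check_equivalent(lhs, rhs):
    return all((lhs(x) & MASK) == (rhs(x) & MASK)
               for x in range(1 << BITS))

# A correct rule: int_and(x, x) => x
assert check_equivalent(lambda x: x & x, lambda x: x)
# An incorrect rule is caught: int_sub(x, 1) is not equivalent to x
assert not check_equivalent(lambda x: x - 1, lambda x: x)
```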

7 comments
By @fweimer - 7 months
GCC does this in match.pd. Here's the add_zero example, I think:

    /* Simplifications of operations with one constant operand and
       simplifications to constants or single values.  */
    
    (for op (plus pointer_plus minus bit_ior bit_xor)
      (simplify
        (op @0 integer_zerop)
        (non_lvalue @0)))
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/match.pd

The floating point case can be more complicated because of signed zeros and signaling NaNs, see fold_real_zero_addition_p.

By @westurner - 7 months
math.fma (fused multiply-add) is in Python 3.13. Are there already rules to transform expressions to math.fma?

And does Z3 verification indicate differences in output due to minimizing float-rounding error?

https://docs.python.org/3/library/math.html#math.fma :

> math.fma(x, y, z)

> Fused multiply-add operation. Return (x * y) + z, computed as though with infinite precision and range followed by a single round to the float format. This operation often provides better accuracy than the direct expression (x * y) + z.

> This function follows the specification of the fusedMultiplyAdd operation described in the IEEE 754 standard.
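The accuracy difference the docs describe can be reproduced without Python 3.13 by modeling fma with exact rational arithmetic; `fma_ref` below is a hypothetical reference implementation for illustration, not `math.fma` itself.

```python
from fractions import Fraction

def fma_ref(x, y, z):
    # Compute x*y + z exactly as rationals, then round once to float,
    # mimicking fma's "infinite precision, single final rounding".
    return float(Fraction(x) * Fraction(y) + Fraction(z))

# The double nearest to 0.1 is slightly above 0.1, so 0.1 * 10 carries a
# tiny positive error. The direct expression rounds it away; fma keeps it.
assert (0.1 * 10.0) - 1.0 == 0.0     # intermediate rounding hides the error
assert fma_ref(0.1, 10.0, -1.0) > 0  # single final rounding exposes it
```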

By @hoten - 7 months
The before/after delta is very satisfying and clearly easier to work with. Very nice.

How does the DSL handle side effects? I'm assuming any term here could be a compound expression, so what part of the system prevents "0*callfn()" from reducing to "0"?

By @JonChesterfield - 7 months
A cost model would give a proof of termination - require the output of each transform to be cheaper than the input. Confluence is less obvious.
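The termination argument sketched in this comment is that a strictly decreasing integer cost admits no infinite rewrite chain. A minimal sketch, with an assumed cost table:

```python
# Sketch: if every rule strictly lowers a nonnegative integer cost, rewriting
# terminates, since costs form a strictly decreasing chain of naturals.
COST = {"int_add": 1, "int_mul": 1}   # assumed per-operation costs

def cost(expr):
    if not isinstance(expr, tuple):   # variables and constants cost nothing
        return 0
    op, *args = expr
    return COST.get(op, 1) + sum(cost(a) for a in args)

pattern = ("int_add", "x", 0)         # int_add(x, 0)
target = "x"
assert cost(target) < cost(pattern)   # this rule strictly lowers cost
```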
By @mjcohen - 7 months
Interesting to see which optimizations are never used.