Python extensions should be lazy
Python's `ast.parse` function is slow largely because of memory-management overhead when building Python objects. A Rust extension sped up AST processing by 16x, suggesting that lazy object creation can improve the performance of Python extensions.
The article discusses the performance issues encountered when using Python's `ast.parse` function in a large codebase, which took approximately 8 seconds for 500,000 lines of code. The author identifies that the inefficiency stems from Python's memory management and the overhead of converting the Abstract Syntax Tree (AST) into Python objects. The process involves multiple memory allocations and significant garbage collection, leading to performance bottlenecks. To address this, the author implemented a Rust extension that processes ASTs without converting them into Python objects until necessary, resulting in a dramatic reduction in runtime from 8.7 seconds to 530 milliseconds. This approach minimizes memory pressure and garbage collection activity. The author suggests that adopting a lazy loading strategy, similar to that used in NumPy, could enhance the performance of Python extensions by allowing them to manage memory more efficiently and only create Python objects when required.
- Python's `ast.parse` can be slow due to memory management overhead (see the timing sketch after this list).
- A Rust extension was developed to optimize AST processing, achieving a 16x speedup.
- The new implementation reduced memory allocations and garbage collection significantly.
- Lazy loading strategies could improve performance for Python extensions.
- Efficient memory management is crucial for handling large codebases in Python.
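A quick way to see the baseline cost for yourself (a rough sketch, not the author's benchmark; the synthetic source and line count below are illustrative):

```python
import ast
import time

# Synthetic module, large enough to make parsing cost visible.
# The author's figures came from a real ~500,000-line codebase.
source = "\n".join(f"def f_{i}(x):\n    return x + {i}" for i in range(50_000))

start = time.perf_counter()
tree = ast.parse(source)
elapsed = time.perf_counter() - start

print(f"parsed {len(source.splitlines())} lines in {elapsed:.2f}s")
print(f"AST nodes materialized: {sum(1 for _ in ast.walk(tree))}")
```

Every one of those nodes is a full Python object, which is where the allocation and garbage-collection pressure described above comes from.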
Related
Spending too much time optimizing for loops
Researcher Octave Larose shared insights on optimizing Rust interpreters, focusing on improving performance for the SOM language. By enhancing loop handling and addressing challenges, significant speedups were achieved, balancing code elegance with efficiency.
Spending too much time optimizing for loops
Researcher Octave Larose discussed optimizing Rust interpreters, focusing on improving performance for the SOM language. They highlighted enhancing loop efficiency through bytecode and primitives, addressing challenges like Rust limitations and complex designs. Despite performance gains, trade-offs between efficiency and code elegance persist.
Some Tricks from the Scrapscript Compiler
The Scrapscript compiler implements optimization tricks like immediate objects, small strings, and variants for better performance. It introduces immediate variants and const heap to enhance efficiency without complexity, seeking suggestions for future improvements.
I Hope Rust Does Not Oxidize Everything
The author expresses concerns about Rust's widespread adoption in programming, citing issues with syntax, async features, complexity, and long compile times. They advocate for language diversity to prevent monoculture, contrasting Rust with their language Yao.
Using Rust to corrode insane Python run-times
Vortexa improved a Python task processing GPS signals from 30 hours to 6 hours by developing a custom Rust library, achieving a 24x speed increase while maintaining existing business logic.
However, the conclusion is debatable. Not everyone has this problem. Not everyone would benefit from the same solution.
Sure, if your data can be loaded, manipulated, and summarized outside of Python land, then lazy object creation is a good way to go. But then you're giving up all of the Python tooling that likely drove you to Python in the first place.
Most of the Python ecosystem, from sets and dicts to the standard library, is focused on manipulating native Python objects. While the syntax supports method calls on data encapsulated elsewhere, it can be costly to constantly "box and unbox" data to move back and forth between the two worlds.
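A rough illustration of that boundary cost, assuming NumPy is installed (exact numbers vary by machine, but the gap is typically an order of magnitude or more):

```python
import time
import numpy as np

data = np.arange(10_000_000, dtype=np.int64)

# Summation stays inside NumPy's C loop: no per-element Python objects.
start = time.perf_counter()
total = data.sum()
in_c = time.perf_counter() - start

# Each element is boxed into a Python int before the Python-level addition.
start = time.perf_counter()
total = sum(int(x) for x in data)
boxed = time.perf_counter() - start

print(f"sum in C: {in_c:.3f}s   sum with per-element boxing: {boxed:.3f}s")
```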
When linking to code on GitHub in an article like this, for posterity, it’s a good idea to link based on a specific commit instead of a branch.
It might be a good idea to change your link to the `Py_CompileStringObject()` function in CPython’s `Python/pythonrun.c` [0] to a commit-based link [1].
[0]: https://github.com/python/cpython/blob/main/Python/pythonrun...
[1]: https://github.com/python/cpython/blob/967a4f1d180d4cd669d5c...
You could make the API transparently lazy, i.e. `ast.parse` creates only a single AstNode object (or whatever), and when you ask that object for e.g. its children, those are created lazily from the underlying C struct. To preserve identity (which I assume is something users of ast are more likely to rely on than usual), you'd have to add some extra book-keeping so it doesn't generate new objects on each access, but memoizes them instead.
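A pure-Python sketch of that memoization idea (`LazyNode` and the `raw` handle are hypothetical names; in a real extension the underlying tree would live in C or Rust):

```python
class LazyNode:
    """Wrapper over a raw node owned by the extension; children are
    materialized as Python objects only on first access."""

    def __init__(self, raw):
        self._raw = raw          # hypothetical handle into the C/Rust tree
        self._children = None    # memoized wrappers, built once

    @property
    def children(self):
        # Memoize so repeated access returns the *same* wrapper objects,
        # preserving the identity semantics ast users may rely on.
        if self._children is None:
            self._children = [LazyNode(c) for c in self._raw.children()]
        return self._children
```

With the memoization in place, `node.children[0] is node.children[0]` stays true across accesses, which is the extra book-keeping the comment refers to.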
The key for optimizing a Python extension is to minimize the number of times you have to interact with Python.
A couple of other tips in addition to what this article provides:
1. Object pooling is quite useful as it can significantly cut down on the number of allocations.
2. Be very careful about tools like pybind11 that make it easier to write extensions for Python. They come with a significant amount of overhead. For critical hotspots, always use the raw Python C extension API.
3. Use NumPy arrays whenever possible when returning large lists to Python. A Python list of Python integers is amazingly inefficient compared to a NumPy array of integers (see the sketch after this list).
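A back-of-the-envelope comparison behind tip 3 (a rough sketch; exact sizes depend on platform and Python version):

```python
import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list stores a pointer per element plus a separate PyLong object each;
# the array stores 8 contiguous bytes per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)
array_bytes = np_array.nbytes

print(f"list : ~{list_bytes / 1e6:.1f} MB")
print(f"array: ~{array_bytes / 1e6:.1f} MB")
```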
jemalloc also gave good results in Node.js and Ruby projects I've worked on.
But I couldn't help but notice that when `_PyCompile_AstOptimize` fails (returns < 0), the `arena` is never freed. I think this is a bug :thinking:.