July 12th, 2024

Free-threaded CPython is ready to experiment with

CPython 3.13 introduces free-threading to enhance performance by allowing parallel threads without the GIL. Challenges like thread-safety and ABI compatibility are being addressed for future adoption as the default build.

A new experimental feature in CPython 3.13 called free-threading allows running multiple threads in parallel within the same interpreter by disabling the global interpreter lock (GIL). This change aims to improve performance, especially in multi-threaded scenarios, by utilizing multiple CPU cores effectively. However, implementing free-threading poses challenges related to thread-safety and ABI incompatibility between default and free-threaded CPython builds. Issues like intermittent failures in libraries such as NumPy and PyWavelets highlight the complexity of ensuring thread-safety in the ecosystem. Despite the challenges, efforts are underway to address these issues, with a focus on compatibility and performance improvements. The team behind this initiative is working on various projects to support free-threaded CPython, with plans to release compatible wheels on PyPI for experimentation. The ultimate goal is to make free-threaded CPython the default build in the future, with ongoing efforts to document lessons learned and facilitate community contributions to support this transition.
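
For readers who want to try this, here is a minimal sketch of how one might check which build is running and push CPU-bound work through plain threads. `sysconfig.get_config_var("Py_GIL_DISABLED")` and `sys._is_gil_enabled()` are available in CPython 3.13; the helper names `gil_status`, `count_primes`, and `timed_run` are hypothetical and used only for illustration. Any speedup depends on the build and the machine, so no timings are claimed here.

```python
import sys
import sysconfig
import time
from concurrent.futures import ThreadPoolExecutor


def gil_status() -> str:
    """Describe whether this interpreter can run threads in parallel."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "default build (GIL always on)"
    # sys._is_gil_enabled() was added in 3.13; assume the GIL is on if absent.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return "free-threaded build, GIL " + ("re-enabled" if gil_enabled else "disabled")


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound, pure-Python work (hypothetical workload)."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def timed_run(workers: int, limit: int = 20_000) -> tuple[list[int], float]:
    """Run the same workload once per worker thread and time the batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(count_primes, [limit] * workers))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    print(gil_status())
    for workers in (1, 4):
        _, elapsed = timed_run(workers)
        print(f"{workers} thread(s): {elapsed:.2f}s")
```

On a default build the four-thread run should take roughly four times as long as the one-thread run, because the threads take turns holding the GIL. On a free-threaded build with enough cores, the two timings should be much closer.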

24 comments
By @eigenvalue - 6 months
Really excited for this. Once some more time goes by and the most important python libraries update to support no GIL, there is just a tremendous amount of performance that can be automatically unlocked with almost no incremental effort for so many organizations and projects. It's also a good opportunity for new and more actively maintained projects to take market share from older and more established libraries if the older libraries don't take making these changes seriously and finish them in a timely manner. It's going to be amazing to saturate all the cores on a big machine using simple threads instead of dealing with the massive overhead and complexity and bugs of using something like multiprocessing.
By @simonw - 6 months
I got this working on macOS and wrote up some notes on the installation process and a short script I wrote to demonstrate how it differs from non-free-threaded Python: https://til.simonwillison.net/python/trying-free-threaded-py...
By @nine_k - 6 months
Python 3 progress so far:

  [x] Async.
  [x] Optional static typing.
  [x] Threading.
  [ ] JIT.
  [ ] Efficient dependency management.
By @vegabook - 6 months
Clearly the Python 2 to 3 war was so traumatising (and so badly handled) that the core Python team is too scared to do the obvious thing, and call this Python 4.

This is a big, fundamental, and (in many cases) breaking change, even if it's "optional".

By @Sparkyte - 6 months
My body is ready. I love Python because of the ease of writing and logic. Hopefully the more complicated free-threaded approach is comprehensive enough that we can write it like we traditionally write Python. Not saying it is or isn't; I just haven't dived enough into Python multithreading, because it is hard to put those demons back once you pull them out.
By @mihaic - 6 months
Does anyone know if there is more serious single threaded performance degradation (more than a few percent for instance)? I couldn't find any benchmarks, just some generic reassurance that everything is fine.
By @discreteevent - 6 months
I remember back around 2007 all the anxious blog posts about the free lunch (Moore's law) being over. Parallelism was mandatory now. We were going to need exotic solutions like software transactional memory to get out of the crisis (and we could certainly forget about object orientation).

Meanwhile what takes the crown? - Single threaded python.

(Well, ok Rust looks like it's taking first place where you really need the speed and it does help parallelism without requiring absolute purity)

By @farhanhubble - 6 months
It remains to be seen how many subtle bugs are now introduced by programmers who have never dealt with real multithreading.
By @jmward01 - 6 months
I know, I know, 'not every story needs to be about ML' but.... I can only imagine how unlocking the GIL will change the nature of ML training and inference. There is so much waste and complexity in passing memory around and coordinating processes. I know that libraries have made it (somewhat) easier and more efficient but I can't wait to see what can be done with things like pytorch when optimized for this.
By @westurner - 6 months
Will there be an effort to encourage devs to add support for free-threaded Python like for Python 3 [1] and for Wheels [2]?

Is there a cibuildwheel / CI check for free-threaded Python support?

Is there already a reason not to have Platform compatibility tags for free-threaded cpython support? https://packaging.python.org/en/latest/specifications/platfo...

Is there a hame - a hashtaggable name - for this feature to help devs find resources to help add support?

Can an LLM almost port in support for free-threading in Python, and how should we expect the tests to be insufficient?

"Porting Extension Modules to Support Free-Threading" https://py-free-threading.github.io/porting/

[1] "Python 3 "Wall of Shame" Becomes "Wall of Superpowers" Today" https://news.ycombinator.com/item?id=4907755

[2] https://pythonwheels.com/

(Edit)

Compatibility status tracking: https://py-free-threading.github.io/tracking/

By @elijahbenizzy - 6 months
I'm really curious to see how this will work with async. There's a natural barrier (I/O versus CPU-bound code), which isn't always a perfect distinction.

I'd love to see a more fluid model between the two -- E.G. if I'm doing a "gather" on CPU-bound coroutines, I'm curious if there's something that can be smart enough to JIT between async and multithreaded implementations.

"Oh, the first few tasks were entirely CPU-bound? Cool, let's launch another thread. Oh, the first few threads were I/O-bound? Cool, let's use in-thread coroutines".

Probably not feasible for a myriad of reasons, but even a more fluid programming model could be really cool (similar interfaces with a quick swap between?).

By @grandimam - 6 months
How does the no-GIL performance compare to other languages like JavaScript (Node.js), Go, Rust, and even Java? If it's bearable, then I believe there is enormous value that could be generated instead of spending time porting to other languages.
By @VagabundoP - 6 months
Highly recommend the core.py podcast if you're interested in the background, there are a few episodes that focus on the GILectomy:

- Episode 2: Removing the GIL [1]

- Episode 12: A Legit Episode [2]

[1] https://www.youtube.com/watch?v=jHOtyx3PSJQ&list=PLShJCpYUN3...

[2] https://www.youtube.com/watch?v=IGYxMsHw9iw&list=PLShJCpYUN3...

By @vldmrs - 6 months
Great news! It would be interesting to see a performance comparison for I/O-bound tasks like HTTP requests between single-threaded asyncio code and multi-threaded asyncio.
By @pansa2 - 6 months
PEP 703 explains that with the GIL removed, operations on lists such as `append` remain thread-safe because of the addition of per-list locks.

What about simple operations like incrementing an integer? IIRC this is currently thread-safe because the GIL guarantees each bytecode instruction is executed atomically.
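
A minimal sketch of the conservative answer, not taken from PEP 703: an increment such as `x += 1` is a read-modify-write that spans several bytecode instructions (load, add, store), so it is safest to guard a shared counter with `threading.Lock` on either build rather than rely on the interpreter. The `Counter` class and `run` helper below are hypothetical names for illustration.

```python
import threading


class Counter:
    """A shared counter whose increments are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The lock makes the load/add/store sequence one indivisible step.
        with self._lock:
            self.value += 1


def run(threads: int = 4, per_thread: int = 10_000) -> int:
    """Increment one counter from several threads and return the total."""
    counter = Counter()

    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

With the lock in place the total always equals `threads * per_thread`; without it, updates can be lost when two threads read the same old value.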

By @gnatolf - 6 months
Good to hear. The authors are touching on the journey it is to make Cython continue to work. I wonder how hard it'll be to continue to provide bdist packages, or within what timeframe, if at all, Cython can transparently ensure correctness for a no-gil build. Anyone got any insights?
By @codethief - 6 months
Yesterday someone presented preliminary benchmarks here at EuroPython 2024, comparing no-GIL to sub-interpreters and to multiprocessing. Upshot: This gon' be good!
By @earthnail - 6 months
Oh how much this would simplify torch.DataLoader (and its equivalents)…

Really excited about this.

By @throwaway5752 - 6 months
GVR, you are sorely missed, though I hope you are enjoying life.
By @nas - 6 months
Very encouraging news!
By @OutOfHere - 6 months
It has been ready for a few months now, at least since 3.13.0 beta 1, which was released on 2024-05-08, although alpha versions had it working too. I don't know why this is news now.

With it, the single-threaded case is slower.

By @anacrolix - 6 months
Was ready for this 15 years ago when I loved Python and regularly contributed. At the time, nobody wanted to do it and I got bored and went to Go.