Python Has Too Many Package Managers
Python's package management ecosystem remains fragmented. PEP 621 standardized project metadata in pyproject.toml, encouraging new package managers like Poetry. Conda offers robust dependency management, especially for data science workflows.
Python's package management ecosystem has long been criticized for its fragmentation and lack of a unified tool akin to Cargo for Rust or npm for JavaScript. Various tools like pip, venv, pyenv, and pipenv have attempted to address this issue, but each comes with its own set of limitations and complexities. The recent acceptance of PEP 621 aimed to consolidate Python project configurations into a pyproject.toml file, leading to the emergence of new package managers like Poetry, PDM, Flit, and Hatch. Among these, Poetry stands out for its comprehensive approach to dependency resolution and virtual environment management, although it still faces challenges with slow resolution times and potential issues with dependency bounds. Additionally, the Conda ecosystem, spearheaded by tools like conda and mamba, offers a robust solution for managing Python and non-Python dependencies, particularly catering to data science workflows. While Conda's approach may not be ideal for all use cases, it remains a popular choice for data scientists due to its comprehensive features and integration with key Python tools like Ray and Metaflow.
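To make the PEP 621 piece concrete, here is a minimal sketch of the standardized [project] metadata table and how a tool can read it with the standard library; the project name, version, dependency, and choice of hatchling as build backend are illustrative assumptions, not a recommendation.

```python
# Minimal sketch of PEP 621 metadata in pyproject.toml, parsed with the
# standard library. The project name, version, dependency, and build backend
# below are made-up examples.
import tomllib  # in the standard library since Python 3.11

PYPROJECT = """\
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "example-app"
version = "0.1.0"
requires-python = ">=3.9"
dependencies = ["requests>=2.31"]
"""

meta = tomllib.loads(PYPROJECT)["project"]
print(meta["name"], meta["version"], meta["dependencies"])
```

Because the metadata layout is standardized, any PEP 517 frontend can hand a file like this to any compliant build backend, which is what the newer package managers build on.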
Related
What's up Python? Django get background tasks, a new REPL, bye bye gunicorn
Several Python updates include Django's background task integration, a new lightweight Python REPL, Uvicorn's multiprocessing support, and PyPI blocking outlook.com emails to combat bot registrations, enhancing Python development and security.
Maker of RStudio launches new R and Python IDE
Posit introduces Positron, a new beta IDE merging R and Python development. Built on Visual Studio Code, it offers a user-friendly interface, data exploration tools, and seamless script running for polyglot projects.
Python grapples with Apple App Store rejections
Python 3.12 faced rejections in Apple's App Store due to the "itms-services" string. Python developers discussed solutions, leading to a consensus for Python 3.13 with an "--with-app-store-compliance" option to address the issue.
Python Modern Practices
Python development best practices involve using tools like mise or pyenv to manage multiple versions, running the latest Python release, and using pipx to run applications. Project tips include a src layout, pyproject.toml, virtual environments, Black, flake8, pytest, wheel, type hinting, f-strings, datetime, enum, named tuples, data classes, breakpoint(), logging, and TOML config for efficiency and maintainability.
Reproducibility in Disguise
Reproducibility in software development is supported by tools like Bazel, addressing lifecycle challenges. Vendoring dependencies for reproducibility introduces complexity, leading to proposals such as vendoring all dependencies for full control.
One of the key faults of pip is what happens when you decide to remove a dependency. Removing a dependency does not actually remove the sub-dependencies that were brought in by the original dependency, leaving a lot of potential cruft.
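As a rough way to spot that cruft, here is a minimal sketch (with simplified name handling) that lists the installed distributions nothing else in the environment depends on; anything in that list that isn't in your requirements file is likely left over from a removed dependency.

```python
# Minimal sketch: list installed distributions that no other installed
# distribution depends on. Entries not in your requirements file are likely
# cruft left behind after removing a dependency. Name handling is simplified.
import re
from importlib.metadata import distributions

def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()

dists = list(distributions())
required = set()
for dist in dists:
    for req in dist.requires or []:
        # A requirement string looks like "urllib3<3,>=1.21.1; extra == 'socks'"
        required.add(normalize(re.split(r"[\s<>=!~;\[(]", req, maxsplit=1)[0]))

for dist in dists:
    name = dist.metadata["Name"]
    if normalize(name) not in required:
        print(name, dist.version)
```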
This is not really an issue if your virtual environments are disposable. Just nuke the venv and recreate it from scratch using only what you need. This is a similar approach to “zero-based budgeting”: it forces you to carefully pick your dependencies and think about what you carry.
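A minimal sketch of that nuke-and-recreate flow, assuming a POSIX venv layout and a requirements.txt that lists only direct dependencies:

```python
# Minimal sketch: throw away the virtual environment and rebuild it from
# scratch with only the direct dependencies in requirements.txt.
# Assumes a POSIX layout (use .venv\Scripts\python.exe on Windows).
import shutil
import subprocess
import venv
from pathlib import Path

env_dir = Path(".venv")
if env_dir.exists():
    shutil.rmtree(env_dir)                      # nuke the old environment and its cruft
venv.EnvBuilder(with_pip=True).create(env_dir)  # recreate it empty
subprocess.run(
    [str(env_dir / "bin" / "python"), "-m", "pip", "install", "-r", "requirements.txt"],
    check=True,
)
```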
I never list transitive dependencies in my requirements.txt file, just direct dependencies, and rely on pip to install all the transitive libs.
You don't even have to freeze the version: just list the name and pull in the latest version whenever you run a pip upgrade.
If you don't do that, you can quickly go down JavaScript's path of bloated node_modules.
Can people explain why venv & pip is such a bad solution for them that they have to resort to other package managers?
Even venv is not really required if you dockerize your Python apps, which you will have to do anyway at deploy time.
pdm is my current favorite package manager. It is fully PEP-compliant and the lockfile generation is nice. I wouldn't call hatch a package manager because I don't think it can make lockfiles.
uv is on my radar but it doesn't look ready for primetime yet. I saw they are building out a package management API with commands such as `uv add` and `uv remove`. Cross-platform lockfiles, editable installs, in-project .venv, and a baked-in build backend might be enough for me to make the switch. It's my pipe dream to get the full build/install/publish workflow down to a single static binary with no dependencies.
Anna-Lena Popkes has an awesome comparison of the available package managers [0], complete with handy Venn diagrams.
The pyOpenSci team has another nice comparison of the available build backends [1].
[0] https://alpopkes.com/posts/python/packaging_tools/
[1] https://www.pyopensci.org/python-package-guide/package-struc...
There's a lot of cruft and desire for a one-size-fits-all solution but the base tools are probably good enough. My setup is not the one-size-fits-all solution but it works for me, and my team, and lots of other teams.
Beware anyone who tells you that thirty years of tooling doesn't have a solution to the problem you're facing and you need this new shiny thing.*
*Playing with shiny things is fun and should not be discouraged, but must not be mandated either.
And even that first run is not particularly slow - _unless_ you depend on packages that are not available as wheels, which last I checked is not nearly as common nowadays as it was 10 years ago. However, it can still happen: for example, if you are working with Python 3.8 and you are using the latest version of some fancy library, they may have already stopped building wheels for that version of Python. That means the package manager has to fall back to the sdist and actually run the build scripts to acquire the metadata.
On top of all this, private package feeds (like the one provided by Azure DevOps) sometimes don't provide a metadata API at all, meaning the package manager has to download every single package just to get the metadata.
The important bit of my little wall of text here, though, is that this is all true for all the other package managers as well. You can't necessarily attribute slow dependency resolution to a solver being written in C++ or pure Python, given all of these other compounding factors, which are often overlooked.
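For what it's worth, you can check whether a given release would hit that sdist fallback using PyPI's public JSON API; a minimal sketch follows (the project and version are arbitrary examples, and private feeds may not expose this endpoint at all):

```python
# Minimal sketch: ask PyPI's JSON API which distribution types a release ships.
# If there is no wheel, a resolver has to download and build the sdist just to
# read its metadata. The project and version below are arbitrary examples.
import json
from urllib.request import urlopen

def distribution_types(project: str, version: str) -> set[str]:
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urlopen(url) as resp:
        data = json.load(resp)
    return {f["packagetype"] for f in data["urls"]}  # e.g. {"bdist_wheel", "sdist"}

if __name__ == "__main__":
    kinds = distribution_types("numpy", "1.26.4")
    print("wheels available" if "bdist_wheel" in kinds else "sdist only", "->", kinds)
```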
One-time (or as-needed for manual upgrades):
1. Make a venv with setuptools, wheel, and pip-tools (to get pip-compile) installed.
2. Use venv's pip-compile to generate a fully-pinned piptools-requirements.txt for the venv.
3. Check piptools-requirements.txt into my repo. This is used to get a stable, versioned `pip-compile` for use on my payload requirements.
During normal development:
1. Add high-level dependencies to a `requirements.in` file. Usually unversioned, unless there's a good reason to specify something more exact.
2. On changes to `requirements.in`, make a venv from `piptools-requirements.txt` and its `pip-compile` to solve `requirements.in` into a fully-pinned `requirements.txt`.
3. Check requirements.in and requirements.txt into the repo.
4. Install packages from requirements.txt when making the venv that I need for production.
This approach is very easy to automate, CI/CD friendly, and completely repeatable. It doesn't require any nonstandard tools to deploy (and only needs pip-compile when recompiling requirements.txt). It also makes a clear distinction between "what packages do the developers actually want?" and "what is the fully-versioned set of all dependencies". It's worked great for me over the years, and I'd highly recommend it as a reliable way to use the standard Python package tooling.
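A minimal sketch of how the flow above could be scripted for CI; the file names follow the workflow, while the subcommand names, single-script layout, and POSIX venv paths are my own assumptions:

```python
# Minimal sketch automating the pip-compile workflow described above.
# File names match the comment; everything else is illustrative.
import subprocess
import sys
from pathlib import Path

VENV = Path(".venv")
BIN = VENV / "bin"  # on Windows this would be .venv/Scripts

def run(*args: str) -> None:
    print("+", " ".join(args))
    subprocess.run(args, check=True)

def bootstrap() -> None:
    # One-time: create the venv and install the pinned pip-tools from
    # piptools-requirements.txt, so pip-compile itself is versioned.
    run(sys.executable, "-m", "venv", str(VENV))
    run(str(BIN / "python"), "-m", "pip", "install", "-r", "piptools-requirements.txt")

def recompile() -> None:
    # On changes to requirements.in: resolve it into a fully pinned requirements.txt.
    run(str(BIN / "pip-compile"), "requirements.in", "--output-file", "requirements.txt")

def install() -> None:
    # Production/CI: install exactly what requirements.txt pins.
    run(str(BIN / "python"), "-m", "pip", "install", "-r", "requirements.txt")

if __name__ == "__main__":
    {"bootstrap": bootstrap, "recompile": recompile, "install": install}[sys.argv[1]]()
```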
Rye uses other pretty standard stuff under the hood, tools that follow the PEPs; it's just a front end that is sane. uv is fast as well. It downloads the pinned version of standalone Python, keeps everything in its own venv, and there's very little messing with or tweaking of the environment.
It is messy, although it's getting better. I doubt everything will ever standardise on one tool, however.
Distutils has been ripped out of Python core, setuptools is somewhat deprecated but not really. Just don't call setup.py directly. Or use flit. Or perhaps pyproject.toml? If the latter, flit, poetry and the 100 other frontends all have a different syntax.
Would you like to copy external data into the staging area while using flit? You are out of luck, use poetry. Poetry versions <= 1.2.3.4.5.111 on the other hand do not support a backend.
Should you use pip? Well, the latest hotness are the search-unfriendly named modules "build" and "install", which create a completely isolated environment. Which is the officially supported tool that will stay supported and not ripped out like distutils?
Questions over questions. And all that for creating a simple zip archive of a directory, which for some reason demands gigantic frameworks.
In my experience, Poetry works much better than, say... npm.
`uv` is basically that but faster.
I don't think that's quite the right way to frame this. Handing Rye over to a company that could maintain it full time isn't the same thing as "abandoning" it - and the new maintainers are active on that project: https://github.com/astral-sh/rye/commits/main/
An important qualification: Poetry uses pyproject.toml, but it doesn't use the standard (i.e., PEP 518 and PEP 621) metadata layout. In practice this means that it doesn't follow the standard; it just happens to (confusingly) use a file with the same name.
To the best of my knowledge, the others fully comply with the listed PEPs. In practice this means that the difference between them is abstracted away by tools like `build`, thanks to PEP 517.
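A quick way to see that difference in practice is to check which table a given pyproject.toml actually carries; a minimal sketch (the file path is an assumption, and the [tool.poetry] layout shown is the Poetry-specific one the comment refers to):

```python
# Minimal sketch: report whether a pyproject.toml carries standard PEP 621
# metadata ([project]) or Poetry's own [tool.poetry] table. The path is an
# example; run it from a project directory.
import tomllib
from pathlib import Path

data = tomllib.loads(Path("pyproject.toml").read_text(encoding="utf-8"))

if "project" in data:
    print("PEP 621 metadata:", data["project"].get("name"))
elif "poetry" in data.get("tool", {}):
    print("Poetry-specific metadata:", data["tool"]["poetry"].get("name"))
else:
    print("No recognizable project metadata table found")
```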
Compared to .NET, where I can compile a framework-independent, single-file executable.
My current favorites are uv + mise. The combination handles lockfiles and multiple versions of Python, and it's very fast since uv is very fast. I have not tried pdm or hatch, though.
Am I missing anything major by not using conda and Poetry?
For Python, you can use `pip wheel` (https://pip.pypa.io/en/stable/cli/pip_wheel/) to download .whl files of your dependencies into a folder, add that folder to your version control, and update `sys.path` to include your .whl files.
For updating packages, you run `pip wheel` again and check in the new .whl files after carefully reviewing the changes.
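A minimal sketch of the import side of that approach; the wheels/ folder name is an assumption, and it only works for pure-Python wheels, since a wheel is just a zip archive that Python can import from directly:

```python
# Minimal sketch: put every .whl checked into a local "wheels/" folder onto
# sys.path so its contents become importable. The folder name is an example;
# this relies on the wheels being pure Python (importable straight from the zip).
import sys
from pathlib import Path

WHEEL_DIR = Path(__file__).parent / "wheels"
for wheel in sorted(WHEEL_DIR.glob("*.whl")):
    sys.path.insert(0, str(wheel))

# After this, e.g. `import requests` resolves from wheels/requests-*.whl,
# provided that wheel (and the wheels of its dependencies) were checked in.
```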
A notable omission in pip-tools, which many here are suggesting as the simpler option: it can't write requirements files for multiple environments/platforms without being run once for each of those environments, producing a separate file for each.
We settled on Poetry at the time but it has been quite unstable overall. Not so bad recently but there were a lot of issues/regressions with it over time.
For this reason I am happy to see new takes on package management; hopefully some of them will have clearer wins over the others, so you don't have to spend ages trying to figure out which one will do what you need.
... Is it? :-)
Mwahahaha
This is the flow I follow for Python development
And then what solution do we have? Virtual environments, virtual machines, Docker, and AppImage. We package all dependencies and even entire operating systems so as to avoid all these problems. It's legacy support all the way down.
From scratch, I'd say devs should just pull all dependencies into their code and package them with their product. Users should never even have to touch something like pip or a virtual environment. A package manager that allows a developer to publish tools others can use to build code, but that packages the dependencies with the package instead of pulling them in for users, would be ideal. Where possible, avoid dependency on anything external entirely. What's that XKCD about yet another standard? I know it will never happen, but I sure do wish it worked like that.