July 6th, 2024

An analysis of module names inside top PyPI packages

The blog post covers Python package naming conventions, the problem of mapping module names to package names, and an analysis of PyPI data. Insights include how often normalized module names match package names, common prefixes and suffixes, and advice for developers to follow conventions and avoid namespace packages.

The blog post discusses the importance of naming conventions in Python packages and the challenges associated with mapping module names to package names. The author outlines a plan to gather data on package names from PyPI, analyze file structures, and identify naming conventions. The analysis reveals insights such as the prevalence of normalized module names matching package names, common prefixes and suffixes used in package names, and the impact of namespace packages on naming conventions. The post concludes with recommendations for package developers to adhere to naming conventions, upload wheels, and avoid namespace packages when possible. The author plans to continue monitoring naming conventions in Python packages and refining the analysis.
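As a rough illustration of what "normalized module names matching package names" can mean, here is a minimal sketch. It assumes the normalization in question is the PEP 503 rule (lowercase, with runs of `-`, `_`, and `.` collapsed to a single `-`); the post itself may use a different rule. The helper names `normalize_package_name` and `module_matches_package` are hypothetical and chosen only for this example.

```python
import re


def normalize_package_name(name: str) -> str:
    """Normalize a project name using the PEP 503 rule (assumed here)."""
    # Collapse any run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def module_matches_package(module_name: str, package_name: str) -> bool:
    """Return True if a top-level module name and a package name
    are equal once both are normalized."""
    return normalize_package_name(module_name) == normalize_package_name(package_name)


if __name__ == "__main__":
    print(normalize_package_name("xml_from_seq"))                    # xml-from-seq
    print(module_matches_package("cast_from_env", "cast-from-env"))  # True
    print(module_matches_package("yaml", "PyYAML"))                  # False
```

Names that differ only in separators or case compare equal under this rule, while pairs such as `yaml` and `PyYAML` do not, so no purely textual normalization can recover the package name from the module name in those cases.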

Related

Start all of your commands with a comma (2009)

The article discusses creating a ~/bin/ directory in Unix to store custom commands, avoiding name collisions with system commands by prefixing custom commands with a comma. This technique ensures unique, easily accessible commands.

Python Modern Practices

Python development best practices include using tools like mise or pyenv to manage multiple versions, using the latest Python version, and running applications with pipx. Project tips include the src layout, pyproject.toml, virtual environments, Black, flake8, pytest, wheels, type hinting, f-strings, datetime, enum, named tuples, data classes, breakpoint(), logging, and TOML config files, all aimed at efficiency and maintainability.

Reproducibility in Disguise

Tools like Bazel support reproducibility in software development and address lifecycle challenges. Managing dependencies reproducibly is complex, which leads to proposed solutions such as vendoring all dependencies for control.

Python Has Too Many Package Managers

Python's package management ecosystem is fragmented. PEP 518 introduced pyproject.toml and PEP 621 later standardized project metadata in it, alongside the rise of newer package managers like Poetry. Conda offers robust dependency management, especially for data science workflows.

Simple notes for Emacs with an efficient file-naming scheme

The Denote package for Emacs by Protesilaos Stavrou simplifies note-taking with structured file names, emphasizing predictability, flexibility, and integration with other packages. It promotes clear naming conventions and customizable workflows.

7 comments
By @nicwolff - 6 months
I've got a fun issue right now – two packages with dashes in the package names but underscores in the module names:

https://pypi.org/project/xml-from-seq/ → xml_from_seq

https://pypi.org/project/cast-from-env/ → cast_from_env

Simple normalization, right? But `pip` installs one with underscores and one with dashes:

    >>> from importlib.metadata import metadata
    >>> metadata('xml_from_seq')['Name']
    'xml_from_seq'
    >>> metadata('cast_from_env')['Name']
    'cast-from-env'

so that's what ends up in `pip freeze`.

I _think_ it's because there's a bdist on PyPI for one and not the other, so `pip` is using different "backends" that normalize the names into `METADATA` differently... ugh.

By @woodruffw - 6 months
This is a great writeup on a perennially misunderstood topic in Python packaging (and namespacing/module semantics)! A lot of (bad) security tools begin with the assumption that a top-level module name can always be reliably mapped back to its PyPI package name, and this post's data concretely dispels that assumption.

It's a shame that there isn't (currently) a reliable way to perform this backwards link: the closest current things are `{dist}.dist-info/METADATA` (unreliable, entirely user controlled) and `direct_url.json` for URL-installed packages, which isn't present for packages resolved from indices.

Edit: PEP 710[1] would accomplish the above, but it's still in draft.

[1]: https://peps.python.org/pep-0710/

By @dheera - 6 months
I hate this shit.

    yaml -> pip install pyyaml
    cv2 -> pip install opencv-contrib-python
    PIL -> pip install pillow (wtf, this should be a misdemeanor punishable by being forced to use Windows for a year)

And can we please ban "py" and "python" from appearing inside the name of Python packages?

Or else I'm going to start writing some Python packages with ".js" in their name.

By @formerly_proven - 6 months
> There are 210 packages which include a top-level test or tests directory

Now there's a somewhat useful "make a pull request to an open source project" exercise.

By @bangaladore - 6 months
Every single language with a centralized dependency manager should, without a doubt, require namespacing for package names.

    user/package-name
    group/package-name

etc...

By @doctorpangloss - 6 months
On the one hand, you could say it's a security issue: an installed Python package can make any module name importable, which would have surprising effects if, say, it overwrote stuff like aiohttp or your Postgres client or whatever.

On the other hand, you know, it's already source code, it can do whatever it wants...

By @wodenokoto - 6 months
Shame there weren’t examples of the most different package and import names.