August 12th, 2024

Approximating sum types in Python with Pydantic

Pydantic enables robust data models in Python, and its support for discriminated (tagged) unions lets you approximate sum types with clear, type-safe definitions. This improves maintainability and reliability by making invalid states harder to represent.

Pydantic, a popular Python library, lets developers create robust data models that validate an application's inputs and outputs. This article discusses how to approximate sum types in Python using Pydantic's support for tagged unions. Sum types, a kind of algebraic data type, represent values that can take one of several forms, so that invalid states become unrepresentable in the type system. The author shows how to define a model with mutually exclusive fields (foo XOR bar) in Pydantic and highlights the weakness of the naive approach, which still allows invalid states. Field validators can enforce such invariants, but the resulting implementations are complex and hide the invariant from the type checker. The article then introduces discriminated unions, which use an extra field (the discriminator) to tell variants apart and allow clearer, more flexible model definitions. The author also explores non-string discriminators and enums to improve maintainability. Overall, Pydantic's features let developers build precise, type-safe models that make Python applications more reliable.
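A minimal sketch of the validator-based approach to "foo XOR bar", assuming Pydantic v2. The model name `Frobulated` and its field types are illustrative, and the sketch uses a model-level `model_validator` rather than a per-field validator because the invariant spans two fields. The type checker still sees both fields as optional, which is the loss of type safety described above.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class Frobulated(BaseModel):
    # Both fields are Optional to the type checker; only the validator
    # below enforces that exactly one of them is set.
    foo: Optional[int] = None
    bar: Optional[str] = None

    @model_validator(mode="after")
    def exactly_one_of_foo_or_bar(self) -> "Frobulated":
        # True == True (both unset) or False == False (both set) is invalid.
        if (self.foo is None) == (self.bar is None):
            raise ValueError("exactly one of 'foo' or 'bar' must be set")
        return self


ok = Frobulated(foo=1)
try:
    Frobulated(foo=1, bar="x")
except ValidationError as exc:
    # Pydantic wraps the ValueError raised inside the validator.
    print(exc.errors()[0]["msg"])
```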

- Pydantic allows for the creation of robust data models in Python.

- Sum types help represent values that can take on multiple forms, preventing invalid states.

- Discriminated unions in Pydantic use a discriminator field to differentiate between variants.

- Non-string discriminators and enums can enhance maintainability in model definitions.

- Pydantic's features improve type safety and reliability in Python applications.
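The discriminated-union and enum points above can be sketched as follows, assuming Pydantic v2. `FrobulatedKind`, `Foo`, `Bar` and `frobulated_adapter` are illustrative names, not taken from the article; each variant pins the discriminator to a single enum member via `Literal`.

```python
from enum import Enum
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class FrobulatedKind(str, Enum):
    FOO = "foo"
    BAR = "bar"


class Foo(BaseModel):
    # Each variant fixes the discriminator to exactly one enum member.
    kind: Literal[FrobulatedKind.FOO] = FrobulatedKind.FOO
    foo_specific: int


class Bar(BaseModel):
    kind: Literal[FrobulatedKind.BAR] = FrobulatedKind.BAR
    bar_specific: bool


# Field(discriminator=...) tells Pydantic to select the variant from `kind`
# instead of trying each member of the union in turn.
Frobulated = Annotated[Union[Foo, Bar], Field(discriminator="kind")]
frobulated_adapter = TypeAdapter(Frobulated)
```

For example, `frobulated_adapter.validate_python({"kind": FrobulatedKind.BAR, "bar_specific": True})` yields a `Bar`. Keeping the tags in one enum means renaming or adding a variant touches a single definition.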

Related

Summary of Major Changes Between Python Versions

The article details Python updates from versions 3.7 to 3.12, highlighting async/await, the walrus operator (assignment expressions), type hints, f-strings, typing enhancements, structural pattern matching, tomllib, and useful tools.

Beyond Hypermodern: Python is easy now

Python development in 2024 focuses on simplicity with tools like Rye aligning with packaging standards. It streamlines setup, dependency management, and project structuring, emphasizing typing with Pyright for efficient code maintenance and pytest for testing.

The algebra (and calculus) of algebraic data types

The relationship between algebraic data types (ADTs) and mathematical algebra is explored, emphasizing similarities in operations. Examples like Choice and binary trees illustrate how algebraic rules apply to ADTs, despite challenges with structures like Nat. Poking holes in data structures is introduced as a way to understand calculus on data types.

Higher-kinded bounded polymorphism in OCaml

Higher-kinded bounded polymorphism is crucial for generic operations and DSLs. OCaml lacks direct support but can simulate it through its module system, leading to complex and verbose code.

A Knownbits Abstract Domain for the Toy Optimizer, Correctly

The article details the implementation of a Knownbits abstract domain in PyPy's Toy Optimizer, which uses bit-level analysis to improve integer-operation optimizations, relies on property-based testing, and strengthens static analysis for efficient code generation.

AI: What people are saying
The comments reflect a diverse range of opinions on Pydantic and its role in Python's type system.
  • Some users argue that Pydantic adds unnecessary complexity compared to Python's built-in type system, suggesting alternatives like dataclasses or other libraries.
  • There is a discussion about the terminology and concepts surrounding type unions, with some expressing confusion over the various names used.
  • Several commenters highlight the learning curve associated with Pydantic, especially for teams unfamiliar with its features.
  • Some users mention other libraries, such as mashumaro and typedload, as viable alternatives to Pydantic.
  • There is a broader conversation about the challenges of typing in programming languages, comparing Python's approach to that of statically-typed languages like C# and TypeScript.
16 comments
By @carderne - 9 months
I think it would be useful to differentiate more clearly between what is offered by Python's type system, and what is offered by Pydantic.

That is, you can approximate Rust's enum (sum type) in pure Python using some combination of Literal, Enum, Union and dataclasses. For example (more here[1]):

  from dataclasses import dataclass
  
  @dataclass
  class Foo: ...
  
  @dataclass
  class Bar: ...
  
  # The X | Y union syntax needs Python 3.10+ at runtime
  Frobulated = Foo | Bar
Pydantic adds de/serialization, but if you're not doing that, you can get very far without it. (And even if you are, there are lighter-weight options that work with dataclasses, like cattrs, pyserde, and dataclasses-json.)

[1] https://threeofwands.com/algebraic-data-types-in-python/

By @adamc - 9 months
The problem I see with it is this: Now, instead of understanding Python, which is straightforward, you have to understand a bunch about Pydantic and type unions. In a large shop of Python programmers, I would expect many would not follow most of this.

Essentially, if this is a feature you must have, Python seems like the wrong language. Maybe if you only need it in spots this makes sense...

By @reubenmorais - 9 months
One caveat of the tip in the "Deduplicating shared variant state" section, about including an underspecified discriminator field in the base class, is that it doesn't play well if you're using Literals instead of Enums as the discriminator type. Type checkers such as Pyright do not allow you to narrow the literal type of a mutable field in a subclass, so the following doesn't type check:

  from typing import Literal
  
  class _FrobulatedBase:
      kind: Literal['foo', 'bar']
      value: str
  
  class Foo(_FrobulatedBase):
      kind: Literal['foo'] = 'foo'
      foo_specific: int
  
  class Bar(_FrobulatedBase):
      kind: Literal['bar'] = 'bar'
      bar_specific: bool


  "kind" overrides symbol of same name in class "_FrobulatedBase"
    Variable is mutable so its type is invariant
      Override type "Literal['foo']" is not the same as base type "Literal['foo', 'bar']"
https://pyright-play.net/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMo...
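One way to sidestep this override error, sketched under the assumption that Pydantic v2 is used (this is not the article's approach): keep only the genuinely shared state in the base class and let each variant declare its own `kind` literal, so there is nothing for a subclass to narrow. `parse_frobulated` is a hypothetical helper.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class _FrobulatedBase(BaseModel):
    # Only state shared by every variant; no `kind` field to override.
    value: str


class Foo(_FrobulatedBase):
    kind: Literal["foo"] = "foo"
    foo_specific: int


class Bar(_FrobulatedBase):
    kind: Literal["bar"] = "bar"
    bar_specific: bool


Frobulated = Annotated[Union[Foo, Bar], Field(discriminator="kind")]
_adapter = TypeAdapter(Frobulated)


def parse_frobulated(data: dict) -> Union[Foo, Bar]:
    # The discriminator still works because every variant defines `kind`.
    return _adapter.validate_python(data)
```

The trade-off is that code holding a plain `_FrobulatedBase` can no longer read `.kind` without first narrowing to a concrete variant.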
By @jghn - 9 months
Meta comment.

Something I've wondered of late. I keep seeing these articles pop up, and they're trying to recreate ADTs for Python in the manner of Rust. But there's a long history of ADTs in other languages. For instance, we don't see threads on recreating Haskell's ADT structures in Python.

Is this an artifact of Rust being hyped right now, especially on HN? As in, the typical reader is more familiar with Rust than Haskell, so "I want to do what I'm used to in Rust in Python" is more likely to resonate than "I want to do what I'm used to in Haskell in Python"?

At the end of the day it doesn't *really* matter as the underlying construct being modeled is the same. It's the translation layer that I'm wondering about.

By @LtWorf - 9 months
Author of typedload here.

typedload does this without need to pass a "discriminator" parameter.

It's enough for the types to share a field that each defines as a different Literal value.

I've also implemented an algorithm that inspects the data and finds the type directly from the literal field, to avoid having to try multiple types when loading a union. Pydantic later implemented the same strategy.

typedload is faster than pydantic to load tagged unions. It is written in pure python.

edit: Also, typedload just uses completely regular dataclasses or attrs classes. No need for all those different BaseModel and RootModel classes, or to understand when to use each.
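To illustrate the idea of picking a union member from its Literal field without a `discriminator` argument, here is a small pure-Python sketch over plain dataclasses. It is not typedload's implementation or API: `find_tag_field` and `load_union` are hypothetical helpers, and unlike a real loader they do not type-check the remaining fields.

```python
from dataclasses import dataclass
from typing import Any, Literal, Union, get_args, get_origin, get_type_hints


@dataclass
class Foo:
    kind: Literal["foo"]
    foo_specific: int


@dataclass
class Bar:
    kind: Literal["bar"]
    bar_specific: bool


Frobulated = Union[Foo, Bar]


def find_tag_field(members: tuple) -> str:
    """Return the first field that every member annotates as a Literal."""
    hints = [get_type_hints(member) for member in members]
    for name in hints[0]:
        if all(get_origin(h.get(name)) is Literal for h in hints):
            return name
    raise TypeError("no shared Literal field to discriminate on")


def load_union(data: dict, union: Any) -> Any:
    """Build the union member whose Literal tag matches data[tag]."""
    members = get_args(union)
    tag = find_tag_field(members)
    for member in members:
        # Compare against the Literal values instead of trying each
        # constructor and catching failures.
        if data.get(tag) in get_args(get_type_hints(member)[tag]):
            return member(**data)
    raise ValueError(f"no variant has {tag}={data.get(tag)!r}")
```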

By @__mharrison__ - 9 months
A pretty good article. It would be a great article if it used real-world examples instead of made-up "frobulated" ones.
By @ks2048 - 9 months
It's a shame there are so many different names for a set of very related (or identical?) concepts. For example, Wikipedia says "tagged union" is also known as "variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct". [https://en.wikipedia.org/wiki/Tagged_union]
By @blorenz - 9 months
Discriminated unions are also a wonderful part of the zod library. I use them to overload endpoints for multiple relevant operations.
By @hexane360 - 9 months
As an alternative to Pydantic, check out the wonderful mashumaro: https://github.com/Fatal1ty/mashumaro

I've also played around with writing my own dataclass/data conversion library: https://github.com/hexane360/pane

By @nsonha - 9 months
And people complain that TypeScript is crazy. I think we just need to acknowledge that typing is hard, especially with what mainstream languages give us.
By @adsharma - 9 months
Python has been about expressing ideas. Even if the language doesn't natively support some concepts, it's useful to express them in Python so the code can be effectively transpiled into a language that does. This is what py2many needs: a curated subset of Python with some enhancements, as opposed to inventing a new language.

https://github.com/adsharma/adt contains a small enhancement for the @sealed decorator from the excellent upstream repo.

https://github.com/py2many/py2many/blob/main/tests/cases/sea... https://github.com/py2many/py2many/blob/main/tests/expected/...

By @wizerno - 9 months
A slightly related discussion on Type Unions in C# from a week ago: https://news.ycombinator.com/item?id=41183240
By @gnulinux996 - 9 months
I feel like, with TypeScript and Pydantic taking center stage, the dynamic vs. static typing debate is finally coming to a close.

More and more, Java seems not that bad after all.

By @levhawk - 9 months
That looks crazy coming from statically-typed languages. So many hoops, so much effort, so many custom structures just to verify types. Come on, modern C# can do all of that out of the box. You define a record/class for a DTO, and System.Text.Json will either convert it successfully or throw an exception that says exactly what the problem was and at which character/field. Combined with much more advanced IntelliSense, development is far more comfortable. But of course, whatever works for you.