Approximating sum types in Python with Pydantic
Pydantic enables robust data models in Python, supporting sum types and discriminated unions for clear, type-safe definitions. It enhances maintainability and reliability by preventing invalid states in applications.
Pydantic, a popular Python library, lets developers build robust data models that validate the inputs and outputs of an application. This article discusses how to approximate sum types in Python using Pydantic's support for tagged unions. Sum types, one kind of algebraic data type, represent values that take exactly one of several forms, so that invalid states are unrepresentable within the type system. The author illustrates how to define a model with mutually exclusive fields (foo XOR bar) using Pydantic, and highlights how the traditional approach of two optional fields still allows invalid states. Field validators can enforce the invariant at runtime, but the implementation becomes complex and the type annotations no longer express the constraint. The article then introduces discriminated unions, which use an additional field (the discriminator) to tell variants apart, allowing for more flexible and clearer model definitions. The author also explores non-string discriminators and enums as a way to improve maintainability. Ultimately, Pydantic's capabilities let developers create precise, type-safe models, improving the reliability of Python applications.
- Pydantic allows for the creation of robust data models in Python.
- Sum types help represent values that can take on multiple forms, preventing invalid states.
- Discriminated unions in Pydantic use a discriminator field to differentiate between variants.
- Non-string discriminators and enums can enhance maintainability in model definitions.
- Pydantic's features improve type safety and reliability in Python applications.
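The discriminated-union approach summarized above can be sketched roughly as follows. This is a minimal illustration that assumes Pydantic v2; the model names `Foo` and `Bar`, the field names, and the `kind` discriminator are made up for the example and are not taken from the article.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Foo(BaseModel):
    # Hypothetical variant: the Literal value acts as the tag.
    kind: Literal["foo"] = "foo"
    foo: int


class Bar(BaseModel):
    # Hypothetical variant with a different tag value.
    kind: Literal["bar"] = "bar"
    bar: str


# Tagged union: Pydantic reads `kind` to decide which variant to validate.
FooOrBar = Annotated[Union[Foo, Bar], Field(discriminator="kind")]
adapter = TypeAdapter(FooOrBar)

parsed = adapter.validate_python({"kind": "bar", "bar": "hello"})

# An unknown tag matches no variant and is rejected.
rejected = False
try:
    adapter.validate_python({"kind": "baz"})
except ValidationError:
    rejected = True
```

Here `parsed` is a `Bar` instance, and the unknown tag raises a `ValidationError` instead of producing a half-valid object.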
Related
Summary of Major Changes Between Python Versions
The article details Python updates from versions 3.7 to 3.12, highlighting async/await, Walrus operator, Type hints, F-strings, Assignment expressions, Typing enhancements, Structural Pattern Matching, Tomllib, and useful tools.
Beyond Hypermodern: Python is easy now
Python development in 2024 focuses on simplicity with tools like Rye aligning with packaging standards. It streamlines setup, dependency management, and project structuring, emphasizing typing with Pyright for efficient code maintenance and pytest for testing.
The algebra (and calculus) of algebraic data types
The relationship between algebraic data types (ADTs) and mathematical algebra is explored, emphasizing similarities in operations. Examples like Choice and binary trees illustrate how algebraic rules apply to ADTs, despite challenges with structures like Nat. Poking holes in data structures is introduced as a way to understand calculus on data types.
Higher-kinded bounded polymorphism in OCaml
Higher-kinded bounded polymorphism is crucial for generic operations and DSLs. OCaml lacks direct support but can simulate it through its module system, leading to complex and verbose code.
A Knownbits Abstract Domain for the Toy Optimizer, Correctly
The article details the Knownbits Abstract Domain's implementation in PyPy's Toy Optimizer, enhancing integer operation optimizations through bit analysis, property-based testing, and improving static analysis for efficient code generation.
- Some users argue that Pydantic adds unnecessary complexity compared to Python's built-in type system, suggesting alternatives like dataclasses or other libraries.
- There is a discussion about the terminology and concepts surrounding type unions, with some expressing confusion over the various names used.
- Several commenters highlight the learning curve associated with Pydantic, especially for teams unfamiliar with its features.
- Some users mention other libraries, such as mashumaro and typedload, as viable alternatives to Pydantic.
- There is a broader conversation about the challenges of typing in programming languages, comparing Python's approach to that of statically-typed languages like C# and TypeScript.
That is, you can approximate Rust's enum (sum type) in pure Python using some combination of Literal, Enum, Union and dataclasses. For example (more here[1]):
from dataclasses import dataclass

@dataclass
class Foo: ...

@dataclass
class Bar: ...

Frobulated = Foo | Bar
Pydantic adds de/ser, but if you're not doing that then you can get very far without it. (And even if you are, there are lighter-weight options that play with dataclasses like cattrs, pyserde, dataclasses-json).

[1] https://threeofwands.com/algebraic-data-types-in-python/
Essentially, if this is a feature you must have, Python seems like the wrong language. Maybe if you only need it in spots this makes sense...
from typing import Literal

class _FrobulatedBase:
    kind: Literal['foo', 'bar']
    value: str

class Foo(_FrobulatedBase):
    kind: Literal['foo'] = 'foo'
    foo_specific: int

class Bar(_FrobulatedBase):
    kind: Literal['bar'] = 'bar'
    bar_specific: bool
"kind" overrides symbol of same name in class "_FrobulatedBase"
Variable is mutable so its type is invariant
Override type "Literal['foo']" is not the same as base type "Literal['foo', 'bar']"
https://pyright-play.net/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMo...

Something I've wondered of late. I keep seeing these articles pop up and they're trying to recreate ADTs for Python in the manner of Rust. But there's a long history of ADTs in other languages. For instance we don't see threads on recreating Haskell's ADT structures in Python.
Is this an artifact of Rust being hyped right now, especially on HN? As in, the typical reader is more familiar with Rust than Haskell, and thus "I want to do what I'm used to in Rust in Python" is more likely to resonate than "I want to do what I'm used to in Haskell in Python"?
At the end of the day it doesn't *really* matter as the underlying construct being modeled is the same. It's the translation layer that I'm wondering about.
typedload does this without need to pass a "discriminator" parameter.
Just having the types with the same field defined as a literal of different things will suffice.
I've also implemented an algorithm to inspect the data and find out the type directly from the literal field, to avoid having to try multiple types when loading a union. Pydantic has also implemented the same strategy afterwards.
typedload is faster than Pydantic at loading tagged unions. It is written in pure Python.
edit: Also, typedload just uses completely regular dataclasses or attrs. No need for all those different BaseModel, RootModel and understanding when to use them.
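The idea of picking the variant straight from a literal field, without declaring a discriminator, can be sketched with the standard library alone. This is not typedload's actual implementation; the helper names `find_tag`, `build_dispatch` and `load`, and the example classes, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args, get_origin, get_type_hints


@dataclass
class Foo:
    kind: Literal["foo"]
    foo_specific: int


@dataclass
class Bar:
    kind: Literal["bar"]
    bar_specific: bool


def find_tag(variants: tuple[type, ...]) -> str:
    """Return the first field that every variant annotates with a Literal."""
    hints = [get_type_hints(cls) for cls in variants]
    for name in hints[0]:
        if all(get_origin(h.get(name)) is Literal for h in hints):
            return name
    raise TypeError("no shared Literal field found")


def build_dispatch(variants: tuple[type, ...], tag: str) -> dict[Any, type]:
    """Map each literal value of the tag field to the variant declaring it."""
    table: dict[Any, type] = {}
    for cls in variants:
        for literal_value in get_args(get_type_hints(cls)[tag]):
            table[literal_value] = cls
    return table


def load(data: dict[str, Any], variants: tuple[type, ...]) -> Any:
    """Look up the variant from the tag value, then construct it once."""
    tag = find_tag(variants)
    table = build_dispatch(variants, tag)
    return table[data[tag]](**data)


loaded = load({"kind": "bar", "bar_specific": True}, (Foo, Bar))
```

Because the tag value selects the class directly, the loader does not need to try each member of the union in turn.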
I've also played around with writing my own dataclass/data conversion library: https://github.com/hexane360/pane
https://github.com/adsharma/adt contains a small enhancement for @sealed decorator from the excellent upstream repo.
https://github.com/py2many/py2many/blob/main/tests/cases/sea... https://github.com/py2many/py2many/blob/main/tests/expected/...
More and more Java seems to be not that bad after all.