# valgebra > Fast runtime validation through a complete Boolean algebra of schemas. Fast runtime validation through a closed Boolean algebra of schemas. A schema denotes a set of Python values; validation is membership, with no copy and no coercion. Schemas compile once into a Rust validator tree. Standard typing annotations are the primary notation; union, intersection, complement, refinement, and fixpoints compose them into a law-tested lattice. # Getting started # valgebra **Fast runtime validation through a closed Boolean algebra of schemas.** A schema denotes a *set of Python values*. Validation is membership: you ask whether the object you already hold belongs to the set — no copy, no coercion. Schemas compile once into a Rust validator tree, and the hot path crosses into Rust exactly once per call. ``` from typing import Annotated, TypedDict import annotated_types as at from valgebra import Validator class User(TypedDict): name: str age: Annotated[int, at.Ge(0)] users = Validator(User) assert users.is_valid({"name": "Ada", "age": 36}) assert not users.is_valid({"name": "Ada", "age": -1}) ``` Pre-alpha valgebra is under active development and published to PyPI. The APIs described here work today but may change before a stable `0.1.0` release. See the [changelog](https://github.com/ppigazzini/valgebra/blob/main/CHANGELOG.md) for what is built. ## What valgebra is for valgebra is a **contracts-and-checking** tool, not a parsing framework. Reach for it when you want to *check an object you already hold* against a composable, inspectable contract — cheaply enough to run on every request or every agent turn. For ingesting untrusted input into typed models with coercion and defaults, use **pydantic**; for the fastest deserialization into structs, use **msgspec**. valgebra occupies the niche neither covers: a closed, lawful **algebra** of schemas (union, intersection, complement, refinement, fixpoints) with **check-only** semantics, on a Rust core. ## What makes it different - **Schemas denote sets; validation is membership.** Subtyping is set inclusion and equivalence is mutual inclusion — decided soundly over a wide fragment and deliberately conservative beyond it. The whole model is one idea. - **A real Boolean algebra.** `union`, `intersection`, and `complement` compose any schema into a lattice whose laws are property-tested, with a law-justified [simplifier](https://ppigazzini.github.io/valgebra/algebra/index.md) that never changes a schema's value set. - **Typing-first.** Standard annotations are the primary notation, read through the typing spec's own introspection. - **Check, don't parse.** `validate` and `is_valid` never copy or coerce; `ensure` is the explicit, separate conversion mode. - **Few boundary crossings.** Tree walks, key lookups, and bound checks run in Rust; a comparison against a Python object — a literal, a refinement predicate, or an instance or attribute check — is the documented step into Python, never a silent fallback. - **JSON on the Rust path.** `validate_json` parses and validates JSON in Rust, consistent with the object path. - **Immutable and thread-safe** by design. Free-threaded CPython 3.14 is supported with a dedicated `cp314t` wheel where the release image exposes that interpreter. ## Where to go next - New here? Start with [installation](https://ppigazzini.github.io/valgebra/installation/index.md) and the [quickstart](https://ppigazzini.github.io/valgebra/quickstart/index.md). - Writing schemas? The [schema language](https://ppigazzini.github.io/valgebra/schema-language/index.md) reference covers every form with its denotation; [refinements](https://ppigazzini.github.io/valgebra/refinements/index.md) covers constraints, and [recursive schemas](https://ppigazzini.github.io/valgebra/recursion/index.md) the `recursive` fixpoint. - Composing them? See the [Boolean algebra](https://ppigazzini.github.io/valgebra/algebra/index.md). - What does it decide? The [decidability boundary](https://ppigazzini.github.io/valgebra/decidability/index.md) maps what subtyping, equivalence, and emptiness answer exactly and what stays conservative; the [foundations](https://ppigazzini.github.io/valgebra/foundations/index.md) record the theory. - Everything callable is in the [API reference](https://ppigazzini.github.io/valgebra/api/index.md). # Installation valgebra publishes prebuilt wheels to PyPI, so the common path needs no Rust toolchain. Building from source is the fallback for development or an unsupported platform; it is a Rust extension built with [maturin](https://www.maturin.rs/), which requires a Rust toolchain. ## From PyPI With **Python** 3.10 or newer: ``` pip install valgebra # or uv add valgebra ``` Wheels are published for Linux (manylinux and musllinux, x86_64 and aarch64), macOS (Intel and Apple silicon), Windows, and free-threaded CPython 3.14 where the release image exposes a `cp314t` interpreter. ## From source Building from source additionally requires: - A stable **Rust** toolchain (edition 2024, MSRV 1.88) via [rustup](https://rustup.rs/). - [**uv**](https://docs.astral.sh/uv/) (recommended) for the environment and the build. ``` git clone https://github.com/ppigazzini/valgebra && cd valgebra uv sync # create .venv and install the dev dependencies uv run maturin develop # build the Rust extension into the venv ``` ## Verify it works ``` import valgebra from valgebra import Validator print(valgebra.__version__) assert Validator(int).is_valid(7) ``` # Tutorial This is a guided first session with valgebra. You will start from a single scalar check and finish having validated a structured record — with a constraint, an optional field, a real failure you inspect, and a JSON document — building each step on the last. Every snippet runs as written; copy them in order. It assumes valgebra is [installed](https://ppigazzini.github.io/valgebra/installation/index.md). For the meaning of each form you meet here, the [schema language](https://ppigazzini.github.io/valgebra/schema-language/index.md) is the reference; this page is the path, not the catalogue. ## 1. Your first validator A schema denotes a *set of Python values*, and validating asks whether a value is in that set. Compile one with `Validator`, then ask with `is_valid`: ``` from valgebra import Validator is_int = Validator(int) assert is_int.is_valid(42) assert not is_int.is_valid("42") # a str is not in the set of ints ``` `Validator(int)` compiles once; call `is_valid` as often as you like. ## 2. Failing loudly `is_valid` returns a bool. When you want an exception instead, use `validate` — it raises `ValidationError` on a value outside the set: ``` from valgebra import ValidationError, Validator is_int = Validator(int) raised = False try: is_int.validate("42") except ValidationError: raised = True assert raised ``` You will read what a `ValidationError` carries in step 6. ## 3. Validating a collection Schemas nest. A `list[int]` is the set of lists whose every element is an `int`, and valgebra checks each element: ``` from valgebra import Validator numbers = Validator(list[int]) assert numbers.is_valid([1, 2, 3]) assert not numbers.is_valid([1, "two", 3]) # one bad element fails the list ``` ## 4. Describing a record Real data is usually structured. Write a record as a `TypedDict` — the standard typing form — and valgebra validates the shape and every field: ``` from typing import TypedDict from valgebra import Validator class User(TypedDict): name: str age: int users = Validator(User) assert users.is_valid({"name": "Ada", "age": 36}) assert not users.is_valid({"name": "Ada", "age": "old"}) # age must be an int ``` ## 5. Adding a constraint Plain types say *what kind*; a refinement says *which values*. Narrow a field with `Annotated` and an [annotated-types](https://ppigazzini.github.io/valgebra/refinements/index.md) marker — here, an age that cannot be negative: ``` from typing import Annotated, TypedDict import annotated_types as at from valgebra import Validator class User(TypedDict): name: str age: Annotated[int, at.Ge(0)] users = Validator(User) assert users.is_valid({"name": "Ada", "age": 36}) assert not users.is_valid({"name": "Ada", "age": -1}) # the bound holds ``` ## 6. Optional fields, and reading a failure A dict literal is a compact record: a trailing `?` on a key marks it optional, and the record is *closed* — an undeclared key is rejected. ``` from valgebra import Validator profile = Validator({"name": str, "age?": int}) assert profile.is_valid({"name": "Ada"}) # age omitted: fine assert profile.is_valid({"name": "Ada", "age": 36}) assert not profile.is_valid({"name": "Ada", "extra": 1}) # closed: no extra keys ``` When a check fails, `validate` raises a `ValidationError` that tells you exactly *what* failed and *where* — a machine-readable `code` and the `path` to the offending value, even deep in a nested structure: ``` from valgebra import ValidationError, Validator schema = Validator({"user": {"name": str}}) try: schema.validate({"user": {"name": 5}}) except ValidationError as err: assert err.code == "string_type" assert err.path == ("user", "name") ``` The [error model](https://ppigazzini.github.io/valgebra/error-model/index.md) covers aggregation and union reporting. ## 7. Validating JSON You do not have to parse first. `validate_json` and `is_valid_json` read JSON on the Rust path and run the very same checks, so a document is judged exactly as `json.loads` of it would be: ``` from valgebra import Validator users = Validator({"name": str, "age?": int}) users.validate_json('{"name": "Ada", "age": 36}') # passes, raises nothing assert Validator(list[int]).is_valid_json("[1, 2, 3]") # str or bytes ``` ## Where to go next You can now check scalars, collections, records, constraints, failures, and JSON. From here: - The [schema language](https://ppigazzini.github.io/valgebra/schema-language/index.md) lists every form and the set it denotes. - [Refinements](https://ppigazzini.github.io/valgebra/refinements/index.md) covers the full constraint vocabulary. - The [Boolean algebra](https://ppigazzini.github.io/valgebra/algebra/index.md) composes schemas with `union`, `intersection`, and `complement`, and the [foundations](https://ppigazzini.github.io/valgebra/foundations/index.md) explain the theory. - [Recursive schemas](https://ppigazzini.github.io/valgebra/recursion/index.md) handle trees and linked structures with `recursive`. # Quickstart This is the condensed tour — a one-line scalar check through a refined `TypedDict` to a composed contract — for a reader who wants valgebra at a glance. For a guided, step-by-step path, start with the [tutorial](https://ppigazzini.github.io/valgebra/tutorial/index.md) instead. Every snippet runs as written. ## Compile once, check many `Validator(schema)` compiles a schema into an immutable validator. Reuse it: ``` from valgebra import Validator is_int = Validator(int) assert is_int.is_valid(42) assert not is_int.is_valid("42") ``` A validator has three entry points: - `is_valid(obj)` returns a `bool` (the fast path). Since validation is set membership, `obj in validator` is the same check written as an operator. - `validate(obj)` returns `None` or raises `ValidationError`. - `ensure(obj)` validates and returns the object unchanged (the explicit, separate value-returning mode — validation is a membership check, so there is nothing to convert). ``` from valgebra import Validator ints = Validator(list[int]) assert [1, 2, 3] in ints # membership, the operator form of is_valid assert ["x"] not in ints ``` ## Schemas are standard annotations The primary notation is the typing you already write: ``` from typing import Literal from valgebra import Validator assert Validator(list[int]).is_valid([1, 2, 3]) assert Validator(dict[str, int]).is_valid({"a": 1}) assert Validator(tuple[int, ...]).is_valid((1, 2, 3)) assert Validator(int | None).is_valid(None) assert Validator(Literal["red", "green"]).is_valid("red") ``` ## Refinements with `Annotated` Constraints attach to a type with `Annotated` and the [annotated-types](https://pypi.org/project/annotated-types/) markers: ``` from typing import Annotated import annotated_types as at from valgebra import Validator adult = Validator(Annotated[int, at.Ge(18), at.Le(150)]) assert adult.is_valid(21) assert not adult.is_valid(5) ``` ## Classes compile too `TypedDict`, dataclasses, `NamedTuple`, enums, and runtime-checkable protocols all compile, and refinements on their fields are enforced: ``` from typing import Annotated, TypedDict import annotated_types as at from valgebra import Validator class User(TypedDict): name: str age: Annotated[int, at.Ge(0)] users = Validator(User) assert users.is_valid({"name": "Ada", "age": 36}) assert not users.is_valid({"name": "Ada", "age": -1}) ``` ## Compose with the algebra Any schema combines with `union`, `intersection`, and `complement`: ``` from valgebra import complement, intersection, Validator # an int that is not a bool strict_int = intersection(int, complement(bool)) assert strict_int.is_valid(5) assert not strict_int.is_valid(True) ``` ## Handle failures A failure raises `ValidationError` carrying a machine-readable `code`, the `path` to the offending value, and a summary: ``` from valgebra import ValidationError, Validator try: Validator({"user": {"name": str}}).validate({"user": {"name": 5}}) except ValidationError as err: assert err.code == "string_type" assert err.path == ("user", "name") ``` ## Validate JSON directly `validate_json` parses and checks JSON on the Rust path, reaching the same decision as validating the parsed object: ``` from valgebra import Validator users = Validator({"name": str, "age?": int}) users.validate_json('{"name": "Ada", "age": 36}') assert Validator(list[int]).is_valid_json(b"[1, 2, 3]") ``` # Guides # Schema language A schema denotes a **set of Python values**. This page lists every form valgebra reads and the set it denotes. The primary notation is standard typing; compact native forms and the combinators are alternatives for the same sets. ## Scalars | Schema | Denotes | | ------- | ---------------------- | | `int` | every `int` instance | | `float` | every `float` instance | | `str` | every `str` instance | | `bytes` | every `bytes` instance | | `bool` | `{True, False}` | | `None` | `{None}` | The set relationships follow Python's own, exactly: ``` from valgebra import Validator # bool is a subclass of int, so True and False are ints assert Validator(int).is_valid(True) # int does not subclass float, so an int is not a float assert not Validator(float).is_valid(1) assert Validator(float).is_valid(1.0) ``` ## `Any` versus `object` `object` is the **top** of the lattice (`anything`): every value. `Any` is the gradual dynamic type — at runtime it also admits every value, but it is a distinct atom that the [simplifier](https://ppigazzini.github.io/valgebra/algebra/index.md) never rewrites, preserving "deliberately unchecked" as different from "checked: all admitted". ``` from typing import Any from valgebra import Validator assert Validator(object).is_valid(["anything", 1, None]) assert Validator(Any).is_valid(object()) ``` ## Collections | Schema | Denotes | | ------------------ | ----------------------------------------------------- | | `list[T]` | lists whose every element is in `T` | | `set[T]` | sets whose every element is in `T` | | `frozenset[T]` | frozensets whose every element is in `T` | | `dict[K, V]` | dicts whose keys are in `K` and values in `V` | | `tuple[A, B]` | length-2 tuples with `A` then `B` | | `tuple[T, ...]` | tuples of any length, every element in `T` | | `tuple[A, B, ...]` | a fixed prefix `A`, then zero or more `B` (see below) | ``` from valgebra import Validator assert Validator(list[int]).is_valid([1, 2, 3]) assert Validator(dict[str, int]).is_valid({"a": 1}) assert Validator(tuple[int, str]).is_valid((1, "a")) assert Validator(tuple[int, ...]).is_valid((1, 2, 3)) assert Validator(tuple[str, int, ...]).is_valid(("x", 1, 2)) ``` ## Native forms A native form exists only where standard typing **cannot** spell the set: the list literal carries the sequence shapes typing has no syntax for. Everything a typing annotation already expresses is written that way — `set[T]`, not `{T}`; `tuple[A, B]`, not `(A, B)`; both literals are rejected with a message pointing to the typing spelling. | Native form | Denotes | | ------------------------ | ------------------------------------------------------------------------------ | | `[T]` | `list[T]` — a homogeneous list (the single-element idiom) | | `[T, ...]` | `list[T]` — homogeneous, written with the tail marker | | `[A, B]` | a **fixed-length list**, matched positionally (`list[A, B]` is illegal typing) | | `[A, B, ...]` | a fixed prefix, then a repeated tail (see below) | | `{K: V}` | `dict[K, V]` | | `{"key": T, "key2?": T}` | a **record** (see below) | | any constant `c` | `Literal[c]` | ``` from valgebra import Validator assert Validator([int]).is_valid([1, 2]) # homogeneous list[int] assert Validator([int, str]).is_valid([1, "a"]) # fixed-length list assert not Validator([int, str]).is_valid([1]) # wrong length assert Validator({str: int}).is_valid({"a": 1}) # dict[str, int] assert Validator("active").is_valid("active") # the literal "active" ``` A **fixed-length list** is matched positionally: element `i` must satisfy the `i`th schema and the length must match. typing cannot spell it (`list[A, B]` is illegal), which is the reason the list literal carries the shape; a fixed-length *tuple* is the typing `tuple[A, B]`, and the container is part of the type, so a list is never a member of the tuple form and vice versa. ### Prefix and repeated tail A sequence schema is, in general, a **regular expression over element types**: a fixed positional prefix followed by an optional repeated tail. A trailing `...` repeats the element just before it, so `[T, ...]` (any number of `T`) is the prefix-free case. The same shape is available for tuples with `tuple[A, B, ...]`; the container is part of the type, so a tuple is never a member of the list form and vice versa. | Form | Denotes | | ------------------ | -------------------------------------- | | `[A, B, ...]` | a list: an `A`, then zero or more `B` | | `[T, T, ...]` | a non-empty list of `T` (at least one) | | `tuple[A, B, ...]` | a tuple: an `A`, then zero or more `B` | ``` from valgebra import Validator prefixed = Validator([str, int, ...]) # a str, then zero or more ints assert prefixed.is_valid(["x"]) assert prefixed.is_valid(["x", 1, 2]) assert not prefixed.is_valid([1]) # the prefix must be a str non_empty = Validator([int, int, ...]) # at least one int assert non_empty.is_valid([1]) assert not non_empty.is_valid([]) tup = Validator(tuple[str, int, ...]) # the same shape, as a tuple assert tup.is_valid(("x", 1, 2)) assert not tup.is_valid(["x", 1, 2]) # a list is not a member of the tuple form ``` ## Literals `Literal[...]` denotes a typed singleton: a value is a member iff it has the **same type** as the literal and is equal to it. The same-type rule keeps `Literal[1]`, `Literal[True]`, and `Literal[1.0]` distinct, even though Python's `==` conflates them: ``` from typing import Literal from valgebra import Validator assert Validator(Literal[1]).is_valid(1) assert not Validator(Literal[1]).is_valid(True) assert not Validator(Literal[1]).is_valid(1.0) ``` ## Unions and `Optional` `X | Y` and `Optional[X]` denote the union of the member sets: ``` from typing import Optional from valgebra import Validator assert Validator(int | str).is_valid("x") assert Validator(Optional[int]).is_valid(None) ``` ## Records A dict literal with all-string keys is a **record**: named fields, closed by default. A required field's key must be present with a matching value; a trailing `?` on the key name marks it optional. A closed record admits no key outside the declared names. ``` from valgebra import Validator user = Validator({"name": str, "age?": int}) assert user.is_valid({"name": "Ada"}) # optional key absent assert user.is_valid({"name": "Ada", "age": 36}) assert not user.is_valid({"name": "Ada", "x": 1}) # closed: no extra keys ``` Open the record with `open` (undeclared keys admitted) or re-close it with `close`: ``` from valgebra import Validator closed = Validator({"name": str}) assert not closed.is_valid({"name": "Ada", "extra": 1}) assert closed.open().is_valid({"name": "Ada", "extra": 1}) ``` ### Heterogeneous maps and catch-alls A dict schema's string keys are named fields; any *other* key is a schema that keys a default clause for the rest. One form therefore expresses records, mappings, and their combination: several schema keys give a **heterogeneous map** whose value type depends on which key schema matches, and named fields plus a schema key give a record with a **typed catch-all**. Named fields take precedence over the catch-all. ``` from valgebra import Validator # str keys map to ints, int keys map to strs hetero = Validator({str: int, int: str}) assert hetero.is_valid({"a": 1, 2: "b"}) assert not hetero.is_valid({"a": "x"}) # a str key needs an int value # a record whose every other key must be an int extensible = Validator({"name": str, str: int}) assert extensible.is_valid({"name": "Ada", "age": 36}) assert not extensible.is_valid({"name": "Ada", "age": "old"}) ``` ## Classes | Form | How it validates | | ---------------------------- | ------------------------------------------------------------------------ | | `TypedDict` | a record; required keys from the class, `Required`/`NotRequired` honored | | dataclass | `isinstance` plus a deep check of each field | | `NamedTuple` | `isinstance` plus a deep check of each field | | `Enum` | an instance of the enumeration (any member) | | runtime-checkable `Protocol` | `isinstance` against the protocol | | `NewType` | validates the supertype it wraps | | PEP 695 `type` alias | validates the aliased type | ``` import enum from dataclasses import dataclass from valgebra import Validator class Color(enum.Enum): RED = 1 GREEN = 2 @dataclass class Point: x: int y: int assert Validator(Color).is_valid(Color.RED) assert Validator(Point).is_valid(Point(1, 2)) assert not Validator(Point).is_valid(Point(1, "y")) ``` Recursive classes A class whose own type appears in a field (a tree node, a linked list) is recursive and cannot compile directly — express it with [`recursive`](https://ppigazzini.github.io/valgebra/recursion/index.md), which ties the fixpoint explicitly. Bare classes, callables, and the runtime boundary A bare class is an `isinstance` check: `Validator(complex)` admits any `complex`, and any user class admits its instances. `Callable` (and `Callable[...]`) checks only that the value is callable — the argument and return types cannot be inspected at runtime, so they are not enforced. `Any` is admitted unchecked. Everything else is decided structurally: a `list[int]` schema does check each element. ## Refinements `Annotated[T, ...markers]` narrows `T` with constraints — bounds, lengths, multiples, and predicates. See the [refinements guide](https://ppigazzini.github.io/valgebra/refinements/index.md). ## Stable repr A compiled validator prints back as the annotation that produces it, which makes schemas inspectable: ``` from valgebra import Validator assert repr(Validator(list[dict[str, int]])) == "list[dict[str, int]]" assert repr(Validator({"name": str, "age?": int})) == "{'name': str, 'age?': int}" ``` # The Boolean algebra Union, intersection, and complement compose any schema — annotations, native forms, or other compiled validators — into a closed, lawful Boolean lattice. `anything` is the top (every value) and `nothing` is the bottom (no value). The typing-native spellings work too: `object` is the top and `Never` (or `NoReturn`) is the bottom, so `Validator(object)` equals `anything` and `Validator(Never)` equals `nothing`. ``` from valgebra import anything, complement, intersection, nothing, union assert union(int, str).is_valid("x") # a value in either set assert intersection(int, complement(bool)).is_valid(5) # ints that are not bools assert not intersection(int, complement(bool)).is_valid(True) assert complement(nothing).is_valid(5) # the complement of bottom is top assert not nothing.is_valid(5) # bottom admits nothing assert anything.is_valid(object()) # top admits everything ``` The combinators accept any schema spec, so they nest and mix freely: ``` from valgebra import complement, union, Validator color = union("red", "green", "blue") # union of three literals assert color.is_valid("red") assert not color.is_valid("teal") not_empty_text = complement(union("", b"")) # not the empty str or bytes assert not_empty_text.is_valid("x") assert not not_empty_text.is_valid("") ``` Union has an operator form, `|` — the same spelling typing uses for unions — so a compiled validator joins with another schema directly. Intersection and complement stay spelled out (typing has no operator for them, and valgebra invents none): ``` from valgebra import Validator, union assert (Validator(int) | str | None).is_equivalent(union(int, str, None)) # `|` works in either order: a validator on the right is the reflected operand. assert (int | Validator(str)).is_equivalent(union(int, str)) ``` ## The laws hold Because membership is Boolean and the combinators are exactly *or*, *and*, and *not*, every Boolean-algebra law holds — commutativity, associativity, idempotence, absorption, identities, distributivity, De Morgan, and double negation. These are property-tested in both Rust and Python against the membership relation, not asserted. The model — schemas as value-sets, subtyping as set inclusion, full union, intersection, and complement — is *semantic subtyping*. The [foundations](https://ppigazzini.github.io/valgebra/foundations/index.md) page records the theory and its references, and states where the simplifier decides relationships versus where it stays conservative. ## Composition recipes valgebra ships only the irreducible algebra; common patterns that reduce to it are recipes you compose, not combinators it bundles. The algebra expressing them is the point — a named wrapper for a one-line composition would be a standard library, not a schema algebra. ### Conditional fields "Condition implies consequent" is a `union` of two intersections: a value either matches the condition and must then satisfy the consequent, or fails the condition and must satisfy the alternative (`anything` by default): ``` from typing import Annotated import annotated_types as at from valgebra import anything, complement, intersection, union def implies(condition, then, otherwise=anything): return union( intersection(condition, then), intersection(complement(condition), otherwise), ) non_negative_if_int = implies(int, Annotated[int, at.Ge(0)]) assert non_negative_if_int.is_valid(5) assert not non_negative_if_int.is_valid(-1) assert non_negative_if_int.is_valid("not an int") # not an int: admitted ``` First-matching-case dispatch nests `implies` from the last case inward, so the earliest matching condition selects its consequent: ``` from typing import Annotated import annotated_types as at from valgebra import anything, complement, intersection, nothing, union, Validator # the same implies helper as above, repeated so this example runs on its own def implies(condition, then, otherwise=anything): return union( intersection(condition, then), intersection(complement(condition), otherwise), ) def first_match(*cases, default=anything): result = Validator(default) for condition, then in reversed(cases): result = implies(condition, then, result) return result shape = first_match( (str, Annotated[str, at.MinLen(1)]), (int, Annotated[int, at.Ge(0)]), default=nothing, ) assert shape.is_valid("ok") assert shape.is_valid(5) assert not shape.is_valid("") assert not shape.is_valid(1.5) # matches no case, falls to the default ``` ### Key cardinality "At least one of these keys is present", and its siblings, are also algebra. A record that merely asserts a key is present is an open record requiring it — `Validator({key: anything}).open()` — and the cardinality follows from `union`, `intersection`, and `complement`: ``` from valgebra import anything, complement, intersection, union, Validator def has(key): return Validator({key: anything}).open() at_least_one = union(has("a"), has("b")) assert at_least_one.is_valid({"a": 1}) assert at_least_one.is_valid({"b": 2, "x": 0}) assert not at_least_one.is_valid({"x": 0}) at_most_one = complement(intersection(has("a"), has("b"))) # not both assert at_most_one.is_valid({"a": 1}) assert at_most_one.is_valid({}) assert not at_most_one.is_valid({"a": 1, "b": 2}) exactly_one = union( intersection(has("a"), complement(has("b"))), intersection(has("b"), complement(has("a"))), ) assert exactly_one.is_valid({"a": 1}) assert not exactly_one.is_valid({"a": 1, "b": 2}) assert not exactly_one.is_valid({}) ``` ### Fixed-length and length-bounded lists The native list literal spells the sequence shapes typing cannot: `[A, B]` is a fixed-length list (positional) and `[]` the empty list. ``` from valgebra import Validator pair = Validator([int, str]) # exactly two elements: an int then a str assert pair.is_valid([1, "a"]) assert not pair.is_valid([1]) # wrong length assert not pair.is_valid((1, "a")) # a tuple is not a member of the list form ``` The single-element `[x]` is deliberately the **homogeneous** "list of x" (any length), following Python's `list[T]` — so a fixed-length-*one* list, and any length bound, is a refinement of the homogeneous list, not a separate form. This is the refinement type `{ x ∈ list[T] | len bound }`, written with `Annotated`: ``` from typing import Annotated import annotated_types as at from valgebra import Validator # a list of exactly one int (what `[int]`, being homogeneous, does not mean) one_int = Validator(Annotated[list[int], at.Len(1, 1)]) assert one_int.is_valid([1]) assert not one_int.is_valid([]) assert not one_int.is_valid([1, 2]) # a non-empty list, and an at-most-three list: the same length-refinement family non_empty = Validator(Annotated[list[int], at.MinLen(1)]) assert non_empty.is_valid([1, 2]) and not non_empty.is_valid([]) small = Validator(Annotated[list[int], at.MaxLen(3)]) assert small.is_valid([1, 2, 3]) and not small.is_valid([1, 2, 3, 4]) ``` ## The simplifier `simplify` is a method that reduces a schema by the lattice laws while admitting **exactly the same values**. It flattens nested unions and intersections, drops duplicates and identities, and pushes complements to negation-normal form: ``` from valgebra import complement, union assert repr(complement(complement(int)).simplify()) == "int" assert repr(union(int, int).simplify()) == "int" ``` A refinement is reduced to a single normal form too: nested refinements flatten onto their shared base, and the constraint list is sorted and deduplicated, so a repeated or reordered constraint does not change the result: ``` from typing import Annotated from annotated_types import Ge from valgebra import Validator assert repr(Validator(Annotated[int, Ge(0), Ge(0)]).simplify()) == "Annotated[int, Ge(0)]" ``` It also decides the **complement laws** and provable **disjointness**: a schema met with its complement, or with a provably disjoint type, is empty; a schema joined with its complement, or with the complement of a disjoint type, is everything. ``` from valgebra import complement, intersection, union assert repr(intersection(int, complement(int)).simplify()) == "nothing" assert repr(union(int, complement(int)).simplify()) == "anything" assert repr(intersection(int, str).simplify()) == "nothing" # disjoint types ``` The simplifier folds the scalar Boolean fragment — the builtin scalars (with `bool` a subtype of `int`) and the complement laws — so `intersection(int, str).simplify()` is `nothing`. It never treats `Any` as the top, so a deliberately-unchecked schema is preserved. The comparison operators below decide a wider fragment than the simplifier folds; the [decidability boundary](https://ppigazzini.github.io/valgebra/decidability/index.md) maps exactly what is decided. `simplify` applies the lattice laws only; it does not run the emptiness decision. An intersection that is empty by a deeper argument — contradictory refinement bounds, for instance — is left as written, even though `is_empty` reports it empty. So a simplified schema is a lattice normal form, not a fully reduced one: use `is_empty`, `is_subtype_of`, and `is_equivalent` to decide membership relations rather than reading them off the simplified structure. ``` from typing import Annotated from annotated_types import Ge, Le from valgebra import Validator, intersection contradiction = intersection(Annotated[int, Ge(10)], Annotated[int, Le(0)]) assert contradiction.is_empty() # decided empty assert repr(contradiction.simplify()) != "nothing" # but simplify leaves it ``` ## Subtyping, equivalence, and emptiness A compiled validator can be compared with another schema as *sets*. `is_subtype_of` is set inclusion, `is_equivalent` is mutual inclusion, and `is_empty` reports an unsatisfiable schema: ``` from valgebra import complement, intersection, union, Validator # subtyping is set inclusion; bool is a subtype of int assert Validator(bool).is_subtype_of(int) assert not Validator(int).is_subtype_of(bool) assert Validator(list[bool]).is_subtype_of(list[int]) # equivalence is mutual inclusion, whatever the syntax assert union(bool, int).is_equivalent(int) # emptiness detects a schema no value can satisfy assert intersection(int, complement(int)).is_empty() assert not Validator(int).is_empty() ``` `is_equivalent` is **semantic**: it compares the value sets, however the two schemas are spelled. Keep it distinct from `==` on validators, which is **syntactic** — `==` compares schema *shape*, so two schemas that denote the same set but are written differently are equal under `is_equivalent` yet not under `==`. Ask `is_equivalent` "do these mean the same set?" and `==` "are these the same shape?". ``` from valgebra import Validator, union assert union(bool, int).is_equivalent(int) # same set: bool is a subtype of int assert union(bool, int) != Validator(int) # different shape ``` The other side of a comparison is any schema spec or compiled validator. These decisions are **sound**: a `True` is always correct, and a `False` (or a "not empty") is either a genuine non-relation or a relation valgebra does not yet prove — never a wrong answer. They decide completely over a wide fragment — the scalar Boolean algebra, class and literal inclusion, refinements with bound and length constraints, prefix-and-tail sequences, sets and frozensets, records and mappings (including multi-clause mixed maps, and mixed maps where the supertype's extra field is optional and the subtype's catch-all covers it), and recursion — and stay conservative on the rest. The [decidability boundary](https://ppigazzini.github.io/valgebra/decidability/index.md) lists exactly what is decided, what is conservative, and what is undecidable at runtime; see the [foundations](https://ppigazzini.github.io/valgebra/foundations/index.md) for the theory. ## `Any` versus `anything` Both admit every value at runtime, but they are different in the algebra: - `anything` is the lattice **top**. It obeys the laws — `complement(anything)` is `nothing`, `intersection(anything, s)` is `s`. - `Any` is the gradual dynamic type, an **atom** the simplifier never rewrites. ``` from typing import Any from valgebra import Validator, anything, complement assert repr(complement(anything).simplify()) == "nothing" assert repr(Validator(Any).simplify()) == "Any" # left untouched ``` This keeps "checked: every value is admitted" (`anything`) distinct from "deliberately not checked" (`Any`). # Refinements A refinement narrows a base type to the subset satisfying one or more constraints. Write it with `Annotated[T, ...markers]`; the base `T` is checked first, then each constraint. valgebra reads the [annotated-types](https://pypi.org/project/annotated-types/) markers structurally, so it has no runtime dependency on that library. ``` from typing import Annotated import annotated_types as at from valgebra import Validator adult = Validator(Annotated[int, at.Ge(18), at.Le(150)]) assert adult.is_valid(21) assert not adult.is_valid(5) ``` Refinements built from bound and length markers also take part in the [decision procedure](https://ppigazzini.github.io/valgebra/decidability/index.md): a refinement is a subtype of its base and of a looser refinement, and a bound conjunction that cannot be satisfied is detected as empty. ``` from typing import Annotated import annotated_types as at from valgebra import Validator assert Validator(Annotated[int, at.Ge(0)]).is_subtype_of(int) # refinement <= base assert Validator(Annotated[int, at.Ge(0), at.Le(10)]).is_subtype_of( Annotated[int, at.Ge(0)] # a tighter bound is a subtype of a looser one ) assert Validator(Annotated[int, at.Ge(10), at.Le(0)]).is_empty() # no such int ``` A predicate marker is checked at validation time but stays opaque to subtyping and emptiness — its satisfiability is undecidable in general. ## Supported markers | Marker | Constraint | Failure code | | --------------- | -------------------------------------- | ------------------------- | | `Ge(n)` | `value >= n` | `greater_than_equal` | | `Gt(n)` | `value > n` | `greater_than` | | `Le(n)` | `value <= n` | `less_than_equal` | | `Lt(n)` | `value < n` | `less_than` | | `MinLen(n)` | `len(value) >= n` | `too_short` | | `MaxLen(n)` | `len(value) <= n` | `too_long` | | `MultipleOf(n)` | `value % n == 0` | `multiple_of` | | `Regex(p)` | the string fully matches the regex `p` | `string_pattern_mismatch` | | `Predicate(f)` | `f(value)` is truthy | `predicate_failed` | `Regex` is valgebra's own marker (`from valgebra import Regex`), since `annotated-types` defines none for strings. The match is **anchored** — the whole string must match, like `re.fullmatch` — and runs natively in Rust with a linear-time engine (no catastrophic backtracking), so unlike a `Predicate` it stays on the fast path and never crosses into Python per value. An invalid pattern is rejected when the validator is built, not at first use. A compiled `re.Pattern` works as metadata too: ``` import re from typing import Annotated from valgebra import Regex, Validator oid = Validator(Annotated[str, Regex(r"[0-9a-f]{24}")]) assert oid.is_valid("0123456789abcdef01234567") assert not oid.is_valid("0123456789abcdef0123456X") # not hex assert not oid.is_valid("0123") # not the full 24 characters assert Validator(Annotated[str, re.compile(r"\d+")]).is_valid("123") ``` The compound markers `Interval` and `Len` expand to the bounds they carry, so `Interval(ge=0, le=10)` contributes `Ge(0)` and `Le(10)`, and `Len(2, 4)` contributes `MinLen(2)` and `MaxLen(4)`: ``` from typing import Annotated import annotated_types as at from valgebra import Validator assert Validator(Annotated[int, at.Interval(ge=0, le=10)]).is_valid(5) assert not Validator(Annotated[int, at.Interval(ge=0, le=10)]).is_valid(11) assert Validator(Annotated[str, at.Len(2, 4)]).is_valid("abc") assert not Validator(Annotated[str, at.Len(2, 4)]).is_valid("a") assert Validator(Annotated[int, at.MultipleOf(3)]).is_valid(9) assert not Validator(Annotated[int, at.MultipleOf(3)]).is_valid(5) ``` ## Predicates: the slow path A `Predicate` runs an arbitrary Python callable. It is the one place validation leaves Rust, so it is a **documented slow path**, never a silent fallback. Use it for checks the markers cannot express: ``` from typing import Annotated import annotated_types as at from valgebra import Validator even = Validator(Annotated[int, at.Predicate(lambda x: x % 2 == 0)]) assert even.is_valid(4) assert not even.is_valid(3) ``` A predicate that *raises* is reported distinctly, as `predicate_error` rather than an ordinary failure, so a buggy predicate is not mistaken for a rejected value. ## On classes Refinements declared on a `TypedDict`, dataclass, or `NamedTuple` field are enforced — the constraint travels with the field: ``` from typing import Annotated, TypedDict import annotated_types as at from valgebra import Validator class Account(TypedDict): balance: Annotated[int, at.Ge(0)] assert Validator(Account).is_valid({"balance": 100}) assert not Validator(Account).is_valid({"balance": -1}) ``` ## Unrecognized markers Per the typing spec, metadata valgebra does not recognize as a constraint is ignored — so non-constraint `Annotated` metadata (documentation strings, unit markers) is harmless and carries no membership meaning. ``` from typing import Annotated from valgebra import Validator assert repr(Validator(Annotated[int, "a documentation note"])) == "int" ``` # Recursive schemas `recursive` ties a fixpoint: the builder it receives is given a placeholder standing for the schema being defined, and returns the body. The recursive reference must occur under a structural constructor (a list, tuple, set, dict, record, or object) so membership stays decidable; a non-contractive body is rejected when the validator is built. ## A recursive JSON value ``` from valgebra import recursive, union json_value = recursive( lambda j: union(None, bool, int, float, str, [j], {str: j}), ) assert json_value.is_valid({"a": [1, "x", {"b": None}], "c": [True, 3.5]}) assert not json_value.is_valid({"a": object()}) ``` ## A tree, then composed A `recursive` schema is an ordinary validator and composes like any other: ``` from valgebra import recursive, Validator tree = recursive(lambda t: {"value": int, "left?": t, "right?": t}) assert tree.is_valid({"value": 1, "left": {"value": 2}}) forest = Validator([tree]) assert forest.is_valid([{"value": 1}, {"value": 2, "right": {"value": 3}}]) ``` ## Why classes need it A class whose own type appears in a field is recursive in the same way, but a class definition has no place to tie the fixpoint. Compiling such a class directly is rejected with a message pointing here; model it with `recursive` instead: ``` from valgebra import recursive # instead of a self-referential @dataclass Node, write the shape with recursive: node = recursive(lambda n: {"value": int, "next?": n}) assert node.is_valid({"value": 1, "next": {"value": 2}}) ``` ## Soundness guarantees Recursion is bounded so it always terminates cleanly: - A value that **contains itself** is rejected with `recursion_loop` rather than looping forever (an object-identity guard). - A value nested **past a fixed depth** fails with `recursion_limit` rather than overflowing the native stack. - A **non-contractive** body — one whose recursive reference is not under a structural constructor — is rejected when the validator is built, not at validation time. ``` from valgebra import recursive, union cyclic = [] cyclic.append(cyclic) assert not recursive(lambda s: union(int, [s])).is_valid(cyclic) # recursion_loop ``` ## Recursion in the decision procedure Recursive schemas also take part in [subtyping, equivalence, and emptiness](https://ppigazzini.github.io/valgebra/decidability/index.md). Equirecursive schemas compare at their greatest fixpoint — a coinductive comparison that assumes a goal already being proven on the current path — so a recursive schema is a subtype of itself and two structurally identical recursive schemas are equivalent, and a recursive schema with no base case is detected as uninhabited. These are two views of one definition, not two definitions. *Membership* unfolds the unique guarded (contractive) fixpoint against a finite value — a value is in the set when its finite unfolding matches. *Comparison* uses the greatest fixpoint coinductively, which is the sound way to relate two such definitions without unfolding forever. On the inhabitants the two agree: the greatest fixpoint admits exactly the values the guarded unfolding accepts, so a subtype result never contradicts membership. ``` from valgebra import recursive, union, Validator json_value = recursive(lambda j: union(None, bool, int, float, str, [j], {str: j})) assert Validator(json_value).is_subtype_of(json_value) # reflexive across the fixpoint assert recursive(lambda t: {"value": int, "next": t}).is_empty() # no base case assert not recursive(lambda t: union(None, {"next": t})).is_empty() # a base case exists ``` # JSON input A compiled validator validates JSON source directly, parsing on the Rust path: ``` from valgebra import Validator users = Validator({"name": str, "age?": int}) users.validate_json('{"name": "Ada", "age": 36}') # passes, returns None assert users.is_valid_json('{"name": "Ada"}') # optional key absent assert not users.is_valid_json('{"name": 5}') # name is not a str # bytes input is accepted too assert Validator(list[int]).is_valid_json(b"[1, 2, 3]") ``` `validate_json(data, *, fail_fast=False)` mirrors `validate`: it raises `ValidationError` on failure and aggregates every independent failure by default. `is_valid_json(data)` mirrors `is_valid`: it returns a bool and never raises. Both accept a JSON `str` or `bytes`. When you need the data, not just the verdict, `load` validates and **returns the parsed value**, so it is not parsed twice: ``` from valgebra import Validator users = Validator({"name": str, "age?": int}) record = users.load('{"name": "Ada", "age": 36}') assert record == {"name": "Ada", "age": 36} ``` `load(data, *, fail_fast=False)` raises `ValidationError` on malformed JSON or a non-member, exactly as `validate_json` does, and otherwise returns the parsed object. ## Same decisions as the object path The JSON path parses the document into a Python value and runs the **same** validation walk as a native object. So validating a JSON document is exactly validating `json.loads` of that document — the same accept/reject decision, the same error codes, and the same paths: ``` import json from valgebra import Validator v = Validator(list[dict[str, int]]) doc = '[{"a": 1}, {"b": "x"}]' assert v.is_valid_json(doc) == v.is_valid(json.loads(doc)) ``` This equivalence is locked by tests over a corpus spanning the JSON value model. ## JSON-to-Python value mapping Parsing uses jiter (the parser pydantic-core uses) with the standard JSON model, so a document maps to Python values exactly as the standard library's `json` module produces them: | JSON | Python | Matches schema | | ----------------------------------------------- | ------- | --------------------------------------------- | | `null` | `None` | `None` | | `true` / `false` | `bool` | `bool` (and `int`, since `bool` is a subtype) | | number, no fraction or exponent (`42`) | `int` | `int`, not `float` | | number with fraction or exponent (`4.2`, `1e3`) | `float` | `float`, not `int` | | string | `str` | `str` | | array | `list` | `list[...]`, fixed lists, tuples | | object | `dict` | records and mappings | Two consequences follow from valgebra's value-set semantics: ``` from valgebra import Validator # JSON 42 is an int, and float is disjoint from int, so it is not a float assert not Validator(float).is_valid_json("42") assert Validator(float).is_valid_json("42.0") # JSON true is a bool, and bool is a subtype of int assert Validator(int).is_valid_json("true") ``` `Infinity` and `NaN` are not valid JSON and are rejected as malformed; whole numbers too large for a machine integer still parse to a Python `int`. ## Malformed JSON Unparseable input never reaches the validation walk. `validate_json` reports it through the same structured error model as any other failure — a single `errors` item coded `json_invalid` carrying the parser's diagnostic — and `is_valid_json` treats it as a non-member: ``` from valgebra import ValidationError, Validator v = Validator(int) assert not v.is_valid_json("{ not json") try: v.validate_json("{ not json") except ValidationError as err: assert err.code == "json_invalid" ``` A non-`str`, non-`bytes` argument is a `TypeError`, not a validation failure. ## Performance `is_valid_json` parses with jiter and validates the parsed JSON value **in place**: no intermediate Python objects are built for the structure it walks, so membership of a large array or a deep document is decided in Rust. A comparison against a Python object — a literal, a refinement predicate, or an instance or attribute check — is the documented step back into Python (detailed below). The same walk runs over either input source — a Python object or a JSON value — so the two paths stay equivalent. On the benchmark machine (AMD Ryzen 7 PRO 7840U, WSL2, CPython 3.14.6, jiter 0.16, the PGO release wheel — the same profile the release ships), per-call median on a passing document: | Shape | `is_valid_json` | `json.loads` + `is_valid` | speedup | | ---------------------------- | --------------- | ------------------------- | ------- | | Record, 50 int fields | 3.7 us | 6.5 us | ~1.8x | | List of 200 small records | 27.3 us | 40.6 us | ~1.5x | | `list[int]`, 10,000 elements | 105 us | 501 us | ~4.8x | Avoiding materialization helps most where the document is large or scalar-heavy: the 10,000-element array is nearly five times faster than parse-then-validate and well over twice as fast as a strict pydantic adapter on the same input. Nodes that compare against a Python object — literals, refinements, instance and object checks, and predicates — materialize just the value at that node, since the comparison runs in Python. The `validate_json` explain path still materializes the whole document (it reports Python-level value summaries in its errors); only the `is_valid_json` fast path is fully in place. # Error model When a value does not satisfy a schema, `validate` raises `ValidationError`. The exception is not just a message: it carries a stable, machine-readable model meant to be read by tools and agents, not only humans. ## The shape A `ValidationError` exposes: - `errors` — a tuple of structured items, one per failure. Each item is a plain dict with these keys: - `code` — a stable, machine-readable code (e.g. `int_type`, `missing_key`, `literal_error`). - `path` — the location of the offending value from the root, a tuple of string keys and integer indices (empty at the root). - `message` — the rendered one-line human message. - `expected` — a short label of the expected set (e.g. `int`). - `value` — a repr-style summary of the offending value. - `message`, `code`, `path`, `expected`, `value` — scalar convenience attributes mirroring the first item. `str(exc)` is a summary of every failure. ## Aggregation and fail-fast By default the walk does not stop at the first failure: it collects every independent failure — each record field, each sequence or tuple element, each mapping entry — into `errors`, so one call reports all the problems with a value. ``` from valgebra import ValidationError, Validator try: Validator({"a": int, "b": str, "c": int}).validate({"a": "x", "b": 1, "c": "y"}) except ValidationError as err: assert [e["path"] for e in err.errors] == [("a",), ("b",), ("c",)] ``` Pass `fail_fast=True` to stop at the first failure instead: ``` from valgebra import ValidationError, Validator try: Validator({"a": int, "b": str}).validate({"a": "x", "b": 1}, fail_fast=True) except ValidationError as err: assert len(err.errors) == 1 ``` A node-level type mismatch (a value that is not a dict where a record is expected) is terminal for that subtree: there is nothing to descend into. ## Unions report the closest branch When a value matches no branch of a union, valgebra does not dump every branch's failure. It reports the **closest** branch — the one that descended furthest into the value before failing — and that branch's own (aggregated) errors: ``` from valgebra import ValidationError, union try: union(int, {"a": int}).validate({"a": "x"}) except ValidationError as err: # The value is a dict, so the record branch is closer than `int`. assert err.errors[0]["path"] == ("a",) assert err.errors[0]["code"] == "int_type" ``` When no branch makes any progress past the union's own location — for example `int | str` against a float, where every branch is a flat type mismatch — there is no closer branch, so a single `union_error` is the honest report. A `complement` likewise reports one failure at its location. The closest-branch search is a bounded, best-effort heuristic: it runs only when a value has already failed the union, and it inspects at most the first 64 branches. A union wider than that still reports correctly — the membership decision always considers every branch — but its error may fall back to the `union_error` summary rather than pinpointing a branch past the cap. This keeps building an error for a pathologically wide union (a large `Literal[...]`, say) bounded; the successful path is unaffected. ## JSON output Every item is JSON-serializable (the `path` is a tuple of strings and ints), so the JSON output mode is the standard library: ``` import json from valgebra import ValidationError, Validator schema = Validator({"name": str, "age": int}) try: schema.validate({"name": "Ada", "age": "old"}) except ValidationError as err: payload = json.dumps(err.errors) restored = json.loads(payload) assert restored[0]["code"] == "int_type" assert restored[0]["path"] == ["age"] assert restored[0]["expected"] == "int" ``` ## When a comparison raises Checking membership reads a value through Python operations that can raise: an `__eq__` for a literal, a rich comparison for a numeric bound, `isinstance` for a class, `getattr` for an attribute, `__len__` for a length, `__mod__` for a multiple-of. A value whose comparison, instance check, or attribute access **raises an ordinary exception is treated as a non-member** — a value that cannot answer "are you in this set?" is not in it, the same pragmatic stance pydantic-core takes. The one ordinary-exception case carved out is a user predicate (`Annotated[..., some_callable]`): a predicate that raises an ordinary exception is reported as a distinct `predicate_error`, not folded into an ordinary failed match, so a buggy predicate stays visible. A **fatal interpreter signal is never folded** — at every site, the predicate and attribute access included. A base exception that is not an ordinary exception (`KeyboardInterrupt`, `SystemExit`, `GeneratorExit`), or a `MemoryError` or `RecursionError`, means the interpreter is unwinding, not that the value is a non-member, so it propagates out of `validate`/`is_valid` rather than being reported as "not a member" or a `predicate_error`. ## Determinism For a given schema and value the error model is deterministic: the same codes, paths, and order across runs and platforms. Tools can diff it. The exact output is locked by snapshot tests (the message format on the Rust side, the structured `errors` on the Python side), so any change to it is reviewed, never silent. ## Message style guide Messages and codes follow a fixed style so they stay predictable: - One line, present tense, of the form `expected , got []`; a located failure prefixes `at :`. - The `code` is stable and machine-readable; it is the field to branch on, not the prose. Codes do not change meaning across releases. - `expected` names the set, `value` is a short repr of what was found, truncated so a large value cannot flood the message. - Set-membership failures (`union`, `complement`) report at the location of the combinator, not inside a discarded branch. # Resource limits A validator runs against untrusted values, so every recursive descent and every error-reporting probe is bounded. A pathological input meets a gated limit and is rejected cleanly; it never overflows the native stack, raises a Python `RecursionError`, or hangs. The limits bound work driven by the *value* — the untrusted part. A schema's own size (the width of a union, the number of declared fields) is written by the developer and is trusted. ## The bounds - **Schema build depth.** A schema nested past a fixed depth is rejected when the validator is compiled, not at validation time. A self-referential class is the usual cause; model it with [`recursive`](https://ppigazzini.github.io/valgebra/recursion/index.md) instead. - **Value-walk depth.** A value nested past a fixed depth fails with `recursion_limit` rather than recursing into the native stack. This holds on both the object path and the JSON path; an over-deep JSON document is rejected by the parser as `json_invalid`. - **Self-reference.** A value that contains itself is caught by an object-identity guard and fails with `recursion_loop` rather than looping forever. - **Closest-branch probe.** When a value misses a wide union, the error report searches only a bounded number of branches for the closest match, so building the explanation stays bounded regardless of how the value is shaped. ## Rejection is clean, not catastrophic ``` from valgebra import ValidationError, Validator, recursive, union schema = Validator(recursive(lambda j: union(int, [j]))) # A value nested far past the walk depth: a clean error, not a crash. deep = 0 for _ in range(5000): deep = [deep] assert not schema.is_valid(deep) try: schema.validate(deep) except ValidationError as error: assert error.code == "recursion_limit" # A value that contains itself: caught as a loop. cyclic = [] cyclic.append(cyclic) assert not schema.is_valid(cyclic) # An over-deep JSON document: rejected by the parser. assert not schema.is_valid_json("[" * 5000 + "1" + "]" * 5000) ``` The worst-case timing of these shapes is measured by the adversarial benchmark and the bounds are correctness-tested, so each limit is an enforced, exercised guarantee rather than a comment. # Performance valgebra compiles a schema once into a Rust validator tree and crosses into Rust exactly once per validation call. This page records how that is measured, a reproducible baseline against other validators, and the honest limits of the numbers. A speed claim is only as good as its methodology. Every number here states the harness, the dataset, the library versions, and the machine class. Re-run the harnesses on your own hardware before relying on a ratio: absolute times move with the CPU, and the comparison points do different amounts of work. ## What is measured Two harnesses, one per side of the boundary: - **Core micro-benchmarks** (`crates/valgebra-core/benches/core.rs`, criterion) time the pure-Rust schema transformations — the simplifier, the index remap behind validator composition, and the recursive open/closed record transform. No Python is involved. - **End-to-end benchmarks** (`benches/`, pytest-benchmark) time a single boundary-crossing validation call through the public API, over synthetic shapes that each stress one cost dimension. Run them with: ``` # Core micro-benchmarks (Rust): cargo bench --bench core # End-to-end and comparison benchmarks (Python); install the bench group first: uv sync --group bench # To match the published figures, build the same PGO wheel the release ships # (needs the llvm-tools rustup component) and install it; a plain build is slower: uv run --group bench maturin build --release --pgo --out dist uv pip install --reinstall --no-deps dist/*.whl uv run --group bench pytest benches/bench_validate.py uv run --group bench pytest benches/bench_compare.py --benchmark-group-by=group ``` ## Comparison is not apples-to-apples The comparison runs the same shapes through three checkers that do **different** work. Read the ratios with that in mind: - **valgebra** checks membership of the object already in hand: no copy, no coercion. `is_valid` returns a bool through the membership fast path. - **jsonschema** (`Draft202012Validator.is_valid`) is also a pure check with no coercion — the closest semantic analogue — but it is pure Python. - **pydantic** (`TypeAdapter.validate_python`, strict mode) validates *and constructs* a value. Strict mode disables coercion, but it still builds and returns output, so it does strictly more work than a membership check. It is the relevant point of comparison because it is the fast, Rust-cored validator most users reach for. The record shape compares valgebra's closed record against a pydantic `TypedDict` and a jsonschema object with `additionalProperties: false`, so all three check the same set of named fields. ## Baseline matrix Machine class: AMD Ryzen 7 PRO 7840U (Zen 4, 8c/16t, up to 5.1 GHz, a 2023-era mobile part) under WSL2 on Linux 6.18. Toolchain: rustc 1.96.0 (the build these numbers were measured on; the supported minimum is the lower `rust-version` in the manifest), CPython 3.14.6, pydantic 2.13.4, jsonschema 4.26.0, criterion 0.8.2, pytest-benchmark 5.2.3. The extension is the **PGO** release build — the profile-guided, fat-LTO wheel the release ships (`maturin build --release --pgo`); a plain `--release` build is a few tens of percent slower on these shapes, and a debug build is not representative. pydantic's PyPI wheels are likewise PGO-built, so this is a release-to-release comparison. Figures are the per-call median; re-run on your own hardware for absolute numbers. They are measured on the wheel carrying valgebra's full feature set — the per-validator precompute (record-field lookups, literal-union dispatch) and native string patterns — which leaves these shapes unchanged: the features earn their keep elsewhere, not by regressing the core. End-to-end validation of a value that passes (median time per call, lower is better): | Shape | valgebra | pydantic (strict) | jsonschema | | ---------------------------- | -------- | ----------------- | ---------- | | `list[int]`, 10,000 elements | 47 us | 88 us | 26,000 us | | Closed record, 50 int fields | 0.97 us | 1.9 us | 134 us | | Nested `list[...]`, depth 25 | 0.25 us | 1.9 us | 77 us | valgebra relative to pydantic on this machine: ~7x faster on deep nesting, ~2x faster on the wide record, and ~1.9x faster on the large flat array. It is consistently far ahead of pure-Python jsonschema — by two to three orders of magnitude on every shape. pydantic does strictly more work on the record (it constructs output), so read that shape as a margin over a heavier operation, not a like-for-like loss for pydantic. Core micro-benchmarks (criterion, release+LTO, indicative single run): | Operation | Corpus | Median | | ------------------- | ------------------------------------- | ------- | | `simplify` | redundant Boolean expression, depth 8 | ~1.3 us | | `shifted` | 64-field pool-indexed record | ~2.0 us | | `with_records_open` | record spine, depth 32 | ~4.8 us | ## Honest limits - The numbers are a single machine class. They establish relative behavior, not a universal ranking. Shared CI runners are too noisy for a tight wall-clock budget, so the merge gate measures a deterministic instruction count instead. - The margins against pydantic come with the caveat that the two tools do different work: pydantic constructs output, valgebra only checks membership. The ratios answer "how fast is each tool's validation step," not "how much faster is membership than construction." Deep nesting is the widest gap; the array and record margins are narrower but consistent. - The comparison measures different operations (check vs check-and-construct vs pure-Python check). It answers "how fast is the validation step for each tool," not "are these tools interchangeable" — they are not. See the README for what valgebra is and is not for. - These figures are for the object path — validating a value already in hand. The JSON input path is measured separately, on the same machine class, in the [JSON page](https://ppigazzini.github.io/valgebra/json/index.md). ## How the record fast path is tuned The closed-record membership check visits each dict entry once and matches the key against the declared fields, rather than looking up every declared field in turn (which builds a temporary Python string per field) and then scanning the dict a second time for undeclared keys. The key's UTF-8 is borrowed without allocating, and the field-name index is computed once when the validator is first used — with a fast non-cryptographic hasher, since the keys are the schema's own declared names rather than attacker input — then reused across calls, so a wide record no longer rebuilds and reallocates its name map on every validation. On the 50-field record above this measures ~1.0 us per call (PGO release build); the earlier per-field-lookup form was several times slower. Profiling with cachegrind attributed the removed cost to temporary-string creation, hashing, and allocation churn from the per-field lookups, and that attribution is an instruction count, so it holds across machine classes. The bool fast path and the aggregating explain walk stay membership-equivalent, locked by tests that assert both reach the same verdict across record shapes. ## How large literal unions dispatch A union whose members are all literals (a `Literal["a", "b", ...]` enum, or a discriminator) is compiled once into value-keyed sets — one for the integer literals, one for the string literals. An exact `int` or `str` value is then a single set lookup rather than a scan of every branch, so membership cost stops growing with the number of literals. The same-type literal rule is preserved: the integer set is consulted only for an exact `int` (never a `bool`), the string set only for an exact `str`, and any other value — a `bool`, `float`, `None`, a subclass instance, a big integer, or a JSON value — falls back to the linear scan that remains the single source of truth. On a 32-literal union this cuts the per-call median several-fold; the decision is identical to the scan, locked by tests over the cross-type cases. ## Regression gate The wall-clock numbers above are for humans reading results; they are too noisy on shared CI runners to gate a merge. The merge gate is instead a deterministic instruction count: a fixed workload exercises the core schema operations (`crates/valgebra-core/examples/perf_workload.rs`), runs under cachegrind, and its executed-instruction count is compared against a committed budget (`scripts/perf_budget.json`) by `scripts/perf_gate.py`. The count is identical across runs of a given build, so a regression past the budget ceiling fails the build without flaking. The tolerance absorbs cross-environment startup and compiler-codegen drift while still catching algorithmic regressions, which are far larger than the tolerance. The gate covers the pure-Rust schema engine, which is portable enough for a committed budget. The end-to-end wall-clock suites run on the same CI lane with timing disabled, as a smoke test that they keep working. The headline claim — that valgebra is pydantic-core-class — is gated too, by `scripts/compare_gate.py`. For each shape in a matrix it measures the *ratio* of per-call time (valgebra over pydantic-core), taking the minimum over many repeats, and compares each ratio against a recorded baseline (`scripts/perf_compare.json`) with a tolerance. A ratio cancels the runner's absolute speed: if the machine is slow, both libraries are slow in proportion, so the comparison survives the shared-runner noise an absolute budget cannot. A shape fails the merge gate when valgebra's ratio rises materially past its baseline — a competitive regression, whether from valgebra slowing down or ceding ground. Re-record the budgets after an intentional change with: ``` python scripts/perf_gate.py --update # core instruction budget python scripts/compare_gate.py --update # competitive ratios ``` # Explanation # Foundations This page records the theory valgebra rests on: what a schema *means*, why the combinators form a real Boolean algebra, and where the algebra decides relationships versus where it stays deliberately conservative. It is the reference behind the claims the rest of the docs make — "a closed, lawful lattice", "subtyping is set inclusion", "a law-justified simplifier" — so each is backed rather than asserted. The [soundness argument](https://ppigazzini.github.io/valgebra/soundness/index.md) takes the next step: why an accept is never wrong, node by node. ## Schemas denote sets; validation is membership A schema denotes a **set of Python values**. Validating a value is deciding whether it is a member of that set — nothing is copied or coerced. This is the *denotational* view: the meaning of a schema is its value-set `[[s]]`, and every other relationship is defined from it. - **Subtyping is set inclusion.** `s` is a subtype of `t` exactly when `[[s]] ⊆ [[t]]`. - **Equivalence is mutual inclusion.** `s` and `t` are equivalent when `[[s]] = [[t]]` — they accept the same values, whatever their syntax. Because meaning is a set, the connectives are the set operations, and they obey the set-algebra laws by construction rather than by convention. ## A Boolean algebra of schemas `union`, `intersection`, and `complement` are set union, intersection, and complement; `anything` is the top (every value) and `nothing` is the bottom (no value). Schemas under these operations form a **Boolean lattice**: a bounded, distributive, complemented lattice. Every Boolean-algebra law therefore holds — commutativity, associativity, idempotence, absorption, identities, distributivity, De Morgan, and double negation — and valgebra property-tests each against the membership relation rather than asserting it (see the [algebra guide](https://ppigazzini.github.io/valgebra/algebra/index.md)). `simplify` rewrites a schema by these laws while admitting **exactly the same values**. That soundness — simplification never changes the value-set — is the single invariant the simplifier is held to. ## Semantic (set-theoretic) subtyping Treating types as sets of values, with full union, intersection, and negation and subtyping as inclusion, is **semantic subtyping**, developed by Frisch, Castagna, and Benzaken. valgebra is a runtime membership checker built on that model rather than a static type system, but it inherits the model's payoff: the combinators are not ad-hoc primitives, they are the Boolean operations on value-sets, and refinements like "an int that is not a bool" are simply `intersection(int, complement(bool))`. The same line of work models the structural forms valgebra uses: - **Sequences as regular-expression types.** A list or tuple schema is a regular expression over element schemas — the regular-tree-type approach from XDuce and CDuce. One node expresses fixed tuples, variadic tuples, and prefix-plus-tail lists uniformly. - **Maps as keyed-default functions.** A dict, record, or map is named fields plus key-schema-keyed default clauses — the set-theoretic model of records and maps as quasi-constant functions. ## Gradual `Any` versus the lattice top `Any` (the gradual dynamic type) and `anything` (the lattice top) both admit every value at runtime, but they are different objects in the algebra. `anything` obeys the laws — `complement(anything)` is `nothing`. `Any` is the **gradual** dynamic type: an atom the simplifier never rewrites, so `intersection(Any, s)` is not collapsed and "deliberately unchecked" stays distinct from "checked: all values admitted". This follows the treatment of the gradual `?` under union and intersection types, where the dynamic type is an interval, not the top. ## What the algebra decides, and the conservative frontier Deciding whether two arbitrary set-theoretic types are equal — equivalently, whether a type is empty — is decidable but **EXPTIME-complete**. valgebra does not need that decision to validate: membership is answered directly by the walk, not by reducing the schema. So the simplifier implements the **soundly decidable fragment** and is honest about the rest: - **Folded by the simplifier.** The complement laws (`X ∩ ¬X = ⊥`, `X ∪ ¬X = ⊤`) for any `X` except the gradual `Any`, and disjointness of the scalar fragment. So `intersection(int, complement(int)).simplify()` is `nothing` and `intersection(int, str).simplify()` is `nothing`. It never treats `Any` as the top, so a deliberately-unchecked schema is preserved. - **Decided by the comparison operators.** `is_subtype_of`, `is_equivalent`, and `is_empty` decide a wider fragment than the simplifier folds — class and literal inclusion, refinements (including the emptiness of contradictory bounds like `Ge(10) & Le(0)`), sequences, sets, records and mappings, and recursion at its greatest fixpoint. The [decidability boundary](https://ppigazzini.github.io/valgebra/decidability/index.md) lists exactly what is decided and what stays conservative. - **Conservative.** A predicate refinement is opaque, and a narrow decidable tail and the runtime-undecidable constructs remain (the boundary records them). Every answer is sound: `is_empty` never reports a non-empty schema as empty, and a subtype is never claimed unless it provably holds. The decision realizes the set-theoretic emptiness test directly rather than via a full type-automaton construction; that construction would only widen the few conservative cases the boundary records, never change a membership decision. ## References The essential reading, in the order it maps onto valgebra: 1. **Frisch, Castagna & Benzaken — "Semantic Subtyping: Dealing Set-Theoretically with Function, Union, Intersection, and Negation Types", *JACM* 55(4), 2008.** [doi:10.1145/1391289.1391293](https://doi.org/10.1145/1391289.1391293). The foundation: types as sets, subtyping as inclusion, full Boolean connectives. 1. **Gesbert, Genevès & Layaïda — "A Logical Approach to Deciding Semantic Subtyping", *TOPLAS* 38(1), 2015.** The decision procedure and its EXPTIME-completeness — why the full emptiness decision is deferred. 1. **Hosoya, Vouillon & Pierce — "Regular Expression Types for XML", *TOPLAS* 27(1), 2005.** Regular-tree types — the model behind sequences as one regex node. 1. **Castagna — "Typing Records, Maps, and Structs", *ICFP* 2023.** [doi:10.1145/3607838](https://doi.org/10.1145/3607838). Records and maps as keyed-default functions. 1. **Castagna & Lanvin — "Gradual Typing with Union and Intersection Types", *PACMPL* 1(ICFP), 2017**, and **Castagna, Lanvin, Petrucciani & Siek — "Gradual Typing: A New Perspective", *PACMPL* 3(POPL), 2019.** The gradual dynamic type under set-theoretic connectives — why `Any` is not the top. A current synthesis is Castagna, "Programming with Union, Intersection, and Negation Types", 2024 ([arXiv:2111.03354](https://arxiv.org/abs/2111.03354)). # Soundness argument This page argues, in a form a reader can check, why **an accept is never wrong**: if valgebra reports a value valid, the value really is in the schema's set. It is a written argument backed by adversarial tests, not a machine-checked proof; the [honest limits](#what-this-argument-assumes) say exactly what is taken on trust. ## The claim Write `⟦S⟧` for the set of Python values a schema `S` denotes (its [denotation](https://ppigazzini.github.io/valgebra/foundations/index.md)). valgebra rests on three soundness statements: 1. **Membership is exact.** For every schema `S` and value `x`, the walk accepts `x` if and only if `x ∈ ⟦S⟧`. The "only if" half is *soundness of acceptance* — the property downstream code relies on; the "if" half is completeness of the check. 1. **Simplification preserves meaning.** `⟦simplify(S)⟧ = ⟦S⟧`. 1. **Decisions are sound.** If `is_subtype_of(A, B)` is `True` then `⟦A⟧ ⊆ ⟦B⟧`; if `is_empty(S)` is `True` then `⟦S⟧ = ∅`. The converses are *not* claimed — the decision is deliberately conservative. There is no separate specification the implementation could disagree with: the denotation *is* the meaning, so soundness is the statement that the Rust walk computes `x ∈ ⟦S⟧`. The argument is therefore a node-by-node check that the walk's accept condition is, line for line, the membership condition of `⟦S⟧`. ## Why membership is exact: structural induction The walk recurses on the structure of `S`. Take as induction hypothesis that the walk is exact on every strict sub-schema; then check each node. The denotation and the walk's accept condition coincide at every one: ``` S accepts x ⟺ x ∈ ⟦S⟧, by: --------------- ----------------------------------------------------------- Anything always (⟦Anything⟧ = all values) Nothing never (⟦Nothing⟧ = ∅) Bool/Int/... isinstance(x, T) (the scalar region) Literal(c) type(x) is type(c) and x == c (typed singleton) Union(A_i) some A_i accepts x (∃: set union) Intersection every A_i accepts x (∀: set intersection) Complement(A) A does not accept x (¬: set complement) Refine(B, c_j) B accepts x and every c_j (base ∩ constraints) Seq(kind, r) x is a kind whose elements (regular language over match the regex r element denotations) Set/FrozenSet(A) every element accepts A (homogeneous container) KeyedMap(f, d) fields present-and-match, and (named fields ∩ keyed every other key matches a default clauses) default clause Instance(C) isinstance(x, C) (class extension) ``` For the Boolean nodes the equivalence is the definition of the set operation, so the step is immediate given the hypothesis on the children. For the structural nodes (`Seq`, `KeyedMap`, `Set`) the walk evaluates the children exactly by hypothesis and combines them by the same connective the denotation uses. The scalar and `Instance` leaves reduce to `isinstance`, which is Python's own membership test for those sets, and `Literal` adds the same-type guard that keeps `Literal[1]`, `Literal[True]`, and `Literal[1.0]` distinct. ### Recursion terminates and stays exact A `recursive` schema is a guarded (contractive) fixpoint: every back-edge sits under a structural constructor. Membership unfolds that fixpoint against the value in hand. Because the value is a **finite** Python object, the unfolding is finite, and two guards keep it so: - an object-identity guard rejects a value that contains itself (`recursion_loop`) rather than looping, and - a depth bound rejects a value nested past the limit (`recursion_limit`) rather than overflowing the stack (see [resource limits](https://ppigazzini.github.io/valgebra/limits/index.md)). On the inhabitants — the finite values — the guarded unfolding accepts exactly the members, which is why a coinductive *comparison* at the greatest fixpoint never contradicts a membership answer (see [recursion](https://ppigazzini.github.io/valgebra/recursion/index.md)). ## Why simplification preserves meaning Every rewrite `simplify` performs is a law of the Boolean algebra of sets — flattening associative nodes, dropping identities and duplicates, pushing complement to negation-normal form, folding `X ∩ ¬X` to `⊥` and `X ∪ ¬X` to `⊤`, and using scalar disjointness — each of which holds of the *sets*, so it cannot change `⟦S⟧`. The simplifier is held to this one invariant and to nothing stronger: it is a lattice normal form, not a decision, so membership relations are read off the decision procedures, never off simplified structure. ## Why the decisions are sound (and only sound) `is_subtype_of` applies structural inclusion rules, each a valid set inclusion: `A ⊆ B₁ ∪ … ∪ Bₙ` when `A ⊆ some Bᵢ`, `A₁ ∩ … ∩ Aₙ ⊆ B` when some `Aᵢ ⊆ B`, the contrapositive for complement, componentwise inclusion for the structural forms, and the coinductive rule for recursion (assume the goal on the current path — sound for inclusion at the greatest fixpoint). A leaf the rules cannot relate is handed to an oracle that returns `False` when it cannot prove the relation. Every rule preserves "the conclusion holds whenever the premises do", so a `True` is a proof. `is_empty` and `is_equivalent` are derived (`is_empty(S)` is `S ⊆ Nothing` via the same rules; `is_equivalent` is mutual inclusion), so they inherit the soundness. The conservatism is the price: when a rule does not fire and the oracle declines, the answer is `False` — "not proven", not "disproven". This is why the [decidability boundary](https://ppigazzini.github.io/valgebra/decidability/index.md) maps where `False` is exact and where it is conservative, and why the docs say *closed algebra, conservative decision*. ## How the argument is mechanized The argument is checked, not just asserted, by four independent test layers: - **Denotation oracle.** Each node's `⟦S⟧` is written as a reference predicate over a value generator, and the walk is property-tested to agree with it — this is the membership-exactness claim, checked on generated values. - **Algebra laws.** Every law `simplify` relies on is property-tested against the membership relation, in Rust (proptest) and Python (Hypothesis). - **External ground truth.** The same schemas and values run through pydantic-core and jsonschema; a divergence is a bug or a documented intentional difference — an independent check that the *reference predicates themselves* are right, closing the single-author blind spot. - **Coverage-guided fuzzing.** libFuzzer drives the decision procedures over the whole IR, asserting the sound order laws (reflexivity, the lattice bounds, equivalence as mutual inclusion); the same invariants gate every merge. These are partial mechanization — adversarial, independent, and coverage-guided — in place of a fully formal proof. ## What this argument assumes The soundness is relative to a small, explicit trust base: - `isinstance` and the PyO3 conversions report Python's own membership faithfully. - The JSON parser (jiter) yields the value `json.loads` would, so the JSON path's denotation matches the object path's. - The crates contain no `unsafe`, so there is no memory-safety obligation beyond the compiler's. - **Predicate refinements are opaque.** A `Predicate` constraint runs arbitrary Python; valgebra checks that it returned truthy, and the soundness of *that* leaf is the caller's. Regex constraints are matched natively and related only by syntactic identity. Within that base, an accept is a claim that `x ∈ ⟦S⟧`, justified node by node above and exercised by the four test layers — which is what "rock solid" can honestly mean before a machine-checked proof and outside review exist. # The decidability boundary valgebra compares schemas as sets: `is_subtype_of` is set inclusion, `is_equivalent` is mutual inclusion, and `is_empty` reports an unsatisfiable schema. The relation is `s <= t` exactly when `s` and `not t` share no value, so every comparison reduces to an emptiness test (see [foundations](https://ppigazzini.github.io/valgebra/foundations/index.md)). Every answer is **sound**. A `True` from `is_subtype_of`/`is_equivalent`, or a `True` from `is_empty`, is always correct. Where valgebra cannot yet prove a relation it answers conservatively — `False`, or "not empty" — never a wrong `True`. So a positive answer is a guarantee, and a negative answer is "no, or not proven". This page states which queries valgebra decides completely, which stay conservative, and which are undecidable at runtime and so are rejected or treated opaquely by necessity. ## Decided completely Over this fragment, valgebra returns the exact set-theoretic answer. - **The scalar Boolean algebra.** Every union, intersection, and complement of the scalar atoms (`None`, `bool`, `int`, `float`, `str`, `bytes`), with `bool` a subtype of `int`. The complement laws hold: `int & ~int` is empty, `int | ~int` is the universe. - **Complement and disjointness across kinds.** An intersection that carries a schema together with its complement (`A & ~A`), or two members of provably disjoint kinds (a list and a set, an `int` and a `str`), is empty — for the structural kinds, not only the scalars. The gradual `Any` is exempt from the complement law: `Any & ~Any` is *not* empty, because `Any` is the dynamic type, not a set whose complement cancels it. - **Class and literal inclusion.** A class is a subtype of its base classes (`issubclass`), and a literal is a subtype of any schema it is a member of. - **Refinements.** A refinement is a subtype of its base and of a refinement with looser bounds — a tighter numeric or length bound entails a looser one, not only a verbatim-contained constraint set; a bound conjunction that cannot be satisfied — a lower bound above an upper bound, or a minimum length above a maximum — is empty. On an integer base the bounds count integers, so an interval that skips every integer — `Annotated[int, Gt(0), Lt(1)]` — is empty even though its endpoints are ordered; a `float` base stays dense, so the same bounds are not empty. - **Sequences.** Homogeneous, fixed-length, and prefix-plus-tail lists and tuples, with the container as part of the type (a list is never a tuple). Every sequence schema valgebra builds takes this linear shape, so sequence inclusion is decided completely. - **Sets and frozensets.** By element inclusion. - **Records and mappings.** Closed-record width, depth, and required-ness; pure mappings with several key-pattern clauses (each subtype clause subsumed by a supertype clause); and a record mixed with a catch-all when the subtype carries at least the supertype's fields, or when a field the subtype lacks is optional in the supertype and the subtype's catch-all covers its value type (each extra or optional field covered by a catch-all over all string keys). - **Recursion.** Equirecursive schemas compare at their greatest fixpoint; the rule is sound and is witnessed by an independent reference denotation. ``` from typing import Annotated, Any import annotated_types as at from valgebra import complement, intersection, recursive, union, Validator assert Validator(bool).is_subtype_of(int) # bool is a subtype of int assert Validator(1).is_subtype_of(int) # a literal is a member of int assert Validator(Annotated[int, at.Ge(0)]).is_subtype_of(int) # refinement <= base assert Validator(Annotated[int, at.Ge(10), at.Le(0)]).is_empty() # no such int assert Validator(Annotated[int, at.Gt(0), at.Lt(1)]).is_empty() # no int strictly between assert not Validator(Annotated[float, at.Gt(0), at.Lt(1)]).is_empty() # floats are dense assert Validator({str: int}).is_subtype_of({str: int, int: bool}) # mapping clauses assert Validator({str: int}).is_subtype_of({"b?": int, str: int}) # optional field, catch-all covers it assert union(bool, int).is_equivalent(int) # bool | int is just int assert intersection(int, complement(int)).is_empty() # the complement law assert intersection(list[int], complement(list[int])).is_empty() # complement law, structurally assert intersection(list[int], set[int]).is_empty() # disjoint kinds: a list is never a set assert not intersection(Any, complement(Any)).is_empty() # Any is exempt from the complement law json_value = recursive(lambda j: union(None, bool, int, float, str, [j], {str: j})) assert json_value.is_valid({"a": [1, "x", {"b": None}]}) ``` ## Sound but conservative Here valgebra is correct but not complete: it may answer `False` or "not empty" for a relation that does in fact hold. These are decidable in principle and are tracked as future work. - **Mixed maps where the supertype declares a *required* field the subtype lacks.** When the missing field is optional, the subtype's catch-all covers it and the case is decided; a required field is not, because a catch-all over the key space does not prove that field is present. Deciding it needs the full quasi-constant-function comparison. General regular-expression-types inclusion of sequences (a union of sequence languages that splits across branches, or a repeated heterogeneous group) is not implemented, but no schema valgebra builds takes that shape — every sequence is the linear prefix-and-tail form, which is decided completely — so it is not a reachable gap. ## Undecidable at runtime These have no decidable runtime membership, so valgebra rejects them with a clear message or treats them opaquely — it never guesses. - **Erased generics and type variables.** A `TypeVar`, `Generic[T]`, `ParamSpec`, or `TypeVarTuple` is rejected; a runtime value carries no binding for a free type variable. - **Abstract-collection generics.** `Sequence[int]`, `Mapping[str, int]`, and `Iterable[T]` are rejected; checking `Iterable` elements would consume the iterable, and `str`/`bytes` are themselves sequences. Use a concrete container — `list[int]`, `tuple[int, ...]`, `dict[str, int]` — or the bare abstract type for an `isinstance` check. - **Callable signatures.** `Callable[[int], str]` checks only that the value is callable; a function does not expose its argument and return types at runtime. - **Predicates.** An `Annotated[T, predicate]` runs the predicate at validation time; its satisfiability cannot be reasoned about (Rice's theorem), so it is opaque to subtyping and emptiness. - **Typing qualifiers.** `Final` and `ClassVar` are rejected; they qualify a declaration and carry no value-membership meaning. ``` from collections.abc import Sequence from typing import TypeVar from valgebra import Validator T = TypeVar("T") for undecidable in (Sequence[int], T): try: Validator(undecidable) raise AssertionError("expected a rejection") except NotImplementedError: pass # rejected with a clear message, never a silent wrong validator ``` ## The contract A positive answer (`is_subtype_of`/`is_equivalent`/`is_empty` returning `True`) is a proof. A negative answer is "no, or not yet proven". valgebra never reports a relation it cannot justify, so widening the decided fragment can only turn a conservative `False` into a `True` — it can never change a previously-correct answer. # API reference # API reference The full public surface of the `valgebra` package. Every name is re-exported from the top-level `valgebra` namespace. ## Compiling and checking ### Validator A compiled, immutable schema validator. Build one by calling `Validator(schema)`, or with a combinator such as `union`, `intersection`, or `recursive`. Then check values with `validate`, `is_valid`, or `ensure`, and JSON documents with `validate_json` or `is_valid_json`. Validation is a membership test against the set the schema denotes: the value is never copied or coerced. A validator never changes after it is built and is safe to share across threads. Its `repr` is the annotation that produces it, and it can be copied with `copy.copy`/`copy.deepcopy`. #### close ``` close() -> Validator ``` Close every record in the schema: only declared keys are admitted throughout. The inverse of `open`. Returns a new validator; this one is unchanged. Returns: | Type | Description | | ----------- | ------------------------------------------------------------- | | `Validator` | A validator whose every record admits only its declared keys. | #### ensure ``` ensure(obj: object) -> object ``` Validate `obj` and return it unchanged. The value-returning check. Because validation is a membership test rather than a coercion, the returned object is exactly the input; `ensure` exists so code that wants the checked value back reads distinctly from the boolean `is_valid`. Parameters: | Name | Type | Description | Default | | ----- | -------- | -------------------- | ---------- | | `obj` | `object` | The object to check. | *required* | Returns: | Type | Description | | -------- | -------------- | | `object` | obj unchanged. | Raises: | Type | Description | | ----------------- | ------------------------------------------- | | `ValidationError` | If obj is not a member of the schema's set. | #### is_empty ``` is_empty() -> bool ``` Whether the schema is unsatisfiable — provably empty, so `is_valid` returns `False` for every value. Decided soundly: `True` only when no value can belong to the schema — an unsatisfiable intersection, a fixed sequence with an impossible position, a record with an impossible required field, a refinement whose bounds cannot hold together (a lower bound above an upper bound, or a minimum length above a maximum), or a recursive schema with no base case (a mandatory self-reference that can never bottom out). It never reports a satisfiable schema as empty; for forms it cannot decide it returns `False`. The decision is also bounded by a fixed work budget, so on a deeply nested adversarial schema a `False` can mean "not proven empty within the bound" rather than "non-empty"; a real schema decides far inside the bound. Returns: | Type | Description | | ------ | ----------------------------------------------------- | | `bool` | True if the schema denotes the empty set, else False. | #### is_equivalent ``` is_equivalent(other: object) -> bool ``` Whether this schema and `other` denote the same set — mutual inclusion. `other` is any schema spec or compiled validator. Sound, like `is_subtype_of`: `True` only when the two are provably equivalent, whatever their syntax (`bool | int` is equivalent to `int`). Bounded by the same work budget as `is_subtype_of`, so on a deeply nested adversarial schema a `False` can mean "not proven equivalent within the bound". Parameters: | Name | Type | Description | Default | | ------- | -------- | ---------------------------------------------- | ---------- | | `other` | `object` | The schema to compare, as a spec or validator. | *required* | Returns: | Type | Description | | ------ | --------------------------------------------------- | | `bool` | True if the two schemas are equivalent, else False. | #### is_subtype_of ``` is_subtype_of(other: object) -> bool ``` Whether every value of this schema is also a value of `other` — set inclusion, the subtyping relation. `other` is any schema spec or compiled validator. The decision is sound: `True` only when the inclusion provably holds (`bool` is a subtype of `int`, `list[bool]` of `list[int]`, a recursive schema of a wider one, a class of a base class). For the forms it cannot decide — an alternation of sequence shapes, or a leaf relation the oracle declines — it returns `False` rather than a relation it cannot justify. The decision is bounded by a fixed work budget, so on a deeply nested adversarial schema a `False` can mean "not proven a subtype within the bound"; a real schema decides far inside the bound. Parameters: | Name | Type | Description | Default | | ------- | -------- | ------------------------------------------------------- | ---------- | | `other` | `object` | The candidate supertype, as a schema spec or validator. | *required* | Returns: | Type | Description | | ------ | ------------------------------------------------------ | | `bool` | True if this schema is a subtype of other, else False. | #### is_valid ``` is_valid(obj: object) -> bool ``` Whether `obj` is a member of the schema's set. Check-only: it does not build an error and returns as soon as membership is decided. It raises only if a comparison the membership test performs raises a fatal interpreter signal (for example a KeyboardInterrupt during a long check); an ordinary exception in a comparison is folded to a non-member, as the membership contract requires. Parameters: | Name | Type | Description | Default | | ----- | -------- | -------------------- | ---------- | | `obj` | `object` | The object to check. | *required* | Returns: | Type | Description | | ------ | -------------------------------------------------------- | | `bool` | True if obj is a member of the schema's set, else False. | Raises: | Type | Description | | --------------- | ------------------------------------------------------------------------------------------------------------------- | | `BaseException` | If a membership comparison raises a fatal interpreter signal, it propagates rather than being read as a non-member. | #### is_valid_json ``` is_valid_json(data: str | bytes) -> bool ``` Whether a JSON document parses and is a member of the schema's set. Check-only: malformed or undecodable JSON, or input that is neither `str` nor `bytes`, is simply not a member and returns `False`. The raising entries (`validate_json`/`load`) report the same undecodable input as a structured `json_invalid` error. The document is validated in place against the parsed value, with no intermediate Python objects for the structure it walks. It raises only if a membership comparison raises a fatal interpreter signal. Parameters: | Name | Type | Description | Default | | ------ | ----- | ----------- | ----------------------------------- | | `data` | \`str | bytes\` | The JSON document, as str or bytes. | Returns: | Type | Description | | ------ | ------------------------------------------------------------- | | `bool` | True if data parses and is a member of the schema's set, else | | `bool` | False. | Raises: | Type | Description | | --------------- | ------------------------------------------------------------------------------------------------------------------- | | `BaseException` | If a membership comparison raises a fatal interpreter signal, it propagates rather than being read as a non-member. | #### load ``` load( data: str | bytes, *, fail_fast: bool = False ) -> object ``` Validate a JSON document and return the parsed value. Like `validate_json`, but returns the parsed Python object instead of discarding it, so a caller that needs the data does not parse it again. Parsing runs in Rust (jiter), and the parsed value is validated by the same walk, reaching the same decision and errors as `validate`. Parameters: | Name | Type | Description | Default | | ----------- | ------ | ------------------------------------------------- | ----------------------------------- | | `data` | \`str | bytes\` | The JSON document, as str or bytes. | | `fail_fast` | `bool` | Stop at the first failure instead of aggregating. | `False` | Returns: | Type | Description | | -------- | ------------------------------------------------------------------- | | `object` | The parsed Python object, once it is confirmed a member of the set. | Raises: | Type | Description | | ----------------- | ------------------------------------------------------------------------------------------------------------ | | `ValidationError` | If the document is malformed or undecodable JSON (code json_invalid) or is not a member of the schema's set. | | `TypeError` | If data is not str or bytes. | #### open ``` open() -> Validator ``` Open every record in the schema: undeclared keys are admitted throughout. Returns a new validator; this one is unchanged. Returns: | Type | Description | | ----------- | ----------------------------------------------------------------- | | `Validator` | A validator whose every record admits keys beyond those declared. | #### simplify ``` simplify() -> Validator ``` An equivalent validator reduced by the lattice laws. The result admits exactly the same values in a simpler form (flattened and deduplicated unions and intersections, identities applied, complements in negation-normal form). Returns a new validator; this one is unchanged. Returns: | Type | Description | | ----------- | ---------------------------------------------------------- | | `Validator` | A validator denoting the same set in negation-normal form. | #### validate ``` validate(obj: object, *, fail_fast: bool = False) -> None ``` Validate `obj`, raising `ValidationError` if it is not a member of the schema's set. Check-only: `obj` is never copied or coerced. Parameters: | Name | Type | Description | Default | | ----------- | -------- | ------------------------------------------------------------------------------------------ | ---------- | | `obj` | `object` | The object to check. | *required* | | `fail_fast` | `bool` | Stop at the first failure instead of aggregating every independent failure into the error. | `False` | Returns: | Type | Description | | ------ | -------------------------------------------- | | `None` | None if obj is a member of the schema's set. | Raises: | Type | Description | | ----------------- | ----------------------------------------------------------------------------- | | `ValidationError` | If obj is not a member; its errors lists each failure with a code and a path. | #### validate_json ``` validate_json( data: str | bytes, *, fail_fast: bool = False ) -> None ``` Validate a JSON document, parsing it on the Rust path. Parsing runs in Rust, faster than the standard library, and the parsed value runs the same validation walk as a native object, so this reaches the same decision and the same errors as `validate` on the parsed object. `fail_fast` behaves as it does for `validate`. Parameters: | Name | Type | Description | Default | | ----------- | ------ | ------------------------------------------------- | ----------------------------------- | | `data` | \`str | bytes\` | The JSON document, as str or bytes. | | `fail_fast` | `bool` | Stop at the first failure instead of aggregating. | `False` | Returns: | Type | Description | | ------ | ---------------------------------------------------------------- | | `None` | None if the document parses and is a member of the schema's set. | Raises: | Type | Description | | ----------------- | ------------------------------------------------------------------------------------------------------------ | | `ValidationError` | If the document is malformed or undecodable JSON (code json_invalid) or is not a member of the schema's set. | | `TypeError` | If data is not str or bytes. | ## Combinators ### union ``` union(*schemas: object) -> Validator ``` The union of the given schemas: a value in at least one of their sets. ### intersection ``` intersection(*schemas: object) -> Validator ``` The intersection of the given schemas: a value in every one of their sets. ### complement ``` complement(schema: object) -> Validator ``` The complement of a schema: every value not in its set. The whole-schema transforms `simplify` (reduce by the lattice laws), `open`, and `close` (a record's key set) are methods on the compiled validator (`Validator.simplify`/`open`/`close`), documented above. A fixed-length list is the native `[A, B]` literal (see the [schema language](https://ppigazzini.github.io/valgebra/schema-language/index.md)). ## Refinement markers ### Regex `Annotated` metadata: a string fully matches this regular expression. Use as `Annotated[str, Regex(r"[0-9a-f]{24}")]`. The match is anchored — the whole string must match, as `re.fullmatch` does — and runs natively on the Rust path (a linear-time engine), so a pattern check stays on the validation fast path rather than crossing into Python like a predicate. A bare `re.Pattern` (from `re.compile`) is accepted as metadata too. ## Recursion ### recursive ``` recursive( builder: Callable[[Validator], object], ) -> Validator ``` Build a recursive schema as a checked fixpoint. `builder` receives a placeholder validator standing for the schema being defined and returns its body. The placeholder's self-reference is resolved to a back edge, and a non-contractive body — one whose recursive reference is not under a structural constructor — is rejected. ## Lattice bounds ### anything ``` anything: Validator = anything ``` A compiled, immutable schema validator. Build one by calling `Validator(schema)`, or with a combinator such as `union`, `intersection`, or `recursive`. Then check values with `validate`, `is_valid`, or `ensure`, and JSON documents with `validate_json` or `is_valid_json`. Validation is a membership test against the set the schema denotes: the value is never copied or coerced. A validator never changes after it is built and is safe to share across threads. Its `repr` is the annotation that produces it, and it can be copied with `copy.copy`/`copy.deepcopy`. ### nothing ``` nothing: Validator = nothing ``` A compiled, immutable schema validator. Build one by calling `Validator(schema)`, or with a combinator such as `union`, `intersection`, or `recursive`. Then check values with `validate`, `is_valid`, or `ensure`, and JSON documents with `validate_json` or `is_valid_json`. Validation is a membership test against the set the schema denotes: the value is never copied or coerced. A validator never changes after it is built and is safe to share across threads. Its `repr` is the annotation that produces it, and it can be copied with `copy.copy`/`copy.deepcopy`. ## Errors ### ValidationError Bases: `Exception` ## Package version `valgebra.__version__` is the installed distribution version as a string. It is read from the package metadata maturin derives from the Cargo workspace manifest, so it always matches the built wheel and never drifts from a hand-maintained literal. ``` import valgebra print(valgebra.__version__) ```