Soundness argument¶
This page argues, in a form a reader can check, why an accept is never wrong: if valgebra reports a value valid, the value really is in the schema's set. It is a written argument backed by adversarial tests, not a machine-checked proof; the honest limits say exactly what is taken on trust.
The claim¶
Write ⟦S⟧ for the set of Python values a schema S denotes (its
denotation). valgebra rests on three soundness statements:
- Membership is exact. For every schema
Sand valuex, the walk acceptsxif and only ifx ∈ ⟦S⟧. The "only if" half is soundness of acceptance — the property downstream code relies on; the "if" half is completeness of the check. - Simplification preserves meaning.
⟦simplify(S)⟧ = ⟦S⟧. - Decisions are sound. If
is_subtype_of(A, B)isTruethen⟦A⟧ ⊆ ⟦B⟧; ifis_empty(S)isTruethen⟦S⟧ = ∅. The converses are not claimed — the decision is deliberately conservative.
There is no separate specification the implementation could disagree with: the
denotation is the meaning, so soundness is the statement that the Rust walk
computes x ∈ ⟦S⟧. The argument is therefore a node-by-node check that the
walk's accept condition is, line for line, the membership condition of ⟦S⟧.
Why membership is exact: structural induction¶
The walk recurses on the structure of S. Take as induction hypothesis that the
walk is exact on every strict sub-schema; then check each node. The denotation
and the walk's accept condition coincide at every one:
S accepts x ⟺ x ∈ ⟦S⟧, by:
--------------- -----------------------------------------------------------
Anything always (⟦Anything⟧ = all values)
Nothing never (⟦Nothing⟧ = ∅)
Bool/Int/... isinstance(x, T) (the scalar region)
Literal(c) type(x) is type(c) and x == c (typed singleton)
Union(A_i) some A_i accepts x (∃: set union)
Intersection every A_i accepts x (∀: set intersection)
Complement(A) A does not accept x (¬: set complement)
Refine(B, c_j) B accepts x and every c_j (base ∩ constraints)
Seq(kind, r) x is a kind whose elements (regular language over
match the regex r element denotations)
Set/FrozenSet(A) every element accepts A (homogeneous container)
KeyedMap(f, d) fields present-and-match, and (named fields ∩ keyed
every other key matches a default clauses)
default clause
Instance(C) isinstance(x, C) (class extension)
For the Boolean nodes the equivalence is the definition of the set operation, so
the step is immediate given the hypothesis on the children. For the structural
nodes (Seq, KeyedMap, Set) the walk evaluates the children exactly by
hypothesis and combines them by the same connective the denotation uses. The
scalar and Instance leaves reduce to isinstance, which is Python's own
membership test for those sets, and Literal adds the same-type guard that keeps
Literal[1], Literal[True], and Literal[1.0] distinct.
Recursion terminates and stays exact¶
A recursive schema is a guarded (contractive) fixpoint: every back-edge sits
under a structural constructor. Membership unfolds that fixpoint against the
value in hand. Because the value is a finite Python object, the unfolding is
finite, and two guards keep it so:
- an object-identity guard rejects a value that contains itself
(
recursion_loop) rather than looping, and - a depth bound rejects a value nested past the limit (
recursion_limit) rather than overflowing the stack (see resource limits).
On the inhabitants — the finite values — the guarded unfolding accepts exactly the members, which is why a coinductive comparison at the greatest fixpoint never contradicts a membership answer (see recursion).
Why simplification preserves meaning¶
Every rewrite simplify performs is a law of the Boolean algebra of sets —
flattening associative nodes, dropping identities and duplicates, pushing
complement to negation-normal form, folding X ∩ ¬X to ⊥ and X ∪ ¬X to ⊤,
and using scalar disjointness — each of which holds of the sets, so it cannot
change ⟦S⟧. The simplifier is held to this one invariant and to nothing
stronger: it is a lattice normal form, not a decision, so membership relations
are read off the decision procedures, never off simplified structure.
Why the decisions are sound (and only sound)¶
is_subtype_of applies structural inclusion rules, each a valid set inclusion:
A ⊆ B₁ ∪ … ∪ Bₙ when A ⊆ some Bᵢ, A₁ ∩ … ∩ Aₙ ⊆ B when some Aᵢ ⊆ B, the
contrapositive for complement, componentwise inclusion for the structural forms,
and the coinductive rule for recursion (assume the goal on the current path —
sound for inclusion at the greatest fixpoint). A leaf the rules cannot relate is
handed to an oracle that returns False when it cannot prove the relation. Every
rule preserves "the conclusion holds whenever the premises do", so a True is a
proof. is_empty and is_equivalent are derived (is_empty(S) is
S ⊆ Nothing via the same rules; is_equivalent is mutual inclusion), so they
inherit the soundness.
The conservatism is the price: when a rule does not fire and the oracle declines,
the answer is False — "not proven", not "disproven". This is why the
decidability boundary maps where False is exact and where it
is conservative, and why the docs say closed algebra, conservative decision.
How the argument is mechanized¶
The argument is checked, not just asserted, by four independent test layers:
- Denotation oracle. Each node's
⟦S⟧is written as a reference predicate over a value generator, and the walk is property-tested to agree with it — this is the membership-exactness claim, checked on generated values. - Algebra laws. Every law
simplifyrelies on is property-tested against the membership relation, in Rust (proptest) and Python (Hypothesis). - External ground truth. The same schemas and values run through pydantic-core and jsonschema; a divergence is a bug or a documented intentional difference — an independent check that the reference predicates themselves are right, closing the single-author blind spot.
- Coverage-guided fuzzing. libFuzzer drives the decision procedures over the whole IR, asserting the sound order laws (reflexivity, the lattice bounds, equivalence as mutual inclusion); the same invariants gate every merge.
These are partial mechanization — adversarial, independent, and coverage-guided — in place of a fully formal proof.
What this argument assumes¶
The soundness is relative to a small, explicit trust base:
isinstanceand the PyO3 conversions report Python's own membership faithfully.- The JSON parser (jiter) yields the value
json.loadswould, so the JSON path's denotation matches the object path's. - The crates contain no
unsafe, so there is no memory-safety obligation beyond the compiler's. - Predicate refinements are opaque. A
Predicateconstraint runs arbitrary Python; valgebra checks that it returned truthy, and the soundness of that leaf is the caller's. Regex constraints are matched natively and related only by syntactic identity.
Within that base, an accept is a claim that x ∈ ⟦S⟧, justified node by node
above and exercised by the four test layers — which is what "rock solid" can
honestly mean before a machine-checked proof and outside review exist.