JSON input
A compiled validator validates JSON source directly, parsing on the Rust path:
from valgebra import Validator
users = Validator({"name": str, "age?": int})
users.validate_json('{"name": "Ada", "age": 36}') # passes, returns None
assert users.is_valid_json('{"name": "Ada"}') # optional key absent
assert not users.is_valid_json('{"name": 5}') # name is not a str
# bytes input is accepted too
assert Validator(list[int]).is_valid_json(b"[1, 2, 3]")
validate_json(data, *, fail_fast=False) mirrors validate: it raises
ValidationError on failure and aggregates every independent failure by default.
is_valid_json(data) mirrors is_valid: it returns a bool and never raises.
Both accept a JSON str or bytes.
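The difference between aggregating and failing fast can be pictured with a small stand-alone model. This sketch does not use valgebra: `check_record` and its `(key, message)` tuples are hypothetical names invented for illustration, and only the standard library's json module is assumed.

```python
import json

def check_record(text, schema, fail_fast=False):
    """Toy model: return (key, message) failures for a flat JSON record."""
    record = json.loads(text)
    failures = []
    for key, expected in schema.items():
        if key not in record:
            failures.append((key, "missing"))
        elif not isinstance(record[key], expected):
            failures.append((key, f"expected {expected.__name__}"))
        if fail_fast and failures:
            break  # stop at the first failure instead of collecting the rest
    return failures

schema = {"name": str, "age": int}
doc = '{"name": 5, "age": "old"}'
all_failures = check_record(doc, schema)                   # aggregated
first_only = check_record(doc, schema, fail_fast=True)     # first failure only
```

The two fields fail independently, so the default mode reports both while the fail-fast mode stops after `name`.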
When you need the data, not just the verdict, load validates and returns the
parsed value, so the document is parsed only once:
from valgebra import Validator
users = Validator({"name": str, "age?": int})
record = users.load('{"name": "Ada", "age": 36}')
assert record == {"name": "Ada", "age": 36}
load(data, *, fail_fast=False) raises ValidationError on malformed JSON or a
non-member, exactly as validate_json does, and otherwise returns the parsed
object.
Same decisions as the object path
The JSON path parses the document and runs the same validation walk that a
native Python object gets. Validating a JSON document is therefore equivalent
to validating json.loads of that document, with the same accept/reject
decision, the same error codes, and the same paths:
import json
from valgebra import Validator
v = Validator(list[dict[str, int]])
doc = '[{"a": 1}, {"b": "x"}]'
assert v.is_valid_json(doc) == v.is_valid(json.loads(doc))
This equivalence is locked in by tests over a corpus spanning the JSON value model.
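An equivalence check of this kind might look like the sketch below. It is not valgebra's test suite: `disagreements`, `CORPUS`, and the two stand-in predicates are hypothetical and written in plain Python, so the block runs with the standard library alone.

```python
import json

def disagreements(json_check, object_check, corpus):
    """Return the documents on which the two checks give different verdicts."""
    return [
        doc for doc in corpus
        if json_check(doc) != object_check(json.loads(doc))
    ]

# A small corpus touching each kind of JSON value.
CORPUS = [
    "null", "true", "42", "4.2", '"x"',
    "[1, 2]", '{"a": 1}', '[{"a": 1}, {"b": "x"}]', '[{"a": 1}, {"b": 2}]',
]

# Stand-in for a list[dict[str, int]] validator on Python objects.
def is_list_of_int_maps(value):
    return isinstance(value, list) and all(
        isinstance(item, dict)
        and all(isinstance(v, int) for v in item.values())
        for item in value
    )

# Stand-in for the JSON entry point of the same validator.
def is_list_of_int_maps_json(text):
    return is_list_of_int_maps(json.loads(text))

mismatches = disagreements(is_list_of_int_maps_json, is_list_of_int_maps, CORPUS)
```

With a real validator `v`, the two callables would be `v.is_valid_json` and `v.is_valid`, and an empty result means the paths agree on every document in the corpus.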
JSON-to-Python value mapping
Parsing uses jiter (the parser pydantic-core uses) with the standard JSON model,
so a document maps to Python values exactly as the standard library's json
module produces them:
| JSON | Python | Matches schema |
|---|---|---|
| null | None | None |
| true / false | bool | bool (and int, since bool is a subtype) |
| number, no fraction or exponent (42) | int | int, not float |
| number with fraction or exponent (4.2, 1e3) | float | float, not int |
| string | str | str |
| array | list | list[...], fixed lists, tuples |
| object | dict | records and mappings |
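Because the mapping is stated to match the standard library's json module, the Python column can be checked with json.loads alone. The sketch below uses only the standard library and says nothing about jiter itself; the `samples` and `observed` names are illustrative.

```python
import json

# JSON text on the left, the Python type the table predicts on the right.
samples = {
    "null": type(None),
    "true": bool,
    "42": int,
    "4.2": float,
    "1e3": float,
    '"text"': str,
    "[1, 2]": list,
    '{"a": 1}': dict,
}

observed = {text: type(json.loads(text)) for text in samples}

# The two facts behind the "Matches schema" column:
true_is_int = isinstance(json.loads("true"), int)      # bool is a subtype of int
whole_is_float = isinstance(json.loads("42"), float)   # an int is not a float
```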
Two consequences follow from valgebra's value-set semantics:
from valgebra import Validator
# JSON 42 is an int, and float is disjoint from int, so it is not a float
assert not Validator(float).is_valid_json("42")
assert Validator(float).is_valid_json("42.0")
# JSON true is a bool, and bool is a subtype of int
assert Validator(int).is_valid_json("true")
Infinity and NaN are not valid JSON and are rejected as malformed; whole
numbers too large for a machine integer still parse to a Python int.
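One caveat when comparing against the standard library: json.loads accepts NaN, Infinity, and -Infinity by default, which is an extension to the JSON grammar. A reference parser that rejects them, as described above, can be sketched with the `parse_constant` hook. `strict_loads` and `parses` are hypothetical helper names, not part of valgebra.

```python
import json

def _reject_constant(name):
    # json calls this hook for 'NaN', 'Infinity' and '-Infinity'.
    raise ValueError(f"{name} is not valid JSON")

def strict_loads(text):
    """json.loads variant that rejects the non-standard float constants."""
    return json.loads(text, parse_constant=_reject_constant)

def parses(text):
    """True if strict_loads accepts the text."""
    try:
        strict_loads(text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return True

# Whole numbers beyond 64 bits still come back as an exact Python int.
big = strict_loads("123456789012345678901234567890")
```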
Malformed JSON
Unparseable input never reaches the validation walk. validate_json reports it
through the same structured error model as any other failure — a single errors
item coded json_invalid carrying the parser's diagnostic — and is_valid_json
treats it as a non-member:
from valgebra import ValidationError, Validator
v = Validator(int)
assert not v.is_valid_json("{ not json")
try:
    v.validate_json("{ not json")
except ValidationError as err:
    assert err.code == "json_invalid"
Passing an argument that is neither str nor bytes raises TypeError rather than
reporting a validation failure.
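The three outcomes (wrong argument type, malformed JSON, parsed value) can be modelled with the standard library. This is a sketch of the behaviour described above, not valgebra's implementation: `classify` and its result dictionaries are hypothetical, and the message text comes from the json module rather than from jiter.

```python
import json

def classify(data):
    """Model the outcomes for a JSON argument before any schema check runs."""
    if not isinstance(data, (str, bytes)):
        # A usage error, not a validation failure.
        raise TypeError(f"expected str or bytes, got {type(data).__name__}")
    try:
        value = json.loads(data)
    except ValueError as exc:  # JSONDecodeError, or UnicodeDecodeError for bad bytes
        # Malformed input never reaches the validation walk.
        return {"code": "json_invalid", "message": str(exc)}
    return {"code": None, "value": value}

malformed = classify("{ not json")
parsed = classify(b"[1, 2, 3]")
```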
Performance
is_valid_json parses with jiter and validates the parsed JSON value in
place: no intermediate Python objects are built for the structure it walks, so
membership of a large array or a deep document is decided in Rust. A comparison
against a Python object — a literal, a refinement predicate, or an instance or
attribute check — is the documented step back into Python (detailed below). The
same walk runs over either input source — a Python object or a JSON value — so
the two paths stay equivalent. The table gives the per-call median on a passing
document, measured on the benchmark machine (AMD Ryzen 7 PRO 7840U, WSL2,
CPython 3.14.6, jiter 0.16, and the PGO release wheel, the same profile the
release ships):
| Shape | is_valid_json | json.loads + is_valid | speedup |
|---|---|---|---|
| Record, 50 int fields | 3.7 us | 6.5 us | ~1.8x |
| List of 200 small records | 27.3 us | 40.6 us | ~1.5x |
| list[int], 10,000 elements | 105 us | 501 us | ~4.8x |
Avoiding materialization helps most where the document is large or scalar-heavy: the 10,000-element array is nearly five times faster than parse-then-validate and well over twice as fast as a strict pydantic adapter on the same input.
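A per-call median like the ones in the table can be measured with timeit. The harness behind the published numbers is not described here, so the following is only one plausible way to reproduce the comparison: `median_per_call_us` is a hypothetical helper, and the baseline uses a plain-Python stand-in for the list[int] check.

```python
import json
import statistics
import timeit

def median_per_call_us(func, number=1000, repeat=7):
    """Median duration of one call to func, in microseconds."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return statistics.median(runs) / number * 1e6

# A passing document shaped like the last table row: 10,000 integers.
doc = json.dumps(list(range(10_000)))

def parse_then_check():
    # Baseline: build the Python list, then check every element.
    value = json.loads(doc)
    return all(isinstance(item, int) for item in value)

baseline_us = median_per_call_us(parse_then_check, number=20, repeat=5)
```

To time the JSON path of a real validator `v`, pass `lambda: v.is_valid_json(doc)` to the same helper and compare the two medians.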
Nodes that compare against a Python object — literals, refinements, instance and
object checks, and predicates — materialize just the value at that node, since
the comparison runs in Python. The validate_json explain path still
materializes the whole document (it reports Python-level value summaries in its
errors); only the is_valid_json fast path is fully in place.