JSON Schema - Structuring a complex schema (2024)

Schema Identification

Like any other code, schemas are easier to maintain if they can be broken down into logical units that reference each other as necessary. In order to reference a schema, we need a way to identify a schema. Schema documents are identified by non-relative URIs.

Schema documents are not required to have an identifier, but you will need one if you want to reference one schema from another. In this documentation, we will refer to schemas with no identifier as "anonymous schemas".

In the following sections we will see how the "identifier" for a schema is determined.

URI terminology can sometimes be unintuitive. In this document, the following definitions are used.

URI [[1]](https://datatracker.ietf.org/doc/html/rfc3986#section-3) or non-relative URI: A full URI containing a scheme (https). It may contain a URI fragment (#foo). Sometimes this document will use "non-relative URI" to make it extra clear that relative URIs are not allowed.
relative reference [[2]](https://datatracker.ietf.org/doc/html/rfc3986#section-4.2): A partial URI that does not contain a scheme (https). It may contain a fragment (#foo).
URI-reference [[3]](https://datatracker.ietf.org/doc/html/rfc3986#section-4.1): A relative reference or non-relative URI. It may contain a URI fragment (#foo).
absolute URI [[4]](https://datatracker.ietf.org/doc/html/rfc3986#section-4.3): A full URI containing a scheme (https) but not a URI fragment (#foo).

Even though schemas are identified by URIs, those identifiers are not necessarily network-addressable. They are just identifiers. Generally, implementations don't make HTTP requests (https://) or read from the file system (file://) to fetch schemas. Instead, they provide a way to load schemas into an internal schema database. When a schema is referenced by its URI identifier, the schema is retrieved from the internal schema database.
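As a rough illustration of the "internal schema database" idea, the sketch below keeps schemas in a plain dictionary keyed by URI. The class name SchemaRegistry and its add/get methods are hypothetical and are not taken from any particular implementation; real libraries expose their own loading APIs.

```python
class SchemaRegistry:
    """Hypothetical in-memory schema database keyed by non-relative URI."""

    def __init__(self):
        self._schemas = {}

    def add(self, uri, schema):
        # Loading is an explicit step; nothing is fetched on demand.
        self._schemas[uri] = schema

    def get(self, uri):
        # No HTTP request and no file access: unknown identifiers just fail.
        if uri not in self._schemas:
            raise LookupError(f"schema not loaded: {uri}")
        return self._schemas[uri]


registry = SchemaRegistry()
registry.add("https://example.com/schemas/address", {"type": "object"})
address_schema = registry.get("https://example.com/schemas/address")
```

The point of the design is that the URI acts only as a lookup key, so a schema can be referenced even when nothing is actually served at that address.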

Base URI

Using non-relative URIs can be cumbersome, so any URIs used in JSON Schema can be URI-references that resolve against the schema's base URI, resulting in a non-relative URI. This section describes how a schema's base URI is determined.

Base URI determination and relative reference resolution are defined by RFC 3986. If you are familiar with how this works in HTML, this section should feel very familiar.

Retrieval URI

The URI used to fetch a schema is known as the "retrieval URI". It's often possible to pass an anonymous schema to an implementation, in which case that schema would have no retrieval URI.

Let's assume a schema is referenced using the URI https://example.com/schemas/address and the following schema is retrieved.

schema

{
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}

The base URI for this schema is the same as the retrieval URI, https://example.com/schemas/address.

$id

You can set the base URI by using the $id keyword at the root of the schema. The value of $id is a URI-reference without a fragment that resolves against the retrieval URI. The resulting URI is the base URI for the schema.

Draft-specific info

In Draft 4, $id is just id (without the dollar sign).

This is analogous to the <base> tag in HTML.
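The rule above can be sketched with Python's standard urllib.parse.urljoin, which performs RFC 3986 reference resolution. The helper name base_uri is hypothetical, and the second retrieval URI below is made up for illustration.

```python
from urllib.parse import urljoin


def base_uri(schema, retrieval_uri=""):
    """Return the base URI of a root schema (hypothetical helper).

    If $id is present it is resolved against the retrieval URI;
    otherwise the retrieval URI itself is the base URI.
    """
    return urljoin(retrieval_uri, schema.get("$id", ""))


# No $id: the base URI is the retrieval URI.
no_id = base_uri({"type": "object"}, "https://example.com/schemas/address")

# A relative $id resolves against the retrieval URI (assumed value).
relative_id = base_uri({"$id": "/schemas/address"}, "https://example.com/mirror/a.json")

# A non-relative $id works even for a schema with no retrieval URI.
absolute_id = base_uri({"$id": "https://example.com/schemas/address"})
```

In all three cases the resulting base URI is https://example.com/schemas/address.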

JSON Pointer

In addition to identifying a schema document, you can also identify subschemas. The most common way to do that is to use a JSON Pointer in the URI fragment that points to the subschema.

A JSON Pointer describes a slash-separated path to traverse the keys in the objects in the document. Therefore, /properties/street_address means:

1) find the value of the key properties
2) within that object, find the value of the key street_address

The URI https://example.com/schemas/address#/properties/street_address identifies the street_address subschema (the value found at properties, then street_address) in the following schema.

schema

{
  "$id": "https://example.com/schemas/address",
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
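The traversal described above can be sketched in a few lines of Python. The function resolve_pointer is a hypothetical helper written for this illustration; it handles object keys, array indexes, and the ~0 and ~1 escapes from the JSON Pointer specification, but it is not a full implementation.

```python
from urllib.parse import unquote, urldefrag


def resolve_pointer(document, pointer):
    """Follow a JSON Pointer through nested dicts and lists."""
    if pointer == "":
        return document  # the empty pointer identifies the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as the JSON Pointer rules require.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


address_schema = {
    "$id": "https://example.com/schemas/address",
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}

uri = "https://example.com/schemas/address#/properties/street_address"
document_uri, fragment = urldefrag(uri)  # split off the URI fragment
subschema = resolve_pointer(address_schema, unquote(fragment))
```

Here subschema is the object at properties, then street_address, which is the subschema the URI identifies.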

$anchor

A less common way to identify a subschema is to create a named anchor in the schema using the $anchor keyword and to use that name in the URI fragment. Anchors must start with a letter followed by any number of letters, digits, -, _, :, or . characters.

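Draft-specific info

In Draft 6-7, a named anchor is defined using an $id that contains only a URI fragment; the value of the fragment is the name of the anchor. In Draft 4, you declare an anchor the same way you do in Draft 6-7, except that $id is just id (without the dollar sign).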

If a named anchor is defined that doesn't follow these naming rules, then behavior is undefined. Your anchors might work in some implementations, but not others.

The URI https://example.com/schemas/address#street_address identifies the subschema that declares the street_address anchor in the following schema.

schema

{
  "$id": "https://example.com/schemas/address",
  "type": "object",
  "properties": {
    "street_address": {
      "$anchor": "street_address",
      "type": "string"
    },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}
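A minimal sketch of anchor lookup, assuming a naive walk over the schema document. The names ANCHOR_NAME and find_anchor are hypothetical, and the regular expression simply mirrors the naming rule as stated in this section. The walk visits every nested object and does not stop at embedded schemas, so treat it as an illustration only.

```python
import re

# Mirrors the rule in the text: a letter, then letters, digits, -, _, :, or .
ANCHOR_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9\-_:.]*$")


def find_anchor(node, name):
    """Depth-first search for the object whose $anchor equals name."""
    if isinstance(node, dict):
        if node.get("$anchor") == name:
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_anchor(child, name)
        if found is not None:
            return found
    return None


address_schema = {
    "$id": "https://example.com/schemas/address",
    "type": "object",
    "properties": {
        "street_address": {"$anchor": "street_address", "type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}

# The fragment of https://example.com/schemas/address#street_address
anchored = find_anchor(address_schema, "street_address")
```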

$ref

A schema can reference another schema using the $ref keyword. The value of $ref is a URI-reference that is resolved against the schema's Base URI. When evaluating a $ref, an implementation uses the resolved identifier to retrieve the referenced schema and applies that schema to the instance.

Draft-specific info

In Draft 4-7, $ref behaves a little differently. When an object contains a $ref property, the object is considered a reference, not a schema. Therefore, any other properties you put in that object will not be treated as JSON Schema keywords and will be ignored by the validator. $ref can only be used where a schema is expected.

For this example, let's say we want to define a customer record, where each customer may have both a shipping and a billing address. Addresses always have the same shape (a street address, city, and state), so we don't want to duplicate that part of the schema everywhere we want to store an address. Not only would that make the schema more verbose, it would also make updating it more difficult. If our imaginary company were to start doing international business and we wanted to add a country field to all the addresses, it would be better to do this in a single place rather than everywhere that addresses are used.

schema

{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"]
}

The URI-references in $ref resolve against the schema's Base URI (https://example.com/schemas/customer), which results in https://example.com/schemas/address. The implementation retrieves that schema and uses it to evaluate the "shipping_address" and "billing_address" properties.

When using $ref in an anonymous schema, relative references may not be resolvable. Let's assume this example is used as an anonymous schema.

schema

{
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "https://example.com/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"]
}

The $ref at /properties/shipping_address can resolve just fine without a non-relative base URI to resolve against, but the $ref at /properties/billing_address can't resolve to a non-relative URI and therefore can't be used to retrieve the address schema.
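Both situations can be reproduced with urllib.parse from the Python standard library. The helper resolve_ref is hypothetical; it resolves a $ref value against a base URI and rejects any result that still lacks a scheme, which is a simplified stand-in for "can't resolve to a non-relative URI".

```python
from urllib.parse import urljoin, urlsplit


def resolve_ref(ref, base_uri=""):
    """Resolve a $ref value against a base URI (hypothetical helper)."""
    resolved = urljoin(base_uri, ref)
    if not urlsplit(resolved).scheme:
        # Still a relative reference, so it cannot identify a schema.
        raise ValueError(f"cannot resolve {ref!r} without a non-relative base URI")
    return resolved


# Customer schema with an $id: the relative reference resolves.
with_base = resolve_ref("/schemas/address", "https://example.com/schemas/customer")

# Anonymous schema: a non-relative $ref needs no base URI at all...
shipping = resolve_ref("https://example.com/schemas/address")

# ...but a relative $ref has nothing to resolve against.
try:
    billing = resolve_ref("/schemas/address")
except ValueError:
    billing = None
```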

$defs

Sometimes we have small subschemas that are only intended for use in the current schema and it doesn't make sense to define them as separate schemas. Although we can identify any subschema using JSON Pointers or named anchors, the $defs keyword gives us a standardized place to keep subschemas intended for reuse in the current schema document.

Let's extend the previous customer schema example to use a common schema for the name properties. It doesn't make sense to define a new schema for this and it will only be used in this schema, so it's a good candidate for using $defs.

schema

{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "$ref": "#/$defs/name" },
    "last_name": { "$ref": "#/$defs/name" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "$defs": {
    "name": { "type": "string" }
  }
}

$defs isn't just good for avoiding duplication. It can also be useful for writing schemas that are easier to read and maintain. Complex parts of the schema can be defined in $defs with descriptive names and referenced where they are needed. This allows readers of the schema to more quickly and easily understand the schema at a high level before diving into the more complex parts.
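When a schema leans on $defs this way, a small consistency check can catch references to definitions that were renamed or removed. The sketch below is a hypothetical lint, not part of JSON Schema or any library: it collects every local reference of the form #/$defs/<name> and reports names that the root $defs does not define.

```python
PREFIX = "#/$defs/"


def missing_local_defs(schema):
    """Return the $defs names that are referenced but never defined."""
    defined = set(schema.get("$defs", {}))
    referenced = set()

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(PREFIX):
                # Keep only the definition name, not any deeper path.
                referenced.add(ref[len(PREFIX):].split("/")[0])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return referenced - defined


customer_schema = {
    "$id": "https://example.com/schemas/customer",
    "type": "object",
    "properties": {
        "first_name": {"$ref": "#/$defs/name"},
        "last_name": {"$ref": "#/$defs/name"},
    },
    "$defs": {"name": {"type": "string"}},
}

missing = missing_local_defs(customer_schema)
```

For the customer schema, missing is empty because the only referenced definition, name, exists.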

It's possible to reference an external subschema, but generally you want to limit a $ref to referencing either an external schema or an internal subschema defined in $defs.

Recursion

The $ref keyword may be used to create recursive schemas that refer to themselves. For example, you might have a person schema that has an array of children, each of which is also a person instance.

schema

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "children": {
      "type": "array",
      "items": { "$ref": "#" }
    }
  }
}

A snippet of the British royal family tree

data

{
  "name": "Elizabeth",
  "children": [
    {
      "name": "Charles",
      "children": [
        {
          "name": "William",
          "children": [
            { "name": "George" },
            { "name": "Charlotte" }
          ]
        },
        { "name": "Harry" }
      ]
    }
  ]
}

compliant to schema
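To see why the recursion terminates, here is a toy validator that supports only the keywords used in the person schema (type, properties, items, and a $ref of "#"). The function validate is hypothetical and far from a complete JSON Schema implementation; it only illustrates that "#" sends evaluation back to the root schema, once per nested person.

```python
TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, root=None):
    """Toy validator for the person schema; returns True or False."""
    root = schema if root is None else root
    if "$ref" in schema:
        if schema["$ref"] != "#":
            raise NotImplementedError("this sketch only follows '#'")
        return validate(instance, root, root)  # back to the root schema
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, TYPES[expected]):
        return False
    if isinstance(instance, dict):
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not validate(instance[key], subschema, root):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(validate(item, schema["items"], root) for item in instance)
    return True


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

family = {
    "name": "Elizabeth",
    "children": [
        {
            "name": "Charles",
            "children": [
                {"name": "William", "children": [{"name": "George"}, {"name": "Charlotte"}]},
                {"name": "Harry"},
            ],
        }
    ],
}

family_is_valid = validate(family, person_schema)
```

Each recursive call consumes one level of the instance, so evaluation stops when it reaches people without children.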

Above, we created a schema that refers to itself, effectively creating a "loop" in the validator, which is both allowed and useful. Note, however, that a $ref referring to another $ref could cause an infinite loop in the resolver, and is explicitly disallowed.

schema

{
  "$defs": {
    "alice": { "$ref": "#/$defs/bob" },
    "bob": { "$ref": "#/$defs/alice" }
  }
}
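A resolver can guard against this kind of loop by remembering which definitions it has already followed. The sketch below is one hypothetical way to detect the cycle; it only understands references of the form #/$defs/<name> and is not how any specific implementation works.

```python
PREFIX = "#/$defs/"


def has_ref_cycle(defs, start):
    """Follow a chain of $ref-only definitions and report whether it loops."""
    seen = set()
    name = start
    while True:
        if name in seen:
            return True  # we came back to a definition already visited
        seen.add(name)
        ref = defs.get(name, {}).get("$ref")
        if not (isinstance(ref, str) and ref.startswith(PREFIX)):
            return False  # the chain ends at a real schema
        name = ref[len(PREFIX):]


looping_defs = {
    "alice": {"$ref": "#/$defs/bob"},
    "bob": {"$ref": "#/$defs/alice"},
}

alice_loops = has_ref_cycle(looping_defs, "alice")
```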

Extending Recursive Schemas

New in draft 2019-09

Documentation Coming Soon

Bundling

Working with multiple schema documents is convenient for development, but it's often more convenient for distribution to bundle all of your schemas into a single schema document. This can be done using the $id keyword in a subschema. When $id is used in a subschema, it indicates an embedded schema. The identifier for the embedded schema is the value of $id resolved against the Base URI of the schema it appears in. A schema document that includes embedded schemas is called a Compound Schema Document. Each schema with an $id in a Compound Schema Document is called a Schema Resource.

Draft-specific info

In Draft 4, $id is just id (without the dollar sign).

This is analogous to the <iframe> tag in HTML.

It is unusual to use embedded schemas when developing schemas. It's generally best not to write embedded schemas by hand and instead to use schema bundling tools to construct bundled schemas when they are needed.

This example shows the customer schema example and the address schema example bundled into a Compound Schema Document.

schema

{
  "$id": "https://example.com/schemas/customer",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "$defs": {
    "address": {
      "$id": "https://example.com/schemas/address",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "$ref": "#/definitions/state" }
      },
      "required": ["street_address", "city", "state"],
      "definitions": {
        "state": { "enum": ["CA", "NY", "... etc ..."] }
      }
    }
  }
}

All references in a Compound Schema Document need to be the same whether the Schema Resources are bundled or not. Notice that the $ref keywords from the customer schema have not changed. The only difference is that the address schema is now defined at /$defs/address instead of a separate schema document. You couldn't use #/$defs/address to reference the address schema because if you unbundled the schema, that reference would no longer point to the address schema.

Draft-specific info

In Draft 4-7, both of these URIs are valid because a subschema $id only represented a base URI change, not an embedded schema. However, even though it's allowed, it's still highly recommended that JSON Pointers don't cross a schema with a base URI change.

You should also see that "$ref": "#/definitions/state" resolves to the definitions keyword in the address schema rather than the one at the top-level schema, as it would if the embedded schema weren't used.
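One way to picture how an implementation might treat a Compound Schema Document is as an index from each Schema Resource's URI to its schema object. The sketch below is hypothetical and simplified: index_resources walks every nested object, treats any string $id as the start of a new resource, and resolves it against the enclosing base URI with urllib.parse.urljoin. The bundled document is a shortened version of the example above.

```python
from urllib.parse import urljoin


def index_resources(node, base_uri="", index=None):
    """Map each Schema Resource's URI to its schema object."""
    index = {} if index is None else index
    if isinstance(node, dict):
        if isinstance(node.get("$id"), str):
            # An $id starts a new resource and changes the base URI below it.
            base_uri = urljoin(base_uri, node["$id"])
            index[base_uri] = node
        for value in node.values():
            index_resources(value, base_uri, index)
    elif isinstance(node, list):
        for item in node:
            index_resources(item, base_uri, index)
    return index


bundle = {
    "$id": "https://example.com/schemas/customer",
    "type": "object",
    "properties": {"shipping_address": {"$ref": "/schemas/address"}},
    "$defs": {
        "address": {
            "$id": "https://example.com/schemas/address",
            "type": "object",
            "properties": {"state": {"$ref": "#/definitions/state"}},
            "definitions": {"state": {"enum": ["CA", "NY"]}},
        }
    },
}

resources = index_resources(bundle)

# The customer's $ref resolves to the embedded resource's URI, not to a pointer.
target_uri = urljoin("https://example.com/schemas/customer", "/schemas/address")
address = resources[target_uri]

# "#/definitions/state" is relative to the address resource, not the bundle root.
state_schema = address["definitions"]["state"]
```

Because lookups go through the resource URI, the same references keep working whether the address schema is embedded or loaded as a separate document.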

Each Schema Resource is evaluated independently and may use different JSON Schema dialects. The example above has the address Schema Resource using Draft 7 while the customer Schema Resource uses Draft 2020-12. If no $schema is declared in an embedded schema, it defaults to using the dialect of the parent schema.

Draft-specific info

In Draft 4-7, a subschema $id is just a base URI change and not considered an independent Schema Resource. Because $schema is only allowed at the root of a Schema Resource, all schemas bundled using subschema $id must use the same dialect.

Draft-specific info

In Draft 2020-12, support for changing dialects in an embedded schema (using $schema with a different value than the parent schema) was added.
