Overview of JSON and JSON Schema
You may also view this tutorial on YouTube. Summary slides can be found here.
JSON
JSON stands for JavaScript Object Notation. It is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.
According to json.org, JSON is built on two structures:
- A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
An object is an unordered set of name/value pairs. An object begins with a { left brace and ends with a } right brace. Each name is followed by a : colon, and the name/value pairs are separated by a , comma.
An array is an ordered collection of values. An array begins with a [ left bracket and ends with a ] right bracket. Values are separated by a , comma.
A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.
Keys must be strings (text) and values must be valid JSON data types: string, number, another JSON object, array, boolean or null.
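The value types listed above can be checked with a short Python sketch that uses only the standard-library json module. The sample document and its field names are invented for illustration; the point is to show how each JSON type is parsed and that a key without double quotes is rejected.

```python
import json

# Hypothetical document that exercises every JSON value type.
text = """
{
  "name": "Ada",
  "age": 36,
  "height_m": 1.7,
  "active": true,
  "nickname": null,
  "tags": ["admin", "dev"],
  "address": {"city": "London", "zip": "N1"}
}
"""

doc = json.loads(text)

# Record which Python type each JSON value was parsed into.
parsed_types = {key: type(value).__name__ for key, value in doc.items()}
print(parsed_types)

# Keys must be double-quoted strings; a bare key is rejected by the parser.
try:
    json.loads('{name: "Ada"}')
    bare_key_accepted = True
except json.JSONDecodeError:
    bare_key_accepted = False
print(bare_key_accepted)
```

Note that JSON has a single number type; Python's parser splits it into int and float, which is a choice of this library rather than part of JSON itself.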
There are countless ways to organize JSON objects, depending on your needs. A document can be as simple as a list of attributes (keys) and values, or it can become very complex, with nested JSON objects, arrays of JSON objects, arrays inside attributes, etc.
JSON in databases
One of the big advantages of document databases such as MongoDB, Couchbase, DocumentDB, Elasticsearch, etc., besides their distributed horizontal scalability, is that you can leverage the flexibility and easy evolution of JSON documents.
- You can embed information using objects, arrays, or a combination of the two to capture relationships between data in a single document structure, thereby denormalizing the data and avoiding expensive joins while maintaining transactional integrity.
- Collections of documents do not require all documents to have the same schema: some may have more fields than others, and the data type of a field may differ across documents of the same collection. Be careful on this point, however: while the flexibility is valuable, it must be managed carefully so that it does not lead to chaos.
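To make the second point concrete, here is a minimal Python sketch of a collection whose documents do not share one schema. The customers collection, its fields, and the field_types helper are all hypothetical; plain lists and dictionaries stand in for a document database. The helper reports which fields are optional and which have inconsistent types, which is one simple way to keep the flexibility under control.

```python
from collections import defaultdict

# Hypothetical "customers" collection: documents need not share a schema.
customers = [
    {"_id": 1, "name": "Ada", "phone": "555-0100"},
    {"_id": 2, "name": "Grace", "phone": 5550101, "email": "grace@example.com"},
    {"_id": 3, "name": "Linus"},
]


def field_types(documents):
    """Map each field name to the set of Python type names seen for it."""
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return dict(seen)


types_by_field = field_types(customers)

# Fields whose data type differs across documents of the collection.
inconsistent = sorted(f for f, t in types_by_field.items() if len(t) > 1)

# Fields that are missing from at least one document.
optional = sorted(
    f for f in types_by_field if any(f not in doc for doc in customers)
)

print(inconsistent)
print(optional)
```

In this sample, phone is stored as a string in one document and as a number in another, which is exactly the kind of drift that becomes hard to query if it goes unnoticed.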
The benefit of single-document atomicity with embedded objects is best illustrated with a shopping cart or an invoice. In a relational database, storing this information requires multiple tables linked by foreign keys, so the data is split across those tables when it is stored.
Then, when the time comes to display the information on screen or print the invoice, it has to be gathered again using lengthy, CPU-intensive joins. Besides the performance impact of these joins, developers run into the difficulties known as the object-relational impedance mismatch.
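The invoice case can be sketched in Python as follows. The table names, field names, and the assemble_invoice helper are assumptions made for illustration, and in-memory lists stand in for real tables, so this shows the shape of the problem rather than real database behavior. The relational layout has to be reassembled by matching keys, whereas the document layout already has the form the application needs.

```python
# Relational layout (hypothetical tables): the invoice is split across rows
# that are linked by foreign keys.
invoices = [{"invoice_id": 1001, "customer_id": 7}]
customers = [{"customer_id": 7, "name": "Ada"}]
invoice_lines = [
    {"invoice_id": 1001, "sku": "A-1", "qty": 2, "unit_price": 10.0},
    {"invoice_id": 1001, "sku": "B-2", "qty": 1, "unit_price": 5.0},
]


def assemble_invoice(invoice_id):
    """Emulate the joins needed to rebuild one invoice for display."""
    invoice = next(i for i in invoices if i["invoice_id"] == invoice_id)
    customer = next(
        c for c in customers if c["customer_id"] == invoice["customer_id"]
    )
    lines = [
        {k: v for k, v in line.items() if k != "invoice_id"}
        for line in invoice_lines
        if line["invoice_id"] == invoice_id
    ]
    return {
        "invoice_id": invoice_id,
        "customer": {"name": customer["name"]},
        "lines": lines,
    }


# Document layout: the same invoice stored as one self-contained document.
invoice_document = {
    "invoice_id": 1001,
    "customer": {"name": "Ada"},
    "lines": [
        {"sku": "A-1", "qty": 2, "unit_price": 10.0},
        {"sku": "B-2", "qty": 1, "unit_price": 5.0},
    ],
}

# The total is computed directly from the embedded lines, with no lookups.
total = sum(line["qty"] * line["unit_price"] for line in invoice_document["lines"])
print(assemble_invoice(1001) == invoice_document)
print(total)
```

Because the customer and the line items are embedded, reading or writing the invoice touches a single document, which is what makes single-document atomicity sufficient for this use case.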