Relationships and denormalization
Even non-relational databases have relationships!
With traditional relational databases, a modeler tries to minimize data redundancy through normalization. Data redundancy is considered a bad practice. Given the need to often work with data from multiple tables, they share a common value, known as foreign key, so joins can be performed. Some integrity of the data is enforced by the database and its structure, but the execution of joins when reading data can adversely affect performance.
Document databases provide scalability and flexibility, but the responsibility to avoid data anomalies resides in the application code. Some of the performance is achieved by avoiding costly joins, in a process known as denormalization. The goal is to keep data that is frequently used together in one document. This allows the document database to minimize the number of times it must read from disk.
So why is Hackolade providing support for foreign key relationships? With complex systems using document databases, a lot of data may be duplicated in different places. Even if referential integrity is not enforced by the database itself, but by the application code, there is a need to keep track of all the places where a common piece of data is stored. In effect, there are logical relationships in the data, even if the database does not enforce them.
In addition to implicit relationships due to embedding, Hackolade documents and helps visualize 2 types of relationships: foreign key and foreign master. The foreign key relationship is the unique identifier of the data. The foreign master relationship identifies the master for the duplicated field of the denormalized data.
Let's take a simplified example... Say we have a master collection of customers, each identified by a unique id. When creating a document for each sales order, the customer name and address are repeated so as to avoid having to perform joins each time an order is accessed.
Three relationships can be documented in the model:
- a foreign key relationship for the customerID in the customers collection
- a foreign master relationship for the customerName
- a foreign master relationship for the customer address
The interest of documenting these relationships is that a routine could be implemented such that, if the address of a customer changes, all pending sales orders for that customer should be updated. The update process would then look for the customerID in the salesOrders collection and update the customerAddress field, in this case only if the order has not shipped yet.
Hackolade loosely adopts some ER concepts to a NoSQL document environment to help keep complex schemas under control.
Originally, MongoDB did not support server-side foreign key relationships. With v3.2, MongoDB introduced a $lookup function in the aggregation pipeline to perform a left-outer join with another collection.
With the following syntax to perform the join:
- leftCollection is the collection that the aggregation is being performed on and is the left collection in the join
- from identifies the collection that it will be joined with – the right collection (rightCollection in this case)
- localField specifies the key from the original/left collection – leftVal
- foreignField specifies the key from the right collection – rightVal
- as indicates that the data from the right collection should be embedded within the resulting documents as an array called embeddedData