MongoDB Field-Level Encryption
Starting with v4.2, MongoDB provides a field level encryption ("FLE") framework, both server-side and client-side. Applications can encrypt fields in documents prior to transmitting data over the wire to the server. Only applications with access to the correct encryption keys can decrypt and read the protected data. Deleting an encryption key renders all data encrypted using that key as permanently unreadable.
Starting with v5.4.9 of Hackolade Studio, we support MongoDB FLE functionality, which is also known as In-Use Encryption.
Note: using client-side FLE alongside in-flight and at-rest encryption provides complementary, end-to-end protection. Together, they give applications a defense-in-depth security posture that addresses different threat models:
- In-flight encryption protects all data traversing the network, but does not encrypt data in-memory or at-rest.
- At-rest encryption protects all stored data, but does not encrypt data in-memory or in-flight.
- With client-side encryption, the most sensitive data never leaves applications in plain text. Fields that are encrypted client-side remain encrypted over the network, as they are being processed in database server memory, and at-rest in storage, backups, and logs.
As explained in this MongoDB page, consider the following document:
{
"name" : "John Doe",
"address" : {
"street" : "1234 Main Street",
"city" : "MongoDBVille",
"zip" : 99999
},
"phone" : "949-555-1212",
"ssn" : "123-45-6789"
}
With field-level encryption, sensitive information like the ssn and phone can be encrypted. Encrypted fields are stored as binary data with subtype 6:
{
"name" : "John Doe",
"address" : {
"street" : "1234 Main Street",
"city" : "MongoDBVille",
"zip" : 99999
},
"phone" : BinData(6,"U2FsdGVkX1+CGIDGUnGgtS46+c7R5u17SwPDEmzyCbA="),
"ssn" : BinData(6,"AaloEw285E3AnfjP+r8ph2YCvMI1+rWzpZK97tV6iz0jx")
}
Note: While MongoDB calls this feature "Field-Level Encryption", the encryption can actually be applied at the collection level as well.
Watch an animation of the field-level encryption process, along with the following legend of the steps:
Enforcement strategies
Either server-side or client-side encryption can be used, or both. It is a good idea to have both server-side and client-side FLE because they complement each other. In case of a legacy client, or a misconfigured client, server-side FLE eliminates the possibility that any plain text is used to insert or update a document, when it was meant to be encrypted. Conversely, a single person with access to the database does not have the power to disconnect field-level encryption if it is also implemented client-side.
The server-side configuration leverages the familiar $jsonSchema validator to declare the encryption rules, for example for the phone and ssn fields below:
db.runCommand({
"collMod": "employee",
"validator": {
"$jsonSchema": {
"bsonType": "object",
"title": "employee",
"properties": {
"_id": { "bsonType": "objectId" },
"name": { "bsonType": "string" },
"address": {
"bsonType": "object",
"properties": {
"street": { "bsonType": "string" },
"city": { "bsonType": "string" },
"zip": { "bsonType": "int"
}
},
"additionalProperties": false
},
"phone": {
"encrypt" : {
"keyId" : [UUID("e114f7ad-ad7a-4a68-81a7-ebcb9ea0953a")],
"algorithm" : "AEAD_AES_256_CBC_HMAC_SHA_512-Random",
"bsonType": "string"
}
},
"ssn": {
"encrypt" : {
"keyId" : [UUID("33408ee9-e499-43f9-89fe-5f8533870617")],
"algorithm" : "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
"bsonType": "string"
}
}
},
"additionalProperties": false
}
},
"validationLevel": "off",
"validationAction": "warn"
});
MongoDB also supports two methods of client-side field level encryption:
- Automatic encryption of fields: applications must create a database connection object using the Mongo() constructor with the automatic encryption configuration settings. These settings must include automatic encryption rules expressed in a strict subset of the $jsonSchema validator syntax (the same syntax as the server-side configuration described above). Applications do not have to modify the code associated with read/write operations. See Automatic Encryption Rules for complete documentation on automatic encryption rules.
- Explicit (manual) encryption of fields: applications are responsible for selecting the appropriate data encryption key for encryption/decryption on a per-operation basis. The connection is also established using the Mongo() constructor, but without declaring a $jsonSchema validator. For more information, see this page.
In theory, explicit encryption should not affect Hackolade Studio from a data modeling perspective. But when performing reverse-engineering, we will encounter encrypted fields that are not declared in the $jsonSchema validator.
Warning: The automatic feature of field level encryption is only available in MongoDB Enterprise 4.2 or later, and MongoDB Atlas 4.2 or later clusters. Most of our major customers are on Enterprise or Atlas.
| | Community | Enterprise | Atlas |
|---|---|---|---|
| Automatic encryption | -- | X | X |
| Explicit (manual) encryption | X | X | X |
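The automatic encryption settings described above can be sketched as a plain options object. This is a minimal sketch, not a definitive configuration: buildAutoEncryptionOptions is a hypothetical helper, and the key vault namespace, the hr.employee namespace, and the key placeholders are assumptions made for illustration. Only the option names keyVaultNamespace, kmsProviders, and schemaMap come from MongoDB's client-side encryption options.

```javascript
// Hypothetical helper (not part of MongoDB or Hackolade): assembles the options
// object given as second argument to the mongosh Mongo() constructor.
function buildAutoEncryptionOptions({ keyVaultNamespace, kmsProviders, namespace, schema }) {
  return {
    keyVaultNamespace,                 // "<database>.<collection>" that stores the data encryption keys
    kmsProviders,                      // where the master key comes from (local key or a KMS)
    schemaMap: { [namespace]: schema } // automatic encryption rules, keyed by "<database>.<collection>"
  };
}

const options = buildAutoEncryptionOptions({
  keyVaultNamespace: "encryption.__dataKeys", // assumed name
  // Placeholder only: in mongosh, a local master key is 96 bytes of binary data.
  kmsProviders: { local: { key: "<local master key placeholder>" } },
  namespace: "hr.employee",                   // assumed database and collection
  schema: {
    bsonType: "object",
    properties: {
      ssn: {
        encrypt: {
          keyId: ["<data key UUID placeholder>"], // a UUID(...) value in a real script
          algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
          bsonType: "string"
        }
      }
    }
  }
});

// In mongosh, the connection would then be created along these lines:
//   const encryptedClient = Mongo("mongodb://localhost:27017", options);
console.log(Object.keys(options.schemaMap));
```

With explicit (manual) encryption, the same kind of object would be passed without the schemaMap entry, since no encryption rules are declared.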
Encryption algorithms
MongoDB client-side field level encryption uses the encrypt-then-MAC approach combined with either a deterministic or random initialization vector to encrypt field values. MongoDB only supports the AEAD AES-256-CBC encryption algorithm with HMAC-SHA-512 MAC.
- deterministic: gives the same value every time the data is encrypted. While deterministic encryption provides greater support for read operations, encrypted data with low cardinality is susceptible to frequency analysis recovery.
  - supports direct match (equality) against encrypted fields
  - uses indexes to provide efficient data access
  - does not support the more complex queries (ranges, aggregations) directly against encrypted fields
- randomized: gives a different value every time the data is encrypted. While randomized encryption provides the strongest guarantees of data confidentiality, it also prevents support for any read operations which must operate on the encrypted field to evaluate the query.
  - strongest level of protection
  - prevents direct match (equality) queries against encrypted fields
Note: Randomized encryption also supports encrypting entire objects or arrays. While this protects all fields nested under those fields, it also prevents querying against those nested fields.
More info: For sensitive fields that are used in read operations, applications must use Deterministic Encryption for improved read support on encrypted fields.
For sensitive fields that are not used in read operations, applications may use Randomized Encryption for improved protection.
More advanced queries should be run on unencrypted fields.
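The difference between the two modes can be illustrated with a toy example. This is a sketch under stated assumptions and not MongoDB's actual implementation: it uses Node.js crypto with AES-256-CBC, derives the deterministic initialization vector from an HMAC of the plaintext (an assumed derivation), and omits the MAC step entirely. The function names are hypothetical. It only shows why a plaintext-derived IV allows equality matching while a random IV does not.

```javascript
const crypto = require("node:crypto");

// Toy illustration only: not MongoDB's key derivation or ciphertext format.
function encryptWithIv(key, iv, plaintext) {
  const cipher = crypto.createCipheriv("aes-256-cbc", key, iv);
  // Prepend the IV so that the value could be decrypted later.
  return Buffer.concat([iv, cipher.update(plaintext, "utf8"), cipher.final()]);
}

// Deterministic: the IV is derived from the plaintext, so equal inputs give equal outputs.
function encryptDeterministic(key, ivKey, plaintext) {
  const iv = crypto.createHmac("sha512", ivKey).update(plaintext, "utf8").digest().subarray(0, 16);
  return encryptWithIv(key, iv, plaintext);
}

// Randomized: a fresh random IV for every call, so equal inputs give different outputs.
function encryptRandom(key, plaintext) {
  return encryptWithIv(key, crypto.randomBytes(16), plaintext);
}

const key = crypto.randomBytes(32);
const ivKey = crypto.randomBytes(32);

const d1 = encryptDeterministic(key, ivKey, "123-45-6789");
const d2 = encryptDeterministic(key, ivKey, "123-45-6789");
const r1 = encryptRandom(key, "123-45-6789");
const r2 = encryptRandom(key, "123-45-6789");

console.log(d1.equals(d2)); // true: an equality query can compare ciphertexts
console.log(r1.equals(r2)); // false, except with negligible probability
```

The same property explains the frequency analysis risk: with few distinct plaintexts, identical ciphertexts reveal which documents share a value.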
$jsonSchema validator syntax
encrypt Schema Keyword
The encrypt object can contain only the following fields:
- keyId: Array of single UUID. The UUID of the data encryption key to use for encrypting field values. The UUID is a BSON binary data element of subtype 4.
- algorithm: Indicates which encryption algorithm to use when encrypting values of <fieldName>. Supports the algorithms:
  - AEAD_AES_256_CBC_HMAC_SHA_512-Random
  - AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic
- bsonType: The BSON type of the field being encrypted, subject to the restrictions listed in the warning below.
Warning:
- if the algorithm is deterministic, then:
  - bsonType must specify a single value; multiple data types are NOT allowed
  - bsonType does not support any of the following BSON types:
    - double
    - decimal128
    - bool
    - object
    - array
    - javascriptWithScope
    - minKey
    - maxKey
    - null
    - undefined
- if the algorithm is random, then:
  - bsonType may specify an array of supported BSON types; multiple data types are allowed
  - for fields with bsonType of array or object, the client encrypts the entire array or object and not their individual elements.
  - bsonType does not support any of the following BSON types:
    - minKey
    - maxKey
    - null
    - undefined
- encrypt cannot have any sibling fields in the <fieldName> object. encrypt must be the only child of the <fieldName> object. All other constraints should therefore ideally be disabled, or at least filtered from the validator.
- encrypt cannot be specified within any subschema of the items or additionalItems keywords. Specifically, automatic client-side field level encryption does not support encrypting individual elements of an array.
"phone": {
"encrypt" : {
"keyId" : [UUID("e114f7ad-ad7a-4a68-81a7-ebcb9ea0953a")],
"algorithm" : "AEAD_AES_256_CBC_HMAC_SHA_512-Random",
"bsonType": "string"
}
},
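The rules listed in the warning above lend themselves to an automated check. The following is a minimal sketch, assuming a hypothetical checkEncryptRules function that receives the subschema of one field and returns a list of violations. It covers only the sibling, single-type, and unsupported-type rules stated above, not the items and additionalItems restriction.

```javascript
const DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic";
const RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random";

// BSON types rejected for both algorithms, then the extra ones rejected for deterministic.
const NEVER_SUPPORTED = ["minKey", "maxKey", "null", "undefined"];
const DETERMINISTIC_UNSUPPORTED = [
  "double", "decimal128", "bool", "object", "array", "javascriptWithScope",
  ...NEVER_SUPPORTED
];

// Hypothetical checker: returns the rule violations found in one field subschema.
function checkEncryptRules(fieldSchema) {
  if (!fieldSchema.encrypt) return []; // field is not flagged for encryption
  const problems = [];

  // encrypt must be the only child of the field object.
  const siblings = Object.keys(fieldSchema).filter((k) => k !== "encrypt");
  if (siblings.length > 0) {
    problems.push(`encrypt has sibling keywords: ${siblings.join(", ")}`);
  }

  const { algorithm, bsonType } = fieldSchema.encrypt;
  const types = bsonType === undefined ? [] : [].concat(bsonType);

  // Deterministic encryption accepts one data type only.
  if (algorithm === DETERMINISTIC && types.length > 1) {
    problems.push("deterministic encryption requires a single bsonType");
  }

  const banned = algorithm === DETERMINISTIC ? DETERMINISTIC_UNSUPPORTED : NEVER_SUPPORTED;
  for (const t of types) {
    if (banned.includes(t)) problems.push(`bsonType "${t}" is not supported`);
  }
  return problems;
}

console.log(checkEncryptRules({ encrypt: { algorithm: DETERMINISTIC, bsonType: ["string", "int"] } }));
console.log(checkEncryptRules({ encrypt: { algorithm: RANDOM, bsonType: "object" } }));
```

A check of this kind could run before the script is generated, so that an invalid combination is reported in the model rather than rejected by the database.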
encryptMetadata Schema Keyword
Note: only at collection level (root object), and optionally for objects inside a collection
encryptMetadata defines encryption options which an encrypt object nested in the sibling properties may inherit. If an encrypt in a nested field is missing an option required to support encryption, mongocryptd searches the entire tree of parent objects to locate an encryptMetadata object that specifies the missing option.
encryptMetadata must be specified in subschemas with bsonType: "object". encryptMetadata cannot be specified in any subschema of the items or additionalItems keywords. Specifically, automatic client-side field level encryption does not support encrypting individual elements of an array.
The encryptMetadata object can contain only the following fields:
- keyId: Array of single UUID. The UUID of the data encryption key to use for encrypting field values. The UUID is a BSON binary data element of subtype 4.
- algorithm: Indicates which encryption algorithm to use when encrypting values of <fieldName>. Supports the algorithms:
  - AEAD_AES_256_CBC_HMAC_SHA_512-Random
  - AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic
Including any other field in the encryptMetadata object results in errors when issuing automatically encrypted read or write operations.
db.runCommand({
"collMod": "employee",
"validator": {
"$jsonSchema": {
"title": "employee",
"bsonType": "object",
"encryptMetadata" : {
"keyId" : [UUID("6c512f5e-09bc-434f-b6db-c42eee30c6b1")],
"algorithm" : "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
},
"properties": {
"_id": { "bsonType": "objectId" },
"name": { "bsonType": "string" },
"address": {
"bsonType": "object",
"properties": {
"street": { "bsonType": "string" },
"city": { "bsonType": "string" },
"zip": { "bsonType": "int"
}
},
"additionalProperties": false
},
"phone": { "encrypt": { "bsonType": "string" } },
"ssn": { "encrypt": { "bsonType": "string" } }
},
"additionalProperties": false
}
},
"validationLevel": "off",
"validationAction": "warn"
});
Note: the structure differs slightly from that of encrypt: "bsonType": "object" sits outside the encryptMetadata structure.
See detailed examples here on how to encrypt multiple fields with individual encrypt, or with encryptMetadata inheritance.
Note: in an object that has encryptMetadata for inheritance, it is possible for a nested field to have its own encryption that overrides the parent object's encryptMetadata. This allows a stricter random algorithm in a nested field than in the parent, or a different keyId.
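The inheritance and override behavior described above can be sketched as a small resolver. This is an illustration under stated assumptions, not mongocryptd's implementation: resolveEncryption is a hypothetical function, the schema is a plain object, and key identifiers are string placeholders rather than UUID values.

```javascript
const DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic";
const RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random";

// Hypothetical resolver: walks from the root schema down to the field at `path`,
// collecting encryptMetadata on the way (nearest ancestor wins), then lets the
// field's own encrypt options override whatever was inherited.
function resolveEncryption(rootSchema, path) {
  let inherited = {};
  let node = rootSchema;
  for (const name of path) {
    inherited = { ...inherited, ...(node.encryptMetadata || {}) };
    node = (node.properties || {})[name];
    if (!node) return null; // path is not declared in the schema
  }
  if (!node.encrypt) return null; // field is not flagged for encryption
  return { ...inherited, ...node.encrypt };
}

const schema = {
  bsonType: "object",
  encryptMetadata: { keyId: ["key-A"], algorithm: DETERMINISTIC }, // placeholder key id
  properties: {
    name: { bsonType: "string" },
    ssn: { encrypt: { bsonType: "string" } },                        // inherits both options
    medical: { encrypt: { bsonType: "object", algorithm: RANDOM } }  // overrides the algorithm
  }
};

console.log(resolveEncryption(schema, ["ssn"]));
console.log(resolveEncryption(schema, ["medical"]));
console.log(resolveEncryption(schema, ["name"])); // null
```

Here ssn resolves to the deterministic algorithm with key-A, while medical keeps key-A but switches to the random algorithm.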
Forward-Engineering
JSON Data sample
The encrypted field appears encrypted in the JSON sample, specifically as random binary data with subtype 6, e.g. BinData(6,"U2FsdGVkX1+CGIDGUnGgtS46+c7R5u17SwPDEmzyCbA="). In the case of objects with inheritance, the object must appear, but each nested field should be encrypted, with a sample of random binary data with subtype 6.
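The sample generation described above could be sketched as follows. This is a hypothetical illustration, not Hackolade Studio's generator: the function names are invented, the 32-byte length is arbitrary, and the value is written in the Extended JSON form of binary data ($binary with subType "06") instead of the BinData(6, ...) shell notation.

```javascript
const crypto = require("node:crypto");

// Hypothetical generator: random bytes presented as binary data with subtype 6.
function sampleEncryptedValue(byteLength = 32) {
  return {
    $binary: {
      base64: crypto.randomBytes(byteLength).toString("base64"),
      subType: "06"
    }
  };
}

// Builds a sample for a schema node: encrypted fields get random binary data,
// objects are kept and their nested fields are sampled one by one.
function sampleFor(schemaNode) {
  if (schemaNode.encrypt) return sampleEncryptedValue();
  if (schemaNode.properties) {
    const sample = {};
    for (const [name, child] of Object.entries(schemaNode.properties)) {
      sample[name] = sampleFor(child);
    }
    return sample;
  }
  return null; // samples for unencrypted fields are out of scope in this sketch
}

const sample = sampleFor({
  bsonType: "object",
  encryptMetadata: { algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Random" },
  properties: {
    phone: { encrypt: { bsonType: "string" } },
    ssn: { encrypt: { bsonType: "string" } }
  }
});
console.log(JSON.stringify(sample, null, 2));
```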
MongoDB script (both individual collection tab and model tab)
The $jsonSchema validator is enriched:
- if encryption is enabled AND explicit (manual) is disabled
- taking into account the difference in structure:
  - if encrypt, then create structure with bsonType, keyId, and algorithm
  - if encryptMetadata, then create structure with keyId and algorithm
Note: the keyId and/or algorithm properties can be empty for a given field that has been flagged for encryption, but only if there is an encryptMetadata higher in the object hierarchy. We should probably add this rule to the MongoDB script linter, rather than creating PP rules?
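The two structures above can be sketched as a pair of builders. This is a minimal sketch with hypothetical function names (buildEncrypt, buildEncryptMetadata, dropEmpty), not the actual forward-engineering code. Key identifiers are plain strings here, whereas a real script would emit UUID(...) values. Empty properties are simply left out, on the assumption that an encryptMetadata higher in the hierarchy supplies them.

```javascript
// Removes properties that were left empty in the model.
function dropEmpty(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, v]) => v !== undefined && v !== null && v !== "")
  );
}

// Field-level structure: bsonType, keyId, and algorithm inside "encrypt".
function buildEncrypt({ bsonType, keyId, algorithm }) {
  return {
    encrypt: dropEmpty({ keyId: keyId ? [keyId] : undefined, algorithm, bsonType })
  };
}

// Object-level structure: keyId and algorithm only; bsonType stays outside.
function buildEncryptMetadata({ keyId, algorithm }) {
  return {
    encryptMetadata: dropEmpty({ keyId: keyId ? [keyId] : undefined, algorithm })
  };
}

console.log(JSON.stringify(buildEncrypt({
  bsonType: "string",
  keyId: "e114f7ad-ad7a-4a68-81a7-ebcb9ea0953a",
  algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Random"
})));
console.log(JSON.stringify(buildEncrypt({ bsonType: "string" }))); // relies on inheritance
console.log(JSON.stringify(buildEncryptMetadata({
  keyId: "6c512f5e-09bc-434f-b6db-c42eee30c6b1",
  algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
})));
```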
Reverse-Engineering
When performing reverse-engineering, the usual process is to fetch the $jsonSchema validator script, then to go through the sampling and schema inference process.
If the $jsonSchema validator includes the encrypt and/or encryptMetadata keywords, the process populates the collection and field properties accordingly.
If, during schema inference, we encounter a field with binary data with subtype 6 that was not identified as encrypted in the $jsonSchema validator (or there was no $jsonSchema validator), we can assume that there was explicit (manual) encryption.
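The detection logic described above can be sketched as follows. This is an illustration and not the actual reverse-engineering code: the function names are hypothetical, sampled documents are assumed to be in Extended JSON form (binary values as $binary with a hexadecimal subType), and arrays are not traversed.

```javascript
// True when a sampled value is binary data with subtype 6.
function isEncryptedValue(value) {
  return Boolean(
    value && typeof value === "object" && value.$binary &&
    parseInt(value.$binary.subType, 16) === 6
  );
}

// Dotted paths of all subtype 6 values found in a sampled document.
function findEncryptedPaths(doc, prefix = "") {
  const paths = [];
  for (const [name, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${name}` : name;
    if (isEncryptedValue(value)) paths.push(path);
    else if (value && typeof value === "object" && !Array.isArray(value)) {
      paths.push(...findEncryptedPaths(value, path));
    }
  }
  return paths;
}

// Dotted paths of all fields declared with the encrypt keyword in the validator.
function declaredEncryptedPaths(schema, prefix = "") {
  const paths = [];
  for (const [name, child] of Object.entries((schema && schema.properties) || {})) {
    const path = prefix ? `${prefix}.${name}` : name;
    if (child.encrypt) paths.push(path);
    else paths.push(...declaredEncryptedPaths(child, path));
  }
  return paths;
}

// Encrypted in the data but absent from the validator: assumed explicit (manual) encryption.
function explicitlyEncryptedPaths(doc, schema) {
  const declared = declaredEncryptedPaths(schema);
  return findEncryptedPaths(doc).filter((p) => !declared.includes(p));
}

const sampledDoc = {
  name: "John Doe",
  address: { city: "MongoDBVille" },
  phone: { $binary: { base64: "U2FsdGVkX1+CGIDGUnGgtS46+c7R5u17SwPDEmzyCbA=", subType: "06" } },
  ssn: { $binary: { base64: "U2FsdGVkX1+CGIDGUnGgtS46+c7R5u17SwPDEmzyCbA=", subType: "06" } }
};
const validator = { properties: { phone: { encrypt: { bsonType: "string" } } } };

console.log(explicitlyEncryptedPaths(sampledDoc, validator)); // [ 'ssn' ]
console.log(explicitlyEncryptedPaths(sampledDoc, null));      // [ 'phone', 'ssn' ]
```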
But for this process to work as described above, it is necessary to provide the proper parameters in Connection Settings. On this page, you will find the parameters required for the application to infer the data type of encrypted fields as well as all the necessary metadata.
Failing to do so will show every encrypted field with a binary data type instead of the actual data type.
Consult this MongoDB field-level encryption guide for more details.