Document Validation and What Dynamic Schema Means

When we first published a mongodb.org homepage, we sloppily described MongoDB as “schema free”. That description over-emphasizes the baggage MongoDB left behind, at the expense of true clarity. At the time, however, document databases were brand new, and it was simple to describe them in terms of what they were not (witness the prevalence of the terms “non-relational” and “nosql”). This over-simplification was much more than an oversight. As you can see by reviewing this old blog post, it reflects an immaturity in our thinking. By 2011 we had come to see that calling MongoDB “schema free” reflected an old way of thinking about what “schemas” actually are, so we changed the homepage to say “dynamic schema”.

To appreciate the context for this evolution, recall that when we launched MongoDB, “schema” meant the tables your data was stored in, and the rules that governed the relationship between those tables. Relational schemas have a fixed structure, with strongly typed fields, so complex entities can only be modeled as collections of tables, with their relationships to each other also strongly defined. So schemas are fixed, and altering them is a high-cost operation. It seemed correct to say that MongoDB was free of schema.

The DDL used to define a relational schema affords a few additional usability benefits as a side effect of how it requires data to conform to the relational model. Two key benefits: schemas provide documentation of what data is in a table (if you’ve seen one row, you’ve seen ’em all!), and validation of the fields, by their very definition.

At this point it seems needlessly reductionist to call MongoDB schema-free. MongoDB and the apps built on it have always had schemas; they just embodied them in their queries, and in the indexes built to support those queries, rather than in a table definition. Furthermore, we did plan to offer our users the documentation and validation aspects of schema, but wanted to focus on developing the document model first. When MongoDB was created, we saw more value in doing away with the restrictive elements of tables than in keeping them for their side effects, especially when those side effects could be delivered as features, deliberately designed to suit the needs of developers and operators.

In MongoDB 3.2 we are following through on that plan, and one of those features is document validation. To use it, you attach a validation document to a collection. Validation documents use the MongoDB query language to add constraints to the documents inserted into that collection. An example validator might be:

{ age : { $gte : 0, $lte : 150 } }

If someone tried to insert a document with a null or missing age, the document would be rejected. A document whose age is the string “32”, or the number -5, would be rejected as well. This allows the database to enforce some simple constraints on the content of documents, similar to PostgreSQL’s check constraints.
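To make those cases concrete, here is a small plain-JavaScript model of the example validator. It is only a sketch: the helper name passesAgeValidator is hypothetical, it is not how MongoDB evaluates queries, and it deliberately ignores array values and the other BSON numeric types.

```javascript
// Hypothetical helper: a simplified model of the validator
// { age : { $gte : 0, $lte : 150 } }. This is NOT MongoDB code; it ignores
// arrays and BSON numeric types other than plain JavaScript numbers.
function passesAgeValidator(doc) {
  const age = doc.age;
  // The range operators only match numeric values, so a string,
  // a null, or a missing field never satisfies the validator.
  if (typeof age !== "number") {
    return false;
  }
  return age >= 0 && age <= 150;
}

const samples = [
  { name: "a", age: 32 },    // within the range
  { name: "b" },             // age is missing
  { name: "c", age: null },  // age is null
  { name: "d", age: "32" },  // age is a string
  { name: "e", age: -5 },    // age is below the range
];

for (const doc of samples) {
  const verdict = passesAgeValidator(doc) ? "accepted" : "rejected";
  console.log(JSON.stringify(doc), verdict);
}
```

Only the first sample passes; the other four correspond to the rejections described above.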

One common use case for MongoDB is aggregating data from different sources. With document validation, you’ll be able to ensure that all of the sources have some common fields (like ‘email’) so they can be linked.

You can attach a validation document to a collection at creation time, by including it as a validator field in the db.createCollection command, or by using the collMod database command:

db.runCommand( {
   collMod: "contacts",
   validator: { $or: [ { phone: { $exists: true } }, { email: { $exists: true } } ] }
} )
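For comparison, the creation-time form might look like the sketch below. The function name createContactsWithValidator is hypothetical, and db is passed in as a parameter only so the snippet can be defined outside the shell; inside the mongo shell you would call db.createCollection directly.

```javascript
// Sketch: attach the same validator when the collection is created.
// `db` stands for the database handle the mongo shell provides.
// The wrapper function itself is hypothetical, for illustration only.
function createContactsWithValidator(db) {
  // Require that every contact has at least a phone or an email field.
  const validator = {
    $or: [
      { phone: { $exists: true } },
      { email: { $exists: true } },
    ],
  };
  // The validator goes in the options document of createCollection.
  return db.createCollection("contacts", { validator: validator });
}
```

Creating the collection this way means the constraint is in place before the first insert, whereas collMod adds it to a collection that already exists.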

There are a number of options that can be used to tune the behavior of validation, such as a warn-only mode, and how to handle updates that don’t pass validation, so have a look at the dev-series documentation for the complete picture.
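As a rough illustration of those options, the sketch below builds a collMod command that logs violations instead of rejecting writes. It assumes the validationLevel and validationAction options as described in the 3.2 documentation; the helper name buildWarnOnlyCollMod is hypothetical, and the documentation remains the authority on the exact semantics.

```javascript
// Hypothetical helper: builds a collMod command document that turns on
// validation in a lenient configuration. Option names and values are
// taken from the 3.2 documentation; check it before relying on them.
function buildWarnOnlyCollMod(collectionName, validator) {
  return {
    collMod: collectionName,
    validator: validator,
    // "moderate": validate inserts, and updates to documents that already
    // satisfy the validator; existing invalid documents are left alone.
    validationLevel: "moderate",
    // "warn": log validation failures but let the write through.
    validationAction: "warn",
  };
}

const command = buildWarnOnlyCollMod("contacts", {
  $or: [{ phone: { $exists: true } }, { email: { $exists: true } }],
});
// In the mongo shell, this document would be passed to db.runCommand(command).
console.log(JSON.stringify(command, null, 2));
```

A lenient configuration like this can be a reasonable first step on a collection that already holds data, since it reports violations without failing existing application writes.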

Along with the rest of the 3.2 “schema when you need it” features, document validation gives MongoDB a new, powerful way to keep data clean. This is definitely not the final set of tools we will provide, but rather an important step in how MongoDB handles schema.

Posted on Sep 11, 2015 at 10:34