Moving from Tables to Documents with MongoDB

I’m going to ask you to set aside your concept of “proper data modeling” and “3rd normal form.”  Going forward in this article, those concepts will hold you back.  Some DBA’s and data modelers become angry at this suggestion.  If that’s you… welcome – but please hold judgement until you read this complete post.

Data normalization focuses on organization of the data for the purpose of eliminating duplication.  It’s a data-focused approach.  Not an application-focused approach.

With document data modeling, we flip the script and we turn to our application to answer the questions about how we organize the data.  Before we get into exactly how to do this, let’s delve a bit into the possible structures of a document.

MongoDB is document-based.  That is to say that it stores data in documents.  JSON-like documents to be more specific.  Most people, might think of Microsoft Word Documents, or PDF documents when I mention the word document – and while it is true that MongoDB can store these types of documents, what I’m really talking about is JSON documents.

JSON

JavaScript Object Notation (JSON)  is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.  It is based on a subset of the JavaScript Programming Language.

When you store data in JSON documents, you approach data modeling quite differently than as with Relational technologies.  You see, relational technologies were developed in the 1970’s and 1980’s when disk space was extremely expensive.  Thousands and even 10’s of Thousands of dollars per gigabyte of disk was not unusual early on.  So to preserve this most valuable resource, relational technologies developed the concept of data normalization with a set of rules.

Normalization

Normalization is the systematic method of deconstructing data into tables to eliminate redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.  It is a multi-step process that puts data into tables: rows and columns, by removing duplicated data from the tables.

Normalization is used for mainly two purposes:

  • Eliminate redundant data.
  • Ensure data dependencies make sense i.e data is logically stored in line with the core objectives stated above.

Normalization techniques and the rules associated with it are all well and good if you intend to leverage a relational database technology.  However, as discussed, MongoDB is document-based… i.e. non-relational.

That is not to say that you cannot define and maintain relationships between data elements in your document model.  However, this is not a primary constraint when building a document-based data model.

Rich Data Structures

JSON documents are relatively simple structures.  They begin with a curly-brace and end with a curly-brace.  In between these braces, you have a set of key value pairs delimited by commas.  Here’s an example:

In my example, I’ve tidied things up using indents (spaces before the keys) but this is not necessary.  The above example is extremely simple.  These structures can get quite complex and rich.  The above example includes keys, and values.  The keys in all cases with JSON are strings.  The values however, can be string, numeric, decimal, Dates, Arrays, Objects, Arrays of Embedded Objects, and so forth.  Let’s look at a more complex example:

As you can see, the values don’t have to be simple strings or numbers.  They can be quite complex.  Now, if you’re aware of JSON, you might be saying something like – wait a minute, JSON only supports strings and numbers… and you’d be correct.

At the beginning of this article, I stated specifically that MongoDB stores data in JSON-like documents.  We actually, store the data in BSON documents.  BSON is a Binary representation of the JSON document.  You can read all about this standard at bsonspec.org.

We use BSON so that we can honor the types not supported by JSON… to make it easier for developers to store rich data types and not have to marshal them back from their non-native forms.  When you write a decimal in MongoDB, and then read it back – it comes to you via the drive in decimal form.

Now that we understand a bit about how MongoDB stores and organizes data in document structures, let’s address migrating data from a relational structure to a document-based data model with MongoDB.

Let’s use the obligatory Books and Authors example, not because it’s stunningly brilliant, no – because I’m lazy and it happens to be something to which we can all relate.

Consider the following ERD.

In this simple example, we have two tables.  Authors, and Books.  There is a relationship expressed between these two tables in that Books have an Author.  Rather than storing this data together, we’re storing it separately and expressing the relationship through LINKING.

With MongoDB, we can store this very same information but instead of linking between two disparately, separate locations, we can EMBED the same data.  Consider the following:

In this example, we’ve designed a document structure by creating an actual document.  Notice in the previous, relational example, we created an ERD, or Entity Relationship diagram.  This same ERD may be useful for us as we model our data in documents… but the difference is that with MongoDB, there is no separate, distinct schema.  The schema does not live separately from the actual documents.

In atomic and molecular physics, there’s a concept known as the observer effect.  This applies here to the concept of a schema with MongoDB.  If you don’t look at the data, the schema does not exist.  It’s not until you observe the data do you see that a schema defining what keys / values you have truly exists.

Now, you may begin to wonder something along the lines of what if a data element in the subordinate changes?  What if a subdocument element such as book title changes?  Unlikely, I suppose but possible.  And since we’re storing book titles inside of an Author Record, and possibly even storing the very same information such as book title, description, etc. in another collection specific to these data elements, how will we address this change?  ARE YOU SAYING WE MAY HAVE TO UPDATE THE DATA MORE THAN ONCE!?!

Yes.

Calm down.  We’re not under the same constraints as relational developers.  We own the destiny of our document structures as well as the content.

We can change data multiple times, in multiple locations.

But… but but that’s wrong.  I feel your terror.  It’s not wrong because we don’t adhere to data normalization rules.  Who cares?  Who cares if we store data in multiple locations – we are the czar of our data and we control when its written with our code.  We are no longer beholden to a schema wielding dba.  We are that dba.  If this feels wrong, you’re not alone.  But trust me the benefits of this approach far out-way the drawbacks.

Benefits of the Document Model Approach

Benefit One: Data Locality – Data that’s accessed together is stored together

When we toss out normalization we gain the notion of “Data that’s accessed together is stored together”, otherwise known as data locality.  An Author document contains all relevant details about an author including the books he or she has written.  When my application needs this data, it issues a read and in most cases, a single read fetches ALL of the data needed.  In relational, or normalized data, a single read gets me perhaps a single row in a single table and then I need to issue another read to get the related data from an additional table.  Multiple reads equals multiple disk seeks equals slower performance for my application.

Benefit Two: Readability

When all the data that’s accessed together is stored together, it’s just logical – it makes sense – you can see it all at once in a document.  Whereas with relational technologies, you must issue SQL commands with JOIN clauses to pull data from multiple locations.  Much less readable.

Benefit Three: Flexibility and Agility

When we store data in documents, adding, removing or modifying the data structures is much easier.  There literally is no governing schema.  We simply modify the code we use to update the data in the database.  We have no external schema to modify.  Therefore, we gain the flexibility to make these changes without stopping the database… without issuing an “alter table” command.

Conclusion

In this first of a series of articles on migrating from relational to documents, we’ve looked at how data is stored in MongoDB, what documents are, the structure of JSON and BSON and explored just a few of the benefits.  While the examples are basic, I hope these have illustrated the power and flexibility of this modern approach to data storage.

In my next article, I’ll tackle a bit more challenging relational schema and convert that to documents and incorporate the code used to maintain the data.

If you’re interested in this topic but need a more structured approach to enablement and learning, MongoDB has amazing resources to help you wrap your mind around the document model.  I highly recommend MongoDB University  if you’re new – or trying to improve your knowledge of MongoDB.

Please leave a comment, ask a question or reach out to me on LinkedIn with feedback.

 

 

One Reply to “Moving from Tables to Documents with MongoDB”

Leave a Reply

Your email address will not be published. Required fields are marked *