
Document Database: Definition, Features, Use Cases

Alexander Fashakin

Document databases

In this blog post, we will discuss document databases, diving into their history, features, benefits, use cases, and some examples like FerretDB, MongoDB, and Couchbase.

Since the inception of databases in the 1960s, the world of data has undergone significant changes, going from relational databases, which were the norm for decades, to the emergence of NoSQL databases in the 2000s. A constant during this period has been the need for databases to handle large volumes of data and offer fast data retrieval in a secure and consistent manner.

While relational databases like Oracle, MySQL, and PostgreSQL have long been the stalwarts of the database world, there were growing concerns about their ability to handle unstructured data that lacks a fixed schema, and about the need to pigeonhole every piece of data into tables and rows.

In the last two decades, there's been a growing number of applications with huge volumes of unstructured data that would normally be challenging to store and process using traditional relational databases. This concern, among others, gave rise to the NoSQL and document database movement.

In this article, you will learn about document databases, their unique benefits, their use cases, and the examples that have made them popular among developers.

What is a Document Database?

Document databases – or document-oriented databases – are a type of NoSQL database that stores data as JSON-like documents instead of rows, columns, and tables commonly associated with traditional SQL databases.

In relational databases, every data record goes into a table-column-row format that ideally requires a schema to be defined beforehand; this is not the case with document databases.

So imagine you are collating data on a number of published books: one book record contains the author, title, and number of pages, while another adds the publisher, genre, and ISBN.

When modeling these records in a relational database, you would create a single table for the books, and every row would have to contain the same columns, even if some of them are empty (NULL).

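| id | title | author | number_of_pages | publisher | genre | isbn |
|----|-------|--------|-----------------|-----------|-------|------|
| 1 | Book Title 1 | Author 1 | 200 | NULL | NULL | NULL |
| 2 | Book Title 2 | Author 2 | 300 | Publisher 2 | Genre 2 | ISBN-2 |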

Situations like this are where document databases truly thrive. Instead of having empty columns, these two books can be stored as separate documents with each containing all the necessary information for that particular book - no fixed schema or structure.

First document:

{
"_id": "uniqueId1",
"title": "Book Title 1",
"author": "Author 1",
"number_of_pages": 200
}

Second document:

{
"_id": "uniqueId2",
"title": "Book Title 2",
"author": "Author 2",
"number_of_pages": 300,
"publisher": "Publisher 2",
"genre": "Genre 2",
"isbn": "ISBN-2"
}
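To make the contrast concrete, here is a minimal JavaScript sketch that flattens the two book documents above into fixed-column rows, the way a relational table would have to store them. The helper name toRows is hypothetical and exists only for illustration: the table's columns become the union of every field seen, and each missing field turns into null.

```javascript
// Hypothetical helper: flatten schemaless documents into fixed-column rows,
// the way a relational table would need to store them.
function toRows(docs) {
  // The table's columns are the union of every field seen in any document.
  const columns = [];
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  // Every row gets every column; fields a document lacks become null.
  const rows = docs.map((doc) => columns.map((col) => (col in doc ? doc[col] : null)));
  return { columns, rows };
}

const books = [
  { _id: "uniqueId1", title: "Book Title 1", author: "Author 1", number_of_pages: 200 },
  {
    _id: "uniqueId2",
    title: "Book Title 2",
    author: "Author 2",
    number_of_pages: 300,
    publisher: "Publisher 2",
    genre: "Genre 2",
    isbn: "ISBN-2",
  },
];

const table = toRows(books);
console.log(table.columns);
console.log(table.rows[0]); // the first book's row ends in three nulls
```

The documents themselves never carry those nulls; the gaps only appear once the data is forced into one shared set of columns.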

Understanding the Document Data Model

Documents are at the heart of everything in a document database. A document is akin to a real-life blank form on which you can record as much information as you need about a particular record. And just as you keep related paper documents together in a drawer, related documents in a document database are stored in collections.

A document in a collection represents a single record of information about an object, along with any associated metadata. It is stored as a set of key-value pairs whose values can be of various data types, such as numbers, strings, objects, and arrays. Depending on the database, documents can be stored in formats such as JSON, BSON, YAML, or XML.

For instance, the following is a typical example of a document containing information on a book:

{
title: "The Lord of the Rings",
author: {
name: "J.R.R. Tolkien",
nationality: "British",
},
publication_date: "July 29, 1954",
publisher: "George Allen & Unwin",
genre: ["High Fantasy", "Adventure"],
isbn: "978-0618640157",
number_of_pages: 1178,
has_movie_adaptation: true,
movie_adaptation: {
director: "Peter Jackson",
release_date: "December 19, 2001",
awards: ["Best Picture", "Best Director", "Best Adapted Screenplay"],
box_office: "$1.19 billion"
},
}

In this example, different data types (strings, numbers, a boolean, arrays, and nested objects) represent various aspects of the book, such as its author, publication information, genre, and movie adaptation.

Data types are another interesting thing to note here: because there's no fixed schema, you can give each field whatever data type that record needs. In the example above, the genre field is stored as an array, ["High Fantasy", "Adventure"], but it could be updated to, or stored in another document as, a single string ("High Fantasy") or even an object.
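Because the database will not enforce a single type for a field, application code that reads genre has to cope with every shape it may take. The following is a minimal JavaScript sketch, assuming the three shapes mentioned above (array, string, or object); the function name genresOf and the object shape { primary, secondary } are hypothetical.

```javascript
// Hypothetical reader: return a document's genres as an array of strings,
// whatever type the genre field happens to have in that document.
function genresOf(doc) {
  const g = doc.genre;
  if (g === undefined || g === null) return []; // field absent in this document
  if (Array.isArray(g)) return g; // e.g. ["High Fantasy", "Adventure"]
  if (typeof g === "string") return [g]; // e.g. "High Fantasy"
  if (typeof g === "object") {
    // Assumed object shape, e.g. { primary: "High Fantasy", secondary: "Adventure" }
    return Object.values(g).filter((v) => typeof v === "string");
  }
  return [];
}

const docs = [
  { title: "A", genre: ["High Fantasy", "Adventure"] },
  { title: "B", genre: "High Fantasy" },
  { title: "C", genre: { primary: "High Fantasy", secondary: "Adventure" } },
  { title: "D" },
];

console.log(docs.map(genresOf));
```

This is the trade-off of a flexible schema: the database accepts any shape, so the responsibility for handling the variation moves into application code.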

This flexibility, which relational databases often lack, makes document databases well suited to semi-structured data and lets them adapt to a company's evolving needs.

This structure can also make data retrieval simpler and faster: instead of assembling a record from rows in several tables with resource- and time-intensive joins, you can fetch a single document and get all the information you need in one query.
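As a rough illustration, the JavaScript sketch below compares the two shapes in memory: a normalized layout where the author lives in a separate "table" and must be joined in, versus an embedded document that already contains it. The data and function names are hypothetical, and a real database would use indexes rather than these linear scans.

```javascript
// Normalized (relational-style) layout: the author lives in a separate table.
const authors = [{ author_id: 1, name: "J.R.R. Tolkien", nationality: "British" }];
const books = [{ book_id: 10, title: "The Lord of the Rings", author_id: 1 }];

// Reading a complete book requires a second lookup (a join on author_id).
function getBookJoined(bookId) {
  const book = books.find((b) => b.book_id === bookId);
  if (!book) return null;
  const author = authors.find((a) => a.author_id === book.author_id);
  return { title: book.title, author: { name: author.name, nationality: author.nationality } };
}

// Document layout: the author is embedded, so one lookup returns everything.
const bookDocs = [
  {
    _id: 10,
    title: "The Lord of the Rings",
    author: { name: "J.R.R. Tolkien", nationality: "British" },
  },
];

function getBookEmbedded(bookId) {
  const doc = bookDocs.find((d) => d._id === bookId);
  return doc ? { title: doc.title, author: doc.author } : null;
}

console.log(getBookJoined(10));
console.log(getBookEmbedded(10));
```

Both functions return the same result; the embedded version simply needs one lookup instead of two. The flip side is that data shared by many documents (here, the author) is duplicated in each of them.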

Benefits of Document Databases

We have already touched on most of these, but let's look more closely at the main advantages of using a document database:

Flexible Database Schema

Many applications today do not have a fixed data format or structure, because new types of information are constantly being added, often in different data types. Emails, social media posts, and customer reviews all show the need for a flexible schema: each record may contain different elements, such as text, images, hashtags, location data, and emojis.

Document databases are flexible enough to accommodate these kinds of data and their irregular structure. Whereas a relational table often ends up with many NULL values in optional columns, a document simply omits any field that has no value.

The documents in a collection don't all need to have the same fields, or even the same data types; each document is a separate entity with its own structure, and there's no need to predefine a schema. What's more, a document can be updated with new fields and data types without affecting the other documents in the collection, and documents can hold complex data structures with nested objects and arrays.
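As a sketch of that last point, the hypothetical helper below sets a field on one document using a dot-separated path (similar in spirit to paths like "address.city"), creating nested objects as needed. Only the targeted document changes; the rest of the collection is untouched. This is an in-memory illustration, not any database's actual update implementation.

```javascript
// Hypothetical helper: set a (possibly nested) field on a single document,
// creating intermediate objects when they do not exist yet.
function setPath(doc, path, value) {
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof target[key] !== "object" || target[key] === null) {
      target[key] = {}; // missing (or non-object) level: this sketch simply creates an object
    }
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
  return doc;
}

const collection = [
  { _id: 1, text: "Great product!" },
  { _id: 2, text: "Arrived late", rating: 2 },
];

// Add brand-new fields, one of them nested, to document 1 only.
setPath(collection[0], "location.city", "Lagos");
setPath(collection[0], "hashtags", ["#review"]);

console.log(collection);
```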

This degree of flexibility is hard to achieve with a traditional relational schema.

High Scalability

As your application grows, with more read/write operations and more data, scalability becomes an important consideration, since your original setup and resources (CPU, RAM, disk, and so on) may not be able to handle the increased load.

One significant advantage of document databases over traditional relational databases is how readily they scale horizontally, usually through sharding: partitioning the data across additional servers (nodes) added to the database cluster to handle increased traffic and storage needs. Compared with vertical scaling, this approach can be more cost-effective and lets performance keep growing with the load.

Both relational and non-relational databases can scale vertically, where you increase the computational resources of a single server as needed. However, the performance of vertical scaling rarely grows linearly with its cost: you may reach a point of diminishing returns where adding resources no longer yields a proportional increase in performance. At that point, you may need to scale horizontally by adding more servers to the cluster. Horizontal scaling is possible with relational databases, but it is considerably more challenging, because related data that queries need to join may end up spread across different nodes.
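To illustrate how horizontal scaling spreads documents across nodes, here is a deliberately simplified JavaScript sketch of hash-based sharding: each document's shard key is hashed, and the hash picks a node. The node names, the choice of user_id as the shard key, and the FNV-1a hash are assumptions made for illustration only.

```javascript
// FNV-1a (32-bit) over the string's UTF-16 code units: a simple, deterministic hash.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Hypothetical cluster of three nodes.
const nodes = ["node-a", "node-b", "node-c"];

// Route a document to a node based on the hash of its shard key.
function nodeFor(shardKey) {
  return nodes[fnv1a(String(shardKey)) % nodes.length];
}

const users = [];
for (let i = 1; i <= 9; i++) users.push({ user_id: `user-${i}`, name: `User ${i}` });

// Group document ids by the node they would be stored on.
const placement = Object.fromEntries(nodes.map((n) => [n, []]));
for (const user of users) placement[nodeFor(user.user_id)].push(user.user_id);

console.log(placement);
```

Because the node is derived from the key alone, any client can compute where a document lives. The weakness of a plain modulo is that adding a node changes nodes.length and remaps most keys, which is one reason production systems such as MongoDB assign ranges (chunks) of keys or hash values to nodes and move those instead.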

Combined with replication, horizontal scaling also makes document databases more fault-tolerant and highly available: even when some nodes fail, the system can remain operational, with no single point of failure. Placing nodes close to users in different regions can also reduce latency for globally distributed applications.

Performance

Together, the two previous benefits (flexible schema and high scalability) make document databases highly performant, particularly when working with nested objects and documents: you can query and update nested objects in a single atomic operation. This can be a huge advantage for content management systems, social media apps, real-time analytics, IoT applications, and any use case that involves many different data types and structures.

With horizontal scaling, document databases can handle large amounts of data and high traffic loads by spreading them across multiple distributed nodes. And because related data is stored in a single document, there is no need for complex JOIN operations; combined with the ability to create indexes on any field, including fields inside nested objects, this makes data retrieval much faster.
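The sketch below shows, in simplified form, why an index on a nested field speeds up lookups: it maps each value of address.state to the ids of the matching documents, so a query can jump straight to them instead of scanning the whole collection. It is an in-memory illustration with hypothetical names (getPath, buildIndex); real databases use persistent structures such as B-trees.

```javascript
// Read a nested value by dot-separated path, e.g. "address.state".
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Build a simple secondary index: field value -> list of document _ids.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    const value = getPath(doc, path);
    if (value === undefined) continue; // this sketch skips documents without the field
    if (!index.has(value)) index.set(value, []);
    index.get(value).push(doc._id);
  }
  return index;
}

const users = [
  { _id: 1, name: "John Doe", address: { city: "New York", state: "NY" } },
  { _id: 2, name: "Jane Roe", address: { city: "Austin", state: "TX" } },
  { _id: 3, name: "Sam Poe", address: { city: "Buffalo", state: "NY" } },
  { _id: 4, name: "No Address" },
];

const byState = buildIndex(users, "address.state");
console.log(byState.get("NY")); // ids of the NY users, found without a full scan
```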

Document Databases vs. Relational Databases

In a relational database, data is organized into separate tables defined by the developer, so information about a single object is often spread across multiple tables. To reassemble it, you must use JOIN statements.

In a document database, on the other hand, you can store all the information about an object in a single document, even though each document may differ significantly from the others.

Here is a comparison table between document databases and relational databases:

| Feature | Document Databases | Relational Databases |
|---------|--------------------|----------------------|
| Structure | Flexible schema design; documents can have different structures within the same collection | Predefined, rigid structure with tables, columns, and rows |
| Scalability | Efficient horizontal scaling, allowing for easy distribution of data across multiple hosts | Vertical scaling, adding resources to a single machine to handle more data |
| Querying | Rich querying capabilities, with support for complex nested data structures | Limited support for querying nested data structures, with more focus on join operations |
| Development | Developer-friendly, with a more natural data model for object-oriented programming languages | Less user-friendly data model; more complex SQL queries required |
| Consistency | Lower consistency guarantees, with more focus on performance and availability | Higher consistency guarantees, with more focus on data integrity and accuracy |
| Use Cases | Best for handling large volumes of semi-structured or unstructured data; suited for modern web and mobile applications | Best for handling structured data; suited for traditional business applications |

Read more: PostgreSQL vs MongoDB - Understanding a Relational Database vs Document Database

Examples of NoSQL Document Databases

FerretDB

FerretDB is an open-source document database that serves as an alternative to MongoDB, using PostgreSQL as its backend. It was born out of the need for a truly open-source alternative to MongoDB after MongoDB switched to the SSPL (Server Side Public License) in 2018.

While it's relatively new to the scene, with its first GA release in 2023, FerretDB is already gaining traction among users who want to avoid the vendor lock-in associated with MongoDB.

With built-in MongoDB compatibility, FerretDB translates the MongoDB wire protocol into SQL queries against PostgreSQL, allowing you to run MongoDB workloads on PostgreSQL. It converts documents from MongoDB's BSON format to PostgreSQL's JSONB, preserving the order and data types of document fields, through its own mapping system called PJSON (learn more about this in this blog post).
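Why does the mapping need to preserve order and types at all? PostgreSQL's jsonb type does not keep object keys in their original order, and plain JSON has no types for values such as dates, while BSON documents are both ordered and typed. The JavaScript sketch below shows the general idea with a hypothetical encoding that stores the key order ($k) and a type tag per field ($t) next to the values (v). It is not FerretDB's actual PJSON format, only a simplified illustration of the problem such a format solves, and it handles flat documents only.

```javascript
// Record a type tag for each value; JSON alone would turn a Date into a plain string.
function typeTag(value) {
  if (value instanceof Date) return "date";
  if (value === null) return "null";
  return typeof value; // "string", "number", "boolean", "object"
}

// Hypothetical encoding: JSON values plus the key order and per-field types.
function encode(doc) {
  const keys = Object.keys(doc);
  const types = {};
  for (const k of keys) types[k] = typeTag(doc[k]);
  return JSON.stringify({ $k: keys, $t: types, v: doc });
}

// Simulate a store that does not keep object key order (here: alphabetical).
function reorderKeys(text) {
  const parsed = JSON.parse(text);
  const sorted = {};
  for (const k of Object.keys(parsed.v).sort()) sorted[k] = parsed.v[k];
  return JSON.stringify({ ...parsed, v: sorted });
}

// Decode using the recorded order and types, restoring the original document.
function decode(text) {
  const { $k, $t, v } = JSON.parse(text);
  const doc = {};
  for (const k of $k) doc[k] = $t[k] === "date" ? new Date(v[k]) : v[k];
  return doc;
}

const original = { name: "John Doe", age: 25, created: new Date("2023-03-01T00:00:00Z") };
const restored = decode(reorderKeys(encode(original)));

console.log(Object.keys(restored)); // original order: name, age, created
console.log(restored.created instanceof Date); // true
```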

Users can rely on the same syntax and query language as MongoDB, so an insert statement and a query in FerretDB look like this:

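// Insert one document with a nested address object
db.users.insertOne({
  name: 'John Doe',
  age: 25,
  address: {
    street: '123 Main Street',
    city: 'New York',
    state: 'NY',
    zip: '10001'
  }
})

// Find users whose nested address.state field equals 'NY' (dot notation)
db.users.find({ 'address.state': 'NY' })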

OUTPUT:

{
"_id": ObjectId("5f5b9e2e8b0c0f0001a2b2c3"),
"name": "John Doe",
"age": 25,
"address": {
"street": "123 Main Street",
"city": "New York",
"state": "NY",
"zip": "10001"
}
}

It looks like MongoDB, doesn't it? But it's actually FerretDB using PostgreSQL under the hood.
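Conceptually, a filter like { 'address.state': 'NY' } matches documents whose nested address object has a state field equal to "NY". The JavaScript sketch below evaluates such equality filters in memory. It is a simplification with hypothetical function names: it supports only exact equality on primitive values, ignores MongoDB's special rules for arrays and query operators, and says nothing about how FerretDB actually translates the filter into SQL.

```javascript
// Follow a dot-separated path such as "address.state" into nested objects.
function valueAt(doc, path) {
  let current = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = current[key];
  }
  return current;
}

// A document matches when every filter field strictly equals the document's value.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) => valueAt(doc, path) === expected);
}

// Hypothetical in-memory stand-in for a collection's find().
function find(collection, filter) {
  return collection.filter((doc) => matches(doc, filter));
}

const users = [
  { name: "John Doe", age: 25, address: { city: "New York", state: "NY" } },
  { name: "Jane Doe", age: 31, address: { city: "Boston", state: "MA" } },
];

console.log(find(users, { "address.state": "NY" })); // only John Doe's document
```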

In addition, experienced PostgreSQL users can manage FerretDB with the extensions and administrative features they already know from PostgreSQL, such as replication, backup, and monitoring, while still enjoying the flexibility and ease of use associated with MongoDB. You can also use FerretDB with familiar MongoDB GUI applications such as Studio 3T, Mingo, NoSQLBooster, and more.

In terms of performance, while FerretDB's primary focus has been compatibility with MongoDB, the team is also improving performance by pushing more query processing down to the PostgreSQL backend. Read more about this here.

In addition to PostgreSQL, FerretDB is building support for other database backends, such as SQLite, for which basic experimental support is in progress and not yet available. Our friends at SAP are also working on adding SAP HANA compatibility to FerretDB. These are exciting updates to look forward to.

MongoDB

Among all document databases, MongoDB is indisputably the most popular, and by a wide margin. Its rich query language, support for complex queries, aggregation framework, secondary indexes, multi-language support, and all-round ease of use all contribute to its popularity. Another contributing factor is its inclusion in popular JavaScript stacks like MEAN and MERN, which are widely used for web application development.

Besides its popularity, MongoDB has been highly influential in the increased use of document databases today. With drivers available for a large number of programming languages, MongoDB makes it possible for developers to work in their preferred programming language. You also have access to a strong ecosystem of tools and services, including MongoDB Atlas (fully managed cloud database service), MongoDB Compass (database GUI), and many more.

MongoDB started as an open-source project, which helped it gain initial adoption. However, it has since moved away from its open-source license to the more restrictive and controversial SSPL; this move was one of the motivations behind the creation of FerretDB, an open-source alternative to MongoDB.

Designed to be horizontally scalable, MongoDB is built to handle large volumes of data and traffic, sharding data across multiple nodes. It also enables high availability through replica sets that provide redundancy and failover; if a primary node fails, a new primary is elected from the remaining secondary nodes.

In the early days, one of the biggest concerns with MongoDB was its lack of support for multi-document ACID (Atomicity, Consistency, Isolation, and Durability) transactions; operations on a single document, by contrast, were already atomic.

Ensuring consistency across multiple nodes is a challenge for many distributed databases, and MongoDB was no exception. The CAP theorem captures this: a distributed database cannot simultaneously guarantee all three of Consistency, Availability, and Partition tolerance. In practice, because network partitions can always happen, a system has to choose between consistency and availability when one occurs.

Many document databases therefore favor either consistency (CP), where all nodes return the same, up-to-date data, or availability (AP), where every node can answer queries, even if the data is stale. MongoDB's early design leaned toward availability and performance over strict consistency, which is part of why it didn't support multi-document ACID transactions in its early days.
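To make the CP/AP distinction concrete, here is a toy JavaScript model of a single replica that may be cut off from the rest of the cluster by a network partition. In "CP" mode it refuses reads it cannot confirm are current; in "AP" mode it answers from its local, possibly stale copy. The Replica class and its fields are hypothetical; real databases expose this choice through settings such as read and write concerns rather than a single switch.

```javascript
// Toy model of one replica during a possible network partition.
class Replica {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.data = new Map(); // this replica's local copy
    this.partitioned = false; // true when it cannot reach a majority of the cluster
  }

  read(key) {
    if (this.partitioned && this.mode === "CP") {
      // Consistency first: refuse rather than risk returning stale data.
      throw new Error("unavailable: cannot confirm latest value during partition");
    }
    // Availability first (or no partition): answer from the local copy.
    return this.data.get(key);
  }
}

const cp = new Replica("CP");
const ap = new Replica("AP");
for (const r of [cp, ap]) {
  r.data.set("stock", 5); // both replicas last saw stock = 5
  r.partitioned = true; // meanwhile, the rest of the cluster may have updated it
}

console.log(ap.read("stock")); // 5 (possibly stale)
try {
  cp.read("stock");
} catch (err) {
  console.log(err.message); // the CP replica refuses to answer
}
```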

MongoDB 4.0 added support for multi-document ACID transactions on replica sets, and version 4.2 extended them to sharded clusters as distributed multi-document transactions.
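Atomicity, the "A" in ACID, means a multi-document transaction applies either all of its writes or none of them. The JavaScript sketch below illustrates only that all-or-nothing property on a hypothetical in-memory store: changes are staged on a copy and committed only if every operation succeeds. It ignores isolation, durability, and everything a distributed database must coordinate across nodes.

```javascript
// Apply a batch of operations to an in-memory store atomically:
// either every operation succeeds, or the store is left unchanged.
function runTransaction(store, operations) {
  // Stage every change on a deep copy (fine for plain JSON-like data).
  const staged = JSON.parse(JSON.stringify(store));
  for (const op of operations) {
    op(staged); // an error thrown here propagates out before the commit below
  }
  // Commit: copy the staged state back only after every operation succeeded.
  for (const key of Object.keys(store)) delete store[key];
  Object.assign(store, staged);
}

const accounts = {
  alice: { _id: "alice", balance: 100 },
  bob: { _id: "bob", balance: 20 },
};

// Hypothetical multi-document operation: debit one account, credit another.
function transfer(from, to, amount) {
  return [
    (db) => {
      if (db[from].balance < amount) throw new Error("insufficient funds");
      db[from].balance -= amount;
    },
    (db) => {
      if (!db[to]) throw new Error(`unknown account: ${to}`);
      db[to].balance += amount;
    },
  ];
}

runTransaction(accounts, transfer("alice", "bob", 30)); // both writes applied

try {
  // The debit succeeds on the staged copy, then the credit fails...
  runTransaction(accounts, transfer("alice", "carol", 10));
} catch (err) {
  console.log(err.message); // unknown account: carol
}

// ...so neither write reached the store.
console.log(accounts.alice.balance, accounts.bob.balance); // 70 50
```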

Couchbase

Couchbase is an open-source, distributed, multi-model NoSQL database built and optimized for interactive applications. Like MongoDB, Couchbase uses a flexible JSON model that doesn't require a fixed schema and can be modified on the fly. But that's where the similarities end: Couchbase offers its own query language called N1QL (pronounced "nickel"), a SQL-like query language for JSON.

A typical query in Couchbase looks like this:

SELECT title, author
FROM `bucket`
WHERE genre = "Novel" AND publication_date BETWEEN "1850-01-01" AND "1860-12-31"

As you can see, it's more akin to SQL than MongoDB's query language, and it supports many of the operations you would use in SQL, including joins, filtering, aggregation, and ordering. Here, bucket is a placeholder for the name of your Couchbase bucket, which is roughly analogous to a database.
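For readers more at home in JavaScript, the N1QL query above roughly corresponds to the in-memory filter and projection below, run over made-up sample documents. BETWEEN is inclusive at both ends, and comparing dates stored as ISO 8601 strings (YYYY-MM-DD) works because such strings sort lexicographically in chronological order. This mirrors the query's logic only; it is not how Couchbase executes it.

```javascript
// Made-up sample documents standing in for the contents of a bucket.
const bucket = [
  { title: "Book Title 1", author: "Author 1", genre: "Novel", publication_date: "1851-10-18" },
  { title: "Book Title 2", author: "Author 2", genre: "Poetry", publication_date: "1855-07-04" },
  { title: "Book Title 3", author: "Author 3", genre: "Novel", publication_date: "1860-12-31" },
  { title: "Book Title 4", author: "Author 4", genre: "Novel", publication_date: "1869-01-01" },
];

// WHERE genre = "Novel" AND publication_date BETWEEN "1850-01-01" AND "1860-12-31"
// (BETWEEN is inclusive; ISO 8601 date strings compare correctly as plain strings.)
const rows = bucket
  .filter((doc) => doc.genre === "Novel")
  .filter((doc) => doc.publication_date >= "1850-01-01" && doc.publication_date <= "1860-12-31")
  // SELECT title, author
  .map(({ title, author }) => ({ title, author }));

console.log(rows); // Book Title 1 and Book Title 3 (the upper bound is inclusive)
```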

Using a distributed architecture with sharding and load balancing, Couchbase is built for easy scalability, replication, and failover. It also provides distributed ACID transactions across multiple documents, buckets, or nodes.

RavenDB

RavenDB is a NoSQL document database that is fully transactional (ACID) across the database and across clusters, and it is well suited to complex, semi-structured, and hierarchical data, such as data used in ML/AI models. Besides being one of the first NoSQL databases to support ACID transactions, RavenDB is a multi-model database that supports document, relational, graph, and key-value data models.

Like other document databases (and unlike relational databases), RavenDB uses a flexible JSON-like data model that doesn't require a fixed schema.

Using its own SQL-like query language, RavenDB Query Language (RQL), RavenDB supports a wide range of queries, including full-text search and spatial queries. RavenDB also supports LINQ queries, a query syntax popular among .NET developers.

An example RQL query in RavenDB might look like this:

from Employees
where hiredAt > '2000-01-01T00:00:00.0000000'

Interestingly, queries in RavenDB always use indexes; full collection scans are not supported.

When you run a query in RavenDB, the query optimizer looks for an existing index that can satisfy the query, and if it doesn't find one, it creates a new index. While this approach could lead to many indexes being created, with potentially adverse effects, RavenDB's query optimizer mitigates this by extending an existing index to cover the new query when possible.
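The sketch below models that behavior in a simplified way: given the fields a query filters on, it reuses an existing index that covers them, otherwise widens an existing index on the same collection, and only creates a new index as a last resort. The registry and function names are hypothetical and only illustrate the idea; RavenDB's actual query optimizer is considerably more sophisticated.

```javascript
// Hypothetical registry of automatic indexes: { collection, fields }.
const indexes = [];

// Pick (or build) an index that covers every field a query filters on.
function indexForQuery(collection, fields) {
  // 1. Reuse an index on this collection that already covers all fields.
  const covering = indexes.find(
    (ix) => ix.collection === collection && fields.every((f) => ix.fields.includes(f))
  );
  if (covering) return { index: covering, action: "reused" };

  // 2. Otherwise, widen an existing index on the same collection.
  const sameCollection = indexes.find((ix) => ix.collection === collection);
  if (sameCollection) {
    for (const f of fields) {
      if (!sameCollection.fields.includes(f)) sameCollection.fields.push(f);
    }
    return { index: sameCollection, action: "extended" };
  }

  // 3. As a last resort, create a new index for this collection.
  const created = { collection, fields: [...fields] };
  indexes.push(created);
  return { index: created, action: "created" };
}

console.log(indexForQuery("Employees", ["hiredAt"]).action); // "created"
console.log(indexForQuery("Employees", ["hiredAt"]).action); // "reused"
console.log(indexForQuery("Employees", ["lastName"]).action); // "extended"
console.log(indexes.length); // 1: still a single index for Employees
```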

In addition to indexing, RavenDB uses caching and batching to optimize server and network resources, and it provides multi-master replication and sharding to enhance availability and scalability. RavenDB also offers built-in, Lucene-based full-text search, a highly customizable, fully featured, near real-time search engine.

As an all-in-one database, RavenDB also includes a cloud service, a Time Series model, ML processing, and an online analytical processing (OLAP) plugin for business analysis.

Firebase

Firebase is Google's app development platform, and its Cloud Firestore service is a fully managed, cloud-based NoSQL document database that is also available through Google Cloud Platform (GCP). Firestore is serverless, so you don't have to worry about provisioning, scaling, or managing the database yourself.

You interact with Firestore through the Firebase SDKs, which are available for a wide range of platforms, including web, Android, iOS, and Unity. Client libraries are also available for programming languages including JavaScript, Node.js, Java, Python, and Go. While these SDKs support broadly the same features, the syntax and usage vary slightly from language to language.

For instance, a typical query in JavaScript looks like this:

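// Assumes the Firebase v8 (namespaced) JavaScript SDK, e.g. npm install firebase@8
import firebase from 'firebase/app'
import 'firebase/firestore'

// firebaseConfig is a placeholder for your project's configuration object
firebase.initializeApp(firebaseConfig)
const db = firebase.firestore()

// Find users whose nested address.state field equals 'NY'
db.collection('users')
  .where('address.state', '==', 'NY')
  .get()
  .then((querySnapshot) => {
    querySnapshot.forEach((doc) => {
      console.log(doc.id, ' => ', doc.data())
    })
  })
  .catch((error) => {
    console.error('Error getting documents: ', error)
  })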

Getting Started with Document Databases

Document databases represent a significant departure from traditional relational databases in storing and accessing data. This evolution is interesting for developers looking to build their applications with a flexible data model that offers high scalability and agility.

If you want to be part of a growing community of NoSQL document database enthusiasts, the Document Database Community is a global network of developers where you can learn more about recent trends, technologies, and news in the document database space.

If you're looking for a document database to practice and build with, FerretDB is a good option. Its open-source nature and its compatibility with the MongoDB wire protocol and queries make it an attractive choice.

To get started, check out the installation guide for FerretDB.