Hey there! I'm currently diving into a cool book called "Designing Data-Intensive Applications." I wanted to share what I picked up from Chapter 2, all about Data Models and Query Languages. In this post, I'll talk about different types of databases and which ones work best for different applications. I'll also compare them using examples from the book and look at how they handle queries. Feel free to correct me if I goof up. The goal is to help you figure out the best database for your system. Let's keep it simple and learn together!
Data Models :
The book covers several data models. This post focuses on two of them:
Relational database model (data organized into tables of rows)
Document database model (data stored as self-contained, nested documents)
Relational Database model :
The relational model organizes information into tables, where each table is a collection of rows. Tables are connected using keys: a value stored in one table identifies a row in another table.
For instance, let's take a look at a user table and a positions table. Both contain a "user_id" column. In the user table it identifies each user. In the positions table we call "user_id" a foreign key, because it points back to the row in the user table that the position belongs to. It's like the secret handshake that ties these tables together.
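To make the foreign-key link concrete, here is a minimal sketch in JavaScript that models the two tables as arrays of rows and follows "user_id" from positions back to a user. The row contents, the position_id column, and the helper name positionsForUser are illustrative assumptions, not something from the book; a real relational database would do this with a SQL join.

```javascript
// Two "tables" modelled as arrays of rows (illustrative data only).
const users = [
  { user_id: 255, first_name: "Steve", last_name: "Jobs" },
  { user_id: 256, first_name: "Example", last_name: "User" },
];

const positions = [
  { position_id: 1, user_id: 255, job_title: "Developer", organisation: "Atari" },
  { position_id: 2, user_id: 255, job_title: "Co-founder", organisation: "Apple" },
  { position_id: 3, user_id: 256, job_title: "Analyst", organisation: "Example Corp" },
];

// Hypothetical helper: collect every position whose foreign key matches the user.
function positionsForUser(userId) {
  return positions
    .filter((row) => row.user_id === userId)
    .map((row) => row.job_title + " at " + row.organisation);
}

console.log(positionsForUser(255));
```

The positions rows never repeat the user's name; they only carry the key, and the link is resolved at query time.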
Document Database model :
The document model stores each record as a self-contained document: a set of keys and values in which a value can itself be a nested object or a list. Most of the time, documents are saved in formats like JSON or XML.
Here's an example of the Document Database model in JSON:
{
  "user_id": 255,
  "first_name": "Steve",
  "last_name": "Jobs",
  "summary": "The man of innovations ...",
  "region_id": "us:91",
  "industry": "Tech Team",
  "positions": [
    {
      "job_title": "Developer",
      "organisation": "Atari"
    },
    {
      "job_title": "Co-founder",
      "organisation": "Apple"
    }
  ]
}
In this setup, each piece of information has a key and a corresponding value. For example, "user_id" has the value 255, and it goes on like that for first name, last name, and so on. The "positions" key holds a list of nested objects, so all of a user's jobs live inside the same document as the user.
Which Database Model should you use?
Choosing the right database model for your application depends on how your data is connected. If your app mostly deals with one-to-many relationships, then a document-based database model is often the way to go. On the other hand, if you have many-to-one or many-to-many relationships, relational database models are usually the better choice. Let me break down what this means.
For instance, let's consider the example from the book: a résumé-style profile for someone like Bill Gates. A user can hold multiple positions, so in a relational database model the positions cannot live in a single column of the user table. Instead, they go into a separate positions table that links back with a foreign key (like user_id). Reading or writing a full profile then means touching several tables, either with multiple queries or with a join.
This is where document-based models shine. They are perfect for situations where one field can have many entries, forming a tree-like structure. So, if your data has a one-to-many relationship, a document-based model is likely the best fit for your application. It simplifies things and makes managing data with multiple entries much more efficient.
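As a rough sketch of that tree-like structure, the snippet below stores a whole profile as one JSON string and reads the nested positions back without any join. The variable storedText and the helper listJobTitles are hypothetical names used only for illustration; a real document database would return the document through its own client API.

```javascript
// One stored document holds the user and all of their positions.
const storedText = JSON.stringify({
  user_id: 255,
  first_name: "Steve",
  last_name: "Jobs",
  positions: [
    { job_title: "Developer", organisation: "Atari" },
    { job_title: "Co-founder", organisation: "Apple" },
  ],
});

// A single read gives back the whole tree.
const profile = JSON.parse(storedText);

// Hypothetical helper: the "many" side is already nested inside the parent.
function listJobTitles(doc) {
  return doc.positions.map((position) => position.job_title);
}

console.log(listJobTitles(profile));
```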
Many-to-One and Many-to-Many Relationships:
In the book's example, a profile stores a region_id and an industry_id instead of plain-text names, which is a great example of Many-to-One relationships: many users live in one region or work in one industry. In this scenario, if an industry is renamed, a relational database model offers a significant advantage. You only need to make the change in one place, the industries table, and every user that references that ID picks it up. This practice of storing an ID that points to a single standardized entry provides several benefits:
Consistent Spelling and Style: Maintaining uniformity across all profiles.
Avoiding Ambiguity: Ensuring clear and unambiguous data.
Ease of Updating: Simplifying updates by making changes in a single location.
Localization Support: Facilitating the translation of the site into other languages with a standardized list.
Better Search: Enhancing search capabilities due to structured data.
Relational-based models excel in scenarios with Many-to-One or Many-to-Many relationships.
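Here is a small sketch of the "change it in one place" benefit. The industries lookup, the industry_id value 131, and the helper industryName are assumptions made up for this illustration.

```javascript
// Lookup "table": one entry per industry, keyed by a hypothetical industry_id.
const industries = new Map([[131, { name: "Tech Team" }]]);

// Many users point at the same industry through its ID.
const users = [
  { user_id: 255, first_name: "Steve", industry_id: 131 },
  { user_id: 256, first_name: "Example", industry_id: 131 },
];

// Resolve the ID to a human-readable name at read time.
function industryName(user) {
  return industries.get(user.industry_id).name;
}

// Rename the industry once...
industries.get(131).name = "Technology";

// ...and every user that references industry_id 131 sees the new name.
console.log(users.map(industryName));
```

If each user had stored the text "Tech Team" directly, the rename would have required touching every user record.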
Now, you might think we can achieve something similar in a document-based model. However, document databases often have weak or no support for joins, so the join has to be emulated in application code with multiple queries. This adds extra code to the application, potentially complicating it, and it goes against the principle of keeping code simple and maintainable. We'll delve deeper into this argument in the later sections on imperative and declarative coding.
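To give a feel for that extra application code, here is a minimal sketch of a join done by hand. The objects regions and userDocs are in-memory stand-ins for two document collections, and loadUserWithRegion is a hypothetical helper; none of these names come from a real database client, and the region name is only sample data.

```javascript
// Hypothetical in-memory stand-ins for two document collections.
const regions = { "us:91": { name: "Greater Seattle Area" } };
const userDocs = {
  255: { user_id: 255, first_name: "Steve", region_id: "us:91" },
};

// Without database-side joins, the application performs two lookups
// and stitches the results together itself.
function loadUserWithRegion(userId) {
  const user = userDocs[userId]; // first lookup
  if (!user) {
    return null;
  }
  const region = regions[user.region_id]; // second lookup
  return { ...user, region_name: region ? region.name : null };
}

console.log(loadUserWithRegion(255));
```

Every place in the application that needs the region name has to repeat or reuse this logic, which is the maintenance cost the paragraph above describes.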
With the relational model, you could easily turn each school into its own entity with its own page on the website, and have user profiles reference it through the schools table. Many users can attend one school and one user can attend many schools, so this becomes a Many-to-Many relationship, which joins handle naturally.
Even if the initial version of the application may fit well in a join-free document model, data has a tendency to become more interconnected as features are added to applications.
Comparing Relational and Document models:
Relational databases are queried declaratively, through SQL (Structured Query Language). In the context of querying, being declarative means you don't have to define the logic of how to retrieve data from the database. Instead, you describe the data you want in a language that reads a bit like English. For example:
SELECT name FROM user_table WHERE name = 'Bill Gates';
This simplicity is possible due to the SQL query optimizer, which handles the logic of retrieving data efficiently.
On the other hand, document models have more often been queried with imperative code, although some document databases offer declarative query languages as well. In imperative coding, you specify the exact steps for the computer to follow to find the data, which tends to make application code more complex.
When it comes to parallel execution, a declarative query leaves the database free to decide how to split up the work, so the query optimizer can parallelize it without any change to the query. In contrast, imperative code requires you to write the parallelization logic yourself, adding complexity to the code.
Now, let's touch on the concepts of Schema-on-read and Schema-on-write:
Schema-on-read: In this approach, the structure of the data is implicit and is only interpreted when the data is read. The database does not enforce a schema. Instead, the application code that reads the data assumes a certain structure. Document databases usually operate on this principle.
Schema-on-write: Here, the schema is explicit, and the database ensures that data written conforms to it. This implies that the schema of the data is checked when the data is being written to the database. Relational databases typically follow this approach, demanding a strict schema check during the writing process.
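The difference between the two can be sketched in a few lines. This is only an illustration under assumptions: userSchema, insertWithSchema, insertDocument, and readFirstName are made-up names, and real databases enforce schemas in far more sophisticated ways.

```javascript
// Schema-on-write: the expected columns and types are declared up front.
const userSchema = { user_id: "number", first_name: "string", last_name: "string" };

// Reject a row at write time if it does not conform to the schema.
function insertWithSchema(table, row) {
  for (const [column, type] of Object.entries(userSchema)) {
    if (typeof row[column] !== type) {
      throw new TypeError("column " + column + " must be a " + type);
    }
  }
  table.push(row);
}

// Schema-on-read: store whatever arrives, without any check.
function insertDocument(collection, doc) {
  collection.push(doc);
}

// The reader decides how to interpret the structure it finds.
function readFirstName(doc) {
  return typeof doc.first_name === "string" ? doc.first_name : "(unknown)";
}
```

With insertWithSchema, a row without last_name never reaches the table. With insertDocument, the same record is stored, and readFirstName has to cope with whatever shape it gets.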
Relational v/s Document Databases:
When it comes to simple application code, the choice between relational and document models depends on the nature of relationships:
One-to-Many Relationships: Document models often provide a simpler solution for one-to-many relationships, avoiding unnecessary complications in the application code. Each piece of information can be neatly organized without the need for complex structures.
Many-to-One Relationships: Relational models shine in many-to-one scenarios, thanks to the query optimizer that streamlines querying complexities. It makes the application code cleaner by hiding the intricacies of data retrieval.
In terms of schema flexibility:
Relational Model: Schema changes are more involved. For example, splitting a full name into first and last names requires a schema migration (adding the new columns) followed by an update of the existing rows.
Document Model: Offers schema flexibility. If changes are needed, like splitting a full name into first_name and last_name, you can modify the schema for new entries and adapt the application code for older entries.
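Adapting the application code for older entries could look roughly like this. The helper normalizeName is a hypothetical name, and the split-on-first-space rule is a simplifying assumption that will not suit every real name.

```javascript
// Hypothetical helper: upgrade an old-style document when it is read.
function normalizeName(doc) {
  // Old documents only have "name"; new ones already have "first_name".
  if (doc.name && !doc.first_name) {
    const [first, ...rest] = doc.name.split(" ");
    return { ...doc, first_name: first, last_name: rest.join(" ") };
  }
  return doc;
}

const oldDoc = { user_id: 255, name: "Steve Jobs" };
const newDoc = { user_id: 256, first_name: "Example", last_name: "User" };

console.log(normalizeName(oldDoc).first_name);
console.log(normalizeName(newDoc).first_name);
```

Old and new documents can sit side by side in the database, and no bulk rewrite is needed up front.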
Regarding data locality for queries:
Relational Databases: Data is often split across multiple tables, requiring multiple index lookups for retrieval. This can result in more disk seeks and increased disk time, especially for complex queries involving multiple tables.
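Document Databases: A document is usually stored as a single continuous string encoded as JSON or XML (or a binary variant). If the application often needs the whole document, it can be loaded in one go instead of through several index lookups. This locality only pays off when you need large parts of the document at once: the database typically loads the entire document even if you read only a small part of it, and an update usually rewrites the whole document. For these reasons it is generally recommended to keep documents fairly small.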
Query Languages for Data:
When the relational model was introduced, it included a new way of querying data: SQL, a declarative query language.
Many commonly used programming languages are imperative. Let's take the example from the book to see the difference between imperative and declarative programming languages.
If you have a list of animal species, you might write something like this to return only the sharks in the list:
Imperative programming
// 'animals' is passed in as an array of objects with a 'family' property
function getSharks(animals) {
    // Create an empty array to store the sharks
    var sharks = [];
    // Loop through each element in the 'animals' array
    for (var i = 0; i < animals.length; i++) {
        // Check if the current animal's family is "Sharks"
        if (animals[i].family === "Sharks") {
            // If true, add the animal to the 'sharks' array
            sharks.push(animals[i]);
        }
    }
    // Return the array containing shark species
    return sharks;
}
Declarative programming
SELECT * FROM animals WHERE family = 'Sharks';
An imperative language tells the computer to perform certain operations in a certain order. In a declarative query language, you just specify the pattern of the data you want (what conditions the results must meet and how you want the data to be transformed), but not how to achieve that goal. That is up to the database's query optimizer.
Finally, declarative languages often lend themselves to parallel execution. Imperative code is very hard to parallelize across multiple cores and multiple machines, because it specifies instructions that must be performed in a particular order. Declarative languages have a better chance of getting faster in parallel execution because they specify only the pattern of the results, not the algorithm that is used to determine the results.
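To illustrate why an order-independent condition is easier to spread across workers, here is a small sketch. It does not run anything in parallel; it only simulates splitting the rows into partitions, filtering each one separately, and merging the results. The sample animals, the predicate isShark, and the helper partition are assumptions made for this example.

```javascript
const animals = [
  { name: "Great white", family: "Sharks" },
  { name: "Lion", family: "Cats" },
  { name: "Hammerhead", family: "Sharks" },
  { name: "Tiger", family: "Cats" },
];

// The condition says what we want; it does not depend on visiting order.
const isShark = (animal) => animal.family === "Sharks";

// Hypothetical helper: split the rows into chunks, as a database
// might do across cores or machines.
function partition(rows, size) {
  const parts = [];
  for (let i = 0; i < rows.length; i += size) {
    parts.push(rows.slice(i, i + size));
  }
  return parts;
}

// Each partition is filtered on its own (here one after another),
// then the partial results are merged.
const partialResults = partition(animals, 2).map((part) => part.filter(isShark));
const sharks = partialResults.flat();

console.log(sharks.map((animal) => animal.name));
```

Because isShark looks at one row at a time, each partition could be handled by a different worker and the merged result matches a single pass over the whole list.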
There is one more query model, MapReduce, which sits somewhere in between declarative and imperative. The book covers it in much more detail in a later chapter, so I am not including it in this blog.
Conclusion
The choice between relational and document-based approaches boils down to the nature of your application's data relationships. The relational model, with its support for joins, handles many-to-one and many-to-many relationships well. On the other hand, the document model excels in one-to-many relationships, providing flexibility and ease of adaptation.
Understanding the coding models is crucial in appreciating the advantages each model brings. SQL, with its declarative nature, simplifies querying in relational models, reducing the complexity of application code. In contrast, imperative coding, often associated with document models, involves building queries from scratch, potentially adding intricacies to the code.
Moreover, the flexibility of schema and considerations for data locality further underscore the strengths and trade-offs of each model. While relational databases demand adherence to a strict schema, document databases allow for adaptability, simplifying schema changes for new entries.
As we reflect on the journey through relational and document-based models, it's clear that no one-size-fits-all solution exists. The optimal choice depends on the intricacies of your data relationships, the nature of your application, and the balance between simplicity and flexibility. Whether you opt for the structured elegance of relational databases or the adaptable nature of document models, the key lies in aligning your choice with the unique demands of your system. By understanding the nuances and trade-offs, you empower yourself to make informed decisions that resonate with the specific needs of your application.
Again, the choice of a database depends on various factors. This is just one of them, so you can begin analyzing while keeping this in mind.
Reference: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann
My profiles: LinkedIn => www.linkedin.com/in/lanjewar-arya | Twitter => https://twitter.com/AryaLanjew3005