In the past years, we follow the rising of NoSQL technology and it's employed in an even more extended set of applications. This article aims to make an objective comparison from SQL and NO-SQL technology and tries to clarify some unclear aspect to help people to choose its backend knowledgeably.
- What is NoSQL
- NoSQL implementations
- Advantages of NoSQL
- Limits of NoSQL
- Performance comparison
- Points of interest
What is NoSQL
In simple words, NoSQL is a new data storage backend that does not follow the relational DB model. That means we are speaking about a “container” that works differently from traditional SQL based backends.
NoSQL technologies born to cover a set of new needs came out while traditional databases were already mature. Of course in past years, applications needed changes and became even more discerning (big data, clustering, file repository), so this new storage system was designed keeping in mind these new requirements.
But, what do I mean for “requirements”? Here is a set of case that NoSQL is designed to support.
- Application works with large amount of data (Big Data)
- Application works with data that quickly change relationship and data types (semi-structured, unstructured and polymorphic data)
- Developers work in a small team using agile approach: lot of small spring against long term waterfall iterations
- Application served as service, may be published on the web
- Application delivered to thousands of users instead of few people inside the company
- No certainty about the future load of application: need to be scalable and dynamic, need base software on a backend cluster easily
There are many NoSQL solutions offered by the market, open-source or not. Each of them works a little bit differently, maybe specializing for some particular need, but the basic idea and the common feature is to offer better scalability and performance. To do this, they give up to some features of generic RDBMS, introducing new ones, but keeping enough functionality to be useful.
One of the breaking changes from SQL DBs is that, meanwhile SQL backends are general purpose storage systems, NoSQL distributions are focused on a specific kind of data. This allow DBs to be more efficient on their scope and allow to have more performant system. In this section, some kind of NoSQL databases are reported to give an idea of what application they have. Note that they can be used together (also with traditional SQL systems) to get the best from all systems.
This type of database doesn't need to have a consistent data structure, so they are useful when you have to work with polymorphic data, or data structure changes constantly. This kind of backend can convert normalized entities like key-value dataset or EAV models to simple documents set.
- Goal: Storage non typed set of “records”, called “documents"
- Examples: MongoDB, CouchDB
- Target: Heterogeneous data, working object-oriented, agile development
We are told that NoSQL database removes the concept of relationship to achieve better performance. In this kind of db, this is not true. Instead, these databases enforce the concept of relationship.
Their goal is to define data by its relation with other data. This kind of database can be useful when most of the data structure is designed to keep relationship with entities (i.e., if you have a lot of tables when there are mostly foreign keys columns).
- Goal: describe data relations
- Examples: Neo4j, GiraffeDB
- Target: Data Mining
This is a kind of db designed to store a big amount of key-value pair data. This can be useful when database is used to store properties, translations, or caching purpose.
- Goal: Store data in key-value form
- Examples: Redis, Cassandra, MemcacheDB
- Target: key-value storage
Advantages of NOSQL
We are aware that NoSQL db has some interesting advantages and they can easily resolve simple problems that traditional RDMS can't. Nowadays, their large employment in critical systems, like big cloud systems and some large Saas product, confirm they are mature and useful. But the question is, why should I move to them? And in this case, when is the move profitable? We can’t base such kind of decision only on our impression, and read some vendor brochure where it’s granted that NoSQL is very cool is not enough. Moreover, we cannot stay on inadequate platform just because we fear the change.
In this chapter, I’ll try to explain why this solution could be good enough to move on it and what use case makes it more profitable.
As we said, NoSQL databases were created in response to the limitations of traditional relational database technology. This means we will find some improvement, or better, some feature that in traditional rdbms are not present and can’t be added, even if producer would implement them.
The advantages of NoSQL include the capabilities to handle easily:
- Big data: where with this term, we describe data sets containing very large volumes of data.
- Mutable data: Data could be structured, semi-structured, and unstructured as well. NoSQL can also manage transformation of data.
- Dynamic development: In contexts where we need agile sprints, quick iteration, frequent code pushes, and in summary to be responsive to changes, having a database that embraces dynamism is very helpful.
- Object-oriented: programming that is easy to use and flexible
- Expandable: We can easily implement and efficient, scalable architecture instead of expensive, monolithic architecture. Even in traditional db, we can do that but it is harder and limited.
- Open Source: Most of the solutions are open source, so with no cost for licenses.
NoSQL databases are more scalable and provide better performances, and their data model is closer to the domain model used inside application. Today, companies starting projects based on NoSQL databases are growing. NoSQL databases also tend to be open-source and that means a relatively low-cost way of developing, implementing and sharing software.
Limits of NO-SQL
In evaluating the limitations of NoSQL databases, it's important to bear in mind that the NoSQL world is a diverse ecosystem. Not all NoSQL storage products are subject to all drawbacks to the same extent. And that's a good thing, since it means that organizations have a lot of options when weighing the pros and cons of different NoSQL solutions in order to decide which one is best for their specific needs. In this chapter are summarized some features you may miss using a NoSQL solution.
By reading the article, you'll find this chapter expanded more than the advantages one. This won't be a way to discourage to use NoSQL. This chapter would be an impartial description of all limits of NoSQL technologies and simply want to let you know every possible problem you could find employing them. Many points could be varied by the implementation (i.e., when I said there are few tools in support, it could be right for most of these but not for all ), so consider them just an overview that will alert you about possible risk you could find. What I expect is that, after you choose a NoSQL product to employ, you can use this chapter as checklist to understand if these issues are present in your specific db and if they are relevant for the application.
Security is something everybody wants but that’s hard to reach. Theoretically, there could be security issues on every technology. There was, and maybe there are security issue also on SQL system. So why do I mark this as possible issues on NoSQL? There is no real issue on NoSQL “concept” related with security, but we can have security issues related with maturity of the product we employ. Security issues come out while the product grows and they are fixed. Intuitively, a young product could have many unknown security issues. Moreover, young products are on the market since a little time, so consultants haven’t had time to make experience on them and many security constraints could be neglected. So, the problem is the youngness of most NoSQL platform. For business use, I suggest to employ only mature solutions, with a vendor behind.
When we start learning RDBMS, they have taught us that ACID transaction is the best option to make operations that remains consistent across the whole database. Well, most of NoSQL technology doesn’t implement such kind of transaction. NoSQL systems are based on the principle of Eventual consistency. In practice, embracing a little risk of consistency (one node could be out of sync with others), they gain some performance boost. Yes, it is a compromise, but we cannot have everything.
I have to mention that some NoSQL implementations, like FoundationDB, allow ACID-like transactions keeping NoSQL performances high. By the way, while we stay on NoSQL, data consistency remains a critical part: basing on application you are developing this could be an issue or not.
When you speak to someone that is trying to convert you to NoSQL technology, one of the first benefits you could hear from them is the benefit of performance due to removal of relations. All of us agree that relation may bring performance reduction, but what do we lose by removing them? It’s like you were in dip rise with a big and heavy backpack on your back. Of course, dropping it you will go faster. It is convenient to do that? Depends on what this pack contains, depends on what is the value of the backpack content has for you. If it contains the tent for the night, maybe, it is better to reach your destination one hour later but sleep warm, then go faster. If you are bringing useful disposable things, maybe you can do the opposite.
Following this parallelism, can we accept losing consistency to get performance? It is convenient?
To take a step back, I’ll start from the origin of joins. RDBMS uses relation to link data from one table to another in order to keep data in a single place and does not replicate them.
Join is construct that allows to reconnect them in query. Of course, making a join between tables is an additional computational cost, with respect to finding data directly inside the table you are querying. But this cost is necessary to keep relation (no replication, consistency).
It’s clear that while such construct has an acceptable overhead, this is ok, and probably the best choice. But what when it slows down everything or requires too much hardware? This issue allows NoSQL developer to claim the lack of
JOIN to a feature, but NoSQL is the solution?
Not always. Sometimes, we just need to redesign database structure, maybe removing some relation or making restructure data. Yes, we will lose some relation or we will replicate some part of date, but it could be acceptable (in NoSQL, we will lose all of the relationship).
Another problem is with consistency. Think about category and products. We may have a nested tree of category, with many products as leaf of the tree. In traditional RDMS, changing the category tree is just an update on a foreign key (self relationship) on the
category table. These changes automatically reflect on all child categories and products. In the NoSQL way, we could have redundant data on all category\product and a change will need a massive update on child elements.
Let me assume that our application can renounce to
JOIN for gaining speed and in our case, it is an acceptable trade off. We yet said that in many NoSQL implementations, it is hard to keep the various entries consistent. When you work without transaction, you can make many operations in sequence but after a crush, you get inconsistency. This is true for first implementations of NoSQL and some new technology try to give out of the box something that offers out of the box transactions. You can also think about managing transaction at application level, trying to rollback dirty data, but it could be very hard to manage in many cases.
Missing Standard Across Vendor Technologies
SQL is a standard language. There could be many variations that bring to specific dialects, but this is complex doesn’t forbid to abstract data access. Think about Hibernate, NHibernate, Doctrine, Entity Framework or the other ORM you like: they are proof that differentiate between SQL dialects are not so relevant. We can conclude that SQL is a standard language, even if many vendors implement different database technologies. Also in case you are not based on ORM layer, if you produce a query for a db, most of the code could be reused in others. This makes migration easier and developers can adapt quickly to different db solutions.
On the other side, on NoSQL world, there is more confusion. Each vendor implements their specific syntax without involving any shared standard. This means it is harder to migrate an application between different NoSQL implementations. This means it is harder to find a programmer that knows many NoSQL technologies well.
Schema Flexibility Could Be a Trouble
One of the peculiarities of NoSQL system is that they do not require a schema. In practice, it is the programmer that decides on the data structure in the moment he saves it. So there is no place where it’s written how data is structured of what is the meaning of data. Even if you could easily recreate a db model starting from data relation using some automated tool, this could be something missing in traditional applications. Moreover, what if a bug occurs? We know there could be situations where something is wrong with the code. Traditional RDMS are scaffolded, so in case you switch some fields or you are wrong with field format, they protect you from inconsistency. In NoSQL case, there is no help from the db, because without any schema defined, there isn’t any information about data should be saved: nobody can say if data is wrong or not. The worst side effect is that process brings a lot of power and lot of responsibilities to the developer, that often doesn’t know all the processes or the structure.
Moreover, even in case you now know what is saved where, do you think you’ll remember everything the next month? and the next year? Not all projects are subject to continuous development, there could be business applications that stay as-is for a year, before we need to make some changes. Anyway in IT, a company often commissions a project to some supplier so this part will have to be taken into account to ensure an easy handover at the end of the project, maybe asking for an accurate documentation about data is structured and what each field\collection means. Last problem related with schema flexibility is that every member of the team could not work in the project for all its life, so turnover is critical on little teams where not all the members have full knowledge of data structure or there isn't an adequate documentation.
Saving a lot of nested data inside single documents, you may lost analytic features like “
COUNT” and so on. The bad thing is that this could not be a problem during the first application development, but someone could ask later for some report, so what to do in this case? It is hard to change data structure after database is filled, and doing that could have many unpredictable effects due to the leak of a well defined data structure. Analytics is a hard point for NoSQL.
Moreover, while there are many commercial tools you can connect to your traditional db to manage analytic part, there is a limited support for NoSQL systems.
Another solution that could be taken is to replicate some sort of "relationship" with unstructured data inside NoSQL db, maybe creating many collections and linking object one with others. If you are planning to follow this path to allow analytics reporting, keep in mind that this could slow down performances to be comparable with standard SQL systems. This could be acceptable when the part involved in this db is minimal and record count is limited. Anyway, even from my experience, a construct that allows to join data on NoSQL query, because there aren't well defined relationships behind are very limited and performance is not as good as we expect (i.e., at the time this article was written, MongoDB does not support inner join and can evolve only to table each time without creating temp tables).
We spoke about the lack of standardization on NoSQL query language and syntax. This problem may be reflected also in tools, together with youngness of most platforms. I’m speaking about tools for querying, but also for migrating data between database, managing backup, etc. Of course, most of NoSQL projects are growing, and we expect tools will grow with them, so this problem will be automatically solved in the near future.
The lack of standardization makes it hard for third party vendors build tools that can support multiple NoSQL solutions. Moreover young platforms mean less users, less customers and less time to develop mature tools.
It is important to specify how comparisons are made. First of all, I needed to place both solutions in the same condition. This means, for example to use the same hardware and have the same level of tuning. So I installed MongoDB (last version) and SQLServer Express on the same machine. Because we are not interested in the performance inside the database itself, I built my benchmark using C# code based on standard framework. Upon this two ways to save data, everything is shared (entities, logic, data generation) to ensure equity.
|Feature ||NoSQL ||SQL |
|CPU ||i7 ||i7 |
|RAM ||16GB ||16GB |
|Disk ||SSD ||SSD |
We will compare the list of all operations:
- mass insert
- transaction (or better, in NoSQL case, transaction simulation)
Mass Operation on a Single Entity
This benchmark consists of a big set of objects to insert to be performed in less time is possible. This test is replicated using a growing number of items to save to prove how performance scales in both systems. This benchmark measures execution time in ms and insists on a single table\collection.
|#rows ||NoSQL(ms) ||SQL(ms) |
|100 ||1 ||3 |
|1000 ||12 ||163 |
|10000 ||85 ||202 |
|100000 ||842 ||2182 |
|1000000 ||9179 ||19875 |
This benchmark is focused on query feature. We separate the following pattern:
- CASE 1 gets one entity using primary key: This pattern is used to fetch a single entity from db using is unique identifier.
- CASE 2 full scan with fail: When you are looking for a deleted element and database has to scan all index before reply “no”.
- CASE 3 Paged query: A complex query where you have some filters, one order condition, and you want to take just a page of data.
I created some benchmark simulating different ratio of patterns above. In the example, the first benchmark assumes 5% of queries of kind 1, 70% of kind 2 and 25% of kind 3. This benchmark measures the execution time in ms. This benchmark insists on a single table \ collection.
You can find all the code used to perform these tests on github.
| ||CASE 1 ||CASE 2 ||CASE 3 |
|Benchmark 1 ||5% ||70% ||25% |
|Benchmark 2 ||10% ||45% ||45% |
|Benchmark 3 ||15% ||8% ||77% |
First test is on a “small” set of data, about 2.500.000 rows.
|benchmark ||NoSQL(ms) ||SQL(ms) |
|Benchmark 1 ||47 ||244 |
|Benchmark 2 ||1 ||13 |
|Benchmark 3 ||1 ||12 |
Second test on a “bigger” data set, about 5M rows.
|benchmark ||NoSQL(ms) ||SQL(ms) |
|Benchmark 1 ||56 ||287 |
|Benchmark 2 ||2 ||17 |
|Benchmark 3 ||2 ||18 |
This benchmark highlights a big improvement of performance on query over index, but when MongoDB is used to read set of data, the gain is reduced and kept stable over data increase.
We know transaction on NoSQL world is mostly unsupported. We also understood that renouncing to transaction could benefit performance, a question is: how much do I gain from this? I built this benchmark to compare insert in transaction of one master row related with many children. Benchmark is focused on execution time, expressed in ms.
|#transactions ||SQL (ms) ||NoSQL (ms) |
|10 ||99 ||89 |
|100 ||1004 ||764 |
|1000 ||11025 ||7309 |
This benchmark is focused on analytics. Suppose we have a categorized master-detail data model, where you want to:
- export: whole join over all data tree
- report: sum all items in category for all category, i.e., give invoice amount for all customers
- KPI: sum all mater totals summing detail subtotals
On a base of 4M row after inner join:
|test ||SQL (ms) ||NoSQL (ms) |
|KPI ||1176 ||1403 |
|REPORT ||805 ||1363 |
Points of Interest
Revolutions are inevitable in tech context. A new technology comes and brings some revolutionary features, but often has to defeat developer preconceptions. Sometimes, they are misunderstood so their weaknesses come out after they are employed.
About innovation, we have "opposite" cluster of people:
- "enthusiastic" people that want to embrace change unconditionally and are ready to dispose everything done in the past to work with the latest technology;
- "conservatives" people that hate changes and prefer to stay with their habits, rejecting any new technology.
In real life, we have to stay in the middle, so it is important to know and understand what new technology could do for us and be ready to employ new technology before they are needed on projects. "Trial on the job" is a bad habit that can lead to bad results.
The same principle is applied to the NoSQL technologies. Because we looked at NoSQL some time ago, now we know the pro and cons, so we can take advantage from such kind of tools. When we analyze that technology, we cannot stop looking at the things we miss from our traditional habits like transactions, schemas, and standards. We need to study and familiarize with these technology that until some years ago was young and new, but now are a concrete option. Study, learn, understand, employ: this is the nature of progress.
When I Should Use a NoSQL Db?
Why NoSQL would be better than using a SQL Database? After you have read this article, I'm sure you'll understand that NoSQL isn't a replacement for SQL database, but just a different storage system with different features and is useful in some particular fields. So, the answer cannot be different from "it depends". Because it really depends on many characteristics of the project.
To be honest and cautious, NoSQL is the best solution when all the following statements are true:
- when your project needs to scale, or it could in the future.
- when you have to work on big data, or your data will be big in the closer future
- when analytics component in application is simple, or not so important
- when your application needs to fit database purpose (i.e., you are saving data in graph and database does it)
In some cases, NoSQL could be a good alternative, but is not essential to build a durable infrastructure. Of course, if in your application a NoSQL system covers 99% of all needs, there is no reason to couple it with a RDBMS. But if you need relations, transaction, and other features of standard RDBMS, maybe it's better to employ them as main storage system and use NoSQL to cover only the critical parts (maybe derived from size of data).
How Much Better are Performances in the Case Above?
It depends on the specific use case. On the one side, we have lots of benefits on large table or massive usage, but on the other side, we have some performance loss using lookup instead of join on small dataset. A realistic estimation, if there is the basis to employ a NoSQL db, we can improve performance from a 10 to 100 factor. Of course, this estimation takes into account all aspects of an application and it is related to the final user experience. That means you could measure better speedup on db layer, but the final user experience is distorted by many factors that reduce the gap (cache, network delay, page rendering). Just to explain what I’m telling, take in example the limit case when there is a page that makes a query and returns data. Assume for this page 500 ms for getting result using traditional database, 50ms using NoSQL,and 200ms for rendering the page and 1 second for transfer data over internet. Improvement of performance on db layer is -90%, but for the final user, there is only 450ms gained on 1700, so only 26%. With this example, I would explain that it is hard to measure improvement on complex system and in many cases, NoSQL is not enough to resolve performance issue. To be more direct, if you think of resolving performance issue due to bad design in an application moving to NoSQL, you are not on the right path.
But the big question is: what do I lose to get this performance? Because in some cases, it is not possible to renounce some feature like transaction or relationship. This is very important to understand before the move.
NoSQL Systems are Mature to be Employed on Production Environment?
Depends mainly on your needs, or better on the project requirements. We can state that NoSQL is certainly mature enough to use. So, if you need, you can use it without any fear. But not all applications need to work with big data or to scale that massively. Most of Saas product does, also many critical applications inside the enterprise context does, but most of the applications nowadays are still very simple. In my experience, it’s hard to find tables with more than 100 000 rows in a database. Think about your db, exclude the 2-3 bigger tables you have on it and look at the row count. How big are they? Usual db structure on db application counts lot of “little” table (less than 100 000 rows) related. For such kind of application, a traditional RDBMS is sufficient and it will be forever. What is important, instead of starting to employ it, is to understand benefit and development pattern to be ready when they will be needed in your case.
Is SQL Obsolete?
When men invented airplanes, did cars become obsolete? No, of course. Even if airplanes are faster than cars, they simply are two different systems to move people, with different characteristics. Based on the kind of travel you are starting, the time you have to spend on the travel and the budget, you'll decide what’s the best suitable alternative for you. In the same way, NoSQL coming don’t make SQL obsolete. They are simply two different ways to store data, with different characteristics. You’ll decide what’s the best for you based on your need.
There are problems SQL isn't suitable for, so you don't have to start big data project with it. It will be like trying to reach an island with a car instead of by plane. But SQL has still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. It will be like trying to go to buy the milk using a plane. NoSQL databases aren't a replacement for SQL, but they are an alternative.
Is the Market Ready to NoSQL?
The key point to answer this question is closer to the experience that developers achieve. Most database programmers were trained for a year to think of data in a relational way. How could they change to thinking differently in such a short time? It is not so easy, especially when developers have to work in many projects together and some of them are SQL and others NoSQL. The temptation of reproducing the same pattern on SQL to NoSQL system is hard to defeat and often leads to bad results.
So actually, there is much more know-how for SQL around, there are more developers well skilled in RDBMS than in NoSQL. Meanwhile, there are DBAs that spend most of their time into focusing on relational databases, we can't expect to find the same on technology born less than ten years ago. SQL is reached at school and in universities, NoSQL is starting to be.
After this first point, a secondary one is that because these systems are newer, there are less development tools, or they are not so advanced like others, but I’m sure this is not a real problem. There are some “enterprise ready” solutions that come out with tools that manage all basic needs and we all hope that these tools will be grown with the growing of their platform.
What is the Best Solution?
There is no best solution that covers any case. The answer is simple and still the same: "it depends". With this article, I hope to have given you an overview of all capabilities of such systems and some basics to understand when they could be useful.
- 11th December, 2016: Added charts to compare performances
- 6th December, 2016: First version of this article
I'm senior developer and architect specialized on portals, intranets, and others business applications. Particularly interested in Agile developing and open source projects, I worked on some of this as project manager and developer.
My programming experience include:
Frameworks \Technlogies: .NET Framework (C# & VB), ASP.NET, Java, php
Databases: MSSQL, ORACLE, MYSQL, Postgres