RavenDB and MongoDB: Not easily interchangeable (at least not always)
On a recent internal project, we experimented with various NoSQL database back-ends. The project had used MongoDB since its inception. However, due to some memory issues, we wanted to evaluate other document database back-ends. In the end, we selected RavenDB for evaluation.
First, I want to mention the things I liked about using RavenDB:
- There is a very complete C# client API.
- It is very easy to set up a new project on RavenDB. You can start with an embedded database and easily transition into a server-hosted mode.
- No additional mark-up is required in your documents (i.e. no attributes are needed to be able to locate documents).
- Auto-indexing. RavenDB will automatically create indices as queries are executed. If a query is executed enough times, those indices are promoted to be permanent (auto-generated) indices. In addition to the automatic performance tweak this provides, it also grants you some insight into which indices you should consider adding yourself.
The issues I uncovered using RavenDB fell into four categories:
- Incomplete IQueryable implementation
- “Safe by Default” design
- Eventual consistency
- Bounded number of requests
Incomplete IQueryable implementation
Before my evaluation of RavenDB began, an attempt had been made to alter the code to work with RavenDB. As part of the conversion, the IQueryable interface into RavenDB was used heavily for queries. These queries consisted both of simple SELECT…WHERE queries and of queries that leaned on the aggregation facilities of the IQueryable interface. It was the latter style of query that ran into problems.
RavenDB does not fully implement all of IQueryable. So, even though the code compiled and everything seemed as though it would work, when actually executed, RavenDB threw exceptions when using aggregation. The exceptions did clearly state that those aggregation functions were not supported. However, you would not necessarily expect that behavior from an IQueryable object.
I actually defended RavenDB on this point in discussions with some other developers. I could see a case to be made for wanting to have some of the built-in .NET framework support for the IQueryable interface without needing to implement the rather exhaustive set of methods declared on that interface. That defense, however, was weakened by the next issue I came across.
“Safe by Default” design
When you read through the RavenDB documentation, one of the phrases you come across repeatedly is “safe by default”. What that means is that RavenDB is designed to stop you from performing operations that would take up “too much” time or return “too many” results.
This is actually a really useful feature…as long as you are designing for it. Having IQueryable methods implemented, but in a way that violates the normal conventions for those methods, leads to results that you wouldn’t expect from reading the application code directly. Our code, for instance, assumed that it would be iterating over an entire result set, whereas RavenDB caps the number of results a query returns.
To further complicate matters, RavenDB did not signal in any way that the result size cap had been applied. With the other “safe by default” features, RavenDB notifies you (through exceptions) when you do something “unsafe”. In this case, there was no notification, which led to some very confusing results until we noticed the note about result size in the documentation.
Changing the code to use the RavenDB statistics object and paging over the collection corrected this. However, this feature adds to the list of oddities of using the RavenDB IQueryable interface.
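To make the paging workaround concrete, here is a minimal Python sketch. It does not use the RavenDB client API: `capped_query`, `fetch_all`, and the cap of 128 are hypothetical stand-ins chosen for illustration. The total count returned next to each page plays the role of the statistics object.

```python
from typing import List, Tuple

PAGE_CAP = 128  # hypothetical per-query cap, chosen only for this sketch


def capped_query(store: List[int], skip: int, take: int) -> Tuple[List[int], int]:
    """Hypothetical stand-in for a capped query.

    Returns at most PAGE_CAP items plus the total number of matches.
    """
    take = min(take, PAGE_CAP)  # silently truncated: no exception is raised
    return store[skip:skip + take], len(store)


def fetch_all(store: List[int]) -> List[int]:
    """Page through the whole result set using the reported total."""
    results: List[int] = []
    total = None
    while total is None or len(results) < total:
        page, total = capped_query(store, skip=len(results), take=PAGE_CAP)
        if not page:  # stop if the data shrank while we were paging
            break
        results.extend(page)
    return results


documents = list(range(300))

# Asking for 1000 results quietly yields only the first capped page.
first_page, total = capped_query(documents, skip=0, take=1000)

# Paging against the reported total recovers the full result set.
everything = fetch_all(documents)
```

The point of the sketch is that the caller has to compare what it received against the reported total, because the truncation itself is silent.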
Eventual consistency
The concept of “eventual consistency” was the biggest obstacle to switching to RavenDB. Operations affecting documents in RavenDB are ACID. However, the indexing mechanism is designed around a BASE architecture. There is a write-up about RavenDB’s transaction support available as part of the client API documentation.
In short, the eventual consistency concept means the following:
- Modifications to documents are contained within an atomic transaction
- Reading a document by its ID will return the most recent version
- Other queries are executed against indices that are updated asynchronously
- Therefore, queries may return stale results
RavenDB allows you to get statistics from a query that include, among other items, whether the results are stale. That’s great if your application can be designed such that using stale data is OK.
For our application, however, stale results were a big problem. In many places, our application depended upon having up-to-date information. That meant that we had to litter our code with calls to ensure that the indices had non-stale results. By doing so, we were trading away some of the performance benefits of using RavenDB.
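The trade-off can be illustrated with a small Python simulation. This is only a sketch under my own assumptions, not RavenDB code: `TinyStore`, `run_indexer`, and the `wait_for_non_stale` flag are hypothetical names. Documents are written synchronously, the index catches up later, and a query can either accept stale results or pay for the index to catch up first.

```python
from typing import Dict, List, Tuple


class TinyStore:
    """Hypothetical in-memory store with an index that lags behind writes."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}   # updated synchronously on every write
        self.index: Dict[str, str] = {}   # doc id -> indexed "city" value
        self.pending: List[str] = []      # doc ids written but not yet indexed

    def put(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc
        self.pending.append(doc_id)       # indexing happens later

    def load(self, doc_id: str) -> dict:
        """Load by ID: always sees the most recent write."""
        return self.docs[doc_id]

    def run_indexer(self) -> None:
        """Stands in for the background indexing work."""
        for doc_id in self.pending:
            self.index[doc_id] = self.docs[doc_id]["city"]
        self.pending.clear()

    def query_city(self, city: str, wait_for_non_stale: bool = False) -> Tuple[List[str], bool]:
        """Query the index; returns matching ids and a staleness flag."""
        if wait_for_non_stale:
            self.run_indexer()            # the caller pays for the catch-up
        is_stale = bool(self.pending)
        ids = sorted(i for i, c in self.index.items() if c == city)
        return ids, is_stale


store = TinyStore()
store.put("users/1", {"city": "Oslo"})

by_id = store.load("users/1")                      # current immediately
stale_ids, stale = store.query_city("Oslo")        # index has not caught up
fresh_ids, fresh_stale = store.query_city("Oslo", wait_for_non_stale=True)
```

In the simulation the waiting query does the indexing work itself. In a real system it would block until the indexer caught up, which is where the performance cost comes from.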
Bounded number of requests
The document sessions in RavenDB are designed to have limited use. Beyond the session time-outs common to most databases, the document sessions are limited (by default) to a set number of network operations. That means performing query or update operations in a loop becomes more complicated. As with the other oddities, you can work around this too (by tracking the number of queries for a session, for instance), but it adds yet more to the list of items you need to plan to handle when using RavenDB.
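A rough Python sketch of the request-tracking workaround follows. None of these names come from the RavenDB client: `BudgetedSession`, `RequestLimitExceeded`, `load_in_loop`, and the budget of 3 are hypothetical and exist only to show the shape of the problem. A naive loop exhausts the budget, while a loop that tracks the request count can start a fresh session before the limit is hit.

```python
from typing import Dict, Iterable, List, Tuple


class RequestLimitExceeded(Exception):
    """Raised when a session exceeds its request budget."""


class BudgetedSession:
    """Hypothetical session that allows a fixed number of requests."""

    def __init__(self, docs: Dict[str, dict], max_requests: int) -> None:
        self._docs = docs
        self.max_requests = max_requests
        self.requests_made = 0

    def load(self, doc_id: str) -> dict:
        if self.requests_made >= self.max_requests:
            raise RequestLimitExceeded(f"budget of {self.max_requests} used up")
        self.requests_made += 1           # every load counts as one request
        return self._docs[doc_id]


def load_in_loop(docs: Dict[str, dict], ids: Iterable[str],
                 max_requests: int) -> Tuple[List[dict], int]:
    """Load each id, opening a new session whenever the budget is spent."""
    session = BudgetedSession(docs, max_requests)
    sessions_opened = 1
    loaded: List[dict] = []
    for doc_id in ids:
        if session.requests_made >= session.max_requests:
            session = BudgetedSession(docs, max_requests)
            sessions_opened += 1
        loaded.append(session.load(doc_id))
    return loaded, sessions_opened


docs = {f"orders/{n}": {"total": n} for n in range(7)}
ids = list(docs)

# Naive loop: one session, one request per document, fails part-way through.
naive = BudgetedSession(docs, max_requests=3)
try:
    for doc_id in ids:
        naive.load(doc_id)
    naive_failed = False
except RequestLimitExceeded:
    naive_failed = True

# Tracking the request count lets the loop run to completion.
loaded, sessions_opened = load_in_loop(docs, ids, max_requests=3)
```

Opening more sessions is only one way to plan around the limit. Where the client offers a way to fetch several documents in a single request, restructuring the loop around that is likely the cleaner fix.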
So, after evaluating RavenDB and uncovering some of its oddities, would I use RavenDB again? Absolutely.
However, I would be judicious as to when and how it should be used and would plan the functionality of my application around its idiosyncrasies. RavenDB is opinionated software. If you are going to use it, you need to fully take into account the opinions baked into the core of the framework.
As a note, the experiences documented relate to version 2.5 of RavenDB. Version 3.0 is nearing its release date, but is not officially released yet (and thus was not used for the evaluation).