NoSQL

March 15th, 2012

It’s new, cool (anti-establishment) and the big guys are using it! It’s built for performance, reliability and scalability, and to top it off, it’s free. Why shouldn’t you build your application/business with it?

Before jumping in feet first, however, I think it is necessary to slow down for a minute and fully understand the advantages of a NoSQL solution along with the tradeoffs you MAY be forced to make.

First, as I stated, NoSQL is increasingly being hyped as a next-generation database that fixes all the performance, scalability, and complexity problems you might encounter with relational databases. However, while NoSQL delivers these powerful capabilities, it requires a number of very serious compromises that can be detrimental to a business. To obtain high performance and scalability, NoSQL implementations remove what some may consider unwanted and unnecessary functionality of the relational database. The problem is that removing this functionality may come at a high cost for many normal business requirements.

For example, a key premise of most NoSQL databases is to give up atomicity, consistency, isolation, and durability (ACID) in favor of Basically Available data with Soft state that becomes Eventually consistent (BASE). This essentially means that when you ask a question, if you wait long enough you will eventually get a complete and accurate answer, but in the (brief) meantime you may get results that are only partially correct. This may be acceptable for a 140-character Tweet, but it is certainly unacceptable for a series of ATM transactions.
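To make the trade-off concrete, here is a toy sketch in Python of an eventually consistent store. It does not model any particular NoSQL product. The class names, the two-replica setup, and the explicit sync() step are all assumptions made for illustration. A write lands on one replica immediately and reaches the other only later, so a read in between can return a stale balance.

```python
class Replica:
    """One copy of the data; it sees a write only once that write is delivered."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Hypothetical store: replica 0 takes writes, the others catch up on sync()."""

    def __init__(self, n_replicas=2):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) tuples not yet delivered

    def write(self, key, value):
        # Apply the write to replica 0 right away and queue it for the rest.
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # A read is answered by whichever replica the client happens to reach.
        return self.replicas[replica].data.get(key)

    def sync(self):
        # Deliver queued writes in order; afterwards all replicas agree.
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.sync()                               # both replicas now hold 100

store.write("balance", 40)                 # a withdrawal of 60, seen by replica 0 only
stale = store.read("balance", replica=1)   # replica 1 still answers with the old value
store.sync()
fresh = store.read("balance", replica=1)   # after replication the answer is correct

print(stale, fresh)
```

For a Tweet, the stale read is harmless. For an ATM, a second withdrawal approved against the stale balance is exactly the kind of error that ACID transactions exist to prevent.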

Another example relates to aggregates. Some NoSQL implementations offer only limited ability to perform SUM(), MAX(), AVG(), or GROUP BY operations. NoSQL implementations are architected to provide highly efficient CRUD operations against objects, documents, or graphs. This makes normal day-to-day operations very scalable for end users of highly specialized applications. But if management suddenly requests the total number of orders placed by customers referred by partners in Colorado to evaluate tax liability, NoSQL may not be able to provide the answer easily.
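By way of contrast, here is roughly what that management request looks like against a relational schema. This is a minimal sketch using Python's built-in sqlite3 module. The partners, customers, and orders tables, their columns, and the sample rows are all invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical three-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partners  (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                        partner_id INTEGER REFERENCES partners(id));
CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id),
                        total REAL);

INSERT INTO partners  VALUES (1, 'Peak Referrals', 'CO'), (2, 'Bay Leads', 'CA');
INSERT INTO customers VALUES (1, 'Alice', 1), (2, 'Bob', 1), (3, 'Carol', 2);
INSERT INTO orders    VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 100.0), (4, 3, 75.0);
""")

# Orders placed by customers who were referred by a Colorado partner.
order_count, revenue = conn.execute("""
    SELECT COUNT(*), SUM(o.total)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN partners  p ON p.id = c.partner_id
    WHERE p.state = 'CO'
""").fetchone()

print(order_count, revenue)
conn.close()
```

The whole question is one ad hoc query over two joins. In a store organized around single documents or keys, the same answer typically means either restructuring the data in advance or pulling the records into application code and aggregating them there.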

There are obviously many advantages to using a NoSQL solution; otherwise the big players wouldn’t have spent the time, effort, and resources developing the specialized solutions they felt they needed to solve their problems. However, as someone with extensive experience in database design and performance issues, I also know that many applications suffer from poorly designed databases and, conversely, that there are numerous ways to improve the performance of those databases.

I therefore think it is important to look at the big picture, fully understand the applications that NoSQL and SQL databases are each best suited for, and treat each as just one tool in your toolbox. When it is time to build your application, you want to reach into that toolbox and grab the best tool for the job, not NoSQL merely because it is new and cool or, for that matter, SQL because that is what you have always used. You also have to understand your application, and recognize that the current flavors of NoSQL databases are designed to solve specialized, extremely large data problems and are not a one-size-fits-all solution. Finally, it is fairly easy to obtain great performance from relational databases, and unless you collect massive amounts of data like Google and Facebook, a relational database may still be your best choice.

Database Clustering

February 3rd, 2012

As a startup planning a site architecture, you essentially have two options:

1. Initially purchase the hardware to deliver the performance you want for the expected traffic/business requirements

2. Develop an architecture that allows you to easily add capacity as the business grows – initially conserving cash and only making expenditures when needed and when revenues should be able to support the growth.

Adding capacity on the front end is pretty easy, usually by scaling out with commodity hardware. If the architecture is designed for it, this can be done quickly.

The more difficult problem is the back-end database. The most common way to scale the database has been to scale up: increase the number and speed of the processors and add memory. This can get expensive very quickly, as these powerful servers are not cheap. In addition, for redundancy, you will also need a second failover box. Yet when your business gets to this point, it is also a good problem to have, as it means the business is doing something right.

Over the past several years, database clustering has improved immensely, not only in reliability and management but also in performance. MS SQL now provides for 16-node clusters, and MySQL (5.1) allows for up to 48 data nodes. MySQL also provides self-healing of nodes and the ability to add nodes when requirements dictate.

The use of clustering (including geographic replication and multi-site clustering), particularly with MySQL, provides increased flexibility in meeting growing business requirements and greater fault tolerance, all with the added benefit of using commodity hardware. It is something that needs to be considered very seriously when designing the initial architecture for a startup.

As a former consultant with Oracle and a heavy user of MySQL, I am extremely interested in how Oracle will handle MySQL following its recent purchase of Sun.

Obviously, Oracle would like to develop some type of revenue stream from MySQL, but I think that is going to be difficult. The best that Oracle may be able to do is to continue to support the product, but in a manner that provides a clear upgrade path to Oracle and protects current revenue as much as possible.

First and foremost, MySQL and InnoDB are now part of the same company. Oracle has done a nice job of maintaining InnoDB, so I think that this can only be beneficial to MySQL.

Another thing that Oracle has to recognize is that MySQL has been instrumental in spreading database technology to the masses. You can hardly sign up for a hosting account without also gaining access to MySQL. I couldn’t even hazard a guess at how many sites use MySQL. Many, if not most, of these installations belong to individuals and small businesses. They chose MySQL not only because it is easy, but also because they are cost sensitive. Attempting to charge licensing fees now for something that has been free until now would result in a large backlash and is probably unworkable.

In addition, just because Oracle now owns MySQL doesn’t necessarily mean it owns the intellectual power that created it. In fact, many of the MySQL developers had already left Sun, and several forks of MySQL already exist, including Drizzle, Percona Server, and Monty Widenius’s MariaDB. Postgres could also become much more popular as part of a backlash. Any attempt by Oracle to limit access to MySQL will only result in developers rallying and creating something new and even better.

Therefore, if I were Larry, I would first develop a clear upgrade path from MySQL to Oracle. When should you upgrade, and under what circumstances? What are the clear benefits? Make the low-end costs reasonable. Then I would attempt to slow down and limit the innovation and addition of features to MySQL. Certainly, MySQL doesn’t scale as well as Oracle, but the addition of features such as Replication and Clustering means MySQL has to be taken seriously and considered for many more scenarios than before.

In any case, it will certainly be interesting to watch. As for me, I personally think that MySQL will continue to be available and will continue to get stronger. I also don’t have any reservations in continuing to use it, as long as I recognize its limitations and fully understand the environment in which it will be used.