Database Clustering

February 3rd, 2012

As a startup planning a site architecture, you essentially have two options:

1. Initially purchase the hardware to deliver the performance you want for the expected traffic/business requirements

2. Develop an architecture that allows you to easily add capacity as the business grows – initially conserving cash and only making expenditures when needed and when revenues should be able to support the growth.

Adding capacity on the front-end is pretty easy, usually by scaling out with commodity hardware. If done correctly, this can be done easily and quickly.

The more difficult problem is with the back-end database. The most common form of scaling when needed for the database has been to scale up – increase the number and speed of the processors and add memory. This can get expensive very quickly as these powerful servers are not cheap. In addition, for redundancy, you will also need a second failover box. Yet, when your business gets to this point, it is also a good problem to have as it means that the business is doing something right.

Over the past several years database clustering has improved immensely both from the standpoint of reliability and management, but also with respect to performance. MS SQL now provides for 16 node clusters and MySQL (5.1) allows for up to 48 data nodes. MySQL provides the ability for self-healing of nodes and the ability to add nodes when requirements dictate.

The use of clustering (including geographic replication and multi-site clustering), particularly with MySQL, provides increased flexibility in meeting increased business requirements and greater fault tolerance, all with the added benefit of using commodity hardware and is something that needs to be considered very seriously when designing the initial architecture for a startup.