Some of the attendees who stopped by our booth asked whether their scale-out databases need to be backed up, so I reflected on this conundrum for a couple of days. After all, these databases keep multiple copies of the data, so why would they need a backup? I know I have a good answer, but how do I phrase it so that the need for backup is self-explanatory? Does the built-in replication of these products alone ensure business continuity in all failure scenarios? To answer this question, let’s look back at the evolution of computer systems.
- Before the advent of disk RAID, systems relied on a single disk with no redundancy whatsoever. RAID offered protection against disk failures by maintaining redundant data, and since the mid-1980s it has been the industry standard defense against them. But having additional copies of the data in the disk controller did not eliminate the need for regular backups of your applications.
- To improve redundancy at the system level, clustering technology ensured application availability in case of system failures. However, clustering did not eliminate the need for regular backups either.
- To protect against site failures, storage vendors implemented synchronous and asynchronous replication. People still perform backups.
So how do these features stack up against the scale-out databases we have all come to love so dearly lately?
Features | Traditional IT | Scale-out databases/file systems |
---|---|---|
Parity-based protection | RAID-5/6 | Erasure coding |
Clusters | Operating-system-based clustering | Clustering functionality built into the application |
Site-wide replication | Synchronous replication of storage | Replication between racks in a data center (caveat: not all databases understand the data center topology; Cassandra, for example, supports different snitches that enable replication at various levels) |
Geo-replication | Asynchronous replication to two or more geographical locations | Replication across data centers |
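To make the parity row concrete, here is a minimal Python sketch of single-parity protection in the style of RAID-5, assuming equal-length blocks and one parity block per stripe. The function names and sample data are hypothetical, and the sketch deliberately ignores RAID-6 and erasure coding, which add further parity (for example Reed-Solomon codes) so that more than one loss can be survived. The second half shows the catch: parity is recomputed on every write, so a bad write is "protected" just as faithfully as a good one.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length blocks byte by byte (single parity, RAID-5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list, parity: bytes) -> bytes:
    """Recover one lost data block: XOR the parity with all surviving blocks."""
    return xor_blocks(surviving + [parity])


# Three hypothetical data blocks striped across three disks; parity lives on a fourth.
data = [b"ACCT", b"BAL=", b"0100"]
parity = xor_blocks(data)

# Simulate losing disk 1 and rebuilding its block from the others plus parity.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered)  # b'BAL='

# An application bug overwrites block 1; parity is updated along with it.
data[1] = b"????"
parity = xor_blocks(data)
recovered_after_bug = rebuild_missing([data[0], data[2]], parity)
print(recovered_after_bug)  # b'????'
```

Redundancy rebuilds whatever was last written, which is exactly why it cannot stand in for a point-in-time copy.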
This new generation of scale-out databases and file systems provides the same level of availability as traditional systems at a fraction of the cost. However, backup and snapshot technologies provide point-in-time copies, which serve a different set of use cases. To start with, backups and snapshots protect against data corruption and unauthorized modification of data. Replication cannot do this on its own: a corrupting write, an accidental delete, or a malicious change is copied to every replica just as faithfully as a good write.
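The point about replication and point-in-time copies can be illustrated with a toy model. The following is a minimal Python sketch, not how any real database works: `ReplicatedStore` and its methods are hypothetical, replication is modeled as applying each write to every in-process replica, and a "snapshot" is simply a deep copy kept outside the replication path. It shows that losing a node is survivable, while an accidental delete reaches every replica and only the point-in-time copy can undo it.

```python
from __future__ import annotations

import copy


class ReplicatedStore:
    """Hypothetical toy key-value store that applies every write to all replicas."""

    def __init__(self, replica_count: int = 3) -> None:
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]

    def put(self, key: str, value: str) -> None:
        # Replication copies every write, good or bad, to every replica.
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key: str) -> None:
        for replica in self.replicas:
            replica.pop(key, None)

    def get(self, key: str, replica: int = 0) -> str | None:
        return self.replicas[replica].get(key)

    def lose_node(self, index: int) -> None:
        # Simulate a node failure: that replica's copy of the data is gone.
        del self.replicas[index]

    def snapshot(self) -> dict[str, str]:
        # A point-in-time copy held outside the replication path.
        return copy.deepcopy(self.replicas[0])

    def restore(self, snap: dict[str, str]) -> None:
        # Overwrite every remaining replica with the saved point-in-time state.
        self.replicas = [copy.deepcopy(snap) for _ in self.replicas]


store = ReplicatedStore()
store.put("order:42", "shipped")

# Losing a node is survivable: the surviving replicas still hold the record.
store.lose_node(0)
print(store.get("order:42"))  # shipped

backup = store.snapshot()

# An accidental delete reaches every surviving replica.
store.delete("order:42")
print([r.get("order:42") for r in store.replicas])  # [None, None]

# Only the point-in-time copy brings the record back.
store.restore(backup)
print(store.get("order:42"))  # shipped
```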
Snapshots also enable other use cases, such as forensics and business intelligence. So do you still need a backup? The answer is: it depends on your IT risk assessment and regulatory requirements. If you already back up your existing applications, you may as well continue to do so for these new applications too.