Cloud computing democratizes big data – any enterprise can now work with unstructured data at a huge scale.

At first glance, it isn’t obvious why the unstructured data methods of the new big data world are even necessary. Even if new methods bring new business value, why not stay on-premise? Why bother with cloud databases?

The big data label

Big data is one of those new, shiny labels, like SDN, DevOps and cloud computing, that is both hard to ignore and hard to understand. There is no single “big data” type – it is a collective label stuck on unstructured data, the technology stack it inhabits, and the new business processes that are growing up around it.

For instance, the discipline of big data analytics is about getting business value out of large data sets. Data scientists work with resources and processes to turn data into useful information. The classic RDBMS (relational database management system) can handle a lot of data, and has been doing so for decades. Why can’t a data scientist stick with structured data in an RDBMS? Which is better – RDBMS or NoSQL?

Structured or unstructured data?

The technical stack an enterprise chooses is dictated by the type of data it needs to store, and the type of data is dictated by business requirements.

The RDBMS is good for managing structured, highly relational data and will continue to be the software of choice for many requirements.

For the growing volume of unstructured data produced by social media, sensor networks and federated analytics, and for constantly changing data that must be replicated to other operating sites or mobile workers, NoSQL technologies are a better fit. Unstructured data sets can run to terabytes or even petabytes in size.
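The difference between the two kinds of data can be sketched in a few lines of Python. This is only an illustration, not a recommendation of any product: the `customers` table, the sample records and the `find` helper are all made up for the example, and the "document store" here is just a list of JSON strings standing in for a real NoSQL database.

```python
import json
import sqlite3

# Structured data: every row must fit a schema declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("London",)
).fetchall()

# Unstructured (schemaless) data: each document carries its own shape.
# A tweet and a sensor reading share no columns, yet sit side by side.
documents = [
    {"type": "tweet", "user": "ada", "text": "hello", "hashtags": ["cloud"]},
    {"type": "sensor", "device": "t-17", "readings": [21.5, 21.7]},
]
# Serialise to JSON text, roughly as a document store might keep it.
stored = [json.dumps(doc) for doc in documents]


def find(docs, **criteria):
    """Return documents whose fields match every given criterion.

    Hypothetical helper: a missing field simply fails to match,
    so no schema is needed.
    """
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


sensors = find([json.loads(s) for s in stored], type="sensor")
```

The relational half rejects anything that does not fit the declared columns, which is exactly what makes it clean and easy to query. The document half accepts whatever arrives and leaves it to the reader of the data to cope with missing fields.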

On-premise relational technology stack

The RDBMS is the type of storage software that has been dominant for decades. All data in an RDBMS is structured – clean, ordered and easy to understand. That makes it good for some kinds of work but poor at others. RDBMS products are also well known; a generation of DB administrators is experienced in RDBMS care and feeding.

One big problem with an RDBMS appears when it gets too busy. When the data starts filling up the disk, the queries thrash the CPU and the result sets choke the RAM, more resources are required to keep the DBMS working. There is only one way to scale, and that’s “up.” Scaling out doesn’t work because a relational database service only has one front door. And the only way to scale up is to buy a bigger box.

Scaling up does not cure RDBMS problems. Even the biggest computer, with its huge IT budget-gobbling price tag, only solves the resource problem. The IT department still has to solve other problems like HA fail-over, disaster recovery and storing data where it’s needed.
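To make the scale-up versus scale-out distinction concrete, here is a minimal sketch of the scale-out idea: instead of one bigger box, records are spread across several smaller ones by hashing each key. The node names and the `node_for` and `place` functions are hypothetical, and the simple modulo scheme is an assumption chosen for brevity; production systems typically use more careful schemes such as consistent hashing.

```python
import hashlib

# Hypothetical node names; in practice these would be separate servers.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key, nodes):
    """Pick the node responsible for a key by hashing the key.

    A stable hash (SHA-256) is used rather than Python's built-in hash(),
    which is randomised per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
placement = place(keys, NODES)
for node, held in placement.items():
    print(node, len(held))
```

Each node ends up holding a share of the keys, so storage and query load are divided rather than piled onto one machine. The weakness of this naive version is that changing the number of nodes remaps most keys, which is one reason real systems use more elaborate partitioning.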

If the infrastructure is on-premise, there are traditional problems to overcome. Managing an on-premise RDBMS is expensive and time-consuming. An on-premise MySQL, Oracle or SQL Server database service is propped up by an overloaded IT department with a queue of work and inflexible hardware. If an enterprise rents Microsoft Azure Database, Google Cloud SQL or Amazon RDS, these infrastructure headaches go away.

By Nick Hardiman