Start Small, Grow Tall: Debunking Three Big Data Myths
Harness Big Data, the common refrain in many business and technology publications goes, and you shall be able to better acquire and serve customers, optimize all sorts of business processes, and even open up entirely new markets with data-driven products.
Sometimes customers I speak with retort: “I don’t think my company can jump into this yet.” They then rattle off three reasons why their organization is not ready for Big Data.
Thanks to the emergence of an open-source framework called Hadoop, though, all three objections are easily refuted. Today, notably with the advent of Hadoop 2.0, there is no excuse for the average enterprise to delay the journey of turning the wealth of information surrounding their business into additional profit.
Let’s debunk those three common myths one by one.
Myth No. 1: We Don’t Have ‘Big’ Data, Only ‘Small’ Data
Our experience: Most enterprises have a data management problem.
While a typical enterprise data warehouse may be in the 10 to 50 terabyte range, the objection that “our organization only has small amounts of data, not petabytes” overlooks a few key points.
First, I have yet to meet a large organization that does not have a data management problem, even at terabyte scale. The problem might be that existing enterprise data warehouses are reaching capacity limits, that older data which has been archived sits “in the dark,” meaning it cannot be queried for insights, or that a lot of potentially valuable data is simply never collected, or is discarded too soon. Considered from this perspective, most organizations will quickly admit that they are not extracting the full value from even the few terabytes of data they do collect, whether it comes from point-of-sale transactions, online interactions, or sensor-based observations.
Second, the average Fortune 1000 enterprise does in fact have over a petabyte of data in aggregate. It is just that the data is scattered around the enterprise in many fragmented silos. Not everyone reaches the data management sophistication or scale of a Shell, with its 40 petabytes across the organization.
Third and most importantly, the main reason companies have small data, not big data, lies in the high cost of legacy solutions. Companies run out of budget before they run out of valuable data or data sources. For instance, the fully loaded cost of a terabyte of data in a traditional Enterprise Data Warehouse (EDW) easily exceeds $100,000. At that price, many organizations simply did not gather or keep much data, because the basic architecture already cost so much.
Myth No. 2: We Don’t Have the Big Budgets Required for Big Data
Our experience: Hadoop provides at least a 10x price performance improvement.
At the core of this myth are scale-up in-memory computing solutions promoted by vendors like Oracle and SAP. Double-digit million dollar marketing budgets have convinced some customers that SAP’s HANA is the go-to solution for Big Data. In reality, HANA is a point solution for a specific problem, namely speeding up analyses on structured data coming out of an SAP system. Solutions like HANA easily cost upwards of $500,000 per terabyte to deploy.
Using a Hadoop solution, though, companies of any size can quickly build an inexpensive landing zone for all their data that scales out as the data grows. A modest budget of $50,000 should suffice for experimenting with Hadoop, and a few hundred thousand dollars are usually enough for a solid full-scale production deployment. Spend a million dollars and you are well on your way toward a petabyte-scale Hadoop cluster. (A petabyte is 1,024 terabytes.)
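The figures above can be turned into a quick back-of-the-envelope calculation. This sketch uses only the dollar amounts cited in this article (a $100,000-per-terabyte EDW versus a $1 million petabyte-scale Hadoop cluster); the arithmetic itself is illustrative, and real deployments will vary.

```python
# Back-of-the-envelope price/performance comparison using the figures
# cited in the article. Actual costs vary widely by deployment.

EDW_COST_PER_TB = 100_000        # fully loaded traditional EDW cost per terabyte
HADOOP_CLUSTER_COST = 1_000_000  # budget cited for a petabyte-scale Hadoop cluster
TB_PER_PB = 1024                 # a petabyte is 1,024 terabytes

hadoop_cost_per_tb = HADOOP_CLUSTER_COST / TB_PER_PB  # roughly $977 per terabyte
improvement = EDW_COST_PER_TB / hadoop_cost_per_tb    # roughly 100x

print(f"Hadoop: ~${hadoop_cost_per_tb:,.0f}/TB vs. EDW: ${EDW_COST_PER_TB:,}/TB")
print(f"Implied price/performance improvement: ~{improvement:.0f}x")
```

On these numbers, the improvement works out to roughly 100x, which is comfortably above the “at least 10x” claim.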
Myth No. 3: We Don’t Have the Data Scientists Required for Big Data
Our experience: You don’t need data scientists to start realizing substantial value.
Data scientists are like unicorns: rare, highly coveted creatures with legendary powers. But organizations that refuse to embark on the Big Data journey because they lack these expensive specialists are getting the argument backwards.

By Juergen Urbanski