Every year, the amount and variety of data companies must manage increases. Commonly called Big Data, this information (everything from social media posts, audio, and images to transaction records, sensor data, and video) continues growing unabated. According to IDC, data is growing at 40 percent per year and will continue to do so into the next decade.

Companies are struggling with how to efficiently and cost-effectively collect and store this fast-growing data. But the real benefit lies in being able to analyze it in ways that improve product quality, speed decision-making, boost customer service, and optimize business processes. And it works: according to a Dell survey, 89 percent of companies with Big Data initiatives report significant improvements in corporate decision-making. A McKinsey Global Institute report estimated that retailers using data analytics at scale across their organizations could increase their operating margins by more than 60 percent, and that healthcare organizations could reduce costs by 8 percent by leveraging data analytics.

Achieving these kinds of benefits requires an IT infrastructure that is fully scalable, flexible, and cost-effective. While it is possible to analyze data to some extent using traditional IT architectures, companies quickly hit roadblocks that limit how much data they can analyze and how much value they get from that analysis. All of this puts a heavy strain on traditional IT infrastructures: not just the amount of storage required, but also processing power and network bandwidth.

One of the biggest problems is that traditional architectures require data to be reduced to a relational database format, which limits the size, speed, and scale of data processing. "You end up having to throw data away or age it out because relational databases can only handle so much data, which means that you can only analyze a subset of the data," says Mike Matchett, a senior analyst at Taneja Group.

Converged infrastructure systems provide many of the resources required for effective big data analytics, from the ability to handle Hadoop to storage scalability. Getting the biggest data analytics payoff from a converged infrastructure requires three capabilities:

  1. Hadoop. Hadoop, the open source framework for distributed storage and processing, is critical for analyzing Big Data. It is one of the most effective ways to handle fast-growing data processing, storage, and analysis.

“The Hadoop ecosystem allows you to keep all of your raw data because you can scale out as data is added by adding more nodes with more local disk,” Matchett explains. “So if you have an analysis that takes four hours, it will still take four hours if you double from 100 to 200 terabytes, or even get to 2 petabytes with 1,000 nodes.”
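
To make the scale-out model concrete, here is the canonical word-count job from the Apache Hadoop MapReduce tutorial, lightly commented. The input and output paths are placeholders supplied on the command line; everything else is the stock MapReduce API. Because each map task is scheduled close to the HDFS block it reads, adding nodes adds processing capacity in step with storage, which is what keeps runtimes flat as data grows:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mappers run in parallel across the cluster, each on the node
  // holding its block of input, emitting (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducers aggregate the per-node partial counts into final totals.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Using the reducer as a combiner pre-aggregates on each node,
    // cutting the data shuffled across the network.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir (placeholder)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```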

  2. Storage. Lots of data requires lots of storage, and when data stores keep expanding at a fast rate, it is important to have as scalable a storage architecture as possible. While some converged infrastructure systems still use traditional storage arrays (increasingly flash-based), Matchett stresses the importance of choosing a converged infrastructure with an embedded, scalable storage solution.
  3. Optimized for Big Data. While converged infrastructures in general provide easier and faster scalability than traditional architectures, the optimum environment for Big Data analytics is a system that allows you to scale computing power separately from storage. “The idea is to be able to scale storage and compute nodes separately,” Matchett says. Aim for as much memory, and as little cost and power, as possible, he adds, since companies doing analytics will scale out to hundreds or even thousands of nodes.
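
As a rough illustration of why separate scaling matters, the sketch below sizes storage and compute tiers independently. All of the figures here (per-node disk, scan throughput, replication factor, the four-hour target) are hypothetical assumptions chosen for arithmetic convenience, not vendor specifications:

```java
// Back-of-envelope sizing for independently scaled tiers.
// Every constant below is an illustrative assumption.
public class ClusterSizing {
  public static void main(String[] args) {
    double dataTB = 2000.0;                 // 2 PB of raw data to retain
    double usableTBPerStorageNode = 48.0;   // assumed usable disk per storage node
    double replicationFactor = 3.0;         // HDFS-style triple replication

    // Storage nodes are driven purely by data volume.
    long storageNodes =
        (long) Math.ceil(dataTB * replicationFactor / usableTBPerStorageNode);

    double scanTBPerHourPerComputeNode = 2.0; // assumed per-node scan throughput
    double targetHours = 4.0;                 // hold the job at four hours

    // Compute nodes are driven by the runtime target, not by capacity.
    long computeNodes =
        (long) Math.ceil(dataTB / (scanTBPerHourPerComputeNode * targetHours));

    System.out.printf("Storage nodes: %d, compute nodes: %d%n",
        storageNodes, computeNodes);
    // Doubling dataTB doubles both tiers; tightening targetHours grows
    // only the compute tier and leaves storage spend untouched.
  }
}
```

The point of the exercise: when the two tiers scale together, a tighter runtime target forces you to buy storage you do not need, and a data-retention mandate forces you to buy compute you do not need. Decoupling them lets each line item track its own driver.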