Selecting a Hadoop distribution can seem pretty daunting. But it all comes down to which platform best suits your Big Data needs.

The thought of evaluating whether a particular technology is right for your company can conjure up feelings of stress, anxiety and ambiguity, especially when fielding ROI questions from the CIO and CFO. As more and more companies evaluate Hadoop, they are finding themselves in a similar situation, if not worse. Hadoop has great promise, but it can be a difficult technology to understand and it is moving at a rapid pace.

Assuming Hadoop solves the problem at hand, how do you decide which Hadoop distribution to pick? They all look similar from the outside; all of them package more than a dozen open source software components, work on commodity hardware and can pretty much run similar sets of analytical workloads. Yet there is a marked difference in terms of what you get for your money. When evaluating Hadoop distributions, here are some questions to ask.

Can It Stand Alone?

At the heart of the matter lies the question of whether you are buying licensed software or buying services for free software. Although the promise of support services via “hand-holding” and “community-based support” feels invaluable when you start off on an unknown journey with a new technology, you need to recognize that it is a piece of technology that will be going into your production environment, and you should hold it to the same standards as any other technology in your data center. Enterprise-grade products remove the need to rely on third-party support.

Is It Reliable?

A frequently cited weakness of Hadoop is that the NameNode, which keeps track of where every block of data is stored across the other nodes in the cluster, is a single point of failure. In other words, if the NameNode fails, the data on the other nodes becomes unreachable because it can’t be located without the NameNode. While Apache Hadoop addresses this issue in version 2.0 with NameNode high availability, some platforms offer alternatives that eliminate the NameNode and its vulnerability.
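To make the single point of failure concrete, here is a minimal toy model in Python. It is only a sketch, not how HDFS is implemented: the `ToyNameNode` class, the `read_block` helper, and the node and block names are all hypothetical. It illustrates the idea that the data still sits on the worker nodes, but a client has no way to find it once the one service that knows the locations is down.

```python
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: maps block IDs to the node holding them."""

    def __init__(self):
        self.block_locations = {}
        self.alive = True

    def register(self, block_id, node):
        # Record which node stores a given block.
        self.block_locations[block_id] = node

    def locate(self, block_id):
        # Every read has to go through this lookup first.
        if not self.alive:
            raise RuntimeError("NameNode is down: block locations unavailable")
        return self.block_locations[block_id]


def read_block(name_node, data_nodes, block_id):
    """Ask the name node where the block lives, then fetch it from that node."""
    node = name_node.locate(block_id)
    return data_nodes[node][block_id]


# Two worker nodes, each holding one block of data (made-up contents).
data_nodes = {
    "node-a": {"blk_1": b"sales"},
    "node-b": {"blk_2": b"clicks"},
}

name_node = ToyNameNode()
name_node.register("blk_1", "node-a")
name_node.register("blk_2", "node-b")
```

With the name node up, `read_block(name_node, data_nodes, "blk_1")` finds the block. After setting `name_node.alive = False`, the same call raises an error even though `data_nodes` is untouched, which is the situation a standby NameNode or a NameNode-free design is meant to avoid.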

Planned upgrades to Apache Hadoop also require outages, which can create contention: departments have projects they want to complete and don’t want to wait for the system to update and reboot. Some Hadoop distributions avoid this problem with rolling upgrades, which update the cluster a few nodes at a time while it stays online.