 
12 Very Important Tools For Hadoop Users
When it comes to Big Data analysis, Hadoop is at the forefront of things. Almost every company worth mentioning is looking for people specialised in Hadoop. These tools manage various aspects of big data analysis using Hadoop.
1. Ambari
The Apache Ambari ...
 
Enjoy machine learning with Mahout on Hadoop
"Mahout" is a Hindi term for a person who rides an elephant. The elephant, in this case, is Hadoop -- and Mahout is one of the many projects that can sit on top of Hadoop, although you do not always ...
 
How Accurate is Mahout for Summing Numbers?
A question was recently posted on the Mahout mailing list suggesting that the Mahout math library was "unwashed" because it didn't use Kahan summation.  My feeling is that this complaint is not founded and Mahout is considerably more washed than ...
 
Apache Mahout is moving on from MapReduce
Apache Mahout, a machine learning library for Hadoop since 2009, is joining the exodus away from MapReduce. The project’s community has decided to rework Mahout to support the increasingly popular Apache Spark in-memory data-processing framework, as well as the H2O engine for ...
 
Why Apache Spark is a Crossover Hit for Data Scientists
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as operational, analytics.
Data science is a broad church. I am a data scientist — or so I’ve been told — but what I do is actually ...
 
Integrating Hadoop into Business Intelligence and Data Warehousing
Information from SAS and TDWI Research
The purpose of this report is to accelerate users’ understanding of the many new products and practices based on Hadoop technologies that have emerged in recent years. While Hadoop usage is a minority practice today, ...
 
Vital Hadoop tools for crunching Big Data
Today, the most popularly term in IT world is ‘Hadoop’. Within a short span of time, Hadoop has grown massively and has proved to be useful for a large collection of diverse projects. The Hadoop community is fast evolving and ...
 
Yahoo SAMOA, Open Source Platform for Mining Big Data Streams
Yahoo SAMOA (Scalable Advanced Massive Online Analysis) is a framework for mining big data streams and applying distributed machine learning algorithms. You can think of SAMOA as Mahout for streaming.
SAMOA (Scalable Advanced Massive Online Analysis) is a framework for mining ...
 
9 Truths Lead To Big Data’s Future
Now and then I find myself thinking about the big principles of big data; that is, not about Hadoop vs. relational databases or Mahout vs. Weka, but rather about fundamental wisdom that frames our vision of "the new currency" of ...


















