How To Choose The Best Tool For Your Big Data Project

By Angus Kidman. Trying to choose the right tool for a big data project? This chart (and three simple rules) can help guide you through the options. The chart is based on one shown by Microsoft Research senior research program manager Wenming ...

08 April 2014 Analytics, Big Data, Cassandra, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

Impala and SQL on Hadoop

The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business. One of the many differences between MapReduce and Impala is that in Impala, the intermediate data moves from process to process directly instead of storing it ...

05 April 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, Splunk

Selecting the right SQL-on-Hadoop engine to access big data

With SQL-on-Hadoop technologies, it's possible to access big data stored in Hadoop by using the familiar SQL language. Users can plug in almost any reporting or analytical tool to analyze and study the data. Before SQL-on-Hadoop, accessing big data was ...

01 April 2014 Analytics, Big Data, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, SAS, Splunk

A guide to NoSQL offerings

Amazon Web Services: DynamoDB is a NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data and serve any level of request traffic. Users simply tell the service how many requests need to ...

29 March 2014 Analytics, Big Data, Couchbase, Google, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig

Apache Mahout is moving on from MapReduce

Apache Mahout, a machine learning library for Hadoop since 2009, is joining the exodus away from MapReduce. The project’s community has decided to rework Mahout to support the increasingly popular Apache Spark in-memory data-processing framework, as well as the H2O engine for ...

28 March 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

Apache Falcon: Data Governance for Hadoop

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie. InMobi is one of ...

27 March 2014 Analytics, Big Data, Cassandra, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics

How to Contribute to HBase and Hadoop2

By Nick Dimiduk. In case you haven’t heard, Hadoop2 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most ...

24 March 2014 Analytics, Big Data, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics

HBase BlockCache Showdown

The HBase BlockCache is an important structure for enabling low-latency reads. As of HBase 0.96.0, there are no fewer than three different BlockCache implementations to choose from. But how do you know when to use one over the others? There’s ...

22 March 2014 Analytics, Big Data, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, SAS

Apache Tez 0.3 Released

The Apache Tez community has voted to release version 0.3 of the software. Apache™ Tez is a replacement for MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important release towards making the software ...

13 March 2014 Analytics, Big Data, Cassandra, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, Splunk

Avoiding Split Brainedness in HA Hadoop Clusters

The US Patent Office recently granted Zettaset a patent for the underlying technology in its Hadoop high availability that prevents a "split-brain" situation where multiple master nodes think they're in control of the Hadoop cluster. It's a feather in the ...