6 big data trends in 2014

Data are being generated by every device imaginable. Big data arrive from multiple sources with alarming velocity, volume, variety and veracity. It is estimated that 2.5 quintillion bytes of data are created each day, so much that 90 percent of ...

05 May 2014 Analytics, Big Data, Cloud Computing, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics

Snappy compression with Pig and native MapReduce

This assumes Hadoop is installed on your cluster; if not, please follow http://code.google.com/p/hadoop-snappy/. This is the machine config of my cluster nodes, though the steps that follow can be followed with your own installation/machine configs:
pkommireddi@pkommireddi-wsl:/tools/hadoop/pig-0.9.1/lib$ uname -a
Linux pkommireddi-wsl 2.6.32-37-generic #81-Ubuntu SMP Fri ...

03 May 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics

A New Python Client for Impala

The new Python client for Impala will bring smiles to Pythonistas! As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of ...

02 May 2014 Analytics, Big Data, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, SAS

Cloudera, MongoDB partner to mash up NoSQL, Hadoop

Hadoop specialist Cloudera announced a strategic partnership with MongoDB this week that will allow Cloudera customers to store Hadoop data in their NoSQL MongoDB databases. The move is a huge win for MongoDB, which is quickly emerging as one of ...

01 May 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, Splunk

Apache Ambari 1.5.1 is Released!

The Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, ...

26 April 2014 Analytics, Big Data, Cloudera, Couchbase, Google, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

Apache Hive Updated with SQL-on-Hadoop Features

The Apache Hive community has voted on and released version 0.13. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080 JIRA tickets. Hive 0.13 also delivers the third and ...

23 April 2014 Analytics, Big Data, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics

Hadoop or Warehousing, or Both?

One of the thornier questions facing enterprise executives in these days of broad infrastructural change is how to deal with Big Data. On the surface, it may seem like a no-brainer: No matter how big the data load becomes, there ...

21 April 2014 Analytics, Big Data, Cassandra, Cloud Computing, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

Using Scala To Work With Hadoop

Cloudera has a great toolkit to work with Hadoop. Specifically, it is focused on building distributed systems and services on top of the Hadoop ecosystem. http://cloudera.github.io/cdk/docs/0.2.0/cdk-data/guide.html And the examples are in Scala! Here is how you work with generic stuff on the ...

10 April 2014 Analytics, Big Data, Cassandra, Cloudera, Couchbase, Google, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

Impala and SQL on Hadoop

The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business. One of many differences between MapReduce and Impala is that in Impala the intermediate data moves from process to process directly instead of storing it ...
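
To make the contrast concrete, here is a minimal single-process sketch in Python. It is a toy, not Impala or Hadoop code: the stage functions (`scan`, `filter_positive`, `total_clicks`), the sample rows and the file name `stage1.jsonl` are all hypothetical. One variant writes the intermediate result to a file before the next stage reads it, roughly in the spirit of MapReduce; the other hands rows directly from stage to stage through generators, roughly in the spirit of a pipelined engine.

```python
import json
import os
import tempfile


def scan(rows):
    """Hypothetical source stage: emit rows one at a time."""
    for row in rows:
        yield row


def filter_positive(rows):
    """Hypothetical filter stage: keep rows with at least one click."""
    for row in rows:
        if row["clicks"] > 0:
            yield row


def total_clicks(rows):
    """Hypothetical aggregate stage: sum the clicks column."""
    return sum(row["clicks"] for row in rows)


def run_materialized(rows, workdir):
    """Write the full intermediate result to a file, then read it back."""
    stage_path = os.path.join(workdir, "stage1.jsonl")
    with open(stage_path, "w") as out:
        for row in filter_positive(scan(rows)):
            out.write(json.dumps(row) + "\n")
    with open(stage_path) as src:
        return total_clicks(json.loads(line) for line in src)


def run_pipelined(rows):
    """Pass each row straight to the next stage; nothing is stored in between."""
    return total_clicks(filter_positive(scan(rows)))


# Made-up sample data for illustration only.
rows = [
    {"ad": "a", "clicks": 3},
    {"ad": "b", "clicks": 0},
    {"ad": "c", "clicks": 5},
]

with tempfile.TemporaryDirectory() as workdir:
    materialized = run_materialized(rows, workdir)
pipelined = run_pipelined(rows)

print(materialized, pipelined)
```

Both variants compute the same answer; the difference is only whether the intermediate rows touch storage between stages, which is the cost the pipelined approach avoids.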

05 April 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, Splunk

How to Contribute to HBase and Hadoop2

By Nick Dimiduk. In case you haven’t heard, Hadoop2 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most ...