6 big data trends in 2014
Data are being generated by every device imaginable. Big data are arriving from multiple sources with alarming velocity, volume, variety and veracity.
It is estimated that 2.5 quintillion bytes of data are created each day, so much that 90 percent of ...
Snappy compression with Pig and native MapReduce
This assumes you have installed Hadoop on your cluster; if not, please follow http://code.google.com/p/hadoop-snappy/
This is the machine configuration of my cluster nodes, though the steps that follow should also work with your own installation and machine configuration:
pkommireddi@pkommireddi-wsl:/tools/hadoop/pig-0.9.1/lib$ uname -a
Linux pkommireddi-wsl 2.6.32-37-generic #81-Ubuntu SMP Fri ...
A New Python Client for Impala
The new Python client for Impala will bring smiles to Pythonistas!
As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of ...
Cloudera, MongoDB partner to mash up NoSQL, Hadoop
Hadoop specialist Cloudera announced a strategic partnership with MongoDB this week that will allow Cloudera customers to store Hadoop data in their NoSQL MongoDB databases. The move is a huge win for MongoDB, which is quickly emerging as one of ...
Apache Ambari 1.5.1 is Released!
The Apache Ambari community has proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, ...
Apache Hive Updated with SQL-on-Hadoop Features
The Apache Hive community has voted on and released version 0.13. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1,080 JIRA tickets.
Hive 0.13 also delivers the third and ...
Hadoop or Warehousing, or Both?
One of the thornier questions facing enterprise executives in these days of broad infrastructural change is how to deal with Big Data. On the surface, it may seem like a no-brainer: No matter how big the data load becomes, there ...
Using Scala To Work With Hadoop
Cloudera has a great toolkit for working with Hadoop. Specifically, it is focused on building distributed systems and services on top of the Hadoop ecosystem.
http://cloudera.github.io/cdk/docs/0.2.0/cdk-data/guide.html
And the examples are in Scala!!!!
Here is how you work with generic stuff on the ...
Impala and SQL on Hadoop
The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business.
One of many differences between MapReduce and Impala is that in Impala, the intermediate data moves directly from process to process instead of storing it ...
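The difference between materializing intermediate data and streaming it straight into the next stage can be sketched with a loose, single-process analogy in Python. This is an illustration under stated assumptions, not how Impala or MapReduce are actually implemented. The operator names (`scan`, `filter_clicks`, `count_by_user`) and the sample event records are hypothetical. A temporary file stands in for materialized intermediate output, and Python generators stand in for Impala's direct process-to-process transfer.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Iterator


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    # Hypothetical source operator: yields rows one at a time.
    for row in rows:
        yield row


def filter_clicks(rows: Iterable[dict]) -> Iterator[dict]:
    # Hypothetical filter operator: keeps only "click" events.
    for row in rows:
        if row["event"] == "click":
            yield row


def count_by_user(rows: Iterable[dict]) -> Dict[str, int]:
    # Hypothetical aggregate operator: counts rows per user.
    counts: Dict[str, int] = {}
    for row in rows:
        counts[row["user"]] = counts.get(row["user"], 0) + 1
    return counts


def materialized(rows: Iterable[dict]) -> Dict[str, int]:
    # MapReduce-like (loosely): the filter stage's full output is written
    # to a file before the next stage reads it back.
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
        for row in filter_clicks(scan(rows)):
            tmp.write(json.dumps(row) + "\n")
        path = tmp.name
    try:
        with open(path) as f:
            return count_by_user(json.loads(line) for line in f)
    finally:
        os.remove(path)


def pipelined(rows: Iterable[dict]) -> Dict[str, int]:
    # Impala-like (loosely): each row flows directly from one operator
    # to the next in memory, with no intermediate file.
    return count_by_user(filter_clicks(scan(rows)))


if __name__ == "__main__":
    events = [
        {"user": "a", "event": "click"},
        {"user": "b", "event": "view"},
        {"user": "a", "event": "click"},
        {"user": "b", "event": "click"},
    ]
    print(pipelined(events))     # {'a': 2, 'b': 1}
    print(materialized(events))  # {'a': 2, 'b': 1}
```

Both functions return the same answer. The pipelined version simply skips the write-then-read round trip for the intermediate rows, which is the kind of overhead the excerpt is contrasting.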
How to Contribute to HBase and Hadoop2
By Nick Dimiduk
In case you haven’t heard, Hadoop2 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most ...