The Hadoop Ecosystem: HDFS, YARN, Hive, Pig, HBase and growing

Hadoop is a leading open-source software framework for scalable, reliable, distributed computing. With the world producing data in the zettabyte range, there is a growing need for cheap, scalable, reliable and fast computing to process and make sense ...

Snappy compression with Pig and native MapReduce

This assumes you have installed Hadoop on your cluster; if not, please follow http://code.google.com/p/hadoop-snappy/. Below is the machine configuration of my cluster nodes, though the steps that follow should also work with your own installation and machine configuration.

pkommireddi@pkommireddi-wsl:/tools/hadoop/pig-0.9.1/lib$ uname -a
Linux pkommireddi-wsl 2.6.32-37-generic #81-Ubuntu SMP Fri ...