Hadoop-5 Undeniable Truths
Everyone knows sensationalist headlines can be distracting or inaccurate. The real problem with such overblown headlines is this: superficial debates are holding back the true potential of Hadoop, big data, and the evolution of traditional databases.
At Qubole we often receive calls from ...
Hadoop 2.0 and YARN Architecture
What is Hadoop 2.0 & YARN?
First off, big kudos to Hortonworks for the great webinar by Arun C. Murthy (who, by the way, was one of the primary people behind building and releasing YARN), on which this post is based ...
An introduction to Apache Hadoop for big data
Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It ...
6 sparkling features of Apache Spark!
What is Apache Spark? Why is there such serious buzz about it? If you are in the big data analytics business, should you really care about Spark? I hope this post will help answer some of these questions, which might have ...
Altiscale Hadoop-as-a-Service Delivers Apache Hive 0.13
Altiscale, Inc., a leading innovator in Hadoop-as-a-Service (HaaS) solutions, has announced the availability of Apache Hive™ 0.13 on its HaaS platform, just weeks since its general software release to the industry. For data scientists and businesses that rely on insights ...
Actian, HP Vertica Join SQL-On-Hadoop Bandwagon
Actian on Tuesday joined the long list of companies that have introduced a way to support SQL access and querying on top of Hadoop. The announcement comes just a week after HP upgraded SQL-on-Hadoop functionality it introduced late last year ...
Can Super-Fast Apache Spark Light Up Hadoop?
it the Hadoop Swiss Army knife of cluster computing frameworks. The Apache Software Foundation just rolled out Apache Spark v1.0, which it's calling a "super-fast, open-source, large-scale data processing and advanced analytics engine."
That's a ...
Enjoy machine learning with Mahout on Hadoop
"Mahout" is a Hindi term for a person who rides an elephant. The elephant, in this case, is Hadoop -- and Mahout is one of the many projects that can sit on top of Hadoop, although you do not always ...
5 tips to get started with big data
Everyone seems to be talking about "big data" these days. Do you wonder what you’re missing out on? Let’s take a look at how you can get started with Big Data.
Learn what it is, and what it is not. While ...
Snappy compression with Pig and native MapReduce
This assumes you have installed Hadoop on your cluster; if not, please follow http://code.google.com/p/hadoop-snappy/
This is the machine configuration of my cluster nodes, though the steps that follow should also work with your own installation and machine configuration:
pkommireddi@pkommireddi-wsl:/tools/hadoop/pig-0.9.1/lib$ uname -a
Linux pkommireddi-wsl 2.6.32-37-generic #81-Ubuntu SMP Fri ...






