5 tips to get started with big data
Everyone seems to be talking about "big data" these days. Do you wonder what you’re missing out on? Let’s take a look at how you can get started with big data.
Learn what it is, and what it is not. While ...
Snappy compression with Pig and native MapReduce
This assumes you have already installed Hadoop on your cluster; if not, please follow the instructions at http://code.google.com/p/hadoop-snappy/
Below is the machine configuration of my cluster nodes, though the steps that follow should also work with other installations and machine configurations:
pkommireddi@pkommireddi-wsl:/tools/hadoop/pig-0.9.1/lib$ uname -a
Linux pkommireddi-wsl 2.6.32-37-generic #81-Ubuntu SMP Fri ...
Cloudera, MongoDB partner to mash up NoSQL, Hadoop
Hadoop specialist Cloudera announced a strategic partnership with MongoDB this week that will allow Cloudera customers to store Hadoop data in their NoSQL MongoDB databases. The move is a huge win for MongoDB, which is quickly emerging as one of ...
Bringing the Best of Apache Hive 0.13 to CDH Users
More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.
Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and ...
Apache Ambari 1.5.1 is Released!
The Apache Ambari community has proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, ...
Using Apache Hadoop and Impala together with MySQL for data analysis
Apache Hadoop is commonly used for data analysis: it loads data quickly and scales well. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...
Apache Hive Updated with SQL-on-Hadoop Features
The Apache Hive community has voted on and released version 0.13. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1,080 JIRA tickets.
Hive 0.13 also delivers the third and ...
How Accurate is Mahout for Summing Numbers?
A question was recently posted on the Mahout mailing list suggesting that the Mahout math library was "unwashed" because it didn't use Kahan summation. My feeling is that this complaint is unfounded and Mahout is considerably more washed than ...
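For readers unfamiliar with Kahan summation, here is a minimal Python sketch of the textbook algorithm. It is not Mahout's (Java) code and makes no claim about how Mahout sums internally; the function names and the test data are illustrative only. Kahan summation carries the low-order bits lost in each floating-point addition forward in a separate compensation term. The explicit loop in naive_sum is used instead of the built-in sum(), because recent Python versions already apply compensated summation inside sum() for floats.

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation with no error compensation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan (compensated) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error lost so far
    for x in values:
        y = x - compensation            # correct the next term by the lost bits
        t = total + y                   # large + small: low bits of y may be lost
        compensation = (t - total) - y  # algebraically 0; numerically, the loss
        total = t
    return total


if __name__ == "__main__":
    # One large value followed by many tiny ones. Each tiny value is less than
    # half an ulp of 1.0, so naive accumulation never moves away from 1.0.
    values = [1.0] + [1e-16] * 100_000
    print(naive_sum(values))
    print(kahan_sum(values))
    print(math.fsum(values))  # correctly rounded reference result
```

In this example the naive loop returns exactly 1.0, while the compensated sum tracks math.fsum closely. Whether the extra cost is worth paying depends on the magnitudes being summed, which is the heart of the mailing-list debate.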
10 Hot Hadoop Startups to Watch in 2025
It's no secret that data volumes are growing exponentially. What's a bit more mysterious is figuring out how to unlock the value of all of that data. A big part of the problem is that traditional databases weren't designed for ...
Top 7 Tips to Succeed with Big Data
Today, businesses across industries are focusing on and investing in big data analytics to offer reliable services and increase profits. Big data plays a vital role in making better business decisions by enabling data scientists and other users to ...