Best resources to learn and understand Hadoop
Here are some of the best resources for learning and understanding Hadoop.
Tutorials
Free videos - MapR Academia
Udacity course
Hortonworks Sandbox
Hadoop Ecosystem
Running Hadoop Map-Reduce
Hadoop Screencasts
Reza Shiftehfar's blog I
Reza Shiftehfar's blog II
Reza Shiftehfar's blog III
Reza Shiftehfar's blog IV
Reza Shiftehfar's blog V
Reza Shiftehfar's blog VI
Reza Shiftehfar's blog ...
7 Facts About Hadoop That You Should Know
Where there is Big Data, there is Hadoop, and vice versa. With Big Data analytics having grown as much as it has, Hadoop has become a mainstay in the technology industry.
Here are a few facts that you should keep in mind when ...
5 technologies that will help big data cross the chasm
We’re on the cusp of a real turning point for big data. Its applications are becoming clearer, its tools are getting easier and its architectures are maturing in a hurry. It’s no longer just about log files, clickstreams and tweets. ...
A New Python Client for Impala
The new Python client for Impala will bring smiles to Pythonistas!
As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of ...
Apache Ambari 1.5.1 is Released!
The Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, ...
Apache Hive Updated with SQL-on-Hadoop Features
The Apache Hive community has voted on and released version 0.13. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080 JIRA tickets.
Hive 0.13 also delivers the third and ...
How to Run a Simple Apache Spark App in CDH 5
Getting started with Spark (now shipping inside CDH 5) is easy using this simple example.
Apache Spark is a general-purpose, cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...
How Accurate is Mahout for Summing Numbers?
A question was recently posted on the Mahout mailing list suggesting that the Mahout math library was "unwashed" because it didn't use Kahan summation. My feeling is that this complaint is not founded and Mahout is considerably more washed than ...
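For readers unfamiliar with the term, Kahan summation is a compensated summation technique that carries a running correction for the low-order bits lost in each floating-point addition. The sketch below is only an illustration of the general technique, not Mahout code; the function names `naive_sum` and `kahan_sum` and the sample data are assumptions made up for this example.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error lost so far
    for x in values:
        y = x - compensation            # apply the correction from earlier steps
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


if __name__ == "__main__":
    # One large value followed by many values too small to register
    # individually when added to it (illustrative data only).
    values = [1.0] + [1e-16] * 100_000
    print("naive:", naive_sum(values))
    print("kahan:", kahan_sum(values))
    print("fsum :", math.fsum(values))  # standard-library reference sum
```

With this input the naive loop never moves away from 1.0, because each tiny addend is rounded away, while the compensated version stays close to the reference result from `math.fsum`. Whether that extra accuracy matters for a given workload is a separate question, which is what the linked post discusses.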
Google BigQuery and Datastore Connectors for Hadoop
Users of Google’s cloud platform should find it easier to run Hadoop jobs directly against data in Google BigQuery and Google Cloud Datastore from now on.
We are making it easier for you to run Hadoop jobs directly against your data ...
Using Scala To Work With Hadoop
Cloudera has a great toolkit to work with Hadoop. Specifically it is focused on building distributed systems and services on top of the Hadoop Ecosystem.
http://cloudera.github.io/cdk/docs/0.2.0/cdk-data/guide.html
And the examples are in Scala!!!!
Here is how you work with generic stuff on the ...






