History of Hadoop

Spotlight on the early history of Hadoop

The history of Hadoop: From 4 nodes to the future of data
Big Ideas: Demystifying Hadoop

What is MapReduce?

“Cluster Computing and MapReduce Lecture” series on YouTube

http://www.youtube.com/watch?v=yjPBkvYh-ss

http://www.youtube.com/watch?v=-vD6PUdf3Js

http://www.youtube.com/watch?v=5Eib_H_zCEY

http://www.youtube.com/watch?v=1ZDybXl212Q

http://www.youtube.com/watch?v=BT-piFBP4fE

http://labs.google.com/papers/mapreduce.html

http://code.google.com/edu/parallel/mapreduce-tutorial.html 

What is Hadoop?

http://radar.oreilly.com/2012/02/what-is-apache-hadoop.html
http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop
http://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/

What is HDFS?

The paper covers most HDFS features except HDFS Federation, which was introduced in the 0.23 release, and the HDFS High Availability feature, which will be included in the upcoming Hadoop 0.24 release.

HDFS as a comic for the young.

HDFS Federation was introduced in the 0.23 release to allow multiple NameNodes in a cluster.

About HDFS from `The Architecture of Open Source Applications`.

MapReduce Algorithms

http://www.cloudera.com/videos/mapreduce_algorithms

http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf
http://atbrox.com/2011/11/09/mapreduce-hadoop-algorithms-in-academic-papers-5th-update-%E2%80%93-nov-2011/

What MapReduce (Hadoop) can’t solve

http://blog.zillabyte.com/post/10814100500/hadoop-doesnt-solve-all-problems

Hadoop HelloWorld

http://hadoop.apache.org/common/docs/r0.20.205.0/mapred_tutorial.html

Setting up a Hadoop Cluster (Ubuntu)

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

Setting up Hadoop (Windows)

http://hortonworks.com/blog/hadoop-in-windows/
http://hortonworks.com/blog/installing-hadoop-on-windows/

http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://blogs.msdn.com/b/avkashchauhan/

Benchmarking and Stress Testing a Hadoop Cluster

Benchmarking and Stress Testing an Hadoop Cluster With TeraSort, TestDFSIO & Co.

Testing Hadoop Jobs

Advice on QA Testing Your MapReduce Jobs

Hadoop Tutorial

http://developer.yahoo.com/hadoop/tutorial/

Hadoop Streaming

Writing an Hadoop MapReduce Program in Python
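
To give a feel for what a Streaming job looks like before following the link, here is a minimal word-count sketch in plain Python. It is an illustration under assumptions, not code from the tutorial: the function names `map_words`, `reduce_counts` and `word_count` are hypothetical, and everything runs in one local process. In a real Hadoop Streaming job the mapper and reducer are separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before the reducer sees it; the `sorted` call below only stands in for that step.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reducer: sum the counts per word; expects pairs already sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a local sort standing in for the shuffle, then reduce."""
    mapped = sorted(map_words(lines), key=itemgetter(0))
    return dict(reduce_counts(mapped))


if __name__ == "__main__":
    # Hypothetical sample input; a Streaming job would read from stdin instead.
    sample = ["hello hadoop", "hello streaming"]
    for word, count in sorted(word_count(sample).items()):
        # Tab-separated key/value lines, the usual Streaming output convention.
        print(f"{word}\t{count}")
```

The reducer relies on its input being grouped by key, which is why the sort matters: without it, `groupby` would emit the same word more than once.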

Hardware for Hadoop

Hardware Recommendations for Apache Hadoop
How-to: Select the Right Hardware for Your New Hadoop Cluster
Hadoop/HBase Capacity Planning
Best Practices for Selecting Apache Hadoop Hardware

Comparing Hadoop Appliances

Hardware Recommendations For Apache Hadoop

Books

Hadoop: The Definitive Guide (recommended; my review here)

Pro Hadoop (haven’t had a chance to read it yet)

Hadoop in Action (haven’t had a chance to read it yet)

Public big data sets

https://delicious.com/pskomoroch/dataset
http://wiki.gephi.org/index.php/Datasets
http://stackoverflow.com/questions/10843892/download-large-data-for-hadoop
http://datamob.org/datasets
http://konect.uni-koblenz.de/
http://snap.stanford.edu/data/
http://archive.ics.uci.edu/ml/
https://bitly.com/bundles/hmason/1
http://www.inside-r.org/howto/finding-data-internet
http://goo.gl/Jecp6
http://ftp3.ncdc.noaa.gov/pub/data/noaa/1990/
http://data.cityofsantacruz.com/

Big data challenges

https://www.kaggle.com/

What are Hadoop/MR alternatives?

http://gigaom.com/cloud/why-the-days-are-numbered-for-hadoop-as-we-know-it/
BSP vs MapReduce – http://arxiv.org/abs/1203.2081 

General (uncategorized)

http://academy.mapr.com/

http://www.cloudera.com/resources/?type=Training
http://www.cloudera.com/blog/2012/01/hadoop-world-2011-videos-and-slides-available/