Hadoop Resources
Spotlight on the early history of Hadoop
The history of Hadoop: From 4 nodes to the future of data
Big Ideas: Demystifying Hadoop
What is MapReduce?
“Cluster Computing and MapReduce Lecture” series on YouTube
http://www.youtube.com/watch?v=yjPBkvYh-ss
http://www.youtube.com/watch?v=-vD6PUdf3Js
http://www.youtube.com/watch?v=5Eib_H_zCEY
http://www.youtube.com/watch?v=1ZDybXl212Q
http://www.youtube.com/watch?v=BT-piFBP4fE
http://labs.google.com/papers/mapreduce.html
http://code.google.com/edu/parallel/mapreduce-tutorial.html
What is Hadoop?
http://radar.oreilly.com/2012/02/what-is-apache-hadoop.html
http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop
http://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/
What is HDFS?
The paper covers most HDFS features, except for HDFS Federation, which was introduced in the 0.23 release, and the HDFS High Availability feature, which will be included in the upcoming Hadoop 0.24 release.
HDFS as a comic, for the young.
HDFS Federation was introduced in the 0.23 release to allow multiple NameNodes in a cluster.
About HDFS from `The Architecture of Open Source Applications`.
MapReduce Algorithms
http://www.cloudera.com/videos/mapreduce_algorithms
http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf
http://atbrox.com/2011/11/09/mapreduce-hadoop-algorithms-in-academic-papers-5th-update-%E2%80%93-nov-2011/
What MapReduce (Hadoop) can’t solve
http://blog.zillabyte.com/post/10814100500/hadoop-doesnt-solve-all-problems
Hadoop HelloWorld
http://hadoop.apache.org/common/docs/r0.20.205.0/mapred_tutorial.html
Setting up a Hadoop Cluster (Ubuntu)
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Setting up Hadoop (Windows)
http://hortonworks.com/blog/hadoop-in-windows/
http://hortonworks.com/blog/installing-hadoop-on-windows/
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
http://blogs.msdn.com/b/avkashchauhan/
Benchmarking and Stress Testing a Hadoop Cluster
Benchmarking and Stress Testing an Hadoop Cluster With TeraSort, TestDFSIO & Co.
Testing Hadoop Jobs
Advice on QA Testing Your MapReduce Jobs
Hadoop Tutorial
http://developer.yahoo.com/hadoop/tutorial/
Hadoop Streaming
Writing an Hadoop MapReduce Program in Python
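As a rough illustration of what a Hadoop Streaming job in Python looks like, here is a minimal word-count sketch. It assumes the usual streaming convention of tab-separated `key<TAB>value` records on stdin/stdout and that the reducer receives its input sorted by key. The function names and the `map`/`reduce` command-line argument are hypothetical choices made for this example; they are not taken from the linked tutorial.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention for this sketch: the script's only argument
# is "map" or "reduce". Without an argument nothing is read from stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as `wc.py`, the script can be tried locally, without a cluster, by imitating the shuffle phase with `sort`: `cat input.txt | python wc.py map | sort | python wc.py reduce`. On a real cluster the same two commands would be passed to the streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the Hadoop version and installation.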
Hardware for Hadoop
Hardware Recommendations for Apache Hadoop
How-to: Select the Right Hardware for Your New Hadoop Cluster
Hadoop/HBase Capacity Planning
Best Practices for Selecting Apache Hadoop Hardware
Hardware Recommendations For Apache Hadoop
Books
Hadoop – The Definitive Guide (would recommend it – my review here)
Pro Hadoop (haven’t had a chance to read it yet)
Hadoop in Action (haven’t had a chance to read it yet)
Public big data sets
https://delicious.com/pskomoroch/dataset
http://wiki.gephi.org/index.php/Datasets
http://stackoverflow.com/questions/10843892/download-large-data-for-hadoop
http://datamob.org/datasets
http://konect.uni-koblenz.de/
http://snap.stanford.edu/data/
http://archive.ics.uci.edu/ml/
https://bitly.com/bundles/hmason/1
http://www.inside-r.org/howto/finding-data-internet
http://goo.gl/Jecp6
http://ftp3.ncdc.noaa.gov/pub/data/noaa/1990/
http://data.cityofsantacruz.com/
Big data challenges
What are Hadoop/MR alternatives?
http://gigaom.com/cloud/why-the-days-are-numbered-for-hadoop-as-we-know-it/
BSP vs MapReduce – http://arxiv.org/abs/1203.2081
General (uncategorized)
http://www.cloudera.com/resources/?type=Training
http://www.cloudera.com/blog/2012/01/hadoop-world-2011-videos-and-slides-available/