Spark is a compelling multi-purpose platform for use cases that span investigative, as well as operational, analytics.
Data science is a broad church. I am a data scientist — or so I’ve been told — but what I do is actually ...
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform ...
Hadoop-2.3.0 is the first release of 2014 and brings a number of enhancements to the core platform, in particular to HDFS. It also includes many bug fixes and smaller changes; you can read ...
Hadoop is designed to run on large clusters of commodity servers, which often span many physical racks. A physical rack is frequently a single point of failure (for example, it typically has a single switch ...
Hadoop MapReduce jobs are divided into a set of map tasks and reduce tasks that run in a distributed fashion on a cluster of computers. Each task works on a small subset of the data it has been assigned so ...
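To make the map/reduce division concrete, here is a minimal single-process sketch in plain Python. It assumes a word-count job purely for illustration, and the names map_task, shuffle, reduce_task and run_job are hypothetical helpers, not part of Hadoop's API. The input is cut into splits, each split is handled by its own map task, intermediate pairs are grouped by key, and a reduce step combines the values for each key.

```python
from collections import defaultdict
from itertools import chain


def map_task(split):
    """Map phase (hypothetical helper): emit (word, 1) for every word in one split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce phase (hypothetical helper): combine all values seen for one key."""
    return key, sum(values)


def run_job(lines, num_splits=2):
    """Simulate a job: split input, run one map task per split, shuffle, then reduce."""
    # Cut the input into contiguous splits of roughly equal size, one per map task.
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    splits = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Each map task sees only its own split; the outputs are then merged for the shuffle.
    mapped = chain.from_iterable(map_task(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_task(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "the fox"]))
```

In a real cluster the splits, map tasks and reduce tasks would run on different machines and the shuffle would move data over the network; the sketch keeps everything in one process only so that the data flow is easy to follow.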
History of Hadoop
Spotlight on the early history of Hadoop
The history of Hadoop: From 4 nodes to the future of data
Big Ideas: Demystifying Hadoop
What is MapReduce?
"Cluster Computing and MapReduce Lecture" series on YouTube
What is Hadoop?
What is HDFS?
The paper covers most of ...