Using Scala To Work With Hadoop
Cloudera has a great toolkit to work with Hadoop. Specifically, it is focused on building distributed systems and services on top of the Hadoop ecosystem.
http://cloudera.github.io/cdk/docs/0.2.0/cdk-data/guide.html
And the examples are in Scala!!!!
Here is how you work with generic stuff on the ...
MongoDB 2.6 Released
In the five years since the initial release of MongoDB, and after hundreds of thousands of deployments, we have learned a lot. The time has come to take everything we have learned and create a basis for continued innovation over ...
10 Big Data Analytics Use Cases for Healthcare IT
Big data means a lot of things to a lot of different people, but what is becoming increasingly clear, as the largest market players' strategies start to unfold, is that big data is about real-time analysis and data-driven decision-making. Now Big ...
Impala and SQL on Hadoop
The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business.
One of the many differences between MapReduce and Impala is that in Impala the intermediate data moves from process to process directly instead of storing it ...
Selecting the right SQL-on-Hadoop engine to access big data
With SQL-on-Hadoop technologies, it's possible to access big data stored in Hadoop by using the familiar SQL language. Users can plug in almost any reporting or analytical tool to analyze and study the data. Before SQL-on-Hadoop, accessing big data was ...
Configure Eclipse for MapReduce
1. Download Eclipse Europa or Indigo
2. Download the Hadoop Eclipse plugin, e.g. hadoop-eclipse-plugin-1.0.3.jar
3. Copy the jar into the Eclipse plugins folder
4. Open Eclipse
5. Add a Map/Reduce server
6. Add a new DFS location
   Location name: localhost
   Map/Reduce Master:
   port: 9001
   DFS Master:
   port: 9000
   Finish
7. New -> Other -> Map/Reduce Project
-> ...
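Before filling in the DFS location dialog from the steps above, it can help to confirm that something is actually listening on the two ports. The following is a minimal sketch using only the Java standard library, not part of the Hadoop Eclipse plugin. The class name HadoopPortCheck, the method isReachable, and the one-second timeout are illustrative assumptions; the host and ports (localhost, 9000, 9001) are the values from the steps.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
public class HadoopPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "localhost"; // location name used in the steps above
        System.out.println("DFS Master (9000): " + isReachable(host, 9000, 1000));
        System.out.println("Map/Reduce Master (9001): " + isReachable(host, 9001, 1000));
    }
}
```

If either line prints false, the corresponding Hadoop daemon is probably not running or is configured on a different port, and the Eclipse plugin will likely fail to connect with those settings.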
How to Contribute to HBase and Hadoop2
By Nick Dimiduk
In case you haven’t heard, Hadoop2 is on the way! There are loads more new features than I can begin to enumerate, including lots of interesting enhancements to HDFS for online applications like HBase. One of the most ...
HBase BlockCache Showdown
The HBase BlockCache is an important structure for enabling low-latency reads. As of HBase 0.96.0, there are no fewer than three different BlockCache implementations to choose from. But how to know when to use one over the other? There’s ...
What Can GPFS on Hadoop Do For You?
The Hadoop Distributed File System (HDFS) is considered a core component of Hadoop, but it’s not an essential one. Lately, IBM has been talking up the benefits of hooking Hadoop up to the General Parallel File System (GPFS). IBM has ...
Using Oozie 4.0.0 with Hadoop 2.2
The current version of Oozie (4.0.0) doesn’t build correctly when you try to target Hadoop 2.2. The Oozie team have a fix going into release 4.0.1 (see OOZIE-1551), but until then you can hack the Maven files to get it ...






