How to Run a Simple Apache Spark App in CDH 5

Getting started with Spark (now shipping inside CDH 5) is easy using this simple example. Apache Spark is a general-purpose, cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...

Using Scala To Work With Hadoop

Cloudera has a great toolkit to work with Hadoop.  Specifically it is focused on building distributed systems and services on top of the Hadoop Ecosystem. http://cloudera.github.io/cdk/docs/0.2.0/cdk-data/guide.html And the examples are in Scala!!!! Here is how you you work with generic stuff on the ...

Selecting the right SQL-on-Hadoop engine to access big data

With SQL-on-Hadoop technologies, it's possible to access big data stored in Hadoop by using the familiar SQL language. Users can plug in almost any reporting or analytical tool to analyze and study the data. Before SQL-on-Hadoop, accessing big data was ...
1 11 12 13 14 15