Apache Ambari 1.5.1 is Released!
The Apache Ambari community has proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, ...
Using Apache Hadoop and Impala together with MySQL for data analysis
Apache Hadoop is commonly used for data analysis. It is fast at loading data and scales well. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...
Apache Hive Updated with SQL-on-Hadoop Features
The Apache Hive community has voted on and released version 0.13. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1,080 JIRA tickets.
Hive 0.13 also delivers the third and ...
How to Run a Simple Apache Spark App in CDH 5
Getting started with Spark (now shipping inside CDH 5) is easy using this simple example.
Apache Spark is a general-purpose cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...
How Accurate is Mahout for Summing Numbers?
A question was recently posted on the Mahout mailing list suggesting that the Mahout math library was "unwashed" because it didn't use Kahan summation. My feeling is that this complaint is unfounded and Mahout is considerably more washed than ...
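For readers unfamiliar with the technique at issue, Kahan (compensated) summation keeps a running correction term that captures the low-order bits lost in each floating-point addition and feeds them back into the next one. The sketch below is a minimal illustration in Python, not Mahout's code (Mahout's math library is written in Java); the function names `kahan_sum` and `naive_sum` are hypothetical. It compares compensated summation against a plain left-to-right loop on ten copies of 0.1.

```python
def naive_sum(values):
    """Plain left-to-right floating-point summation (no compensation)."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan (compensated) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject the error from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # algebraically zero; numerically, the lost part
        total = t
    return total


data = [0.1] * 10
print(naive_sum(data))  # -> 0.9999999999999999
print(kahan_sum(data))  # -> 1.0
```

The correction costs a few extra floating-point operations per element; for short sums of similarly scaled values the naive loop is often accurate enough, which is why whether a library "needs" Kahan summation depends on the workload.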
Cassandra: A Database Solution for Modern-Day Applications?
Cassandra is a one-stop choice for data-driven organizations that rely on real-time Big Data operations for their core functions. What makes it so dear to developers and organizations dealing with huge databases is a set of features that ...
Apache Spark is now part of MapR’s Hadoop distribution
Hadoop vendor MapR is getting in early on the Apache Spark action, too, announcing on Thursday that it’s adding the Spark stack to its Hadoop distribution as part of a partnership with Spark startup Databricks (Ion Stoica, the co-founder and CEO of ...
Using Scala To Work With Hadoop
Cloudera has a great toolkit to work with Hadoop. Specifically, it is focused on building distributed systems and services on top of the Hadoop Ecosystem.
http://cloudera.github.io/cdk/docs/0.2.0/cdk-data/guide.html
And the examples are in Scala!!!!
Here is how you work with generic stuff on the ...
10 Big Data Analytics Use Cases for Healthcare IT
Big data means a lot of things to a lot of different people, but what is becoming increasingly clear, as the largest market players' strategies start to unfold, is that big data is about real-time analysis and data-driven decision-making. Now Big ...
Apache Spark: 3 Real-World Use Cases
By Alex Woodie
The Hadoop processing engine Spark has risen to become one of the hottest big data technologies in a short amount of time. And while Spark has been a Top-Level Project at the Apache Software Foundation for barely a ...