Using Apache Hadoop and Impala together with MySQL for data analysis
Apache Hadoop is commonly used for data analysis. It is scalable and fast for data loads. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...
How to Run a Simple Apache Spark App in CDH 5
Getting started with Spark (now shipping inside CDH 5) is easy using this simple example.
Apache Spark is a general-purpose cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...
Google BigQuery and Datastore Connectors for Hadoop
Users of Google’s cloud platform should find it easier to run Hadoop jobs directly against data in Google BigQuery and Google Cloud Datastore from now on.
we are making it easier for you to run Hadoop jobs directly against your data ...
10 Big Data Analytics Use Cases for Healthcare IT
Big data means a lot of things to a lot of different people, but what is becoming increasingly clear, as the largest market players' strategies start to unfold, is that big data is about real-time analysis and data-driven decision-making. Now Big ...
Impala and SQL on Hadoop
The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business.
One of many differences between MapReduce and Impala is that in Impala the intermediate data moves directly from process to process instead of being stored ...
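The contrast described above can be illustrated with a toy, single-process sketch in Python. It is not how Impala or MapReduce is implemented; the function names (`scan`, `keep_even`, `run_materialized`, `run_streamed`) are hypothetical and exist only for this illustration. One variant writes the intermediate result to a file before the next stage reads it, the other passes rows from stage to stage directly.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator, List


def scan(rows: Iterable[int]) -> Iterator[int]:
    """First stage: emit each input row (hypothetical helper)."""
    for row in rows:
        yield row


def keep_even(rows: Iterable[int]) -> Iterator[int]:
    """Second stage: pass through only the even values."""
    for row in rows:
        if row % 2 == 0:
            yield row


def run_materialized(rows: List[int]) -> int:
    """Store the intermediate result on disk, then read it back to finish."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(list(keep_even(scan(rows))), f)
        with open(path) as f:
            return sum(json.load(f))


def run_streamed(rows: List[int]) -> int:
    """Hand each row straight to the next stage; nothing is written to disk."""
    return sum(keep_even(scan(rows)))
```

Both functions return the same total for the same input; the only difference is whether the intermediate rows are written out between stages.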
Pivotal Launches Its Big Data Suite
When Pivotal was spun out of VMware and EMC, many people were excited about a well-funded entity, chock full of some of the coolest modern tech, and without the hang-ups of having to think about existing products or revenue streams. ...
A guide to NoSQL offerings
Amazon Web Services: DynamoDB is a NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data and serve any level of request traffic. Users simply tell the service how many requests need to ...
Apache Falcon: Data Governance for Hadoop
Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.
InMobi is one of ...
Pivotal juices Hadoop with in-memory database and SQL querying
Pivotal, an EMC/VMware spin-off that has big plans to deliver big data analytics through platform as a service, has whisked the drapes off Pivotal HD 2.0, its commercially supported enterprise-grade distribution of Hadoop.
But Pivotal's ambitions for HD don't simply involve ...
HBase BlockCache Showdown
The HBase BlockCache is an important structure for enabling low latency reads. As of HBase 0.96.0, there are no fewer than three different BlockCache implementations to choose from. But how do you know when to use one over the other? There’s ...






