Using Apache Hadoop and Impala together with MySQL for data analysis
Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...
How to Run a Simple Apache Spark App in CDH 5
Getting started with Spark (now shipping inside CDH 5) is easy using this simple example.
Apache Spark is a general-purpose cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...
Google BigQuery and Datastore Connectors for Hadoop
Users of Google’s cloud platform should find it easier to run Hadoop jobs directly against data in Google BigQuery and Google Cloud Datastore from now on.
We are making it easier for you to run Hadoop jobs directly against your data ...
Impala and SQL on Hadoop
The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business.
One of the many differences between MapReduce and Impala is that in Impala the intermediate data moves directly from process to process instead of being stored ...
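The difference described above can be illustrated with a toy, single-process sketch. It assumes the truncated sentence refers to MapReduce writing intermediate results to disk between stages; the function names (`parse`, `total_by_user`, `run_materialized`, `run_streamed`) and the sample rows are hypothetical and belong to neither system's API.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: turn raw 'user,amount' lines into records."""
    for line in lines:
        user, amount = line.split(",")
        yield {"user": user, "amount": int(amount)}


def total_by_user(records: Iterable[dict]) -> dict:
    """Stage 2: sum the amounts per user."""
    totals: dict = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0) + rec["amount"]
    return totals


def run_materialized(lines: Iterable[str]) -> dict:
    """MapReduce-like (assumed): stage 1 output goes to disk, then is read back."""
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
        for rec in parse(lines):
            tmp.write(json.dumps(rec) + "\n")
        path = tmp.name
    try:
        with open(path) as fh:
            return total_by_user(json.loads(row) for row in fh)
    finally:
        os.remove(path)  # clean up the intermediate file


def run_streamed(lines: Iterable[str]) -> dict:
    """Impala-like: stage 1 records flow straight into stage 2 in memory."""
    return total_by_user(parse(lines))


rows = ["ann,3", "bob,5", "ann,4"]
print(run_materialized(rows))
print(run_streamed(rows))
```

Both variants compute the same totals. The streamed one skips the write and re-read of the intermediate file, which is the kind of overhead the excerpt points at.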
Pivotal Launches Its Big Data Suite
When Pivotal was spun out of VMware and EMC, many people were excited about a well-funded entity, chock full of some of the coolest modern tech, and without the hang-ups of having to think about existing products or revenue streams. ...
Apache Falcon - Data Governance for Hadoop
Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie.
InMobi is one of ...
Pivotal juices Hadoop with in-memory database and SQL querying
Pivotal, an EMC/VMware spin-off that has big plans to deliver big data analytics through platform as a service, has whisked the drapes off Pivotal HD 2.0, its commercially supported enterprise-grade distribution of Hadoop.
But Pivotal's ambitions for HD don't simply involve ...
HBase BlockCache Showdown
The HBase BlockCache is an important structure for enabling low-latency reads. As of HBase 0.96.0, there are no fewer than three different BlockCache implementations to choose from. But how do you know when to use one over another? There’s ...
What Can GPFS on Hadoop Do For You?
The Hadoop Distributed File System (HDFS) is considered a core component of Hadoop, but it’s not an essential one. Lately, IBM has been talking up the benefits of hooking Hadoop up to the General Parallel File System (GPFS). IBM has ...
Pivotal Brings In-Memory Analysis To Hadoop
Pivotal, the EMC spin-off company pursuing modern application development in the context of cloud computing and big-data analysis, on Monday released Pivotal HD 2.0, an update of its Hadoop distribution incorporating an in-memory database and a battery of new analysis ...