Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It loads data quickly and scales well. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...

24 April 2014 Analytics, Big Data, Cloud Computing, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, SAS, Splunk

How to Run a Simple Apache Spark App in CDH 5

Getting started with Spark (now shipping inside CDH 5) is easy with this simple example. Apache Spark is a general-purpose cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...

22 April 2014 Analytics, Big Data, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics, Splunk

Google BigQuery and Datastore Connectors for Hadoop

Users of Google’s cloud platform should now find it easier to run Hadoop jobs directly against data in Google BigQuery and Google Cloud Datastore. We are making it easier for you to run Hadoop jobs directly against your data ...

18 April 2014 Analytics, Big Data, Cloud Computing, Google, Hadoop News, Hadoop Tutorials, HBase, Hive, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics, SAS

10 Big Data Analytics Use Cases for Healthcare IT

Big data means many things to many people, but as the strategies of the largest market players unfold, it is becoming increasingly clear that big data is about real-time analysis and data-driven decision-making. Now Big ...

07 April 2014 Analytics, Big Data, Big Data Use Cases, Cloudera, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

Impala and SQL on Hadoop

The origins of Impala can be traced to F1: The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business. One of the many differences between MapReduce and Impala is that in Impala, intermediate data moves directly from process to process instead of storing it ...

05 April 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics, Splunk

Pivotal Launches Its Big Data Suite

When Pivotal was spun out of VMware and EMC, many people were excited about a well-funded entity, chock full of some of the coolest modern tech, and without the hang-ups of having to think about existing products or revenue streams. ...

03 April 2014 Analytics, Big Data, Cloudera, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Predictive Analytics

A guide to NoSQL offerings

Amazon Web Services: DynamoDB is a NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data and serve any level of request traffic. Users simply tell the service how many requests need to ...

29 March 2014 Analytics, Big Data, Couchbase, Google, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig

Apache Falcon: Data Governance for Hadoop

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie. InMobi is one of ...

27 March 2014 Analytics, Big Data, Cassandra, Couchbase, Hadoop News, Hadoop Tutorials, HBase, Hive, Impala, MapReduce News, MongoDB News, NoSQL News, Pig, Predictive Analytics

Pivotal juices Hadoop with in-memory database and SQL querying

Pivotal, an EMC/VMware spin-off that has big plans to deliver big data analytics through platform as a service, has whisked the drapes off Pivotal HD 2.0, its commercially supported enterprise-grade distribution of Hadoop. But Pivotal's ambitions for HD don't simply involve ...

25 March 2014 Analytics, Big Data

HBase BlockCache Showdown

The HBase BlockCache is an important structure for enabling low-latency reads. As of HBase 0.96.0, there are no fewer than three different BlockCache implementations to choose from. But how do you know when to use one over another? There’s ...