A New Python Client for Impala
The new Python client for Impala will bring smiles to Pythonistas!
As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of ...
Cloudera, MongoDB partner to mash up NoSQL, Hadoop
Hadoop specialist Cloudera announced a strategic partnership with MongoDB this week that will allow Cloudera customers to store Hadoop data in their NoSQL MongoDB databases. The move is a huge win for MongoDB, which is quickly emerging as one of ...
Will Big Data Analytics Replace BI?
John Leonard of Computing.co.uk recently asked, “In 10 years’ time will big data analytics have replaced traditional relational business intelligence systems? That was one of the questions that Computing asked UK IT decision-makers in the quantitative survey that formed part of our recent big data ...
Bringing the Best of Apache Hive 0.13 to CDH Users
More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.
Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and ...
Intro to Machine Learning
Machine learning is sub set of artificial intelligence and it is study of systems that can learn from data. A machine learning system could be trained. Core of machine learning deals with representation and generalization.
Machine learning is a "Field of ...
Apache Ambari 1.5.1 is Released!
Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, ...
10 Hadoop Hardware Leaders
Hadoop software is designed to orchestrate massively parallel processing on relatively low-cost servers that pack plenty of storage close to the processing power. All the power, reliability, redundancy, and fault tolerance is built into the software, which distributes the data ...
Using Apache Hadoop and Impala together with MySQL for data analysis
Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...
Apache Hive Updated with SQL-on-Hadoop Features
The Apache Hive community has voted on and released version 0.13. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080Â JIRA tickets.
Hive 0.13 also delivers the third and ...
How to Run a Simple Apache Spark App in CDH 5
Getting started with Spark (now shipping inside CDH 5) is easy using this simple example.
Apache Spark is a general-purpose, cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to ...






