9 Questions to Ask Before Kicking off any Big Data Project
What do you get when you combine rebranded analytics systems, a minefield of consultants turned “big data experts,” and insanely expensive “big data servers” that look suspiciously similar to commodity machines?
You get: the most complicated space for any business or ...
5 NoSQL Predictions for 2015
2014 has been another interesting year in Big Data with long-awaited IPOs, massive fund-raising and increased awareness around the data management market. So, what does this mean for the future of Enterprise NoSQL and the impact on enterprises across industries ...
Apache Tajo brings data warehousing to Hadoop
Organizations that want to extract more intelligence from their Hadoop deployments might find help in the relatively little-known Tajo open source data warehouse software, which the Apache Software Foundation has pronounced ready for commercial use.
The new version of ...
MapReduce for C: Run Native Code in Hadoop
Google announced the release of MapReduce for C (MR4C), an open source framework that allows you to run native code in Hadoop.
MR4C was originally developed at Skybox Imaging to facilitate large scale satellite image processing and geospatial data science. We ...
10 ways to query Hadoop with SQL
SQL: old and busted. Hadoop: new hotness. That's the conventional wisdom, but the sheer number of projects putting a convenient SQL front end on Hadoop data stores shows there's a real need for products running SQL queries against data that ...
A New Python Client for Impala
The new Python client for Impala will bring smiles to Pythonistas!
As a data scientist, I love using the Python data stack. I also love using Impala to work with very large data sets. But things that take me out of ...
Intro to Machine Learning
Machine learning is a subset of artificial intelligence: the study of systems that can learn from data. A machine learning system can be trained. The core of machine learning deals with representation and generalization.
Machine learning is a "Field of ...
Using Apache Hadoop and Impala together with MySQL for data analysis
Apache Hadoop is commonly used for data analysis. It is scalable and fast for data loads. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from ...
Apache Spark is now part of MapR’s Hadoop distribution
Hadoop vendor MapR is getting in early on the Apache Spark action, too, announcing on Thursday that it’s adding the Spark stack to its Hadoop distribution as part of a partnership with Spark startup Databricks (Ion Stoica, the co-founder and CEO of ...
How To Choose The Best Tool For Your Big Data Project
Trying to choose the right tool for a big data project? This chart (and three simple rules) can help guide you through the options. This chart is based on one shown by Microsoft Research senior research program manager Wenming Ye ...






