Selecting the right SQL-on-Hadoop engine to access big data

With SQL-on-Hadoop technologies, it's possible to access big data stored in Hadoop by using the familiar SQL language. Users can plug in almost any reporting or analytical tool to analyze and study the data. Before SQL-on-Hadoop, accessing big data was ...

Apache Mahout is moving on from MapReduce

Apache Mahout, a machine learning library for Hadoop since 2009, is joining the exodus away from MapReduce. The project’s community has decided to rework Mahout to support the increasingly popular Apache Spark in-memory data-processing framework, as well as the H2O engine for ...

Apache Falcon-Data Governance for Hadoop

Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie. InMobi is one of ...

Avoiding Split Brainedness in HA Hadoop Clusters

The US Patent Office recently granted Zettaset a patent for the underlying technology in its Hadoop high availability that prevents a "split-brain" situation where multiple master nodes think they're in control of the Hadoop cluster. It's a feather in the ...
1 28 29 30 31 32 33