Pivotal juices Hadoop with in-memory database and SQL querying

Pivotal, an EMC/VMware spin-off that has big plans to deliver big data analytics through platform as a service, has whisked the drapes off Pivotal HD 2.0, its commercially supported enterprise-grade distribution of Hadoop.

But Pivotal’s ambitions for HD don’t simply involve delivering Hadoop as a free-form building block, albeit one that’s professionally supported. Rather, HD is intended to be the data fabric of the company’s own Pivotal One, a PaaS offering where companies can develop apps that siphon data in real time from a variety of sources and transform it into actionable information.

HD 2.0 is built on top of Apache Hadoop 2.2, but adds a good deal of proprietary technology — a move that will likely leave open source purists wincing — to make Hadoop the substrate of what Pivotal calls a “business data lake” architecture. One of those proprietary pieces is HAWQ, a SQL query engine designed to perform parallelized queries on data stored in HDFS; another component, GemFire XD, is an in-memory database service designed more for processing of incoming data in real time, as opposed to long-running SQL queries. HD 2.0 also includes GraphLab, a graph analytics algorithms package, and tools to allow programmers using R, Python, and Java to “enable business logic and procedures otherwise cumbersome with SQL.”

Other distributions have done little more than package up Hadoop for easier delivery and provisioning under the assumption that the deploying parties would know best how to make the most of it — an attitude that’s persisted with Red Hat and Hortonworks joining forces for the sake of supporting Hadoop in Red Hat Enterprise Linux. There, the application and data-access sides have largely consisted of the likes of Red Hat’s JBoss data layer. Enterprise developers still have to fit many more of the pieces together themselves.

Pivotal, on the other hand, is using Hadoop as an underlying stratum on which to build its PaaS. To that end, Pivotal One is meant to be directly useful to enterprises needing big data analytics by allowing them to leverage more of the data-access paradigms they’re already familiar with (such as SQL) instead of forcing them to scrap everything and learn the Hadoop way. Again, Hadoop purists aren’t going to be happy with this news, but Pivotal’s priority is satisfying its enterprise customers with big data needs.

When InfoWorld’s Eric Knorr pondered the launch of Pivotal back in April 2013, he considered the possibility that Pivotal One was being built as much for Pivotal itself as it was for anyone else: that Pivotal Labs (one of the acquisitions used to form Pivotal) would be “developing the bulk of those next-gen big data applications on Pivotal One for its enterprise customers, rather than enterprises developers using Pivotal One themselves.”
