Actian, HP Vertica Join SQL-On-Hadoop Bandwagon
Actian on Tuesday joined the long list of companies that have introduced a way to support SQL access and querying on top of Hadoop. The announcement comes just a week after HP upgraded SQL-on-Hadoop functionality it introduced late last year through its Vertica database.
Actian and HP join Pivotal (with Greenplum-based HAWQ) and InfiniDB among companies extending existing relational database management systems to run on top of Hadoop’s HDFS file system. Actian said it’s going after Hadoop market-share leader Cloudera and its Impala offering, which was introduced last year as a faster, more SQL-compliant alternative to Hive.
The Actian Analytics Platform Hadoop SQL Edition, due out by the end of this month, beats Impala with even faster querying and ISO SQL-92 compliance, according to Actian CTO Mike Hoskins.
“We’re offering full-functioning, SQL-complete functionality running natively on Hadoop, and we’re also the highest-performing SQL database running on Hadoop,” Hoskins told InformationWeek in a phone interview. “If you add those two together, we have an advantage that’s hugely important for customers looking to empower their SQL users.”
Actian has acquired and consolidated into its Actian Analytics Platform technologies including the ParAccel and Vectorwise databases and Pervasive DataRush data-integration software. The new SQL-on-Hadoop option uses what’s now called the Vector engine for parallelized querying on HDFS. Actian’s testing shows its query performance will be as much as 30 times faster than Impala, Hoskins said.
HP introduced SQL-on-Hadoop capabilities on its columnar Vertica database late last year by eliminating its proprietary storage layer so it could work with Hadoop-native file formats including JSON, Parquet, Thrift, and others. In last week’s release, dubbed Dragline, HP eliminated all separation between Hadoop and Vertica clusters.
“That means Vertica can coexist with the Hadoop cluster, and we can access and query against HDFS data leaving it where it is,” said Eamon O’Neill, HP’s Vertica product manager, in a phone interview with InformationWeek. Vertica is also capable of doing SQL queries against semi-structured data including clickstreams and Web session data, according to O’Neill.
Actian’s architecture does not require a separate cluster, but it appears to be a step behind HP in that it has to load new data or convert existing data inside Hadoop into its proprietary database storage format to support SQL querying. Actian says support for Hadoop-native file formats is on the roadmap for a future release.
There’s more to the Actian and HP announcements. Actian, for example, boasts 200 connectors to enterprise data systems and YARN-certified data processing and ETL on top of Hadoop. HP enhanced Vertica with live aggregate lookups for customer personalization analysis, sentiment analysis against short text streams such as tweets, and improved workload-management features. But the big news for both companies is clearly SQL-on-Hadoop support.
Despite the profusion of options for using SQL against big data, Hive remains the most widely used query tool with Hadoop. On that front Hortonworks says the latest generation of Hive offers greatly improved performance. Nonetheless, Hive and Impala both fall short of relational databases in SQL functionality, according to Forrester analyst Mike Gualtieri.
“Vendors have obsessed about performance, but the question is, can you run the queries you need to run?” Gualtieri told InformationWeek. “Impala still has work to do, but Actian, Pivotal, and Vertica are far more likely to support the queries that companies already have in use.”