SQL-on-Hadoop Engines

AtScale, a provider of BI (Business Intelligence) on Hadoop, has released a study titled “The Business Intelligence Benchmark for SQL-on-Hadoop engines,” a performance test of BI workloads on Hadoop. The report also examines the strengths and weaknesses of Hive, Presto, Impala and Spark SQL, the most popular analytical engines for Hadoop.

“As enterprises adopt Hadoop more broadly, business intelligence (BI) and analytical use cases on Hadoop have expanded from strong, but limited, adoption among data scientists,” says John L. Myers, Managing Research Director, Enterprise Management Associates (EMA). “Now, organizations need to make the data within their Hadoop clusters available and ‘business critical’ to a wider business stakeholder audience. BI on Hadoop is a logical use case to help them accomplish that growth in adoption and acceptance.”

The first edition of the benchmark was conducted in February 2016. Some key findings of this report are:

  • Increased open-source innovation, as seen in:
      • Performance improvements of 2x to 4x across all the engines.
      • Spark 2.0 being significantly faster than Spark 1.6.
      • Cloudera's donation of the Impala engine to the Apache Software Foundation, which will help the Hadoop community.
  • Each engine performs differently for different query types, but the results are very close and no single winner can be identified. The “sweet spot” of each engine depends on the size of the raw data, the complexity of the query and the number of targeted end users.
  • Impala, Hive, Spark and Presto are all effective for large data sets, such as 6 billion rows of data.
  • Presto and Impala showed very little query degradation during user concurrency testing, which suggests they are better suited to concurrent query workloads.
  • All SQL-on-Hadoop engines can effectively support BI workloads.
  • Spark and Impala are effective for small queries.

“The increasing demand for BI-on-Hadoop workload has truly driven the community to innovate in a short period of time,” says Josh Klahr, Vice President, Product Management, AtScale, who has noticed a significant improvement in the benchmark results. “The communities supporting the open source SQL-on-Hadoop projects have been working diligently to advance innovation in this field. We’ve aligned our vision with these open-source engines since day one. We are pleased to see that this bet is paying off: by simply supporting the latest versions of Impala, Spark SQL and Hive, AtScale customers are now querying their Big Data up to 4x faster than six months ago.”

In the study, AtScale predicts that open-source innovation will eventually outpace that of the vendors.

As the technology matures, Business Intelligence is seeing more demand than Data Science and ETL, creating more avenues for the use of Hadoop in enterprises, predicts the “Hadoop Maturity Survey”.