BI Professionals Spend 50-90% of Their Time ‘Cleaning’ Raw Data for Analytics
Last year, the NYT shined a light on big data’s “janitor” problem: data scientists and business intelligence pros spend too much of their time cleaning data rather than evaluating it. But how big an issue is it, really?
Xplenty just wrapped a commissioned study of more than 200 BI pros and found that nearly a third (30%) spend 50-90% of their time just cleaning raw data. This is one of the first reports to tie an actual number to the ETL process.
A few other findings from the study:
– 51% do all of their raw data preparation on-premise vs. 49% in the cloud
– But a slim majority of those on-premise (51%) are strongly considering a move to the cloud
– 55% said that merging data from different sources is their main challenge
Xplenty, the big data integration platform that makes it easy to process more data more quickly, today announced the results of a new commissioned study aimed at understanding the challenges business intelligence professionals face in preparing raw data for analytics. The study focused on several areas of the ETL (Extract, Transform and Load) process, including preferences for on-premise or cloud-based solutions, perceived challenges, and the amount of time spent on ETL. Of those surveyed, 97% said ETL was critical for their business intelligence efforts. For the study, more than 200 BI professionals from across the U.S. were polled between May 1 and May 11, 2015.
Cloud vs. On-Premise
More than half (51%) of BI professionals polled said that they currently leverage on-premise ETL solutions, versus 49% cloud-based. However, of those who said they presently use on-premise ETL tools, 51% said that they were “strongly considering” moving all ETL processes to the cloud.
“While many organizations still rely heavily on existing on-premise IT for ETL, the desire to shift to a more cloud-based model has never been stronger,” said Yaniv Mor, CEO & Co-Founder of Xplenty. “Cloud ETL offers a host of benefits over on-premise, from increased agility in resource deployment to reduced costs. As such, the cloud is an increasingly attractive option from both a performance and operational perspective.”
Data Preparation Challenges
When asked what the biggest challenges were in making data “analytics-ready,” 55% said integrating data from different platforms, followed by transforming, cleansing and formatting incoming data (39%), integrating relational and non-relational data (32%), and the sheer volume of data that needs to be managed at any given time (21%).
“Reformatting, cleansing and consolidating large volumes of data from multiple sources can be overwhelming,” said Mor. “BI professionals are still struggling with the best approach to shorten the time between integration and analytics. As a result, businesses are often slow to unlock their data’s true potential for revenue or operational improvements.”
Time Spent Prepping Data
Nearly a third of those polled (30%) said that they spend between 50% and 90% of their time on ETL alone.
“BI professionals should be spending the majority of their time evaluating data and deciphering patterns gleaned through the analytics process—not readying data for analytics,” said Mor. “The more time they spend making raw data analytics usable, the less time they have to generate real value from it. We have to accelerate Big Data’s ‘time-to-insight,’ boosting efficiency and bringing more immediate answers to an organization so that they can more quickly take advantage of them.”