While “big data” can be a misunderstood buzzword in tech, there’s no denying that the recent AI and machine learning push is dependent on the labeling and synthesis of huge amounts of training data. A new trend report by advisory firm Ovum predicts that the big data market, currently at $1.7 billion, will swell to $9.4 billion by 2020.

So what do data insiders see happening in the coming year? TechRepublic spoke to several leaders in this field to find out.

Here are five big data trends to watch in 2017, from the experts.

  1. AI and machine learning will increase the need for big data analytics

There’s no question that the AI boom depends on data labeling and analysis. “Machine learning has really come along,” said Carla Gentry, a data scientist in Louisville, KY. “2017 will be the year we see more expertise, but still it will struggle, with understanding, proper usage and talent.”

  2. Self-service big data tools hitting the web

With advances in data processing and cloud applications, there is a plethora of free data platforms online that make organizing and synthesizing data easy—even for beginners.

  3. Analytics are still struggling to keep up

But even with all the great tools and data warehouses, analytics remain complicated. “Even with giant data warehouses now available on Big Data like Hadoop and Spark, companies still struggle to transfer data from operational systems to analytical systems,” said Zweben. “[...] that gap and enable the seamless combination of both workloads.”

  4. Data cleansing becoming an industry

To get training data into machine learning systems, it must first be cleansed, which means checking the information in a database for formatting errors, duplicate records, and similar problems. “Machine learning systems are only as good as the data they train on,” said Zweben, “and the secret is transforming raw operational data into learnable features.” The fact that someone visited an online shoe retailer, for instance, “is useful,” he said. “But knowing they went there today is invaluable.”
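For readers who want to see what this looks like in practice, here is a minimal Python sketch of the two steps described above: cleansing records (format checks and duplicate removal) and then deriving a recency feature from them. The sample records, field names, and the `cleanse` and `add_recency` helpers are all hypothetical, made up for illustration, and do not come from any tool mentioned in this article.

```python
from datetime import date, datetime

# Hypothetical raw visit records; the field names are illustrative only.
RAW_VISITS = [
    {"email": " Ana@Example.com ", "visited": "2017-01-03"},
    {"email": "ana@example.com", "visited": "2017-01-03"},  # duplicate once normalized
    {"email": "bob@example.com", "visited": "03/01/2017"},  # unexpected date format
    {"email": "cy@example.com", "visited": "2016-12-20"},
]


def cleanse(records):
    """Normalize emails, skip rows whose date fails the format check, drop duplicates."""
    seen = set()
    clean = []
    for rec in records:
        email = rec["email"].strip().lower()
        try:
            visited = datetime.strptime(rec["visited"], "%Y-%m-%d").date()
        except ValueError:
            continue  # format error: a real pipeline might repair or log the row instead
        key = (email, visited)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        clean.append({"email": email, "visited": visited})
    return clean


def add_recency(records, today):
    """Turn the raw visit date into a feature: days elapsed since the visit."""
    return [dict(rec, days_since_visit=(today - rec["visited"]).days) for rec in records]


rows = add_recency(cleanse(RAW_VISITS), today=date(2017, 1, 3))
for row in rows:
    print(row["email"], row["days_since_visit"])
```

In this sketch, a visit that happened "today" ends up with a `days_since_visit` of 0, which is the kind of signal a model can learn from more readily than a bare record that a visit occurred at some point.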

  5. Democratization of data source