The global big data market reached $208 billion in 2020 and is projected to grow at a compound annual growth rate of 10%, reaching $450 billion by 2026, according to Expert Market Research.

Ranging from companies whose big data solutions are helping doctors better treat cancer, to those whose platforms promise to drastically simplify big data search and analytics, here are 16 of the coolest big data startups of the year (so far).

1. Luk Advisor

Founding Year: 2019

Location: Hong Kong

Funding: USD 64,400

Partner for: Smart Retail, Animal Monitoring

Industries: FinTech, FMCG, Manufacturing, Real Estate

Luk Advisor is a Hong Kong-based startup that facilitates data-driven decision-making for the retail, manufacturing, and real estate industries. The startup leverages AI and data science to build bespoke big data products tailored to each client’s requirements, with the aim of improving operational efficiency and revenue strategies. Its solutions include AniDeep, an animal monitoring system for use at zoos, and a natural language processing (NLP)-driven marketing intelligence system.

2. Umbrella Network

Founding Year: 2020

Location: George Town, Cayman Islands

Partner for: Asset Tokenization

Industries: FinTech, Energy, Logistics

Umbrella Network is a Cayman Islands-based decentralized oracle startup that connects smart contract developers with data on mid- and long-tail crypto assets as well as real-world financial data. The community-owned network uses tokenization to strengthen its security and a hash tree (Merkle tree) to improve the scalability of the oracle. In addition, the startup’s Layer 2 technology lets developers write multiple data points in a single on-chain transaction, reducing the cost of delivering batched data to smart contracts.
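To illustrate why a hash tree helps with batching, here is a minimal, generic Merkle tree sketch in Python. It is not Umbrella Network's implementation: the leaf encoding (strings such as "BTC-USD:43000"), the use of SHA-256, the rule of duplicating the last node on odd-sized levels, and the function names are all assumptions made for illustration. The idea it shows is that many data points collapse into one root hash, which is the only value that would need to go on-chain, while any single data point can still be checked against that root with a short proof.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    # Hash adjacent pairs to build the parent level.
    return [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Collapse a batch of data points into a single 32-byte root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling_hash, sibling_is_left) pairs from leaf up to root."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the other node in the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root and compare with the published root."""
    node = sha256(leaf)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


# Hypothetical batch of price data points.
batch = [b"BTC-USD:43000", b"ETH-USD:3200", b"UMB-USD:0.25"]
root = merkle_root(batch)
proof = merkle_proof(batch, 1)
print(verify(b"ETH-USD:3200", proof, root))
```

In this sketch, the proof grows with the logarithm of the batch size, so adding more data points to a batch does not change the size of what is published: it is always a single root hash.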

3. Dryad Networks

Founding Year: 2020

Location: Eberswalde, Germany

Funding: USD 4 M

Partner for: Wildfire Detection

Industries: Forestry, Agriculture

German startup Dryad Networks offers Silvanet, a forest monitoring system. It uses solar-powered gas sensors, a low-power wide-area network based on LoRaWAN, and cloud-based big data tools for ultra-early detection of forest fires during the smoldering phase. The cloud-based platform analyzes sensor data such as gas, temperature, humidity, and air pressure measurements for timely wildfire detection. The startup’s mesh gateway also enables large-scale deployments, even in areas without mobile network coverage. While the solution is primarily used for forest monitoring, it also finds use in irrigation monitoring and ecosystem management.


4. GENOME INSIGHT

Founding Year: 2020

Location: Daejeon, South Korea

Funding: USD 6 M

Partner for: Whole-Genome Sequencing (WGS) Analytics

Industries: Pharma, BioTech

GENOME INSIGHT is a South Korean startup that provides genomics-driven precision medicine solutions. The startup uses bioinformatics to analyze and interpret whole-genome sequencing data. Unlike the analysis of targeted sequencing data, this approach reveals genetic aberrations across the entire genome and enables mutation signature detection. The startup also offers organoid techniques that support long-term culture of patient-derived primary tissues and, in turn, accelerate in vitro drug development. The startup’s solution advances therapeutics for oncology, rare and orphan diseases, and inflammatory diseases.

5. Bornio

Founding Year: 2020

Location: Menlo Park, US

Funding: USD 1.1 M

Partner for: Data Privacy Policy Enforcement

Industries: FinTech, Healthcare, Retail

US-based startup Bornio develops a compliance-enforcement platform for data management. Rising concerns over data theft and leaks require enterprises to ensure stringent data security and compliance. The startup’s platform automates the data privacy operations lifecycle so that key stakeholders can collaborate to define, enforce, and monitor data privacy policies. It also provides granular access controls for sensitive data, allowing data consumers to access it on a need-to-know basis. In addition, Bornio delivers continuous risk assessment, enabling data scientists to improve security and shorten vulnerability response times.

6. Airbyte

Airbyte is an open-source data integration engine that consolidates data in data warehouses, data lakes, and databases. It unifies data integration pipelines in a single open-source ELT platform that can scale with custom or high-volume needs. Companies can leverage Airbyte’s long tail of high-quality connectors, which adapt to schema and API changes, for effective data management. The platform is aimed at data engineering, data analytics, and data science teams.

7. Bigeye

Bigeye is a data observability platform that helps teams measure, improve, and communicate data quality at any scale. It targets problems such as broken dashboards and degraded machine learning models, with tools built for managing petabyte-scale data platforms. The platform combines automatic metrics, automatic thresholds, a customizable templating system, and a no-code interface to help users investigate alerts and identify root causes efficiently.

Bigeye, founded in 2019 and based in San Francisco, raised $17 million in Series A funding in April 2021 and another $45 million in Series B funding that September. The company is using the money to accelerate its product development and expand its go-to-market efforts.

8. Innovaccer

Innovaccer connects and processes healthcare data to create unified records and meaningful insights about diseases and procedures. It offers the Innovaccer Health Cloud, a data activation platform, an innovation toolkit, and an intelligent application suite to integrate disparate patient data and achieve better health outcomes efficiently. The startup also enables the rapid development of interoperable solutions through developer tool suites and open APIs.

9. Cribl

Top Executive: Clint Sharp, Co-Founder, CEO

Cribl’s observability data engineering software, including its flagship LogStream system, is used to build pipelines for routing high volumes of telemetry data, including machine log, instrumentation, application and metric data, between operational, storage, analytical and security systems.

10. Firebolt

Top Executive: Eldad Farkash, Co-Founder, CEO

Firebolt develops a cloud data warehouse that competes head-on with such giants as Snowflake and Amazon Redshift (while running on AWS, no less). The company touts its technology’s speed at scale, ease of use and more affordable operating model.

11. Grafana Labs

Top Executive: Raj Dutt, Co-Founder, CEO

Grafana Labs develops the popular Grafana open-source data visualization and analytics platform for building data dashboards and visualizations for metric, log and trace data generated by IT infrastructure, networks, cybersecurity tools and other systems. The analytics and visualizations are used by IT and AppDev managers to monitor IT system performance and track users and events.

12. Yugabyte

Top Executive: Bill Cook, CEO

Yugabyte develops YugabyteDB, a next-generation, distributed relational database designed to handle huge amounts of data spanning multiple geographic regions and availability zones. The database supports global, business-critical applications—such as in cybersecurity and financial services—that require low query latency and extreme resilience against failures.

13. Syncari

Top Executive: Nick Bonfiglio, Co-Founder, CEO

Syncari’s no-code data automation platform helps data professionals unify, clean, manage and distribute trusted customer data across an enterprise. The system relies on a range of data synchronization, unification, governance and access capabilities to perform its tasks.

14. Speedata

Top Executive: Jonathan Friedmann, Co-Founder, CEO

Speedata develops an Analytics Processing Unit (APU) that the company describes as the first dedicated processor for optimizing and accelerating data center and cloud-based database and data analytics workloads.

15. Monte Carlo

Top Executive: Barr Moses, Co-Founder, CEO

Monte Carlo’s data observability software is used to monitor data across IT systems, including in databases, data warehouses and data lakes, to gauge and maintain data quality, reliability and lineage—what the company calls “data health.”

16. Molecula

Top Executive: Higinio Maycotte, CEO

Molecula develops FeatureBase, an enterprise feature store that the company says “simplifies, accelerates and controls” access to big data for real-time analytics and machine learning applications.

Originally published July 15, 2014, 7:28 am; updated January 6, 2022, for relevance and comprehensiveness.