Big data can be an asset or a liability. If you analyze your data and keep it secure, you can leverage big data insights and improve your business. However, if your big data isn’t secure, you may face monetary and brand authority losses. As 2019 draws to an end, it’s important to review our past security efforts and find the spots that require further improvement.

Big Data Security Breaches: 2019 vs 2018

If we compare this year’s data breach statistics to 2018, we discover an alarming increase in the number of breached records. According to IT Governance, 421 million records were breached during October 2019. That’s a huge leap from October 2018, during which a mere 44 million records were breached.

The table below summarizes the key changes:

Metric | 2018 | 2019
Total number of records exposed during the year | Over 4.5 billion records | Over 5.3 billion records
Number of records exposed during the biggest data breach | Over 500 million records | Over 885 million records
Average cost of a data breach | $3.86 million USD | $3.92 million USD
Top cause of breaches | 99% of breaches were caused by external attackers | 90% of breaches were caused by phishing attacks
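
To put these figures in proportion, the year-over-year change can be worked out with a few lines of Python. This is only a rough sketch: the helper name `percent_change` is made up for illustration, and because most of the source figures are stated as "over" a given value, the results are approximations rather than exact statistics.

```python
def percent_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) * 100 / old


# (2018 value, 2019 value) pairs taken from the figures quoted in this article.
figures = {
    "records breached in October (millions)": (44, 421),
    "records exposed during the year (billions)": (4.5, 5.3),
    "records in the biggest breach (millions)": (500, 885),
    "average cost of a breach (million USD)": (3.86, 3.92),
}

for label, (y2018, y2019) in figures.items():
    print(f"{label}: {percent_change(y2018, y2019):+.1f}%")
```

By this estimate, the October figure grew by roughly 857%, the yearly total by about 18%, the biggest single breach by 77%, and the average cost by about 1.6%.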

For the past decade, every year has been named “The Year of the Data Breach”. Some even mock this title, which has been passed from year to year like a baton in a relay race against breaches. However, there’s no denying that each year more data records get exposed. Each year, attacks get more sophisticated and attackers become more ambitious.

Data Security Challenges in 2019

Artificial Intelligence (AI) technology not only gives attack bots the ability to attack faster than humans, but also the ability to imitate human communication. The latter enables attackers to trick victims into divulging sensitive and financial data, and into spreading disinformation.

Open source vulnerabilities have been a well-known security concern for a few years, and one that many organizations have taken seriously. Companies like WhiteSource have started offering information through free vulnerability databases. The EU even went as far as offering a prize pool of $1 million USD for its open source bug bounty program.

Advanced Persistent Threat (APT) attacks have started targeting end users through phishing schemes. Insider threats related to compromised credentials have been gaining more media attention. Surveys reveal that non-technical users are not the only ones who fall prey to phishing schemes and social engineering; even CSOs click on malicious URLs. While some insiders intentionally seek to damage a targeted organization, the majority of insider threats stem from employees who are themselves victims.

The risk of insider threats is only exacerbated by the widespread penetration of Internet of Things (IoT) devices. IoT demands a more distributed and complex network communication architecture. This is true whether it’s a consumer IoT device like a smart refrigerator or commercial IoT devices like pacemakers and vehicle-to-infrastructure (V2I) sensors. The more points on the network, the more vulnerable it becomes.

What Is Big Data Security?

The term big data security refers to practices and tools employed for the purpose of protecting data and analysis processes. The big data perimeter is typically divided into three categories:

  • Incoming data—vulnerable while in transit
  • Storage data—vulnerable while at rest
  • Output data—processed for analysis, vulnerable in use

The goal of big data security is to prevent accidental and intentional breaches, leaks, losses, and exfiltration of huge amounts of data. Big data can take the form of financial logs, health care data repositories, data lakes and archives, and in-progress business intelligence analyses.

Securing Big Data Analytics in 2020

Standard big data security best practices include: 

  • Encryption: the process of encoding information in a way that renders it useless to attackers. The system generates keys and uses them to encrypt the data. Only the right key can decrypt the data, and the system rotates keys. This security technique relies on the supposition that attackers won’t be able to re-create the correct decryption key.
  • Tokenization: sensitive data, such as payment details, is sent to a third-party mediator, which saves the information in a vault and sends a token back to the website. The website keeps only the token and does not store any financial information. This security technique relies on the supposition that attackers won’t gain access to the tokenization system.
  • Next-Generation Firewall (NGFW): according to Gartner, this is a “deep-packet inspection firewall”. NGFW moves beyond stateful port/protocol inspection and blocking. It is dynamic, and offers features such as application inspection, intrusion prevention, and cloud threat intelligence.
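
The tokenization flow described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `TokenVault` class, the `tok_` prefix, and the in-memory dictionary are all hypothetical stand-ins for a real third-party tokenization service and its secured storage.

```python
import secrets


class TokenVault:
    """Toy in-memory stand-in for a third-party tokenization service (hypothetical)."""

    def __init__(self):
        self._vault = {}  # maps token -> original sensitive value

    def tokenize(self, sensitive_value: str) -> str:
        # The token is random, so it reveals nothing about the original value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._vault[token] = sensitive_value
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the original value.
        return self._vault[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"  # common test card number, not real data
token = vault.tokenize(card_number)

# The website stores only the token, never the card number itself.
website_record = {"customer": "alice", "payment_token": token}
```

If the website's database leaks, the attacker obtains only `website_record`, which is useless without access to the vault. That is why the technique stands or falls with the security of the tokenization system itself.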

For endpoint protection, organizations can make use of: 

  • Endpoint Protection Platforms (EPPs)—a passive layer of defense against known threats. Common EPP solutions make use of antivirus and Next-Generation Antivirus (NGAV), encryption, DLP, and NGFW. EPPs typically employ defense techniques such as signature matching, sandboxing, blacklisting and whitelisting.
  • Endpoint Detection and Response (EDR)—an active layer of defense against endpoint threats. EDR solutions usually apply data collection, detection, and analysis techniques. The key goals of EDR are providing real-time threat intelligence, alerts, and forensics. Some EDR solutions provide automated responses and traceback mechanisms.
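
As a rough illustration of the signature matching, blacklisting, and whitelisting that EPPs rely on, the sketch below checks a file against all three. Everything in it is an assumption made for the example: the file names, the fake payload, and the `scan` function do not come from any real product, and real signature databases are far larger and more nuanced than a set of SHA-256 digests.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad payloads.
KNOWN_BAD_PAYLOAD = b"pretend this is malware"
SIGNATURES = {hashlib.sha256(KNOWN_BAD_PAYLOAD).hexdigest()}

# Hypothetical lists of explicitly denied and explicitly trusted programs.
BLACKLIST = {"keylogger.exe"}
WHITELIST = {"backup.exe"}


def scan(name: str, content: bytes) -> str:
    """Classify a file as 'block', 'allow', or 'unknown'."""
    if name in BLACKLIST:
        return "block"
    # Signature matching: compare the content digest to known threats.
    if hashlib.sha256(content).hexdigest() in SIGNATURES:
        return "block"
    if name in WHITELIST:
        return "allow"
    # Not a known threat and not explicitly trusted.
    return "unknown"
```

The `unknown` result shows the limit of a passive layer: a file that matches no known signature is not necessarily safe, which is where techniques such as sandboxing, or the active detection offered by EDR, come in.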

If an EPP solution includes EDR tools, it gains active defense capabilities. The goal is to ensure that all points in the network are covered, so as to eliminate unauthorized access to your data. The more visibility you gain, the fewer blind spots you’ll have. 
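
To make the idea of active defense more concrete, here is a minimal sketch of the collect, detect, and alert loop that an EDR tool performs. The event format, the endpoint names, and the thresholds `WINDOW_SECONDS` and `MAX_FAILURES` are assumptions chosen for illustration; real EDR products use much richer telemetry and detection logic.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # assumed look-back window
MAX_FAILURES = 3     # assumed number of failures that triggers an alert


def detect_bruteforce(events):
    """Flag endpoints with repeated failed logins inside the time window.

    events: iterable of (timestamp_seconds, endpoint, event_type),
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # endpoint -> timestamps of recent failures
    alerts = []
    for ts, endpoint, kind in events:
        if kind != "login_failed":
            continue
        window = recent[endpoint]
        window.append(ts)
        # Drop failures that fell out of the look-back window.
        while ts - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_FAILURES:
            alerts.append((ts, endpoint, len(window)))
    return alerts


events = [
    (0, "laptop-17", "login_failed"),
    (10, "laptop-17", "login_failed"),
    (15, "server-02", "login_ok"),
    (20, "laptop-17", "login_failed"),
    (300, "server-02", "login_failed"),
]
alerts = detect_bruteforce(events)
```

Here `alerts` contains a single entry for `laptop-17`, raised at the third failed login within a minute. Keeping the raw events alongside the alerts is what makes later forensics and traceback possible.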

Conclusion

2019 was yet another “Year of the Data Breach”. The majority of security attacks targeted data. In fact, looking at security data and patterns, it is safe to deduce that the objective of attacks is almost always data. As attackers gain more advanced and sophisticated tools and techniques, we’ll continue to see an increase in data breaches.

In the digital sphere, data is a valuable commodity that can unlock financial information and credentials. Attackers can ransom data, sell it to the highest bidder, use it to launch another attack, delete it to damage the organization, or manipulate it for the purpose of spreading disinformation. Big data analysis repositories are especially vulnerable, and deserve a well-rounded security approach that covers all types of network points and users.