Stinger and Tez: A primer
What is Stinger?
The Stinger initiative aims to redesign Hive to make it what people want today: Hive is currently used for large batch jobs and works great in that sense; but people also want interactive queries, and Hive is too ...
Top 10 Biggest Big Data Companies by Revenue
Everything about the big data market is big, especially the rate at which the market is growing. According to the Wikibon 2015 Big Data Market Shares report, big data market revenues grew by 22 percent last year alone. Here’s a ...
Big Data 101: A Beginner’s Guide To Big Data terminology
Data science can be confusing enough without all of the complicated lingo and jargon. For many, the terms NoSQL, DaaS and Neural Networking instill nothing more than the hesitant thought, “this sounds data-related.” It can be difficult to tell a ...
5 Enterprise Alternatives to Hadoop
Hadoop's progression from a large-scale, batch-oriented analytics tool to an ecosystem full of vendors, applications, tools and services has coincided with the rise of the big data market.
While Hadoop has become almost synonymous with the market in which ...
5 Steps for Securing Your Data In Hadoop
Data security remains a top concern for data professionals. To help organizations mount the best defense, Reiner Kappenberger, a senior executive focused on big data and Hadoop at HPE Security-Data Security, offers five steps on how to best secure data ...
30 Coolest Big Data Business Analytics Vendors
Working with big data remains one of the biggest IT challenges that businesses, government agencies and other organizations face today. It's also one of the biggest opportunities for IT vendors and for solution and strategic service providers. The big data ...
How is fault tolerance handled in Spark streaming?
Spark Streaming components
Data model
All data is modeled as RDDs, which by design carry a lineage of deterministic operations, i.e. any re-computation always leads to the same result. Essentially the same process (though with a different mechanism) as in Hadoop's fault-tolerance for ...
How to perform capacity planning for a Hadoop cluster
The number of machines, and the specs of those machines, depend on a few factors: the volume of data (obviously), the data retention policy (how much you can afford to keep before throwing it away), the type of workload you have (data ...
3 Steps to Better Hadoop Management
It makes sense to get excited about the possibilities afforded by Apache™ Hadoop® YARN-based applications such as Spark, Storm, Presto and others to provide substantial business value. However, the actual tasks of managing and maintaining the environment should not get ...
8 Skills You Need to Be a Data Scientist
Interested in landing a job as a data scientist? This is the core set of 8 data science competencies you should develop:
1. Basic Tools: No matter what type of company you’re interviewing for, you’re likely going to be expected to ...






