 
How is fault tolerance handled in Spark Streaming?
Spark Streaming components
Data model
All data is modeled as RDDs, which are built by design with a lineage of deterministic operations, i.e. any recomputation always leads to the same result. This is essentially the same process (though with a different mechanism) as in Hadoop's fault tolerance for ...
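The lineage idea can be illustrated with a minimal sketch in plain Python. `ToyRDD`, `compute` and `lose_partition` are hypothetical names made up for this illustration and are not Spark's API. The sketch assumes that the root input is stored reliably and that every transformation is deterministic, which is what allows a lost partition to be rebuilt with identical contents by replaying only that partition's lineage.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical toy model of an RDD (not Spark's API).

    A ToyRDD is either a root holding source partitions, or a
    deterministic per-element function applied to a parent ToyRDD.
    The parent pointer plus the function form its lineage.
    """

    def __init__(self,
                 source: Optional[List[List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.source = source                      # root only: input assumed to be stored reliably
        self.parent = parent                      # lineage: where the data came from
        self.fn = fn                              # lineage: how it was derived
        self.cache: Dict[int, List[int]] = {}     # partition index -> in-memory data

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Record the transformation lazily instead of computing it now.
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self) -> int:
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute(self, i: int) -> List[int]:
        # Serve partition i from memory if it survived.
        if i in self.cache:
            return self.cache[i]
        if self.source is not None:
            data = list(self.source[i])
        else:
            # Lost or never computed: replay the lineage for this partition only.
            data = [self.fn(x) for x in self.parent.compute(i)]
        self.cache[i] = data
        return data

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure: drop partition i here and in every ancestor.
        self.cache.pop(i, None)
        if self.parent is not None:
            self.parent.lose_partition(i)


base = ToyRDD(source=[[1, 2], [3, 4]])
result = base.map(lambda x: x * x).map(lambda x: x + 1)

before = result.collect()
result.lose_partition(1)          # partition 1 is gone from every level
after = result.collect()          # rebuilt from the source via the lineage
print(before, after)              # [2, 5, 10, 17] [2, 5, 10, 17]
```

Because each step is deterministic, replaying the recorded functions on the same input necessarily reproduces the lost data, so no intermediate results need to be replicated.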
 
Tools of a Data Scientist
Unlike your typical programmer, who may use a standardised set of tools, data scientists tend to use a wide array of ever-changing tools. This is because the data science landscape is evolving rapidly, with many new tools still far ...
 
5 Reasons to Choose Parquet for Spark SQL
It is well-known that columnar storage saves both time and space when it comes to big data processing. Parquet, for example, is shown to boost Spark SQL performance by 10X on average compared to using text, thanks to low-level reader ...
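To make the time-and-space claim concrete, here is a toy comparison in plain Python; it is not Parquet itself. The table, its column names and `run_length_encode` are made up for illustration: counting the values a query touches stands in for I/O, and a simple run-length encoding stands in for the encodings real columnar formats apply.

```python
from typing import Dict, List, Tuple

# Hypothetical example table, stored row by row.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 20},
    {"id": 3, "country": "FR", "amount": 5},
    {"id": 4, "country": "FR", "amount": 7},
]


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, list]:
    """Pivot a row-oriented table into one list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def run_length_encode(values: list) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded: List[Tuple[object, int]] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


columns = to_columns(rows)

# Row layout: summing "amount" touches every field of every row.
row_values_read = sum(len(r) for r in rows)          # 12
# Column layout: only the "amount" column is scanned.
col_values_read = len(columns["amount"])             # 4
total = sum(columns["amount"])                       # 42

# Similar values stored together compress well: 4 entries shrink to 2.
country_rle = run_length_encode(columns["country"])  # [("DE", 2), ("FR", 2)]
```

The query result is the same in both layouts; the columnar one simply skips the columns it does not need and stores runs of repeated values compactly.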
 
10 coolest big data products of 2015
The big data technology market remains one of the fastest growing segments of the IT industry. In November, market research firm IDC said the market for big data-related infrastructure, software and services will grow at a compound annual growth rate ...
 
5 most in-demand skills for Big Data jobs
As the Big Data market grows, so does the demand for skilled workers. Are your skills in demand?
The Big Data market is growing as businesses realise the importance of making data-driven decisions. The market is predicted to be worth ...
 
Could Microsoft Cosmos Challenge Hadoop?
A new Microsoft data crunching framework is set to launch on the company’s Azure cloud, according to a report from Redmond pundit Mary Jo Foley on ZDNet. Dubbed Cosmos, it’s a potential competitor to both Hadoop and eventually Google’s homegrown ...
 
Actian Brings Graph Analysis To Big Data
Actian announced that it's adding a graph-analysis engine to its big data portfolio, which already includes a SQL-on-Hadoop offering as well as several relational databases and data-integration software.
Graph analysis is applied to uncover networked relationships among people, places, things, and ...
 
Hadoop: 5 Undeniable Truths
Everyone knows sensationalist headlines can be distracting or inaccurate. The real problem with such overblown headlines is this: Superficial debates are slowing down the true potential of Hadoop, big data, and the evolution of traditional databases.
At Qubole we often receive calls from ...
 
10 ways to query Hadoop with SQL
SQL: old and busted. Hadoop: new hotness. That's the conventional wisdom, but the sheer number of projects putting a convenient SQL front end on Hadoop data stores shows there's a real need for products running SQL queries against data that ...
 
6 sparkling features of Apache Spark!
What is Apache Spark? Why is there such a serious buzz about it? If you are in the big data analytics business, should you really care about Spark? Hopefully this post will help answer some of these questions, which might have ...