Apache Tez 0.3 Released
The Apache Tez community has voted to release 0.3 of the software.
Apache™ Tez is a replacement of MapReduce that provides a powerful framework for executing a complex topology of tasks. Tez 0.3.0 is an important release towards making the software ...
Avoiding Split Brainedness in HA Hadoop Clusters
The US Patent Office recently granted Zettaset a patent for the underlying technology in its Hadoop high availability that prevents a "split-brain" situation where multiple master nodes think they're in control of the Hadoop cluster. It's a feather in the ...
Big Workflow: The Future of Big Data Computing
How can organizations embrace — instead of brace for — the rapidly intensifying collision of public and private clouds, HPC environments and Big Data? The current go-to solution for many organizations is to run these technology assets in siloed, specialized ...
Why Apache Spark is a Crossover Hit for Data Scientists
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as operational, analytics.
Data science is a broad church. I am a data scientist — or so I’ve been told — but what I do is actually ...
MongoDB NoSQL Database Interview Questions
MongoDB is the best free, open-source, document-oriented NoSQL database. If you are preparing for a technical interview on the MongoDB NoSQL database, you must prepare the following MongoDB NoSQL database interview questions. These MongoDB NoSQL database interview questions cover basic ...
Why NoSQL became MORE SQL
One of the key points I raised was about how many folks were just slapping on Big Data badges to the same old same old, another was that MapReduce really doesn't work the way traditional IT estates behave which ...
Hadoop admin interview questions
Which operating system(s) are supported for production Hadoop deployment?
The main supported operating system is Linux. However, with some additional software Hadoop can be deployed on Windows.
What is the role of the namenode?
The namenode is the "brain" of the Hadoop cluster ...
Integrating Hadoop into Business Intelligence and Data Warehousing
Information from SAS and TDWI Research
The purpose of this report is to accelerate users’ understanding of the many new products and practices based on Hadoop technologies that have emerged in recent years. While Hadoop usage is a minority practice today, ...
Vital Hadoop tools for crunching Big Data
Today, the most popular term in the IT world is ‘Hadoop’. Within a short span of time, Hadoop has grown massively and has proved to be useful for a large collection of diverse projects. The Hadoop community is fast evolving and ...
Introduction to Apache Hive and Pig
Apache Hive is a framework that sits on top of Hadoop for doing ad-hoc queries on data in Hadoop. Hive supports HiveQL which is similar to SQL, but doesn't support the complete constructs of SQL.
Hive converts the HiveQL query into ...






