Vital Hadoop tools for crunching Big Data
Today, the most popular term in the IT world is ‘Hadoop’. Within a short span of time, Hadoop has grown massively and has proved useful for a large collection of diverse projects. The Hadoop community is fast evolving and ...
Bloom Filters in HBase and Chrome
Bloom filters let you efficiently check whether a particular element/record is present in a set/table. They have very little impact on insert operations. The only caveat is that a lookup might return a false positive. Bloom filter might ...
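To make the idea concrete, here is a minimal Python sketch of a Bloom filter. It is an illustration only, not the implementation used by HBase or Chrome: the class name, the method names, the bit-array size and the choice of salted SHA-256 as the hash family are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; real filters pack the bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Inserting only sets k bits, so it is cheap.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for key in ["row-1", "row-2", "row-3"]:
    bf.add(key)

print(bf.might_contain("row-2"))    # an added key is never reported absent
print(bf.might_contain("row-999"))  # usually False, but a false positive is possible
```

The asymmetry is the point: a negative answer is always right, so a store can skip an expensive lookup, while a positive answer still has to be confirmed against the real data.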
Introduction to Apache Hive and Pig
Apache Hive is a framework that sits on top of Hadoop for running ad-hoc queries on data stored in Hadoop. Hive supports HiveQL, which is similar to SQL but does not support the full set of SQL constructs.
Hive converts the HiveQL query into ...
How to Get the Best Out of Big Data Solutions
Data has become the new raw material for businesses. And that’s how it should be in order to meet the dynamic needs of the current age. Thus, access to considerably large amounts of data and information always helps an organisation ...
When to use Pig Latin versus Hive SQL?
Once your big data is loaded into Hadoop, what’s the best way to use that data? You’ll need some way to filter and aggregate the data, and then apply the results to something useful. Collecting terabytes and petabytes of web ...
Introduction to Impala
Impala matters in the Hadoop ecosystem because of its:
Scalability
Flexibility
Efficiency
What’s Impala?
Impala is…
Interactive SQL: Impala is typically 5 to 65 times faster than Hive, as it cuts response time to seconds rather than minutes.
Nearly ANSI-92 standard and compatible with ...
Free Cloudera Impala Book
Get the Cloudera Impala book, in PDF format, for free from the Cloudera website, in association with the Strata Conference and Hadoop World. See the link below for the book info from the publisher as well as the link to download ...
Hive Interview Questions
What is Hive?
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
Hive was originally developed at Facebook. It’s now a Hadoop subproject with ...
Managing Multiple Resources in Hadoop 2 with YARN
An overview of Cloudera’s contributions to YARN that help support management of multiple resources, from multi-resource scheduling to node-level enforcement
As Apache Hadoop becomes ubiquitous, it is becoming more common for users to run diverse sets of workloads on Hadoop, and ...
Big Data for Telco Begins to Unleash Systems of Engagement
In our recently completed Q3 2013 Global State Of Enterprise Architecture Online Survey, big data for real-time analytics moved from the No. 3 most revolutionary technology to the No. 2 position, according to the 116 enterprise architects who participated. This ...






