When to use Pig Latin versus Hive SQL?
Once your big data is loaded into Hadoop, what's the best way to use that data? You'll need some way to filter and aggregate the data, and then apply the results for something useful. Collecting terabytes and petabytes of web ...
HBase Architecture
HBase - The Basics:
HBase is an open-source, NoSQL, distributed, non-relational, versioned, multi-dimensional, column-oriented store that is modeled after Google Bigtable and runs on top of HDFS. "NoSQL" is a broad term meaning that the database isn't an RDBMS which ...
Introduction to Impala
Impala is significant within the Hadoop ecosystem because of its:
Scalability
Flexibility
Efficiency
What's Impala?
Impala is...
Interactive SQL: Impala is typically 5 to 65 times faster than Hive, cutting response times to seconds rather than minutes.
Nearly ANSI-92 standard and compatible with ...
Free Cloudera Impala Book
Get the Cloudera Impala book in PDF format, free from the Cloudera website, in association with the Strata Conference and Hadoop World. See the link below for the book info from the publisher as well as the link to download ...
Hadoop FS Shell Commands
Hadoop file system (fs) shell commands are used to perform various file operations such as copying files, changing permissions, viewing the contents of a file, changing ownership of files, and creating directories.
The syntax of the fs shell command is:
hadoop fs <args>
All the ...
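The file operations named in the excerpt above (creating directories, copying files, viewing contents, changing permissions and ownership) can be sketched as follows. This is a minimal illustration, not taken from the original post: the `hfs` wrapper, the `RUN` variable, the `/user/demo/input` path, the `localfile.txt` file, and the `demo` user and group are all hypothetical names chosen for the example. By default the wrapper only prints each command, so it is safe to try without a cluster; setting `RUN=1` would pass the same arguments to the real `hadoop fs`.

```shell
#!/bin/sh
# hfs: hypothetical dry-run wrapper around `hadoop fs`.
# Prints the command unless RUN=1 is set in the environment.
hfs() {
    if [ "${RUN:-0}" = "1" ]; then
        hadoop fs "$@"
    else
        echo "hadoop fs $*"
    fi
}

# Create a directory (with parents) in the Hadoop file system.
hfs -mkdir -p /user/demo/input
# Copy a local file into that directory.
hfs -put localfile.txt /user/demo/input/
# List the directory and view the file's contents.
hfs -ls /user/demo/input
hfs -cat /user/demo/input/localfile.txt
# Change permissions and ownership of the file.
hfs -chmod 644 /user/demo/input/localfile.txt
hfs -chown demo:demo /user/demo/input/localfile.txt
```

Each call follows the `hadoop fs <args>` form shown above, with the first argument selecting the operation.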
Hive Interview Questions
What is Hive?
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.
Hive was originally developed at Facebook. It's now a Hadoop subproject with ...
MongoDB Interview Questions
What were you trying to solve when you created MongoDB?
We were and are trying to build the database that we always wanted as developers. For pure reporting, SQL and relational is nice, but when building data always wanted something different: ...
5 Steps To Master Big Data and Predictive Analytics
As recently as the past two years, one of the seminal issues regarding Big Data was storage, especially with respect to the exponential growth and size of unstructured data that did not fit into databases (e.g., video feeds, PowerPoint presentations). ...
How to Make Your In-memory NoSQL Datastores Enterprise-Ready
In-memory NoSQL datastores such as open source Redis and Memcached are becoming the de facto standard for every web/mobile application that cares about its user's experience. Still, large enterprises have struggled to adopt these databases in recent years due to challenges ...
Splunk-Making Big Money With Big Data
Just look at Splunk (SPLK), which blew past expectations in its latest earnings report. So far in today's trading, the shares are up 13% to put the company at an all-time high around $55. The stock has gained almost 90% ...






