When to use Pig Latin versus Hive SQL?
Once your big data is loaded into Hadoop, what's the best way to use that data? You'll need some way to filter and aggregate the data, and then apply the results for something useful. Collecting terabytes and petabytes of web ...
HBase Architecture
HBase - The Basics:
HBase is an open-source, distributed, non-relational (NoSQL), versioned, multi-dimensional, column-oriented store that is modeled after Google Bigtable and runs on top of HDFS. "NoSQL" is a broad term meaning that the database isn't an RDBMS which ...
Use Cases Of MongoDB
MongoDB is a relatively new contender in the data storage circle compared to giants like Oracle and IBM DB2, but it has gained huge popularity with its distributed key-value store, MapReduce calculation capability, and document-oriented NoSQL features.
MongoDB has ...
Introduction to Impala
Impala is significant in the Hadoop ecosystem because of its:
Scalability
Flexibility
Efficiency
What's Impala?
Impala is...
Interactive SQL: Impala is typically 5 to 65 times faster than Hive, as it reduces response time to just seconds, not minutes.
Nearly ANSI-92 standard and compatible with ...
Free Cloudera Impala Book
Get the Cloudera Impala book for free, in PDF format, from the Cloudera website, in association with the Strata Conference and Hadoop World. See the link below for the book info from the publisher as well as the link to download ...
Data Export from Hadoop MapReduce to Database
Hadoop has become a huge part of the data warehouse in most companies. It is used for a variety of use cases: search and web indexing, machine learning, analytics and reporting, and so on. Most organizations are building Hadoop clusters in addition ...
Pig Interview Questions
Can you give us some examples of how Hadoop is used in a real-time environment?
Let us assume that we have an exam consisting of 10 multiple-choice questions and that 20 students sit that exam. Every student will attempt each question. For ...
Hadoop Cluster Commissioning and Decommissioning Nodes
To add new nodes to the cluster:
1. Add the network addresses of the new nodes to the include file.
hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>/<hadoop-home>/conf/includes</value>
<final>true</final>
</property>
mapred-site.xml
<property>
<name>mapred.hosts</name>
<value>/<hadoop-home>/conf/includes</value>
<final>true</final>
</property>
Datanodes that are permitted to connect to the namenode are specified in a
file whose name is specified by the dfs.hosts property.
Includes file ...
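As a rough illustration of step 1, the sketch below merges new node addresses into the include-file contents and renders the two property blocks shown above. It assumes the include file lists one host per line; the host names, the `/opt/hadoop/conf/includes` path, and the helper names `include_file_body` and `hosts_property` are hypothetical and stand in for your own values.

```python
from xml.etree import ElementTree as ET


def include_file_body(existing_hosts, new_hosts):
    """Return include-file text: one host per line, order kept, duplicates dropped."""
    merged = list(dict.fromkeys(list(existing_hosts) + list(new_hosts)))
    return "\n".join(merged) + "\n"


def hosts_property(name, include_path):
    """Build a <property> element that points `name` at the include file."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = include_path
    ET.SubElement(prop, "final").text = "true"
    return prop


# Hypothetical path and host names, used only for illustration.
INCLUDE_PATH = "/opt/hadoop/conf/includes"
body = include_file_body(
    ["node1.example.com", "node2.example.com"],
    ["node3.example.com", "node2.example.com"],
)
hdfs_xml = ET.tostring(hosts_property("dfs.hosts", INCLUDE_PATH), encoding="unicode")
mapred_xml = ET.tostring(hosts_property("mapred.hosts", INCLUDE_PATH), encoding="unicode")

print(body, end="")
print(hdfs_xml)    # fragment for hdfs-site.xml
print(mapred_xml)  # fragment for mapred-site.xml
```

Both properties point at the same include file here, mirroring the configuration above; the script only produces text and does not write to or reload a running cluster.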
BigData TechCon-Learn HOW TO Master Big Data, Mar 31-Apr 2, Boston
Big Data TechCon, March 31-April 2, Boston, is the "how-to" big data event. Use code BIGDATA for $200 discount. www.bigdatatechcon.com
Plan now to attend Big Data TechCon, March 31-April 2 in Boston, to learn HOW-TO accommodate the terabytes and petabytes of data ...
MongoDB Interview Questions
What were you trying to solve when you created MongoDB?
We were and are trying to build the database that we always wanted as developers. For pure reporting, SQL and relational is nice, but when building data we always wanted something different: ...






