Top 10 worst big data practices
1. Choosing MongoDB as your big data platform. Why am I picking on MongoDB? I'm not, but for whatever reason, the NoSQL database most abused at this point is MongoDB. While MongoDB has an aggregation framework that tastes like MapReduce ...
Data Science – What’s the big deal about it?
Thomas Davenport, an American academic and publisher for Harvard Business Review, once said that Data Scientist is “the Sexiest Job of the 21st Century”. But why is there such a big hype and mythos about Data Scientists and Data Science?
The ...
Why Extended Attributes are Coming to HDFS
Extended attributes in HDFS will facilitate at-rest encryption for Project Rhino, but they have many other uses, too.
Many mainstream Linux filesystems implement extended attributes, which let you associate metadata with a file or directory beyond common “fixed” attributes like filesize, ...
How to Create a Database in MongoDB
MongoDB is one of the “NoSQL” types of database solutions used to store and query big data. Old SQL developers might find Mongo a bit counterintuitive. With normal, relational databases, you create a database, then tables and then insert your ...
VMware Updates Big Data Extensions with Hadoop 2 Support
VMware Inc. updated its Big Data Extensions (BDE) for its vSphere virtualization platform, including support for Hadoop 2.
BDE's set of integrated management tools -- built into vSphere -- helps organizations deploy, run and manage Hadoop. With BDE, vSphere users can ...
Facebook HydraBase adds reliability to Hadoop’s HBase
Facebook's becoming almost as notable for its adventures with open source projects as it is for its social network of more than 1 billion users. The company's latest experiment: revising one of Hadoop's key components to make it more reliable ...
Altiscale Hadoop-as-a-Service Delivers Apache Hive 0.13
Altiscale, Inc., a leading innovator in Hadoop-as-a-Service (HaaS) solutions, has announced the availability of Apache Hive™ 0.13 on its HaaS platform, just weeks after its general software release to the industry. For data scientists and businesses that rely on insights ...
Can Super-Fast Apache Spark Light Up Hadoop?
Call it the Hadoop Swiss Army knife of cluster computing frameworks. The Apache Software Foundation just rolled out Apache Spark v1.0, which it's calling a "super-fast, open-source, large-scale data processing and advanced analytics engine."
That's a ...
Free eBook: Learn About Microsoft’s Hadoop Implementation
Microsoft Press has just released a new eBook in its continuing line of free offerings. Introducing Microsoft Azure HDInsight presents 5 chapters and 130 pages of content covering Microsoft's foray into Big Data utilizing Apache Hadoop. HDInsight is built on ...
7 Golden Rules for Big Data Projects
It seems every organisation has either jumped or is seriously contemplating jumping onto the Big Data bandwagon. In an industry where the hype is often followed by the despair, I feel somewhat ashamed that the IT Industry that I work ...






