Data Science – What’s the big deal about it?
Thomas Davenport, an American academic and publisher for Harvard Business Review, once said that Data Scientist is “the Sexiest Job of the 21st Century”. But why is there so much hype and mythos around Data Scientists and Data Science?
The ...
Why Extended Attributes are Coming to HDFS
Extended attributes in HDFS will facilitate at-rest encryption for Project Rhino, but they have many other uses, too.
Many mainstream Linux filesystems implement extended attributes, which let you associate metadata with a file or directory beyond common “fixed” attributes like filesize, ...
How to Create a Database in MongoDB
MongoDB is one of the “NoSQL” types of database solutions used to store and query big data. Old SQL developers might find Mongo a bit counterintuitive. With normal, relational databases, you create a database, then tables and then insert your ...
VMware Updates Big Data Extensions with Hadoop 2 Support
VMware Inc. updated its Big Data Extensions (BDE) for its vSphere virtualization platform, including support for Hadoop 2.
BDE's set of integrated management tools -- built into vSphere -- helps organizations deploy, run and manage Hadoop. With BDE, vSphere users can ...
16 NoSQL, NewSQL Databases To Watch
Why are businesses increasingly choosing alternatives to the leading relational database management systems when grappling with new data types and extreme scale?
We put that question to Bryson Koehler, CIO of The Weather Company, which is using a NoSQL database, Riak, ...
Altiscale Hadoop-as-a-Service Delivers Apache Hive 0.13
Altiscale, Inc., a leading innovator in Hadoop-as-a-Service (HaaS) solutions, has announced the availability of Apache Hive™ 0.13 on its HaaS platform, just weeks after its general software release to the industry. For data scientists and businesses that rely on insights ...
Can Super-Fast Apache Spark Light Up Hadoop?
Call it the Hadoop Swiss Army knife of cluster computing frameworks. The Apache Software Foundation just rolled out Apache Spark v1.0, which it's calling a "super-fast, open-source, large-scale data processing and advanced analytics engine."
That's a ...
Enjoy machine learning with Mahout on Hadoop
"Mahout" is a Hindi term for a person who rides an elephant. The elephant, in this case, is Hadoop -- and Mahout is one of the many projects that can sit on top of Hadoop, although you do not always ...
7 Golden Rules for Big Data Projects
It seems every organisation has either jumped or is seriously contemplating jumping onto the Big Data bandwagon. In an industry where the hype is often followed by the despair, I feel somewhat ashamed that the IT Industry that I work ...
Updated PostgreSQL targets NoSQL market
Embracing the widely used JSON data-exchange format, the new version of the PostgreSQL open-source database takes aim at the growing NoSQL market of nonrelational data stores, notably the popular MongoDB.
The first beta version of PostgreSQL 9.4, released Thursday, includes a ...






