Hadoop YARN adds more application threads for big data users
Even Hadoop's most enthusiastic proponents might admit that its marriage to MapReduce has limited what the open source technology can do. But with the advent of Hadoop 2 and its key component, the Hadoop YARN resource manager, the distributed processing ...
Why Apache Spark is a Crossover Hit for Data Scientists
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as operational, analytics.
Data science is a broad church. I am a data scientist — or so I’ve been told — but what I do is actually ...
Data transfer between MySQL and Cassandra using Sqoop
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform ...
Top 12 funded Big Data Startup companies
Big data companies are attracting big investments from venture capitalists. But which startups are garnering the most funding? Venture capitalists made more big data investments than ever before in 2012, and a few more deals have already closed in 2013. ...
Apache Hadoop 2.3.0 was released
Hadoop-2.3.0 is the first release for the year 2014, and brings a number of enhancements to the core platform, in particular to HDFS. There are a lot of bug fixes and small changes in this one - you can read ...
SQL is what’s next for Hadoop
Today, many companies are trying to let users run SQL queries from inside Hadoop, since it is an open-source software framework. Companies use Hive and its HiveQL language in Hadoop implementations, but Hive mainly depends on MapReduce. Business intelligence ...
Why Big Data Applications Are Already a Reality
The advent of the Internet and World Wide Web changed many things in the technology landscape. Information began to become widely accessible. Individuals began to connect with one another online. Merchants began to market and sell their products digitally, sometimes ...
How MapR’s M7 Platform Improves NoSQL and Hadoop
The M7 Edition. Sounds like a high-performance sports car, doesn’t it? In reality, M7 is MapR’s enterprise-grade platform that provides its own unique brand of high performance, dependability and ease of use to both NoSQL and Hadoop applications. M7 removes ...
How YARN Opens Doors to Easier Programming Tools for Hadoop 2.0 Users
The emergence of YARN for the Hadoop 2.0 platform has opened the door to new tools and applications that promise to allow more companies to reap the benefits of big data in ways never before possible with outcomes possibly never ...
Exploring The Hadoop Network Topology
Hadoop is designed to run on large clusters of commodity servers, often spanning many physical racks. A physical rack is in many cases a single point of failure (for example, it typically has a single switch ...






