Cassandra 2.0: The next generation of big data

In 2008, Facebook gave big data users a gift. The social network released Cassandra, its distributed NoSQL data store for big data, as open source. Today, with the release of Cassandra 2.0, the gift is more valuable than ever.

Since 2008, under the direction of the Apache Software Foundation (ASF), Cassandra has grown more powerful and faster. Today, according to Apache, “Cassandra powers massive data sets quickly and reliably without compromising performance, whether running in the cloud or partially on-premise in a hybrid data store. Its fully distributed architecture provides unparalleled fault tolerance to ensure applications will not go off-line, and its linear scalability allows them to reach massive sizes while successfully handling thousands of requests per second.”

“In five years, Apache Cassandra has grown into one of the most widely used NoSQL databases in the world and serves as the backbone for some of today’s most popular applications,” said Jonathan Ellis, vice president of Apache Cassandra, in a statement.

With high-end users such as eBay, Reddit and Twitter, Cassandra can clearly walk the big data walk as well as talk the talk. Cassandra’s biggest users can’t afford poor data performance.

This newest version includes multiple new features. Perhaps the biggest of them, according to Ellis, is that “Cassandra 2.0 makes it easier than ever for developers to migrate from relational databases and become productive quickly.”

More specifically, the new features and improvements include:

Lightweight transactions, which make operations linearizable, similar to the serializable isolation level offered by relational databases, and so prevent conflicts between concurrent requests

Triggers, which let developers push performance-critical code close to the data it deals with and simplify integration with event-driven frameworks like Storm

CQL (Cassandra Query Language) enhancements such as cursors and improved index support

Improved compaction, keeping read performance from deteriorating under heavy write load

Eager retries, which avoid query timeouts by sending redundant requests to other replicas if too much time elapses on the original request

Custom Thrift server implementation based on LMAX Disruptor, a high-performance inter-thread messaging library that achieves lower message processing latencies and better throughput with flexible buffer allocation strategies
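
Of the features above, eager retries are the easiest to picture in code. The following is a minimal Python sketch of the general idea only, not Cassandra's own implementation (Cassandra is written in Java). The function name eager_read, the retry_after and timeout parameters, and the notion of a replica as a plain callable are all hypothetical and exist purely for illustration.

```python
import concurrent.futures as cf
import time


def eager_read(replicas, key, retry_after=0.05, timeout=1.0):
    """Ask replicas one at a time, adding a redundant request whenever the
    requests already in flight have been silent for `retry_after` seconds.

    `replicas` is a list of callables taking `key` (a stand-in for real
    network calls). Returns the first successful answer.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    deadline = time.monotonic() + timeout
    pending = set()
    try:
        for replica in replicas:
            pending.add(pool.submit(replica, key))
            # Wait briefly for any in-flight request before trying the next replica.
            wait_for = max(0.0, min(retry_after, deadline - time.monotonic()))
            done, pending = cf.wait(pending, timeout=wait_for,
                                    return_when=cf.FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
        # Every replica has been asked; wait for whatever is left until the deadline.
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            done, pending = cf.wait(pending, timeout=remaining,
                                    return_when=cf.FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
        raise TimeoutError("no replica answered in time")
    finally:
        # Do not block on slow replicas; their threads finish in the background.
        pool.shutdown(wait=False)


def slow_replica(key):
    time.sleep(0.5)  # simulates an overloaded node
    return ("slow", key)


def fast_replica(key):
    return ("fast", key)


print(eager_read([slow_replica, fast_replica], "user:42"))
```

In this sketch the caller gets its answer from the second replica shortly after the retry_after delay instead of waiting on the slow node. The trade-off is extra load from the redundant request, which is why the delay before retrying matters.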

The new Cassandra is available for download Wednesday. Like all Apache software, it is free and released under the Apache License 2.0.

By Steven J. Vaughan-Nichols