Apache Hadoop 2.3.0 has been released, bringing two notable features to HDFS: in-memory caching and support for heterogeneous storage.

HDFS caching lets users explicitly cache certain files or directories in HDFS. DataNodes then cache the corresponding blocks in off-heap memory using mmap and mlock. Once a file is cached, Hadoop applications can query the locations of its cached blocks and place their tasks for memory locality. Finally, a memory-local task can use the new zero-copy read API to read cached data with no additional overhead. Preliminary benchmarks show that optimized applications can achieve read throughput on the order of gigabytes per second.
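To make the workflow concrete, here is a minimal sketch of the new read path, not code taken from the release itself: it assumes a file has already been cached with the hdfs cacheadmin tool, and the path /data/events.bin, the pool name analytics, the class name ZeroCopyReadExample, and the 4 MB request size are all placeholders chosen for illustration. The calls it relies on (FSDataInputStream.read(ByteBufferPool, int, EnumSet<ReadOption>), ReadOption.SKIP_CHECKSUMS, ElasticByteBufferPool, and releaseBuffer) are the public pieces of the caching and zero-copy read work described above.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.EnumSet;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.ReadOption;
    import org.apache.hadoop.io.ElasticByteBufferPool;

    public class ZeroCopyReadExample {
      public static void main(String[] args) throws IOException {
        // Assumes the file was cached beforehand, e.g.:
        //   hdfs cacheadmin -addPool analytics
        //   hdfs cacheadmin -addDirective -path /data/events.bin -pool analytics
        // ("/data/events.bin" and "analytics" are placeholder names.)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        ElasticByteBufferPool pool = new ElasticByteBufferPool();

        try (FSDataInputStream in = fs.open(new Path("/data/events.bin"))) {
          long bytesRead = 0;
          while (true) {
            // Request up to 4 MB per call; for a cached, memory-local replica the
            // returned buffer is backed by the DataNode's mmapped block, so no
            // copy into user space is needed. Checksums can be skipped because
            // blocks are verified when they are loaded into the cache.
            ByteBuffer buf = in.read(pool, 4 * 1024 * 1024,
                EnumSet.of(ReadOption.SKIP_CHECKSUMS));
            if (buf == null) {
              break; // null signals end of file
            }
            bytesRead += buf.remaining();
            // ... process the buffer contents here ...
            in.releaseBuffer(buf); // always return the buffer when done
          }
          System.out.println("Read " + bytesRead + " bytes");
        }
      }
    }

Because the returned buffer may map directly onto the DataNode's cached block, it must be handed back with releaseBuffer() rather than simply dropped.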
Another big feature, according to Arun Murthy of Hortonworks, is support for a heterogeneous storage hierarchy in HDFS. Murthy writes:
With support for heterogeneous storage classes in HDFS, we now can take advantage of different storage types on the same Hadoop clusters. Hence, we can now make better cost/benefit tradeoffs with different storage media such as commodity disks, enterprise-grade disks, SSDs, Memory etc.
So, be sure to take a look. Hortonworks’ announcement post also includes a look ahead toward 2.4.0, in case 2.3.0 just isn’t enough.