Big Data Resources

Top Big Data and Data Science Books You Should Read
Big Data Awesome List on GitHub
Also read:
Top 12 Hadoop Technology Companies
The Biggest Challenge of Hadoop Analytics: It’s all about Query Performance
Relation Between Big Data Hadoop and Cloud Computing
What Is Hadoop, And How Does It Relate To Cloud?
A Guide to Become a Successful Hadoop Developer in 2023
How To Kick Start Your Career With Hadoop And Big Data Training?
13 Reasons Why System/Data Administrators should do Hadoop Training
Top 10 Tips for Hadoop Administration for Starters
Apache Big Data Stack, from Spark to Hadoop to NiFi, Kafka, … Everything is under the Apache umbrella.
What’s new?
- Apache NiFi, an awesome GUI-driven big data project backed by Hortonworks.
- Apache Zeppelin, a very cool Big Data notebook.
- Apache Geode, again from my friends at Pivotal. This is an awesome in-memory data grid, commercially known as GemFire.
- Apache Airavata, multitasking supertool.
- Apache DataFu, the best-named Apache Big Data project in my mind.
- Apache Crunch, stays crispy in milk, MapReduce and Spark. You had me at crunch.
- Apache Falcon, an interesting data management project.
- Apache Flink, the superfast squirrel that came out of nowhere and exploded.
- Apache Tajo, a distributed relational data warehouse on Hadoop. Used in CDAP, not sure how this isn’t huge yet.
- Apache Phoenix, fast relational layer over HBase.
- Apache HAWQ, fast MPP SQL on Hadoop, open-sourced by Pivotal.
- Apache Giraph, a highly scalable graph processing system; it adds to the huge list of graph processing solutions out there.
- Apache Hama, a BSP (bulk synchronous parallel) framework for big data analytics. This one is still being baked, but could be insanely useful. I am waiting and watching this one.
- Apache Helix, a clustering and partitioning solution that works with ZooKeeper.
- Apache MetaModel, a common interface to a ton of different data sources including HBase, RDBMS and NoSQL stores.
- Apache ORC, yet another file format. Also, Apache Parquet and Apache Avro.
- Apache MADlib, machine learning in SQL on PostgreSQL, Greenplum and HAWQ.
- Apache Gora, an in-memory data model.
- Apache Twill, a layer over YARN.
- Apache Accumulo, a key-value store on HDFS with cell-level security.
- Apache Drill, SQL on top of NoSQL, Hadoop and RDBMS.
- Apache Chukwa, analysis and monitoring for Hadoop.
- Apache Ambari, the slick install, configuration and administration tool for Hadoop.
- Apache Slider, not a small hamburger but a framework on top of YARN for better clustering.
- Apache Storm, distributed real-time computation framework that is widely used with Hadoop.
- Apache Pig, a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
- RHadoop Installation Guide for Red Hat Enterprise Linux
Graph and Network Analysis
- Presentations and talks on Apache Giraph, an iterative graph processing system built for high scalability
- Serious network analysis using Hadoop and Neo4j
A list of resources that will help you learn more about these tools and others:
- 35 Free Data Sources
- 19 Free Public Data Sets for Your First Data Science Project
- A Case for Database Development
- Data Analyst vs Data Scientist – What are the Differences?
- The Importance of Data Science Careers
- LucidWorks/banana
- http://systemml.apache.org/
- Apache Geode (incubating) | Home
- Stream Processing with Apache Flink | Coding
- http://www.slideshare.net/SparkSummit/intro-to-spark-development
- http://www.slideshare.net/DataFactZ/introduction-to-spark-datafactz
- http://www.slideshare.net/SparkSummit/using-spark-with-tachyon-by-gene-pang
- http://www.slideshare.net/pwendell/tuning-and-debugging-in-apache-spark
- http://www.slideshare.net/cloudera/spark-devwebinarslides-final?from_m_app=android
- Data Exploration Using Spark
- Apache Eagle – Secure Hadoop in Real Time
- The Hadoop Ecosystem Table
- Zeppelin
- Stock Inference by Pivotal-Open-Source-Hub
- Data Architectures for Robust Decision Making
- A Scalable Implementation of Deep Learning on Spark - Alexander Ulanov
- Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of S…
- Building a REST Job Server for interactive Spark as a service by Roma…
- Slides | Databricks
- Spark Summit EU 2015: Lessons from 300 production users
- Spark – The Ultimate Scala Collections by Martin Odersky
- Spark Usage · mongodb/mongo-hadoop Wiki
- Scala School – Java Scala
- Spark SQL and DataFrames – Spark 1.5.1 Documentation
- Configuration – Spark 1.5.1 Documentation
- spark-redis
- Quick Start – Spark 1.5.1 Documentation
- SystemML – developerWorks Open
- www.cs.berkeley.edu/~haoyuan/talks/Tachyon_2014-10-16-Strata.pdf
- Getting Started with Apache Spark
- https://github.com/palantir/atlasdb/wiki






