This is a list of 25 Java machine learning tools & libraries.

  1. Weka has a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
  2. Massive Online Analysis (MOA) is a popular open source framework for data stream mining, with a very active growing community. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems.
  3. The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance. This is different from the ‘standard’ case, which involves only a single target variable. MEKA is based on the WEKA Machine Learning Toolkit.
  4. The Advanced Data mining And Machine learning System (ADAMS) is a novel, flexible workflow engine aimed at quickly building and maintaining real-world, complex knowledge workflows, released under GPLv3.
  5. Environment for Developing KDD-Applications Supported by Index-Structures (ELKI) is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.
  6. Mallet is a Java machine learning toolkit for textual documents. It supports classification algorithms such as maximum entropy, naive Bayes and decision trees.
  7. Encog is an advanced machine learning framework that supports Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models and Genetic Algorithms.
  8. The Datumbox Machine Learning Framework is an open-source framework written in Java which allows the rapid development of Machine Learning and Statistical applications. The main focus of the framework is to include a large number of machine learning algorithms & statistical tests and to handle medium-to-large datasets.
  9. Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. It is designed to be used in business environments, rather than as a research tool.
  10. Mahout is a machine learning framework with built-in algorithms. Mahout-Samsara helps people create their own math while providing some off-the-shelf algorithm implementations.
  11. Rapid Miner was developed at the Technical University of Dortmund, Germany. It provides a GUI and a Java API for developing your own applications, along with data handling, visualization and modeling with machine learning algorithms.
  12. Apache SAMOA is a machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. It enables development of new ML algorithms without directly dealing with the complexity of the underlying distributed stream processing engines (DSPEs, such as Apache Storm, Apache S4, and Apache Samza). Its users can develop distributed streaming ML algorithms once and execute them on multiple DSPEs.
  13. Neuroph simplifies the development of neural networks by providing a Java neural network library and a GUI tool that supports creating, training and saving neural networks.
  14. Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine learning. It is a framework for building applications, but also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering.
  15. Stanford Classifier is a machine learning tool that will take data items and place them into one of k classes. A probabilistic classifier, like this one, can also give a probability distribution over the class assignment for a data item. This software is a Java implementation of a maximum entropy classifier.
  16. io is a Retina API: a fast, precise, brain-like algorithm that enables natural language processing (NLP).
  17. JSAT is a library for quickly getting started with Machine Learning problems. According to its author, it is developed in their free time and made available for use under the GPL 3. Part of the library is for self-education, so all code is self-contained. JSAT has no external dependencies and is pure Java.
  18. N-Dimensional Arrays for Java (ND4J) is a set of scientific computing libraries for the JVM. They are meant to be used in production environments, which means routines are designed to run fast with minimum RAM requirements.
  19. The Java Machine Learning Library is a set of reference implementations of machine learning algorithms. These algorithms are well documented, both in the source code and on the documentation site. It is mostly written in Java.
  20. Java-ML is a Java API with a collection of machine learning algorithms implemented in Java. It only provides a standard interface for algorithms.
  21. MLlib (Spark) is Apache Spark’s scalable machine learning library. The library and the platform support Java, Scala and Python bindings. The library is new and the list of algorithms is long.
  22. H2O is a machine learning API for smarter applications. It scales statistics, machine learning, and math over big data. H2O is extensible, and individuals can build blocks using simple math legos in the core.
  23. WalnutiQ is an object-oriented model of part of the human brain with one theorized common learning algorithm (a work in progress towards a simplistic model of a strong emotional A.I.).
  24. RankLib is a library of learning-to-rank algorithms. Currently eight popular algorithms have been implemented.
  25. (Hierarchical Temporal Memory implementation in Java) is a Java port of the Numenta Platform for Intelligent Computing.
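
To make the idea from item 15 concrete, here is a minimal sketch in plain Java of how a maximum entropy style classifier can turn per-class scores into a probability distribution over k classes. It does not use the Stanford Classifier API or any other library from this list; the class name `MaxEntSketch`, the weight matrix and the feature vector are all hypothetical values chosen for illustration, and no training is performed.

```java
import java.util.Arrays;

public class MaxEntSketch {

    /** Softmax: converts raw class scores into probabilities that sum to 1. */
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] probs = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the maximum score keeps Math.exp from overflowing.
            probs[i] = Math.exp(scores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Scores each class as the dot product of its weight vector and the features. */
    static double[] classProbabilities(double[][] weights, double[] features) {
        double[] scores = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            for (int j = 0; j < features.length; j++) {
                scores[k] += weights[k][j] * features[j];
            }
        }
        return softmax(scores);
    }

    /** Returns the index of the most probable class. */
    static int argMax(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical, hand-picked weights: one row per class, one column per feature.
        double[][] weights = {{1.0, -0.5}, {0.2, 0.8}, {-1.0, 0.3}};
        double[] features = {1.0, 2.0};

        double[] probs = classProbabilities(weights, features);
        System.out.println("class probabilities: " + Arrays.toString(probs));
        System.out.println("predicted class index: " + argMax(probs));
    }
}
```

In a real library the weights would be learned from labeled data. The sketch only shows the prediction step, where a single data item receives a full distribution over classes rather than just one label.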