Last Updated on December 10, 2020
Are you a Java programmer looking to get started with machine learning, or to practice it?
Writing programs that make use of machine learning is the best way to learn machine learning. You can write the algorithms yourself from scratch, but you can make a lot more progress if you leverage an existing open source library.
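To give a feel for what "from scratch" involves, here is a minimal sketch of a k-nearest neighbors classifier in plain Java. It is only an illustration and is not taken from any of the libraries in this post: the class name `KnnFromScratch`, the method names and the toy data are all made up for the example, and it assumes small numeric feature vectors and Euclidean distance.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name: a from-scratch k-nearest neighbors classifier.
final class KnnFromScratch {

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Predict the label of `query` by majority vote among its k nearest training rows.
    // On a tied vote, the label that reached the winning count first is returned.
    static String predict(double[][] features, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[features.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training row indices by distance to the query, nearest first.
        Arrays.sort(order, Comparator.comparingDouble(i -> distance(features[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data (made up): two well-separated groups of points.
        double[][] features = {
            {1.0, 1.1}, {1.2, 0.9}, {0.8, 1.0},
            {5.0, 5.2}, {5.1, 4.9}, {4.8, 5.0}
        };
        String[] labels = {"small", "small", "small", "large", "large", "large"};
        System.out.println(predict(features, labels, new double[] {1.1, 1.0}, 3));
        System.out.println(predict(features, labels, new double[] {4.9, 5.1}, 3));
    }
}
```

Even this tiny version leaves out things a library typically handles for you, such as data loading, feature scaling, and evaluation, which is a big part of why leveraging an existing library gets you further faster.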
In this post you will discover the major platforms and open source machine learning libraries you can use in Java.
This section describes Java-based environments or workbenches that can be used for machine learning. They are called environments because they provide graphical user interfaces for performing machine learning tasks, as well as Java APIs for developing your own applications.
Waikato Environment for Knowledge Analysis (Weka) is a machine learning platform developed by the University of Waikato, New Zealand. It is written in Java and provides a graphical user interface, command line interface and Java API. It is perhaps the most popular Java machine learning library and a great place to start or practice machine learning.
The Konstanz Information Miner (KNIME) is an analytics and reporting platform developed at the University of Konstanz, Germany. It was developed with a focus on pharmaceutical research, but has expanded into general business intelligence. It provides a graphical user interface (based on Eclipse) and a Java API.
RapidMiner used to be called Yet Another Learning Environment (YALE) and was developed at Technical University of Dortmund, Germany. It provides a GUI and a Java API for developing your own applications. It provides data handling, visualization and modeling with machine learning algorithms.
The Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) is a data mining workbench developed in Java by the Ludwig Maximilian University of Munich, Germany. It focuses on working with data in relational databases for tasks such as outlier detection and classification (distance function based methods). It provides a mini GUI, command line interface and Java API.
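As a rough idea of what a distance function based outlier method looks like, the sketch below scores each point by the distance to its k-th nearest neighbor, so isolated points get large scores. This is plain Java written for illustration and does not use the ELKI API; the class name `KnnOutlierSketch`, its methods and the sample points are assumptions made up for the example.

```java
import java.util.Arrays;

// Hypothetical class name: distance-based outlier scoring, not ELKI code.
final class KnnOutlierSketch {

    // Euclidean distance between two points of equal dimension.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Outlier score of each point = distance to its k-th nearest other point.
    static double[] kthNeighbourDistances(double[][] points, int k) {
        int n = points.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be between 1 and n - 1");
        }
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            double[] dists = new double[n - 1];
            int m = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    dists[m++] = euclidean(points[i], points[j]);
                }
            }
            Arrays.sort(dists);          // ascending, nearest first
            scores[i] = dists[k - 1];    // k-th nearest neighbour
        }
        return scores;
    }

    // Index of the point with the largest score (the most outlying point).
    static int mostOutlying(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data (made up): four points on a unit square plus one far away.
        double[][] points = { {0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10} };
        double[] scores = kthNeighbourDistances(points, 2);
        System.out.println(Arrays.toString(scores));
        System.out.println("Most outlying index: " + mostOutlying(scores));
    }
}
```

This brute-force version compares every pair of points. Avoiding that cost on larger datasets is where index structures, as in ELKI's name, come in.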
Practically every project listed on this page is or has a library with a Java API, but the projects listed in this section provide only a Java API. They are machine learning libraries in the narrow sense.
The Java Machine Learning Library (Java-ML) provides a collection of machine learning algorithms implemented in Java. It provides a standard interface for each algorithm and references to the relevant scientific literature for further reading, but no UIs. It includes methods for data manipulation, clustering, feature selection and classification. Note that at the time of writing, the last release was in 2012.
The Java Statistical Analysis Tool (JSAT) provides pure Java implementations of standard machine learning algorithms for modest sized problems. The author comments that he developed the library partly as a self-education exercise and partly to get things done. Nevertheless the list of algorithms is impressive. It includes classification, regression, ensemble, clustering and feature selection methods.
This section lists Java projects intended for use with Big Data, such as on clusters of machines.
Apache Mahout provides implementations of machine learning algorithms for use on the Apache Hadoop platform (distributed map-reduce). The project focuses on clustering and classification algorithms, and a popular application driving its development is collaborative filtering for recommender systems. Reference implementations of algorithms that run on a single node are also included.
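If collaborative filtering is new to you, the idea can be sketched on a single machine in a few lines: predict a user's rating for an item from the ratings of similar users. The example below is plain Java and does not use the Mahout API. The class name `UserBasedCfSketch`, the convention that 0 means "not rated", and the ratings matrix are assumptions made up for illustration.

```java
// Hypothetical class name: user-based collaborative filtering, not Mahout code.
final class UserBasedCfSketch {

    // Cosine similarity computed over the items both users have rated.
    // A rating of 0 is assumed to mean "not rated".
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > 0 && b[i] > 0) {
                dot += a[i] * b[i];
                normA += a[i] * a[i];
                normB += b[i] * b[i];
            }
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // no items in common
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Predict a rating as the similarity-weighted average of the ratings
    // that other users gave to the same item. Returns 0 if nobody rated it.
    static double predict(double[][] ratings, int user, int item) {
        double weighted = 0.0;
        double totalSimilarity = 0.0;
        for (int other = 0; other < ratings.length; other++) {
            if (other == user || ratings[other][item] == 0) {
                continue;
            }
            double similarity = cosine(ratings[user], ratings[other]);
            weighted += similarity * ratings[other][item];
            totalSimilarity += similarity;
        }
        return totalSimilarity == 0.0 ? 0.0 : weighted / totalSimilarity;
    }

    public static void main(String[] args) {
        // Rows are users, columns are items (made-up ratings on a 1-5 scale).
        double[][] ratings = {
            {5, 4, 0, 1},   // user 0 has not rated item 2
            {5, 5, 4, 1},   // user 1 has tastes similar to user 0
            {1, 2, 1, 5}    // user 2 has different tastes
        };
        System.out.println("Predicted rating: " + predict(ratings, 0, 2));
    }
}
```

Here the prediction for user 0 is pulled toward user 1's rating because their rating histories are more alike. A distributed implementation has to compute the same kind of similarities across far more users and items than fit on one machine.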
MLlib, the Apache Spark Machine Learning Library, provides implementations of machine learning algorithms for use on the Apache Spark platform (HDFS, but not map-reduce). Although Spark itself is written in Scala, the library and the platform support Java, Scala and Python bindings. The library is new and the list of algorithms is short, but growing quickly.
Massive Online Analysis (MOA) is an open source platform designed for data stream mining by the University of Waikato, New Zealand. Like Weka (developed at the same place), it provides a GUI, command line interface and Java API. It provides a long list of algorithms with a focus on classification, plus support for outlier detection and addressing concept drift. MOA uses the Advanced Data mining And Machine learning System (ADAMS), also developed at the same place, for managing workflows.
Scalable Advanced Massive Online Analysis (SAMOA) is a distributed streaming machine learning framework developed by Yahoo!. It is designed to run on Apache Storm and Apache S4. The system can leverage the algorithms provided by the MOA project for tasks like classification.
Natural Language Processing
This section is dedicated to Java libraries and projects for addressing problems from the subfield of machine learning called Natural Language Processing (NLP).
NLP is not my area, so I’ll just point to the key libraries.
- OpenNLP: Apache OpenNLP is a toolkit for processing natural language text. It provides methods for NLP tasks such as tokenization, segmentation, and entity extraction.
- LingPipe: LingPipe is a toolkit for computational linguistics and includes methods for topic classification, entity extraction, clustering, and sentiment analysis.
- GATE: The General Architecture for Text Engineering (GATE) is an open source library for text processing. It provides an array of sub-projects targeted at different use cases.
- MALLET: Machine Learning for Language Toolkit (MALLET) is a Java toolkit for statistical natural language processing, document classification, clustering, topic modeling and information extraction.
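To make tasks like tokenization and segmentation a little more concrete, here is a small sketch that uses only `java.text.BreakIterator` from the Java standard library. It does not use the API of any toolkit listed above, which offer trained models and far richer functionality. The class name `TokenizeSketch` and the sample sentence are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical class name: rule-based segmentation with the JDK only.
final class TokenizeSketch {

    // Split text into sentences using the JDK's sentence boundary rules.
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    // Split text into word tokens, dropping whitespace and punctuation segments.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Keep only segments that start with a letter or digit.
            if (Character.isLetterOrDigit(token.charAt(0))) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Weka is written in Java. It has a GUI.";
        for (String sentence : sentences(text)) {
            System.out.println(sentence + " -> " + words(sentence));
        }
    }
}
```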
This section lists those libraries for the subfield of machine learning called Computer Vision (CV).
Again, CV is not my area, so I’ll just point to the key libraries.
- BoofCV: BoofCV is an open source library for computer vision and robotics applications. It supports features such as image processing, features, geometric vision, calibration, recognition and image data IO.
Neural Nets are hot again with the development of deep learning methods and faster hardware. This section lists key Java libraries for working with neural networks and deep learning.
- Encog: Encog is a machine learning library that provides algorithms such as SVM, classical neural networks, genetic programming, Bayesian networks, HMM and genetic algorithms.
- Deeplearning4j: Deeplearning4j is claimed to be a commercial-grade deep learning library written in Java. It is described as being compatible with Hadoop and provides algorithms including Restricted Boltzmann machines, deep-belief networks and Stacked Denoising Autoencoders.
In this round-up post we have touched on the big name options when selecting a library or platform for machine learning when working in Java.
These are the players and the popular projects, but by no means is this list complete. For example, take a look at this page on MLOSS.org that lists (at the time of writing) 71 Java-based open source machine learning projects. That’s a lot and I’m sure there are more on GitHub and SourceForge.
The key is to think hard about your own project and its requirements. Figure out what you need from a library or platform and then pick and learn a project that best fits your needs.
This is very interesting. Thanks for sharing. Please have you worked with MOA before? I have some questions to ask on MOA.
No sorry, I have not worked with MOA before.
How can I use a genetic algorithm in Weka for classification?
You may have to implement it yourself?
Thank you very much very educational website
You’re very welcome.
Just came across SMILE (http://haifengl.github.io/smile/).
Seems quite complete already. Does anybody have experiences with it?
Makes me think about ditching WEKA from Java Code, which can be a hassle at times.
Thanks for listing the Java based ML libraries/tools. But it looks to me that you have missed http://www.h2o.ai/ . It’s a Java based ML and deep learning API and one can call/embed it in a JVM based application, please refer – https://www.linkedin.com/pulse/calling-h2o-from-jvm-applications-raymond-peck
Generally, I would group H2o as MLaaS more than a Java library.
Great Info. But do you know under MLlib (Spark) section, “Machine” spelling is wrong. Sorry to spam, but couldn’t resist to notify.
Jason can you add the Smile library to the list?
I’ve not heard of Smile, thanks for the link.
We are proud to announce that we moved the website for Java-ML from my personal hosting to Sourceforge .
Have you done any kind of programming with the NLP API of the Rapidminer? Or trying to improve any text processing module of the Rapidminer?
Thanks in advance!
Sorry Yonas, I have not.
Hey Jason, i have a java ML project to make for stock market prediction using HMM Model. Which library should i use? How should I proceed? Also, i am totally new to machine learning. Please give me an overview and reply asap!!
Sorry, I am not up to speed on HMM libraries for Java. I cannot give you expert advice.
For getting started in machine learning, I would recommend starting right here:
Hi Jason, your website is so good. Thanks for your post.
I hope you have good knowledge of machine learning projects. Then why don’t you start your own product based company?
I have more fun teaching and helping developers get started, thanks Nagaraj.
great post 🙂
First of all, I benefit a lot from your posts so I would like to leave a thanks for that.
Second of all, I want to know if there are specific books to learn machine learning in Java. I read your post the other day but the books were all in Python and R. Since I am already a Java programmer, it would be a lot easier to learn the algorithms and build applications in Java in the first place instead of learning the algorithms in Python and then switching back.
Thanks in advance.
I don’t have any material to learn ML in Java at this stage Eyad. Hopefully in the future.
Thank you very much for your sharing. Cheers
You’re welcome Xiaogang.
After 2.4 OpenCV now fully supports Mac and Java and it is much better than BoofCV.
Thanks for the note Adeel.
It’s great tutorial.What is future in this field.How to integrate with EDA Tools like code auto-generation via spec.
Great idea to combine EDA tools with code generation.
Thank you for much useful article. I want to learn Deep learning/ machine learning where a system should detect an object from video and take certain action. what do you suggest? which one should i learn. i am really new and curious . thanks in advance
Perhaps convolutional neural network that evaluates (classifies) each frame of the video.
Thanx for the great post. I need to create a web navigation agent using ML. Are there any existing algorithms that I can use for predict what is the correct page to navigate?
Sorry, I am not familiar with that problem.
Perhaps perform some searches on google scholar to see what is commonly used?
Machine learning has basically evolved from artificial intelligence via pattern recognition and computational learning theory. Machine learning explores the area of algorithms, which can make high end predictions on data.
You may find this site http://ramok.tech/machine-learning/ useful for Computer Vision Application implementation in java.
Thanks for sharing.
I am very much interested in learning java machine language development.
Jason your sample chapters are awesome! Thank you so much! Planning to order for the full bundle!
Hi Sir ,
can you help me? Thanks in advance. I want to translate a word entered by the user in a field to English, and I don’t have a MasterCard for verification with a translation API such as Google’s, etc.
Perhaps you can find an open source translation library?
This link does not work http://haifengl.github.io/smile/