Java Machine Learning

Are you a Java programmer looking to get started with, or practice, machine learning?

Writing programs that make use of machine learning is the best way to learn machine learning. You can write the algorithms yourself from scratch, but you can make a lot more progress if you leverage an existing open source library.
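
As a rough illustration of the from-scratch route, here is a minimal k-nearest-neighbours classifier in plain Java. It is only a sketch under simplifying assumptions (a tiny in-memory dataset, Euclidean distance, simple majority vote). The class name KnnSketch and the toy data are made up for this example and do not come from any of the libraries below.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Illustrative k-nearest-neighbours classifier written from scratch.
final class KnnSketch {

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Predict the label of `query` by majority vote among its k closest training rows.
    static String predict(double[][] features, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[features.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training row indices by their distance to the query point.
        Arrays.sort(order, Comparator.comparingDouble(i -> distance(features[i], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two well-separated groups of points.
        double[][] x = {{1.0, 1.1}, {1.2, 0.9}, {0.8, 1.0}, {5.0, 5.2}, {5.1, 4.9}, {4.8, 5.0}};
        String[] y = {"small", "small", "small", "large", "large", "large"};
        System.out.println(predict(x, y, new double[] {1.1, 1.0}, 3));
        System.out.println(predict(x, y, new double[] {4.9, 5.1}, 3));
    }
}
```

Even this small exercise shows what a library saves you: data loading, evaluation, tie-breaking and efficient neighbour search are all left out here.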

In this post you will discover the major platforms and open source machine learning libraries you can use in Java.

Environments

This section describes Java-based environments or workbenches that can be used for machine learning. They are called environments because they provide graphical user interfaces for performing machine learning tasks, as well as Java APIs for developing your own applications.

Weka

Waikato Environment for Knowledge Analysis (Weka) is a machine learning platform developed by the University of Waikato, New Zealand. It is written in Java and provides a graphical user interface, command line interface and Java API. It is perhaps the most popular Java machine learning library and a great place to start or practice machine learning.

Weka Explorer Interface with the Iris dataset loaded

KNIME

The Konstanz Information Miner (KNIME) is an analytics and reporting platform developed by the University of Konstanz, Germany. It was developed with a focus on pharmaceutical research, but has expanded into general business intelligence. It provides a graphical user interface (based on Eclipse) and a Java API.

Screenshot of KNIME

RapidMiner

RapidMiner used to be called Yet Another Learning Environment (YALE) and was developed at the Technical University of Dortmund, Germany. It provides a GUI and a Java API for developing your own applications, and covers data handling, visualization and modeling with machine learning algorithms.

RapidMiner Screenshot

ELKI

The Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) is a data mining workbench developed in Java by the Ludwig Maximilian University of Munich, Germany. It focuses on working with data in relational databases for tasks such as outlier detection and classification (distance function based methods). It provides a mini GUI, command line interface and Java API.

ELKI Screenshot
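
To make "distance function based" outlier detection a little more concrete, here is a small plain-Java sketch of one common idea: score each point by the distance to its k-th nearest neighbour, so that isolated points get large scores. This is not ELKI's API. The class name KnnOutlierSketch, the brute-force search and the toy points are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative distance-based outlier scoring (brute force, no index structures).
final class KnnOutlierSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Score of each point = distance to its k-th nearest neighbour (larger means more unusual).
    static double[] scores(double[][] points, int k) {
        int n = points.length;
        if (k < 1 || k > n - 1) {
            throw new IllegalArgumentException("k must be between 1 and n - 1");
        }
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            double[] dists = new double[n - 1];
            int pos = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    dists[pos++] = euclidean(points[i], points[j]);
                }
            }
            Arrays.sort(dists);
            result[i] = dists[k - 1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: four clustered points and one far-away point.
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10}};
        System.out.println(Arrays.toString(scores(points, 2)));
    }
}
```

The brute-force loop compares every pair of points, which is exactly the cost that index structures in a workbench like ELKI are meant to reduce.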

Libraries

Practically every project listed on this page is, or has, a library with a Java API. The projects listed in this section provide only a Java API; they are machine learning libraries in the narrow sense.

Java-ML

The Java Machine Learning Library (Java-ML) provides a collection of machine learning algorithms implemented in Java. It provides a standard interface for each algorithm and references to the relevant scientific literature for further reading, but no UIs. It includes methods for data manipulation, clustering, feature selection and classification. Note that at the time of writing, the last release was in 2012.

JSAT

The Java Statistical Analysis Tool (JSAT) provides pure Java implementations of standard machine learning algorithms for modest-sized problems. The author comments that he developed the library partly as a self-education exercise and partly to get things done. Nevertheless, the list of algorithms is impressive. It includes classification, regression, ensemble, clustering and feature selection methods.

Big Data

This section lists Java projects intended for use with Big Data, such as on clusters of machines.

Mahout (Hadoop)

Apache Mahout provides implementations of machine learning algorithms for use on the Apache Hadoop platform (distributed map-reduce). The project focuses on clustering and classification algorithms, and a popular application driving its development is collaborative filtering for recommender systems. Reference implementations of algorithms that run on a single node are also included.
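
For readers new to collaborative filtering, the sketch below shows the basic idea on a single machine in plain Java: predict a user's rating for an item as a similarity-weighted average of other users' ratings. It does not use Mahout or Hadoop. The class name, the convention that 0 means "not rated" and the tiny ratings matrix are assumptions made for this example.

```java
// Illustrative user-based collaborative filtering on a small in-memory ratings matrix.
final class CollaborativeFilteringSketch {

    // Cosine similarity over the items both users have rated (0 means "not rated").
    static double similarity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > 0 && b[i] > 0) {
                dot += a[i] * b[i];
                normA += a[i] * a[i];
                normB += b[i] * b[i];
            }
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Predicted rating = similarity-weighted average of other users' ratings for the item.
    // Returns 0.0 when no other user has rated the item.
    static double predict(double[][] ratings, int user, int item) {
        double weighted = 0.0;
        double total = 0.0;
        for (int other = 0; other < ratings.length; other++) {
            if (other == user || ratings[other][item] == 0) {
                continue;
            }
            double sim = similarity(ratings[user], ratings[other]);
            weighted += sim * ratings[other][item];
            total += sim;
        }
        return total == 0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        // Rows are users, columns are items; user 0 has not rated item 2.
        double[][] ratings = {
            {5, 4, 0},
            {5, 5, 5},
            {1, 2, 1}
        };
        System.out.println(predict(ratings, 0, 2));
    }
}
```

On real data the ratings matrix is far too large and sparse for nested loops like these, which is where a distributed implementation earns its keep.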

MLlib (Spark)

Apache Spark's Machine Learning Library (MLlib) provides implementations of machine learning algorithms for use on the Apache Spark platform (which can read from HDFS, but does not use map-reduce). Although Spark is written in Scala, the library and the platform provide Java, Scala and Python APIs. The library is new and the list of algorithms is short, but growing quickly.

MOA

Massive Online Analysis (MOA) is an open source platform designed for data stream mining, developed by the University of Waikato, New Zealand. Like Weka (developed at the same place), it provides a GUI, command line interface and Java API. It provides a long list of algorithms with a focus on classification, plus support for outlier detection and for addressing concept drift. For managing workflows, MOA uses the Advanced Data mining And Machine learning System (ADAMS), which was also developed at the same place.
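
If data stream mining is unfamiliar, the key difference from batch learning is that the model sees one instance at a time and is updated immediately. The plain-Java sketch below shows this with an online perceptron evaluated in a test-then-train fashion (predict first, then learn from the same instance). It is not MOA's API and it does nothing about concept drift. The class name, the synthetic stream and the margin of 0.2 are assumptions chosen to keep the example small.

```java
import java.util.Random;

// Illustrative online learner: a perceptron updated one instance at a time.
final class OnlinePerceptronSketch {
    private final double[] weights; // last entry is the bias weight

    OnlinePerceptronSketch(int numFeatures) {
        weights = new double[numFeatures + 1];
    }

    // Predict +1 or -1 for a single instance.
    int predict(double[] x) {
        double sum = weights[weights.length - 1];
        for (int i = 0; i < x.length; i++) {
            sum += weights[i] * x[i];
        }
        return sum >= 0 ? 1 : -1;
    }

    // Update the weights only when the current prediction is wrong.
    void learn(double[] x, int label) {
        if (predict(x) != label) {
            for (int i = 0; i < x.length; i++) {
                weights[i] += label * x[i];
            }
            weights[weights.length - 1] += label;
        }
    }

    // Test-then-train accuracy over a synthetic, linearly separable stream.
    static double testThenTrainAccuracy(int instances, long seed) {
        Random random = new Random(seed);
        OnlinePerceptronSketch model = new OnlinePerceptronSketch(2);
        int correct = 0;
        int seen = 0;
        while (seen < instances) {
            double[] x = {random.nextDouble() * 2 - 1, random.nextDouble() * 2 - 1};
            double s = x[0] + x[1];
            if (Math.abs(s) < 0.2) {
                continue; // skip points near the boundary so the stream has a clear margin
            }
            int label = s > 0 ? 1 : -1;
            if (model.predict(x) == label) {
                correct++;
            }
            model.learn(x, label);
            seen++;
        }
        return (double) correct / instances;
    }

    public static void main(String[] args) {
        System.out.println(testThenTrainAccuracy(2000, 42L));
    }
}
```

Because each instance is used for testing before it is used for training, the accuracy figure reflects how the model performed on data it had not yet seen.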

SAMOA

Scalable Advanced Massive Online Analysis (SAMOA) is a distributed streaming machine learning framework developed by Yahoo!. It is designed to run on Apache Storm and Apache S4. The system can leverage the algorithms provided by the MOA project for tasks like classification.

Natural Language Processing

This section is dedicated to Java libraries and projects for addressing problems from the subfield of machine learning called Natural Language Processing (NLP).

NLP is not my area, so I’ll just point to the key libraries.

  • OpenNLP: Apache OpenNLP is a toolkit for processing natural language text. It provides methods for NLP tasks such as tokenization, segmentation, and entity extraction.
  • LingPipe: LingPipe is a toolkit for computational linguistics and includes methods for topic classification, entity extraction, clustering, and sentiment analysis.
  • GATE: The General Architecture for Text Engineering (GATE) is an open source library for text processing. It provides an array of sub-projects targeted at different use cases.
  • MALLET: Machine Learning for Language Toolkit (MALLET) is a Java toolkit for statistical natural language processing, document classification, clustering, topic modeling and information extraction.
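
To give a feel for the most basic of these tasks, tokenization and sentence segmentation, here is a small sketch that uses only the JDK's java.text.BreakIterator. It is not how OpenNLP or the other toolkits above work internally, and their models are far more capable. The class name TokenizerSketch and the sample sentences are made up for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative word and sentence splitting using only the standard library.
final class TokenizerSketch {

    // Split text into word tokens, dropping whitespace and punctuation pieces.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Split text into sentences, trimming surrounding whitespace.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Weka is written in Java. It has a GUI.";
        System.out.println(sentences(text));
        System.out.println(words(text));
    }
}
```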

Computer Vision

This section lists those libraries for the subfield of machine learning called Computer Vision (CV).

Again, CV is not my area, so I’ll just point to the key libraries.

  • BoofCV: BoofCV is an open source library for computer vision and robotics applications. It supports features such as image processing, features, geometric vision, calibration, recognition and image data IO.
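
As a tiny example of what "image processing" means at the lowest level, the sketch below converts packed RGB pixels to gray levels in plain Java. It does not use BoofCV. The class name, the 0xRRGGBB pixel layout and the luma weights (0.299, 0.587, 0.114, a common convention) are assumptions for this illustration.

```java
// Illustrative grayscale conversion on a 2D array of packed 0xRRGGBB pixels.
final class GrayscaleSketch {

    // Convert one packed RGB pixel to a gray level in the range 0..255.
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    // Convert a whole image, row by row, without modifying the input.
    static int[][] toGray(int[][] pixels) {
        int[][] gray = new int[pixels.length][];
        for (int y = 0; y < pixels.length; y++) {
            gray[y] = new int[pixels[y].length];
            for (int x = 0; x < pixels[y].length; x++) {
                gray[y][x] = luminance(pixels[y][x]);
            }
        }
        return gray;
    }

    public static void main(String[] args) {
        int[][] image = {{0xFF0000, 0x00FF00}, {0x0000FF, 0xFFFFFF}};
        int[][] gray = toGray(image);
        System.out.println(java.util.Arrays.deepToString(gray));
    }
}
```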

Deep Learning

Neural Nets are hot again with the development of deep learning methods and faster hardware. This section lists key Java libraries for working with neural networks and deep learning.

  • Encog: Encog is a machine learning library that provides algorithms such as SVM, classical neural networks, genetic programming, Bayesian networks, hidden Markov models (HMM) and genetic algorithms.
  • Deeplearning4j: Deeplearning4j is claimed to be a commercial-grade deep learning library written in Java. It is described as being compatible with Hadoop and provides algorithms including Restricted Boltzmann machines, deep-belief networks and Stacked Denoising Autoencoders.
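
If you have not seen a neural network in code before, the sketch below shows the forward pass of a very small feed-forward network in plain Java. The weights are hand-picked (not learned) so that the network computes XOR, a function a single-layer perceptron cannot represent. Nothing here comes from Encog or Deeplearning4j; the class name and all weights are assumptions for illustration, and real libraries add training, many layer types and hardware acceleration.

```java
// Illustrative forward pass: 2 inputs -> 2 ReLU hidden units -> 1 linear output.
final class TinyNetworkSketch {
    // Hand-picked weights for illustration only; a real network would learn these.
    static final double[][] HIDDEN_WEIGHTS = {{1.0, 1.0}, {1.0, 1.0}};
    static final double[] HIDDEN_BIAS = {0.0, -1.0};
    static final double[] OUTPUT_WEIGHTS = {1.0, -2.0};

    // Compute the network output for one input vector of length 2.
    static double forward(double[] input) {
        double output = 0.0;
        for (int h = 0; h < HIDDEN_WEIGHTS.length; h++) {
            double sum = HIDDEN_BIAS[h];
            for (int i = 0; i < input.length; i++) {
                sum += HIDDEN_WEIGHTS[h][i] * input[i];
            }
            double activation = Math.max(0.0, sum); // ReLU non-linearity
            output += OUTPUT_WEIGHTS[h] * activation;
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (double[] in : inputs) {
            System.out.println(in[0] + " XOR " + in[1] + " = " + forward(in));
        }
    }
}
```

The hidden layer is what makes this work: the second unit only activates when both inputs are on, and the output layer subtracts it twice to cancel the first unit.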

Summary

In this round-up post we have touched on the big name options when selecting a library or platform for machine learning when working in Java.

These are the players and the popular projects, but by no means is this list complete. For example, take a look at this page on MLOSS.org that lists (at the time of writing) 71 Java-based open source machine learning projects. That’s a lot and I’m sure there are more on GitHub and SourceForge.

The key is to think hard about your own project and its requirements. Figure out what you need from a library or platform and then pick and learn a project that best fits your needs.

47 Responses to Java Machine Learning

  1. Kayode April 8, 2016 at 1:22 pm

    This is very interesting. Thanks for sharing. Please have you worked with MOA before? I have some questions to ask on MOA.

    • Jason Brownlee April 8, 2016 at 1:33 pm

      Thanks Kayode.

      No sorry, I have not worked with MOA before.

      • Irfan Ullah February 5, 2019 at 6:43 pm

        How can I use a genetic algorithm in Weka for classification?

  2. Nemanja May 14, 2016 at 5:15 pm

    Thank you very much very educational website

  3. Marcus May 24, 2016 at 7:31 pm

    Just came across SMILE (http://haifengl.github.io/smile/).

    Seems quite complete already. Does anybody have experiences with it?

    Makes me think about ditching WEKA from Java Code, which can be a hassle at times.

  4. Gaurav Gupta June 13, 2016 at 2:20 am

    Thanks for listing the Java-based ML libraries/tools. But it looks to me that you have missed http://www.h2o.ai/ . It's a Java-based ML and deep learning API, and one can call/embed it in a JVM-based application. Please refer to https://www.linkedin.com/pulse/calling-h2o-from-jvm-applications-raymond-peck

    • Jason Brownlee June 14, 2016 at 8:11 am

      Thanks Gaurav.

      Generally, I would group H2o as MLaaS more than a Java library.

  5. Swaroop June 25, 2016 at 4:16 am

    Great Info. But do you know under MLlib (Spark) section, “Machine” spelling is wrong. Sorry to spam, but couldn’t resist to notify.

  6. Douglas Arantes July 1, 2016 at 1:39 am

    Jason can you add the Smile library to the list?

    http://haifengl.github.io/smile/

  7. Анастасия Ананьева July 15, 2016 at 8:42 pm

    We are proud to announce that we moved the website for Java-ML from my personal hosting to Sourceforge .

  8. Yonas September 2, 2016 at 8:46 am

    Have you done any kind of programming with the NLP API of the Rapidminer? Or trying to improve any text processing module of the Rapidminer?

    Thanks in advance!

  9. Gamer Gamer October 23, 2016 at 9:02 am

    Hey Jason, i have a java ML project to make for stock market prediction using HMM Model. Which library should i use? How should I proceed? Also, i am totally new to machine learning. Please give me an overview and reply asap!!

  10. Nagaraj November 11, 2016 at 4:30 pm

    Hi Jason, your website is so good. Thanks for your post.

  11. Nagaraj November 11, 2016 at 4:33 pm

    I hope you have good knowledge on machine learning project.Then why don’t you start your own product based company.

    • Jason Brownlee November 12, 2016 at 7:19 am

      I have more fun teaching and helping developers get started, thanks Nagaraj.

  12. Michael December 23, 2016 at 1:19 am

    Hi Jason,

    great post 🙂

  13. Eyad Farouk January 17, 2017 at 11:38 pm

    Hello Jason,
    First of all, I benefit a lot from your posts, so I would like to leave a thanks for that.
    Second of all, I want to know if there are specific books to learn machine learning in Java. I read your post the other day but the books were all in Python and R. Since I am already a Java programmer, it would be a lot easier to learn the algorithms and build applications in Java in the first place instead of learning the algorithms in Python and then switching back.
    Thanks in advance.

    • Jason Brownlee January 18, 2017 at 10:15 am

      I don’t have any material to learn ML in Java at this stage Eyad. Hopefully in the future.

  14. Xiaogang February 13, 2017 at 1:56 pm

    Thank you very much for your sharing. Cheers

  15. Adeel Aslam March 19, 2017 at 2:32 pm

    After 2.4 OpenCV now fully supports Mac and Java and it is much better than BoofCV.

  16. Avdhesh Yadav March 22, 2017 at 9:22 pm

    It’s a great tutorial. What is the future in this field? How can it be integrated with EDA tools, such as code auto-generation from a spec?

    Thanks Avdhesh

    • Jason Brownlee March 23, 2017 at 8:50 am

      Great idea to combine EDA tools with code generation.

  17. gourab April 17, 2017 at 5:16 am

    Thank you for much useful article. I want to learn Deep learning/ machine learning where a system should detect an object from video and take certain action. what do you suggest? which one should i learn. i am really new and curious . thanks in advance

    • Jason Brownlee April 18, 2017 at 8:25 am

      Perhaps convolutional neural network that evaluates (classifies) each frame of the video.

  18. erangaz July 29, 2017 at 3:41 pm

    Thanx for the great post. I need to create a web navigation agent using ML. Are there any existing algorithms that I can use for predict what is the correct page to navigate?

    • Jason Brownlee July 30, 2017 at 7:38 am

      Sorry, I am not familiar with that problem.

      Perhaps perform some searches on google scholar to see what is commonly used?

  19. Priya January 3, 2018 at 4:56 pm

    Machine learning has basically evolved from artificial intelligence via pattern recognition and computational learning theory. Machine learning explores the area of algorithms, which can make high end predictions on data.

  20. Blerta March 18, 2018 at 11:38 am

    Hi ,
    You may find this site http://ramok.tech/machine-learning/ useful for Computer Vision Application implementation in java.

  21. Delkn April 18, 2018 at 9:56 pm

    I am very much interested in learning java machine language development.

  22. Saravanakumar S November 26, 2018 at 5:26 pm

    Jason your sample chapters are awesome! Thank you so much! Planning to order for the full bundle!

  23. oussaifi majdi May 19, 2020 at 4:40 pm

    Hi Sir ,
    can you help me for an advance , i want to translate a word enter by user in the field to English and i don’t have a master card for verification api translate google …etc

    • Jason Brownlee May 20, 2020 at 6:20 am

      Perhaps you can find an open source translation library?

  24. Dzung Nguyen February 25, 2021 at 5:51 am

    This link does not work http://haifengl.github.io/smile/
