Last Updated on August 16, 2020
Machine learning is an iterative process rather than a linear one: each step must be revisited as more is learned about the problem under investigation. This iteration can require many different tools, programs and scripts for each step.
A machine learning workbench is a platform or environment that supports and facilitates a range of machine learning activities, reducing or removing the need for multiple tools.
Some statistical and machine learning workbenches, like R, provide very advanced tools but require a lot of manual configuration in the form of scripts and programming. The tools can also be fragile, written by and for academics rather than built to be robust and used in production environments.
What is Weka
The Weka machine learning workbench is a modern platform for applied machine learning. Weka is an acronym that stands for Waikato Environment for Knowledge Analysis. It is also the name of a New Zealand bird, the weka.
Five features of Weka that I like to promote are:
- Open Source: It is released as open source software under the GNU GPL. It is dual licensed, and Pentaho Corporation holds the exclusive license to use the platform for business intelligence in its own product.
- Graphical Interface: It has a Graphical User Interface (GUI). This allows you to complete your machine learning projects without programming.
- Command Line Interface: All features of the software can be used from the command line. This can be very useful for scripting large jobs.
- Java API: It is written in Java and provides an API that is well documented and promotes integration into your own applications. Note that the GNU GPL means that, in turn, your software would also have to be released under the GPL.
- Documentation: There are books, manuals, wikis and MOOCs that can teach you how to use the platform effectively.
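To illustrate the point about scripting large jobs, here is a minimal Java sketch that builds one command line per classifier and dataset pair. The location of `weka.jar`, the dataset file names and the choice of classifiers are assumptions for illustration; `-t` (training file) and `-x` (number of cross-validation folds) are standard options of Weka's classifier command line. The sketch only prints the commands.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WekaBatchCommands {

    // Builds the argument list for one cross-validated Weka classifier run.
    static List<String> buildCommand(String wekaJar, String classifier,
                                     String dataset, int folds) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(wekaJar);                   // assumed path to the Weka jar
        cmd.add(classifier);                // fully qualified classifier class
        cmd.add("-t");                      // training file
        cmd.add(dataset);
        cmd.add("-x");                      // number of cross-validation folds
        cmd.add(Integer.toString(folds));
        return cmd;
    }

    // Builds one command per (classifier, dataset) pair.
    static List<List<String>> buildJobs(String wekaJar, List<String> classifiers,
                                        List<String> datasets, int folds) {
        List<List<String>> jobs = new ArrayList<>();
        for (String classifier : classifiers) {
            for (String dataset : datasets) {
                jobs.add(buildCommand(wekaJar, classifier, dataset, folds));
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        // Assumed classifier and dataset names; adjust to your installation.
        List<String> classifiers = Arrays.asList(
                "weka.classifiers.trees.J48",
                "weka.classifiers.bayes.NaiveBayes");
        List<String> datasets = Arrays.asList("data/iris.arff", "data/diabetes.arff");

        for (List<String> job : buildJobs("weka.jar", classifiers, datasets, 10)) {
            System.out.println(String.join(" ", job));
        }
    }
}
```

To actually run a job rather than print it, each argument list could be handed to `new ProcessBuilder(job).inheritIO().start()`.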
The main reason I promote Weka is that a beginner can go through the process of applied machine learning using the graphical interface without having to do any programming. This is a big deal because getting a handle on the process, handling data and experimenting with algorithms is what a beginner should be learning about, not yet another scripting language.
Introduction to the Weka GUI
Now I want to show off the graphical user interface a bit and encourage you to download and have a play with Weka. The workbench provides three main ways to work on your problem: the Explorer for playing around and trying things out, the Experimenter for controlled experiments, and the KnowledgeFlow for graphically designing a pipeline for your problem.
The Explorer is where you play around with your data and think about what transforms to apply and what algorithms you want to run in experiments.
The Explorer interface is divided into six tabs:
- Preprocess: Load a dataset and manipulate the data into a form that you want to work with.
- Classify: Select and run classification and regression algorithms to operate on your data.
- Cluster: Select and run clustering algorithms on your dataset.
- Associate: Run association algorithms to extract insights from your data.
- Select Attributes: Run attribute selection algorithms on your data to select those attributes that are relevant to the feature you want to predict.
- Visualize: Visualize the relationship between attributes.
The Experimenter interface is for designing experiments with your selection of algorithms and datasets, running experiments and analyzing the results.
The tools for analyzing results are very powerful, allowing you to compare results over multiple runs and check whether the differences are statistically significant.
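As a rough illustration of what comparing results over multiple runs involves, the sketch below computes a plain paired t-statistic for two algorithms evaluated on the same runs. The accuracy figures are made up, and this is a textbook paired t-test, not necessarily the exact test the Experimenter applies.

```java
final class PairedTTest {

    // Returns the paired t-statistic for two equally long series of scores.
    // If every paired difference is identical the standard error is zero
    // and the result is infinite or NaN.
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = a.length;

        // Mean of the paired differences.
        double meanDiff = 0.0;
        for (int i = 0; i < n; i++) {
            meanDiff += a[i] - b[i];
        }
        meanDiff /= n;

        // Sample variance of the paired differences (n - 1 in the denominator).
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double d = (a[i] - b[i]) - meanDiff;
            sumSq += d * d;
        }
        double variance = sumSq / (n - 1);

        // Standard error of the mean difference.
        double stdErr = Math.sqrt(variance / n);
        return meanDiff / stdErr;
    }

    public static void main(String[] args) {
        // Hypothetical accuracies (%) of two algorithms on the same five runs.
        double[] algorithmA = {90, 92, 88, 91, 89};
        double[] algorithmB = {85, 88, 86, 87, 84};

        double t = tStatistic(algorithmA, algorithmB);

        // 2.776 is the two-sided 5% critical value of Student's t
        // with 4 degrees of freedom (five runs minus one).
        boolean significant = Math.abs(t) > 2.776;
        System.out.printf("t = %.3f, significant at 5%%: %b%n", t, significant);
    }
}
```

Pairing the scores run by run matters here: both algorithms see the same data in each run, so the test looks at the per-run differences rather than at two independent samples.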
Applied machine learning is a process and the Knowledge Flow interface allows you to graphically design that process and run the designs that you create. This includes the loading and transforming of input data, running of algorithms and the presentation of results.
It’s a powerful interface and metaphor for solving complex problems graphically.
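The pipeline idea can be sketched in plain Java as a chain of steps, each feeding the next. Everything here is hypothetical: the step names, the toy data and the "algorithm", which simply averages numbers. It mirrors the load, transform, run and present stages described above, not Weka's own Knowledge Flow API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

final class PipelineSketch {

    // Load: stands in for reading a dataset; parses comma-separated numbers.
    static final Function<String, List<Double>> LOAD = csv ->
            Arrays.stream(csv.split(","))
                  .map(String::trim)
                  .map(Double::valueOf)
                  .collect(Collectors.toList());

    // Transform: rescale all values to the range [0, 1].
    static final Function<List<Double>, List<Double>> NORMALIZE = values -> {
        double min = Collections.min(values);
        double max = Collections.max(values);
        double range = max - min;
        return values.stream()
                     .map(v -> range == 0 ? 0.0 : (v - min) / range)
                     .collect(Collectors.toList());
    };

    // Run: a stand-in "algorithm" that computes the mean of the values.
    static final Function<List<Double>, Double> MEAN = values ->
            values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);

    // Present: format the result for display.
    static final Function<Double, String> REPORT = mean ->
            String.format(Locale.ROOT, "mean of normalized values = %.2f", mean);

    // The whole flow, wired together like connected boxes on a canvas.
    static final Function<String, String> PIPELINE =
            LOAD.andThen(NORMALIZE).andThen(MEAN).andThen(REPORT);

    public static void main(String[] args) {
        System.out.println(PIPELINE.apply("2, 4, 6"));
    }
}
```

Because each stage is a separate function, any one of them can be swapped out without touching the others, which is the same property that makes a graphical pipeline easy to rearrange.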
Tips for Getting Started
Here are some tips for getting up and running fast:
Download Weka Right Now
It supports the three main platforms: Windows, OS X and Linux. Find the distribution for your platform, download it, install it and start it up. You might have to install Java first. The installation includes many standard experimental datasets (in the data directory) that you can load and practice on.
Read the Weka Documentation
The download includes a PDF manual (WekaManual.pdf) that can get you up to speed very quickly. It is very detailed and comprehensive, with screenshots. There is also plenty of supplementary documentation online.
Don’t forget the book. If you get into Weka, then buy the book. It provides an introduction to applied machine learning as well as an introduction to the Weka platform itself. Highly recommended.
Extensions and Plugins for Weka
There are a lot of plugin algorithms, extensions and even platforms that build on Weka.
Online Courses on Weka
There are two online courses that teach data mining with Weka:
- Data Mining with Weka. You can watch all the videos for this course for free on YouTube.
- More Data Mining with Weka.
Have you used Weka? Leave a comment and share your experiences.