When you are getting started in Weka, you may feel overwhelmed.

There are so many datasets, so many filters and so many algorithms to choose from.

There is too much choice. There are too many things you could be doing.

Structured process is key.

I have talked before about process, and about tasks like spot-checking algorithms that help you get past the feeling of being overwhelmed and start learning useful things about your problem. In this post, I want to give you a simplified version of this process that you can use to practice applied machine learning.

## Problem Solving Template

This template is a streamlined process that focuses on learning about the problem and finding a good solution, and on doing both quickly.

It is organized into the six steps of applied machine learning. Each step is broken down into specific questions for you to answer by using the Weka Explorer and the Weka Experimenter graphical user interfaces.

The six steps of the process are as follows (the objective of each is given in its own section):

- Problem Definition
- Data Analysis
- Data Preparation
- Evaluate Algorithms
- Improve Results
- Present Results

In the following sections, I will summarize the key questions to answer for each step of the process. You might like to print out these questions or copy them into a document to create your own template document.

## 1. Problem Definition

The objective of the problem definition is to understand and clearly describe the problem that is being solved.

### Problem Description

- What is an informal description of the problem?
- What is a formal description of the problem?
- What assumptions do you have about the problem?

### Provided Data

- What constraints were imposed to select the data?
- Define each attribute in the provided dataset.

## 2. Data Analysis

The objective of data analysis is to understand the information available that will be used to develop a model.

- What data types are the attributes?
- Are there missing or corrupted values?
- Review the distributions of the attributes. What do you notice?
- Review the distributions of the class values. What do you notice?
- Review the attribute distributions with class values in the histograms. What do you notice?
- Review pairwise scatter plots of attributes. What do you notice?
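
If you want to see what the first few questions amount to outside the Weka Explorer, here is a minimal sketch in plain Python. It is not Weka's API: the `summarize` helper, the attribute names, and the three example rows are all made up for illustration. The only detail borrowed from Weka is that ARFF files mark a missing value with `?`.

```python
from collections import Counter

MISSING = "?"  # ARFF files mark missing values with a question mark


def is_number(value):
    """Return True if the string can be parsed as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize(rows, class_attribute):
    """Report a type and missing count per attribute, plus the class distribution.

    Hypothetical helper: rows is a list of dicts mapping attribute name to a string.
    """
    attributes = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        present = [v for v in values if v != MISSING]
        numeric = bool(present) and all(is_number(v) for v in present)
        attributes[name] = {
            "type": "numeric" if numeric else "nominal",
            "missing": len(values) - len(present),
        }
    class_counts = Counter(
        row[class_attribute] for row in rows if row[class_attribute] != MISSING
    )
    return attributes, dict(class_counts)


# Made-up example rows, not a real dataset.
rows = [
    {"age": "25", "outlook": "sunny", "play": "yes"},
    {"age": "?", "outlook": "rainy", "play": "no"},
    {"age": "40", "outlook": "sunny", "play": "yes"},
]
attributes, class_counts = summarize(rows, "play")
print(attributes)
print(class_counts)
```

The per-attribute summary answers the data type and missing value questions, and the class counts show at a glance whether the classes are balanced.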

## 3. Data Preparation

The objective of data preparation is to discover and expose the structure in the dataset.

- Normalize the dataset
- Standardize the dataset
- Square the dataset
- Discretize attributes (if integer)
- Remove and/or replace missing values (if present)
- Create transforms of the dataset to test assumptions raised in the Problem Definition
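
To make these transforms concrete, here is a rough sketch of what four of them do to a single numeric attribute. It is written in plain Python rather than with Weka's filters, and the function names are my own. It assumes min-max scaling to [0, 1] for normalizing, the population standard deviation for standardizing, equal-width bins for discretizing, and the mean as the replacement for missing values. Weka's filters may differ in these details.

```python
import statistics


def replace_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def normalize(values):
    """Rescale values to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def discretize(values, bins):
    """Map each value to the index of an equal-width bin, from 0 to bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


values = [2.0, 4.0, 6.0, 8.0, 10.0]
print(normalize(values))       # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(values))
print(discretize(values, 2))   # [0, 0, 1, 1, 1]
print(replace_missing([2.0, None, 4.0]))  # [2.0, 3.0, 4.0]
```

Each function returns a new list, so you can keep several transformed copies of the data side by side and compare algorithms on each, which is the point of this step.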

## 4. Evaluate Algorithms

The objective of evaluating algorithms is to develop a test harness and baseline accuracy from which to improve.

- Explore different classification algorithms
- Design and run a spot-check experiment
- Review and interpret the algorithm rankings
- Review and interpret the algorithm accuracy
- Repeat process as needed
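
The idea of a test harness with a baseline can be sketched in a few lines of plain Python. This is an illustration, not the Weka Experimenter: the toy data, the function names, and the choice of 4 unshuffled folds are all assumptions made to keep the example small and repeatable. The majority-class predictor plays the role that ZeroR plays in Weka, a floor that any useful algorithm should beat.

```python
from collections import Counter


def k_folds(n, k):
    """Split indices 0..n-1 into k folds by striding (no shuffling, for repeatability)."""
    return [list(range(i, n, k)) for i in range(k)]


def majority_class(train_X, train_y, x):
    """Baseline: always predict the most common class in the training data."""
    return Counter(train_y).most_common(1)[0][0]


def nearest_neighbor(train_X, train_y, x):
    """Predict the class of the closest training row (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]


def cross_validate(predict, X, y, k):
    """Return accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    correct = 0
    for fold in k_folds(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(predict(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Made-up one-attribute dataset: class "a" clusters near 1, class "b" near 5.
X = [[1.0], [1.2], [0.8], [1.1], [5.0], [5.2], [4.8], [5.1], [4.9], [5.3], [4.7], [5.4]]
y = ["a"] * 4 + ["b"] * 8

results = {
    "majority baseline": cross_validate(majority_class, X, y, 4),
    "1-nearest neighbor": cross_validate(nearest_neighbor, X, y, 4),
}
# Rank the algorithms from most to least accurate.
for name, accuracy in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {accuracy:.2f}")
```

Because every algorithm is scored by the same `cross_validate` function on the same folds, the ranking is a fair comparison, and the baseline tells you how much of the accuracy comes from the class distribution alone.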

## 5. Improve Results

The objective of improving the results is to build on the results of the previous step to develop more accurate models.

### Algorithm Tuning

- Explore different algorithm configurations
- Design and run an algorithm tuning experiment
- Review and interpret the algorithm rankings
- Review and interpret the algorithm accuracy
- Repeat process as needed
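
A tuning experiment boils down to running the same evaluation once per configuration and comparing the scores. The sketch below does this for the number of neighbors in a k-nearest-neighbor classifier. It is plain Python with hypothetical names and made-up data, and it uses leave-one-out evaluation only because the toy dataset is tiny. It is not how the Weka Experimenter is configured.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and return the accuracy."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, x, k) == label
    return correct / len(data)


# Made-up data: (value, class). The point 51 is a noisy "a" inside the "b" cluster.
data = [
    (10, "a"), (12, "a"), (9, "a"), (51, "a"),
    (50, "b"), (54, "b"), (47, "b"), (58, "b"), (48, "b"),
]

# One evaluation per configuration, all on the same data.
scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: {score:.3f}")
print("best k:", best_k)
```

On this toy data the middle setting wins: one neighbor is misled by the noisy point, while five neighbors swamp the small class. That trade-off is the kind of thing to look for when you interpret your own tuning results.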

### Ensemble Methods

- Explore different ensemble methods
- Design and run an algorithm ensemble experiment
- Review and interpret the ensemble rankings
- Review and interpret the ensemble accuracy
- Repeat process as needed
- Can you improve results with other meta algorithms, such as thresholding?
- Can you improve results by using other algorithms in the same family as algorithms that are performing well?
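
To see why combining models can help, here is a small majority-vote sketch in plain Python. Everything in it is hypothetical: the three one-attribute rules stand in for trained models, and the six rows were constructed so that the rules make their mistakes on different rows. Real models rarely disagree this neatly, so treat it as an illustration of the mechanism and not as an expected result.

```python
from collections import Counter


def vote(classifiers, x):
    """Return the class predicted by the majority of the classifiers."""
    predictions = [classify(x) for classify in classifiers]
    return Counter(predictions).most_common(1)[0][0]


def accuracy(classify, X, y):
    """Return the fraction of rows that classify labels correctly."""
    return sum(classify(x) == label for x, label in zip(X, y)) / len(X)


# Three hypothetical weak rules, each looking at a single attribute.
rules = [
    lambda x: "pos" if x[0] > 0.5 else "neg",
    lambda x: "pos" if x[1] > 0.5 else "neg",
    lambda x: "pos" if x[2] > 0.5 else "neg",
]

# Constructed rows: on each row exactly one of the three rules is wrong.
X = [
    [0.9, 0.8, 0.2],
    [0.7, 0.3, 0.9],
    [0.4, 0.9, 0.8],
    [0.1, 0.2, 0.7],
    [0.2, 0.6, 0.1],
    [0.8, 0.1, 0.3],
]
y = ["pos", "pos", "pos", "neg", "neg", "neg"]


def ensemble(x):
    return vote(rules, x)


for index, rule in enumerate(rules, start=1):
    print(f"rule {index}: {accuracy(rule, X, y):.2f}")
print(f"majority vote: {accuracy(ensemble, X, y):.2f}")
```

Each rule alone gets 4 of 6 rows right, while the vote gets all 6, because the errors do not overlap. When you review ensemble accuracy on a real problem, how much the member models disagree is worth checking for the same reason.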

## 6. Present Results

The objective of presenting the results is to describe the problem and solution so that they can be understood by third parties.

Complete the following section to summarize the problem and solution.

- What is the Problem?
- What is the Solution?
- What were the Findings?
- What are the Limitations?
- What are the Conclusions?

## How To Use

There are a number of interesting datasets in the “*data*” directory of the Weka installation. There are also many datasets on the UCI machine learning repository that you can download and work on.

Select a problem and work through it using this template. You will be surprised at how much you learn and how much a structured process like this can help to keep you focused.

## Summary

In this post, you learned about a structured template for working through the process of applied machine learning. This template can be printed and used step-by-step to work through a problem in the Weka Machine Learning Workbench.

Answering the specific questions in each step of the template will quickly build up a deeper understanding of the problem and your solution to it as it unfolds. This is invaluable, like a scientist's notebook in the lab.
