Machine Learning Algorithm Recipes in scikit-learn

By Jason Brownlee on August 21, 2019 in Python Machine Learning

You have to get your hands dirty.

You can read all of the blog posts and watch all the videos in the world, but you’re not going to really get machine learning until you start practicing.

The scikit-learn Python library is very easy to get up and running. Nevertheless, I see a lot of hesitation from beginners looking to get started. In this blog post, I want to give a few very simple examples of using scikit-learn for some supervised classification algorithms.

Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Scikit-Learn Recipes

You don’t need to know about and use all of the algorithms in scikit-learn, at least not initially. Pick one or two (or a handful) and practice with only those.

In this post you will see 5 recipes of supervised classification algorithms applied to the iris flowers dataset, a small standard dataset that is provided with the scikit-learn library.

The recipes are principled. Each example is:

Standalone: Each code example is a self-contained, complete and executable recipe.
Just Code: The focus of each recipe is on the code with minimal exposition on machine learning theory.
Simple: Recipes present the common use case, which is probably what you are looking to do.
Consistent: All code examples are presented consistently and follow the same code pattern and style conventions.

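The recipes do not explore the parameters of a given algorithm. They provide a skeleton that you can copy and paste into your file, project or Python REPL and start to play with immediately. Note that, to keep them short, each recipe makes predictions on the same data it was trained on, so the reported scores are optimistic; to estimate performance on new data, evaluate on a held-out test set or use cross-validation.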

These recipes show you that you can get started practicing with scikit-learn right now. Stop putting it off.
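
Each recipe below scores its model on the same rows it was trained on, which keeps the code short but gives an optimistic picture of performance. The following is a minimal sketch of scoring on held-out data instead, using scikit-learn's train_test_split. The decision tree, the one-third test size and the random_state values are arbitrary choices for illustration; any of the models below could be swapped in.

```python
# Evaluate a classifier on rows it did not see during training.
# Sketch only: test_size=0.33 and random_state=7 are arbitrary choices.
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# load the iris dataset
dataset = datasets.load_iris()
# hold back a third of the rows for testing, keeping class proportions (stratify)
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.33, random_state=7,
    stratify=dataset.target)
# fit on the training rows only
model = DecisionTreeClassifier(random_state=7)
model.fit(X_train, y_train)
# make predictions for the unseen test rows and summarize them
predicted = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(metrics.classification_report(y_test, predicted))
print("Test accuracy: %.3f" % accuracy)
```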

Logistic Regression

Logistic regression fits a logistic model to data and makes predictions about the probability of an event (between 0 and 1).

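This recipe shows the fitting of a logistic regression model to the iris dataset. Because this is a multi-class classification problem and logistic regression is at heart a two-class method, the model has to be extended to handle three classes. Older versions of scikit-learn used a one-vs-all scheme (one model per class) by default; since version 0.22, the default lbfgs solver instead fits a single multinomial model.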

# Logistic Regression
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
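# load the iris dataset
dataset = datasets.load_iris()
# fit a logistic regression model to the data
# (max_iter is raised from the default of 100, which can stop short of
# convergence on the unscaled iris features and emit a warning)
model = LogisticRegression(max_iter=1000)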
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

For more information see the API reference for Logistic Regression for details on configuring the algorithm parameters. Also see the Logistic Regression section of the user guide.
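
To make the one-vs-all scheme explicit, regardless of what your scikit-learn version does by default, a logistic regression model can be wrapped in OneVsRestClassifier, which fits one binary model per class. This is a sketch; max_iter=1000 is an assumption to give the solver room to converge on the unscaled features.

```python
# Logistic regression on iris with an explicit one-vs-all (one-vs-rest) scheme.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# load the iris dataset
dataset = datasets.load_iris()
# wrap a binary logistic regression; one copy is fitted per class
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(dataset.data, dataset.target)
print(len(model.estimators_))  # one fitted binary model per class
# per-class scores for the first 5 rows, normalized so each row sums to 1
probabilities = model.predict_proba(dataset.data[:5])
# the prediction is the class whose binary model is most confident
predicted = probabilities.argmax(axis=1)
print(probabilities.round(3))
print(predicted, model.predict(dataset.data[:5]))
```

Taking the class with the highest per-class score as the prediction is exactly the one-vs-all rule: each binary model answers "this class or not?", and the most confident answer wins.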

Naive Bayes

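Naive Bayes uses Bayes Theorem to model the conditional relationship of each attribute to the class variable, under the "naive" assumption that the attributes are independent of one another given the class. The Gaussian variant used here further assumes that each attribute is normally distributed within each class.

This recipe shows the fitting of a Gaussian Naive Bayes model to the iris dataset.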

# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
# load the iris dataset
dataset = datasets.load_iris()
# fit a Naive Bayes model to the data
model = GaussianNB()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

For more information see the API reference for Gaussian Naive Bayes (GaussianNB) for details on configuring the algorithm parameters. Also see the Naive Bayes section of the user guide.

k-Nearest Neighbor

The k-Nearest Neighbor (kNN) method makes predictions by locating the k cases most similar to a given data instance (using a similarity or distance function; scikit-learn uses Euclidean distance by default) and returning the majority class (for classification) or the average value (for regression) of those instances. The kNN algorithm can be used for classification or regression.

This recipe shows use of the kNN model to make predictions for the iris dataset.

# k-Nearest Neighbor
from sklearn import datasets
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
# load the iris dataset
dataset = datasets.load_iris()
# fit a k-nearest neighbor model to the data
model = KNeighborsClassifier()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

For more information see the API reference for k-Nearest Neighbors (KNeighborsClassifier) for details on configuring the algorithm parameters. Also see the Nearest Neighbors section of the user guide.
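
The sketch below spells out the kNN steps for classification in plain NumPy: compute the Euclidean distance from a query row to every training row, take the k closest, and return the majority class. knn_predict is a hypothetical helper written for illustration, with k=5 and Euclidean distance chosen to match KNeighborsClassifier's defaults. The query rows are taken from the training data, so each one is its own nearest neighbor in both implementations.

```python
# A from-scratch kNN classifier compared with scikit-learn's KNeighborsClassifier.
from collections import Counter

import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

def knn_predict(X_train, y_train, x, k=5):
    """Hypothetical helper: majority class among the k nearest training rows."""
    # Euclidean distance from x to every training row
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # indices of the k closest rows
    nearest = np.argsort(distances)[:k]
    # majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# one query row from each class (rows 0, 60 and 120 of iris)
query = X[[0, 60, 120]]
manual = [knn_predict(X, y, x) for x in query]
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(manual, list(model.predict(query)))
```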

Classification and Regression Trees

Classification and Regression Trees (CART) are constructed from a dataset by repeatedly splitting it on the attribute value that best separates the classes (for classification) or best reduces the prediction error (for regression). The CART algorithm can be used for classification or regression.

This recipe shows use of the CART model to make predictions for the iris dataset.

# Decision Tree Classifier
from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
# load the iris dataset
dataset = datasets.load_iris()
# fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

For more information see the API reference for CART for details on configuring the algorithm parameters. Also see the Decision Tree section of the user guide.
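
scikit-learn's DecisionTreeClassifier chooses its splits using Gini impurity by default (criterion="gini"). The sketch below fits a shallow tree, prints its splits with export_text, and checks the impurity of the root node by hand. max_depth=2 is an arbitrary choice to keep the printout short, and gini is a hypothetical helper written for illustration.

```python
# Inspect the splits of a CART model and check the root node's Gini impurity.
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

dataset = datasets.load_iris()
# a shallow tree is easier to read; max_depth=2 is an arbitrary choice
model = DecisionTreeClassifier(max_depth=2, random_state=1)
model.fit(dataset.data, dataset.target)
# print the learned splits as nested if/else rules
print(export_text(model, feature_names=list(dataset.feature_names)))

def gini(counts):
    """Hypothetical helper: Gini impurity, 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# the root node holds all 150 rows: 50 of each of the 3 classes
root_gini = gini([50, 50, 50])
print(root_gini, model.tree_.impurity[0])
```

With 50 rows of each class, the root impurity is 1 - 3 × (1/3)² = 2/3. Each split is chosen to make the weighted impurity of the child nodes as small as possible, which is what "splits that best separate the data" means in practice.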

Support Vector Machines

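Support Vector Machines (SVM) find the boundary that best separates two classes, choosing the one with the largest margin to the nearest training points (the support vectors). A kernel lets this boundary be found in a transformed problem space, so it can be non-linear in the original one. For more than two classes, scikit-learn's SVC trains one model per pair of classes (one-vs-one) and combines their votes. SVM also supports regression by fitting a function that keeps prediction errors within an allowable margin.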

This recipe shows use of the SVM model to make predictions for the iris dataset.

# Support Vector Machine
from sklearn import datasets
from sklearn import metrics
from sklearn.svm import SVC
# load the iris dataset
dataset = datasets.load_iris()
# fit an SVM model to the data
model = SVC()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

For more information see the API reference for SVM for details on configuring the algorithm parameters. Also see the SVM section of the user guide.
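
A sketch of two details of scikit-learn's SVC: how many support vectors each class contributes under different kernels, and the one-vs-one scheme it uses internally for more than two classes. Because iris has three classes, one-vs-one and one-vs-all both produce three models, so the 10-class digits dataset bundled with scikit-learn is used to make the difference visible. With decision_function_shape="ovo", decision_function returns one column per pair of classes (10 × 9 / 2 = 45); with the default "ovr", the pairwise results are aggregated into one column per class.

```python
# Support vectors per class on iris, and one-vs-one decision values on digits.
from sklearn import datasets
from sklearn.svm import SVC

# iris: count the support vectors each class contributes under two kernels
iris = datasets.load_iris()
support_counts = {}
for kernel in ("linear", "rbf"):  # "rbf" is SVC's default kernel
    model = SVC(kernel=kernel)
    model.fit(iris.data, iris.target)
    support_counts[kernel] = model.n_support_
    print(kernel, "support vectors per class:", model.n_support_)

# digits (10 classes): SVC trains 10 * 9 / 2 = 45 pairwise (one-vs-one) models
digits = datasets.load_digits()
ovo = SVC(decision_function_shape="ovo").fit(digits.data, digits.target)
ovr = SVC(decision_function_shape="ovr").fit(digits.data, digits.target)
ovo_scores = ovo.decision_function(digits.data[:2])
ovr_scores = ovr.decision_function(digits.data[:2])
print(ovo_scores.shape)  # (2, 45): one column per pair of classes
print(ovr_scores.shape)  # (2, 10): pairwise results aggregated per class
```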

Summary

In this post you have seen 5 self-contained recipes demonstrating some of the most popular and powerful supervised classification algorithms.

Each example is less than 20 lines that you can copy and paste and start using scikit-learn, right now. Stop reading and start practicing. Pick one recipe and run it, then start to play with the parameters and see what effect that has on the results.
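
As one example of playing with parameters, the sketch below varies n_neighbors in the k-Nearest Neighbor recipe. It scores each setting with 5-fold cross-validation (cross_val_score) rather than on the training data, because on the training data n_neighbors=1 would simply look up each row's own label and score (near) perfectly. The list of k values is an arbitrary choice.

```python
# Try several values of k for kNN and compare cross-validated accuracy.
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

dataset = datasets.load_iris()
results = {}
for k in (1, 3, 5, 7, 9, 15):
    model = KNeighborsClassifier(n_neighbors=k)
    # mean accuracy over 5 folds, each scored on rows held out from that fold's fit
    scores = cross_val_score(model, dataset.data, dataset.target, cv=5)
    results[k] = scores.mean()
    print("n_neighbors=%d: mean accuracy=%.3f" % (k, results[k]))
```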

29 Responses to Machine Learning Algorithm Recipes in scikit-learn

DR Venugopala Rao Manneni April 7, 2016 at 5:31 pm

Thanks for these Jason. Can you also please give the same for Neural networks (MLP)?

Ajinkya June 12, 2016 at 8:48 am

Thanks for this informative tutorial.
Can you please explain how logistic regression is used for classification where more than 2 classes are involved?
Thanks

- Jason Brownlee June 14, 2016 at 8:14 am
  
  Great question Ajinkya.
  
  Generally, you can take an algorithm designed for binary (two-class) classification and turn it into a multi-class classification algorithm by using the one-vs-all meta algorithm. You create n models, where n is the number of classes. Each model makes a prediction to provide a vector of predictions, and the final prediction is the class whose model had the highest probability.
  
  This can be used with logistic regression and is very popular with support vector machines.
  
  More on the one-vs-all meta algorithm here:
  https://en.wikipedia.org/wiki/Multiclass_classification

Nicolas November 23, 2016 at 1:12 am

Hey

Thank you very much for these helpful examples! I searched a lot until I found this website. You actually saved me a lot of time and nerves with doing an assignment for my ML course at my university 🙂

Keep up the great work!

- Jason Brownlee November 23, 2016 at 9:00 am
  
  I’m very glad to hear that Nicolas.
  
  - Ash October 24, 2018 at 2:11 am
    
    Hi Jason, which algorithm can I use to find the nearest match for a “String” value and then also test its accuracy? e.g. my data has the value FR for country but I need FRA; how do I ensure that I predict FRA and provide an accurate predicted match to the end users? Sorry, very basic question, but I’m new to ML, hence the question.
    
    - Jason Brownlee October 24, 2018 at 6:31 am
      
      Sorry, I don’t have material on string matching/similarity algorithms.

Gill Bates February 11, 2017 at 3:18 am

Dear Jason,
Great job.
Can you please show how to implement other algorithms or “how to catch fish”?
Tks.

- Jason Brownlee February 11, 2017 at 5:05 am
  
  Which algorithms Gill?

lalit April 6, 2017 at 9:32 pm

Test data should not be used for training. Here you are using the full training data as test data, which is wrong.

- Jason Brownlee April 9, 2017 at 2:39 pm
  
  Yes, I agree. These are just examples of how to fit models in sklearn.
  
- Adi Usman October 20, 2019 at 9:39 am
  
  Thanks for the wonderful beginner’s tutorial. It actually got me started. Could you please explain how to interpret the results?
  
  - Jason Brownlee October 21, 2019 at 6:12 am
    
    Thanks.
    
    Sure, which part?

Brian Tremaine July 28, 2017 at 3:17 am

Thank you for this tutorial, very helpful.

I have run the MNIST character recognition using Naive Bayes (GaussianNB) and the results were very poor compared to nearest neighbors. Is there an sklearn function for Bayes that uses priors? I’ve searched but haven’t found anything.

Thanks,
Brian

- Jason Brownlee July 28, 2017 at 8:33 am
  
  I would expect that naive Bayes in sklearn would use priors.
  
  The only time priors are dropped is when they add nothing to the equation (e.g. both classes have the same number of obs).

Jarrell R Dunson October 24, 2017 at 6:53 am

Question… I’m trying the code from sklearn.naive_bayes import GaussianNB

but this doesn’t seem to work from Python 3.5 or 3.6…

Does this only run in Python 2?

- Jason Brownlee October 24, 2017 at 3:57 pm
  
  No. It works with py2 and py3.
  
  Perhaps double check your version of sklearn?

Jarrell R Dunson October 25, 2017 at 12:51 am

Thanks… upgraded sklearn, and it works

- Jason Brownlee October 25, 2017 at 6:48 am
  
  Glad to hear it!

DG March 1, 2018 at 8:46 am

Thanks for the info. Can you post similar examples for cluster analysis or K-means using quantitative and qualitative data?

- Jason Brownlee March 1, 2018 at 3:08 pm
  
  Thanks for the suggestion.

Jesús Martínez April 18, 2018 at 1:20 am

Awesome. Scikit-learn is great. Thanks for sharing!

- Jason Brownlee April 18, 2018 at 8:10 am
  
  Thanks.

Fredrick Ughimi February 11, 2019 at 4:08 am

Hello Jason, thanks for the time and efforts you put into all this. Very streamlined informative tutorial. More grease.

- Jason Brownlee February 11, 2019 at 8:01 am
  
  Thanks.

Jim March 3, 2019 at 9:42 am

Hi Jason,

For logistic regression, I got warnings suggesting that I set both the solver and the multi_class arguments. So I used model = LogisticRegression(solver="newton-cg", multi_class="ovr") and this got rid of them.

Could you share any thoughts on what these two arguments are doing?

Thanks,
Jim

- Jason Brownlee March 4, 2019 at 6:55 am
  
  Yes, great question, you can learn more here:
  https://machinelearningmastery.com/how-to-fix-futurewarning-messages-in-scikit-learn/

SIYABONGA September 18, 2021 at 5:07 am

Hi
How can I plot a scatter plot of the classes predicted by a kNN classifier?

Thank You

Regards
Siya

- Adrian Tam September 19, 2021 at 6:21 am
  
  Do you think the example from the scikit-learn documentation helps? https://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html
