Supervised and Unsupervised Machine Learning Algorithms

By Jason Brownlee on October 3, 2023 in Machine Learning Algorithms

What is supervised machine learning and how does it relate to unsupervised machine learning?

In this post you will discover supervised learning, unsupervised learning and semi-supervised learning. After reading this post you will know:

About the classification and regression supervised learning problems.
About the clustering and association unsupervised learning problems.
Example algorithms used for supervised and unsupervised problems.
A problem that sits in between supervised and unsupervised learning called semi-supervised learning.

Let’s get started.

Supervised Machine Learning

The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
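
To make the teacher analogy concrete, here is a minimal sketch in plain Python. It assumes a single input variable, a straight-line model and a made-up toy dataset; the names `train`, `learning_rate` and `acceptable_error` are illustrative, not taken from any library. The model repeatedly predicts on the training data, is corrected using the known answers, and stops once the error is acceptable.

```python
# Toy training data (an assumption for illustration): Y is roughly 2*X + 1.
X_train = [1.0, 2.0, 3.0, 4.0, 5.0]
Y_train = [3.1, 4.9, 7.2, 9.0, 10.8]


def mean_squared_error(w, b, xs, ys):
    """Average squared difference between predictions (w*x + b) and known answers."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.02, acceptable_error=0.02, max_epochs=5000):
    """Fit Y = w*X + b by gradient descent, stopping once the error is acceptable."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_epochs):
        # Learning stops when performance on the training data is good enough.
        if mean_squared_error(w, b, xs, ys) < acceptable_error:
            break
        # The "teacher" step: compare each prediction with the correct answer.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Correct the model in the direction that reduces the squared error.
        w -= learning_rate * (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        b -= learning_rate * (2.0 / n) * sum(errors)
    return w, b


w, b = train(X_train, Y_train)

# Use the learned mapping function on new input data.
new_x = 6.0
prediction = w * new_x + b
print(f"Learned f(X) = {w:.2f} * X + {b:.2f}; f({new_x}) is about {prediction:.2f}")
```

In practice you would normally reach for a library implementation rather than hand-written gradient descent; the loop is spelled out here only to show the predict-and-correct cycle.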

Supervised learning problems can be further grouped into regression and classification problems.

Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”. Sometimes these categories are represented by numbers, but the numeric values carry no meaning; they are just labels.
Regression: A regression problem is when the output variable is a real value, such as a dollar amount or a weight.
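
The difference between the two problem types can be sketched with one simple method, k-nearest neighbors, used both ways. This is a toy illustration in plain Python; the data and the function names (`knn_classify`, `knn_regress`) are assumptions made up for the example. For classification the neighbors vote on a category; for regression their real values are averaged.

```python
from collections import Counter


def nearest_neighbors(train_x, query, k):
    """Return the indices of the k training inputs closest to the query (1-D inputs)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return order[:k]


def knn_classify(train_x, train_labels, query, k=3):
    """Classification: the output is a category chosen by majority vote."""
    idx = nearest_neighbors(train_x, query, k)
    votes = Counter(train_labels[i] for i in idx)
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_values, query, k=3):
    """Regression: the output is a real value, here the mean of the neighbors' values."""
    idx = nearest_neighbors(train_x, query, k)
    return sum(train_values[i] for i in idx) / k


# One input variable (height in cm) with two different kinds of output.
heights_cm = [150.0, 155.0, 160.0, 175.0, 180.0, 185.0]
team_labels = ["blue", "blue", "blue", "red", "red", "red"]  # categories, just labels
weights_kg = [50.0, 54.0, 58.0, 72.0, 77.0, 82.0]            # real values

category = knn_classify(heights_cm, team_labels, 178.0)
weight = knn_regress(heights_cm, weights_kg, 178.0)
print("Predicted category:", category)
print("Predicted weight (kg):", weight)
```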

Some common types of problems built on top of classification and regression include recommendation and time series prediction, respectively.
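
As a rough sketch of how time series prediction can be built on top of regression, one common framing (an assumption here, not the only option) is a sliding window: the previous few observations become the input X and the next observation becomes the output Y. The helper name `series_to_supervised` and the sample series are made up for illustration.

```python
def series_to_supervised(series, n_lags):
    """Turn a sequence into (X, y) pairs: n_lags past values predict the next value."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])  # the window of past observations
        y.append(series[i])             # the value that followed the window
    return X, y


series = [10, 20, 30, 40, 50, 60]
X, y = series_to_supervised(series, n_lags=2)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

Once the data is in this shape, any regression algorithm can be trained on the (X, y) pairs.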

Some popular examples of supervised machine learning algorithms are:

Linear regression for regression problems.
Random forest for classification and regression problems.
Support vector machines for classification problems.

Unsupervised Machine Learning

Unsupervised learning is where you only have input data (X) and no corresponding output variables.

The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

Unsupervised learning problems can be further grouped into clustering and association problems.

Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
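
To give a feel for clustering, here is a small sketch of the k-means idea in plain Python, grouping made-up customers by purchasing behavior. The data, the function name `kmeans` and the choice to start from the first k points are all assumptions for illustration; real implementations use more careful initialization and stopping rules.

```python
import math


def kmeans(points, k, n_iterations=10):
    """A bare-bones k-means: alternate between assigning points and moving centroids."""
    # Assumption: use the first k points as starting centroids so the run is repeatable.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Each customer: (visits per month, average spend per visit). No labels are given.
customers = [(1, 10), (2, 12), (1, 11), (9, 95), (10, 100), (8, 90)]
centroids, assignments = kmeans(customers, k=2)
for customer, cluster in zip(customers, assignments):
    print(customer, "-> cluster", cluster)
```

Note that the algorithm is never told which grouping is correct; it only discovers that the occasional small spenders and the frequent big spenders sit far apart.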

Some popular examples of unsupervised learning algorithms are:

k-means for clustering problems.
Apriori algorithm for association rule learning problems.
LDA (latent Dirichlet allocation) for topic modeling of text passages, i.e., discovering keywords and associating them with text.
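
For association rule learning, the two quantities usually involved are support (how often items appear together) and confidence (how often Y appears when X does). The sketch below is a simplified illustration in plain Python that only looks at pairs of items; it is not the full Apriori algorithm, which also grows and prunes larger itemsets level by level. The transactions, thresholds and function names are assumptions for the example.

```python
from itertools import combinations

# Made-up shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Find rules 'people that buy X also tend to buy Y' over single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        # Skip pairs that are too rare to be interesting.
        if support({a, b}, transactions) < min_support:
            continue
        # A rule is directional, so check both X -> Y and Y -> X.
        for x, y in ((a, b), (b, a)):
            conf = confidence({x}, {y}, transactions)
            if conf >= min_confidence:
                rules.append((x, y, conf))
    return rules


rules = pair_rules(transactions)
for x, y, conf in rules:
    print(f"people that buy {x} also tend to buy {y} (confidence {conf:.2f})")
```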

Semi-Supervised Machine Learning

Problems where you have a large amount of input data (X) and only some of the data is labeled (Y) are called semi-supervised learning problems.

These problems sit in between both supervised and unsupervised learning.

A good example is a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.

Many real-world machine learning problems fall into this area. This is because labeling data can be expensive or time-consuming, as it may require access to domain experts, whereas unlabeled data is cheap and easy to collect and store.

You can use unsupervised learning techniques to discover and learn the structure in the input variables.

You can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new, unseen data.
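
The feed-back loop described above is often called pseudo-labeling or self-training. Here is a minimal sketch in plain Python, assuming one numeric input and a very simple nearest-centroid classifier; the data and the names `fit_centroids` and `predict` are made up for illustration. Real uses typically keep only confident guesses, which this toy version skips.

```python
def fit_centroids(xs, labels):
    """A tiny 'model': the mean input value for each label."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# A small labeled set and a larger unlabeled set.
labeled_x = [1.0, 2.0, 9.0, 10.0]
labeled_y = ["cat", "cat", "dog", "dog"]
unlabeled_x = [1.5, 3.0, 8.0, 9.5, 4.0]

# Step 1: train on the labeled data only.
model = fit_centroids(labeled_x, labeled_y)

# Step 2: make best-guess predictions for the unlabeled data.
pseudo_labels = [predict(model, x) for x in unlabeled_x]

# Step 3: feed those guesses back in as extra training data and retrain.
model = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_labels)

# Step 4: use the retrained model on new, unseen data.
new_prediction = predict(model, 5.0)
print("Pseudo-labels:", pseudo_labels)
print("Prediction for 5.0:", new_prediction)
```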

The recent development of language models is a good example of semi-supervised machine learning. For a given sentence, the learning algorithm predicts word N+1 based on words 1 to N of the sentence, so the label (Y) can be derived from the input (X). This setup is often called self-supervised learning.
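
As a small illustration of deriving the label from the input, the sketch below turns one sentence into (X, Y) training pairs, where X is words 1 to N and Y is word N+1. It uses naive whitespace splitting and a made-up helper name, `next_word_pairs`; real language models work on tokens and far larger corpora.

```python
def next_word_pairs(sentence):
    """Build (context, next word) pairs from a sentence, with no manual labeling."""
    words = sentence.split()
    return [(words[:n], words[n]) for n in range(1, len(words))]


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```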

Summary

In this post you learned the difference between supervised, unsupervised and semi-supervised learning. You now know that:

Supervised: All data is labeled and the algorithms learn to predict the output from the input data.
Unsupervised: All data is unlabeled and the algorithms learn the inherent structure from the input data.
Semi-supervised: Some data is labeled but most of it is unlabeled and a mixture of supervised and unsupervised techniques can be used.

Do you have any questions about supervised, unsupervised or semi-supervised learning? Leave a comment and ask your question and I will do my best to answer it.

273 Responses to Supervised and Unsupervised Machine Learning Algorithms

Omot August 20, 2016 at 2:32 pm #

Thanks for this post. That was helpful. My question is: how does one determine the correct algorithm to use for a particular problem in supervised learning? Also, can a network trained by unsupervised learning be tested with a new set of data (testing data), or is it just for the purpose of grouping?

- Jason Brownlee August 21, 2016 at 6:15 am #
  
  Hi Omot, it is a good idea to try a suite of standard algorithms on your problem and discover what algorithm performs best.
  
  Normally, an unsupervised method is applied to all available data in order to learn something about that data and the broader problem. You could, say, cluster a “training” dataset and later see which clusters new data is closest to if you wanted to avoid re-clustering the data.
  
  - SABARISH V April 7, 2018 at 2:44 pm #
    
    Sir, can k-means clustering be implemented in MATLAB to predict the data for unsupervised learning?
    
    - Jason Brownlee April 8, 2018 at 6:13 am #
      
      k-means is a clustering algorithm. It is not used to make predictions; instead, it is used to group data. Learn more here:
      https://en.wikipedia.org/wiki/K-means_clustering
      
      - Ankit January 10, 2021 at 11:58 am #
        
        Hello
        
        Could clustering be used to create a dependent categorical variable from a number of numerical independent variables?
        I am faced with a problem where I have a dataset with multiple independent numerical columns, but I am not sure whether the dependent variable is correct.
      - Jason Brownlee January 10, 2021 at 1:09 pm #
        
        Sure. Try it and see if it helps.
    - Ella Brown January 7, 2019 at 9:14 pm #
      
      Hi, Sabarish v!
      here you can better understand about k-algorithm, explained very well
      
      https://blog.carbonteq.com/practical-image-recognition-with-tensorflow/
      
      - Jason Brownlee January 8, 2019 at 6:47 am #
        
        Thanks for sharing.
  - Tarun September 7, 2018 at 8:49 am #
    
    Which of the following is a supervised learning problem?
    A) Grouping people in a social network.
    B) Predicting credit approval based on historical data
    C) Predicting rainfall based on historical data
    D) all of the above
    
    - Jason Brownlee September 7, 2018 at 1:56 pm #
      
      I’d rather not do your homework for you.
      
      This framework can help you figure out whether any problem is a supervised learning problem:
      https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
      
    - Saloni December 18, 2018 at 10:49 pm #
      
      B
      
    - Vamsi March 7, 2019 at 5:21 pm #
      
      B and C
      
    - Sriharsha Arangi April 16, 2019 at 5:01 am #
      
      B and C
      
- angel November 22, 2016 at 9:58 am #
  
  I need help in solving a problem. I have utilized all resources available and the school can’t find a tutor in this subject. My question is this: What is the best method to choose if you want to train an algorithm that can discriminate between patients with hypertension and patients with hypertension and diabetes. Please help me understand!
  
  - Jason Brownlee November 23, 2016 at 8:48 am #
    
    Hi Angel, this sounds like a problem-specific question.
    
    In general, we cannot know which data representation or which algorithm is best; they must be discovered empirically:
    https://machinelearningmastery.com/a-data-driven-approach-to-machine-learning/
    
    I teach a process for working through predictive modeling problems methodically that you may find useful:
    https://machinelearningmastery.com/start-here/#process
    
- Seho Kim October 16, 2018 at 11:14 am #
  
  Very informative article that explains the differences between supervised and unsupervised learning!
  Thanks!
  
  - Jason Brownlee October 16, 2018 at 2:35 pm #
    
    Thanks.
    
- ZNIBER MOHAMMED April 16, 2019 at 8:40 pm #
  
  You can optimize your algorithm or compare between algorithms using Cross validation which in the case of supervised learning tries to find the best data to use for training and testing the algorithm.
  
Pragya Poonia August 23, 2016 at 1:08 pm #

This content is really helpful. Can you give some examples of all these techniques with best description?? or a brief introduction of Reinforcement learning with example??

- Jason Brownlee August 24, 2016 at 8:19 am #
  
  Take a look at this post for a good list of algorithms:
  https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
  
Naveen October 10, 2016 at 8:16 pm #

Hi Jason,

Thank you for the summary of the types of ML algorithms.
How can one use clustering or unsupervised learning for prediction on new data? I have clustered the input data using hierarchical clustering. Now I want to check the membership of new data in the identified clusters. How is this possible? Is there an algorithm available in R?

- Jason Brownlee October 11, 2016 at 7:23 am #
  
  Hi Naveen, generally I don’t use unsupervised methods much as I don’t get much value from them in practice.
  
  You can use the cluster number, cluster centroid or other details as an input for modeling.
  
Tashrif October 25, 2016 at 9:03 am #

Could you please give me a real world example of supervised, unsupervised, and semi supervised learning?

- Jason Brownlee October 26, 2016 at 8:25 am #
  
  Hi Tashrif,
  
  Supervised would be when you have a ton of labeled pictures of dogs and cats and you want to automatically label new pictures of dogs and cats.
  
  Unsupervised would be when you want to see how the pictures structurally relate to each other by color or scene or whatever.
  
  Semi-supervised is where you have a ton of pictures and only some are labelled and you want to use the unlabeled and the labelled to help you in turn label new pictures in the future.
  
Frank M November 12, 2016 at 7:38 am #

This was a really good read, so thanks for writing and publishing it.

Question for you. I have constructed a Random Forest model, so I’m using supervised learning, and I’m being asked to run an unlabeled data set through it. But I won’t have the actual results of this model, so I can’t determine accuracy on it until I have the actual result of it.

So my question is… how can I run a set of data through a ML model if I don’t have labels for it?

For further clarity and context, I’m running a random forest model to predict a binary classification label. I get the first few data points relatively quickly, but the label takes 30 days to become clear.

Maybe none of this makes sense, but I appreciate any direction you could possibly give.

Many thanks,

Frank

- Jason Brownlee November 14, 2016 at 7:30 am #
  
  Thanks Frank. Great question.
  
  You will need to collect historical data to develop and evaluate your model.
  
  Once created, it sounds like you will need to wait 30 days before you can evaluate the ongoing performance of the model’s predictions.
  
Ann November 17, 2016 at 8:29 pm #

Hi Jason,
I have written a program to classify whether a customer (client) will subscribe to a term deposit or not.
Dataset used: bank dataset from the UCI Machine Learning Repository.
Algorithms used: 1. random forest algorithm with CART to generate decision trees and 2. random forest algorithm with HAC4.5 to generate decision trees.

My question is: how do I determine the accuracy of 1 and 2 and find the best one?

I am really new to this field, so please ignore my stupidity.
Thanks in advance.

- Jason Brownlee November 18, 2016 at 8:21 am #
  
  Hi Ann, great work!
  
  You can compare each algorithm using a consistent testing methodology. For example, use k-fold cross-validation with the same random number seed (so each algorithm gets the same folds).
  
  Here is more info on comparing algorithms:
  https://machinelearningmastery.com/how-to-evaluate-machine-learning-algorithms/
  
  I hope that helps as a start.
  
Nihad Almahrooq December 1, 2016 at 6:17 pm #

Hi Jason, great work you are doing. I wish you the best; you deserve it.

My question: I want to use ML to solve problems with network infrastructure data, such as missing values, typos, and discrepancies. Fundamental knowledge and expertise are essential, though I need some ML direction and more research. Can you shed some light on that, and how? If you prefer, we can communicate directly at nkmahrooq@hotmail.com

Thanks, and please forgive me if the approach seems awkward; I am just starting out and only recently joined your connections, so I may be rushing!

- Jason Brownlee December 2, 2016 at 8:14 am #
  
  Hi Nihad, that is an interesting application.
  
  Machine learning might not be the best approach for fixing typos and such. Nevertheless, the first step would be to collect a dataset and try to deeply understand the types of examples the algorithm would have to learn.
  
  This post might help you dive deeper into your problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  I hope this helps as a start, best of luck.
  
- the Casebook Web Application built for lawyers,paralegals & law students. January 1, 2022 at 6:57 am #
  
  Thanks for the tutorial , have been implementing your machine learning master to law on the Casebook Web Application built for lawyers,paralegals & law students.
  
  - James Carmichael January 1, 2022 at 12:03 pm #
    
    You are very welcome Nihad! Thank you for letting me know about your application and how you have made use of our materials!
    
    Regards,
    
Nischay December 24, 2016 at 8:11 am #

Splendid work! A helpful measure for my semester exams. Thanks!!

- Jason Brownlee December 26, 2016 at 7:37 am #
  
  Thanks Nischay.
  
Sam January 1, 2017 at 4:11 am #

Hello Jason, great work you are doing. I wish you the best; you deserve it.
I want to find an online algorithm to cluster scientific workflow data to minimize run time and system overhead, so it can map these workflow tasks to distributed resources like clouds. The clustered data should be mapped to the available resources in a balanced way that guarantees no resource is overutilized while another resource is idle.

I came across horizontal clustering and vertical clustering, but these techniques are static and the user must determine the number of clusters and the number of tasks in each cluster in advance …

- Jason Brownlee January 1, 2017 at 5:25 am #
  
  Hi Sam,
  
  Thanks for your support.
  
  Off-the-cuff, this sounds like a dynamic programming or constraint satisfaction problem rather than machine learning.
  
Marcus January 6, 2017 at 6:55 am #

Hi Jason, this post is really helpful for my Cognitive Neural Network revision!

I have a question of a historical nature, relating to how supervised learning algorithms evolved:
Some early supervised learning methods allowed the threshold to be adjusted during learning. Why is that not necessary with the newer supervised learning algorithms?

Is this because they (e.g. the Delta Rule) adjust the weights on a running basis to minimize error, which supersedes the need for threshold adjustment? Or is there something more subtle going on in the newer algorithms that eliminates the need for threshold adjustment? Thank you in advance for any insight you can provide on this.

- Jason Brownlee January 6, 2017 at 9:14 am #
  
  I don’t think I have enough context Marcus. It sounds like you may be referring specifically to stochastic gradient descent.
  
  I’m not really an algorithm historian, I’d refer you to the seminal papers on the topic.
  
David Lehmann February 17, 2017 at 3:52 am #

Hi Jason – Thanks so much for the informative post. I think I am missing something basic. Once a model is trained with labeled data (supervised), how does additional unlabeled data help improve the model? For example, how do newly uploaded pictures (presumably unlabeled) to Google Photos help further improve the model (assuming they do)? Or how does new voice data (again unlabeled) help make a machine learning-based voice recognition system better? I understand conceptually how labeled data could drive a model, but it is unclear to me how unlabeled data helps if you don’t really know what the data represents.

Thanks! Dave

- Jason Brownlee February 17, 2017 at 10:01 am #
  
  Great question Dave.
  
  Generally, we can use unlabelled data to help initialize large models, like deep neural networks.
  
  More specifically, we can label unlabelled data, have it corroborate the prediction if needed, and use that as input to update or retrain a model to make it better for future predictions.
  
  Does that help?
  
  - Dave Lehmann February 18, 2017 at 2:50 am #
    
    Yes, thanks. So the data ultimately needs to be labeled to be useful in improving the model? Keeping with the Google Photos use case, all the millions of photos uploaded every day then don’t help the model unless someone manually labels them and then runs those through the training? I guess I was hoping there was some way intelligence could be discerned from the unlabeled data (unsupervised) to improve on the original model, but that does not appear to be the case, right? Thanks again for the help – Dave
    
    - Jason Brownlee February 18, 2017 at 8:43 am #
      
      There very well may be, I’m just not across it.
      
      - Amit Mukherjee July 12, 2018 at 5:45 pm #
        
        For a business which uses machine learning, would it be correct to think that there are employees who manually label unlabeled data to overcome the problem raised by Dave? The amount of unlabeled data in such cases would be much smaller than all the photos in Google Photos.
      - Jason Brownlee July 13, 2018 at 7:35 am #
        
        It is a good approach, e.g. to use local or remote labor to prepare/label a first-cut dataset.
Rohit Thakur March 20, 2017 at 10:48 pm #

Can you write a blog post on reinforcement learning explaining how it works in the context of robotics?

- Jason Brownlee March 21, 2017 at 8:40 am #
  
  I hope to cover the topic in the future Rohit.
  
Hansa April 12, 2017 at 8:05 pm #

Hi Jason,

I am trying to solve a machine learning problem for incidents in the health and safety industry.
I want to recommend corrective or preventive actions based on the incident happening at a given site.
I am trying to understand which algorithm works best for this.
Could you please share your thoughts?

Regards,
Hansa

- Jason Brownlee April 13, 2017 at 9:59 am #
  
  This framework may help you frame your problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  This process will help you work through it:
  https://machinelearningmastery.com/start-here/#process
  
Aigerim April 18, 2017 at 11:08 pm #

I need help in solving a problem. I am writing a thesis about unsupervised learning of the morphology of the Turkish language. It is my first thesis in this area. My question is this: I have to write a math model of morphology, and I am trying to understand which algorithm works best for this. Could you please give me some important information? Please help me understand!

- Jason Brownlee April 19, 2017 at 7:53 am #
  
  You must answer this question empirically.
  
  See this post:
  https://machinelearningmastery.com/a-data-driven-approach-to-machine-learning/
  
lilya April 23, 2017 at 7:06 pm #

Hi Jason,

Please, I need help in solving my problem: I want to do supervised clustering of regions (classify regions with the frequency of accidents, a numeric response, as the response variable, and explanatory variables such as population density and traffic density). I want to do this using random forest. Is it possible?

- Jason Brownlee April 24, 2017 at 5:34 am #
  
  I do not have clustering examples sorry.
  
Nuwan C April 27, 2017 at 12:14 pm #

Hi Jason,
Thanks for the article; it is a wonderful help for a beginner. I would like a little clarification about the categorization.

I saw some articles divide learning into supervised, unsupervised and reinforcement.

Do semi-supervised and reinforcement learning mean the same thing?

- Jason Brownlee April 28, 2017 at 7:28 am #
  
  No, reinforcement learning is something different again.
  
  See more here:
  https://en.wikipedia.org/wiki/Reinforcement_learning
  
violet May 5, 2017 at 3:09 pm #

Good one! Thanks a lot. Jason, you did great! It was so simplified. But I would love to have an insight as simplified as this on the linear regression algorithm in supervised machine learning. Thanks once more.

- Jason Brownlee May 6, 2017 at 7:35 am #
  
  I’m glad to hear that.
  
  Here is a simplified description of linear regression and other algorithms:
  https://machinelearningmastery.com/start-here/#algorithms
  
Anubhav May 23, 2017 at 8:00 pm #

Good one! I am a novice to ML. So which category will a time series based predictive model fall under: supervised, unsupervised or semi-supervised? And why?

- Jason Brownlee May 24, 2017 at 4:54 am #
  
  Time series forecasting is supervised learning.
  
  - Fred October 31, 2018 at 5:26 am #
    
    What are 10 difficulties or problems faced by anyone who wants to do data mining on the topic “Prediction of Portuguese students’ performance on mathematics class in high schools”?
    
    - Jason Brownlee October 31, 2018 at 6:31 am #
      
      Sounds like a homework question, I recommend thinking through it yourself Fred.
      
Balaji June 20, 2017 at 2:36 am #

Hi Jason,
Simple and easy to understand content.
I am an ML enthusiast looking for material that groups the important and most used algorithms into supervised and unsupervised.

e.g
Supervised – Regression, Classification, Decision tree etc..
Unsupervised – Cluster, etc..

do you have ?

- Jason Brownlee June 20, 2017 at 6:40 am #
  
  This might help:
  https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
  
- alex March 28, 2019 at 11:59 pm #
  
  You could look at this video about unsupervised learning. It shows some examples where unsupervised learning is typically used. https://www.youtube.com/watch?v=YulpnydYxg8
  
Eashan Roy July 1, 2017 at 5:18 am #

Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of treatment, side effects), discover whether there are different categories or types of patients in terms of how they respond to the drug and, if so, what these categories are.

Is this supervised or unsupervised learning ?

- Jason Brownlee July 1, 2017 at 6:38 am #
  
  Sounds like unsupervised to me.
  
  - David July 3, 2017 at 5:37 pm #
    
    I have over 1 million sample input queries. I want to classify each as a genuine or malicious query. Every query consists of keywords, but there are some specific keywords that may help identify whether a query is malicious or not. However, not every possible malicious keyword makes the whole query malicious… I’m not sure how to present my problem here, but let me ask this first: is it possible to have 2 levels of classification (supervised) and 1 level of clustering (unsupervised) in solving a problem like this?
    
    - Jason Brownlee July 6, 2017 at 9:59 am #
      
      You need a high-quality training dataset first.
      
      Then this process may help:
      https://machinelearningmastery.com/start-here/#process
      
  - Farah July 26, 2020 at 4:24 am #
    
    Can we use k-means and the random forest algorithm for detection of phishing websites for a thesis using Weka? Kindly reply as soon as possible.
    
    - Jason Brownlee July 26, 2020 at 6:25 am #
      
      I recommend testing a suite of different algorithms and discovering what works best for your specific dataset.
      
Sana August 2, 2017 at 12:22 am #

Thanks Jason, it really helped me in my semester exam.

- Jason Brownlee August 2, 2017 at 7:54 am #
  
  I’m glad to hear that.
  
Blessing August 12, 2017 at 11:59 am #

Hi Jason, thank you for the post. I have a question. Does an unsupervised algorithm search for a final hypothesis and if so, what is the hypothesis used for. Are target functions involved in unsupervised learning? What does an unsupervised algorithm actually do?
I understand supervised learning as an approach where training data is fed into an algorithm to learn the hypothesis that estimates the target function. However, for an unsupervised learning, for example, clustering, what does the clustering algorithm actually do? what does “concept learning” mean when it comes to unsupervised machine learning? I noticed that most books define concept learning with respect to supervised learning. Thank you

- Jason Brownlee August 13, 2017 at 9:45 am #
  
  I don’t like unsupervised methods in general – I don’t find their results objective – I don’t think they are falsifiable therefore I can’t judge if they’re useful.
  
  They work by applying a methodology/process to data to get an outcome, then it is up to the practitioner to interpret the results – hopefully objectively.
  
  You’ll notice that I don’t cover unsupervised learning algorithms on my blog – this is the reason.
  
abhi August 30, 2017 at 2:25 pm #

Hi Jason,

I have been following your tutorials for the last couple of weeks. Thanks for such awesome tutorials for beginners.

I have one problem for which I want to use an ML algorithm. I tried cats and dogs on a small dataset and I can predict the correct output with binary cross-entropy.

Now, to apply this to my own dataset, I want to classify images as to whether they are a cat, a dog, or something else (e.g. if I provide a lion image). But all I get is 0 and 1 for the cat and dog classes.

Model.predict should give me a different output if the image is not a cat or dog.
Also, how can I get a percentage prediction that says: yes, this image is quite similar to a cat/dog, with a score of 80% or more? If I provide a mountain or lion image, then it should give me an output of 10%, or anything less than 50%, so I can say it is not a cat or dog but something else.

- Jason Brownlee August 30, 2017 at 4:19 pm #
  
  You will need to change your model from a binary classification model to a multiclass classification model.
  
  See this model as an example:
  https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
  
Keval September 15, 2017 at 12:28 am #

I am wondering where does a scoring model fit into this structure? I am trying to define my problem as an ML problem, however, I do not have any labeled data as I am just starting to work with the data. The output variable in my case is a score that is calculated based on select features from the dataset. How would you classify this problem and what techniques would you suggest exploring?

- Jason Brownlee September 15, 2017 at 12:14 pm #
  
  This post will help you define your predictive modeling problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
Ares September 23, 2017 at 3:02 pm #

Hii Jason .. Thank you for the post… I am new to Machine Learning…How should i start with Machine learning.. Should i study all the concepts first or should i code algorithms which i study simultaneously ??? Thanks

- Jason Brownlee September 24, 2017 at 5:12 am #
  
  My best advice for getting started is here:
  https://machinelearningmastery.com/start-here/#getstarted
  
  It is not for everyone, but seems to work well for developers that learn by doing.
  
Ahmed Fathy September 27, 2017 at 3:23 am #

I have some images of mango diseases. I want to do segmentation, feature extraction and classification. What are the best and most common algorithms for this?

- Jason Brownlee September 27, 2017 at 5:48 am #
  
  Perhaps you can use feature selection methods to find out:
  https://machinelearningmastery.com/an-introduction-to-feature-selection/
  
scott October 13, 2017 at 9:51 am #

Hey there, Jason – Good high-level info. Truthfully, I found the grammar and spelling errors distracting. They make software for that. 😉

- Jason Brownlee October 13, 2017 at 2:54 pm #
  
  Thanks for the feedback Scott.
  
SANDEEP S KUMAR November 4, 2017 at 12:14 pm #

what are the examples of semi supervised learning algorithms

Aditi Kadam November 6, 2017 at 5:57 am #

Thank you! It was a great explanation

- Jason Brownlee November 7, 2017 at 9:43 am #
  
  Thanks, I’m glad it helped.
  
Vinu Nair December 7, 2017 at 5:07 pm #

Hi Jason,
Good work. Could you please help me find an algorithm for the problem below?
We have a number of record groups which have been grouped manually. We need to automate this grouping by analysis of this historical data.

- Jason Brownlee December 8, 2017 at 5:35 am #
  
  This post will help you frame your data as a predictive modeling problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
Vinu Nair December 7, 2017 at 10:12 pm #

Could you please share some algorithm for finding matching patterns

- Jason Brownlee December 8, 2017 at 5:38 am #
  
  Thanks for the suggestion.
  
Haneen December 20, 2017 at 9:52 pm #

You are amazing, thank you so much.

- Jason Brownlee December 21, 2017 at 5:25 am #
  
  You’re welcome.
  
Ali December 21, 2017 at 10:44 pm #

What is supervised and unsupervised learning? Which learning techniques could be better in particular machine learning domain? Which technique has limitations and why?

Reply
- Jason Brownlee December 22, 2017 at 5:32 am #
  
  Did this post help explain the difference?
  
  Reply
Krishna December 27, 2017 at 3:31 pm #

Very helpful for understanding what supervised and unsupervised learning are. It is much better when you explain it lucidly with real-time applications.

Reply
- Jason Brownlee December 28, 2017 at 5:18 am #
  
  Thanks.
  
  Reply
Pulkit Aggarwal December 28, 2017 at 11:00 pm #

Hello, Sir Jason. I’m new to machine learning and want to learn it from scratch. Please guide me.

Reply
- Jason Brownlee December 29, 2017 at 5:23 am #
  
  You can start here:
  https://machinelearningmastery.com/start-here/
  
  Reply
MUHAMMAD OMER January 12, 2018 at 10:16 pm #

hello,
What kind of data do we use for reinforcement learning?
Please guide me.

Reply
- Jason Brownlee January 13, 2018 at 5:32 am #
  
  I hope to cover RL in detail this year.
  
  Reply
Ritesh K. Patel (PhD) January 13, 2018 at 6:25 am #

SPEECHLESS LEARNING, MACHINE LEARNING EXPLANATIONS ARE SO EASILY COVERED, EVEN A HISTORY PROFESSOR CAN USE IT. THANKING YOU FOR YOUR TIME AND CONSIDERATION. DR. RITESH PATEL GTU MBA SECTION HEAD GUJARAT TECHNOLOGICAL UNIVERSITY AHMEDABAD 9909944890 CUG PERSONAL 9687100199 AP_CGS@GTU.EDU.IN

Reply
- Jason Brownlee January 13, 2018 at 7:49 am #
  
  Thanks.
  
  Reply
Glad January 18, 2018 at 8:23 am #

Nice one, but I need more explanation on unsupervised learning please

Reply
- Jason Brownlee January 18, 2018 at 10:15 am #
  
  What questions do you have about unsupervised learning exactly?
  
  Reply
Glad January 22, 2018 at 6:58 am #

Examples of unsupervised machine learning

Reply
- Jason Brownlee January 23, 2018 at 7:47 am #
  
  Thanks for the suggestion.
  
  Reply
Fernando January 24, 2018 at 9:35 pm #

Hi Jason,

My problem is related to NLP and sentiment analysis.

I have a dataset with a few columns. One of them is a free text and another one is a sentiment score, from 1 (negative) to 10 (positive).

I’m trying to apply sentiment analysis to the text field and see how well it works compared with the sentiment score field. For this purpose, I’ve run some off-the-shelf sentiment analysis tools, such as Polyglot, but they didn’t work very well. That’s why I’ve decided to address this as a classification problem (negative, neutral or positive).

In order to do this, I’ve extracted 1-, 2- and 3-grams and used them as features to train my model. I tried SVM, and also tried selecting the most representative grams for each of these classes using z-score, but the results were worse than with Polyglot.

Any suggestion?

Thanks!

Reply
- Jason Brownlee January 25, 2018 at 5:54 am #
  
  This tutorial will get you started:
  https://machinelearningmastery.com/develop-word-embedding-model-predicting-movie-review-sentiment/
  
  Reply
Satyam January 25, 2018 at 4:09 am #

What are some widely used Python libraries for Supervised Learning?

Reply
- Jason Brownlee January 25, 2018 at 5:58 am #
  
  scikit-learn.
  
  Reply
Mayur February 1, 2018 at 12:11 am #

What will be the best algorithm to use for a Prediction insurance claim project?

Reply
- Jason Brownlee February 1, 2018 at 7:23 am #
  
  Try this process:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
Sri February 7, 2018 at 1:13 pm #

Hi Jason,

You are awesome. Sorry if my question is meaningless. In simple terms, what is the relation between Big Data, Machine Learning, R, Python, Spark, Scala and Data Science?

Thanks,
Sri.

Reply
- Jason Brownlee February 8, 2018 at 8:21 am #
  
  You can probably look up definitions of those terms. Why are you asking exactly?
  
  Reply
Chibuzor February 9, 2018 at 2:43 am #

Hello Jason,
That was a good one, keep it up,
Please, what is your advice for a corporation that wants to use machine learning for archiving big data, and for developing AI that will help accurately detect similar interpretations and turn them into a software program?
Secondly, besides these two areas, are there other areas where you think AI will be helpful for industrialists? Let me know your take.

Chibuzor

Reply
- Jason Brownlee February 9, 2018 at 9:13 am #
  
  I’m not sure how these methods could help with archiving.
  
  Perhaps this post will help you define your problem as a supervised learning problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
Grant Morgan February 10, 2018 at 7:10 pm #

Hi,
Interesting read.

Do you have a suggestion for a problem where, for a given input (image), choosing a particular point p gives a reward r? The goal is to maximize r. There may be multiple points that return the same maximum r value, so I don’t see standard CNN training methods working. It does not matter which one is returned; the reward is the same. Each trial is separate, so reinforcement learning does not seem correct.

Reply
- Jason Brownlee February 11, 2018 at 7:55 am #
  
  Sounds like a multimodal optimization problem. If you only need one result, one of a range of stochastic optimization algorithms can be used.
  
  If you need all points, then a multimodal optimization could be used, like a niching genetic algorithm (I did my masters on these).
  
  Reply
Bilal Khan February 12, 2018 at 6:11 am #

Very helpful material. I was preparing for my exams and I have completely understood the whole concept; it was very smoothly explained. JAZAKALLA (means May GOD give you HIS blessing)

Reply
- Jason Brownlee February 12, 2018 at 8:34 am #
  
  I’m glad it helped.
  
  Reply
Marwa February 20, 2018 at 4:59 pm #

Can you give me an explanation of the classes of unsupervised methods used in segmentation: by block, by pixel, and by region?

Reply
- Jason Brownlee February 21, 2018 at 6:37 am #
  
  Sorry, I don’t follow. Perhaps you can provide more context?
  
  Reply
nikhil February 26, 2018 at 4:03 pm #

Sir, can you give real-time examples of supervised, unsupervised, and semi-supervised learning?

Reply
- Jason Brownlee February 27, 2018 at 6:24 am #
  
  Linear regression is supervised, clustering is unsupervised, and autoencoders can be used in a semi-supervised manner.
  
  Reply
Shivani March 12, 2018 at 4:06 pm #

Sir, thank you for such great information.
But how can we use unsupervised learning for any type of clustering?

Reply
- Jason Brownlee March 13, 2018 at 6:23 am #
  
  Sorry, I don’t have material on clustering. I may cover it in the future.
  
  Reply
Kristy March 13, 2018 at 4:06 am #

Thanks for posting this. This is a great summary! Very straightforward explanations.

Reply
- Jason Brownlee March 13, 2018 at 6:32 am #
  
  I’m glad it helped.
  
  Reply
Charalampos March 14, 2018 at 1:30 am #

First of all, a very nice and helpful report. Now my question.

I have an unlabeled dataset of people and I want to find some patterns in their behaviour for future marketing. I am using clustering algorithms, but if I then want to train a model for future predictions (for a new entry in the dataset, or for a new transaction by a person already registered in the dataset), should I use these clusters as classes to train a supervised classification model? Or how can I do this? I am confused.

Thank you in advance!!

Reply
- Jason Brownlee March 14, 2018 at 6:28 am #
  
  Perhaps start with a clear idea of the outcomes you require and work backwards:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
  - Charalampos March 14, 2018 at 7:14 pm #
    
    Thank you for your reply, but this didn’t help me much.
    
    Some people, after applying a clustering method in an unsupervised setting (e.g. k-means), use the k-means prediction to predict the cluster that a new entry belongs to. Others, after finding the clusters, train a new classifier, since the problem is now supervised with the clusters as classes, and use this classifier to predict the class or cluster of the new entry. I can’t understand the difference between these two methods. I don’t know if you understand my point, but I would appreciate it if you tried to explain it to me.
    
    Reply
    - Jason Brownlee March 15, 2018 at 6:27 am #
      
      Sorry, I don’t have material on clustering, I cannot give you good advice.
      
      Reply
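A minimal sketch of the two approaches described in this thread, assuming scikit-learn and synthetic data in place of the real records. The blob centers, the choice of logistic regression as the second-stage classifier, and all variable names are illustrative assumptions, not something stated by the commenters. When the clusters are well separated, the two approaches tend to give the same assignments; they can diverge when the classifier draws boundaries differently from the nearest-centroid rule that k-means uses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic, unlabeled data standing in for the real records (an assumption).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.6,
    random_state=0,
)
X_train, X_new = X[:250], X[250:]

# Approach 1: cluster, then let k-means assign new rows to the nearest centroid.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
clusters_from_kmeans = kmeans.predict(X_new)

# Approach 2: treat the cluster ids as class labels and train a classifier on them.
clf = LogisticRegression(max_iter=1000).fit(X_train, kmeans.labels_)
clusters_from_clf = clf.predict(X_new)

# Fraction of new rows on which the two approaches agree.
agreement = np.mean(clusters_from_kmeans == clusters_from_clf)
print(f"agreement between the two approaches: {agreement:.2f}")
```

Note that in both cases the "classes" are cluster ids, not labels with a known meaning; as mentioned elsewhere in these comments, a domain expert still has to decide what each cluster represents.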
pavan March 22, 2018 at 12:08 am #

Thank you for giving a better explanation.

Reply
- Jason Brownlee March 22, 2018 at 6:24 am #
  
  I’m glad it helped.
  
  Reply
Emran March 27, 2018 at 5:23 am #

Given some student information such as (Name, Address, GPA-1, GPA-2, and Grade), my job is to “divide students based on their grade”. So my question is: is this job supervised or unsupervised learning? And which machine learning algorithm is best suited to this job?

I think it will be unsupervised learning, but I am confused about which algorithm is best for this job (is it clustering?). Am I right, sir?

Reply
- Jason Brownlee March 27, 2018 at 6:42 am #
  
  This post might help you determine whether it is a supervised learning problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
Harathi April 6, 2018 at 8:41 am #

Hi Jason,

I have documents with handwritten and machine-printed text. I want to localize the text in the document and determine whether the text is handwritten or machine printed. If the text is handwritten, I have to pass it to a handwriting recognition algorithm; if it is machine printed, I have to pass it to the Tesseract OCR algorithm.

Can you please suggest how to do text localization and determine whether the text is handwritten or machine printed?

Thanks in advance,
Harathi

Reply
- Jason Brownlee April 6, 2018 at 3:47 pm #
  
  I would recommend looking into computer vision methods. I do not cover this area sorry.
  
  Reply
Saqi April 12, 2018 at 8:25 pm #

Hi, I’m new to machine learning and I’m stuck on training the data. Please help me with this task: create a Keras neural network for anomaly detection. Can you please fix the error? I have tried several times and have no idea what the problem is.

stuck at task 3
check in gist url
features = train_both[:,:-1]
labels = train_both[:,:-1]

The gist URL: https://gist.github.com/dcbeafda57395f1914d2aa5b62b08154

Reply
- Jason Brownlee April 13, 2018 at 6:39 am #
  
  I’m eager to help, but I don’t have the capacity to debug your code for you.
  
  Perhaps post on stackoverflow?
  
  Reply
Anfell May 23, 2018 at 3:00 am #

Hi Jason, nice post btw.
I was wondering what’s the difference and advantage/disadvantage of different Neural Network supervised learning methods like Hebb Rule, Perceptron, Delta Rule, Backpropagation, etc and what problems are best used for each of them.

Reply
- Jason Brownlee May 23, 2018 at 6:31 am #
  
  We do not have a mapping of problems to algorithms in machine learning. The best we can do is empirically evaluate algorithms on a specific dataset to discover what works well/best.
  
  Reply
Moti June 4, 2018 at 3:14 am #

I need a brief description of machine learning and how it is applied. Where and when is it required?

Reply
- Jason Brownlee June 4, 2018 at 6:33 am #
  
  This might help:
  https://machinelearningmastery.com/what-is-machine-learning/
  
  Reply
Shreya Gupta June 6, 2018 at 11:53 pm #

Amazing post.. Actual complete definitions are provided.. Thanks for it 🙂

Reply
- Jason Brownlee June 7, 2018 at 6:30 am #
  
  I’m glad it helped.
  
  Reply
Nora July 3, 2018 at 8:27 am #

Hi sir
Thank you in advance for your article; it’s very nice and helpful.
I am new to machine learning and I would like to understand what deep learning means. Second, is distant supervision like semi-supervised learning or not?

Thanks in advance

Reply
- Jason Brownlee July 4, 2018 at 8:17 am #
  
  This post explains more about deep learning:
  https://machinelearningmastery.com/what-is-deep-learning/
  
  Reply
Vinay July 9, 2018 at 5:41 pm #

Hello Jason Brownlee,

I am working on a health research project which would detect whether or not an input WAV file contains snoring. Can you please suggest which I should prefer: supervised, unsupervised, or semi-supervised learning? I’m an iOS developer and new to ML. Where do I start?
Your advice will help a lot in my project.

Thanks in Advance!

Reply
- Jason Brownlee July 10, 2018 at 6:43 am #
  
  Supervised.
  
  Start by defining the problem:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
diwakar reddy July 11, 2018 at 3:32 pm #

Could you explain semi-supervised machine learning a bit more, with examples?

Reply
- Jason Brownlee July 12, 2018 at 6:23 am #
  
  Thanks for the suggestion. This might be a good place to start:
  https://en.wikipedia.org/wiki/Semi-supervised_learning
  
  Reply
yzdu August 11, 2018 at 9:18 pm #

Dear prof Brownlee:

From my understanding, methods based on unsupervised learning (no labels required) can’t be compared with those based on supervised learning (labels required), since the premises of the comparison are different. If one wants to compare them, one should put them under the same problem scenario; only then is the comparison reasonable and fair, isn’t it? But if the problem scenarios are applications without labels, they can’t be compared with each other, since supervised learning methods need labels to train models and there are no labels to train on. Therefore I think it is unreasonable and infeasible to compare methods based on unsupervised learning with those based on supervised learning. Is that right? I want to know your views, thank you!

Reply
- Jason Brownlee August 12, 2018 at 6:32 am #
  
  Yes, they are not comparable. They solve different problems.
  
  Reply
Navdeep Kapur August 22, 2018 at 2:20 am #

Hi Jason,

Your article was very informative and cleared lot of my concepts. I have lot of questions in my mind about Machine Learning. Is it possible you can guide me over Skype call and I am ready to pay.

Reply
Lalam Rajesh September 27, 2018 at 3:59 pm #

Why are association rules part of unsupervised learning?

Reply
- Jason Brownlee September 28, 2018 at 6:05 am #
  
  There is no training/teaching component, the rules are extracted from the data.
  
  Reply
Fasih Ahmed September 27, 2018 at 8:29 pm #

Hello Jason,

Great explanation,
I have a question. I am doing ML in Java; can you suggest how I can choose the best algorithm for my data?
As I am using numeric data (temperature sensor), which method is best, supervised or unsupervised?
Hope you got my point.

Reply
- Jason Brownlee September 28, 2018 at 6:11 am #
  
  I recommend this framework:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
Noel gipson October 10, 2018 at 5:20 pm #

Hello, I am Noel. I am new to machine learning, with little experience. I want to make a machine learning model to predict the possibility of any attack or abnormal events/behavior on my system. The model should classify the situation based on its security level and give me the predicted cause and solution. What should I do about this, guys?

Reply
- Jason Brownlee October 11, 2018 at 7:49 am #
  
  I recommend following this process for a new project:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
Akshay October 21, 2018 at 10:57 pm #

I’m thankful to you for such a nice article!
I would love to follow you and your articles further.

Reply
- Jason Brownlee October 22, 2018 at 6:20 am #
  
  Thanks.
  
  Reply
Miled Basma Bentaiba October 27, 2018 at 5:13 am #

I never understood what the semi-supervised machine learning is, until I read your post. The issue was whether we can have new labels after processing or we are based only on the first given labels. The example you gave made it all clear. So, the answer is, we don’t have all the labels, that’s why we join unlabeled data.

Thank you for your great posts!

Reply
- Jason Brownlee October 27, 2018 at 6:04 am #
  
  Thanks.
  
  Reply
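To make the "some labels, mostly unlabeled" idea in the comment above concrete, here is a minimal sketch assuming scikit-learn's `LabelSpreading`, which expects unlabeled rows to be marked with `-1`. The synthetic two-cluster data, the choice of five labeled points per class, and the `knn` kernel settings are illustrative assumptions, not something taken from the post.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

# Synthetic data with two well-separated groups (an assumption for illustration).
X, y_true = make_blobs(
    n_samples=200, centers=[[0, 0], [6, 6]], cluster_std=0.8, random_state=0
)

# Keep only 5 known labels per class; mark everything else as unlabeled (-1).
rng = np.random.default_rng(0)
y_partial = np.full_like(y_true, -1)
labeled_idx = np.concatenate(
    [rng.choice(np.flatnonzero(y_true == c), size=5, replace=False) for c in (0, 1)]
)
y_partial[labeled_idx] = y_true[labeled_idx]

# Fit on labeled and unlabeled rows together; labels spread along the neighbor graph.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
inferred = model.transduction_

accuracy = np.mean(inferred == y_true)
print(f"labeled rows: {(y_partial != -1).sum()}, inferred-label accuracy: {accuracy:.2f}")
```

The true labels are only used here to check the result; in a real semi-supervised problem they would not be available for the unlabeled rows.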
Kate Weeks November 7, 2018 at 10:28 am #

Hey Jason,

Love your books and articles. Any chance you’ll give us a tutorial on K-Means clustering in the near future?

-Kate

Reply
- Jason Brownlee November 7, 2018 at 2:47 pm #
  
  Thanks for the suggestion Kate.
  
  Reply
Navya Mandava December 3, 2018 at 10:51 pm #

Hi Jason ,

Thanks for clarifying my doubts about supervised and unsupervised machine learning. But I have one more doubt: how can I justify or apply the correct algorithm for a particular problem? Is there any easy way to find the best algorithm for the problem we get? Could you please let me know?

Reply
- Jason Brownlee December 4, 2018 at 6:02 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/what-algorithm-config-should-i-use
  
  Reply
Aditya Rana December 6, 2018 at 9:13 am #

Thank you so much for all the time you put in for educating and replying to fellow learners. Thanks for being such an inspiration.

Reply
- Jason Brownlee December 6, 2018 at 1:41 pm #
  
  Thanks, I’m just trying to be useful.
  
  Reply
  - iram December 9, 2018 at 1:35 am #
    
    which learning techniques could be better in particular machine learning domain?
    
    Reply
    - Jason Brownlee December 9, 2018 at 5:33 am #
      
      Good question, perhaps this will help:
      https://machinelearningmastery.com/faq/single-faq/what-algorithm-config-should-i-use
      
      Reply
Muneeb December 16, 2018 at 2:24 am #

Are the correct classes of training data called supervised or unsupervised?

Reply
- Jason Brownlee December 16, 2018 at 5:24 am #
  
  Predicting the class is a supervised problem.
  
  Reply
Taniya December 20, 2018 at 11:42 am #

Hi Jason, thanks for this post.
I have a query regarding maximization of benefits and overcome the limitations from different types of regression algorithms in one system. Is it possible to create a data model such that I have ‘ONE’ data repository and 2 machine learning algorithms, say Logistic regression and Random Forest? The data repository is getting populated every minute (like in an information system) but after a span of 15 minutes, it is processed via Logistic Regression, and after the next 15 minutes, it is processed via Random Forest, and so on. My questions would be:
1. Is it possible to create such a system?
2. If yes, would this allow to gain benefits of both algorithms? If no, is there any alternative way to achieve this?

Reply
- Jason Brownlee December 20, 2018 at 1:59 pm #
  
  Sure, I don’t see why not. The question is why would you want to do this?
  
  Reply
  - Taniya December 21, 2018 at 12:53 am #
    
    Well, I wanted to know if that can be regarded as an extension to ensemble modelling.
    
    I think some data critical applications, including IoT communication (let’s say, the domain of signal estimation for 5G, vehicle to vehicle communication) and information systems can make use of a cross check with multiple data models. In this way, the deficiencies of one model can be overcome by the other. Of course it would not be a memory/ hardware efficient solution, but just saying.
    
    If you have seen anything like this, a system where more than one data models are being used in one place, I would really appreciate you sharing it, thanks.
    
    Reply
    - Jason Brownlee December 21, 2018 at 5:30 am #
      
      In an ensemble, the output of two methods would be combined in some way in order to make a prediction.
      
      Reply
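As a rough illustration of combining the outputs of two methods, here is a minimal sketch assuming scikit-learn's `VotingClassifier` with the two algorithms named in this thread, logistic regression and random forest. The synthetic dataset and the choice of soft voting (averaging predicted probabilities) are assumptions made for the example; this differs from alternating between the two models every 15 minutes, because both models contribute to every prediction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Soft voting averages the class probabilities predicted by both models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

accuracy = ensemble.score(X_test, y_test)
print(f"ensemble accuracy on held-out data: {accuracy:.3f}")
```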
btt January 11, 2019 at 2:57 am #

Hello, great job explaining all kinds of MLA, but I am confused about where we can put SVM in the Algorithms Mind Map.

Thanks!

Reply
- Jason Brownlee January 11, 2019 at 7:53 am #
  
  Perhaps under instance based methods?
  
  Reply
Gaurav Khanna February 27, 2019 at 11:58 pm #

I have learned up to machine learning algorithms.
Now what is the next step, i.e. which technology should I learn first,
e.g. deep learning, OpenCV, NLP, neural networks, or image detection?
Please tell me step by step which ones are interlinked and what I should learn first.

thanks

Reply
- Jason Brownlee February 28, 2019 at 6:39 am #
  
  Perhaps select a topic that most interests you or a topic that you can apply immediately:
  https://machinelearningmastery.com/start-here/
  
  Reply
Simi March 20, 2019 at 10:58 am #

Hello,
I looked through your post because I have to use the Findex dataset from World Bank to get some information for my thesis on the factors influencing financial and digital inclusion of women. I’m thinking of using K-clustering for this project. I would like to get your input on this.

Thank you

Reply
- Jason Brownlee March 20, 2019 at 2:03 pm #
  
  It really depends on the goals of your project.
  
  Perhaps this framework will help:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
Ryan March 27, 2019 at 4:17 am #

Hello Jason

First of all thank you for the post. I’m working on a subject about identifying fake profiles on some social networks, the data that i have is unlabeled so i’m using unsupervised learning, but i need to do also a supervised learning. So my question is: can i label my data using the unsupervised learning at first so I can easily use it for supervised learning??

Reply
- Jason Brownlee March 27, 2019 at 9:06 am #
  
  Unsupervised learning can propose clusters, but you must still label data using an expert.
  
  Reply
Sharanya April 17, 2019 at 5:24 pm #

Hi Jason, the information you provided was really helpful. I have a question, which machine learning algorithm is best suited for forensics investigation?

Reply
- Jason Brownlee April 18, 2019 at 8:21 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/what-algorithm-config-should-i-use
  
  Reply
randhir prasad singh May 30, 2019 at 2:35 am #

Dear Jason,

It’s been mentioned above that Supervised: ‘All data is labeled’. But it’s not mentioned what it means for data to be labeled or not.
If one gets this kind of query while going through a purchased e-book, is there any support provided?
Note: For now I assume that labeled data means that for a certain input X, the output is/should be Y.

Regards,
Randhir

Reply
- Jason Brownlee May 30, 2019 at 9:04 am #
  
  A label might be a class or it might be a target quantity.
  
  Reply
akmal June 23, 2019 at 11:20 pm #

Hi Jason,

do you have any algorithm example for supervised learning and unsupervised learning?

thank you

Reply
- Jason Brownlee June 24, 2019 at 6:32 am #
  
  I have many hundreds of examples, perhaps start here:
  https://machinelearningmastery.com/start-here/
  
  Reply
Kellan July 3, 2019 at 2:56 pm #

Hi Jason, thanks for this great post. Do supervised methods use any unlabeled data at all? Or is the performance of the model evaluated on the basis of its classification (for categorical data) of the test data only? I am working on a project where I want to compare the performance of several supervised methods (SVMs, logistic regression, ensemble methods, random forests, and nearest neighbors) and one semi-supervised method (naive Bayes) in identifying a rare outcome, and I have about 2 million labeled records (split between training and test sets) and 200 million unlabeled records.

Reply
- Jason Brownlee July 4, 2019 at 7:38 am #
  
  Supervised learning models are evaluated on unseen data where we know the output.
  
  Reply
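A minimal sketch of that evaluation idea, assuming scikit-learn: each model is fit on the training split and scored only on a held-out test split whose true outputs are known. The synthetic imbalanced dataset, the two models, and the use of F1 (rather than accuracy) for the rare class are illustrative assumptions; unlabeled records play no part in this kind of evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data where roughly 10% of rows belong to the rare class (an assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=7
)

# Stratify so the rare class appears in both splits at the same rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the training split; score it on labeled rows it has not seen.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 on held-out test set = {scores[name]:.3f}")
```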
shiva manhar July 8, 2019 at 3:38 pm #

Thanks for this amazing post. I have read your many post.

Reply
- Jason Brownlee July 9, 2019 at 8:04 am #
  
  Thanks, I’m glad it helped.
  
  Reply
  - Usman Bukar Usman July 13, 2019 at 4:27 pm #
    
    Thanks for the interesting post; it is a great contribution to the machine learning domain. God bless you.
    
    Reply
    - Jason Brownlee July 14, 2019 at 8:05 am #
      
      Thanks.
      
      Reply
George July 30, 2019 at 8:32 pm #

Hi Jason,
With unlabelled data, if we run k-means and find the labels, the data now has labels. Can we then proceed to do supervised learning?
Thanks

Reply
- Jason Brownlee July 31, 2019 at 6:50 am #
  
  k-means will find clusters, not labels.
  
  Labels must be assigned by a domain expert.
  
  Reply
  - George July 31, 2019 at 7:20 am #
    
    Thanks Jason. If they say there are going to be two clusters, and we build k-means with K as 2, we get two clusters. In this case, is it possible to continue with supervised learning?
    
    Reply
    - George July 31, 2019 at 9:50 am #
      
      Just to explain further,
      
      from sklearn.cluster import KMeans  # import needed for the snippet below
      kmeansmodel = KMeans(n_clusters=2)
      kmeansmodel.fit(X_train)
      predicted = kmeansmodel.labels_
      kmf2labels = predicted.tolist()
      raw_data['labels'] = kmf2labels
      
      Now we get labels of 0 and 1, so can we do binary classification now?
      
      Reply
      - Jason Brownlee July 31, 2019 at 2:06 pm #
        
        Yes.
    - Jason Brownlee July 31, 2019 at 2:04 pm #
      
      It may be.
      
      Reply
      - George August 1, 2019 at 10:06 am #
        
        Thanks Jason, whether the supervised classification after unsupervised will improve our prediction results, may I have your comments please?
      - Jason Brownlee August 1, 2019 at 2:12 pm #
        
        It depends on the data and the model.
        
        The best that I can say is: try it and see.
George August 5, 2019 at 10:48 am #

Hi Jason,
The DBSCAN model is running into a MemoryError (with 32GB RAM, 200,000 records, and 60 columns). Is there a solution for this?

dbscan_model = DBSCAN(eps=3, min_samples=5, metric='euclidean', algorithm='auto')
dbscan_model.fit(X_scaled)

I tried splitting the data based on ONE categorical column, say Employed (Yes and No), so the two splits have 105,000 and 95,000 records, and I build two models. For prediction, if the test record is Employed Yes I run model_Employed_Yes, otherwise the other. I am NOT sure whether this is a good choice.
Thanks

Reply
- Jason Brownlee August 5, 2019 at 2:01 pm #
  
  Perhaps try operating on a sample of the dataset?
  
  Perhaps try running on an EC2 instance with more memory?
  
  Perhaps try exploring a more memory efficient implementation?
  
  Reply
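The first suggestion above (operating on a sample) might look roughly like the sketch below, assuming scikit-learn and NumPy. The random stand-in data, the sample size of 2,000 rows, and the variable names are assumptions for illustration; the DBSCAN parameters are copied from the question. Whether a sample preserves the density structure of the full dataset depends on the data, so this is something to check rather than assume.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Random stand-in for the real table (smaller than 200,000 x 60 to keep this quick).
rng = np.random.default_rng(42)
X = rng.normal(size=(20000, 60))
X_scaled = StandardScaler().fit_transform(X)

# Draw a random subset of rows without replacement and cluster only that subset.
sample_idx = rng.choice(X_scaled.shape[0], size=2000, replace=False)
X_sample = X_scaled[sample_idx]

dbscan_model = DBSCAN(eps=3, min_samples=5, metric="euclidean", algorithm="auto")
sample_labels = dbscan_model.fit_predict(X_sample)

# DBSCAN marks noise points with the label -1, so exclude it from the cluster count.
n_clusters = len(set(sample_labels)) - (1 if -1 in sample_labels else 0)
print(f"rows clustered: {len(sample_labels)}, clusters found: {n_clusters}")
```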
Ghazal August 7, 2019 at 3:58 am #

Hi
I used this note in my paper.
How can I reference it?
please help me

Reply
- Jason Brownlee August 7, 2019 at 8:04 am #
  
  Great question, I show how here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-reference-or-cite-a-book-or-blog-post
  
  Reply
Ghazal August 14, 2019 at 5:32 pm #

Thank you so much.

I have one more question. Do we have the primal SVM function?
what is it?

Reply
- Jason Brownlee August 15, 2019 at 7:58 am #
  
  What is the “primal SVM function”? Do you mean the kernel?
  
  Reply
  - Ghazal August 18, 2019 at 3:27 am #
    
    yes. the kernel
    
    Reply
    - Jason Brownlee August 18, 2019 at 6:49 am #
      
      Perhaps start here:
      https://machinelearningmastery.com/support-vector-machines-for-machine-learning/
      
      Reply
mahesh August 22, 2019 at 3:08 am #

Thank you sir, this post is very helpful for me. Sir, I have a doubt: does unsupervised learning have a dataset or not?

Reply
- Jason Brownlee August 22, 2019 at 6:32 am #
  
  Yes, unsupervised learning has a training dataset only.
  
  Reply
  - Jilani November 19, 2019 at 9:48 pm #
    
    Training or testing?
    
    Reply
    - Jason Brownlee November 20, 2019 at 6:14 am #
      
      Yes.
      
      Reply
Zeeshan September 19, 2019 at 11:27 pm #

Hello sir. Thank you so much for this helping material.
Sir, one problem I am facing is how to identify the most suitable algorithm/model for a scenario.
For example, I have an image and I want to find the values of three variables using an ML model, so which model can I use?

Input: image
Output: concentration of variable 1, 2, 3 in an image.

Reply
- Jason Brownlee September 20, 2019 at 5:43 am #
  
  Perhaps try a range of CNN models for image classification?
  
  This might be a good place to start:
  https://machinelearningmastery.com/start-here/#dlfcv
  
  Reply
Andreas November 8, 2019 at 8:28 am #

I think the solution to unsupervised learning is to make a program that just takes photos from a camera and then lets the network reconstruct whatever total image it is confronted with at random, and uses this method for its training.

As far as I understand, the network can reconstruct lots of images from fragments stored in the network. That means taking a snapshot of what the camera sees and feeding that in as training data could perhaps solve unsupervised learning. This way the network automatically acquires its own training data. What I mean is not to classify data directly, as that will keep you stuck in the supervised learning limbo.

One network will not be enough. The reason is that it takes two players to share information. The network can’t read itself at the same time as it reconstructs, as that obliterates the image it is reconstructing from.

What you need is a second network that can reconstruct what the first network is showing. It’s not this simple either. You cannot solve the problem by this alone, as the network can only output a single image at a time, so we need to break down the image into smaller parts and then let one network get a random piece to reconstruct the whole from the total image of the other network’s reconstruction.

By randomly throwing the ball of part of the image between the networks, you have communication between them. This way, you can make a dream-like process with infinite possible images.
Now you need a third network that can receive random images from the two other networks and use the input image data from the camera to compare the random suggestions from the two interchanging networks with the third network’s reconstruction of the camera image. This way the machine will self-classify the data that fits with the external image.

What we need now is to label these random images by marrying the sound data, or the translation of sound to speech, with the random images from the two recursive-mirror secondary networks and the one primary network. An algorithm can take the repetition of recognized words, done by another specialized network, and indirectly use the recognition of the sound data as a trigger to take a snapshot from the camera, reconstruct that image, and then compare it with the random recursive mirrors. If it finds the image of the target from the camera in the random recursive network, you can then use a conventional algorithm to classify the recognized word with the recognized image.

This way the machine will learn and teach itself information that over time will make it able to recall classified objects you did not teach it.

This is not the solution to the whole problem. You do not have Artificial General Intelligence yet. There is still a big problem left.

You now have to find a way to make the software communicate with people so that it can learn from their thinking and learn how to say things.

What you have from before is just a very intelligent dream machine that learns.

Now we have to reverse the process. Now we have to take verbal input data from a person and use the classifications the computer created by itself to reconstruct an image in the main network. This way we are halfway to letting the network learn from your verbal language by diving into its own network for information to create new and more classifications by itself using its previous methods.

At this point you have created a very clever low-IQ program that only mirrors what you say, like an evolved monkey.

In order to solve this you have to increase the complexity of the networks by taking the primary network and making it secondary, then creating a new network that can act as the top of the triangle, and making 6 secondary networks that mimic the main network. These 6 networks will be handles to store parts of information that can make suggestions to compare to the main network output.

This way you have 6 networks that contain patterns where they can compete for the better question or answer. Whether it makes the program smarter, I don’t know. Beyond this I’m clueless. Anyway, this is just an idea. If this is too complicated, there is no way in the world anyone will ever solve the problem of unsupervised learning that leads to AGI.

Reply
- Jason Brownlee November 8, 2019 at 1:48 pm #
  
  Thanks for sharing.
  
  Reply
Nelson November 13, 2019 at 12:08 am #

Are supervised and unsupervised algorithms another way of defining parametric and nonparametric algorithms?

Reply
- Jason Brownlee November 13, 2019 at 5:44 am #
  
  No.
  
  Some supervised algorithms are parametric, some are nonparametric.
  
  Some unsupervised algorithms are parametric, some are nonparametric.
  
  Reply
Shan December 11, 2019 at 1:07 am #

Great work.

Sir, can you give an example of how supervised learning is used to test software components?
I mean, how do you test software with supervised learning? Any example would be helpful.

Reply
- Jason Brownlee December 11, 2019 at 7:01 am #
  
  Thanks for the suggestion.
  
  Reply
  - Shan December 11, 2019 at 2:35 pm #
    
    Sir, can you help me with how to do testing with supervised learning? Please give any example. I am having trouble with it.
    
    Reply
    - Jason Brownlee December 11, 2019 at 2:46 pm #
      
      Yes, there are hundreds of examples on the blog. Perhaps start here:
      https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
      
      Reply
Skylar January 15, 2020 at 9:49 am #

You did a really good job with this. I like it a lot. 🙂

Reply
- Jason Brownlee January 15, 2020 at 1:40 pm #
  
  Thanks!
  
  Reply
Dinesh Kakkad January 30, 2020 at 3:57 am #

Very informative article that explains the differences between supervised and unsupervised learning!
Thanks!

Reply
- Jason Brownlee January 30, 2020 at 6:58 am #
  
  You’re welcome!
  
  Reply
Mike Magliozzi March 5, 2020 at 6:38 am #

Hey Jason! Great article! I’m currently working on a Supervised/Unsupervised Learning Project for one of my MBA classes. For the project we have to identify a problem in our workplace that can be solved using Supervised and Unsupervised Learning.

I work for a digital marketing agency that builds and manages marketing campaigns for small to mid-size businesses (PPC, SEO, Facebook Ads, Display Ads, etc). For my unsupervised learning model I was thinking of solving the problem of customer churn before it gets to that point.

I would use K-means Clustering and the features/columns for the model would be:

– the reason for the cancellation
– how many months the client ran with us before cancelling. (Whenever someone cancels with us we choose from a list of cancellation reasons within our CRM.)

The rows would be the type of marketing channel that the client was running.

By clustering this data we would be able to see what types of cancellations to look for at various stages of a customer life cycle, broken down by each marketing channel.

Does this problem make sense for Unsupervised Learning and if so do I need to add more features for it or is two enough?

Thanks for taking the time to read this!

Reply
- Jason Brownlee March 5, 2020 at 6:42 am #
  
  Churn prediction is a supervised learning problem. Clustering could be used as a pre-processing step.
  
  Reply
  - Michael Magliozzi March 5, 2020 at 6:48 am #
    
    I see. Could you expand on what you mean by clustering being used as a pre-processing step?
    
    Reply
    - Jason Brownlee March 5, 2020 at 6:54 am #
      
      Yes, as you describe, you could group customers based on behavior in an unsupervised way, then fit a model on each group or use group membership as an input to a supervised learning model.
      
      It may or may not be helpful, depending on the complexity of the problem and chosen model, e.g. most supervised learning models would do something like this anyway.
      
      Reply
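The two-step idea in the reply above (group customers without labels, then fit a model per group) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the `months` and `churned` values are made up, the clustering is a tiny 1-D k-means written for this example, and the "model" per group is just the majority churn label, standing in for whatever supervised model you would actually fit.

```python
from statistics import mean

def assign(value, centers):
    """Return the index of the center nearest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means; centers start evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[assign(v, centers)].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Hypothetical data: months as a customer, and whether they churned (1) or not (0).
months = [1, 2, 2, 3, 3, 20, 22, 24, 25, 30]
churned = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

# Step 1 (unsupervised): cluster on the input only; the churn labels are not used.
centers = kmeans_1d(months, k=2)
membership = [assign(m, centers) for m in months]

# Step 2 (supervised): fit one very simple model per group, here the
# majority churn label observed in that group.
group_model = {}
for g in set(membership):
    labels = [y for y, m in zip(churned, membership) if m == g]
    group_model[g] = 1 if sum(labels) * 2 >= len(labels) else 0

def predict(months_active):
    """Predict churn for a new customer via the model of its nearest group."""
    return group_model[assign(months_active, centers)]

print(membership)
print(predict(2), predict(26))
```

The alternative mentioned in the reply, passing `membership` as an extra input column to a single supervised model, uses the same first step and only changes the second.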
      - Michael Magliozzi March 5, 2020 at 7:01 am #
        
        Ok so outside of the part where I talk about using the Unsupervised Model to predict churn everything else I said would work for Unsupervised Learning? (The features/rows I outlined)
      - Jason Brownlee March 5, 2020 at 10:33 am #
        
        It is impossible to know what the most useful features will be. I recommend running some experiments to see what works for your dataset.
        
        This might give you ideas about what data to collect:
        https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
Khan June 4, 2020 at 3:08 am #

Interesting post. Can you suggest unsupervised learning algorithms to distinguish malicious/phishing URLs from legitimate URLs?

Reply
- Jason Brownlee June 4, 2020 at 6:26 am #
  
  That sounds like a supervised learning problem.
  
  Reply
Jagruti September 19, 2020 at 6:05 pm #

Thank you so much for such an amazing post, very easy to understand. Thank you!

Reply
- Jason Brownlee September 20, 2020 at 6:43 am #
  
  You’re welcome!
  
  Reply
Yohan October 29, 2020 at 10:39 pm #

Brilliant read, but I am stuck on something: is it possible to append data to supervised learning models?

Reply
- Jason Brownlee October 30, 2020 at 6:52 am #
  
  Thanks!
  
  Sure, you can update or refit the model any time you want.
  
  Reply
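As a small illustration of "update or refit", here is a sketch of a one-variable least-squares model that keeps running sums, so new rows can be folded in later. The class name `SimpleLinearRegression` and the data are hypothetical and chosen only for this example. For this particular model, updating with new rows gives the same line as refitting on all rows; that is a property of least squares and should not be assumed for other algorithms, where a full refit may be the only option.

```python
class SimpleLinearRegression:
    """y = intercept + slope * x, solved by least squares from running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        """Fold new (x, y) rows into the running sums and re-solve.

        Needs at least two distinct x values seen so far, otherwise the
        denominator below is zero.
        """
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        denom = self.n * self.sxx - self.sx ** 2
        self.slope = (self.n * self.sxy - self.sx * self.sy) / denom
        self.intercept = (self.sy - self.slope * self.sx) / self.n
        return self

    def predict(self, x):
        return self.intercept + self.slope * x

# Hypothetical original training data and rows that arrive later.
old_x, old_y = [1, 2, 3], [2, 4, 6]
new_x, new_y = [4, 5], [9, 11]

# Option 1: update the existing model with only the new rows.
updated = SimpleLinearRegression().update(old_x, old_y)
slope_before = updated.slope
updated.update(new_x, new_y)

# Option 2: refit a fresh model on old and new rows together.
refit = SimpleLinearRegression().update(old_x + new_x, old_y + new_y)

print(slope_before, updated.slope, refit.slope)
```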
Licamy December 6, 2020 at 7:36 am #

Hi, I have to predict student performance in a specific class, and I have collected demographic data and previous class data for the students. In this case, should I apply a supervised or an unsupervised learning algorithm?

Reply
- Jason Brownlee December 6, 2020 at 8:06 am #
  
  It sounds like supervised learning, this framework will help:
  https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/
  
  Reply
Rukhsar khan January 26, 2021 at 8:48 pm #

Which one is better, supervised or unsupervised machine learning?

Reply
- Jason Brownlee January 27, 2021 at 6:05 am #
  
  Invalid dichotomy.
  
  They are two different classes of technique for solving different problems.
  
  If you had to choose one to study that would be most useful “at work”, it would be: supervised learning.
  
  Reply
Moses Okpalefe March 16, 2021 at 1:53 am #

Hi Jason,

Well-structured write-up that has finally cleared up some misconceptions. I wanted to find out what future predictions fall under.

So say I had a variable, Y_p and 3 input variables X1, X2, X3, in my data set but I wanted to predict a future Y value, let’s call it Y_f. Would this be a supervised or unsupervised problem?

For example, Y_p could be my current speed, X1, X2 and X3 could be weight, height, age and then Y_f would be the predicted (future speed) after a given period t

Looking forward to your response.

Reply
- Jason Brownlee March 16, 2021 at 4:49 am #
  
  Thanks.
  
  Supervised.
  
  Reply
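One way to see why this is supervised: once some time has passed, the "future" value for older rows has been observed, so each past row can be paired with the value that followed it. The sketch below assumes a time-ordered list of measurements for one person; the field names, the numbers, and the helper `make_supervised` are all hypothetical and only illustrate the framing.

```python
def make_supervised(records, horizon=1):
    """Pair each row's inputs with the speed observed `horizon` steps later."""
    X, y = [], []
    for now, later in zip(records, records[horizon:]):
        # Inputs: current speed (Y_p) plus X1, X2, X3 at the same time step.
        X.append([now["speed"], now["weight"], now["height"], now["age"]])
        # Target: the speed that was later observed (Y_f).
        y.append(later["speed"])
    return X, y

# Hypothetical yearly measurements for one person, oldest first.
history = [
    {"speed": 10.0, "weight": 70, "height": 175, "age": 30},
    {"speed": 10.4, "weight": 71, "height": 175, "age": 31},
    {"speed": 10.1, "weight": 73, "height": 175, "age": 32},
    {"speed": 9.8, "weight": 74, "height": 175, "age": 33},
]

X, y = make_supervised(history, horizon=1)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

The most recent row has no target yet, so it is left out of training; it is the row you would pass to the fitted regression model to get the prediction.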
Moses Okpalefe March 16, 2021 at 11:03 am #

Thank you. What algorithms would be best suited to this problem? Given it is a regression problem.

I find it a bit puzzling, in the sense that from my understanding of supervised learning, the target variable is usually known in the historic data (train set). However, for this problem, the target variable in the historic data isn’t known, so perhaps you could point me in the direction where I can really understand why it is still a supervised learning problem and what algorithms best tackle this problem.

Reply
- Jason Brownlee March 17, 2021 at 5:56 am #
  
  Good question, this will help:
  https://machinelearningmastery.com/faq/single-faq/what-algorithm-config-should-i-use
  
  Reply
Anugrah March 23, 2021 at 5:07 am #

Hi Jason,

Is there any algorithm out there which can perform unsupervised multiclass multi label problems? I know that Kmeans can be used for unsupervised multiclass problems. Is there any way they can be made to work on multi class multi label problems?

Reply
- Jason Brownlee March 24, 2021 at 5:41 am #
  
  No. Classification is a supervised learning problem, not unsupervised.
  
  Kmeans is not aware of classes, it is not a classification algorithm. It is a clustering algorithm and groups data into the number of centers you specify.
  
  Reply
Umer April 30, 2021 at 6:36 pm #

Hi Jason,
Does Knn require some initial labeled data, on which it can create clusters, or is it done by some other technique?

Reply
- Jason Brownlee May 1, 2021 at 6:04 am #
  
  Yes, the model requires a good representative labeled dataset for “training”.
  
  Reply
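To make the reply above concrete: k-nearest neighbors (KNN) is a supervised method that does not create clusters. It keeps the labeled rows and classifies a new point by a vote among its nearest labeled neighbors. The following is a minimal sketch, assuming made-up 2-D points and labels; `knn_predict` is a hypothetical helper written for this example, not a library function.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled rows."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: every row needs a known label.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.1, 5.0)))
```

Without `train_y` there is nothing to vote on, which is why the labeled dataset is required.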
Torubein Fawei May 4, 2021 at 10:03 pm #

A simple and clear explanation. Thank you

Reply
- Jason Brownlee May 5, 2021 at 6:10 am #
  
  You’re welcome!
  
  Reply
Najeh June 7, 2021 at 1:05 am #

Thank you for this post.
I would like to know how to train and test with unsupervised learning on an image dataset: during training the whole dataset is labeled, but what should the dataset look like during testing (should I use a dataset with masks or only a normal dataset)?
Thank you

Reply
- Jason Brownlee June 7, 2021 at 5:22 am #
  
  Sorry, I don’t have examples of unsupervised learning.
  
  Reply
VENKANNA ISAMPALLI August 4, 2021 at 11:19 pm #

Can I do clustering on image data?

Reply
- Jason Brownlee August 5, 2021 at 5:20 am #
  
  I guess so, you may need custom techniques designed for image data.
  
  Reply
ravi August 9, 2021 at 1:31 am #

I have a doubt. Can you please tell me whether this is a supervised (classification) or an unsupervised problem?

Predicting if a new image has cat or dog based on the historical data of other images of cats and dogs, where you are supplied the information about which image is cat or dog

Reply
- Jason Brownlee August 9, 2021 at 5:57 am #
  
  Supervised.
  
  Reply
Ward Canfield January 21, 2022 at 6:25 am #

How can or does the Halting Problem affect unsupervised machine learning? Thank you for your thoughts.

Sincerely,
Ward

Reply
- James Carmichael January 21, 2022 at 9:38 am #
  
  Hi Ward…This is a great question! I have not investigated this a great deal, however I would recommend the following discussion to get some ideas:
  
  https://ai.stackexchange.com/questions/148/what-limits-if-any-does-the-halting-problem-put-on-artificial-intelligence/170
  
  Reply
Raheel June 10, 2022 at 12:24 am #

Sir, my question is to write 5 problems related to supervised learning and unsupervised learning in 2022.
Kindly reply as soon as possible.

Reply
- James Carmichael June 10, 2022 at 9:24 am #
  
  Hi Raheel…Please clarify your query so that we may better assist you.
  
  Reply
Dami February 5, 2023 at 8:27 pm #

Hi. I have a dataset for house price prediction by region in the United States. What if you have, say, five input variables and one output variable, which is price? What models would be suitable for this?

Reply
- James Carmichael February 6, 2023 at 11:32 am #
  
  Hi Dami…You may find the following resource of interest:
  
  https://medium.com/@manilwagle/predicting-house-prices-using-machine-learning-cab0b82cd3f
  
  Reply
Dami February 5, 2023 at 8:31 pm #

I want to predict the price a house was sold at, using features like the population of the city, the average number of rooms in the same city, the average area of houses in the same city, and the average household income in the city where the house is located. The dataset also contains the address. I just need to know which model to use.

Reply
- Mig April 24, 2024 at 3:34 pm #
  
  Linear regression
  
  Reply

