Machine Learning Algorithms Mini-Course

Machine learning algorithms are a very large part of machine learning.

You have to understand how they work to make any progress in the field.

In this post you will discover a 14-part machine learning algorithms mini-course that you can follow to finally understand machine learning algorithms.

We are going to cover a lot of ground in this course and you are going to have a great time.

Let’s get started.

Who is This Course For?

Before we get started, let’s make sure you are in the right place.

  • This course is for beginners curious about machine learning algorithms.
  • This course does not assume you know how to write code.
  • This course does not assume a background in mathematics.
  • This course does not assume a background in machine learning theory.

This mini-course will take you on a guided tour of machine learning algorithms, from the foundations through 10 top techniques.

We will visit each algorithm to give you a sense of how it works, but not go into too much depth to keep things moving.

Mini-Course Overview

Let’s take a look at what we’re going to cover over the next 14 lessons.

You may need to come back to this post again and again, so you may want to bookmark it.

This mini-course is broken down into four parts: Algorithm Foundations, Linear Algorithms, Nonlinear Algorithms and Ensemble Algorithms.

Algorithm Foundations

  • Lesson 1: How To Talk About Data in Machine Learning
  • Lesson 2: The Principle That Underpins All Algorithms
  • Lesson 3: Parametric and Nonparametric Algorithms
  • Lesson 4: Bias, Variance and the Trade-off

Linear Algorithms

  • Lesson 5: Linear Regression
  • Lesson 6: Logistic Regression
  • Lesson 7: Linear Discriminant Analysis

Nonlinear Algorithms

  • Lesson 8: Classification and Regression Trees
  • Lesson 9: Naive Bayes
  • Lesson 10: k-Nearest Neighbors
  • Lesson 11: Learning Vector Quantization
  • Lesson 12: Support Vector Machines

Ensemble Algorithms

  • Lesson 13: Bagging and Random Forest
  • Lesson 14: Boosting and AdaBoost

Lesson 1: How To Talk About Data in Machine Learning

Data plays a big part in machine learning.

It is important to understand and use the right terminology when talking about data.

How do you think about data? Think of a spreadsheet. You have columns, rows, and cells.

The statistical perspective of machine learning frames data in the context of a hypothetical function (f) that the machine learning algorithm aims to learn. Given some input variables (Input), the function answers the question of what the predicted output variable (Output) is.

Output = f(Input)

The inputs and outputs can be referred to as variables or vectors.

The computer science perspective uses a row of data to describe an entity (like a person) or an observation about an entity. As such, the columns for a row are often referred to as attributes of the observation and the rows themselves are called instances.
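
To make the vocabulary concrete, here is a small Python sketch. The dataset, column names and values are made up purely for illustration; the point is only to show which part of a table is an instance, which is an attribute, and how the inputs are separated from the output.

```python
# A tiny, made-up dataset: each row is one instance (an observation),
# each column is one attribute of that instance.
columns = ["height_cm", "weight_kg", "sex"]
rows = [
    [180.0, 80.0, "male"],
    [165.0, 60.0, "female"],
    [172.0, 68.0, "female"],
]

# Split each instance into its input variables (X) and its output variable (y).
# Here we assume the last column is the output.
X = [row[:-1] for row in rows]
y = [row[-1] for row in rows]

print(len(rows), "instances,", len(columns), "attributes")
print("first input vector:", X[0], "-> output:", y[0])
```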

Lesson 2: The Principle That Underpins All Algorithms

There is a common principle that underlies all supervised machine learning algorithms for predictive modeling.

Machine learning algorithms are described as learning a target function (f) that best maps input variables (X) to an output variable (Y).

Y = f(X)

This is a general learning task where we would like to make predictions in the future (Y) given new examples of input variables (X). We don’t know what the function (f) looks like or its form. If we did, we would use it directly and we would not need to learn it from data using machine learning algorithms.

The most common type of machine learning is to learn the mapping Y = f(X) to make predictions of Y for new X. This is called predictive modeling or predictive analytics and our goal is to make the most accurate predictions possible.

Lesson 3: Parametric and Nonparametric Algorithms

What is a parametric machine learning algorithm and how is it different from a nonparametric machine learning algorithm?

Assumptions can greatly simplify the learning process, but can also limit what can be learned. Algorithms that simplify the function to a known form are called parametric machine learning algorithms.

The algorithms involve two steps:

  1. Select a form for the function.
  2. Learn the coefficients for the function from the training data.

Some examples of parametric machine learning algorithms are Linear Regression and Logistic Regression.

Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms. By not making assumptions, they are free to learn any functional form from the training data.

Nonparametric methods are often more flexible and can achieve better accuracy, but they require a lot more data and training time.

Examples of nonparametric algorithms include Support Vector Machines, Neural Networks and Decision Trees.

Lesson 4: Bias, Variance and the Trade-off

Machine learning algorithms can best be understood through the lens of the bias-variance trade-off.

Bias refers to the simplifying assumptions made by a model to make the target function easier to learn.

Generally, parametric algorithms have a high bias, making them fast to learn and easier to understand but generally less flexible. In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions of the algorithm's bias.

Decision trees are an example of a low bias algorithm, whereas linear regression is an example of a high-bias algorithm.

Variance is the amount that the estimate of the target function would change if different training data were used. The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to have some variance, not zero variance.

The k-Nearest Neighbors algorithm is an example of a high-variance algorithm, whereas Linear Discriminant Analysis is an example of a low variance algorithm.

The goal of any predictive modeling machine learning algorithm is to achieve low bias and low variance. In turn the algorithm should achieve good prediction performance. The parameterization of machine learning algorithms is often a battle to balance out bias and variance.

  • Increasing the bias will decrease the variance.
  • Increasing the variance will decrease the bias.

Lesson 5: Linear Regression Algorithm

Linear regression is perhaps one of the most well known and well understood algorithms in statistics and machine learning.

Isn’t it a technique from statistics?

Predictive modeling is primarily concerned with minimizing the error of a model or making the most accurate predictions possible, at the expense of explainability. We will borrow, reuse and steal algorithms from many different fields, including statistics and use them towards these ends.

The representation of linear regression is an equation that describes a line that best fits the relationship between the input variables (x) and the output variable (y), by finding specific weightings for the input variables called coefficients (B).

For example:

y = B0 + B1 * x

We will predict y given the input x and the goal of the linear regression learning algorithm is to find the values for the coefficients B0 and B1.

Different techniques can be used to learn the linear regression model from data, such as a linear algebra solution for ordinary least squares and gradient descent optimization.
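
As a rough illustration, the sketch below estimates B0 and B1 for a single input variable using the closed-form ordinary least squares formulas (B1 is the covariance of x and y divided by the variance of x, and B0 is chosen so the line passes through the means). The function names and the tiny dataset are assumptions made for this example only.

```python
def fit_simple_linear_regression(x, y):
    """Estimate B0 and B1 for y = B0 + B1 * x with ordinary least squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # B1 = covariance(x, y) / variance(x)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    b1 = covariance / variance
    # B0 makes the line pass through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Apply the learned line to a new input value."""
    return b0 + b1 * x


# Made-up training data for illustration.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 3, 5]
b0, b1 = fit_simple_linear_regression(x, y)
print("B0 =", round(b0, 3), "B1 =", round(b1, 3))
print("prediction for x = 6:", round(predict(b0, b1, 6), 3))
```

With more than one input variable the same idea applies, but the coefficients are usually found with a linear algebra routine or gradient descent rather than these two formulas.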

Linear regression has been around for more than 200 years and has been extensively studied. Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data, if possible.

It is a fast and simple technique and a good first algorithm to try.

Lesson 6: Logistic Regression Algorithm

Logistic regression is another technique borrowed by machine learning from the field of statistics. It is the go-to method for binary classification problems (problems with two class values).

Logistic regression is like linear regression in that the goal is to find the values for the coefficients that weight each input variable.

Unlike linear regression, the prediction for the output is transformed using a non-linear function called the logistic function.

The logistic function looks like a big S and will transform any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic function to snap values to 0 and 1 (e.g. IF less than 0.5 then output 0, otherwise output 1) and predict a class value.

Because of the way that the model is learned, the predictions made by logistic regression can also be used as the probability of a given data instance belonging to class 0 or class 1. This can be useful on problems where you need to give more rationale for a prediction.
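
Here is a minimal Python sketch of the logistic function and a threshold rule. The coefficient values and function names are hypothetical; in practice the coefficients are learned from training data. The sketch follows the common convention that outputs below 0.5 map to class 0 and everything else maps to class 1.

```python
import math


def logistic(value):
    """Squash any real value into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-value))


def predict_probability(b0, b1, x):
    """Weighted input passed through the logistic function."""
    return logistic(b0 + b1 * x)


def predict_class(b0, b1, x, threshold=0.5):
    """Snap the probability to a crisp class value."""
    return 0 if predict_probability(b0, b1, x) < threshold else 1


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 1.0
for x in [2.0, 4.0, 6.0]:
    p = predict_probability(b0, b1, x)
    print("x =", x, "probability =", round(p, 3), "class =", predict_class(b0, b1, x))
```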

Like linear regression, logistic regression does work better when you remove attributes that are unrelated to the output variable as well as attributes that are very similar (correlated) to each other.

It’s a fast model to learn and effective on binary classification problems.

Lesson 7: Linear Discriminant Analysis Algorithm

Logistic regression is a classification algorithm traditionally limited to only two-class classification problems. If you have more than two classes then the Linear Discriminant Analysis algorithm is the preferred linear classification technique.

The representation of LDA is pretty straightforward. It consists of statistical properties of your data, calculated for each class. For a single input variable this includes:

  1. The mean value for each class.
  2. The variance calculated across all classes.

Predictions are made by calculating a discriminant value for each class and making a prediction for the class with the largest value.
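
The sketch below shows one way this can look for a single input variable. It assumes the standard LDA discriminant function, x * mean / variance - mean^2 / (2 * variance) + ln(prior), with a variance pooled across classes; the function names and the data are made up for illustration.

```python
import math


def fit_lda(x, labels):
    """Estimate per-class means and priors, plus one pooled variance."""
    classes = sorted(set(labels))
    n = len(x)
    means, priors = {}, {}
    for c in classes:
        values = [xi for xi, li in zip(x, labels) if li == c]
        means[c] = sum(values) / len(values)
        priors[c] = len(values) / n
    # Variance calculated across all classes, around each class's own mean.
    squared_error = sum((xi - means[li]) ** 2 for xi, li in zip(x, labels))
    variance = squared_error / (n - len(classes))
    return means, priors, variance


def discriminant(x, mean, prior, variance):
    """Discriminant value of one class for input x."""
    return x * mean / variance - mean ** 2 / (2 * variance) + math.log(prior)


def predict(x, means, priors, variance):
    """Pick the class with the largest discriminant value."""
    return max(means, key=lambda c: discriminant(x, means[c], priors[c], variance))


# Made-up data: class 0 clusters around 2, class 1 around 8.
x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]
means, priors, variance = fit_lda(x, labels)
print("means:", means, "variance:", variance)
print("prediction for 4.0:", predict(4.0, means, priors, variance))
print("prediction for 6.0:", predict(6.0, means, priors, variance))
```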

The technique assumes that the data has a Gaussian distribution (bell curve), so it is a good idea to remove outliers from your data beforehand.

It’s a simple and powerful method for classification predictive modeling problems.

Lesson 8: Classification and Regression Trees

Decision Trees are an important type of algorithm for predictive modeling machine learning.

The representation for the decision tree model is a binary tree. This is your binary tree from algorithms and data structures, nothing too fancy. Each node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).

The leaf nodes of the tree contain an output variable (y) which is used to make a prediction. Predictions are made by walking the splits of the tree until arriving at a leaf node and outputting the class value at that leaf node.
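
As an illustration, here is a small hand-built tree and a function that walks it to make a prediction. The dictionary layout, attribute order, split points and class names are all assumptions for this sketch; a real tree would be learned from training data.

```python
# Each internal node is a dict holding the index of the input variable,
# the split point, and its two children. A leaf is simply a class value.
# This tree is written by hand for illustration, not learned from data.
tree = {
    "index": 0,        # input variable 0 (say, height in cm)
    "value": 180.0,    # split point
    "left": {
        "index": 1,    # input variable 1 (say, weight in kg)
        "value": 80.0,
        "left": "A",
        "right": "B",
    },
    "right": "B",
}


def predict(node, row):
    """Walk the splits from the root until a leaf is reached."""
    while isinstance(node, dict):
        if row[node["index"]] < node["value"]:
            node = node["left"]
        else:
            node = node["right"]
    return node


print(predict(tree, [170.0, 65.0]))
print(predict(tree, [175.0, 90.0]))
print(predict(tree, [185.0, 70.0]))
```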

Trees are fast to learn and very fast for making predictions. They are also often accurate for a broad range of problems and do not require any special preparation for your data.

Decision trees have a high variance and can yield more accurate predictions when used in an ensemble, a topic we will cover in Lesson 13 and Lesson 14.

Lesson 9: Naive Bayes Algorithm

Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling.

The model is comprised of two types of probabilities that can be calculated directly from your training data:

  1. The probability of each class.
  2. The conditional probability of each x value given each class.

Once calculated, the probability model can be used to make predictions for new data using Bayes' Theorem.

When your data is real-valued it is common to assume a Gaussian distribution (bell curve) so that you can easily estimate these probabilities.

Naive Bayes is called naive because it assumes that each input variable is independent of the others, given the class. This is a strong assumption and unrealistic for real data. Nevertheless, the technique is very effective on a large range of complex problems.
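
A minimal sketch of Gaussian Naive Bayes in Python follows. It assumes real-valued inputs, estimates a mean and standard deviation per class and per input variable, and multiplies the class probability by the Gaussian density of each input. The function names and the dataset are made up for illustration.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, stdev):
    """Density of x under a Gaussian with the given mean and stdev."""
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (math.sqrt(2 * math.pi) * stdev)


def fit(rows, labels):
    """Store the class probability and per-variable (mean, stdev) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / (len(column) - 1)
            stats.append((mean, math.sqrt(variance)))
        model[label] = (prior, stats)
    return model


def predict(model, row):
    """Return the class with the largest (unnormalized) probability."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        # Naive assumption: multiply the per-variable densities together.
        for value, (mean, stdev) in zip(row, stats):
            score *= gaussian_pdf(value, mean, stdev)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up data with two input variables and two classes.
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
labels = [0, 0, 0, 1, 1, 1]
model = fit(rows, labels)
print(predict(model, [1.1, 2.0]))
print(predict(model, [3.1, 4.0]))
```

Real implementations usually add log probabilities instead of multiplying, which avoids numerical underflow when there are many input variables.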

Lesson 10: K-Nearest Neighbors Algorithm

The KNN algorithm is very simple and very effective.

The model representation for KNN is the entire training dataset. Simple, right?

Predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances. For regression this might be the mean output variable, in classification this might be the mode (or most common) class value.

The trick is in how to determine similarity between the data instances. The simplest technique if your attributes are all of the same scale (all in inches for example) is to use the Euclidean distance, a number you can calculate directly based on the differences between each input variable.
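
The following Python sketch puts these pieces together: Euclidean distance, a search for the K most similar instances, the mode for classification and the mean for regression. The function names and the small dataset are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Square root of the summed squared differences between two rows."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_outputs(train_rows, train_outputs, new_row, k):
    """Output values of the k training instances closest to new_row."""
    pairs = sorted(
        zip(train_rows, train_outputs),
        key=lambda pair: euclidean_distance(pair[0], new_row),
    )
    return [output for _, output in pairs[:k]]


def predict_class(train_rows, train_outputs, new_row, k):
    """Classification: the most common class among the neighbors."""
    neighbors = nearest_outputs(train_rows, train_outputs, new_row, k)
    return Counter(neighbors).most_common(1)[0][0]


def predict_value(train_rows, train_outputs, new_row, k):
    """Regression: the mean output among the neighbors."""
    neighbors = nearest_outputs(train_rows, train_outputs, new_row, k)
    return sum(neighbors) / len(neighbors)


# Made-up training data: two well-separated groups of points.
rows = [[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]]
classes = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(predict_class(rows, classes, [2, 2], k=3))
print(predict_value(rows, values, [1, 1], k=3))
```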

KNN can require a lot of memory or space to store all of the data, but only performs a calculation (or learns) when a prediction is needed, just in time. You can also update and curate your training instances over time to keep predictions accurate.

The idea of distance or closeness can break down in very high dimensions (lots of input variables), which can negatively affect the performance of the algorithm on your problem. This is called the curse of dimensionality. It suggests you only use those input variables that are most relevant to predicting the output variable.

Lesson 11: Learning Vector Quantization

A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset.

The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that allows you to choose how many training instances to hang onto and learns exactly what those instances should look like.

The representation for LVQ is a collection of codebook vectors. These are selected randomly in the beginning and adapted to best summarize the training dataset over a number of iterations of the learning algorithm.

Once learned, the codebook vectors can be used to make predictions just like K-Nearest Neighbors. The most similar neighbor (best matching codebook vector) is found by calculating the distance between each codebook vector and the new data instance. The class value (or real value in the case of regression) for the best matching unit is then returned as the prediction.
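
Below is a rough sketch of LVQ for classification. It assumes the basic update rule, where the best matching codebook vector is moved toward a training instance of the same class and away from one of a different class, with a learning rate that decays each epoch. To keep the example small it starts with one codebook vector per class, copied from a randomly chosen training instance of that class. All names, parameter values and data are made up for illustration.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(codebooks, row):
    """The codebook vector closest to the given row."""
    return min(codebooks, key=lambda cb: squared_distance(cb["vector"], row))


def train_lvq(rows, labels, learning_rate=0.3, epochs=10, seed=1):
    rng = random.Random(seed)
    # Assumption for this sketch: one codebook vector per class,
    # initialized from a random training instance of that class.
    codebooks = []
    for label in sorted(set(labels)):
        candidates = [row for row, l in zip(rows, labels) if l == label]
        codebooks.append({"vector": list(rng.choice(candidates)), "label": label})
    for epoch in range(epochs):
        # Linearly decaying learning rate.
        rate = learning_rate * (1.0 - epoch / epochs)
        for row, label in zip(rows, labels):
            bmu = best_matching_unit(codebooks, row)
            # Move toward the row if the classes match, away otherwise.
            direction = 1.0 if bmu["label"] == label else -1.0
            bmu["vector"] = [
                w + direction * rate * (x - w) for w, x in zip(bmu["vector"], row)
            ]
    return codebooks


def predict(codebooks, row):
    """Return the class of the best matching codebook vector."""
    return best_matching_unit(codebooks, row)["label"]


# Made-up data: two well-separated classes.
rows = [[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]]
labels = [0, 0, 0, 1, 1, 1]
codebooks = train_lvq(rows, labels)
print(len(codebooks), "codebook vectors instead of", len(rows), "training instances")
print(predict(codebooks, [1.5, 1.5]), predict(codebooks, [6.5, 6.5]))
```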

Best results are achieved if you rescale your data to have the same range, such as between 0 and 1.

If you discover that KNN gives good results on your dataset, try using LVQ to reduce the memory requirements of storing the entire training dataset.

Lesson 12: Support Vector Machines

Support Vector Machines are perhaps one of the most popular and talked about machine learning algorithms.

A hyperplane is a line that splits the input variable space. In SVM, a hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1.

In two-dimensions you can visualize this as a line and let’s assume that all of our input points can be completely separated by this line.

The SVM learning algorithm finds the coefficients that result in the best separation of the classes by the hyperplane.

The distance between the hyperplane and the closest data points is referred to as the margin. The best or optimal hyperplane that can separate the two classes is the line that has the largest margin.

Only these closest points are relevant in defining the hyperplane and in the construction of the classifier.

These points are called the support vectors. They support or define the hyperplane.

In practice, an optimization algorithm is used to find the values for the coefficients that maximize the margin.
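
The sketch below does not train an SVM. It only illustrates the idea of the margin by comparing two hand-picked candidate lines of the form B0 + B1 * x1 + B2 * x2 = 0 that both separate a made-up dataset, and showing that one has a larger margin than the other. The function names, coefficients and data are assumptions for this example.

```python
import math


def decision_value(coefficients, point):
    """B0 + B1 * x1 + B2 * x2 for a two-dimensional point."""
    b0, b1, b2 = coefficients
    return b0 + b1 * point[0] + b2 * point[1]


def predict(coefficients, point):
    """Class 1 on one side of the line, class 0 on the other."""
    return 1 if decision_value(coefficients, point) >= 0 else 0


def distance_to_hyperplane(coefficients, point):
    _, b1, b2 = coefficients
    return abs(decision_value(coefficients, point)) / math.hypot(b1, b2)


def margin(coefficients, points):
    """Distance from the line to the closest data point."""
    return min(distance_to_hyperplane(coefficients, p) for p in points)


# Made-up, linearly separable data.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [0, 0, 0, 1, 1, 1]

# Two hand-picked candidate lines; neither was found by an optimizer.
line_a = (-5.5, 1.0, 1.0)   # x1 + x2 = 5.5
line_b = (-3.0, 1.0, 0.0)   # x1 = 3

for name, line in [("A", line_a), ("B", line_b)]:
    separates = all(predict(line, p) == l for p, l in zip(points, labels))
    print(name, "separates classes:", separates, "margin:", round(margin(line, points), 3))

# The points that sit exactly on the margin of line A play the role of support vectors.
m = margin(line_a, points)
support = [p for p in points if abs(distance_to_hyperplane(line_a, p) - m) < 1e-9]
print("closest points to line A:", support)
```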

SVM might be one of the most powerful out-of-the-box classifiers and worth trying on your dataset.

Lesson 13: Bagging and Random Forest

Random Forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation or bagging.

The bootstrap is a powerful statistical method for estimating a quantity, such as a mean, from a data sample. You take lots of samples of your data with replacement, calculate the mean of each sample, then average all of your mean values to give you a better estimation of the true mean value.
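
As a small illustration, the Python sketch below estimates a mean with the bootstrap by drawing samples with replacement. The function name, number of samples, seed and data are made up for this example.

```python
import random


def bootstrap_mean(data, n_samples=1000, seed=1):
    """Average the means of many resamples drawn with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Each bootstrap sample has the same size as the original data.
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


# Made-up data sample.
data = [2, 4, 4, 5, 7, 9, 10, 12]
print("plain mean:", sum(data) / len(data))
print("bootstrap estimate:", round(bootstrap_mean(data), 3))
```

Bagging applies the same resampling idea, but fits a whole model (such as a decision tree) to each sample instead of calculating a mean.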

In bagging, the same approach is used, but instead for estimating entire statistical models, most commonly decision trees.

Multiple samples of your training data are taken, then a model is constructed for each data sample. When you need to make a prediction for new data, each model makes a prediction and the predictions are averaged to give a better estimate of the true output value.

Random forest is a tweak on this approach where decision trees are created so that rather than selecting optimal split points, suboptimal splits are made by introducing randomness.

The models created for each sample of the data are therefore more different than they otherwise would be, but still accurate in their unique and different ways. Combining their predictions results in a better estimate of the true underlying output value.

If you get good results with an algorithm with high variance (like decision trees), you can often get better results by bagging that algorithm.

Lesson 14: Boosting and AdaBoost

Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers.

This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the first model. Models are added until the training set is predicted perfectly or a maximum number of models has been added.

AdaBoost was the first really successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

AdaBoost is used with short decision trees. After the first tree is created, the performance of the tree on each training instance is used to weight how much attention the next tree should pay to each training instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight.

Models are created sequentially one after the other, each updating the weights on the training instances that affect the learning performed by the next tree in the sequence.
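
The sketch below shows one round of instance re-weighting. It assumes the classic AdaBoost update, where a stage value ln((1 - error) / error) is calculated from the weighted error and misclassified instances have their weights multiplied by exp(stage) before all weights are renormalized. The function name and the example predictions are made up, and the tree itself is not built here.

```python
import math


def adaboost_round(weights, actual, predicted):
    """Update instance weights after one weak model has made its predictions.

    Assumes the weighted error is strictly between 0 and 1.
    """
    misclassified = [0 if a == p else 1 for a, p in zip(actual, predicted)]
    error = sum(w * m for w, m in zip(weights, misclassified)) / sum(weights)
    # The stage value is larger for more accurate models.
    stage = math.log((1.0 - error) / error)
    # Only misclassified instances have their weight increased.
    updated = [w * math.exp(stage * m) for w, m in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated], stage


# Five training instances with equal starting weights.
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
actual = [0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 0]   # hypothetical output of the first tree; last one is wrong

new_weights, stage = adaboost_round(weights, actual, predicted)
print("stage:", round(stage, 3))
print("new weights:", [round(w, 3) for w in new_weights])
```

In this example the single misclassified instance ends up with half of the total weight, so the next tree is pushed to get it right.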

After all the trees are built, predictions are made for new data, and each tree's prediction is weighted by how accurate the tree was on the training data.

Because so much attention is put on correcting mistakes by the algorithm, it is important that you have clean data with outliers removed.

Mini-Course Review

You made it. Well done! Take a moment and look back at how far you have come:

  • You discovered how to talk about data in machine learning and about the underlying principles of all predictive modeling algorithms.
  • You discovered the difference between parametric and nonparametric algorithms and the difference between error introduced by bias and variance.
  • You discovered three linear machine learning algorithms: Linear Regression, Logistic Regression and Linear Discriminant Analysis.
  • You were introduced to 5 nonlinear algorithms: Classification and Regression Trees, Naive Bayes, K-Nearest Neighbors, Learning Vector Quantization and Support Vector Machines.
  • Finally, you discovered two of the most popular ensemble algorithms: Bagging with Decision Trees and Boosting with AdaBoost.

Don’t make light of this; you have come a long way in a short amount of time. This is just the beginning of your journey with machine learning algorithms. Keep practicing and developing your skills.

Did you enjoy this mini-course?
Do you have any questions or sticking points?

Leave a comment and let me know.

173 Responses to Machine Learning Algorithms Mini-Course

  1. Avatar
    flagship dynamics May 3, 2016 at 2:30 am #

    I really like this.

    • Avatar
      Daliya Vk February 21, 2020 at 1:41 pm #

      hi, i have performed multi variable linear regression on a data set .. I got RMSE in 2000 range ,without any transformations.. .. but when I performed log transformation, i got RMSE as 0.4…. Is that fine??? can I present my results??

      • Avatar
        Jason Brownlee February 22, 2020 at 6:16 am #

        I recommend inverting any transforms on the predictions and expected values before estimating model performance.

  2. Avatar
    ImadYamane May 4, 2016 at 3:17 am #

    Thank you for awesome work you do sir. i really like your website.

  3. Avatar
    Krishna Chaitanya August 16, 2016 at 10:14 am #

    I always enjoy reading your blogs. Thanks for your time and efforts

  4. Avatar
    Chris August 19, 2016 at 5:14 pm #

    Thank you for the post Jason.

    I am wondering whether you meant variance instead of bias here:

    “The k-Nearest Neighbors algorithm is an example of a high-bias algorithm, whereas Linear Discriminant Analysis is an example of a low bias algorithm.”

  5. Avatar
    itodayer September 19, 2016 at 5:40 pm #

    Thank you for awesome work, it is very useful for us.

    • Avatar
      Jason Brownlee September 20, 2016 at 8:29 am #

      I’m glad you find the material useful itodayer.

      • Avatar
        Kishore October 22, 2020 at 11:39 am #

        Thank you sir
        I want some data from you. about feature extraction, feature selection, feature normalisation and different classification algorithms ( compare with SIFT for feature extraction , SVM based feature selection and LASSO for classification) on different gender recognition datasets( FERET,FGNet,LFW and ORL). I will start research on objects ( human gender, animals, roads and vehicle ) classification. Please give me your suggestion this research problem is good and any novality.

  6. Avatar
    Manal Alghamdi October 8, 2016 at 8:18 am #

    I wish this blog the first thing I read about Machine Learning. Your explanation, simple language, and brief information all what needed for beginners !!! Many many thanks.

  7. Avatar
    aarish grover February 26, 2017 at 11:06 pm #

    This is great! Thanks Jason!

  8. Avatar
    Jeremy Cooper May 1, 2017 at 9:01 am #

    Excellent starting point for algorithms! Thanks for the entry point!

  9. Avatar
    Kenny July 24, 2017 at 1:39 pm #

    Hello Jason,
    Can you please explain the differences between Generative Modelling and descriptive Modeling?

    • Avatar
      Jason Brownlee July 25, 2017 at 9:25 am #

      A generative model can generate new instances that have the statistical properties of other instances in the corpus.

      A descriptive model can help you learn something about the domain from the data.

      A predictive model can help estimate an outcome.

  10. Avatar
    Mike Ernst January 12, 2019 at 10:26 am #

    This was a useful introduction for me. I had first tried starting with the information on Weka, but I didn’t really understand why I would choose one of the prepackaged algorithms or why. This was a better place for me to start.

    Thanks for having all this information available.

  11. Avatar
    Joe January 24, 2019 at 3:29 am #

    Really nice mini course really helps out the people with a little knowledge about Machine Learning.

  12. Avatar
    Anirudh Murali March 13, 2019 at 1:10 am #

    Thanks Jason! Your blog is one of the first I look up in when I have ML doubts. You are doing a great job! Thanks again.

  13. Avatar
    Rufaro March 17, 2019 at 4:06 am #

    Hi Jason. Your site is very educative, many thanks.

    Please tell me if my understanding or interpretation is correct.

    “Section : Principle that underlies all machine learning algorithms”

    Because the goal of an algorithm is to learn about the target function (f), prediction is only important because it suggests that we have learnt more. Particularly when our predictions agree with actuals.

    Do you agree?

  14. Avatar
    Ali November 25, 2019 at 2:15 am #

    Hi Jason. This is ALI, Your site is highly educative and useful.

    Can you please tell me which is the best and efficient machine learning algorithm that to classify APT Malware that targets payment systems in the banking sector. Any sample implementation code in python would be highly appreciated. Looking forward to hearing from you soon.


  15. Avatar
    Antoine January 10, 2020 at 7:41 am #

    Very nice introduction.Go ahead.

  16. Avatar
    Dr.Neeladhab Padhy February 4, 2020 at 4:12 pm #

    What are the advance machine learning algorithms are available to use method level refactoring ?

  17. Avatar
    Elia February 4, 2020 at 6:53 pm #

    real appreciate,
    thank you

  18. Avatar
    shaun February 4, 2020 at 11:06 pm #

    so the model is saved in a header type file after training ?

  19. Avatar
    Elia February 6, 2020 at 5:51 pm #

    this is awesome!!! thank you

  20. Avatar
    aftab alam March 3, 2020 at 6:13 am #

    Thank you Jason for sharing lessons. This is pretty much the same syllabus that I am going through in Class. I have implemented the Naive Bayes classifier using log probability in python. I can share my GitHub link for review and feedback

    • Avatar
      Jason Brownlee March 3, 2020 at 10:33 am #

      Well done!

      Sorry, I don’t have the capacity to review your code.

  21. Avatar
    Sam March 30, 2020 at 4:36 pm #

    Thank you for your work, Jason

    Your work is really great

  22. Avatar
    Anup Kumar April 2, 2020 at 11:55 am #

    Hello sir, I enjoyed this very much.

  23. Avatar
    archie April 8, 2020 at 7:22 pm #

    I’m excited to learn more about ML. Thanks

  24. Avatar
    archie April 9, 2020 at 2:08 pm #

    what is the main difference between traditional regression analysis and ML?

  25. Avatar
    kamel April 15, 2020 at 9:29 pm #

    These course motivated to learn more about ML, Ireally like this web.Thanks

  26. Avatar
    Suryakanthi Tangirala April 21, 2020 at 1:38 am #

    Thank you for this article. It gives a brief intro on all the algorithms.

    I have a question on data sets.

    Can you please explain when a data set is termed as high dimensional data set?
    When there are more rows or more columns?

    • Avatar
      Jason Brownlee April 21, 2020 at 6:01 am #

      You’re welcome.

      It is subjective, I think 100 or more is a lot of features. Some say 10 or more, some say 1,000,000 or more. It depends on what you’re used to.

  27. Avatar
    Skylar April 23, 2020 at 10:48 am #

    Hi Jason,

    Thank you for all your ML posts, they are very helpful! Especially the 14 days mini course, I really like it. I start to enjoy learning ML because of your posts and course.

    I would like to ask three questions:
    1. I am using caret R package to conduct ML. Which categories you would put for Nearest Shrunken Centroids (method = “pam”) according to your classification based on algorithm similarity? Should it be “instance based”?
    2. How about XGBoost? which category would you put according to your classification based on algorithm similarity?
    3. When I am exploring different ML methods, would you suggest that I should pick 1-2 methods for each category of algorithms according to your classification to explore for my ML project?

    Thank you and I look forward to hearing from you.

    • Avatar
      Jason Brownlee April 23, 2020 at 1:31 pm #

      Good questions.

      Yes, the nearest shrunk centroids is an instance-based method.

      XGboost is an ensemble of decision trees.

      Yes, that sounds like a great approach.

  28. Avatar
    Shishir Agarwal April 30, 2020 at 12:08 am #

    This course is awesome. You are great in explaining the concepts in simple words.

    Needs to know of the opportunity to builds skills for hand’s on experience and real life project.

  29. Avatar
    Skylar April 30, 2020 at 4:07 am #

    Hi Jason,

    You mentioned in your post that logistic regression algorithm is suitable for binary classification problem. I wonder what ML methods are good for multiple group classification? Thank you!

    • Avatar
      Jason Brownlee April 30, 2020 at 6:52 am #

      LDA is a good linear method to start with for mutli-class classification.

  30. Avatar
    Ankit Prajapati May 2, 2020 at 5:32 pm #

    Does this 14 steps mini course include coding or process of building a model on small data set apart from this theory?

  31. Avatar
    vandana May 4, 2020 at 6:50 pm #

    How we can check whether our multiple regression and logistic regression model is good or not.

  32. Avatar
    Skylar May 9, 2020 at 5:07 am #

    Hi Jason,

    Thank you for your mini-course, especially the part “bagging and random forest”, I feel I start to understand random forest:-) Two confusions:

    1. You mentioned bootstrap, I wonder what is the exact differences between bootstrap and permutation test? Does it matter with sampling with replacement?
    2. You mentioned that bagging is the same approach like bootstrap, but most commonly applied on decision trees. I wonder does it mean that “bagging” is usually the term that we use for random forest, but we usually don’t use “bootstrap”, although they mean the same?

    Thank you very much in advance!

    • Avatar
      Jason Brownlee May 9, 2020 at 6:25 am #

      You’re welcome.

      The main difference is that random forest will sample the features when selecting each split point in the decision tree. Bagging won’t.

  33. Avatar
    Danny Dunne May 10, 2020 at 10:52 pm #

    Hi Jason,

    Thanks for this mini-course. Very helpful in clarifying this very complex subject.

    I have looked at the ebooks and was wondering if you offered anything on multi-input multi-output regression models. I have picked a dataset (production flow) for a project and am suitably confused at this stage !



  34. Avatar
    Isaac Tinubu May 13, 2020 at 7:38 pm #

    Thank you for giving your best to the community. I read the first course as sent to my email, very explanatory and informative.

  35. Avatar
    priyanshu June 3, 2020 at 2:22 pm #

    Information was very informative.

  36. Avatar
    Nawal June 3, 2020 at 10:06 pm #

    In your experience which is the best linear regression model where you want to fit a straight line through data points which show some linear pattern but have lots of scatter? The simple linear regression model gives high RMSE.

    • Avatar
      Jason Brownlee June 4, 2020 at 6:21 am #

      I recommend testing a suite of different linear regression algorithms to see which results in the lowest error on your dataset.

  37. Avatar
    AP Kumar June 4, 2020 at 3:46 am #

    Can we use logistic regression for continuous output type.. or it can be only used with discrete outputs ?

    • Avatar
      Jason Brownlee June 4, 2020 at 6:27 am #

      No. Classification only to predict class membership probability (binomial distribution).

  38. Avatar
    Eman Rohayem June 4, 2020 at 5:31 pm #

    Thanks so much. We are looking forward to seeing a practical course after this first mini-course.

  39. Avatar
    Amrita July 5, 2020 at 4:56 pm #

    Thank you for answering the basics of machine learning.

  40. Avatar
    dwirani amelia July 7, 2020 at 11:28 pm #

    Hi Jason
    In your lesson 2, you stated: “…We don’t know what the function (f) looks like or its form. If we did, we would use it directly and we would not need to learn it from data using machine learning algorithms.”
    So this is very different from conventional statistics, where we first have to choose which function we want to work with and then find out its significance later.
    Is that why we need to try more than one algorithm?
    But how can we tell which algorithm is the most appropriate one? The most accurate?

    • Avatar
      Jason Brownlee July 8, 2020 at 6:32 am #

      Correct. We test many methods and “discover” what works best.

  41. Avatar
    sudha July 20, 2020 at 8:40 pm #

    Hi Jason

    I have run multiple linear regression on 2 variables one is a linear combination of 4 columns with varying coefficients and another is a polynomial function with same inputs. How do I determine which is linear and which is polynomial?

    • Avatar
      Jason Brownlee July 21, 2020 at 6:02 am #

      Sorry, I don’t follow your question, perhaps you can rephrase it?

  42. Avatar
    Ritika Roy August 12, 2020 at 6:52 pm #

    Thanks for sharing the knowledge. Keep the good work going.

  43. Avatar
    BHARATI August 21, 2020 at 10:54 pm #

    Hello Jason, regarding decision trees: when we use random forest, do we still need to do bootstrapping and model validation? Also, with random forest, do we get only the results and not the tree?

    • Avatar
      Jason Brownlee August 22, 2020 at 6:15 am #

      Yes. Random forest uses the bootstrap to fit each tree.

      Once you choose a model, you can fit it on all available data and use it to start making predictions.

  44. Avatar
    Kalpana Singh August 27, 2020 at 6:03 pm #

    Can you please provide an example in R with a dummy dataset?

  45. Avatar
    Roald Severtson September 3, 2020 at 4:52 am #

    Why haven’t you included SVMs in your algorithm mind map? Don’t they belong in the clustering group?

  46. Avatar
    Thamyres Tetsue Choji October 17, 2020 at 1:09 am #

    Hi! Do you have any material about cross entropy? I am a student of chemical engineering. Unfortunately, I am not very clear on this concept.

    Thank you so much for your work, it helps a lot of people!

  47. Avatar
    Amadi Precious Ogbonda November 18, 2020 at 4:25 am #

    In lesson two, on predictive analysis, you mentioned this equation: Y = f(X). Is this the general form of the machine learning approach? Are there other forms?

    • Avatar
      Jason Brownlee November 18, 2020 at 6:46 am #

      Yes, it is the general form for a predictive model.

  48. Avatar
    Sirbu Gheorghe December 10, 2020 at 7:58 am #

    I do not know this field myself, but in my opinion, as I understand it, all of these should be like a graph in a cross inside a cube, and a tree with roots, where the graph has 1+ above, 1- under, 0+ right and 0- left, with the tree also inside but only half inside, and all of it in a ball that will grow and learn by itself, reaching a new level of thinking, etc. Something like 1 0 1 +

  49. Avatar
    tahir-khalil January 31, 2021 at 6:38 pm #

    Thanks for helping me.

  50. Avatar
    Rishabh Gupta February 8, 2021 at 3:12 pm #

    Hello Jason sir,
    I found your blogging series and books around 1 year back, and from then onwards I have read a lot of your work, it is truly amazing.

  51. Avatar
    Sylvia March 10, 2021 at 11:45 pm #

    Hello Jason.
    Your works are really helpful.

    I have read many times about the difference between statistics and ML; however, I am still not sure. Even lesson 1 mentions the statistical perspective of ML.
    In statistics we have definite formulas, and in ML we say it learns from data, but the algorithms are the same for both statistics and ML, like regression?

    I find being formula/mathematical concept based and learning from data contradictory. One is fixed while the other is dynamic. But it seems that in ML we say it is statistics based and also say it learns from data. Please let me know your thoughts on this. Thanks.

  52. Avatar
    Sylvia Chadha March 11, 2021 at 3:19 pm #

    Okay. Thank you very much for the reference.

  53. Avatar
    Anila Kousar March 20, 2021 at 10:56 pm #

    This is good but quite short. It would be lovely if more detail about the lessons were given.

  54. Avatar
    Phoenix March 29, 2021 at 6:57 am #

    Hello, Jason.

    Thank you for sharing great lessons. I’m thinking of designing a model that predicts future behavior based on historical data. I’d like to train it every day on the accumulated daily data points, but I can’t figure out how to design the training datasets. My intention is to train the model with the past 100 days’ data and the response on day 101, then the past 101 days’ data and the response on day 102, and so on. I’d like it to be refit continuously as time goes on. How can I approach this?

  55. Avatar
    Phoenix March 31, 2021 at 7:10 pm #

    Thank you so much, Jason. You always have an answer!

    • Avatar
      Jason Brownlee April 1, 2021 at 8:09 am #


      Not always, I am only across a fraction of the field – but a useful fraction.

  56. Avatar
    santhadevi D March 31, 2021 at 8:01 pm #

    overfitting can happen and it will increase the computational time

  57. Avatar
    praveen April 7, 2021 at 5:03 am #

    Thank you for giving such useful information for beginners like me. Thanks a lot.

  58. Avatar
    Frank Yaw April 21, 2021 at 10:23 pm #

    Hi Jason,
    I am new to machine learning. I wanted to learn Python at the beginning but then was attracted by your site.

    It is said that 100% accuracy in training is overfitting. If the machine learning has all the required inputs and a consistent process to output the result, assume everything is perfect. When all new inputs meet the requirement, can we get a 100% accurate prediction?

  59. Avatar
    Prerak Parekh April 23, 2021 at 11:05 pm #

    I am Prerak Parekh from Sardar Patel Institute of Technology in Mumbai, India. I was looking for some good resources on image processing that involves classifying images based on the objects in them. Can you point me to some resources on this?
    Thanks and Regards,
    P Parekh.

  60. Avatar
    Vachaspathi April 24, 2021 at 11:54 am #

    In the bias-variance trade-off, which of the two (bias or variance) should be minimized? I mean, which one improves algorithm performance: having lower bias or lower variance?
    How do we decide how much is tolerable?
    What should the intuition be when considering the bias-variance trade-off?

    • Avatar
      Jason Brownlee April 25, 2021 at 5:13 am #

      We seek good model performance generally, bias/variance is a diagnostic or way of thinking about model error. We don’t explicitly minimize bias and variance.

  61. Avatar
    Blackresin April 28, 2021 at 8:11 pm #

    Can’t wait to breeze read this, I use the law off 3, to solve all my algorithms, tip pick which function will also be a consequence then add you equation done.

  62. Avatar
    Bilal Al Soudi May 3, 2021 at 5:18 pm #

    Thanks Jason … I enjoy how you break the information to its basic elements …

  63. Avatar
    Vasanth Sadasivan June 29, 2021 at 1:17 am #

    Lesson 1 is very important. When I began my journey, there was massive confusion about the terms being used interchangeably. Very good point!

  64. Avatar
    syou June 30, 2021 at 7:17 pm #

    It’s very useful!! Thank you so so so much!!!

  65. Avatar
    Xiao Tong August 18, 2021 at 12:25 am #

    very useful

    • Avatar
      Adrian Tam August 18, 2021 at 3:22 am #

      Thank you!

  66. Avatar
    Richard Griffin September 23, 2021 at 8:21 am #

    Jason, in using SVMs I always found it most difficult to determine a non-arbitrary methodology for separating the classifications of 0’s and 1’s when “clumping” was involved and it was necessary to break up the closely clumped differentiated groups. Setting the classifier (hyperplane) to widen the margin and break up the grouping was easy enough, but the multi-dimensional space could quite soon result in a “trivial” separation of the 0’s and 1’s. So, my question is the following: do you have some guidance on a methodology that is not so arbitrary and that nicely provides an adequate margin in these more challenging circumstances?

  67. Avatar
    Lucy October 27, 2021 at 10:10 am #

    I have recently started and it is amazing! Thank you!!

  68. Avatar
    MallikaMehta November 12, 2021 at 9:53 pm #

    Hey Jason,

    I am new to this field, and after a long career gap I have started a Data Scientist course.
    My worry is that this course is fully based on statistics, and I have never studied it in my life.
    My question is: do I need to learn statistics first, and should I have done that before this course?
    Please guide me. The second thing is that I am unable to understand several things, and we are going to complete ML soon. Should I start participating in hackathons yet? I know I am not ready yet, but my worry is what happens if I miss the opportunities.

    I am seeking your guidance. Thank you.

    • Avatar
      Adrian Tam November 14, 2021 at 2:22 pm #

      Try to learn by doing. You can always drill down into the theoretical subjects later when you want to learn more. To get the best results and an easier sense of what’s going on, try following this mini-course, for example. Run some code, get the result as described, then modify the code and see what happens. You can search for statistics or math topics once you have a rough idea and want to learn more deeply.

  69. Avatar
    Gopal February 18, 2022 at 3:36 am #

    Thanks for your nice course. Pls give me on python

    • Avatar
      James Carmichael February 18, 2022 at 12:48 pm #

      Hi Gopal…Please clarify your question regarding Python so that I may better assist you.

  70. Avatar
    Seyi Ajao March 25, 2022 at 6:55 am #

    Machine learning is made simple the way you are presenting it. Thanks

    • Avatar
      James Carmichael March 25, 2022 at 1:45 pm #

      Great feedback Seyi!!

  71. Avatar
    Seyi Ajao April 1, 2022 at 8:07 am #

    It’s very nice

    • Avatar
      James Carmichael April 1, 2022 at 9:01 am #

      Great feedback Seyi!

  72. Avatar
    Oyebola April 8, 2022 at 2:14 am #

    Thank you for all you do for newbies like me. I appreciate it.

    • Avatar
      Adrian Tam April 8, 2022 at 5:30 am #

      You’re welcome!

  73. Avatar
    daniel April 27, 2022 at 5:26 pm #

    Thank you, Jason. It’s interesting.

    • Avatar
      James Carmichael April 29, 2022 at 10:32 am #

      You are very welcome Daniel!

  74. Avatar
    Daniel May 12, 2022 at 9:51 am #

    Is it advisable to use categorical data (either one-hot or ordinal encoded) with linear regression?

    • Avatar
      James Carmichael May 13, 2022 at 12:54 am #

      Hi Daniel…Please elaborate more on what you are wanting to accomplish with your model so that we may better assist you.

  75. Avatar
    Niloofar Rahmani May 13, 2022 at 11:44 pm #

    I enjoy reading your blogs. Thanks for your time and efforts

    • Avatar
      James Carmichael May 14, 2022 at 4:51 am #

      Excellent feedback Niloofar!

  76. Avatar
    JR July 7, 2022 at 1:15 am #

    Hi, I fail to understand boosting. If a model has weak learners, wouldn’t we want to drop these from the model?
    It almost seems like boosting would overfit, no?

  77. Avatar
    Mimi August 13, 2022 at 4:26 am #

    Hi Jason

    Interesting highlights from different perspectives about how to talk about data in machine learning.

    It is exactly like speaking in different languages for similar meaning.

    • Avatar
      James Carmichael August 13, 2022 at 5:56 am #

      Thank you for your support and feedback Mimi! It is greatly appreciated!

  78. Avatar
    Chinedu Ekechukwu October 28, 2022 at 2:56 am #

    Hi Team, is there a well explained tutorial on multinomial logistic regression? My DV is a 4-class category variable. I have more than 6 IVs.

    • Avatar
      James Carmichael October 28, 2022 at 8:27 am #

      Hi Chinedu…Please see my email related to this topic.

  79. Avatar
    Tim Astras March 26, 2023 at 12:16 am #

    Great information with Machine Learning put into basic terminology for the reader to easily understand how it works.

    Great Work Jason!

    • Avatar
      James Carmichael March 26, 2023 at 10:30 am #

      Thank you Tim for your feedback and support! We greatly appreciate it.

  80. Avatar
    Bryan June 20, 2023 at 3:14 am #

    Just finished the mini-course. I thought it might take me a week (at a pace of 2 lessons per day). I was able to finish it in one evening. This was perfect for me because I can’t program (yet), my math is rusty, but I still want to learn about machine learning. I guess I’m ready for the next step? Thanks!

  81. Avatar
    Bryan June 22, 2023 at 7:21 pm #

    I was a bit confused between bias and variance. So I asked ChatGPT. I thought Chat’s response was helpful, and sharing it here in case it is helpful to others, or in case it is wrong so experts can opine.

    In short, Chat said that the problem is about fit. Either underfitting or overfitting. Underfit algorithms fail to capture the underlying patterns and relationships in the data. They are high bias. They perform poorly both with training data and when predicting because the algorithm didn’t really “get” the data.

    And then overfit algorithms capture the noise and random fluctuations in the data. They have high variance. So unlike underfit algorithms, they perform really well with training data. But because they got sidetracked with noise and random fluctuations they too lose sight of the true underlying pattern and so provide poor predictions when faced with new data.

    So there is a U-shape curve for prediction error. As fit improves (becoming less and less underfit), error goes down, up to a certain point; after bottoming out, prediction error climbs back up as fit gets tighter (becoming more and more overfit). So the conceptual aim seems to be to find the error minimum point in the U-shape curve by balancing between underfit (bias) and overfit (variance).

    That’s what I got out of ChatGPT. And off I go back to my e-book. I’m studying “Master Machine Learning Algorithms” right now. 🙂
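
    The U-shaped error curve described above can be reproduced with a small experiment. This is only a sketch under assumed conditions (synthetic sine data, a hand-written k-nearest-neighbors regressor, illustrative names). In KNN, a small k gives a very flexible model (low bias, high variance), and a large k gives a very rigid one (high bias, low variance).

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the KNN model on the given points."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in points) / len(points)


rng = random.Random(1)


def make_data(n):
    """Noisy samples of y = sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


train, test = make_data(60), make_data(60)

# k=1 memorizes the training data; k=60 always predicts the training mean.
errors = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 15, 60)}

for k, (train_error, test_error) in errors.items():
    print(f"k = {k:2d}: train MSE = {train_error:.3f}, test MSE = {test_error:.3f}")
```

    With k = 1 the training error is zero because each point is its own nearest neighbor, yet the error on new data is not. With k = 60 the model ignores the input entirely and does poorly on both sets. The lowest test error is expected somewhere in between.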

  82. Avatar
    D July 26, 2023 at 4:33 pm #

    I just loved this post J.!

    • Avatar
      James Carmichael July 27, 2023 at 9:16 am #

      Thank you for your feedback D!

  83. Avatar
    Aylton Jamieson October 26, 2023 at 7:51 pm #

    Dear Jason

    In your lesson 2, you state that generally we don’t know what the function looks like, and that if we did we would use it directly. I have a function which I understand to be a truncated normal distribution (based on twenty repetitions on a sample of 2060). I don’t think that this format allows me to send you a copy of the chart or the data.

    I don’t know how to use it directly. Three questions:

    1) What do you use to assess the function directly?

    2) If the above function is a tabular classification problem, which algorithm would you use to get the best result?

    3) Would something like AutoGluon do a good job?



  84. Avatar
    Satya November 17, 2023 at 8:47 pm #

    Lesson 2: The Principle That Underpins All Algorithms


    I am making an ML model for ECG analysis. I would like your help with it.



    • Avatar
      James Carmichael November 18, 2023 at 10:19 am #

      Hi Satya…Thank you for your feedback! Let us know if you have any questions regarding our content.

  85. Avatar
    Faraz November 21, 2023 at 1:28 pm #

    Great and effective learning forum

    • Avatar
      James Carmichael November 22, 2023 at 10:29 am #

      Thank you Faraz for your feedback and support! Let us know if we can help answer questions you may have as you work through our mini-courses!
