
Parametric and Nonparametric Machine Learning Algorithms

What is a parametric machine learning algorithm and how is it different from a nonparametric machine learning algorithm?

In this post you will discover the difference between parametric and nonparametric machine learning algorithms.

Kick-start your project with my new book Master Machine Learning Algorithms, including step-by-step tutorials and the Excel Spreadsheet files for all examples.

Let’s get started.


Learning a Function

Machine learning can be summarized as learning a function (f) that maps input variables (X) to output variables (Y).

Y = f(X)

An algorithm learns this target mapping function from training data.

The form of the function is unknown, so our job as machine learning practitioners is to evaluate different machine learning algorithms and see which is better at approximating the underlying function.

Different algorithms make different assumptions or biases about the form of the function and how it can be learned.


Parametric Machine Learning Algorithms

Assumptions can greatly simplify the learning process, but can also limit what can be learned. Algorithms that simplify the function to a known form are called parametric machine learning algorithms.

A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at a parametric model, it won’t change its mind about how many parameters it needs.

Artificial Intelligence: A Modern Approach, page 737

The algorithms involve two steps:

  1. Select a form for the function.
  2. Learn the coefficients for the function from the training data.

An easy-to-understand functional form for the mapping function is a line, as is used in linear regression:

y = b0 + b1*x1 + b2*x2

Where b0 is the intercept, b1 and b2 are coefficients that control the slope, and x1 and x2 are two input variables.

Assuming the functional form of a line greatly simplifies the learning process. Now, all we need to do is estimate the coefficients of the line equation and we have a predictive model for the problem.
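
To make this concrete, here is a minimal sketch in Python (an illustration only, using NumPy with made-up data and made-up coefficient values). It estimates the coefficients by least squares; notice that the learned model is always the same fixed set of three numbers, no matter how many training examples we use.

import numpy as np

# Made-up training data: 100 examples, 2 input variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Add a column of ones so the intercept b0 is estimated too.
A = np.column_stack([np.ones(len(X)), X])

# Least squares estimate of the coefficients [b0, b1, b2].
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coefficients)  # approximately [3.0, 1.5, -2.0]

# The model is just these 3 numbers, whether we train on
# 100 examples or 1,000,000.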

Often the assumed functional form is a linear combination of the input variables, and as such parametric machine learning algorithms are often also called “linear machine learning algorithms”.

The problem is, the actual unknown underlying function may not be a linear function like a line. It could be almost a line and require some minor transformation of the input data to work right. Or it could be nothing like a line in which case the assumption is wrong and the approach will produce poor results.

Some more examples of parametric machine learning algorithms include:

  • Logistic Regression
  • Linear Discriminant Analysis
  • Perceptron
  • Naive Bayes
  • Simple Neural Networks

Benefits of Parametric Machine Learning Algorithms:

  • Simpler: These methods are easier to understand, and their results are easier to interpret.
  • Speed: Parametric models are very fast to learn from data.
  • Less Data: They do not require as much training data and can work well even if the fit to the data is not perfect.

Limitations of Parametric Machine Learning Algorithms:

  • Constrained: By choosing a functional form these methods are highly constrained to the specified form.
  • Limited Complexity: The methods are more suited to simpler problems.
  • Poor Fit: In practice the methods are unlikely to match the underlying mapping function.

Nonparametric Machine Learning Algorithms

Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms. By not making assumptions, they are free to learn any functional form from the training data.

Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don’t want to worry too much about choosing just the right features.

Artificial Intelligence: A Modern Approach, page 757

Nonparametric methods seek to best fit the training data in constructing the mapping function, whilst maintaining some ability to generalize to unseen data. As such, they are able to fit a large number of functional forms.

An easy-to-understand nonparametric model is the k-nearest neighbors algorithm, which makes predictions for a new data instance based on the k most similar training patterns. The method does not assume anything about the form of the mapping function, other than that patterns that are close are likely to have a similar output variable.
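
To make this concrete, here is a minimal from-scratch sketch of k-nearest neighbors in Python (an illustration only, using NumPy with made-up data). Note that "training" amounts to storing the dataset: the model grows with the data rather than being summarized by a fixed set of coefficients.

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new instance to every stored training pattern.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k most similar (closest) training patterns.
    nearest = np.argsort(distances)[:k]
    # Predict the majority class among the k neighbors.
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Made-up training data: the "model" is the data itself.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # prints 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.1])))  # prints 1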

Some more examples of popular nonparametric machine learning algorithms are:

  • k-Nearest Neighbors
  • Decision Trees like CART and C4.5
  • Support Vector Machines

Benefits of Nonparametric Machine Learning Algorithms:

  • Flexibility: Capable of fitting a large number of functional forms.
  • Power: No assumptions (or weak assumptions) about the underlying function.
  • Performance: Can result in higher performance models for prediction.

Limitations of Nonparametric Machine Learning Algorithms:

  • More data: Require a lot more training data to estimate the mapping function.
  • Slower: A lot slower to train, as they often have far more parameters to estimate.
  • Overfitting: More of a risk to overfit the training data and it is harder to explain why specific predictions are made.

Further Reading

This section lists some resources if you are looking to learn more about the difference between parametric and nonparametric machine learning algorithms.

Books

  • Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig (quoted above).

Summary

In this post you have discovered the difference between parametric and nonparametric machine learning algorithms.

You learned that parametric methods make large assumptions about the mapping of the input variables to the output variable, and in turn are faster to train and require less data, but may not be as powerful.

You also learned that nonparametric methods make few or no assumptions about the target function, and in turn require a lot more data, are slower to train, and have a higher model complexity, but can result in more powerful models.

If you have any questions about parametric or nonparametric machine learning algorithms or this post, leave a comment and I will do my best to answer them.

Update: I originally had some algorithms, like neural nets and Naive Bayes, listed under the wrong sections, which made things confusing. All fixed now.


64 Responses to Parametric and Nonparametric Machine Learning Algorithms

  1. confused beginner, March 14, 2016 at 6:02 pm

    hi jason

    thanks for taking your time to summarize these topics so that even a novice like me can understand. love your posts

    i have a problem with this article though. according to the small amount of knowledge i have on parametric/non parametric models, non parametric models are models that need to keep the whole data set around to make future predictions. and it looks like Artificial Intelligence: A Modern Approach, chapter 18 agrees with me on this, stating neural nets are parametric and once the weights w are learnt we can get rid of the training set. i would say it's the same case with trees/Naive Bayes as well.

    so what was your thinking behind categorizing these methods as non-parametric?

    thanks,
    a confused beginner

    • Jason Brownlee, July 17, 2016 at 6:57 am

      Indeed, simple multilayer perceptron neural nets are parametric models.

      Non-parametric models do not have to keep the whole dataset around, although one example of a non-parametric algorithm, kNN, does keep the whole dataset. Rather, non-parametric models can vary the number of parameters, like the number of nodes in a decision tree or the number of support vectors, etc.
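
      For example, here is a quick sketch (assuming scikit-learn is available, on made-up data) showing the number of learned nodes in a decision tree growing as more training data is provided:

      from sklearn.datasets import make_classification
      from sklearn.tree import DecisionTreeClassifier

      # Fit the same algorithm on more and more data and count
      # the learned nodes (the model's parameters).
      for n in (100, 1000, 10000):
          X, y = make_classification(n_samples=n, random_state=1)
          tree = DecisionTreeClassifier(random_state=1).fit(X, y)
          print(n, tree.tree_.node_count)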

      • mlvi, July 27, 2017 at 1:49 am

        Isn’t the number of nodes in the decision tree a hyperparameter?

        One more question: how do you deploy non-parametric machine learning models in production, as their parameters are not fixed?

    • Avishek Chakraborty, June 22, 2020 at 2:23 pm

      Excellent… Master class types

  2. Another confused beginner, March 15, 2016 at 3:13 am

    I am also interested to know why Naive Bayes is categorized as non-parametric.

    • Jason Brownlee, July 17, 2016 at 7:06 am

      Yes, Naive Bayes is generally a parametric method, as we choose a distribution (Gaussian) for the input variables, although there are non-parametric formulations of the method that use a kernel estimator. In fact, these may be more common in practice.

  3. Ecolss, March 15, 2016 at 5:41 pm

    Confused here too.

    AFAIK, parametric models have a fixed parameter set, i.e. the number of parameters won’t change once you have designed the model, whereas the number of parameters of non-parametric models varies; for example, Gaussian Processes and matrix factorization for collaborative filtering, etc.

    Correct me if I’m wrong 🙂

    • Jason Brownlee, July 17, 2016 at 7:06 am

      This is correct.

    • Mutlu Şimşek, November 2, 2022 at 12:55 am

      This is the single useful explanation. Thanks.

  4. Simon Tse, July 16, 2016 at 10:21 pm

    I think the classification does not really depend on what ‘parameters’ are. It’s about the assumption you have made when you try to construct a model or function. Parametric models usually have a probability model (i.e. a pdf) behind them to support the function-finding process, such as a normal distribution or another distribution model.

    On the other hand, a non-parametric model just depends on the error minimisation search process to identify the set of ‘parameters’, which has nothing to do with a pdf.

    So, parameters are still there for both parametric and non-parametric ML algos. The non-parametric case just doesn’t have an additional layer of assumptions governing the nature of the pdf that the ML algo tries to determine.

    • Jason Brownlee, July 17, 2016 at 7:10 am

      Hi Simon, the statistical definition of parametric and non-parametric does not agree with you.

      The crux of the definition is whether the number of parameters is fixed or not.

      It might be more helpful for us to consider linear and non-linear methods instead…

  5. Kevin, August 11, 2016 at 1:11 pm

    Is there a relation between parametric/nonparametric models and lazy/eager learning?

    • ANUDEEP VANJAVAKAM, September 24, 2016 at 11:29 am

      In machine learning literature, nonparametric methods are also called instance-based or memory-based learning algorithms.
      - They store the training instances in a lookup table and interpolate from these for prediction.
      - They are lazy learning algorithms, as opposed to the eager parametric methods, which have a simple model and a small number of parameters; once the parameters are learned, we no longer keep the training set.

  6. Jianye, September 27, 2016 at 11:18 am

    I have questions about distinguishing between parametric and non parametric algorithms: 1) for linear regression, we can also introduce x^2, x^3 … to make the boundary we learn nonlinear. does it mean that it becomes non parametric in this case?

    2) The main difference between them is that SVM puts additional constraints on how we select the hyperplane. Why is the perceptron considered parametric while SVM is not?

    • Jason Brownlee, September 28, 2016 at 7:35 am

      Hi Jianye,

      When it comes down to it, parametric means a fixed number of model parameters to define the modeled decision.

      Adding more inputs makes the linear regression equation still parametric.

      SVM can choose the number of support vectors based on the data and hyperparameter tuning, making it non-parametric.
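
      As a quick sketch (again assuming scikit-learn, on made-up data), you can see the number of support vectors, and with it the size of the SVM model, change with the training data:

      from sklearn.datasets import make_classification
      from sklearn.svm import SVC

      # The count of support vectors is chosen during training.
      for n in (100, 1000, 5000):
          X, y = make_classification(n_samples=n, random_state=1)
          model = SVC(kernel="rbf").fit(X, y)
          print(n, model.n_support_.sum())  # total support vectors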

      I hope that is clearer.

  7. Pramit Choudhary, January 23, 2017 at 1:09 pm

    Hi Jason,
    Nice content here. I had some suggestions:
    1. Do you think it would be a good idea to include the histogram as a simple non-parametric model for estimating a probability distribution? Some beginners might be able to relate to histograms.
    2. Also, maybe mention SVM (RBF kernel) as non-parametric, to be precise.
    What do you think?

    • Jason Brownlee, January 24, 2017 at 10:54 am

      Hi Pramit,

      1. nice suggestion.
      2. perhaps, there is debate about where SVM sits. I do think it is nonparametric as the number of support vectors is chosen based on the data and the interaction with the argument-defined margin.

  8. Manish Barnwal, March 30, 2017 at 8:50 pm

    Jason, as always, an excellent post.

  9. amr gamal, April 12, 2017 at 1:40 am

    Jason, it is a good post about parametric and non parametric models,
    but I am still confused:
    is deep learning supposed to be parametric or non parametric, and why?
    Best Regards

  10. Aishwarya, May 4, 2017 at 8:10 am

    Hi
    The answer is very convincing. I just have a small question: for pressure distribution plots, which ML algorithm should we consider?

    • Jason Brownlee, May 5, 2017 at 7:26 am

      Sorry, I don’t know what pressure distribution plots are.

  11. Sanket Maheshwari, May 17, 2017 at 7:45 am

    Hi Jason,

    A decision tree contains parameters like Splitting Criteria, Minimal Size, Minimal Leaf Size, Minimal Gain, and Maximal Depth, so why is it called non-parametric? Please throw some light on it.

    • Jason Brownlee, May 17, 2017 at 8:45 am

      They are considered hyperparameters of the model.

      The chosen split points are the parameters of the model and their number can vary based on specific data. Thus, the decision tree is a nonparametric algorithm.
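
      As a small sketch of the distinction (assuming scikit-learn, on made-up data): max_depth is a hyperparameter we choose before learning, while the split points and node count are parameters learned from the data.

      from sklearn.datasets import make_classification
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=1000, random_state=1)

      # max_depth is a hyperparameter: fixed before training.
      tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

      # The learned split points are the model's parameters; how many
      # there are depends on the data.
      print(tree.tree_.node_count)
      print(tree.tree_.threshold[:5])  # a few learned split thresholds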

      Does that make sense?

  12. Sanket Maheshwari, May 18, 2017 at 7:37 pm

    Could you please briefly tell me what the parameters and hyperparameters are in the following models:

    1. Naive Bayes
    2. KNN
    3. Decision Tree
    4. Multiple Regression
    5. Logistic Regression

    • Jason Brownlee, May 19, 2017 at 8:16 am

      Yes, please search the blog for posts on each of these algorithms.

  13. Guiferviz, November 3, 2017 at 10:45 pm

    Hi Jason! Nice blog.

    I have a doubt about the “simple neural networks”; shouldn’t it be “neural networks” in general? The number of parameters is determined a priori.

    In addition, I think that a linear SVM might be considered a parametric model because, despite the number of support vectors varying with the data, the final decision boundary can be expressed with a fixed number of parameters.

    I know the distinction between parametric and non-parametric is a little bit ambiguous, but what I said makes sense, right?

    • duribef, May 12, 2018 at 4:06 am

      Upvoting this question! I have the same doubt about the linear SVM.

      Regards!

  14. Aniket Saxena, November 7, 2017 at 3:25 am

    Hi Jason, I want to know: despite not requiring much data to train, can parametric algorithms also overfit? Or can they lead to underfitting instead?

    • Jason Brownlee, November 7, 2017 at 9:53 am

      Both types of algorithms can overfit and underfit data.

      It is more common for parametric methods to underfit and non-parametric methods to overfit.

  15. Aniket Saxena, November 8, 2017 at 12:20 am

    Hi Jason, thanks for your help, but I have a request: please also look at the question posted above mine by Guiferviz on November 3, 2017, because it is a nice question about the distinction between parametric and non-parametric, and I am very curious to know your opinion on it. Please answer this question.

  16. Magnus, January 31, 2018 at 9:10 pm

    Hi Jason, you mention that simple multilayer perceptron neural nets are parametric models. This I understand, but which neural networks are then non-parametric? I assume, e.g., that neural nets with dropout are non-parametric?

  17. ali, October 1, 2018 at 11:34 pm

    if we are doing regression with decision trees, do we need to check for correlation among the features?
    when we talk about nonparametric or parametric, are we talking about the method, like CART, or are we talking about the data?

    and if my data are not normally distributed, do I have to do a data transformation to make them normally distributed if I want to use parametric or nonparametric methods?

    • Jason Brownlee, October 2, 2018 at 6:25 am

      It is a good idea to make the problem as simple as possible for the model.

      Nonlinear methods do not require data with a Normal distribution.

  18. sindhu, October 7, 2018 at 12:37 am

    Hi Jason,
    Good post. Could you please explain parametric and non parametric methods with an example?
    I'm a bit confused about the parameters (what are the parameters, the model parameters?). For example, in the script, are the X and y values the parameters?

  19. Yogesh Soni, June 9, 2019 at 4:19 am

    Hi Jason

    Can you post or let me know about parameter tuning?

    • Jason Brownlee, June 9, 2019 at 6:22 am

      Yes, I have many posts; try a search for “grid search”.

  20. Smita Bhagwat, December 4, 2019 at 1:42 am

    Hi Jason, Can you throw some light on Semi Parametric Models and examples of them?

    • Jason Brownlee, December 4, 2019 at 5:40 am

      I’ve not heard of them before.

      Do you have an example?

  21. Tarkan, January 12, 2020 at 8:51 am

    Hi Jason,

    If you could only use k-NN, Naive Bayes, a simple perceptron, or a multilayer perceptron for building a real-time prediction system in a web-based application, which algorithm would you use for classification and why? Can you please tell me each algorithm’s advantages and disadvantages for this situation?

    Thank you.

    • Jason Brownlee, January 13, 2020 at 8:17 am

      I would test each and use the one that gave the best performance.

  22. Milan, May 14, 2020 at 8:07 am

    Hi Jason,
    Nice summary and clear examples.
    But I have one problem with understanding…

    Why does the division of models into parametric and non-parametric take as its only criteria whether the number of parameters is fixed and whether we make assumptions about the function?

    Shouldn’t there be a criterion for whether the distribution of attributes is known?

    I know that if we know the parameters, say the mean and the variance of a normal distribution, we can fully determine it with those parameters.

    But here we have an example where hypothesis (b), that it has a given mean but unspecified variance, is a parametric hypothesis:
    https://en.wikipedia.org/wiki/Nonparametric_statistics

    Does the division of models into parametric and nonparametric differ from the division of statistical methods into parametric and nonparametric?
    Is it an argument that these are different things, that models serve for prediction, while methods serve for hypothesis testing? For some it is necessary to know the distribution, while for others it is not?

    Isn’t it an advantage to have more information about the distribution shape? Do some models imply a certain distribution (e.g. normal), or do they simply give better results with a certain distribution?

    I know that not all distributions can be converted to a normal distribution without losing the essential distances between the points. Does it make sense to pre-process all numerical attributes with a logarithmic transformation (to try to convert them to normal) to improve model performance?

    Here is some discussion about that topic:
    https://www.kaggle.com/getting-started/47679

    • Jason Brownlee, May 14, 2020 at 1:25 pm

      Thanks.

      It is just one approach to think about the different types of algorithms, not the only approach.

      Yes, it is related – e.g. do we know and specify the functional form of the relationship or not.

  23. Jacques Coetzee, July 24, 2020 at 6:57 pm

    Thanks Jason for your great site.

    I appreciate your vast experience and insights and for that reason I feel confident to ask you a machine learning question.

    I want to determine the remaining useful life of a railway wagon based on 9 measured parameters (e.g. hollow wear, tread wear, etc.) for every wheel (8 wheels). I do not have any labeled data and therefore I know that it is an unsupervised learning problem. A regression to predict the time required is not possible. I tried k-means, where k=5 (the optimal k according to the elbow method), but can't make sense of the result. Do you have any suggestions of what algorithm I can use for this situation?

  24. Harsheni, September 15, 2020 at 1:01 pm

    It’s good to read this. But I have a question regarding parametric methods.

    How are parametric methods exceptional in certain cases in machine learning?

    May I know the answer with an example?

    TIA

    • Jason Brownlee, September 15, 2020 at 2:53 pm

      Good question, parametric methods are excellent when they fit the problem you are solving well. They can be the most efficient and most effective.

  25. Senen, January 5, 2021 at 2:28 pm

    You rocked, man. I was confused, but because of you I am clear now.
    Thanks again.

  26. Mansoor Mahfoud, June 13, 2021 at 4:23 pm

    Hi Jason, great article explaining the subject in a short, clear summary. But from my understanding of those algorithms, and particularly KNN, I have a different opinion (which might be wrong) about the benefits and limitations as presented in this article, given that the three cited examples are all nonparametric (and they are). The benefits/performance point does not apply to all three examples: KNN can be very slow in prediction; the more data, the slower it gets, because it needs to compute the distance from each data sample and then sort them. On the contrary, the limitations/slower point also does not apply to KNN, as KNN is super fast in training (in fact, it takes no time, because it need not train anything). I am very interested to know your feedback on this.

    • Jason Brownlee, June 14, 2021 at 5:34 am

      Yes, KNN is fast during training and relatively slow during inference/prediction.

  27. Amy Tang, August 12, 2021 at 1:55 pm

    Hi Jason,

    From my reading, the Perceptron seems to be non-parametric instead of parametric.

    Thanks.

  28. TheFlash, January 19, 2022 at 1:25 pm

    hi jason,
    Can you explain these lines, which you have stated above, in a simpler way:

    “The problem is, the actual unknown underlying function may not be a linear function like a line. It could be almost a line and require some minor transformation of the input data to work right. Or it could be nothing like a line in which case the assumption is wrong and the approach will produce poor results.”

    • James Carmichael, January 20, 2022 at 8:39 am

      Hello… Please explain what part is not clear to you so that I may better help you.

  29. Elie, May 3, 2022 at 1:45 am

    Hello,

    why isn’t “kernel density estimation (KDE)” mentioned for nonparametric estimation? Is it considered within SVM?

    Best,

  30. Elham, May 9, 2022 at 4:29 pm

    why can’t I download the photo of algorithms?

    • James Carmichael, May 10, 2022 at 12:11 pm

      Hi Elham… Please elaborate on what you are trying to do and what you are experiencing.
