What is the Difference Between a Parameter and a Hyperparameter?

It can be confusing when you get started in applied machine learning.

There are many terms to learn, and many of them are not used consistently. This is especially true if you have come from another field of study that uses some of the same terms as machine learning, but with different meanings.

For example: the terms “model parameter” and “model hyperparameter.”

Not having a clear definition for these terms is a common struggle for beginners, especially those who have come from the fields of statistics or economics.

In this post, we will take a closer look at these terms.

What is the Difference Between a Parameter and a Hyperparameter?
Photo by Bruce Guenter, some rights reserved.

What is a Model Parameter?

A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data.

  • They are required by the model when making predictions.
  • Their values define the skill of the model on your problem.
  • They are estimated or learned from data.
  • They are often not set manually by the practitioner.
  • They are often saved as part of the learned model.

Parameters are key to machine learning algorithms. They are the part of the model that is learned from historical training data.

In classical machine learning literature, we may think of the model as the hypothesis and the parameters as the tailoring of the hypothesis to a specific set of data.

Often model parameters are estimated using an optimization algorithm, which is a type of efficient search through possible parameter values.

The term “parameter” is also used in other fields, with a related meaning:

  • Statistics: In statistics, you may assume a distribution for a variable, such as a Gaussian distribution. Two parameters of the Gaussian distribution are the mean (mu) and the standard deviation (sigma). This holds in machine learning, where these parameters may be estimated from data and used as part of a predictive model.
  • Programming: In programming, you may pass a parameter to a function. In this case, a parameter is a function argument that could have one of a range of values. In machine learning, the specific model you are using is the function, and it requires parameters in order to make a prediction on new data.
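
To make the statistics case concrete, here is a minimal sketch that estimates the mean (mu) and standard deviation (sigma) of a Gaussian from data. The samples are synthetic and the true values (50 and 5) are assumptions chosen purely for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 10,000 draws from a Gaussian with mean 50 and std 5.
samples = [random.gauss(50.0, 5.0) for _ in range(10_000)]

# The two parameters are estimated from the data, not set by hand.
mu_hat = statistics.fmean(samples)
sigma_hat = statistics.stdev(samples)

print(f"estimated mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
```

The estimates should land close to the values used to generate the data, which is the sense in which a parameter is “learned” from data.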

Whether a model has a fixed or variable number of parameters determines whether it may be referred to as “parametric” or “nonparametric”.

Some examples of model parameters include:

  • The weights in an artificial neural network.
  • The support vectors in a support vector machine.
  • The coefficients in a linear regression or logistic regression.
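
As a rough illustration of the last example, the sketch below estimates the two coefficients of a simple linear regression from a tiny, made-up dataset using the closed-form least squares solution. The function name fit_line and the data are hypothetical. Note that the coefficients are needed to make a prediction and are never typed in by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope coefficient
    b0 = mean_y - b1 * mean_x   # intercept coefficient
    return b0, b1


# Hypothetical training data generated by y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Model parameters: estimated from the data.
b0, b1 = fit_line(xs, ys)


def predict(x):
    """Use the learned parameters to predict on new data."""
    return b0 + b1 * x


print(b0, b1, predict(5.0))
```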

What is a Model Hyperparameter?

A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data.

  • They are often used in processes to help estimate model parameters.
  • They are often specified by the practitioner.
  • They can often be set using heuristics.
  • They are often tuned for a given predictive modeling problem.

We cannot know the best value for a model hyperparameter on a given problem in advance. We may use rules of thumb, copy values used on other problems, or search for the best value by trial and error.

When a machine learning algorithm is tuned for a specific problem, such as when you are using a grid search or a random search, you are tuning the hyperparameters of the model in order to discover the parameters of the model that result in the most skillful predictions.
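
A grid search can be sketched in a few lines. The example below is an illustration only: it tunes k for a hand-written k-nearest neighbors classifier on synthetic two-cluster data, scoring each candidate on a held-out validation set. The data, the candidate grid, and the helper names (make_blobs, knn_predict, accuracy) are all assumptions made for this sketch.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable


def make_blobs(n_per_class):
    """Hypothetical 2-D data: two overlapping Gaussian clusters."""
    data = []
    for label, centre in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            point = (random.gauss(centre[0], 1.0), random.gauss(centre[1], 1.0))
            data.append((point, label))
    random.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out points classified correctly for a given k."""
    correct = sum(knn_predict(train, p, k) == y for p, y in held_out)
    return correct / len(held_out)


data = make_blobs(100)
train, validation = data[:150], data[150:]

# Grid search: evaluate every candidate value of the hyperparameter k.
grid = [1, 3, 5, 7, 9, 15]
scores = {k: accuracy(train, validation, k) for k in grid}
best_k = max(scores, key=scores.get)

print(scores, "best k:", best_k)
```

The value of k is never estimated by the model itself. It is chosen from outside by comparing candidates, which is what marks it as a hyperparameter.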

Many models have important parameters which cannot be directly estimated from the data. For example, in the K-nearest neighbor classification model … This type of model parameter is referred to as a tuning parameter because there is no analytical formula available to calculate an appropriate value.

— Page 64-65, Applied Predictive Modeling, 2013

Model hyperparameters are often referred to as model parameters, which can make things confusing. A good rule of thumb to overcome this confusion is as follows:

If you have to specify a model parameter manually, then it is probably a model hyperparameter.

Some examples of model hyperparameters include:

  • The learning rate for training a neural network.
  • The C and sigma hyperparameters for support vector machines.
  • The k in k-nearest neighbors.
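
The learning rate example can be seen in a toy setting. The sketch below is not a neural network: it fits a single weight w for the model y = w * x by gradient descent, using made-up data. The weight is a parameter learned from data, while the learning rate is a hyperparameter passed in by hand. The function name train and the two learning rates are assumptions chosen for illustration.

```python
def train(xs, ys, learning_rate, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: updated from the data
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Hypothetical training data generated by y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

# Same data, same algorithm, two hand-picked hyperparameter values.
w_good = train(xs, ys, learning_rate=0.01)  # converges towards 3
w_bad = train(xs, ys, learning_rate=0.2)    # step too large, diverges

print(w_good, w_bad)
```

With the smaller learning rate the learned weight settles near 3. With the larger one each update overshoots and the weight moves further away, which is why this value is usually tuned for the problem at hand.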

Summary

In this post, you discovered clear definitions of model parameters and model hyperparameters, and the difference between them.

In summary, model parameters are estimated from data automatically and model hyperparameters are set manually and are used in processes to help estimate model parameters.

Model hyperparameters are often referred to as parameters because they are the parts of the machine learning process that must be set manually and tuned.

Did this post help you clear up the confusion?
Let me know in the comments below.

Are there model parameters or hyperparameters that you are still unsure about?
Post them in the comments and I’ll do my best to help clear things up further.

34 Responses to What is the Difference Between a Parameter and a Hyperparameter?

  1. Kiki July 26, 2017 at 7:32 am #

    Awesome article! This was a big point of confusion, as I wasn’t sure what “knobs” I had at my disposal to tune my model — there are a lot of them, but they weren’t all in one place like the dash of a car. 🙂 Thank you for making this clear!

  2. Dr Alan Beckles July 26, 2017 at 7:57 am #

    Excellent post, Jason. Thanks!

  3. ujjawal sinha July 26, 2017 at 8:02 am #

    Thanks Jason , Excellent

  4. Wesley July 26, 2017 at 8:04 am #

    Great explanation…

  5. Deepak Sharma July 27, 2017 at 3:57 am #

    Superb explanation Jason….love reading your articles!!!

  6. Jie July 27, 2017 at 6:06 pm #

    In the model parameter section, you give this example: “The support vectors in a support vector machine.” I am a little confused: why not the coefficients in SVM?

    • Jason Brownlee July 28, 2017 at 8:29 am #

      We call the instances found by SVM “support vectors”; they are technically not “weights” or “coefficients”.

  7. Luis July 28, 2017 at 6:15 am #

    Great post, Jason. Thanks!

    One question: k-nearest neighbours is considered a nonparametric model (vs parametric models). Shouldn’t k be considered a hyperparameter then?

    • Jason Brownlee July 28, 2017 at 8:36 am #

      The “k” in kNN is a hyperparameter. I say exactly this, Luis.

  8. Luis July 28, 2017 at 6:22 am #

    The confounding part was the use of “parameter” in:

    “Many models have important parameters which cannot be directly estimated from the data. For example, in the K-nearest neighbor classification model … This type of model parameter is referred to as a tuning parameter because there is no analytical formula available to calculate an appropriate value.”

    • Jason Brownlee July 28, 2017 at 8:37 am #

      Why is this confounding, Luis?

      • Tommy July 31, 2017 at 7:39 pm #

        The book Applied Predictive Modeling does not contain the word hyperparameter. The article above states that many experts mix up the terms parameter and hyperparameter.

        So what’s the point of including the quote? Here are some potential answers:
        1. The authors used the term “tuning parameter” incorrectly, and should have used the term hyperparameter. This understanding is supported by including the quote in the section on hyperparameters. Furthermore, my understanding is that using a threshold for statistical significance as a tuning parameter may be called a hyperparameter because it

        However, I believe that “tuning parameter” is not an incorrect description.

        Also, you linked to the Wikipedia page for Bayesian hyperparameters rather than the page for hyperparameters in machine learning: https://en.wikipedia.org/wiki/Hyperparameter_optimization

        The Wikipedia page gives the straightforward definition: “In the context of machine learning, hyperparameters are parameters whose values are set prior to the commencement of the learning process. By contrast, the value of other parameters is derived via training.”

        • Tommy July 31, 2017 at 7:56 pm #

          Correct me if I’m wrong, but according to many definitions, hyperparameters are a type of parameter.

          Synonyms for hyperparameters: tuning parameters, meta parameters, free parameters

          Since hyperparameters are a type of parameter, the two terms are interchangeable when discussing hyperparameters. However, not all parameters are hyperparameters.

          • Jason Brownlee August 1, 2017 at 7:58 am #

            Nice perspective, thanks Tommy.

            I cannot disagree generally, but the distinction is important, especially if you are a beginner trying to figure out what to “configure” or “tune”.

        • Jason Brownlee August 1, 2017 at 7:56 am #

          Hi Tommy, I provided the quote to help clarify the definitions, not as an example of misuse. Sorry for the confusion.

          Nice, your definition matches with the “estimated from data vs not” approach used in the post.

  9. Sasikanth July 28, 2017 at 11:58 am #

    Crystal clear. Thanks Jason

  10. Bharath Bhushan July 28, 2017 at 4:12 pm #

    Thanks. I thought both of them referred to the same thing. Thanks for the clarification.

  11. Ravindra July 28, 2017 at 4:34 pm #

    Awesome! It was really confusing (parameters vs hyperparameters) and I was ignoring it, but this post made it very clear.
    Thank you!!

  12. Abkul July 28, 2017 at 4:44 pm #

    Superbly explained. Thanks for the always handy post.

  13. Tim July 29, 2017 at 5:31 pm #

    from sklearn import svm
    clf = svm.SVC(C=0.01, kernel='rbf', random_state=33)

    ——
    Is random_state a parameter or a hyperparameter?

    • Jason Brownlee July 30, 2017 at 7:45 am #

      Deep Tim… great question!

      A gut check says “hyperparameter”, but we do not optimize it, we control for it. This feels wrong though. Perhaps it is neither.

      What I mean is, it impacts the skill of the model, or most models that are stochastic, but we do not “tune” the value for a specific model/dataset. The idea of the “best” random seed does not make sense. Instead, we would re-run the experiment n times in order to develop a robust estimate of skill. We would create an ensemble of n final models to produce a more robust set of predictions.

      Does that help? Am I making sense?

  14. Vinícius July 30, 2017 at 1:28 am #

    Excellent post! I am currently studying an application of Stacked Autoencoders on passive sonar classification and your posts have been very helpful for me. I have learned a lot with you. Taking advantage, do you have any material on this topic? Or novelty detection? Thank you!

    • Jason Brownlee July 30, 2017 at 7:48 am #

      Thanks.

      Sorry, I don’t have posts on these topics, I hope to get to them sometime.

  15. Siva August 2, 2017 at 5:40 pm #

    Good clarification and explanation. Thanks!
