Last Updated on October 12, 2021

Regression models are fit on training data using linear regression and local search optimization algorithms.

Models like linear regression and logistic regression are trained by least squares optimization, and this is the most efficient approach to finding coefficients that minimize error for these models.

Nevertheless, it is possible to use alternate **optimization algorithms to fit a regression model** to a training dataset. This can be a useful exercise to learn more about how regression functions and the central nature of optimization in applied machine learning. It may also be required for regression with data that does not meet the requirements of a least squares optimization procedure.

In this tutorial, you will discover how to manually optimize the coefficients of regression models.

After completing this tutorial, you will know:

- How to develop the inference models for regression from scratch.
- How to optimize the coefficients of a linear regression model for predicting numeric values.
- How to optimize the coefficients of a logistic regression model using stochastic hill climbing.

**Kick-start your project** with my new book Optimization for Machine Learning, including *step-by-step tutorials* and the *Python source code* files for all examples.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- Optimize Regression Models
- Optimize a Linear Regression Model
- Optimize a Logistic Regression Model

## Optimize Regression Models

Regression models, like linear regression and logistic regression, are well-understood algorithms from the field of statistics.

Both algorithms are linear, meaning the output of the model is a weighted sum of the inputs. Linear regression is designed for “*regression*” problems that require a number to be predicted, and logistic regression is designed for “*classification*” problems that require a class label to be predicted.

These regression models involve the use of an optimization algorithm to find a set of coefficients for each input to the model that minimizes the prediction error. Because the models are linear and well understood, efficient optimization algorithms can be used.

In the case of linear regression, the coefficients can be found by least squares optimization, which can be solved using linear algebra. In the case of logistic regression, a local search optimization algorithm is commonly used.

It is possible to use any arbitrary optimization algorithm to train linear and logistic regression models.

That is, we can define a regression model and use a given optimization algorithm to find a set of coefficients for the model that result in a minimum of prediction error or a maximum of classification accuracy.

Using alternate optimization algorithms is expected to be less efficient on average than using the recommended optimization. Nevertheless, it may be more efficient in some specific cases, such as if the input data does not meet the expectations of the model like a Gaussian distribution and is uncorrelated with outer inputs.

It can also be an interesting exercise to demonstrate the central nature of optimization in training machine learning algorithms, and specifically regression models.

Next, let’s explore how to train a linear regression model using stochastic hill climbing.

### Want to Get Started With Optimization Algorithms?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

## Optimize a Linear Regression Model

The linear regression model might be the simplest predictive model that learns from data.

The model has one coefficient for each input and the predicted output is simply the weights of some inputs and coefficients.

In this section, we will optimize the coefficients of a linear regression model.

First, let’s define a synthetic regression problem that we can use as the focus of optimizing the model.

We can use the make_regression() function to define a regression problem with 1,000 rows and 10 input variables.

The example below creates the dataset and summarizes the shape of the data.

1 2 3 4 5 6 |
# define a regression dataset from sklearn.datasets import make_regression # define dataset X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1) # summarize the shape of the dataset print(X.shape, y.shape) |

Running the example prints the shape of the created dataset, confirming our expectations.

1 |
(1000, 10) (1000,) |

Next, we need to define a linear regression model.

Before we optimize the model coefficients, we must develop the model and our confidence in how it works.

Let’s start by developing a function that calculates the activation of the model for a given input row of data from the dataset.

This function will take the row of data and the coefficients for the model and calculate the weighted sum of the input with the addition of an extra y-intercept (also called the offset or bias) coefficient. The *predict_row()* function below implements this.

We are using simple Python lists and imperative programming style instead of NumPy arrays or list compressions intentionally to make the code more readable for Python beginners. Feel free to optimize it and post your code in the comments below.

1 2 3 4 5 6 7 8 |
# linear regression def predict_row(row, coefficients): # add the bias, the last coefficient result = coefficients[-1] # add the weighted input for i in range(len(row)): result += coefficients[i] * row[i] return result |

Next, we can call the *predict_row()* function for each row in a given dataset. The *predict_dataset()* function below implements this.

Again, we are intentionally using a simple imperative coding style for readability instead of list compressions.

1 2 3 4 5 6 7 8 9 |
# use model coefficients to generate predictions for a dataset of rows def predict_dataset(X, coefficients): yhats = list() for row in X: # make a prediction yhat = predict_row(row, coefficients) # store the prediction yhats.append(yhat) return yhats |

Finally, we can use the model to make predictions on our synthetic dataset to confirm it is all working correctly.

We can generate a random set of model coefficients using the rand() function.

Recall that we need one coefficient for each input (ten inputs in this dataset) plus an extra weight for the y-intercept coefficient.

1 2 3 4 5 6 7 |
... # define dataset X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1) # determine the number of coefficients n_coeff = X.shape[1] + 1 # generate random coefficients coefficients = rand(n_coeff) |

We can then use these coefficients with the dataset to make predictions.

1 2 3 |
... # generate predictions for dataset yhat = predict_dataset(X, coefficients) |

We can evaluate the mean squared error of these predictions.

1 2 3 4 |
... # calculate model prediction error score = mean_squared_error(y, yhat) print('MSE: %f' % score) |

That’s it.

We can tie all of this together and demonstrate our linear regression model for regression predictive modeling. The complete example is listed below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
# linear regression model from numpy.random import rand from sklearn.datasets import make_regression from sklearn.metrics import mean_squared_error # linear regression def predict_row(row, coefficients): # add the bias, the last coefficient result = coefficients[-1] # add the weighted input for i in range(len(row)): result += coefficients[i] * row[i] return result # use model coefficients to generate predictions for a dataset of rows def predict_dataset(X, coefficients): yhats = list() for row in X: # make a prediction yhat = predict_row(row, coefficients) # store the prediction yhats.append(yhat) return yhats # define dataset X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1) # determine the number of coefficients n_coeff = X.shape[1] + 1 # generate random coefficients coefficients = rand(n_coeff) # generate predictions for dataset yhat = predict_dataset(X, coefficients) # calculate model prediction error score = mean_squared_error(y, yhat) print('MSE: %f' % score) |

Running the example generates a prediction for each example in the training dataset, then prints the mean squared error for the predictions.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We would expect a large error given a set of random weights, and that is what we see in this case, with an error value of about 7,307 units.

1 |
MSE: 7307.756740 |

We can now optimize the coefficients of the dataset to achieve low error on this dataset.

First, we need to split the dataset into train and test sets. It is important to hold back some data not used in optimizing the model so that we can prepare a reasonable estimate of the performance of the model when used to make predictions on new data.

We will use 67 percent of the data for training and the remaining 33 percent as a test set for evaluating the performance of the model.

1 2 3 |
... # split into train test sets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) |

Next, we can develop a stochastic hill climbing algorithm.

The optimization algorithm requires an objective function to optimize. It must take a set of coefficients and return a score that is to be minimized or maximized corresponding to a better model.

In this case, we will evaluate the mean squared error of the model with a given set of coefficients and return the error score, which must be minimized.

The *objective()* function below implements this, given the dataset and a set of coefficients, and returns the error of the model.

1 2 3 4 5 6 7 |
# objective function def objective(X, y, coefficients): # generate predictions for dataset yhat = predict_dataset(X, coefficients) # calculate accuracy score = mean_squared_error(y, yhat) return score |

Next, we can define the stochastic hill climbing algorithm.

The algorithm will require an initial solution (e.g. random coefficients) and will iteratively keep making small changes to the solution and checking if it results in a better performing model. The amount of change made to the current solution is controlled by a step_size hyperparameter. This process will continue for a fixed number of iterations, also provided as a hyperparameter.

The *hillclimbing()* function below implements this, taking the dataset, objective function, initial solution, and hyperparameters as arguments and returns the best set of coefficients found and the estimated performance.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# hill climbing local search algorithm def hillclimbing(X, y, objective, solution, n_iter, step_size): # evaluate the initial point solution_eval = objective(X, y, solution) # run the hill climb for i in range(n_iter): # take a step candidate = solution + randn(len(solution)) * step_size # evaluate candidate point candidte_eval = objective(X, y, candidate) # check if we should keep the new point if candidte_eval <= solution_eval: # store the new point solution, solution_eval = candidate, candidte_eval # report progress print('>%d %.5f' % (i, solution_eval)) return [solution, solution_eval] |

We can then call this function, passing in an initial set of coefficients as the initial solution and the training dataset as the dataset to optimize the model against.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
... # define the total iterations n_iter = 2000 # define the maximum step size step_size = 0.15 # determine the number of coefficients n_coef = X.shape[1] + 1 # define the initial solution solution = rand(n_coef) # perform the hill climbing search coefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size) print('Done!') print('Coefficients: %s' % coefficients) print('Train MSE: %f' % (score)) |

Finally, we can evaluate the best model on the test dataset and report the performance.

1 2 3 4 5 6 |
... # generate predictions for the test dataset yhat = predict_dataset(X_test, coefficients) # calculate accuracy score = mean_squared_error(y_test, yhat) print('Test MSE: %f' % (score)) |

Tying this together, the complete example of optimizing the coefficients of a linear regression model on the synthetic regression dataset is listed below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
# optimize linear regression coefficients for regression dataset from numpy.random import randn from numpy.random import rand from sklearn.datasets import make_regression from sklearn.model_selection import train_test_split from sklearn.metrics import mean_squared_error # linear regression def predict_row(row, coefficients): # add the bias, the last coefficient result = coefficients[-1] # add the weighted input for i in range(len(row)): result += coefficients[i] * row[i] return result # use model coefficients to generate predictions for a dataset of rows def predict_dataset(X, coefficients): yhats = list() for row in X: # make a prediction yhat = predict_row(row, coefficients) # store the prediction yhats.append(yhat) return yhats # objective function def objective(X, y, coefficients): # generate predictions for dataset yhat = predict_dataset(X, coefficients) # calculate accuracy score = mean_squared_error(y, yhat) return score # hill climbing local search algorithm def hillclimbing(X, y, objective, solution, n_iter, step_size): # evaluate the initial point solution_eval = objective(X, y, solution) # run the hill climb for i in range(n_iter): # take a step candidate = solution + randn(len(solution)) * step_size # evaluate candidate point candidte_eval = objective(X, y, candidate) # check if we should keep the new point if candidte_eval <= solution_eval: # store the new point solution, solution_eval = candidate, candidte_eval # report progress print('>%d %.5f' % (i, solution_eval)) return [solution, solution_eval] # define dataset X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1) # split into train test sets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) # define the total iterations n_iter = 2000 # define the maximum step size step_size = 0.15 # determine the number of coefficients n_coef = X.shape[1] + 1 # define the initial solution solution = rand(n_coef) # perform the hill climbing search coefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size) print('Done!') print('Coefficients: %s' % coefficients) print('Train MSE: %f' % (score)) # generate predictions for the test dataset yhat = predict_dataset(X_test, coefficients) # calculate accuracy score = mean_squared_error(y_test, yhat) print('Test MSE: %f' % (score)) |

Running the example will report the iteration number and mean squared error each time there is an improvement made to the model.

At the end of the search, the performance of the best set of coefficients on the training dataset is reported and the performance of the same model on the test dataset is calculated and reported.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the optimization algorithm found a set of coefficients that achieved an error of about 0.08 on both the train and test datasets.

The fact that the algorithm found a model with very similar performance on train and test datasets is a good sign, showing that the model did not over-fit (over-optimize) to the training dataset. This means the model generalizes well to new data.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
... >1546 0.35426 >1567 0.32863 >1572 0.32322 >1619 0.24890 >1665 0.24800 >1691 0.24162 >1715 0.15893 >1809 0.15337 >1892 0.14656 >1956 0.08042 Done! Coefficients: [ 1.30559829e-02 -2.58299382e-04 3.33118191e+00 3.20418534e-02 1.36497902e-01 8.65445367e+01 2.78356715e-02 -8.50901499e-02 8.90078243e-02 6.15779867e-02 -3.85657793e-02] Train MSE: 0.080415 Test MSE: 0.080779 |

Now that we are familiar with how to manually optimize the coefficients of a linear regression model, let’s look at how we can extend the example to optimize the coefficients of a logistic regression model for classification.

## Optimize a Logistic Regression Model

A Logistic Regression model is an extension of linear regression for classification predictive modeling.

Logistic regression is for binary classification tasks, meaning datasets that have two class labels, class=0 and class=1.

The output first involves calculating the weighted sum of the inputs, then passing this weighted sum through a logistic function, also called a sigmoid function. The result is a Binomial probability between 0 and 1 for the example belonging to class=1.

In this section, we will build on what we learned in the previous section to optimize the coefficients of regression models for classification. We will develop the model and test it with random coefficients, then use stochastic hill climbing to optimize the model coefficients.

First, let’s define a synthetic binary classification problem that we can use as the focus of optimizing the model.

We can use the make_classification() function to define a binary classification problem with 1,000 rows and five input variables.

The example below creates the dataset and summarizes the shape of the data.

1 2 3 4 5 6 |
# define a binary classification dataset from sklearn.datasets import make_classification # define dataset X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1) # summarize the shape of the dataset print(X.shape, y.shape) |

Running the example prints the shape of the created dataset, confirming our expectations.

1 |
(1000, 5) (1000,) |

Next, we need to define a logistic regression model.

Let’s start by updating the *predict_row()* function to pass the weighted sum of the input and coefficients through a logistic function.

The logistic function is defined as:

- logistic = 1.0 / (1.0 + exp(-result))

Where result is the weighted sum of the inputs and the coefficients and exp()Â is *e* (Euler’s number) raised to the power of the provided value, implemented via the exp() function.

The updated *predict_row()* function is listed below.

1 2 3 4 5 6 7 8 9 10 |
# logistic regression def predict_row(row, coefficients): # add the bias, the last coefficient result = coefficients[-1] # add the weighted input for i in range(len(row)): result += coefficients[i] * row[i] # logistic function logistic = 1.0 / (1.0 + exp(-result)) return logistic |

That’s about it in terms of changes for linear regression to logistic regression.

As with linear regression, we can test the model with a set of random model coefficients.

1 2 3 4 5 6 7 |
... # determine the number of coefficients n_coeff = X.shape[1] + 1 # generate random coefficients coefficients = rand(n_coeff) # generate predictions for dataset yhat = predict_dataset(X, coefficients) |

The predictions made by the model are probabilities for an example belonging to class=1.

We can round the prediction to be integer values 0 and 1 for the expected class labels.

1 2 3 |
... # round predictions to labels yhat = [round(y) for y in yhat] |

We can evaluate the classification accuracy of these predictions.

1 2 3 4 |
... # calculate accuracy score = accuracy_score(y, yhat) print('Accuracy: %f' % score) |

That’s it.

We can tie all of this together and demonstrate our simple logistic regression model for binary classification. The complete example is listed below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
# logistic regression function for binary classification from math import exp from numpy.random import rand from sklearn.datasets import make_classification from sklearn.metrics import accuracy_score # logistic regression def predict_row(row, coefficients): # add the bias, the last coefficient result = coefficients[-1] # add the weighted input for i in range(len(row)): result += coefficients[i] * row[i] # logistic function logistic = 1.0 / (1.0 + exp(-result)) return logistic # use model coefficients to generate predictions for a dataset of rows def predict_dataset(X, coefficients): yhats = list() for row in X: # make a prediction yhat = predict_row(row, coefficients) # store the prediction yhats.append(yhat) return yhats # define dataset X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1) # determine the number of coefficients n_coeff = X.shape[1] + 1 # generate random coefficients coefficients = rand(n_coeff) # generate predictions for dataset yhat = predict_dataset(X, coefficients) # round predictions to labels yhat = [round(y) for y in yhat] # calculate accuracy score = accuracy_score(y, yhat) print('Accuracy: %f' % score) |

Running the example generates a prediction for each example in the training dataset then prints the classification accuracy for the predictions.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

We would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.

1 |
Accuracy: 0.540000 |

We can now optimize the weights of the dataset to achieve good accuracy on this dataset.

The stochastic hill climbing algorithm used for linear regression can be used again for logistic regression.

The important difference is an update to the *objective()* function to round the predictions and evaluate the model using classification accuracy instead of mean squared error.

1 2 3 4 5 6 7 8 9 |
# objective function def objective(X, y, coefficients): # generate predictions for dataset yhat = predict_dataset(X, coefficients) # round predictions to labels yhat = [round(y) for y in yhat] # calculate accuracy score = accuracy_score(y, yhat) return score |

The *hillclimbing()* function also must be updated to maximize the score of solutions instead of minimizing in the case of linear regression.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# hill climbing local search algorithm def hillclimbing(X, y, objective, solution, n_iter, step_size): # evaluate the initial point solution_eval = objective(X, y, solution) # run the hill climb for i in range(n_iter): # take a step candidate = solution + randn(len(solution)) * step_size # evaluate candidate point candidte_eval = objective(X, y, candidate) # check if we should keep the new point if candidte_eval >= solution_eval: # store the new point solution, solution_eval = candidate, candidte_eval # report progress print('>%d %.5f' % (i, solution_eval)) return [solution, solution_eval] |

Finally, the coefficients found by the search can be evaluated using classification accuracy at the end of the run.

1 2 3 4 5 6 7 8 |
... # generate predictions for the test dataset yhat = predict_dataset(X_test, coefficients) # round predictions to labels yhat = [round(y) for y in yhat] # calculate accuracy score = accuracy_score(y_test, yhat) print('Test Accuracy: %f' % (score)) |

Tying this all together, the complete example of using stochastic hill climbing to maximize classification accuracy of a logistic regression model is listed below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
# optimize logistic regression model with a stochastic hill climber from math import exp from numpy.random import randn from numpy.random import rand from sklearn.datasets import make_classification from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score # logistic regression def predict_row(row, coefficients): # add the bias, the last coefficient result = coefficients[-1] # add the weighted input for i in range(len(row)): result += coefficients[i] * row[i] # logistic function logistic = 1.0 / (1.0 + exp(-result)) return logistic # use model coefficients to generate predictions for a dataset of rows def predict_dataset(X, coefficients): yhats = list() for row in X: # make a prediction yhat = predict_row(row, coefficients) # store the prediction yhats.append(yhat) return yhats # objective function def objective(X, y, coefficients): # generate predictions for dataset yhat = predict_dataset(X, coefficients) # round predictions to labels yhat = [round(y) for y in yhat] # calculate accuracy score = accuracy_score(y, yhat) return score # hill climbing local search algorithm def hillclimbing(X, y, objective, solution, n_iter, step_size): # evaluate the initial point solution_eval = objective(X, y, solution) # run the hill climb for i in range(n_iter): # take a step candidate = solution + randn(len(solution)) * step_size # evaluate candidate point candidte_eval = objective(X, y, candidate) # check if we should keep the new point if candidte_eval >= solution_eval: # store the new point solution, solution_eval = candidate, candidte_eval # report progress print('>%d %.5f' % (i, solution_eval)) return [solution, solution_eval] # define dataset X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1) # split into train test sets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) # define the total iterations n_iter = 2000 # define the maximum step size step_size = 0.1 # determine the number of coefficients n_coef = X.shape[1] + 1 # define the initial solution solution = rand(n_coef) # perform the hill climbing search coefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size) print('Done!') print('Coefficients: %s' % coefficients) print('Train Accuracy: %f' % (score)) # generate predictions for the test dataset yhat = predict_dataset(X_test, coefficients) # round predictions to labels yhat = [round(y) for y in yhat] # calculate accuracy score = accuracy_score(y_test, yhat) print('Test Accuracy: %f' % (score)) |

Running the example will report the iteration number and classification accuracy each time there is an improvement made to the model.

At the end of the search, the performance of the best set of coefficients on the training dataset is reported and the performance of the same model on the test dataset is calculated and reported.

**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the optimization algorithm found a set of weights that achieved about 87.3 percent accuracy on the training dataset and about 83.9 percent accuracy on the test dataset.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
... >200 0.85672 >225 0.85672 >230 0.85672 >245 0.86418 >281 0.86418 >285 0.86716 >294 0.86716 >306 0.86716 >316 0.86716 >317 0.86716 >320 0.86866 >348 0.86866 >362 0.87313 >784 0.87313 >1649 0.87313 Done! Coefficients: [-0.04652756 0.23243427 2.58587637 -0.45528253 -0.4954355 -0.42658053] Train Accuracy: 0.873134 Test Accuracy: 0.839394 |

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

- Train-Test Split for Evaluating Machine Learning Algorithms
- How to Implement Linear Regression From Scratch in Python
- How To Implement Logistic Regression From Scratch in Python

### APIs

- sklearn.datasets.make_regression APIs.
- sklearn.datasets.make_classification APIs.
- sklearn.metrics.mean_squared_error APIs.
- numpy.random.rand API.

### Articles

## Summary

In this tutorial, you discovered how to manually optimize the coefficients of regression models.

Specifically, you learned:

- How to develop the inference models for regression from scratch.
- How to optimize the coefficients of a linear regression model for predicting numeric values.
- How to optimize the coefficients of a logistic regression model using stochastic hill climbing.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

## No comments yet.