Linear Discriminant Analysis With Python

By Jason Brownlee on August 3, 2020 in Python Machine Learning 10

Linear Discriminant Analysis is a linear classification machine learning algorithm.

The algorithm involves developing a probabilistic model per class based on the specific distribution of observations for each input variable. A new example is then classified by calculating the conditional probability of it belonging to each class and selecting the class with the highest probability.

As such, it is a relatively simple probabilistic classification model that makes strong assumptions about the distribution of each input variable, although it can make effective predictions even when these expectations are violated (e.g. it fails gracefully).

In this tutorial, you will discover the Linear Discriminant Analysis classification machine learning algorithm in Python.

After completing this tutorial, you will know:

The Linear Discriminant Analysis is a simple linear machine learning algorithm for classification.
How to fit, evaluate, and make predictions with the Linear Discriminant Analysis model with Scikit-Learn.
How to tune the hyperparameters of the Linear Discriminant Analysis algorithm on a given dataset.

Let’s get started.

Linear Discriminant Analysis With Python
Photo by Mihai Lucîț, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

Linear Discriminant Analysis
Linear Discriminant Analysis With scikit-learn
Tune LDA Hyperparameters

Linear Discriminant Analysis

Linear Discriminant Analysis, or LDA for short, is a classification machine learning algorithm.

It works by calculating summary statistics for the input features by class label, such as the mean and standard deviation. These statistics represent the model learned from the training data. In practice, linear algebra operations are used to calculate the required quantities efficiently via matrix decomposition.

Predictions are made by estimating the probability that a new example belongs to each class label based on the values of each input feature. The class that results in the largest probability is then assigned to the example. As such, LDA may be considered a simple application of Bayes Theorem for classification.

LDA assumes that the input variables are numeric and normally distributed and that they have the same variance (spread). If this is not the case, it may be desirable to transform the data to have a Gaussian distribution and standardize or normalize the data prior to modeling.

… the LDA classifier results from assuming that the observations within each class come from a normal distribution with a class-specific mean vector and a common variance

— Page 142, An Introduction to Statistical Learning with Applications in R, 2014.

It also assumes that the input variables are not correlated; if they are, a PCA transform may be helpful to remove the linear dependence.

… practitioners should be particularly rigorous in pre-processing data before using LDA. We recommend that predictors be centered and scaled and that near-zero variance predictors be removed.

— Page 293, Applied Predictive Modeling, 2013.

Nevertheless, the model can perform well, even when violating these expectations.

The LDA model is naturally multi-class. This means that it supports two-class classification problems and extends to more than two classes (multi-class classification) without modification or augmentation.

It is a linear classification algorithm, like logistic regression. This means that classes are separated in the feature space by lines or hyperplanes. Extensions of the method can be used that allow other shapes, like Quadratic Discriminant Analysis (QDA), which allows curved shapes in the decision boundary.

… unlike LDA, QDA assumes that each class has its own covariance matrix.

— Page 149, An Introduction to Statistical Learning with Applications in R, 2014.

Now that we are familiar with LDA, let’s look at how to fit and evaluate models using the scikit-learn library.

Linear Discriminant Analysis With scikit-learn

The Linear Discriminant Analysis is available in the scikit-learn Python machine learning library via the LinearDiscriminantAnalysis class.

The method can be used directly without configuration, although the implementation does offer arguments for customization, such as the choice of solver and the use of a penalty.

...
# create the lda model
model = LinearDiscriminantAnalysis()

...

# create the lda model

model = LinearDiscriminantAnalysis()

We can demonstrate the Linear Discriminant Analysis method with a worked example.

First, let’s define a synthetic classification dataset.

We will use the make_classification() function to create a dataset with 1,000 examples, each with 10 input variables.

The example creates and summarizes the dataset.

# test classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# summarize the dataset
print(X.shape, y.shape)

# test classification dataset

from sklearn.datasets import make_classification

# define dataset

X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)

# summarize the dataset

print(X.shape, y.shape)

Running the example creates the dataset and confirms the number of rows and columns of the dataset.

(1000, 10) (1000,)

1	(1000, 10) (1000,)

We can fit and evaluate a Linear Discriminant Analysis model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.

The complete example of evaluating the Linear Discriminant Analysis model for the synthetic binary classification task is listed below.

# evaluate a lda model on the dataset
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = LinearDiscriminantAnalysis()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

# evaluate a lda model on the dataset

from numpy import mean

from numpy import std

from sklearn.datasets import make_classification

from sklearn.model_selection import cross_val_score

from sklearn.model_selection import RepeatedStratifiedKFold

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# define dataset

X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)

# define model

model = LinearDiscriminantAnalysis()

# define model evaluation method

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# evaluate model

scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)

# summarize result

print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example evaluates the Linear Discriminant Analysis algorithm on the synthetic dataset and reports the average accuracy across the three repeats of 10-fold cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

In this case, we can see that the model achieved a mean accuracy of about 89.3 percent.

Mean Accuracy: 0.893 (0.033)

1	Mean Accuracy: 0.893 (0.033)

We may decide to use the Linear Discriminant Analysis as our final model and make predictions on new data.

This can be achieved by fitting the model on all available data and calling the predict() function passing in a new row of data.

We can demonstrate this with a complete example listed below.

# make a prediction with a lda model on the dataset
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = LinearDiscriminantAnalysis()
# fit model
model.fit(X, y)
# define new data
row = [0.12777556,-3.64400522,-2.23268854,-1.82114386,1.75466361,0.1243966,1.03397657,2.35822076,1.01001752,0.56768485]
# make a prediction
yhat = model.predict([row])
# summarize prediction
print('Predicted Class: %d' % yhat)

# make a prediction with a lda model on the dataset

from sklearn.datasets import make_classification

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# define dataset

X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)

# define model

model = LinearDiscriminantAnalysis()

# fit model

model.fit(X, y)

# define new data

row = [0.12777556,-3.64400522,-2.23268854,-1.82114386,1.75466361,0.1243966,1.03397657,2.35822076,1.01001752,0.56768485]

# make a prediction

yhat = model.predict([row])

# summarize prediction

print('Predicted Class: %d' % yhat)

Running the example fits the model and makes a class label prediction for a new row of data.

Predicted Class: 1

1	Predicted Class: 1

Next, we can look at configuring the model hyperparameters.

Tune LDA Hyperparameters

The hyperparameters for the Linear Discriminant Analysis method must be configured for your specific dataset.

An important hyperparameter is the solver, which defaults to ‘svd‘ but can also be set to other values for solvers that support the shrinkage capability.

The example below demonstrates this using the GridSearchCV class with a grid of different solver values.

# grid search solver for lda
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = LinearDiscriminantAnalysis()
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['solver'] = ['svd', 'lsqr', 'eigen']
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)

# grid search solver for lda

from sklearn.datasets import make_classification

from sklearn.model_selection import GridSearchCV

from sklearn.model_selection import RepeatedStratifiedKFold

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# define dataset

X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)

# define model

model = LinearDiscriminantAnalysis()

# define model evaluation method

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# define grid

grid = dict()

grid['solver'] = ['svd', 'lsqr', 'eigen']

# define search

search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)

# perform the search

results = search.fit(X, y)

# summarize

print('Mean Accuracy: %.3f' % results.best_score_)

print('Config: %s' % results.best_params_)

Running the example will evaluate each combination of configurations using repeated cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.

In this case, we can see that the default SVD solver performs the best compared to the other built-in solvers.

Mean Accuracy: 0.893
Config: {'solver': 'svd'}

1 2	Mean Accuracy: 0.893 Config: {'solver': 'svd'}

Next, we can explore whether using shrinkage with the model improves performance.

Shrinkage adds a penalty to the model that acts as a type of regularizer, reducing the complexity of the model.

Regularization reduces the variance associated with the sample based estimate at the expense of potentially increased bias. This bias variance trade-off is generally regulated by one or more (degree-of-belief) parameters that control the strength of the biasing towards the “plausible” set of (population) parameter values.

— Regularized Discriminant Analysis, 1989.

This can be set via the “shrinkage” argument and can be set to a value between 0 and 1. We will test values on a grid with a spacing of 0.01.

In order to use the penalty, a solver must be chosen that supports this capability, such as ‘eigen’ or ‘lsqr‘. We will use the latter in this case.

The complete example of tuning the shrinkage hyperparameter is listed below.

# grid search shrinkage for lda
from numpy import arange
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)
# define model
model = LinearDiscriminantAnalysis(solver='lsqr')
# define model evaluation method
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid
grid = dict()
grid['shrinkage'] = arange(0, 1, 0.01)
# define search
search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)
# perform the search
results = search.fit(X, y)
# summarize
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)

# grid search shrinkage for lda

from numpy import arange

from sklearn.datasets import make_classification

from sklearn.model_selection import GridSearchCV

from sklearn.model_selection import RepeatedStratifiedKFold

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# define dataset

X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, random_state=1)

# define model

model = LinearDiscriminantAnalysis(solver='lsqr')

# define model evaluation method

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# define grid

grid = dict()

grid['shrinkage'] = arange(0, 1, 0.01)

# define search

search = GridSearchCV(model, grid, scoring='accuracy', cv=cv, n_jobs=-1)

# perform the search

results = search.fit(X, y)

# summarize

print('Mean Accuracy: %.3f' % results.best_score_)

print('Config: %s' % results.best_params_)

Running the example will evaluate each combination of configurations using repeated cross-validation.

Your specific results may vary given the stochastic nature of the learning algorithm. Try running the example a few times.

In this case, we can see that using shrinkage offers a slight lift in performance from about 89.3 percent to about 89.4 percent, with a value of 0.02.

Mean Accuracy: 0.894
Config: {'shrinkage': 0.02}

1 2	Mean Accuracy: 0.894 Config: {'shrinkage': 0.02}

Summary

In this tutorial, you discovered the Linear Discriminant Analysis classification machine learning algorithm in Python.

Specifically, you learned:

The Linear Discriminant Analysis is a simple linear machine learning algorithm for classification.
How to fit, evaluate, and make predictions with the Linear Discriminant Analysis model with Scikit-Learn.
How to tune the hyperparameters of the Linear Discriminant Analysis algorithm on a given dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

10 Responses to Linear Discriminant Analysis With Python

Joseph September 28, 2020 at 4:46 pm #

Very educative article, thanks for sharing.

- Jason Brownlee September 28, 2020 at 5:05 pm #
  
  Thanks!
  
Anthony The Koala October 2, 2020 at 2:17 pm #

Dear Dr Jason,
Here is an example that letting the gridsearch

# define grid grid = dict() grid['solver'] = ['svd', 'lsqr', 'eigen'] grid['shrinkage'] = arange(0, 1, 0.01)

1
2
3
4

# define grid
grid = dict()
grid['solver'] = ['svd', 'lsqr', 'eigen']
grid['shrinkage'] = arange(0, 1, 0.01)

Results:
* adding more parameters to the grid search did not improve the accuracy.
* the best solver was ‘lsqr’. Compared to Dr Jason’s answer the best solver is ‘svd’.
* excluding ‘lsqr’ and leaving in solvers ‘svd’ and ‘eigen’, ‘eigen’ is the best solver, BUT the results were the same with mean accuracy of 0.894.
* shrinkage and ‘svd’ “don’t mix” as grid search parameters.

Mean Accuracy: 0.894 Config: {'shrinkage': 0.02, 'solver': 'lsqr'} #Leaving out lsqr Mean Accuracy: 0.894 Config: {'shrinkage': 0.02, 'solver': 'eigen'}

1
2
3
4
5

Mean Accuracy: 0.894
Config: {'shrinkage': 0.02, 'solver': 'lsqr'}
#Leaving out lsqr
Mean Accuracy: 0.894
Config: {'shrinkage': 0.02, 'solver': 'eigen'}

Thank you,
Anthony of Sydney

- Jason Brownlee October 2, 2020 at 2:24 pm #
  
  Nice work!
  
Harrison November 27, 2020 at 5:58 pm #

Very helpful!

- Jason Brownlee November 28, 2020 at 6:36 am #
  
  Thanks.
  
grigoris August 23, 2022 at 1:39 am #

Nice tutorial.Can you make a simmilar in Discriminant Correlation Analysis?

Harold Yuan November 2, 2022 at 7:15 am #

Hi Dr Jason – Thanks for great tutorial on using Linear Discriminant with scikit-learn. My question is what covariance does scikit-learn use: computes a ‘pooled variance’, subtracting each feature from the total mean (of all features) or actually computes each covariance per class, i.e. subtracting features from the mean of features of each class when computing the separate covariances … then combining the separate covariances weighted by prior probability into a single covariance? Thanks!

- James Carmichael November 3, 2022 at 6:38 am #
  
  Hi Harold…You are very welcome! The following resource may be of interest:
  
  https://scikit-learn.org/stable/modules/covariance.html
  
  - Harold Yuan November 5, 2022 at 6:19 am #
    
    Hi James,
    
    Yes, thanks for the link on scikit-learn covariance, I appreciate it. My specific question is what covariance computation or estimate does scikit-learn’s Linear Discriminant algorithm use? Thanks!

Navigation

Linear Discriminant Analysis With Python

Tutorial Overview

Linear Discriminant Analysis

Linear Discriminant Analysis With scikit-learn

Tune LDA Hyperparameters

Further Reading

Tutorials

Papers

Books

APIs

Articles

Summary

Discover Fast Machine Learning in Python!

Develop Your Own Models in Minutes

Finally Bring Machine Learning To
Your Own Projects

More On This Topic

10 Responses to Linear Discriminant Analysis With Python

Leave a Reply Click here to cancel reply.

Navigation

Tutorial Overview

Linear Discriminant Analysis

Linear Discriminant Analysis With scikit-learn

Tune LDA Hyperparameters

Further Reading

Tutorials

Papers

Books

APIs

Articles

Summary

Discover Fast Machine Learning in Python!

Develop Your Own Models in Minutes

Finally Bring Machine Learning To Your Own Projects

More On This Topic

10 Responses to Linear Discriminant Analysis With Python

Leave a Reply Click here to cancel reply.

Finally Bring Machine Learning To
Your Own Projects