How to Calculate Principal Component Analysis (PCA) from Scratch in Python

An important machine learning method for dimensionality reduction is called Principal Component Analysis.

It is a method that uses simple matrix operations from linear algebra and statistics to calculate a projection of the original data into the same number or fewer dimensions.

In this tutorial, you will discover the Principal Component Analysis machine learning method for dimensionality reduction and how to implement it from scratch in Python.

After completing this tutorial, you will know:

  • The procedure for calculating the Principal Component Analysis and how to choose principal components.
  • How to calculate the Principal Component Analysis from scratch in NumPy.
  • How to calculate the Principal Component Analysis for reuse on more data in scikit-learn.

Let’s get started.

  • Update Apr/2018: Fixed typo in the explanation of the sklearn PCA attributes. Thanks kris.

Tutorial Overview

This tutorial is divided into 3 parts; they are:

  1. Principal Component Analysis
  2. Manually Calculate Principal Component Analysis
  3. Reusable Principal Component Analysis

Principal Component Analysis

Principal Component Analysis, or PCA for short, is a method for reducing the dimensionality of data.

It can be thought of as a projection method where data with m columns (features) is projected into a subspace with m or fewer columns, while retaining as much of the variance in the original data as possible.

The PCA method can be described and implemented using the tools of linear algebra.

PCA is an operation applied to a dataset, represented by an n x m matrix A, that projects A into a subspace which we will call B. Let’s walk through the steps of this operation.

The first step is to calculate the mean value of each column, giving a vector of column means that we will call M.
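As a minimal sketch of this step, assuming the data is held in a NumPy array with one row per observation, the column means can be computed with `mean(axis=0)`. The matrix values here are illustrative only.

```python
import numpy as np

# Illustrative 3x2 dataset: 3 rows (observations), 2 columns (features).
A = np.array([[1, 2], [3, 4], [5, 6]])

# axis=0 averages down the rows, which yields one mean per column.
M = A.mean(axis=0)
print(M)
```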


Next, we need to center the values in each column by subtracting that column's mean value, which gives the centered matrix C.
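Centering can be sketched in one line with NumPy broadcasting. This assumes the same illustrative 3×2 matrix; the check at the end confirms that every column of the centered matrix has a mean of zero.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
M = A.mean(axis=0)

# Broadcasting subtracts the length-2 vector M from every row of A.
C = A - M
print(C)

# After centering, each column mean is zero.
print(C.mean(axis=0))
```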

The next step is to calculate the covariance matrix of the centered matrix C, which we will call V.

Correlation is a normalized measure of the amount and direction (positive or negative) that two columns change together. Covariance is the unnormalized version of correlation: it is expressed in the units of the columns rather than scaled to the range -1 to 1. A covariance matrix holds the covariance score of every column with every other column, including itself (the diagonal entries are the column variances).
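The sketch below shows one way to compute the covariance matrix with `numpy.cov`, alongside the equivalent hand calculation and the correlation matrix for comparison. It assumes the sample covariance (division by n - 1), which is the NumPy default, and uses an illustrative matrix.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
C = A - A.mean(axis=0)

# rowvar=False tells NumPy that each column is a variable.
V = np.cov(C, rowvar=False)

# The same matrix by hand: C^T . C scaled by (n - 1).
n = C.shape[0]
V_manual = C.T @ C / (n - 1)

# Correlation rescales covariance so every entry lies in [-1, 1].
R = np.corrcoef(A, rowvar=False)

print(V)
print(R)
```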

Finally, we calculate the eigendecomposition of the covariance matrix V. This results in a list of eigenvalues and a list of eigenvectors.

The eigenvectors represent the directions or components for the reduced subspace of B, whereas the eigenvalues represent the magnitudes for the directions, that is, the variance of the data along each one.
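A short sketch of the eigendecomposition using `numpy.linalg.eig` follows. The covariance matrix is an illustrative 2×2 example, and the loop simply checks the defining property V . v = lambda . v for each eigenpair.

```python
import numpy as np

# Illustrative covariance matrix.
V = np.array([[4.0, 4.0], [4.0, 4.0]])

# eig returns the eigenvalues and a matrix whose COLUMNS are the eigenvectors.
values, vectors = np.linalg.eig(V)

for i in range(len(values)):
    v = vectors[:, i]
    # Each pair satisfies V . v = lambda . v
    print(values[i], v, np.allclose(V @ v, values[i] * v))
```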

The eigenvectors can be sorted by the eigenvalues in descending order to provide a ranking of the components or axes of the new subspace for A.
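The ranking can be sketched with `numpy.argsort`. The diagonal matrix below is a hypothetical covariance matrix chosen so that the eigenvalues come back out of order; the important detail is that the same index order is applied to the eigenvalues and to the columns of the eigenvector matrix.

```python
import numpy as np

# Hypothetical covariance matrix with known eigenvalues 0.5, 3.0 and 1.5.
V = np.diag([0.5, 3.0, 1.5])
values, vectors = np.linalg.eig(V)

# Indices that sort the eigenvalues from largest to smallest.
order = np.argsort(values)[::-1]

# Reorder the eigenvalues and the matching eigenvector columns together.
values_sorted = values[order]
vectors_sorted = vectors[:, order]

print(values_sorted)
print(vectors_sorted)
```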

If all eigenvalues have a similar value, then we know that the existing representation may already be reasonably compressed or dense and that the projection may offer little. If there are eigenvalues close to zero, they represent components or axes of B that may be discarded.

A total of m or fewer components must be selected to comprise the chosen subspace. Ideally, we would select k eigenvectors, called principal components, that have the k largest eigenvalues.
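One common heuristic for choosing k, shown here only as a sketch, is to keep the smallest number of components whose eigenvalues account for a chosen share of the total variance. The eigenvalues and the 80% threshold below are hypothetical.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in descending order.
values = np.array([4.0, 3.0, 2.0, 1.0])

# Share of the total variance carried by each component.
ratio = values / values.sum()
cumulative = np.cumsum(ratio)

# Smallest k whose cumulative share reaches the (hypothetical) threshold.
threshold = 0.8
k = int(np.searchsorted(cumulative, threshold) + 1)

print(ratio)
print(k)
```

With these values the first two components cover about 70% of the variance, so three are needed to reach the threshold.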

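Other matrix decomposition methods can be used, such as Singular-Value Decomposition, or SVD. In that case, the values are generally referred to as singular values, while the vectors of the subspace are still referred to as principal components. For centered data with n rows, each eigenvalue of the covariance matrix equals the square of the corresponding singular value divided by n - 1.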
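The SVD route can be sketched as follows, assuming the data has already been centered. The rows of `Vt` play the role of the principal components, and squaring the singular values and dividing by n - 1 recovers the eigenvalues of the covariance matrix. `numpy.linalg.eigvalsh` is used for the comparison because the covariance matrix is symmetric.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
C = A - A.mean(axis=0)

# SVD of the centered data; the rows of Vt are the principal components.
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Convert singular values to covariance eigenvalues.
eigenvalues_from_svd = s ** 2 / (C.shape[0] - 1)

# Eigenvalues computed directly from the covariance matrix (descending order).
values = np.linalg.eigvalsh(np.cov(C, rowvar=False))[::-1]

print(s)
print(eigenvalues_from_svd)
print(values)
```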

Once chosen, data can be projected into the subspace via matrix multiplication.

    P = B^T . A

Where A is the original data that we wish to project (centered, as described above), B^T is the transpose of the chosen principal components and P is the projection of A.
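The shapes involved in the projection are easy to get wrong, so here is a small sketch. It assumes the centered data C has one observation per row (n x m) and that B holds one unit-length component per column (m x k); both arrays are illustrative. Multiplying B^T by the transposed data gives a k x n result, which is the transpose of the more familiar row-oriented form C . B.

```python
import numpy as np

# Illustrative centered data, n x m (3 observations, 2 features).
C = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])

# Illustrative component matrix, m x k (one unit-length component).
B = np.array([[1.0], [1.0]]) / np.sqrt(2.0)

# B^T times the data with observations as columns: result is k x n.
P = B.T @ C.T

# Equivalent form with observations kept as rows: result is n x k.
P_rows = C @ B

print(P)
print(P_rows)
```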

This is called the covariance method for calculating the PCA, although there are alternative ways to calculate it.

Manually Calculate Principal Component Analysis

There is no pca() function in NumPy, but we can easily calculate the Principal Component Analysis step-by-step using NumPy functions.

The example below defines a small 3×2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then the eigendecomposition of the covariance matrix. The eigenvectors are taken as the principal components, with the eigenvalues giving the variance along each, and are used to project the centered data.
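A minimal sketch of such an example follows, assuming the 3×2 matrix [[1, 2], [3, 4], [5, 6]]. The signs of the eigenvectors, and therefore of the projected values, can differ between platforms and library versions.

```python
import numpy as np

# define a small 3x2 matrix (3 observations, 2 features)
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A)

# calculate the mean of each column
M = A.mean(axis=0)

# center columns by subtracting column means
C = A - M

# calculate covariance matrix of centered matrix (columns are variables)
V = np.cov(C, rowvar=False)

# eigendecomposition of covariance matrix
values, vectors = np.linalg.eig(V)
print(vectors)
print(values)

# project the centered data: P = B^T . C^T, then transpose back to rows
P = vectors.T.dot(C.T)
print(P.T)
```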

Running the example first prints the original matrix, then the eigenvectors and eigenvalues of the centered covariance matrix, followed finally by the projection of the original matrix.

Interestingly, the second eigenvalue is zero, so only the first eigenvector is required. This suggests that we could project our 3×2 matrix onto a 3×1 matrix with little loss.
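To illustrate the point about loss, the sketch below keeps only the component with the largest eigenvalue, projects the data to 3×1, and then maps it back to the original space. It assumes the matrix [[1, 2], [3, 4], [5, 6]] and uses `numpy.linalg.eigh` in place of `eig`, which is a substitution that is valid because a covariance matrix is symmetric.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
M = A.mean(axis=0)
C = A - M

# eigh is intended for symmetric matrices and returns real eigenvalues.
values, vectors = np.linalg.eigh(np.cov(C, rowvar=False))

# Keep only the eigenvector with the largest eigenvalue (m x 1).
order = np.argsort(values)[::-1]
B = vectors[:, order[:1]]

# Project to a single column, then reconstruct in the original space.
P = C @ B
A_restored = P @ B.T + M

print(P.shape)
print(A_restored)
```

Because the discarded eigenvalue is zero for this particular matrix, the reconstruction matches the original data up to floating point error.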

Reusable Principal Component Analysis

We can calculate a Principal Component Analysis on a dataset using the PCA() class in the scikit-learn library. The benefit of this approach is that once the projection is calculated, it can be applied to new data again and again quite easily.

When creating the class, the number of components can be specified as a parameter.

The class is first fit on a dataset by calling the fit() function, and then the original dataset or other data can be projected into a subspace with the chosen number of dimensions by calling the transform() function.
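As a sketch of the fit-then-reuse pattern, the example below fits on one small matrix and then projects rows that were not part of the fit. The `new_rows` values are hypothetical. The final check relies on the documented `mean_` and `components_` attributes, and assumes the default `whiten=False`.

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]])

# Keep a single component.
pca = PCA(n_components=1)
pca.fit(A)

# Hypothetical unseen rows, projected with the already-fitted transform.
new_rows = np.array([[7, 8], [2, 1]])
projected = pca.transform(new_rows)
print(projected)

# transform() centers with the training mean, then multiplies by the components.
manual = (new_rows - pca.mean_) @ pca.components_.T
print(np.allclose(projected, manual))
```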

Once fit, the eigenvalues and principal components can be accessed on the PCA class via the explained_variance_ and components_ attributes.

The example below demonstrates using this class by first creating an instance, fitting it on a 3×2 matrix, accessing the values and vectors of the projection, and transforming the original data.
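A minimal sketch of such an example follows, assuming the 3×2 matrix [[1, 2], [3, 4], [5, 6]] and two components. The variable name `projected` is a choice made here for clarity.

```python
import numpy as np
from sklearn.decomposition import PCA

# define a 3x2 matrix
A = np.array([[1, 2], [3, 4], [5, 6]])
print(A)

# create the PCA instance, keeping 2 components
pca = PCA(n_components=2)

# fit on data
pca.fit(A)

# access values and vectors: one component per ROW of components_
print(pca.components_)
print(pca.explained_variance_)

# transform data
projected = pca.transform(A)
print(projected)
```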

Running the example first prints the 3×2 data matrix, then the principal components and values, followed by the projection of the original matrix.

We can see that, apart from some very minor floating point rounding, we achieve the same principal components, eigenvalues, and projection as in the previous example.
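The agreement between the two approaches can also be checked in code. The sketch below compares absolute values, on the assumption that the two methods may return components with opposite signs, and uses `numpy.linalg.eigh` for the manual side because the covariance matrix is symmetric.

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Manual route: covariance matrix, eigendecomposition, descending sort.
C = A - A.mean(axis=0)
values, vectors = np.linalg.eigh(np.cov(C, rowvar=False))
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

# scikit-learn route.
pca = PCA(n_components=2).fit(A)

# Eigenvalues should match explained_variance_.
same_values = np.allclose(values, pca.explained_variance_)

# components_ stores one component per row; compare up to sign.
same_vectors = np.allclose(np.abs(vectors.T), np.abs(pca.components_))

print(same_values, same_vectors)
```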


Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Re-run the examples with your own small contrived matrix values.
  • Load a dataset and calculate the PCA on it and compare the results from the two methods.
  • Search for and locate 10 examples where PCA has been used in machine learning papers.

If you explore any of these extensions, I’d love to know.

Summary

In this tutorial, you discovered the Principal Component Analysis machine learning method for dimensionality reduction.

Specifically, you learned:

  • The procedure for calculating the Principal Component Analysis and how to choose principal components.
  • How to calculate the Principal Component Analysis from scratch in NumPy.
  • How to calculate the Principal Component Analysis for reuse on more data in scikit-learn.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

43 Responses to How to Calculate Principal Component Analysis (PCA) from Scratch in Python

  1. John W March 2, 2018 at 1:38 pm #

    Great article! I have been more of an R programmer in the past but have started to mess with Python. Python is a very versatile language and has started to draw my attention over the last few months.

    • Jason Brownlee March 2, 2018 at 3:25 pm #

      Thanks John. I’m a big fan of Python myself these days.

  2. Saeed Ullah March 2, 2018 at 3:24 pm #

    Hello Jason, it’s very nice you are doing great work and I request you to make such a post on ISOMAP Dimensionality Reduction too..

  3. john March 6, 2018 at 8:24 am #


    Could you make a post on the Scree plot ?

    Thank you

  4. Ranjeet Singh March 8, 2018 at 5:57 pm #

    Is there any direct relation between SVD and PCA since both perform dimensionality reduction?

    • Jason Brownlee March 9, 2018 at 6:21 am #

      Yes, they both can be used for dimensionality reduction.

  5. Kaviyarasi March 19, 2018 at 8:01 pm #

    Can we apply this for loaded file .csv format?

  6. kris April 13, 2018 at 8:35 pm #

    Hi Jason, thanks for the great work you are doing with your blog!

    I think the attribute “explained_variance_” of the PCA class from scikit-learn returns the eigenvalues and not the singular values as you mention in the section “Reusable Principal Component Analysis”. For the singular values there is another attribute which is “singular_values_”. Correct?

    Also, “single values” should read “eigenvalues” in the sentence “…that we achieve the same principal components, singular values, and projection as in…”. Correct?

    • Jason Brownlee April 14, 2018 at 6:40 am #

      Correct, fixed.

      Thanks for pointing out the typo!

  7. Baron May 3, 2018 at 10:33 pm #

    Hello teacher, can you help me? I want to know how to implement a CPA?

    • Jason Brownlee May 4, 2018 at 7:44 am #

      What is CPA?

      • Baron May 10, 2018 at 9:10 pm #

        I’m sorry. I mean PCA

        • Surya May 2, 2019 at 6:55 pm #

          I think he has explained that in tutorial

  8. Gravey May 15, 2018 at 11:27 pm #

    Hi Jason,

    Is there similar support for R or Matlab users? I’m trying to find a workshop / training in this area, if you could recommend anything that may help.

  9. Mohammad June 18, 2018 at 11:23 am #

    Great post!

    I found a typo: In the initial explanation, it’s said:
    P = B^T . A

    In the manual calculation:
    P =

    Which one is correct? The original A or the mean-centered C?

    • Jason Brownlee June 18, 2018 at 3:10 pm #

      No typo, perhaps confusing explanation.

      B == vectors (components)
      A == C (centered data to project)

  10. Martin Power October 13, 2018 at 9:02 pm #

    When I copy the code from section “Reusable Principal Component Analysis” and run in a Jupyter notebook with a Python3.6 kernel, I get a different output to what is shown on site.

    The values for the Eigenvectors and Matrix B are the same but the polarity is not the same.

    Any idea what is causing the mismatch?

    [[1 2]
    [3 4]
    [5 6]]
    [[ 0.70710678 0.70710678]
    [-0.70710678 0.70710678]]
    [8. 0.]
    [[-2.82842712e+00 -2.22044605e-16]
    [ 0.00000000e+00 0.00000000e+00]
    [ 2.82842712e+00 2.22044605e-16]]

    • Jason Brownlee October 14, 2018 at 6:03 am #

      Yes, I address this in the post.

      Minor differences and differences in sign can occur due to differences across platforms from multiple runs of the solver (used under the covers).

      These matrix operations require converging a solution, they are not entirely deterministic like arithmetic, we are approximating.
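One way to see why the sign can differ is that an eigenvector is only defined up to sign: if v satisfies V . v = lambda . v, then so does -v. The sketch below checks this for the covariance matrix of the 3×2 matrix shown above, so either sign is an equally valid answer from a solver.

```python
import numpy as np

# Covariance matrix of [[1, 2], [3, 4], [5, 6]].
V = np.array([[4.0, 4.0], [4.0, 4.0]])

# A unit-length eigenvector with eigenvalue 8.
v = np.array([1.0, 1.0]) / np.sqrt(2.0)
lam = 8.0

# Both v and -v satisfy the eigenvector equation.
ok_pos = np.allclose(V @ v, lam * v)
ok_neg = np.allclose(V @ (-v), lam * (-v))

print(ok_pos, ok_neg)
```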

  11. RB October 19, 2018 at 7:53 am #

    Is there a way to store the PCA model after fit() during training and reuse that model later (by loading from saved file) on live data ?

    • Jason Brownlee October 19, 2018 at 10:57 am #

      Yes, you can save the elements to file in plain text or as pickled python objects.
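A minimal sketch of the pickle approach follows. The file name and the `live` row are hypothetical, and a temporary directory is used only to keep the example self-contained. Pickled models should only be loaded from sources you trust, ideally with the same scikit-learn version that saved them.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]])
pca = PCA(n_components=1).fit(A)

# Hypothetical file location for the saved model.
path = os.path.join(tempfile.mkdtemp(), "pca.pkl")

# Save the fitted model to file.
with open(path, "wb") as f:
    pickle.dump(pca, f)

# Later: load the model and apply it to new data.
with open(path, "rb") as f:
    restored = pickle.load(f)

live = np.array([[7, 8]])
print(restored.transform(live))
```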

  12. Sanjay November 22, 2018 at 1:00 pm #

    Hi Jason

    while computing the mean, shouldn’t the axis be equal to 0 rather than 1? since each dimension or feature must be averaged rather than each data point

    • Jason Brownlee November 22, 2018 at 2:12 pm #

      I believe 0 would be row-wise, 1 is column wise
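The axis semantics are easy to mix up, so a quick check may help. In NumPy, `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row. Taking `axis=1` on the transposed matrix therefore gives the column means again.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

# One mean per column (per feature).
col_means = A.mean(axis=0)

# One mean per row (per observation).
row_means = A.mean(axis=1)

# Transposing first swaps the roles of the two axes.
col_means_via_transpose = A.T.mean(axis=1)

print(col_means)
print(row_means)
print(col_means_via_transpose)
```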

  13. uluc December 31, 2018 at 1:20 am #

    This is not from scratch at all. Calculating the covariance matrix and its eigenvalue decomposition is an important part, which this tutorial skips totally.

  14. Yogesh February 1, 2019 at 6:25 pm #

    HI Jason,

    I have a doubt: are you saying that PCA with eigenvectors and PCA with SVD are different, or did I understand wrong?

    secondly can we use together ?

  15. Venkat February 18, 2019 at 7:57 pm #

    Hi Jason,

    Can you extend PCA and Hotelling’s T^2 for confidence interval in python.


  16. Al February 22, 2019 at 5:41 am #

    Hi Jason, I found that extracting the top principal components explaining 90% of the variance boosted my h2o.deeplearning model to a large degree, to 99%+ overall accuracy, AUC, tpr and npr. The result is so good once the model is applied to my test set that it looks unreal (basically only one misprediction out of 1k+ observations in my confusion matrix). I am not well versed in the orthogonal transformations underlying PCA, but I was wondering: would PCA be the cause of overfitting on my data set? How is it possible to get such an amazing result? How reliable would my model be on future, unseen observations?

    • Jason Brownlee February 22, 2019 at 6:27 am #

      Yes, the transform must be calculated on the train dataset only, then applied to train and test sets.
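As a sketch of that advice, the example below fits the PCA on the training rows only and then applies the same fitted transform to both splits. The random data, the 80/20 split, and the choice of two components are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 100 rows, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_train, X_test = X[:80], X[80:]

# Fit on the training rows only.
pca = PCA(n_components=2)
pca.fit(X_train)

# Apply the same fitted projection to both splits.
train_projected = pca.transform(X_train)
test_projected = pca.transform(X_test)

print(train_projected.shape, test_projected.shape)
```

The test rows never influence the column means or the components, which avoids leaking information from the test set into the model.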

      • Al February 23, 2019 at 3:58 am #

        I see what you mean. Thanks!

  17. Samim April 30, 2019 at 6:22 pm #

    Could you please explain more about and pca.transform what exactly is happening when we call these two ?

    • Jason Brownlee May 1, 2019 at 7:01 am #

      Great question, fit is converging on a solution, e.g. finding the eigenvectors and eigenvalues.

      It might help to check the API documentation.

  18. Elvis Dennis June 4, 2019 at 12:32 am #

    What is the difference between Split Zone design and Split Plot design?

    • Jason Brownlee June 4, 2019 at 7:52 am #

      I have not heard these terms before, sorry.

      What is the content?

  19. Rajshree June 13, 2019 at 9:55 pm #

    Amazing description Sir, but in the manual computation of PCA I’m having a different dataset having 1140 eigen vectors and want only 100 of them corresponding to their eigen values. So, how to choose the components and form the feature vector.
