How To Implement Naive Bayes From Scratch in Python

The Naive Bayes algorithm is simple and effective and should be one of the first methods you try on a classification problem.

In this tutorial you are going to learn about the Naive Bayes algorithm including how it works and how to implement it from scratch in Python.

Update: Check out the follow-up on tips for using the Naive Bayes algorithm, titled “Better Naive Bayes: 12 Tips To Get The Most From The Naive Bayes Algorithm”.

Naive Bayes Classifier
Photo by Matt Buck, some rights reserved

About Naive Bayes

The Naive Bayes algorithm is an intuitive method that uses the probabilities of each attribute belonging to each class to make a prediction. It is the supervised learning approach you would come up with if you wanted to model a predictive modeling problem probabilistically.

Naive Bayes simplifies the calculation of probabilities by assuming that the probability of each attribute belonging to a given class value is independent of all other attributes. This is a strong assumption, but it results in a fast and effective method.

The probability of a class value given a value of an attribute is called the conditional probability. By multiplying the conditional probabilities together for each attribute for a given class value, we have a probability of a data instance belonging to that class.

To make a prediction we can calculate probabilities of the instance belonging to each class and select the class value with the highest probability.

Naive Bayes is often described using categorical data because it is easy to describe and calculate using ratios. A more useful version of the algorithm for our purposes supports numeric attributes and assumes the values of each numerical attribute are normally distributed (fall somewhere on a bell curve). Again, this is a strong assumption, but it still gives robust results.

Predict the Onset of Diabetes

The test problem we will use in this tutorial is the Pima Indians Diabetes problem.

This problem comprises 768 observations of medical details for Pima Indian patients. The records describe instantaneous measurements taken from the patient, such as their age, the number of times pregnant and blood workup. All patients are women aged 21 or older. All attributes are numeric, and their units vary from attribute to attribute.

Each record has a class value that indicates whether the patient suffered an onset of diabetes within 5 years of when the measurements were taken (1) or not (0).

This is a standard dataset that has been studied a lot in machine learning literature. A good prediction accuracy is 70%-76%.

Below is a sample from the file to get a sense of the data we will be working with.

NOTE: Download this file and save it with a .csv extension. See this file for a description of all the attributes.

Naive Bayes Algorithm Tutorial

This tutorial is broken down into the following steps:

  1. Handle Data: Load the data from CSV file and split it into training and test datasets.
  2. Summarize Data: Summarize the properties in the training dataset so that we can calculate probabilities and make predictions.
  3. Make a Prediction: Use the summaries of the dataset to generate a single prediction.
  4. Make Predictions: Generate predictions given a test dataset and a summarized training dataset.
  5. Evaluate Accuracy: Evaluate the accuracy of predictions made for a test dataset as the percentage correct out of all predictions made.
  6. Tie it Together: Use all of the code elements to present a complete and standalone implementation of the Naive Bayes algorithm.

1. Handle Data

The first thing we need to do is load our data file. The data is in CSV format without a header line or any quotes. We can open the file with the open function and read the data lines using the reader function in the csv module.

We also need to convert the attributes that were loaded as strings into numbers that we can work with. Below is the loadCsv() function for loading the Pima Indians dataset.
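A minimal sketch of such a function is shown here, assuming Python 3 and a file in which every field parses as a number. The two-row temporary file is made up purely so the snippet runs on its own; substitute the path of your downloaded dataset.

```python
import csv
import os
import tempfile


def loadCsv(filename):
    """Load a headerless CSV file as a list of rows, each a list of floats."""
    with open(filename, newline="") as handle:
        # Skip empty lines and convert every field from string to float.
        return [[float(value) for value in row] for row in csv.reader(handle) if row]


# Hypothetical sample file (illustrative values only, not the real dataset).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1,20,0\n2,21,1\n")
dataset = loadCsv(tmp.name)
os.remove(tmp.name)
print("Loaded data file with {0} rows".format(len(dataset)))
# Loaded data file with 2 rows
```

Any non-numeric field makes float() raise a ValueError, which is usually what you want for a dataset that is supposed to be entirely numeric.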

We can test this function by loading the Pima Indians dataset and printing the number of data instances that were loaded.

Running this test, you should see something like:

Next we need to split the data into a training dataset that Naive Bayes can use to make predictions and a test dataset that we can use to evaluate the accuracy of the model. We need to split the dataset randomly into train and test datasets with a ratio of 67% train and 33% test (this is a common ratio for testing an algorithm on a dataset).

Below is the splitDataset() function that will split a given dataset into a given split ratio.
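One way to write it, as a sketch: rows are popped at random from a copy of the dataset, so no row can land in both sets. The five-row mock dataset is illustrative only.

```python
import random


def splitDataset(dataset, splitRatio):
    """Randomly move rows into a training set until it holds splitRatio of the data."""
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    remaining = list(dataset)  # copy, so the caller's list is left untouched
    while len(trainSet) < trainSize:
        index = random.randrange(len(remaining))
        trainSet.append(remaining.pop(index))  # pop removes the row: no duplicates
    return [trainSet, remaining]


dataset = [[1], [2], [3], [4], [5]]
train, test = splitDataset(dataset, 0.67)
print("Split {0} rows into train with {1} and test with {2}".format(len(dataset), train, test))
```

Because the split is random, which rows end up where (and any accuracy measured later) changes from run to run unless you call random.seed() with a fixed value first.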

We can test this out by defining a mock dataset with 5 instances, split it into training and testing datasets and print them out to see which data instances ended up where.

Running this test, you should see something like:

2. Summarize Data

The Naive Bayes model consists of a summary of the data in the training dataset. This summary is then used when making predictions.

The summary of the training data collected involves the mean and the standard deviation for each attribute, by class value. For example, if there are two class values and 7 numerical attributes, then we need a mean and standard deviation for each attribute (7) and class value (2) combination, that is 14 attribute summaries.

These are required when making predictions to calculate the probability of specific attribute values belonging to each class value.

We can break the preparation of this summary data down into the following sub-tasks:

  1. Separate Data By Class
  2. Calculate Mean
  3. Calculate Standard Deviation
  4. Summarize Dataset
  5. Summarize Attributes By Class

Separate Data By Class

The first task is to separate the training dataset instances by class value so that we can calculate statistics for each class. We can do that by creating a map of each class value to a list of instances that belong to that class and sort the entire dataset of instances into the appropriate lists.

The separateByClass() function below does just this.
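A possible implementation, sketched with a plain dictionary; the three sample rows are invented for illustration.

```python
def separateByClass(dataset):
    """Group rows by class value, taken from the last element of each row."""
    separated = {}
    for vector in dataset:
        classValue = vector[-1]
        # Create the list for a class the first time it is seen, then append.
        separated.setdefault(classValue, []).append(vector)
    return separated


dataset = [[1, 20, 1], [2, 21, 0], [3, 22, 1]]
separated = separateByClass(dataset)
print("Separated instances: {0}".format(separated))
# Separated instances: {1: [[1, 20, 1], [3, 22, 1]], 0: [[2, 21, 0]]}
```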

You can see that the function assumes that the last attribute (-1) is the class value. The function returns a map of class values to lists of data instances.

We can test this function with some sample data, as follows:

Running this test, you should see something like:

Calculate Mean

We need to calculate the mean of each attribute for a class value. The mean is the middle or central tendency of the data, and we will use it as the center of our Gaussian distribution when calculating probabilities.

We also need to calculate the standard deviation of each attribute for a class value. The standard deviation describes the variation or spread of the data, and we will use it to characterize the expected spread of each attribute in our Gaussian distribution when calculating probabilities.

The standard deviation is calculated as the square root of the variance. The variance is calculated as the average of the squared differences for each attribute value from the mean. Note we are using the N-1 method, which subtracts 1 from the number of attribute values when calculating the variance.
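Both statistics fit in a few lines. This is a sketch that follows the N-1 description above directly; it assumes each list holds at least two numbers.

```python
import math


def mean(numbers):
    """Arithmetic mean of a list of numbers."""
    return sum(numbers) / float(len(numbers))


def stdev(numbers):
    """Sample standard deviation (divides by N-1, as described above)."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1)
    return math.sqrt(variance)


numbers = [1, 2, 3, 4, 5]
print("Summary of {0}: mean={1}, stdev={2}".format(numbers, mean(numbers), stdev(numbers)))
# mean is 3.0; stdev is the square root of 2.5, roughly 1.58
```

Note that a list with a single value makes len(numbers) - 1 equal to zero, so stdev() raises a ZeroDivisionError. That can happen if a class has only one instance in the training data.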

We can test this by taking the mean of the numbers from 1 to 5.

Running this test, you should see something like:

Summarize Dataset

Now we have the tools to summarize a dataset. For a given list of instances (for a class value) we can calculate the mean and the standard deviation for each attribute.

The zip function groups the values for each attribute across our data instances into their own lists so that we can compute the mean and standard deviation values for the attribute.
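As a sketch, summarize() can be a short wrapper around mean() and stdev(), which are repeated here so the snippet runs on its own. It assumes the class value is the last column and drops its summary.

```python
import math


def mean(numbers):
    return sum(numbers) / float(len(numbers))


def stdev(numbers):
    avg = mean(numbers)
    return math.sqrt(sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1))


def summarize(dataset):
    """Return one (mean, stdev) pair per attribute column."""
    # zip(*dataset) transposes the rows into per-attribute columns.
    summaries = [(mean(column), stdev(column)) for column in zip(*dataset)]
    del summaries[-1]  # last column is the class value, not an attribute
    return summaries


dataset = [[1, 20, 0], [2, 40, 1], [3, 60, 0]]
summary = summarize(dataset)
print("Attribute summaries: {0}".format(summary))
# Attribute summaries: [(2.0, 1.0), (40.0, 20.0)]
```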

We can test this summarize() function with some test data that shows markedly different mean and standard deviation values for the first and second data attributes.

Running this test, you should see something like:

Summarize Attributes By Class

We can pull it all together by first separating our training dataset into instances grouped by class. Then calculate the summaries for each attribute.
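The sketch below combines the earlier pieces into summarizeByClass(). The helpers are repeated so it is self-contained, and the four-row dataset is made up.

```python
import math


def mean(numbers):
    return sum(numbers) / float(len(numbers))


def stdev(numbers):
    avg = mean(numbers)
    return math.sqrt(sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1))


def separateByClass(dataset):
    separated = {}
    for vector in dataset:
        separated.setdefault(vector[-1], []).append(vector)
    return separated


def summarize(dataset):
    summaries = [(mean(column), stdev(column)) for column in zip(*dataset)]
    del summaries[-1]  # drop the summary of the class column
    return summaries


def summarizeByClass(dataset):
    """Map each class value to the (mean, stdev) pairs of its attributes."""
    separated = separateByClass(dataset)
    return {classValue: summarize(rows) for classValue, rows in separated.items()}


dataset = [[1, 20, 1], [2, 21, 0], [3, 22, 1], [4, 22, 0]]
summary = summarizeByClass(dataset)
print("Summary by class value: {0}".format(summary))
```

The result is a dictionary with one entry per class value, each holding one (mean, stdev) pair per attribute.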

We can test this summarizeByClass() function with a small test dataset.

Running this test, you should see something like:

3. Make Prediction

We are now ready to make predictions using the summaries prepared from our training data. Making predictions involves calculating the probability that a given data instance belongs to each class, then selecting the class with the largest probability as the prediction.

We can divide this part into the following tasks:

  1. Calculate Gaussian Probability Density Function
  2. Calculate Class Probabilities
  3. Make a Prediction
  4. Estimate Accuracy

Calculate Gaussian Probability Density Function

We can use a Gaussian function to estimate the probability of a given attribute value, given the known mean and standard deviation for the attribute estimated from the training data.

Given that the attribute summaries were prepared for each attribute and class value, the result is the conditional probability of a given attribute value given a class value.

See the references for the details of this equation for the Gaussian probability density function. In summary we are plugging our known details into the Gaussian (attribute value, mean and standard deviation) and reading off the likelihood that our attribute value belongs to the class.

In the calculateProbability() function we calculate the exponent first, then calculate the main division. This lets us fit the equation nicely on two lines.
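For reference, the density being evaluated is f(x) = (1 / (sqrt(2*pi) * stdev)) * exp(-(x - mean)^2 / (2 * stdev^2)). A two-line sketch of the function, with sample values chosen only for illustration:

```python
import math


def calculateProbability(x, mean, stdev):
    """Gaussian probability density of x for the given mean and standard deviation."""
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent


probability = calculateProbability(71.5, 73, 6.2)
print("Probability of belonging to this class: {0}".format(probability))
# roughly 0.0625
```

Strictly speaking the return value is a density rather than a probability, so it can exceed 1 when the standard deviation is small, for example calculateProbability(0, 0, 0.1). That does not matter for picking the most likely class, because only the relative sizes are compared.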

We can test this with some sample data, as follows.

Running this test, you should see something like:

Calculate Class Probabilities

Now that we can calculate the probability of an attribute belonging to a class, we can combine the probabilities of all of the attribute values for a data instance and come up with a probability of the entire data instance belonging to the class.

We combine probabilities together by multiplying them. In the calculateClassProbabilities() function below, the probability of a given data instance is calculated by multiplying together the attribute probabilities for each class. The result is a map of class values to probabilities.
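A sketch of that multiplication follows. The summaries dictionary is hand-written here (one attribute per class) rather than computed from data, and the trailing "?" stands in for the unknown class value.

```python
import math


def calculateProbability(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent


def calculateClassProbabilities(summaries, inputVector):
    """Multiply the per-attribute densities together for each class."""
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1.0
        # zip stops at the shorter sequence, so the class slot at the end
        # of inputVector is never paired with a summary.
        for (mu, sigma), x in zip(classSummaries, inputVector):
            probabilities[classValue] *= calculateProbability(x, mu, sigma)
    return probabilities


summaries = {0: [(1, 0.5)], 1: [(20, 5.0)]}  # hypothetical (mean, stdev) pairs
inputVector = [1.1, "?"]
probabilities = calculateClassProbabilities(summaries, inputVector)
print("Probabilities for each class: {0}".format(probabilities))
```

This follows the description above literally: it multiplies only the attribute densities and does not weight each class by how often it occurs in the training data.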

We can test the calculateClassProbabilities() function.

Running this test, you should see something like:

Make a Prediction

Now that we can calculate the probability of a data instance belonging to each class value, we can look for the largest probability and return the associated class.

The predict() function below does just that.
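As a sketch, predict() only needs to pick the key with the largest value. The helpers are repeated so the snippet runs alone, and the class labels "A" and "B" are made up.

```python
import math


def calculateProbability(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent


def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1.0
        for (mu, sigma), x in zip(classSummaries, inputVector):
            probabilities[classValue] *= calculateProbability(x, mu, sigma)
    return probabilities


def predict(summaries, inputVector):
    """Return the class value whose combined probability is largest."""
    probabilities = calculateClassProbabilities(summaries, inputVector)
    return max(probabilities, key=probabilities.get)


summaries = {"A": [(1, 0.5)], "B": [(20, 5.0)]}  # hypothetical summaries
result = predict(summaries, [1.1, "?"])
print("Prediction: {0}".format(result))
# Prediction: A
```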

We can test the predict() function as follows:

Running this test, you should see something like:

4. Make Predictions

Finally, we can estimate the accuracy of the model by making predictions for each data instance in our test dataset. The getPredictions() function will do this and return a list of predictions, one for each test instance.
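A sketch of getPredictions() as a loop over predict(); the per-class multiplication is folded into predict() here only to keep the snippet short, and the summaries and test rows are invented.

```python
import math


def calculateProbability(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent


def predict(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1.0
        for (mu, sigma), x in zip(classSummaries, inputVector):
            probabilities[classValue] *= calculateProbability(x, mu, sigma)
    return max(probabilities, key=probabilities.get)


def getPredictions(summaries, testSet):
    """Predict a class for every row of the test set, in order."""
    return [predict(summaries, row) for row in testSet]


summaries = {"A": [(1, 0.5)], "B": [(20, 5.0)]}  # hypothetical summaries
testSet = [[1.1, "?"], [19.1, "?"]]
predictions = getPredictions(summaries, testSet)
print("Predictions: {0}".format(predictions))
# Predictions: ['A', 'B']
```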

We can test the getPredictions() function.

Running this test, you should see something like:

5. Get Accuracy

The predictions can be compared to the class values in the test dataset and a classification accuracy can be calculated as a percentage between 0% and 100%. The getAccuracy() function will calculate this accuracy.
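A sketch of the calculation, assuming the true class is the last element of each test row; the sample rows and predictions are invented.

```python
def getAccuracy(testSet, predictions):
    """Percentage of rows whose last element matches the prediction."""
    correct = sum(1 for row, guess in zip(testSet, predictions) if row[-1] == guess)
    return (correct / float(len(testSet))) * 100.0


testSet = [[1, 1, 1, "a"], [2, 2, 2, "a"], [3, 3, 3, "b"]]
predictions = ["a", "a", "a"]
accuracy = getAccuracy(testSet, predictions)
print("Accuracy: {0}".format(accuracy))
# two of three correct, so roughly 66.67
```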

We can test the getAccuracy() function using the sample code below.

Running this test, you should see something like:

6. Tie it Together

Finally, we need to tie it all together.

Below is the full code listing for Naive Bayes implemented from scratch in Python.
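The following is a condensed, self-contained sketch of the whole pipeline rather than a definitive listing. To run without any file it uses synthetic two-attribute data drawn from two Gaussians (an assumption for illustration); swap in rows loaded from your CSV file to work on the diabetes data. Some helpers are folded together to keep it short.

```python
import math
import random


def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet, remaining = [], list(dataset)
    while len(trainSet) < trainSize:
        trainSet.append(remaining.pop(random.randrange(len(remaining))))
    return trainSet, remaining


def mean(numbers):
    return sum(numbers) / float(len(numbers))


def stdev(numbers):
    avg = mean(numbers)
    return math.sqrt(sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1))


def summarizeByClass(dataset):
    separated = {}
    for vector in dataset:
        separated.setdefault(vector[-1], []).append(vector)
    # Per class: one (mean, stdev) pair per attribute, class column dropped.
    return {classValue: [(mean(col), stdev(col)) for col in list(zip(*rows))[:-1]]
            for classValue, rows in separated.items()}


def calculateProbability(x, mu, sigma):
    exponent = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * exponent


def predict(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1.0
        for (mu, sigma), x in zip(classSummaries, inputVector):
            probabilities[classValue] *= calculateProbability(x, mu, sigma)
    return max(probabilities, key=probabilities.get)


def getAccuracy(testSet, predictions):
    correct = sum(1 for row, guess in zip(testSet, predictions) if row[-1] == guess)
    return correct / float(len(testSet)) * 100.0


random.seed(1)  # fixed seed so repeated runs give the same data and split
# Synthetic stand-in data: class 0 is centred on 0, class 1 on 5.
dataset = [[random.gauss(5 * label, 1), random.gauss(5 * label, 1), label]
           for label in (0, 1) for _ in range(100)]
trainingSet, testSet = splitDataset(dataset, 0.67)
summaries = summarizeByClass(trainingSet)
predictions = [predict(summaries, row) for row in testSet]
accuracy = getAccuracy(testSet, predictions)
print("Split {0} rows into train={1} and test={2} rows".format(
    len(dataset), len(trainingSet), len(testSet)))
print("Accuracy: {0}%".format(accuracy))
```

The synthetic classes are far apart, so accuracy here will be much higher than on the diabetes data; the point of the sketch is only to show the pieces working together.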

Running the example provides output like the following:

Implementation Extensions

This section provides you with ideas for extensions that you could apply and investigate with the Python code you have implemented as part of this tutorial.

You have implemented your own version of Gaussian Naive Bayes in Python from scratch.

You can extend the implementation further.

  • Calculate Class Probabilities: Update the example to summarize the probabilities of a data instance belonging to each class as a ratio. This can be calculated as the probability of a data instance belonging to one class, divided by the sum of the probabilities of the data instance belonging to each class. For example, if an instance had a probability of 0.02 for class A and 0.001 for class B, the likelihood of the instance belonging to class A is (0.02/(0.02+0.001))*100, which is about 95.24%.
  • Log Probabilities: The conditional probabilities for each class given an attribute value are small. When they are multiplied together they result in very small values, which can lead to floating point underflow (numbers too small to represent in Python). A common fix for this is to combine the log of the probabilities together. Research and implement this improvement.
  • Nominal Attributes: Update the implementation to support nominal attributes. This is very similar, and the summary information you can collect for each attribute is the ratio of category values for each class. Dive into the references for more information.
  • Different Density Function (Bernoulli or multinomial): We have looked at Gaussian Naive Bayes, but you can also look at other distributions. Implement a different distribution such as multinomial, Bernoulli or kernel Naive Bayes that makes different assumptions about the distribution of attribute values and/or their relationship with the class value.
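The first two extensions can be sketched together. The names logGaussian, classLogScores, normalize and the priors argument are hypothetical, introduced only for this illustration. Weighting each class by a prior (how often it occurs) goes beyond what the tutorial describes, and equal priors reproduce the unweighted behaviour.

```python
import math


def logGaussian(x, mu, sigma):
    """Log of the Gaussian density, computed without exponentiating."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - ((x - mu) ** 2) / (2 * sigma ** 2)


def classLogScores(summaries, priors, inputVector):
    """Sum log densities per class instead of multiplying densities."""
    scores = {}
    for classValue, classSummaries in summaries.items():
        score = math.log(priors[classValue])
        for (mu, sigma), x in zip(classSummaries, inputVector):
            score += logGaussian(x, mu, sigma)
        scores[classValue] = score
    return scores


def normalize(logScores):
    """Turn log scores into ratios that sum to 1 (log-sum-exp trick)."""
    top = max(logScores.values())
    # Subtracting the largest score first keeps exp() from underflowing to 0.
    weights = {c: math.exp(s - top) for c, s in logScores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


summaries = {"A": [(1, 0.5)], "B": [(20, 5.0)]}  # hypothetical summaries
priors = {"A": 0.5, "B": 0.5}                    # hypothetical equal priors
ratios = normalize(classLogScores(summaries, priors, [1.1]))
print("Class ratios: {0}".format(ratios))

# The 0.02 versus 0.001 example from the first bullet:
example = normalize({"A": math.log(0.02), "B": math.log(0.001)})
print("Class A: {0:.2f}%".format(example["A"] * 100))
# Class A: 95.24%
```

Working in log space means hundreds of small densities can be combined safely, since the sum of logs stays in a comfortable numeric range where the raw product would underflow.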

Resources and Further Reading

This section will provide some resources that you can use to learn more about the Naive Bayes algorithm in terms of both theory of how and why it works and practical concerns for implementing it in code.


More resources for learning about the problem of predicting the onset of diabetes.


This section links to open source implementations of Naive Bayes in popular machine learning libraries. Review these if you are considering implementing your own version of the method for operational use.


You may have one or more books on applied machine learning. This section highlights the sections or chapters in common applied books on machine learning that refer to Naive Bayes.

Next Step

Take action.

Follow the tutorial and implement Naive Bayes from scratch. Adapt the example to another problem. Follow the extensions and improve upon the implementation.

Leave a comment and share your experiences.

72 Responses to How To Implement Naive Bayes From Scratch in Python

  1. david jensen December 12, 2014 at 3:28 am #

    Statistical methods should be developed from scratch because of misunderstandings. Thank you.

  2. Anurag December 14, 2014 at 1:11 pm #

    This is a wonderful article. Your blog is one of those blogs that I visit everyday. Thanks for sharing this stuff. I had a question about the programming language that should be used for building these algorithms from scratch. I know that Python is widely used because it’s easy to write code by importing useful libraries that are already available. Nevertheless, I am a C++ guy. Although I am a beginner in practical ML, I have tried to write efficient codes before I started learning and implementing ML. Now I am aware of the complexities involved in coding if you’re using C++: more coding is to be done than what is required in Python. Considering that, what language is your preference and under what situations? I know that it’s lame to ask about preferences of programming language as it is essentially a personal choice. But still I’d like you to share your take on this. Also try to share the trade-offs while choosing these programming languages.

    Thank you.

  3. Alcides Schulz January 15, 2015 at 12:32 am #

    Hi Jason, found your website and read it in one day. Thank you, it really helped me to understand ML and what to do.
    I did the 2 examples here and I think I will take a look at scikit-learn now.
    I have a personal project that I want to use ML, and I’ll keep you posted on the progress.

    One small note on this post, is on the “1. Handle data” you refer to from previous post.

    Thank you so much, example is really good to show how to do it. Please keep it coming.

    • Jason Brownlee January 15, 2015 at 7:43 am #

      Thanks for the kind words Alcides.

      Fixed the reference to the iris dataset.

  4. toolate January 22, 2015 at 2:16 am #

    Hi Jason, still one more note on your post, is on the “1. Handle data” the flower measures that you refer to

  5. Tamilselvan February 4, 2015 at 11:37 pm #

    Great Article. Learned a Lot. Thanks. Thanks.

  6. Abhinav kumar February 23, 2015 at 8:13 pm #

    thank u

  7. Roy March 7, 2015 at 2:53 pm #

    Thanks for your nice article. I really appreciate the step by step instructions.

  8. malini March 17, 2015 at 7:19 pm #

    hello sir, plz tell me how to compare the data set using naive Bayes algorithm.

  9. Isha March 21, 2015 at 5:40 pm #

    Why does the accuracy change every time you run this code?
    when i tried running this code every time it gave me different accuracy percentage in the range from 70-78%
    Why is it so?
    Why is it not giving a constant accuracy percent?

    • Harry April 9, 2015 at 8:37 am #

      As Splitting of dataset into testdata and traindata is done using a random function accuracy varies.

  10. Sheepsy90 March 25, 2015 at 8:12 pm #

    Hey nice article – one question – why do you use the N-1 in the STD Deviation Process?

  11. Vaishali April 8, 2015 at 6:01 pm #

    Hey! Thanks a ton! This was very useful.
    It would be great if you give an idea on how other metrics like precision and recall can be calculated.


  12. Ashwin Perti April 24, 2015 at 5:28 pm #


    When I am running the same code in IDLE (python 2.7) – the code is working fine, but when I run the same code in eclipse. the error coming is:

    1) warning – unused variable dataset
    2) undefined variable dataset in for loop

    Why this difference.

  13. Melvin Tjon Akon May 21, 2015 at 1:46 am #

    Great post, Jason.
    For a MBA/LLM, it makes naive bayes very easy to understand and to implement in legal coding. Looking forward to read more. Best, Melvin

  14. Igor Balabine June 10, 2015 at 11:44 am #


    Great example. Thanks! One nit: “calculateProbability” is not a good name for a function which actually calculates Gaussian probability density – pdf value may be greater than 1.



    • - Ruud - November 26, 2016 at 2:23 am #

      Good point, thanks!

  15. Alex Ubot July 2, 2015 at 10:06 pm #

    Hi Jason,

    Fantastic post. I really learnt a lot. However I do have a question? Why don’ t you use the P(y) value in your calculateClassProbabilities() ?
    If I understood the model correctly, everything is based on the bayes theorem :
    P(y|x1….xn) = P(x1…..xn|y) * P(y) / P(x1……xn)
    P(x1……xn) will be a constant so we can get rid of it.
    Your post explain very well how to calculate P(x1……xn|y) (assumption made that x1…..xn are all independent we then have
    P(x1……xn|y) = P(x1|y) * …. P(xn|y) )
    How about p(y) ? I assume that we should calculate the frequency of the observation y in the training set and then multiply it to probabilities[classValue] so that we have :
    P(y|x1…..xn) = frequency(classValue) * probabilities[classValue]

    Otherwise let’ s assume that in a training set of 500 lines, we have two class 0 and 1 but observed 100 times 0 et 400 times 1. If we do not compute the frequency, then the probability may be biased, right ? Did I misunderstand something ? Hopefully my post is clear. I really hope that you will reply because I am a bit confused.


    • Babu February 28, 2016 at 7:43 am #

      I have the same question – why is multiplying by p(y) is omitted?

      • Babu March 10, 2016 at 2:09 pm #

        No Answer yet – no one on internet has answer to this.

        Just don’t want to accept answers without understanding it.

        • frong April 3, 2016 at 3:15 pm #

          Yeah, I have the same question too. Maybe P(y) is necessary, but why is the accuracy not so low when P(y) is missing? Does that prove the Bayes model is powerful?

          • gd April 7, 2016 at 2:27 am #


            I believe this is because P(y) = 1 as classes are already segregated before calculating P(x1…xn|Y).

            Can experts comment on this please?

          • Babu May 23, 2016 at 7:32 am #

            There is huge bug in this implementation;

            First of all the implementation using GaussianNB gives totally a different answer.
            Why is no one is replying even after 2 months of this.

            My concern is, there are so many more bad bayesians in a wrong concept.
            My lead read this article and now he thinks I am wrong

            At least the parameters are correct – something wrong with calculating probs.

            def SplitXy(Xy):
                Xy10 = Xy
                # print(Xy10)
                # print("========")
                zXy10 = list(zip(*Xy10))  # assumed step: transpose rows into columns
                y = zXy10[-1]
                del zXy10[-1]
                z1 = zip(*zXy10)  # assumed step: transpose back into rows
                X = [list(t) for t in z1]
                return X, y

            from math import sqrt
            from sklearn.naive_bayes import GaussianNB
            X, y = SplitXy(trainingSet)
            Xt, yt = SplitXy(testSet)

            model = GaussianNB()
            model.fit(X, y)

            ### Compare the models built by Python

            print("Class: 0")
            for i, j in enumerate(model.theta_[0]):
                print("({:8.2f} {:9.2f} {:7.2f} )".format(j, model.sigma_[0][i], sqrt(model.sigma_[0][i])), end="")
                print("==> ", summaries[0][i])

            print("Class: 1")
            for i, j in enumerate(model.theta_[1]):
                print("({:8.2f} {:9.2f} {:7.2f} )".format(j, model.sigma_[1][i], sqrt(model.sigma_[1][i])), end="")
                print("==> ", summaries[1][i])

            Class: 0
            ( 3.18 9.06 3.01 )==> (3.1766467065868262, 3.0147673799630748)
            ( 109.12 699.16 26.44 )==> (109.11976047904191, 26.481293163857107)
            ( 68.71 286.46 16.93 )==> (68.712574850299404, 16.950414098038465)
            ( 19.74 228.74 15.12 )==> (19.742514970059879, 15.146913806453629)
            ( 68.64 10763.69 103.75 )==> (68.640718562874255, 103.90387227315443)
            ( 30.71 58.05 7.62 )==> (30.710778443113771, 7.630215185470916)
            ( 0.42 0.09 0.29 )==> (0.42285928143712581, 0.29409299864249266)
            ( 30.66 118.36 10.88 )==> (30.658682634730539, 10.895778423248444)
            Class: 1
            ( 4.76 12.44 3.53 )==> (4.7611111111111111, 3.5365037952376928)
            ( 139.17 1064.54 32.63 )==> (139.17222222222222, 32.71833930500929)
            ( 69.27 525.24 22.92 )==> (69.272222222222226, 22.98209907114023)
            ( 22.64 309.59 17.60 )==> (22.638888888888889, 17.644143437447358)
            ( 101.13 20409.91 142.86 )==> (101.12777777777778, 143.2617649699204)
            ( 34.99 57.18 7.56 )==> (34.99388888888889, 7.5825893182809425)
            ( 0.54 0.14 0.37 )==> (0.53544444444444439, 0.3702077209795522)
            ( 36.73 112.86 10.62 )==> (36.727777777777774, 10.653417924304598)

    • EL YAMANI May 22, 2016 at 8:57 am #


      Thanks for this article , it is very helpful . I just have a remark about the probabilty that you are calculating which is P(x|Ck) and then you make predictions, the result will be biased since you don’t multiply by P(Ck) , P(x) can be omitted since it’s only a normalisation constant.

  16. Anand July 20, 2015 at 9:12 pm #

    Thanks a lot for this tutorial, Jason.

    I have a quick question if you can help.

    In the separateByClass() definition, I could not understand how vector[-1] is a right usage when vector is an int type object.

    If I try the same commands one by one outside the function, the line of code with vector[-1] obviously throws a TypeError: 'int' object has no attribute '__getitem__'.

    Then how is it working inside the function?

    I am sorry for my ignorance. I am new to python. Thank you.

  17. Sarah August 26, 2015 at 5:50 pm #

    Hello Jason! I just wanted to leave a message to say thank you for the website. I am preparing for a job in this field and it has helped me so much. Keep up the amazing work!! 🙂

    • Jason Brownlee August 26, 2015 at 6:56 pm #

      You’re welcome! Thanks for leaving such a kind comment, you really made my day 🙂

  18. Jaime Lopez September 7, 2015 at 8:52 am #

    Hi Jason,

    Very easy to follow your classifier. I try it and works well on your data, but is important to note that it works just on numerical databases, so maybe one have to transform your data from categorical to numerical format.

    Another thing, when I transformed one database, sometimes the algorithm find division by zero error, although I avoided to use that number on features and classes.

    Any suggestion Jason?

    Thanks, Jaime

    • syed belgam April 11, 2016 at 2:05 pm #


  19. eduardo September 28, 2015 at 1:32 pm #

    It is by far the best material I’ve found , please continue helping the community!

  20. Thibroll September 29, 2015 at 9:11 pm #


    This is all well explained, and depicts well the steps of machine learning. But the way you calculate your P(y|X) here is false, and may lead to unwanted error.

    Here, in theory, using the Bayes law, we know that : P(y|X) = P(y).P(X|y)/P(X). As we want to maximize P(y|X) with a given X, we can ignore P(X) and pick the result for the maximized value of P(y).P(X|y)

    2 points remain inconsistent :
    – First, you pick a gaussian distribution to estimate P(X|y). But here, you calculateProbability calculates the DENSITY of the function to the specific points X, y, with associated mean and deviation, and not the actual probability.
    – The second point is that you don’t take into consideration the calculation of P(y) to estimate P(y|X). Your model (with the correct probability calculation) may work only if all samples have same amount in every value of y (considering y is discret), or if you are lucky enough.

    Anyway, despite those mathematical issues, this is good work, and a good introduction to machine learning.

  21. mondet October 6, 2015 at 10:08 am #

    Thanks Jason for all this great material. One thing that i adore from you is the intellectual honesty, the spirit of collaboration and the parsimony.

    In my opinion you are one of the best didactics exponents in the ML.

    Thanks to Thibroll too. But i would like to have a real example of the problem in R, python or any other language.



  22. Erika October 15, 2015 at 10:03 am #

    Hi Jason,
    I have trying to get started with machine learning and your article has given me the much needed first push towards that. Thank you for your efforts! 🙂

  23. Swagath November 9, 2015 at 5:35 pm #

    i need this code in java.. please help me//

  24. Sarah November 16, 2015 at 11:54 pm #

    I am working with this code – tweaking it here or there – have found it very helpful as I implement a NB from scratch. I am trying to take the next step and add in categorical data. Any suggestions on where I can head to get ideas for how to add this? Or any particular functions/methods in Python you can recommend? I’ve brought in all the attributes and split them into two datasets for continuous vs. categorical so that I can work on them separately before bringing their probabilities back together. I’ve got the categorical in the same dictionary where the key is the class and the values are lists of attributes for each instance. I’m not sure how to go through the values to count frequencies and then how to store this back up so that I have the attribute values along with their frequencies/probabilities. A dictionary within a dictionary? Should I be going in another direction and not using a similar format?

  25. Emmanuel Nuakoh November 19, 2015 at 6:36 am #

    Thank you Jason, this tutorial is helping me with my implementation of NB algorithm for my PhD Dissertation. Very elaborate.

  26. Anna January 14, 2016 at 2:32 am #

    Hi! thank you! Have you tried to do the same for the textual datasets, for example 20Newsgroups ? Would appreciate some hints or ideas )

  27. Randy January 16, 2016 at 4:15 pm #

    Great article, but as others pointed out there are some mathematical mistakes like using the probability density function for single value probabilities.

  28. Meghna February 7, 2016 at 7:45 pm #

    Thank you for this amazing article!! I implemented the same for wine and MNIST data set and these tutorials helped me so much!! 🙂

  29. David February 7, 2016 at 11:17 pm #

    I got an error with the first print statement, because your parenthesis are closing the call to print (which returns None) before you’re calling format, so instead of

    print(‘Split {0} rows into train with {1} and test with {2}’).format(len(dataset), train, test)

    it should be

    print(‘Split {0} rows into train with {1} and test with {2}’.format(len(dataset), train, test))

    Anyway, thanks for this tutorial, it was really useful, cheers!

  30. Kumar Ramanathan February 12, 2016 at 12:20 pm #

    Sincere gratitude for this most excellent site. Yes, I never learn until I write code for the algorithm. It is such an important exercise, to get concepts embedded into one’s brain. Brilliant effort, truly !

  31. Syed February 18, 2016 at 8:15 am #

    Just to test the algorithm, i change the class of few of the data to something else i.e 3 or 4, (last digit in a line) and i get divide by zero error while calculating the variance. I am not sure why. does it mean that this particular program works only for 2 classess? cant see anything which restricts it to that.

  32. Takuma Udagawa March 20, 2016 at 1:19 pm #

    Hi, I’m a student in Japan.
    It seems to me that you are calculating p(X1|Ck)*p(X2|Ck)*…*p(Xm|Ck) and choosing Ck such that this value would be maximum.
    However, when I looked in the Wikipedia, you are supposed to calculate p(X1|Ck)*p(X2|Ck)*…*p(Xm|Ck)*p(Ck).
    I don’t understand when you calculated p(Ck).
    Would you tell me about it?

    • jessie November 24, 2016 at 1:21 am #

      Had the same thought, where’s the prior calculated?

  33. Babu May 23, 2016 at 7:36 am #

    This is the same question as Alex Ubot above.

    Calculating the parameters are correct.
    but prediction implementation is incorrect.

    Unfortunately this article comes up high and everyone is learning incorrect way of doing things I think

  34. Swapnil June 10, 2016 at 1:21 am #

    Really nice tutorial. Can you post a detailed implementation of RandomForest as well ? It will be very helpful for us if you do so.


  35. sourena maroofi July 22, 2016 at 12:24 am #

    thanks Jason…very nice tutorial.

  36. Gary July 27, 2016 at 5:44 pm #

    I was interested in this Naive Bayes example and downloaded the .csv data and the code to process it.

    However, when I try to run it in Pycharm IDE using Python 3.5 I get no end of run-time errors.

    Has anyone else run the code successfully? And if so, what IDE/environment did they use?



    • Sudarshan August 10, 2016 at 5:05 pm #

      Hi Gary,

      You might want to run it using Python 2.7.

  37. Sudarshan August 10, 2016 at 5:02 pm #


    Thanks for the excellent tutorial. I’ve attempted to implement the same in Go.

    Here is a link for anyone that’s interested.

  38. Atlas August 13, 2016 at 6:40 am #

    This is AWESOME!!! Thank you Jason.

    Where can I find more of this?

  39. Alex August 20, 2016 at 4:34 pm #

    That can be implemented in any language because there’re no special libraries involved.

  40. SAFA August 28, 2016 at 1:39 am #

    There are some errors in “def splitDataset”.
    In machine learning, splitting a dataset into training and testing sets must be done without repetition (duplication), but index = random.randrange(len(copy)) generates duplicate indices,
    for example: index = 0 192 1 2 0 14 34 56 1 ………
    The splitting method must not duplicate data.
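
One way to make the no-duplication property easy to see is to shuffle a copy of the rows once and slice it. This is a sketch, not the tutorial's splitDataset; the name split_dataset and the seed parameter are illustrative.

```python
import random

def split_dataset(dataset, split_ratio, seed=None):
    # Shuffle a copy once, then slice it. Every row lands in exactly one
    # of the two parts, so no row can be duplicated across them.
    rng = random.Random(seed)
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    train_size = int(len(shuffled) * split_ratio)
    return shuffled[:train_size], shuffled[train_size:]
```

Note that repeated index values on their own do not imply duplicated rows if each selected row is removed from the working copy before the next draw.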

  41. Krati Jain September 12, 2016 at 2:35 pm #

    This is a highly informative and well explained article, although I think it is only suitable for Python 2.x. In 3.x, a dict object has no ‘iteritems’ function; it has ‘items’ instead. Secondly, the format function is called on the result of a lot of the print calls, when it should be called on the string inside the print function, and this throws an error. Can you please look into it?
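
The two Python 3 changes described above can be sketched as follows. The values are placeholders for illustration only. In Python 3, print() returns None, so calling .format on its result raises an AttributeError; the format call has to be applied to the string first.

```python
# Placeholder data in the {class_value: [(mean, stdev), ...]} shape.
summaries = {0: [(1, 0.5)], 1: [(20, 5.0)]}

# Python 3: dict.items() takes the place of dict.iteritems().
for class_value, class_summaries in summaries.items():
    # Format the string first, then pass the result to print().
    print('Class {0}: {1}'.format(class_value, class_summaries))

accuracy = 76.5  # placeholder value, not a real result
message = 'Accuracy: {0}%'.format(accuracy)
print(message)
```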

  42. upen September 16, 2016 at 5:01 pm #

    hey Jason
    Thanks for such a great tutorial. I’m a newbie to the concept and want to try the naive bayes approach on movie reviews, using the reviews of a single movie that I have collected in a text file.
    Can you please provide a hint on how to load my file and classify each review as positive or negative?

  43. Abhis September 20, 2016 at 3:00 am #

    Would you please help me implement naive bayes to predict student performance using their marks and feedback?

    • Jason Brownlee September 20, 2016 at 8:35 am #

      I’m sorry, I am happy to answer your questions, but I cannot help you with your project. I just don’t have the capacity.

  44. Vinay October 13, 2016 at 2:18 pm #

    Hey Jason,

    Thanks a lot for such a nice article, helped a lot in understanding the implementations,

    i have a problem while running the script.
    I get the below error

    if (vector[-1] not in separated):
    IndexError: list index out of range

    can you please help me in getting it right?

    • Jason Brownlee October 14, 2016 at 8:58 am #

      Thanks Vinay.

      Check that the data was loaded successfully. Perhaps there are empty lines or columns in your loaded data?
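
A minimal sketch of a loader that skips empty lines, assuming every remaining cell is numeric. The function name load_rows is illustrative and not from the tutorial. An empty row has no last element, which is one way vector[-1] can raise an IndexError.

```python
import csv
import io

def load_rows(lines):
    # lines is any iterable of text lines, e.g. an open file object.
    rows = []
    for row in csv.reader(lines):
        # csv.reader yields [] for a blank line; also skip rows of empty cells.
        if not row or all(cell.strip() == '' for cell in row):
            continue
        rows.append([float(cell) for cell in row])
    return rows

# Usage with in-memory text; for a real file, open it with newline=''.
sample = io.StringIO("1,2,0\n\n3,4,1\n")
rows = load_rows(sample)
```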

  45. Viji October 20, 2016 at 8:57 pm #

    Hi Jason,

    Thank you for the wonderful article. You have used ‘?’ in the test set (testSet = [[1.1, ‘?’], [19.1, ‘?’]]). Can you please tell me what it specifies?

  46. jeni November 15, 2016 at 9:11 pm #

    Please send me code for text classification using a naive bayes classifier in Python. The dataset is classified as +ve, -ve or neutral.

    • Jason Brownlee November 16, 2016 at 9:28 am #

      Hi jeni, sorry I don’t have such an example prepared.

  47. MLNewbie November 28, 2016 at 1:21 pm #

    I am a newbie to ML and I found your website today. It is one of the greatest ML resources available on the Internet. I bookmarked it and thanks for everything Jason and I will visit your website everyday going forward.

    • Jason Brownlee November 29, 2016 at 8:47 am #

      Thanks, I’m glad you like it.

      • Anne January 7, 2017 at 6:58 pm #

        def predict(summaries, inputVector):
            probabilities = calculateClassProbabilities(summaries, inputVector)
            bestLabel, bestProb = None, -1
            for classValue, probability in probabilities.iteritems():
                if bestLabel is None or probability > bestProb:
                    bestProb = probability
                    bestLabel = classValue
            return bestLabel

        why is the prediction different for these
        summaries = {‘A’ : [(1, 0.5)], ‘B’: [(20, 5.0)]} –predicts A
        summaries = {‘0’ : [(1, 0.5)], ‘1’: [(20, 5.0)]} — predicts 0
        summaries = {0 : [(1, 0.5)], 1: [(20, 5.0)]} — predicts 1
