How to Code a Neural Network with Backpropagation In Python (from scratch)

The backpropagation algorithm is used in the classical feed-forward artificial neural network.

It is the technique still used to train large deep learning networks.

In this tutorial, you will discover how to implement the backpropagation algorithm for a neural network from scratch with Python.

After completing this tutorial, you will know:

  • How to forward-propagate an input to calculate an output.
  • How to back-propagate error and train a network.
  • How to apply the backpropagation algorithm to a real-world predictive modeling problem.


Let’s get started.

  • Update Nov/2016: Fixed a bug in the activate() function. Thanks Alex!
  • Update Jan/2017: Fixed issues with Python 3.
  • Update Jan/2017: Fixed a small bug in update_weights(). Thanks Tomasz!
  • Update Apr/2018: Added direct link to CSV dataset.
  • Update Aug/2018: Tested and updated to work with Python 3.6.
  • Update Sep/2019: Updated wheat-seeds.csv to fix formatting issues.
  • Update Oct/2021: Reversed the sign of error to be consistent with other literature.

How to Implement the Backpropagation Algorithm From Scratch In Python
Photo by NICHD, some rights reserved.

Description

This section provides a brief introduction to the Backpropagation Algorithm and the Wheat Seeds dataset that we will be using in this tutorial.

Backpropagation Algorithm

The Backpropagation algorithm is a supervised learning method for multilayer feed-forward networks from the field of Artificial Neural Networks.

Feed-forward neural networks are inspired by the information processing of one or more neural cells, called neurons. A neuron accepts input signals via its dendrites, which pass the electrical signal down to the cell body. The axon carries the signal out to synapses, which are the connections of a cell’s axon to other cells’ dendrites.

The principle of the backpropagation approach is to model a given function by modifying internal weightings of input signals to produce an expected output signal. The system is trained using a supervised learning method, where the error between the system’s output and a known expected output is presented to the system and used to modify its internal state.

Technically, the backpropagation algorithm is a method for training the weights in a multilayer feed-forward neural network. As such, it requires a defined network structure of one or more layers, where each layer is fully connected to the next. A standard network structure is one input layer, one hidden layer, and one output layer.

Backpropagation can be used for both classification and regression problems, but we will focus on classification in this tutorial.

In classification problems, best results are achieved when the network has one neuron in the output layer for each class value. For example, consider a 2-class or binary classification problem with the class values A and B. These expected outputs would have to be transformed into binary vectors with one column for each class value, such as [1, 0] and [0, 1] for A and B respectively. This is called a one hot encoding.
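As a minimal sketch, the encoding can be built by placing a 1 at the index of the class value in a list of zeros. The helper name one_hot_encode() and the mapping of A and B to the integers 0 and 1 are assumptions for illustration, not part of the tutorial's code:

```python
def one_hot_encode(class_index, n_classes):
    """Return a list of n_classes zeros with a 1 at position class_index."""
    vector = [0 for _ in range(n_classes)]
    vector[class_index] = 1
    return vector


# Assumed mapping of the string class values to integer indexes.
class_to_index = {'A': 0, 'B': 1}

encoded_a = one_hot_encode(class_to_index['A'], 2)
encoded_b = one_hot_encode(class_to_index['B'], 2)
print(encoded_a, encoded_b)  # prints [1, 0] [0, 1]
```

The same idea extends to any number of classes, as long as the class values have first been mapped to integers starting at 0.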

Wheat Seeds Dataset

The seeds dataset involves the prediction of species given measurements of seeds from different varieties of wheat.

There are 210 records and 7 numerical input variables. It is a classification problem with 3 output classes. The scale of each numeric input value varies, so some data normalization may be required for use with algorithms that weight inputs, like the backpropagation algorithm.

Below is a sample of the first 5 rows of the dataset.

Using the Zero Rule algorithm that predicts the most common class value, the baseline accuracy for the problem is 28.095%.

You can learn more and download the seeds dataset from the UCI Machine Learning Repository.

Download the seeds dataset and place it into your current working directory with the filename seeds_dataset.csv.

The dataset is in tab-separated format, so you must convert it to CSV using a text editor or a spreadsheet program.

Update: you can also download the dataset in CSV format directly: https://raw.githubusercontent.com/jbrownlee/Datasets/master/wheat-seeds.csv

Tutorial

This tutorial is broken down into 6 parts:

  1. Initialize Network.
  2. Forward Propagate.
  3. Back Propagate Error.
  4. Train Network.
  5. Predict.
  6. Seeds Dataset Case Study.

These steps will provide the foundation that you need to implement the backpropagation algorithm from scratch and apply it to your own predictive modeling problems.

1. Initialize Network

Let’s start with something easy: the creation of a new network ready for training.

Each neuron has a set of weights that need to be maintained: one weight for each input connection and an additional weight for the bias. We will need to store additional properties for a neuron during training, so we will use a dictionary to represent each neuron and store properties by names, such as 'weights' for the weights.

A network is organized into layers. The input layer is really just a row from our training dataset. The first real layer is the hidden layer. This is followed by the output layer that has one neuron for each class value.

We will organize layers as arrays of dictionaries and treat the whole network as an array of layers.

It is good practice to initialize the network weights to small random numbers. In this case, we will use random numbers in the range of 0 to 1.

Below is a function named initialize_network() that creates a new neural network ready for training. It accepts three parameters: the number of inputs, the number of neurons to have in the hidden layer, and the number of outputs.

You can see that for the hidden layer we create n_hidden neurons and each neuron in the hidden layer has n_inputs + 1 weights, one for each input column in a dataset and an additional one for the bias.

You can also see that the output layer that connects to the hidden layer has n_outputs neurons, each with n_hidden + 1 weights. This means that each neuron in the output layer connects to (has a weight for) each neuron in the hidden layer.

Let’s test out this function. Below is a complete example that creates a small network.
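Below is a minimal sketch of initialize_network() together with a small test that builds a network with 2 inputs, 1 hidden neuron and 2 outputs. It follows the description above (random weights in the range 0 to 1 from Python's random.random(), with the bias stored as the last weight of each neuron), but it is a reconstruction rather than a copy of the original listing:

```python
from random import random, seed


def initialize_network(n_inputs, n_hidden, n_outputs):
    """Build a network as a list of layers; each layer is a list of neuron dicts."""
    network = list()
    # Each hidden neuron has one weight per input plus a final bias weight.
    hidden_layer = [{'weights': [random() for _ in range(n_inputs + 1)]}
                    for _ in range(n_hidden)]
    network.append(hidden_layer)
    # Each output neuron has one weight per hidden neuron plus a final bias weight.
    output_layer = [{'weights': [random() for _ in range(n_hidden + 1)]}
                    for _ in range(n_outputs)]
    network.append(output_layer)
    return network


# Fix the seed so that the random weights are reproducible between runs.
seed(1)
network = initialize_network(2, 1, 2)
for layer in network:
    print(layer)
```

Calling seed(1) first makes the printed weights identical on every run, which is handy when comparing your output with someone else's.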

Running the example, you can see that the code prints out each layer one by one. You can see the hidden layer has one neuron with 2 input weights plus the bias. The output layer has 2 neurons, each with 1 weight plus the bias.

Now that we know how to create and initialize a network, let’s see how we can use it to calculate an output.

2. Forward Propagate

We can calculate an output from a neural network by propagating an input signal through each layer until the output layer outputs its values.

We call this forward-propagation.

It is the technique we will need to generate predictions during training that will need to be corrected, and it is the method we will need after the network is trained to make predictions on new data.

We can break forward propagation down into three parts:

  1. Neuron Activation.
  2. Neuron Transfer.
  3. Forward Propagation.

2.1. Neuron Activation

The first step is to calculate the activation of one neuron given an input.

The input could be a row from our training dataset, as in the case of the hidden layer. It may also be the outputs from each neuron in the hidden layer, in the case of the output layer.

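Neuron activation is calculated as the weighted sum of the inputs plus a bias, much like linear regression:

$$\text{activation} = \sum_{i} \left(\text{weight}_i \cdot \text{input}_i\right) + \text{bias}$$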

Where weight is a network weight, input is an input, i is the index of a weight or an input and bias is a special weight that has no input to multiply with (or you can think of the input as always being 1.0).

Below is an implementation of this in a function named activate(). You can see that the function assumes that the bias is the last weight in the list of weights. This helps here and later to make the code easier to read.
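A minimal sketch of activate() consistent with that description, assuming weights is a list whose last element is the bias:

```python
def activate(weights, inputs):
    """Weighted sum of the inputs plus the bias (stored as the last weight)."""
    activation = weights[-1]
    # Only len(weights) - 1 inputs are read, so a data row that still carries
    # its class label in the last position can be passed in unchanged.
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


# Two input weights (0.5 and -0.25) followed by a bias of 0.1.
print(activate([0.5, -0.25, 0.1], [2.0, 4.0]))
```

Looping over len(weights) - 1 rather than len(inputs) is the design choice that lets a training row ending in its class value be used directly as the input; the extra element is simply ignored.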

Now, let’s see how to use the neuron activation.

2.2. Neuron Transfer

Once a neuron is activated, we need to transfer the activation to see what the neuron output actually is.

Different transfer functions can be used. It is traditional to use the sigmoid activation function, but you can also use the tanh (hyperbolic tangent) function to transfer outputs. More recently, the rectifier transfer function has been popular with large deep learning networks.

The sigmoid activation function looks like an S shape; it is also called the logistic function. It can take any input value and produce a number between 0 and 1 on an S-curve. It is also a function whose derivative (slope) we can easily calculate, which we will need later when backpropagating error.

We can transfer an activation value using the sigmoid function as follows:

$$\text{output} = \frac{1}{1 + e^{-\text{activation}}}$$

Where e is the base of the natural logarithms (Euler’s number).

Below is a function named transfer() that implements the sigmoid equation.
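A minimal sketch of transfer() using math.exp():

```python
from math import exp


def transfer(activation):
    """Sigmoid (logistic) transfer: maps any activation into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-activation))


for value in (-5.0, 0.0, 5.0):
    print(value, transfer(value))
```

math.exp() raises OverflowError for arguments above roughly 709, so an extremely large negative activation would fail here. That is unlikely with inputs normalized to the range 0 to 1 and small weights, but it is worth knowing if you reuse the function elsewhere.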

Now that we have the pieces, let’s see how they are used.

2.3. Forward Propagation

Forward propagating an input is straightforward.

We work through each layer of our network calculating the outputs for each neuron. All of the outputs from one layer become inputs to the neurons on the next layer.

Below is a function named forward_propagate() that implements the forward propagation for a row of data from our dataset with our neural network.

You can see that a neuron’s output value is stored in the neuron with the name 'output'. You can also see that we collect the outputs for a layer in an array named new_inputs that becomes the array inputs and is used as inputs for the following layer.

The function returns the outputs from the last layer, also called the output layer.

Let’s put all of these pieces together and test out the forward propagation of our network.

We define our network inline with one hidden neuron that expects 2 input values and an output layer with two neurons.
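A minimal sketch of forward_propagate() with the activate() and transfer() helpers it depends on is below. Rather than typing the weights inline, this version assumes an initialize_network() like the one described in the previous section and calls seed(1) so the generated weights are reproducible. The row [1, 0, None] carries a placeholder None where a class label would normally be:

```python
from math import exp
from random import random, seed


def initialize_network(n_inputs, n_hidden, n_outputs):
    hidden = [{'weights': [random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)]
    output = [{'weights': [random() for _ in range(n_hidden + 1)]} for _ in range(n_outputs)]
    return [hidden, output]


def activate(weights, inputs):
    activation = weights[-1]  # the bias is the last weight
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))  # sigmoid


def forward_propagate(network, row):
    """Propagate a row through the network and return the output layer values."""
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)  # kept for backpropagation
            new_inputs.append(neuron['output'])
        inputs = new_inputs  # this layer's outputs feed the next layer
    return inputs


seed(1)
network = initialize_network(2, 1, 2)
row = [1, 0, None]
output = forward_propagate(network, row)
print(output)
```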

Running the example propagates the input pattern [1, 0] and produces an output value that is printed. Because the output layer has two neurons, we get a list of two numbers as output.

The actual output values are just nonsense for now, but next, we will start to learn how to make the weights in the neurons more useful.

3. Back Propagate Error

The backpropagation algorithm is named for the way in which weights are trained.

Error is calculated between the expected outputs and the outputs forward propagated from the network. These errors are then propagated backward through the network from the output layer to the hidden layer, assigning blame for the error and updating weights as they go.

The math for backpropagating error is rooted in calculus, but we will remain high level in this section and focus on what is calculated and how rather than why the calculations take this particular form.

This part is broken down into two sections.

  1. Transfer Derivative.
  2. Error Backpropagation.

3.1. Transfer Derivative

Given an output value from a neuron, we need to calculate its slope.

We are using the sigmoid transfer function, the derivative of which can be calculated from the neuron’s output as follows:

$$\text{derivative} = \text{output} \cdot (1.0 - \text{output})$$

Below is a function named transfer_derivative() that implements this equation.
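A one-line sketch of transfer_derivative(). Note that it takes the neuron's output, not its activation: because the sigmoid's slope can be written as output * (1 - output), the value already stored during forward propagation is all that is needed.

```python
def transfer_derivative(output):
    """Slope of the sigmoid, expressed in terms of the neuron's output value."""
    return output * (1.0 - output)


print(transfer_derivative(0.5))  # prints 0.25, the steepest point of the curve
```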

Now, let’s see how this can be used.

3.2. Error Backpropagation

The first step is to calculate the error for each output neuron; this will give us our error signal (input) to propagate backwards through the network.

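The error for a given neuron can be calculated as follows:

$$\text{error} = (\text{output} - \text{expected}) \cdot \text{transfer\_derivative}(\text{output})$$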

Where expected is the expected output value for the neuron, output is the output value for the neuron and transfer_derivative() calculates the slope of the neuron’s output value, as shown above.

This error calculation is used for neurons in the output layer. The expected value is the class value itself, taken from its one hot encoding (1 for the neuron that matches the class, 0 otherwise). In the hidden layer, things are a little more complicated.

The error signal for a neuron in the hidden layer is calculated as the weighted error of each neuron in the output layer. Think of the error traveling back along the weights of the output layer to the neurons in the hidden layer.

The back-propagated error signal is accumulated and then used to determine the error for the neuron in the hidden layer, as follows:

$$\text{error} = \left(\sum_{k} \text{weight}_k \cdot \text{error}_k\right) \cdot \text{transfer\_derivative}(\text{output})$$

Where error_k is the error signal from the kth neuron in the output layer, weight_k is the weight that connects the current neuron to the kth neuron in the output layer, and output is the output for the current neuron.

Below is a function named backward_propagate_error() that implements this procedure.

You can see that the error signal calculated for each neuron is stored with the name ‘delta’. You can see that the layers of the network are iterated in reverse order, starting at the output and working backwards. This ensures that the neurons in the output layer have ‘delta’ values calculated first, which the neurons in the hidden layer can then use in the subsequent iteration. I chose the name ‘delta’ to reflect the change the error implies on the neuron (e.g. the weight delta).

You can see that the error signal for neurons in the hidden layer is accumulated from neurons in the output layer, where the hidden neuron number j is also the index of that neuron’s weight in each output layer neuron: neuron['weights'][j].

Let’s put all of the pieces together and see how it works.

We define a fixed neural network with output values and backpropagate an expected output pattern. The complete example is listed below.
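A minimal sketch of backward_propagate_error() and a fixed test network follows. It uses the (output - expected) sign convention from the Oct/2021 update. The network's outputs are assumed to have been set by a previous forward pass; the weights and outputs are made-up round numbers chosen so the deltas can be checked by hand, not the values from the original example:

```python
def transfer_derivative(output):
    return output * (1.0 - output)


def backward_propagate_error(network, expected):
    """Store an error signal ('delta') in every neuron, output layer first."""
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i != len(network) - 1:
            # Hidden layer: sum the next layer's deltas, each weighted by the
            # weight that connects hidden neuron j to that next-layer neuron.
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += neuron['weights'][j] * neuron['delta']
                errors.append(error)
        else:
            # Output layer: difference between actual and expected output.
            for j, neuron in enumerate(layer):
                errors.append(neuron['output'] - expected[j])
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])


# One hidden neuron (2 inputs + bias) and two output neurons (1 input + bias).
network = [
    [{'output': 0.7, 'weights': [0.1, 0.8, 0.7]}],
    [{'output': 0.6, 'weights': [0.25, 0.5]},
     {'output': 0.65, 'weights': [0.45, 0.65]}],
]
expected = [0, 1]
backward_propagate_error(network, expected)
for layer in network:
    print(layer)
```

For the first output neuron, for example, the delta is (0.6 - 0) * 0.6 * 0.4 = 0.144.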

Running the example prints the network after the backpropagation of error is complete. You can see that error values are calculated and stored in the neurons for the output layer and the hidden layer.

Now let’s use the backpropagation of error to train the network.

4. Train Network

The network is trained using stochastic gradient descent.

This involves multiple iterations of exposing a training dataset to the network and, for each row of data, forward propagating the inputs, backpropagating the error, and updating the network weights.

This part is broken down into two sections:

  1. Update Weights.
  2. Train Network.

4.1. Update Weights

Once errors are calculated for each neuron in the network via the back propagation method above, they can be used to update weights.

Network weights are updated as follows:

$$\text{weight} = \text{weight} - \text{learning\_rate} \cdot \text{error} \cdot \text{input}$$

Where weight is a given weight, learning_rate is a parameter that you must specify, error is the error calculated by the backpropagation procedure for the neuron and input is the input value that caused the error.

The same procedure can be used for updating the bias weight, except there is no input term, or input is the fixed value of 1.0.

Learning rate controls how much to change the weight to correct for the error. For example, a value of 0.1 will update the weight by 10% of the amount that it possibly could be updated. Small learning rates, which cause slower learning over a large number of training iterations, are preferred. This increases the likelihood of the network finding a good set of weights across all layers, rather than the fastest set of weights that minimize error (called premature convergence).

Below is a function named update_weights() that updates the weights for a network given an input row of data and a learning rate. It assumes that a forward and backward propagation have already been performed.
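A minimal sketch of update_weights() that follows the update rule above (subtracting learning_rate * error * input) and treats the bias input as the fixed value 1.0. The tiny network used to exercise it is hypothetical: its 'output' and 'delta' values are made up so the arithmetic can be checked by hand.

```python
def update_weights(network, row, l_rate):
    """Gradient descent step; assumes forward and backward passes already ran."""
    for i in range(len(network)):
        inputs = row[:-1]  # drop the class label from the data row
        if i != 0:
            # Later layers take the previous layer's outputs as their inputs.
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            # Bias weight: its input is the constant 1.0.
            neuron['weights'][-1] -= l_rate * neuron['delta']


network = [
    [{'weights': [0.5, -0.5, 0.1], 'output': 0.6, 'delta': 0.2}],
    [{'weights': [0.3, 0.2], 'output': 0.7, 'delta': -0.1}],
]
row = [1.0, 2.0, 0]  # two inputs followed by a class label
update_weights(network, row, 0.5)
print(network[0][0]['weights'], network[1][0]['weights'])
```

The hidden neuron's first weight, for instance, becomes 0.5 - 0.5 * 0.2 * 1.0 = 0.4, while the output neuron uses the hidden output 0.6 as its input.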

Remember that the input for the output layer is a collection of outputs from the hidden layer.

Now that we know how to update network weights, let’s see how we can do it repeatedly.

4.2. Train Network

As mentioned, the network is updated using stochastic gradient descent.

This involves first looping for a fixed number of epochs and within each epoch updating the network for each row in the training dataset.

Because updates are made for each training pattern, this type of learning is called online learning. If errors were accumulated across an epoch before updating the weights, this is called batch learning or batch gradient descent.

Below is a function that implements the training of an already initialized neural network with a given training dataset, learning rate, fixed number of epochs and an expected number of output values.

The expected number of output values is used to transform class values in the training data into a one hot encoding: a binary vector with one column for each class value, matching the output of the network. This is required to calculate the error for the output layer.

You can also see that the sum squared error between the expected output and the network output is accumulated each epoch and printed. This is helpful to create a trace of how much the network is learning and improving each epoch.

We now have all of the pieces to train the network. We can put together an example that includes everything we’ve seen so far including network initialization and train a network on a small dataset.

Below is a small contrived dataset that we can use to test out training our neural network.

Below is the complete example. We will use 2 neurons in the hidden layer. It is a binary classification problem (2 classes) so there will be two neurons in the output layer. The network will be trained for 20 epochs with a learning rate of 0.5, which is high because we are training for so few iterations.
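Below is a condensed sketch of the whole training pipeline under the configuration described above (2 hidden neurons, 2 outputs, 20 epochs, learning rate 0.5). Two things are assumptions: the dataset values are illustrative points for a small, linearly separable two-class problem, and train_network() also returns the per-epoch sum squared error as a list (history) in addition to printing it, which makes the trend easy to check.

```python
from math import exp
from random import random, seed


def initialize_network(n_inputs, n_hidden, n_outputs):
    hidden = [{'weights': [random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)]
    output = [{'weights': [random() for _ in range(n_hidden + 1)]} for _ in range(n_outputs)]
    return [hidden, output]


def activate(weights, inputs):
    activation = weights[-1]  # bias is the last weight
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))  # sigmoid


def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs


def transfer_derivative(output):
    return output * (1.0 - output)


def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        if i == len(network) - 1:
            errors = [neuron['output'] - expected[j] for j, neuron in enumerate(layer)]
        else:
            errors = [sum(n['weights'][j] * n['delta'] for n in network[i + 1])
                      for j in range(len(layer))]
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])


def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1] if i == 0 else [n['output'] for n in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']


def train_network(network, train, l_rate, n_epoch, n_outputs):
    """Online SGD; prints and also returns the sum squared error of each epoch."""
    history = []
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for _ in range(n_outputs)]
            expected[row[-1]] = 1  # one hot encode the integer class label
            sum_error += sum((expected[i] - outputs[i]) ** 2 for i in range(len(expected)))
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        history.append(sum_error)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
    return history


# Small two-feature, two-class dataset (illustrative values).
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

seed(1)
n_inputs = len(dataset[0]) - 1
n_outputs = len(set(row[-1] for row in dataset))
network = initialize_network(n_inputs, 2, n_outputs)
history = train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)
```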

Running the example first prints the sum squared error each training epoch. We can see a trend of this error decreasing with each epoch.

Once trained, the network is printed, showing the learned weights. Also still in the network are output and delta values that can be ignored. We could update our training function to delete these data if we wanted.

Once a network is trained, we need to use it to make predictions.

5. Predict

Making predictions with a trained neural network is easy enough.

We have already seen how to forward-propagate an input pattern to get an output. This is all we need to do to make a prediction. We can use the output values themselves directly as an estimate of the probability of a pattern belonging to each output class.

It may be more useful to turn this output back into a crisp class prediction. We can do this by selecting the class value with the largest probability. This is also called the arg max function.

Below is a function named predict() that implements this procedure. It returns the index in the network output that has the largest probability. It assumes that class values have been converted to integers starting at 0.

We can put this together with our code above for forward propagating input and with our small contrived dataset to test making predictions with an already-trained network. The example hardcodes a network trained from the previous step.

The complete example is listed below.
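A minimal sketch of predict() on top of forward propagation is below. Instead of a network trained in the previous step, it uses hand-picked (not trained) weights, chosen so that the single hidden neuron switches on when the first input exceeds about 4.5; the two rows are likewise made up for illustration.

```python
from math import exp


def activate(weights, inputs):
    activation = weights[-1]  # bias is the last weight
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))  # sigmoid


def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs


def predict(network, row):
    """Return the index of the output neuron with the largest value (arg max)."""
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))


# Hand-picked weights: hidden ~ sigmoid(2*x1 - 9); outputs push in opposite directions.
network = [
    [{'weights': [2.0, 0.0, -9.0]}],
    [{'weights': [-10.0, 5.0]}, {'weights': [10.0, -5.0]}],
]
for row in ([2.0, 1.0, 0], [7.0, 2.0, 1]):
    print('Expected=%d, Got=%d' % (row[-1], predict(network, row)))
```

Because predict() returns an index, it only lines up with the class values if those were converted to integers starting at 0.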

Running the example prints the expected output for each record in the training dataset, followed by the crisp prediction made by the network.

It shows that the network achieves 100% accuracy on this small dataset.

Now we are ready to apply our backpropagation algorithm to a real-world dataset.

6. Wheat Seeds Dataset

This section applies the Backpropagation algorithm to the wheat seeds dataset.

The first step is to load the dataset and convert the loaded data to numbers that we can use in our neural network. For this we will use the helper function load_csv() to load the file, str_column_to_float() to convert string numbers to floats and str_column_to_int() to convert the class column to integer values.
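A minimal sketch of the three loading helpers using Python's csv module is below. One deliberate difference from a plain set-based lookup: str_column_to_int() sorts the distinct class strings so the class-to-integer mapping is the same on every run. The in-memory rows stand in for the output of load_csv() and are made up.

```python
from csv import reader


def load_csv(filename):
    """Load a CSV file into a list of rows (lists of strings), skipping blank lines."""
    dataset = []
    with open(filename, 'r') as file:
        for row in reader(file):
            if not row:
                continue
            dataset.append(row)
    return dataset


def str_column_to_float(dataset, column):
    """Convert a column of numeric strings to floats, in place."""
    for row in dataset:
        row[column] = float(row[column].strip())


def str_column_to_int(dataset, column):
    """Map each distinct class string to an integer 0..n-1, in place."""
    class_values = [row[column] for row in dataset]
    # Sorting makes the mapping deterministic (iterating a set has no fixed order).
    lookup = {value: i for i, value in enumerate(sorted(set(class_values)))}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup


# Hypothetical rows, as load_csv() would return them (all strings).
rows = [['1.5', '2.0', '1'], ['3.25', ' 4.0', '2'], ['0.5', '1.0', '3']]
for col in range(len(rows[0]) - 1):
    str_column_to_float(rows, col)
lookup = str_column_to_int(rows, len(rows[0]) - 1)
print(rows, lookup)
```

str_column_to_float() raises ValueError on an empty field, which is what happens if a file converted from tab-separated format contains consecutive delimiters.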

Input values vary in scale and need to be normalized to the range 0 to 1. It is generally good practice to normalize input values to the range of the chosen transfer function, in this case, the sigmoid function that outputs values between 0 and 1. The dataset_minmax() and normalize_dataset() helper functions were used to normalize the input values.
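A minimal sketch of min-max normalization with the two helpers named above, assuming the class value is the last column and is left untouched:

```python
def dataset_minmax(dataset):
    """Return [min, max] for every column of a numeric dataset."""
    return [[min(column), max(column)] for column in zip(*dataset)]


def normalize_dataset(dataset, minmax):
    """Rescale every input column (all but the last) to the range 0 to 1, in place."""
    for row in dataset:
        for i in range(len(row) - 1):
            low, high = minmax[i]
            row[i] = (row[i] - low) / (high - low)


data = [[2.0, 10.0, 0], [4.0, 20.0, 1], [3.0, 15.0, 0]]
minmax = dataset_minmax(data)
normalize_dataset(data, minmax)
print(data)  # prints [[0.0, 0.0, 0], [1.0, 1.0, 1], [0.5, 0.5, 0]]
```

A column with a single constant value would cause a division by zero here; that does not occur in the seeds dataset, but a guard would be needed for data where it can.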

We will evaluate the algorithm using k-fold cross-validation with 5 folds. This means that 210/5 = 42 records will be in each fold. We will use the helper functions evaluate_algorithm() to evaluate the algorithm with cross-validation and accuracy_metric() to calculate the accuracy of predictions.
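A minimal sketch of the cross-validation helpers follows: cross_validation_split() (an assumed name for the fold-splitting step), accuracy_metric() and evaluate_algorithm(). To keep it self-contained, it is exercised with a hypothetical zero_rule() baseline that always predicts the most common training class, rather than with the neural network:

```python
from random import randrange, seed


def cross_validation_split(dataset, n_folds):
    """Randomly split the dataset into n_folds equal-sized folds (without replacement)."""
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    folds = []
    for _ in range(n_folds):
        fold = []
        while len(fold) < fold_size:
            fold.append(dataset_copy.pop(randrange(len(dataset_copy))))
        folds.append(fold)
    return folds


def accuracy_metric(actual, predicted):
    """Percentage of predictions that match the actual class values."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual)) * 100.0


def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    """Train on k-1 folds, test on the held-out fold; return the k accuracy scores."""
    folds = cross_validation_split(dataset, n_folds)
    scores = []
    for fold in folds:
        train_set = [row for other in folds if other is not fold for row in other]
        test_set = [list(row[:-1]) + [None] for row in fold]  # hide the labels
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        scores.append(accuracy_metric(actual, predicted))
    return scores


def zero_rule(train, test):
    """Hypothetical baseline: predict the most common class in the training set."""
    labels = [row[-1] for row in train]
    most_common = max(set(labels), key=labels.count)
    return [most_common for _ in test]


seed(1)
dataset = [[float(i), i % 2] for i in range(10)]
scores = evaluate_algorithm(dataset, zero_rule, 5)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores) / float(len(scores))))
```

Any function with the signature algorithm(train, test, *args) that returns one prediction per test row can be plugged into evaluate_algorithm() in the same way.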

A new function named back_propagation() was developed to manage the application of the Backpropagation algorithm, first initializing a network, training it on the training dataset and then using the trained network to make predictions on a test dataset.

The complete example is listed below.
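The full listing combines every function shown so far with the data loading and evaluation code. As a condensed sketch of its core, here is back_propagation() together with the network functions it calls, exercised on a tiny made-up dataset rather than the seeds file (the sigmoid and its derivative are written inline to save space):

```python
from math import exp
from random import random, seed


def initialize_network(n_inputs, n_hidden, n_outputs):
    hidden = [{'weights': [random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)]
    output = [{'weights': [random() for _ in range(n_hidden + 1)]} for _ in range(n_outputs)]
    return [hidden, output]


def forward_propagate(network, row):
    inputs = row
    for layer in network:
        for neuron in layer:
            w = neuron['weights']
            # Weighted sum plus bias (last weight), then the sigmoid transfer.
            activation = w[-1] + sum(w[i] * inputs[i] for i in range(len(w) - 1))
            neuron['output'] = 1.0 / (1.0 + exp(-activation))
        inputs = [neuron['output'] for neuron in layer]
    return inputs


def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        if i == len(network) - 1:
            errors = [n['output'] - expected[j] for j, n in enumerate(layer)]
        else:
            errors = [sum(n['weights'][j] * n['delta'] for n in network[i + 1])
                      for j in range(len(layer))]
        for j, n in enumerate(layer):
            # output * (1 - output) is the sigmoid derivative.
            n['delta'] = errors[j] * n['output'] * (1.0 - n['output'])


def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1] if i == 0 else [n['output'] for n in network[i - 1]]
        for n in network[i]:
            for j in range(len(inputs)):
                n['weights'][j] -= l_rate * n['delta'] * inputs[j]
            n['weights'][-1] -= l_rate * n['delta']


def train_network(network, train, l_rate, n_epoch, n_outputs):
    for _ in range(n_epoch):
        for row in train:
            forward_propagate(network, row)
            expected = [0] * n_outputs
            expected[row[-1]] = 1  # IndexError if the label is not in 0..n_outputs-1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)


def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))


def back_propagation(train, test, l_rate, n_epoch, n_hidden):
    """Initialize a network, train it on train, and return predictions for test."""
    n_inputs = len(train[0]) - 1
    n_outputs = len(set(row[-1] for row in train))
    network = initialize_network(n_inputs, n_hidden, n_outputs)
    train_network(network, train, l_rate, n_epoch, n_outputs)
    return [predict(network, row) for row in test]


seed(1)
train = [[0.1, 0.2, 0], [0.2, 0.1, 0], [0.15, 0.25, 0],
         [0.9, 0.8, 1], [0.8, 0.9, 1], [0.85, 0.95, 1]]
test = [[0.05, 0.1, None], [0.95, 0.9, None]]
predictions = back_propagation(train, test, 0.5, 1000, 3)
print(predictions)
```

Because n_outputs is counted from the training fold and then used as the length of the one hot vector, expected[row[-1]] = 1 raises IndexError: list assignment index out of range if the class column was not converted to integers starting at 0 (for example, labels left as 1, 2, 3) or if a training fold happens to be missing one of the classes.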

A network with 5 neurons in the hidden layer and 3 neurons in the output layer was constructed. The network was trained for 500 epochs with a learning rate of 0.3. These parameters were found with a little trial and error, but you may be able to do much better.

Running the example prints the classification accuracy on each fold as well as the average performance across all folds.

You can see that backpropagation and the chosen configuration achieved a mean classification accuracy of about 93%, which is dramatically better than the Zero Rule algorithm, which achieved slightly better than 28% accuracy.

Extensions

This section lists extensions to the tutorial that you may wish to explore.

  • Tune Algorithm Parameters. Try larger or smaller networks trained for longer or shorter. See if you can get better performance on the seeds dataset.
  • Additional Methods. Experiment with different weight initialization techniques (such as small random numbers) and different transfer functions (such as tanh).
  • More Layers. Add support for more hidden layers, trained in just the same way as the one hidden layer used in this tutorial.
  • Regression. Change the network so that there is only one neuron in the output layer and that a real value is predicted. Pick a regression dataset to practice on. A linear transfer function could be used for neurons in the output layer, or the output values of the chosen dataset could be scaled to values between 0 and 1.
  • Batch Gradient Descent. Change the training procedure from online to batch gradient descent and update the weights only at the end of each epoch.
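For the "Additional Methods" extension, a minimal sketch of a tanh transfer function and its derivative is below. The names transfer_tanh() and transfer_tanh_derivative() are hypothetical; swapping them in would mean replacing the sigmoid transfer and derivative calls. Like the sigmoid, the derivative can be written in terms of the neuron's output, but tanh outputs lie in the range -1 to 1, so inputs and targets may need rescaling to match.

```python
from math import tanh


def transfer_tanh(activation):
    """Hyperbolic tangent transfer: maps any activation into the range (-1, 1)."""
    return tanh(activation)


def transfer_tanh_derivative(output):
    """Slope of tanh expressed in terms of its output: 1 - output**2."""
    return 1.0 - output ** 2


for value in (-2.0, 0.0, 2.0):
    out = transfer_tanh(value)
    print(value, out, transfer_tanh_derivative(out))
```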

Did you try any of these extensions?
Share your experiences in the comments below.

Review

In this tutorial, you discovered how to implement the Backpropagation algorithm from scratch.

Specifically, you learned:

  • How to forward propagate an input to calculate a network output.
  • How to back propagate error and update network weights.
  • How to apply the backpropagation algorithm to a real-world dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


835 Responses to How to Code a Neural Network with Backpropagation In Python (from scratch)

  1. Talk Data To Me November 7, 2016 at 9:28 pm

    That’s what I was looking for: a neural network written without any libraries (scikit, keras, etc.). Thank you very much!

    • Jason Brownlee November 8, 2016 at 9:51 am

      I’m glad to hear it!

      • sari dewi August 16, 2019 at 11:55 am

        Hi Mr. Jason, I tried your code to make a neural network with the backpropagation method. I am using Jupyter Notebook (Anaconda) and Python 3.7 64-bit. When I try this code:

        seed(1)
        # load and prepare data
        filename = 'datalatih.csv'
        dataset = load_csv(filename)
        for i in range(len(dataset[0])-1):
            str_column_to_float(dataset, i)
        # convert class column to integers
        str_column_to_int(dataset, len(dataset[0])-1)
        # normalize input variables
        minmax = dataset_minmax(dataset)
        normalize_dataset(dataset, minmax)
        # evaluate algorithm
        n_folds = 5
        l_rate = 0.3
        n_epoch = 500
        n_hidden = 5
        scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)

        print('Scores: %s' % scores)
        print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

        but I get error message

        IndexError Traceback (most recent call last)
        in
        196 n_epoch =500
        197 n_hidden =5
        –> 198 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
        199
        200 print (‘Scores: %s’ % scores)

        in evaluate_algorithm(dataset, algorithm, n_folds, *args)
        79 test_set.append(row_copy)
        80 row_copy[-1] = None
        —> 81 predicted = algorithm(train_set, test_set, *args)
        82 actual = [row[-1] for row in fold]
        83 accuracy = accuracy_metric(actual, predicted)

        in back_propagation(train, test, l_rate, n_epoch, n_hidden)
        171 n_outputs = len(set([row[-1] for row in train]))
        172 network = initialize_network(n_inputs, n_hidden, n_outputs)
        –> 173 train_network(network, train, l_rate, n_epoch, n_outputs)
        174 predictions = list()
        175 for row in test:

        in train_network(network, train, l_rate, n_epoch, n_outputs)
        148 outputs = forward_propagate(network, row)
        149 expected = [0 for i in range(n_outputs)]
        –> 150 expected[row[-1]] = 1
        151 backward_propagate_error(network, expected)
        152 update_weights(network, row, l_rate)

        IndexError: list assignment index out of range

        What is my mistake? Is there missing code? Thank you.

      • Febry Triyadi November 22, 2019 at 6:53 pm

        Hi Mr. Jason, I have trouble with your code. Please check it; I do not understand expected[row[-1]] = 1

        IndexError Traceback (most recent call last)
        in ()
        13 n_epoch = 500
        14 n_hidden = 5
        —> 15 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
        16 print(‘Scores: %s’ % scores)
        17 print(‘Mean Accuracy: %.3f%%’ % (sum(scores)/float(len(scores))))

        2 frames
        in train_network(network, train, l_rate, n_epoch, n_outputs)
        50 outputs = forward_propagate(network, row)
        51 expected = [0 for i in range(n_outputs)]
        —> 52 expected[row[-1]] = 1
        53 backward_propagate_error(network, expected)
        54 update_weights(network, row, l_rate)

        IndexError: list assignment index out of range

    • WB February 20, 2018 at 3:07 pm

      I experienced the following applying the Backpropagation algorithm to the wheat seeds dataset. I am wondering how to resolve the errors? Thank you
      —————————————————————————
      ValueError Traceback (most recent call last)
      in ()
      184 dataset = load_csv(filename)
      185 for i in range(len(dataset[0])-1):
      –> 186 str_column_to_float(dataset, i)
      187 # convert class column to integers
      188 str_column_to_int(dataset, len(dataset[0])-1)

      in str_column_to_float(dataset, column)
      20 def str_column_to_float(dataset, column):
      21 for row in dataset:
      —> 22 row[column] = float(row[column].strip())
      23
      24 # Convert string column to integer

      ValueError: could not convert string to float:

      • Jason Brownlee February 21, 2018 at 6:35 am

        Are you using Python 2?

        • wb February 21, 2018 at 2:51 pm

          Yes I am

        • harshith October 5, 2018 at 8:28 pm

          hi bro whass up

      • Mike Harney March 5, 2018 at 9:53 am

        Hi wb, I’m on 3.6 and I found the same issue. Maybe you can answer this Jason, but it looks like some of the data is misaligned in the sample. When opened in Excel, there are many open spaces followed by data jutted out to an extra column. I assume this is unintentional, and when I corrected the spacing, it appeared to work for me.

        • Jason Brownlee March 6, 2018 at 6:08 am

          The code was written and tested with Python 2.7.

          • JU April 23, 2018 at 7:24 am

            Mike is right – the dataset from the UCI website is slightly defective: It has two tabs in some places where there should be only one. This needs to be corrected during the conversion to CSV. In Excel the easiest way is to use the text importer and then click the “Treat consecutive delimiters as one” checkbox.

          • Jason Brownlee April 23, 2018 at 7:37 am

            Here is the dataset ready to use:
            https://raw.githubusercontent.com/jbrownlee/Datasets/master/wheat-seeds.csv

      • Alexis Batyk August 29, 2018 at 6:22 am

        [SOLVED]
        i have the same issue with

        https://raw.githubusercontent.com/jbrownlee/Datasets/master/wheat-seeds.csv

        that CSV is still dirty

        use a text editor -> select search and replace tool -> search ‘,,’ replace ‘,’ and it works

        • Jason Brownlee August 29, 2018 at 8:16 am

          I don’t have such problems on Py 3.6.

        • Jackson Scott October 1, 2018 at 9:08 am

          thanks, this worked for me as well. The csv file had some tabbed over and others correct.

          • Dharmendra Kumar September 3, 2019 at 7:38 pm

            Thank you

          • Jason Brownlee September 4, 2019 at 5:56 am

            You’re welcome.

      • Deng October 14, 2018 at 5:50 pm

        The data in the seeds_dataset file contains the backspace key, and it is ok to reset the data

    • George Dong May 12, 2019 at 6:14 pm

      I echo that too!

      Just one question please! In your code below, I could not understand why multiplication is used instead of division in the last line. Though division caused divide by zero problem.

      My understanding is gradient = dError / dWeights. Therefore, dWeights = dError / gradient
      i.e. delta = errors[j] / derivative

      Did we somehow make changes here, for calculation reasons, to use arctan instead of tan for gradient?

      I’d be grateful if you could help.

      • Dhaila November 22, 2020 at 12:36 am

        Hi Dong,

        I was looking into the code and have the same question as you raised above: why are we multiplying? May I ask whether you reached any understanding of that?

        • Francisco December 6, 2022 at 10:11 pm

          Hi Dhaila, sorry if this comes a bit late, but for anyone wondering why it is multiplied and not divided, it is due to the chain rule. The core idea of backpropagation is to find the gradient of the cost function i.e. error with respect to the weights, in other words, dE/dw. However, the error we have computed is (label-output), which is equivalent to dE/dy; then, we have computed the derivative from the neuron, which is dy/dw. Hence, by multiplying, you will get dE/dy *dy/dw = dE/dw which is what we are looking for. This explanation is simplified, if you would like a more in-depth answer, I would suggest reading chapter 8 from Deep Learning by Ian Goodfellow or Machine learning by Bishop. They go into more depth about this topic. Also, Jason, feel free to correct me if you think I might have misrepresented anything

    • Maria January 12, 2020 at 5:28 pm

      Hi Jason ,I need code of back propagation artificial neural network for predicting population dynamics of insects pests.

  2. MO November 8, 2016 at 9:26 am

    where can i see your data set, i want to see how it looked like

    • Jason Brownlee November 8, 2016 at 10:01 am

      Hi MO.

      The small contrived dataset used for testing is listed inline in the post in section 4.2

      The dataset used for the full example is on the UCI ML repository, linked in the section titled “Wheat Seeds Dataset”. Here is the direct link:
      http://archive.ics.uci.edu/ml/datasets/seeds

      • Solene EBA March 4, 2022 at 11:56 pm

        Hello, do you have any ideas on how to calculate R-squared?

        • James Carmichael March 5, 2022 at 12:36 pm

          Hi Solene. Please clarify which code listing you have a question about so that I can better assist you.
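For the R-squared question, here is a minimal sketch of the coefficient of determination, 1 - SS_res / SS_tot, for a regression-style output. The function name `r_squared` and the sample numbers are hypothetical. The sketch assumes equal-length lists of actual and predicted numbers, where the actual values are not all identical (otherwise SS_tot is zero):

```python
def r_squared(actual, predicted):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.4]
print(round(r_squared(actual, predicted), 4))   # 0.985
```

A model that always predicts the mean scores 0.0, and a perfect model scores 1.0.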

  3. prakash November 11, 2016 at 12:40 am

    In two-class classification, the expected value for class 0 is [1,0] and for class 1 it is [0,1].
    What will the output vectors look like for more than two classes?

    • Jason Brownlee November 11, 2016 at 10:02 am

      Hi prakash,

      For multi-class classification, we can extend the one hot encoding.

      Three class values for “red”, “green” and “blue” can be represented as an output vector like:
      1, 0, 0 for red
      0, 1, 0 for green
      0, 0, 1 for blue

      I hope that helps.
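The same one-hot idea can be written as a small helper that mirrors the tutorial's `expected = [0 for i in range(n_outputs)]` followed by `expected[row[-1]] = 1`. The name `one_hot` is illustrative, and the helper assumes the class labels have already been converted to integers 0..n-1:

```python
def one_hot(class_index, n_classes):
    # All zeros, then a 1 at the position of the actual class
    vector = [0] * n_classes
    vector[class_index] = 1
    return vector


labels = ["red", "green", "blue"]
for i, name in enumerate(labels):
    print(name, one_hot(i, len(labels)))
```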

  4. Rakesh November 13, 2016 at 3:41 pm

    Hi, Jason.
    You’ve mentioned that there are 3 output classes.
    How do we check the values which come under the 3 classes / clusters?
    Could we print the data which fall under each class?

    • Jason Brownlee November 14, 2016 at 7:35 am

      Hi Rakesh,

      The data does belong to 3 classes. We can check the skill of our model by comparing the predicted classes to the actual/expected classes and calculate an accuracy measure.
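The following is a minimal sketch of both ideas: an accuracy score and a listing of the rows that fall under each predicted class. The rows and predictions are made-up stand-ins. In practice the predictions would come from calling the tutorial's predict() on each row. The helper names are hypothetical:

```python
from collections import defaultdict


def accuracy(actual, predicted):
    # Percentage of rows whose predicted class equals the actual class
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100.0


def group_by_predicted_class(rows, predicted):
    # Collect the rows that the model assigned to each class
    groups = defaultdict(list)
    for row, label in zip(rows, predicted):
        groups[label].append(row)
    return dict(groups)


rows = [[0.1, 0.2, 0], [0.4, 0.9, 1], [0.8, 0.7, 2], [0.2, 0.1, 0]]   # features + true class
predicted = [0, 1, 1, 0]                                              # hypothetical model output
actual = [row[-1] for row in rows]
print(accuracy(actual, predicted))   # 75.0
for label, members in sorted(group_by_predicted_class(rows, predicted).items()):
    print(label, members)
```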

  5. Alex November 16, 2016 at 12:35 pm

    I’m confused why the activation method iterates from 0 to len(inputs) - 1 instead of from 0 to len(weights) - 1. Am I missing something?

    • Jason Brownlee November 17, 2016 at 9:47 am

      Hi Alex,

      The length of weights is the length of the input + 1 (to accommodate the bias term).

      We add the bias term first, then we add the weighted inputs. This is why we iterate over input values.

      Does that help?

      • Alex November 17, 2016 at 12:29 pm

        When I step through the code above for the ‘forward_propagate’ test case, I see the code correctly generate the output for the single hidden node but that output doesn’t get correctly processed when determining the outputs for the output layer. As written above in the activate function ‘for i in range(len(inputs)-1):’, when the calculation gets to the activate function for the output node for class=0, since ‘inputs’ has a single element in it (the output from the single hidden node), ‘len(inputs) - 1’ equals 0 so the for loop never executes. I’m assuming the code is supposed to read ‘for i in range(len(weights)-1):’ Does that make sense?

        I’m just trying to make sure I don’t fundamentally misunderstand something and improve this post for other readers. This site has been really, really helpful for me.

        • Jason Brownlee November 18, 2016 at 8:27 am

          I’m with you now, thanks for helping me catch up.

          Nice spot. I’ll fix up the tutorial.

          Update: Fixed. Thanks again mate!
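Here is a sketch of what the loop looks like when it iterates over `len(weights) - 1`, as Alex suggests, with the bias added first. It is written to match the discussion above and is not copied from the updated post. The example values are illustrative:

```python
def activate(weights, inputs):
    # The bias is the last weight; it has no matching input
    activation = weights[-1]
    # Iterate over the input weights only, so a neuron fed by a single
    # hidden neuron still uses that neuron's output
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


# Output neuron fed by one hidden neuron: one input weight plus a bias
print(activate([0.5, 0.1], [0.8]))   # 0.1 + 0.5 * 0.8
```

With the old `range(len(inputs) - 1)` loop, the same call would return only the bias, 0.1.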

  6. Tomasz Panek November 21, 2016 at 1:23 am

    # Update network weights with error
    def update_weights(network, row, l_rate):
        for i in range(len(network)):
            inputs = row
            if i != 0:
                inputs = [neuron['output'] for neuron in network[i - 1]]
            for neuron in network[i]:
                for j in range(len(inputs)-1):
                    neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
                neuron['weights'][-1] += l_rate * neuron['delta']

    In this fragment:
        for j in range(len(inputs)-1):
            neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
        neuron['weights'][-1] += l_rate * neuron['delta']

    If the inputs length = 1, you are not updating the weights, is that correct? You are updating only the bias, because the hidden layer has only one neuron.

  7. Tomasz November 21, 2016 at 1:34 am

    Hello. In the method update_weights you are doing for j in range(len(inputs) - 1). If the inputs length = 1, you aren’t updating the weights. Is that correct? The hidden layer has one neuron, so the output layer’s weights aren’t updated.

    • Jason Brownlee November 22, 2016 at 6:54 am

      Hi Tomasz,

      The assumption here is that the input vector always contains at least one input value and an output value, even if the output is set to None.

      You may have found a bug though when updating the layers. I’ll investigate and get back to you.

      • Jason Brownlee January 3, 2017 at 10:17 am

        Thanks Tomasz, this was indeed a bug.

        I have updated the update_weights() function in the above code examples.
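The following sketch shows what the corrected loop looks like. The first layer takes `row[:-1]`, so the class label is never treated as an input, and every input weight is updated even when there is only one input. It assumes the sign convention from the Oct 2021 update (delta computed from output - expected), so the step is subtracted. Under the older expected - output convention the two `-=` would be `+=`. The network values are illustrative:

```python
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        # First layer: the row's input values, excluding the class label in the last column
        inputs = row[:-1]
        if i != 0:
            # Later layers: the outputs of the previous layer
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            # Every input gets its own weight update, even when there is only one input
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            # Bias weight (last), whose input is implicitly 1
            neuron['weights'][-1] -= l_rate * neuron['delta']


# One hidden neuron feeding one output neuron (illustrative values)
network = [
    [{'weights': [0.2, 0.3, 0.1], 'output': 0.6, 'delta': 0.05}],
    [{'weights': [0.4, 0.2], 'output': 0.7, 'delta': 0.1}],
]
update_weights(network, [1.0, 2.0, 0], l_rate=0.5)
print(network[1][0]['weights'])   # the output neuron's input weight now changes too
```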

        • Jerry Jones October 16, 2018 at 8:18 am

          I don’t understand how update_weights updates the NN. There is no global variable or return from the function. What am I missing?

          • Jason Brownlee October 16, 2018 at 2:33 pm

            The weights are passed in by reference and modified in place.

            This is an advanced tutorial, I’d recommend using Keras for beginners.
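A small illustration of the in-place behaviour: Python passes references to objects, so a function that changes items inside the network's lists changes the caller's network without returning anything. The helper `scale_weights` is hypothetical and only stands in for the weight-updating loop:

```python
def scale_weights(network, factor):
    # Hypothetical helper with no return value: it changes the weight lists in place
    for layer in network:
        for neuron in layer:
            for j in range(len(neuron['weights'])):
                neuron['weights'][j] *= factor


network = [[{'weights': [1.0, 2.0]}]]
same_object = network              # a second name for the same list, not a copy
scale_weights(network, 0.5)
print(network[0][0]['weights'])    # [0.5, 1.0]
print(same_object is network)      # True
```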

  8. Michael December 13, 2016 at 4:15 am

    Hi, thanks for the tutorial. I’m doing a backpropagation project at the moment, so it’s been really useful.

    I was a little confused on the back-propagation error calculation function. Does “if i != len(network)-1:” mean that if the current layer isn’t the output layer then this following code is run or does it mean that the current layer is an output layer?

    • Jason Brownlee December 13, 2016 at 8:08 am

      Glad to hear it Michael.

      The line means if the index i is not equal to the index of the last layer of the network (the output layer), then run code inside the block.

  9. Michael January 5, 2017 at 7:53 am

    I have another question.
    Would it be possible to extend the code from this tutorial and create a network that trains on the MNIST handwritten digit set, using an input unit to represent each pixel in the image? I’m also not sure whether/how I could use feature extractors for the images.

    I have a project where I have to implement the Backpropagation algorithm with possibly the MNIST handwritten digit training set.

    I hope my question makes sense!

    • Jason Brownlee January 5, 2017 at 9:42 am

      Sure Michael, but I would recommend using a library like Keras instead as this code is not written for performance.

      Load an image as a long list of pixel integer values, convert to floats and away you go. No feature extraction needed for a simple MLP implementation. You should get performance above 90%.
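Here is a minimal sketch of turning one image into a row in the layout this tutorial expects: pixel values as floats, followed by the integer class label in the last column. It assumes 0-255 grayscale pixels and scales them to [0, 1], which is the job normalize_dataset() would otherwise do. The 3x3 "image" is a made-up stand-in for a real 28x28 digit:

```python
def image_to_row(pixels, label):
    # Flatten a 2D grid of 0-255 integers into one list of floats in [0, 1],
    # with the integer class label appended as the last column
    row = [value / 255.0 for line in pixels for value in line]
    row.append(label)
    return row


# A hypothetical 3x3 grayscale image of the digit "1"
image = [[0, 255, 0],
         [0, 255, 0],
         [0, 255, 0]]
row = image_to_row(image, 1)
print(len(row))   # 10: 9 pixel inputs + 1 label
```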

  10. Calin January 6, 2017 at 10:40 pm

    Hi Jason,

    Great post!

    I have a concern though:

    In train_network method there are these two lines of code:

    expected = [0 for i in range(n_outputs)]
    expected[row[-1]] = 1

    Couldn’t it be the case that expected[row[-1]] = 1 will throw an IndexError, as n_outputs is the size of the training set, which is a subset of the dataset, and row basically contains values from the whole dataset?

    • Jason Brownlee January 7, 2017 at 8:37 am

      Hi Calin,

      If I understand you correctly, no. The n_outputs var is the number of possible output values.

      Maybe put some print() statements in to help you better understand what values variables have.

      • Calin January 7, 2017 at 9:48 pm

        Hmm..I ran the entire code (with the csv file downloaded from http://archive.ics.uci.edu/ml/datasets/seeds), added some breakpoints and this is what I got after a few iterations:

        n_outputs = 168
        row[-1] = 201

        which is causing IndexError: list assignment index out of range.

        • Adriaan January 11, 2017 at 4:27 am

          I’ve got the same error: my list assignment index is out of range.

          • Jason Brownlee January 11, 2017 at 9:29 am

            Sorry to hear that, did you try running the updated code?

          • Ivan January 16, 2017 at 10:28 am

            This is a CSV reading error. Try reformatting the file with commas; that worked for me.

          • Jason Brownlee January 16, 2017 at 10:45 am

            What was the problem and fix exactly Ivan?

          • Bob February 5, 2017 at 10:59 am

            The data file (http://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt) has a few lines with double tabs (\t\t) as the delimiter — removing the double tabs and changing tabs to commas fixed it.

            Thanks for the good article.
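The following is a minimal sketch of Bob's fix. It converts whitespace-separated lines, including ones with double tabs, into comma-separated lines. The function name `tabs_to_csv` and the two sample lines are made up for illustration. The commented usage assumes you saved the file under the name used in the UCI URL:

```python
def tabs_to_csv(lines):
    # str.split() with no argument splits on any run of whitespace,
    # so single and double tabs are treated the same way; blank lines are dropped
    return [",".join(line.split()) for line in lines if line.strip()]


raw = ["15.26\t14.84\t0.871\t1", "14.88\t14.57\t\t0.8811\t1"]   # made-up sample lines
print(tabs_to_csv(raw))

# Usage (file names are placeholders):
# with open("seeds_dataset.txt") as src, open("seeds_dataset.csv", "w") as dst:
#     dst.write("\n".join(tabs_to_csv(src)) + "\n")
```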

          • Jason Brownlee February 6, 2017 at 9:42 am

            Thanks for the note Bob.

          • Rowen Bruce October 20, 2018 at 8:52 pm

            updated code

    • Adriaan January 11, 2017 at 5:50 am

      I’ve had the same error at the ‘train_network’ function. Is your dataset fine? I’ve had some problems because the CSV file wasn’t loaded correctly due to my regional Windows settings. I had to adjust my settings and everything worked out alright.

      http://superuser.com/questions/783060/excel-save-as-csv-options-possible-to-change-comma-to-pipe-or-tab-instead

  11. Stanley January 8, 2017 at 3:15 pm

    Thanks for such a good article.

    Just one question: in the equation “weight = weight + learning_rate * error * input”, why is there an “input”? IMO it should be: “weight = weight + learning_rate * error”?

    • Jason Brownlee January 9, 2017 at 7:47 am

      The var names and explanation are correct.

      The update equation is:

      weight = weight + learning_rate * error * input

      For the input layer, the inputs are the input data; for hidden layers, the input is the output of the prior layer.

      • Herman October 21, 2021 at 6:33 pm

        I think the formula should be weight = weight - learning_rate * error * input instead of +. Am I right?

        • Adrian Tam October 22, 2021 at 3:50 am

          You’re right if you are comparing what is done here with your textbook! However, notice the line “errors.append(expected[j] - neuron['output'])”: the error is expressed as the negative of what you expect, so the + in the update corrects for it.

          Probably I should revise the code to make it consistent with other people’s implementations.
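A small check of why the two conventions agree. If the error is expected - output, the step is added. If the error is output - expected, the textbook dE/dy for squared error, the same step is subtracted. The function names and numbers below are illustrative only:

```python
def update_plus(weight, l_rate, expected, output, derivative, x):
    # Original convention: error = expected - output, so the step is added
    delta = (expected - output) * derivative
    return weight + l_rate * delta * x


def update_minus(weight, l_rate, expected, output, derivative, x):
    # Textbook convention: error = output - expected, so the step is subtracted
    delta = (output - expected) * derivative
    return weight - l_rate * delta * x


args = dict(weight=0.4, l_rate=0.5, expected=1.0, output=0.7, derivative=0.21, x=0.6)
print(update_plus(**args), update_minus(**args))   # the same new weight twice
```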

  12. Madwadasa January 13, 2017 at 3:31 am

    Jason,

    Thanks for the code and post.
    Why is “expected” in expected = [0 for i in range(n_outputs)] initialized to [0,0]?
    Shouldn’t the output values be taken as expected when training the model?
    For example, in the case of XOR, shouldn’t 1 be taken as the expected value?

    • Jason Brownlee January 13, 2017 at 9:16 am

      Hi Madwadasa,

      Expected is a one-hot encoding. All classes are “0” except the actual class for the row, which is marked as a “1” on the next line.

  13. Michael January 19, 2017 at 3:44 am

    Hello, I have a couple more questions. When training the network with a dataset, does the error at each epoch indicate the distance between the predicted outcomes and the expected outcomes together for the whole dataset? Also when the mean accuracy is given in my case being 13% when I used the MNIST digit set, does this mean that the network will be correct 13% of the time and would have an error rate of 87%?

    • Jason Brownlee January 19, 2017 at 7:38 am

      Hi Michael,

      The epoch error does capture how wrong the algorithm is on all training data. This may or may not be a distance, depending on the error measure used. RMSE is technically not a distance measure; you could use Euclidean distance if you like, but I would not recommend it.

      Yes, in general, when the model makes predictions, your understanding is correct.
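To make both points concrete, here is a sketch that assumes the epoch error is a sum of squared differences between each one-hot expected vector and the network's outputs, accumulated over all training rows. The outputs below are made up. The error rate is simply the complement of accuracy:

```python
def epoch_sum_squared_error(expected_vectors, output_vectors):
    # Sum of squared differences over every output of every row in the epoch
    return sum((e - o) ** 2
               for exp_row, out_row in zip(expected_vectors, output_vectors)
               for e, o in zip(exp_row, out_row))


def error_rate(accuracy_percent):
    return 100.0 - accuracy_percent


expected = [[1, 0], [0, 1]]
outputs = [[0.9, 0.2], [0.4, 0.6]]   # hypothetical network outputs
print(epoch_sum_squared_error(expected, outputs))   # 0.01 + 0.04 + 0.16 + 0.16
print(error_rate(13.0))                              # 87.0
```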

  14. Bernardo Galvão January 24, 2017 at 3:51 am

    Hi Jason,

    in the excerpt regarding error of a neuron in a hidden layer:

    “Where error_j is the error signal from the jth neuron in the output layer, weight_k is the weight that connects the kth neuron to the current neuron and output is the output for the current neuron.”

    Is the k-th neuron a neuron in the output layer or a neuron in the hidden layer we’re “on”? What about the current neuron, are you referring to the neuron in the output layer? Sorry, English is not my native tongue.

    Appreciate your work!

    Bernardo
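One way to read the indices, written to match the code's inner loop `error += neuron['weights'][j] * neuron['delta']`, which runs over `network[i + 1]`. The notation here is an assumption and does not reuse the letters of the quoted sentence:

- j is the current hidden neuron.
- k runs over the neurons of the next layer, which is the output layer when there is a single hidden layer.
- w_{kj} is weight j of neuron k.
- o_j is the output of the current hidden neuron.

```latex
\delta_j = \Big( \sum_{k \in \text{next layer}} w_{kj}\,\delta_k \Big)\, o_j\,(1 - o_j)
```

Read this way, the k-th neuron sits in the layer after the current one, and the "current neuron" is the hidden neuron whose delta is being computed.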

  15. anonymous February 1, 2017 at 1:42 am

    It would have been better if recall and precision were printed. Can somebody tell me how to print them in the above code?
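Here is a minimal sketch of per-class precision and recall computed from a list of actual classes and a list of predicted classes. These are the same two lists the tutorial's accuracy calculation compares for each fold. The function name and the sample labels are hypothetical:

```python
def precision_recall(actual, predicted, positive):
    # Treat `positive` as the class of interest and every other class as negative
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
for label in sorted(set(actual)):
    p, r = precision_recall(actual, predicted, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f}")
```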

  16. kehinde kolade February 6, 2017 at 8:29 pm

    Hello Jason, great tutorial. I am a developer and I do not really know much about this machine learning thing, but I need to extend your code to incorporate momentum into the training. Can you please explain how I can achieve this extension?

    • Jason Brownlee February 7, 2017 at 10:14 am

      Sorry, I don’t have the capacity to write or spell out this change for you.

      My advice would be to read a good book on the topic, such as Neural Smithing: http://amzn.to/2ld9ds0
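One common form of momentum keeps a running "velocity" per weight and adds a fraction of the previous step to each new gradient step. Below is a sketch of that idea grafted onto the tutorial's update loop. The `velocity` key and the function name are hypothetical additions. The sketch assumes each neuron's delta was computed from output - expected, so a plain gradient step would subtract it:

```python
def update_weights_momentum(network, row, l_rate, momentum):
    # Hypothetical extension: each neuron keeps a 'velocity' list with one entry
    # per weight (bias last)
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            if 'velocity' not in neuron:
                neuron['velocity'] = [0.0] * len(neuron['weights'])
            # Gradient for each input weight, then for the bias (input fixed at 1)
            grads = [neuron['delta'] * x for x in inputs] + [neuron['delta']]
            for j, grad in enumerate(grads):
                # Keep a fraction of the previous step and add the new gradient step
                neuron['velocity'][j] = momentum * neuron['velocity'][j] - l_rate * grad
                neuron['weights'][j] += neuron['velocity'][j]


network = [[{'weights': [0.0, 0.0], 'delta': 1.0}]]   # one neuron: one input weight + bias
for _ in range(2):
    update_weights_momentum(network, [1.0, 0], l_rate=0.1, momentum=0.9)
print(network[0][0]['weights'])   # second step is larger than the first
```

With momentum=0.0 this reduces to the plain update, which makes it easy to compare both on the same data.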

  17. ibrahim February 18, 2017 at 2:21 am

    Hi Jason,
    I have my own code written in C++, which works similarly to your code. My intention is to extend my code to convolutional deep neural nets, and I have actually written the convolution, ReLU and pooling functions. However, I could not begin to apply the backpropagation I have used in my shallow neural net to the convolutional deep net, because I really can’t imagine the transition of the backpropagation calculation between the convolutional layers and the standard shallow layers existing in the same system. I hoped to find a source on this issue, but I always come to the point where the standard backpropagation algorithm for shallow nets is given, which I have already applied. Can you please guide me on this problem?

    • Jason Brownlee February 18, 2017 at 8:42 am

      I’d love to guide you, but I don’t have my own from-scratch implementation of CNNs, sorry. I’m not best placed to help at the moment.

      I’d recommend reading code from existing open source implementations.

      Good luck with your project.

  18. matias February 22, 2017 at 3:34 pm

    Thank you, I was looking for exactly this kind of ANN algorithm. A simple thanks won’t be enough though, lol.

  19. Manohar Katam February 26, 2017 at 3:40 pm

    Great one! I have one doubt: the seeds dataset contains missing features/fields for some rows. How are you handling that?

    • Jason Brownlee February 27, 2017 at 5:49 am

      You could set the missing values to 0, you could remove the rows with missing values, you could impute the missing values with mean column values, etc.

      Try a few different methods and see what results in the best performing models.
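Here is a minimal sketch of one of the options mentioned, imputing with column means. It assumes missing values have been read in as None and that the last column is the class label. The function name and sample rows are hypothetical:

```python
def impute_column_means(rows):
    # Replace None in each feature column with the mean of that column's present values
    n_features = len(rows[0]) - 1          # last column is the class label
    for i in range(n_features):
        present = [row[i] for row in rows if row[i] is not None]
        mean = sum(present) / len(present)
        for row in rows:
            if row[i] is None:
                row[i] = mean
    return rows


rows = [[1.0, None, 0], [3.0, 4.0, 1], [None, 2.0, 0]]
print(impute_column_means(rows))   # [[1.0, 3.0, 0], [3.0, 4.0, 1], [2.0, 2.0, 0]]
```

Swapping the mean for 0.0, or dropping rows that contain None, gives the other two options to compare.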

      • Manohar Katam March 1, 2017 at 2:59 pm

        What if I have categorical values like “male” or “female” in my dataset? Will this program work with string data?

        • Jason Brownlee March 2, 2017 at 8:11 am

          Hi Manohar,

          No, you will need to convert them to integers (integer encoding) or similar.
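The following is a minimal sketch of integer encoding for one string column. The function name and sample rows are hypothetical. The distinct values are sorted so the mapping is the same on every run. For a feature with no natural order, one-hot encoding the column into several 0/1 columns is a common alternative:

```python
def encode_column(rows, column):
    # Map each distinct string to an integer, in sorted order so the mapping is repeatable
    values = sorted(set(row[column] for row in rows))
    lookup = {value: i for i, value in enumerate(values)}
    for row in rows:
        row[column] = lookup[row[column]]
    return lookup


rows = [[25, "male", 1], [31, "female", 0], [47, "male", 0]]
print(encode_column(rows, 1))   # {'female': 0, 'male': 1}
print(rows)                     # [[25, 1, 1], [31, 0, 0], [47, 1, 0]]
```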

  20. Wissal ARGOUBI February 27, 2017 at 11:12 pm

    Great job! This is what I was looking for! Thank you very much.
    However, I already have a database and I don’t know how to make it work with this code. How can I adapt it to my data?
    Thank you

  21. Shweta Gupta March 5, 2017 at 4:37 am

    Thanks for such a great article.
    I have one question: in update_weights, why have you used weight=weight+l_rate*delta*input rather than weight=weight+l_rate*delta?

  22. Sittha March 13, 2017 at 1:23 pm

    Thanks for a good tutorial.
    I get an IndexError: list assignment index out of range, and I cannot fix it by using a comma or full-stop separator.

    • Jason Brownlee March 14, 2017 at 8:11 am

      What is the full error you are getting?

      Did you copy-paste the full final example and run it on the same dataset?

      • Sittha March 24, 2017 at 3:36 am

        Line 151:
        expected[row[-1]] = 1
        IndexError: list assignment index out of range

        • Jason Brownlee March 24, 2017 at 8:00 am

          Is this with a different dataset?

          • Benji Weiss May 11, 2017 at 5:31 am

            If it is a different dataset, what do I need to do to avoid this error?
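The failing line `expected[row[-1]] = 1` needs every class label to be an integer in range(n_outputs). With another dataset, one possible cause is labels such as 1, 2, 3 (or strings) that were never renumbered. Here is a sketch of a check and a renumbering step. Both function names and the sample rows are hypothetical:

```python
def check_class_labels(rows):
    # Labels must be ints in range(n_outputs) for expected[row[-1]] = 1 to work
    labels = [row[-1] for row in rows]
    n_outputs = len(set(labels))
    bad = sorted({label for label in labels
                  if not isinstance(label, int) or not 0 <= label < n_outputs}, key=repr)
    return n_outputs, bad


def remap_class_labels(rows):
    # Renumber the labels as 0..n_outputs-1, in sorted order
    lookup = {label: i for i, label in enumerate(sorted({row[-1] for row in rows}))}
    for row in rows:
        row[-1] = lookup[row[-1]]
    return lookup


rows = [[0.2, 0.7, 1], [0.5, 0.1, 3], [0.9, 0.4, 2]]
print(check_class_labels(rows))   # (3, [3]): label 3 is out of range for 3 outputs
print(remap_class_labels(rows))   # {1: 0, 2: 1, 3: 2}
print(check_class_labels(rows))   # (3, [])
```

If the check reports strange labels such as 201, the file was probably not split into columns correctly, as in the CSV issues discussed above.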

  23. Karan March 16, 2017 at 6:26 pm

    The dataset that was given was for training the network. Now how do we test the network by providing the 7 features without giving the class label (1, 2 or 3)?

    • Jason Brownlee March 17, 2017 at 8:27 am

      You will have to adapt the example to fit the model on all of the training data, then you can call predict() to make predictions on new data.
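Below is a compact sketch of prediction on a row that has no class label. It follows the tutorial's approach of forward propagation followed by taking the index of the largest output, but it does not store neuron outputs. The small network's weights are made up and stand in for a trained one. Because activate() only reads `len(weights) - 1` inputs, a trailing label column, if present, is simply ignored:

```python
from math import exp


def activate(weights, inputs):
    activation = weights[-1]                 # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation


def forward_propagate(network, row):
    inputs = row
    for layer in network:
        inputs = [1.0 / (1.0 + exp(-activate(neuron['weights'], inputs))) for neuron in layer]
    return inputs


def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))       # index of the largest output = predicted class


# Hypothetical trained network: 2 inputs -> 1 hidden neuron -> 2 outputs
network = [[{'weights': [1.5, -2.0, 0.3]}],
           [{'weights': [-3.0, 1.0]}, {'weights': [3.0, -1.0]}]]
new_row = [2.0, 0.5]   # features only, no class label
print(predict(network, new_row))
```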

      • Karan March 19, 2017 at 7:43 pm

        Ok Jason, I’ll try that and get back to you! Thank you!

  24. Karan March 19, 2017 at 7:48 pm

    Just a suggestion for the people who would be using their own dataset (not the seeds_dataset) for training their network: make sure you add an if statement as follows before the 45th line:
    if minmax[i][1] != minmax[i][0]:

    This is because your own dataset might contain same values in the same column and that might cause a divide by zero error.
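Here is a sketch of min-max normalization with that guard in place. A column whose minimum equals its maximum is set to 0.0 in this version. That choice is an assumption, and leaving such a column unchanged would also avoid the division. The sample data are made up:

```python
def dataset_minmax(dataset):
    # [min, max] for every column, including the class label column
    return [[min(column), max(column)] for column in zip(*dataset)]


def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row) - 1):          # skip the class label column
            lo, hi = minmax[i]
            if hi != lo:
                row[i] = (row[i] - lo) / (hi - lo)
            else:
                row[i] = 0.0                   # constant column: avoid dividing by zero


data = [[1.0, 5.0, 0], [3.0, 5.0, 1]]          # second column is constant
normalize_dataset(data, dataset_minmax(data))
print(data)   # [[0.0, 0.0, 0], [1.0, 0.0, 1]]
```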

  25. Li Qun March 25, 2017 at 5:45 pm

    Thanks Jason for the amazing posts on your from-scratch Python implementations! I have learned so much from you!

    I have followed through both your naive bayes and backprop posts, and I have a (perhaps quite naive) question:

    What is the relationship between the two? Did backprop actually implement Bayesian inference already (after all, what I understand is that Bayesian = weights being updated every cycle)? Perhaps just non-Gaussian? So, are non-Gaussian PDF weight updates not Bayesian inference?

    I guess, to put it simply: is backpropagation essentially a Bayesian inference loop for n epochs?

    I came from the naive bayes tutorial wanting to implement backpropagation together with your naive bayes implementation but got a bit lost along the way.

    Sorry if I was going around in circles; I sincerely hope someone will be able to at least point me in the right direction.

    • Jason Brownlee March 26, 2017 at 6:11 am

      Great question.

      No, they are very different. Naive Bayes is a direct use of probabilities and Bayes’ theorem. The neural net approximates a mapping function from inputs to outputs, a very different approach that does not directly use the joint probability.

  26. Chiraag March 26, 2017 at 10:10 pm

    How did you decide that the number of folds will be 5 ? Could you please explain the significance of this number. Thank You.

    • Jason Brownlee March 27, 2017 at 7:54 am

      In this case, it was pretty arbitrary.

      Generally, you want to split the data so that each fold is representative of the dataset. The objective measure is how closely the mean performance reflects the actual performance of the model on unseen data. We can only estimate this in practice (standard error?).

  27. Li Qun March 27, 2017 at 10:19 pm

    Dear Jason,

    Thank you for the reply! I read up a bit more about the differences between Naive Bayes (or Bayesian Nets in general) and Neural Networks and found this Quora answer that I thought was very clear. I’ll put it up here to give other readers a good point to go from:

    https://www.quora.com/What-is-the-difference-between-a-Bayesian-network-and-an-artificial-neural-network

    TL;DR:
    – they look the same, but every node in a Bayesian Network has meaning, in that you can read a Bayesian network structure (like a mind map) and see what’s happening where and why.
    – a Neural Network structure doesn’t have explicit meaning; it’s just dots that link previous dots.
    – there are more reasons, but the above two highlighted the biggest difference.

    Just a quick guess after playing around with backpropagation a little: the way NB and a backprop NN would work together is by running Naive Bayes to get a good ‘first guess’ of the initial weights, which are then run through a Neural Network and backpropagated?

    • Jason Brownlee March 28, 2017 at 8:23 am

      Please note that a Bayesian network and naive bayes are very different algorithms.

  28. Melissa March 27, 2017 at 10:54 pm

    Hi Jason,
    Further to this update:

    Update Jan/2017: Changed the calculation of fold_size in cross_validation_split() to always be an integer. Fixes issues with Python 3.

    I’m still having this same problem whilst using Python 3, on both the seeds dataset and my own. It returns an error at line 75 saying “'list' object has no attribute 'sum'” and also saying that “an integer is required.”

    Any help would be very much appreciated.
    Overall this code is very helpful. Thank you!

    • Jason Brownlee March 28, 2017 at 8:24 am

      Sorry to hear that. Did you try copy-pasting the complete working example from the end of the post and running it on the same dataset from the command line?

      • Melissa March 28, 2017 at 9:29 am

        Yes I’ve done that, but still the same problem!

  29. david March 29, 2017 at 6:16 am

    Hello jason,

    Please, I need help on how to pass the output of the trained network into a fuzzy logic system; if possible, some code or a link that can help me understand better. Thank you.

  30. Aditya April 2, 2017 at 3:57 pm

    Awesome Explanation

  31. Raunak Jain April 6, 2017 at 5:20 pm

    Hello Jason
    I’m getting a “list assignment index out of range” error. How do I handle this error?

    • Jason Brownlee April 9, 2017 at 2:37 pm

      The example was developed for Python 2, perhaps this is Python version issue?

  32. Marco April 6, 2017 at 9:37 pm

    Thanks, but I think Python is not a good choice…

    • Jason Brownlee April 9, 2017 at 2:40 pm

      I think it is a good choice for learning how backprop works.

      What would be a better choice?

  33. Agrawal April 6, 2017 at 9:38 pm

    Hey Jason, thanks for this wonderful lecture on neural networks.

    As I am working on iris recognition, I have extracted the features of each eye and stored them in a .csv file. Can you suggest how I can build my backpropagation code further?
    When I run your code, I am getting many errors.
    Thank you

  34. Jack April 7, 2017 at 3:42 pm

    Could you please convert this iterative implementation into a matrix implementation?
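As a starting point, here is one way to write the same computations in matrix form. The notation is an assumption, not the tutorial's:

- W^{(l)} holds one row per neuron of layer l, containing that neuron's non-bias weights, and b^{(l)} holds the bias weights.
- h^{(0)} is the input row without its label, and y is the one-hot expected vector.
- \odot is element-wise multiplication, and \eta is the learning rate.
- The error sign follows the Oct 2021 update (output minus expected).

```latex
\begin{aligned}
a^{(l)} &= W^{(l)} h^{(l-1)} + b^{(l)}, \qquad h^{(l)} = \sigma\big(a^{(l)}\big) \\
\delta^{(L)} &= \big(h^{(L)} - y\big) \odot h^{(L)} \odot \big(1 - h^{(L)}\big) \\
\delta^{(l)} &= \Big( \big(W^{(l+1)}\big)^{\top} \delta^{(l+1)} \Big) \odot h^{(l)} \odot \big(1 - h^{(l)}\big) \\
W^{(l)} &\leftarrow W^{(l)} - \eta\, \delta^{(l)} \big(h^{(l-1)}\big)^{\top}, \qquad b^{(l)} \leftarrow b^{(l)} - \eta\, \delta^{(l)}
\end{aligned}
```

Each entry of the transposed-weight product corresponds to the inner loop `error += neuron['weights'][j] * neuron['delta']`. The outer product in the update corresponds to the per-weight loop in update_weights().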

  35. Jk April 12, 2017 at 5:04 am

    Hi Jason,

    In section 4.1, could you please explain why you used inputs = row[:-1]?

    Thanks

    • Jason Brownlee April 12, 2017 at 7:58 am

      Yes. In update_weights(), the inputs for the first layer are the values in the input row, excluding the last column, which holds the class label (inputs = row[:-1]); for every later layer, the inputs are the outputs of the previous layer in the network (inputs = [neuron['output'] for neuron in network[i - 1]]).

      I hope that helps.

      • JK April 13, 2017 at 3:59 am

        Thanks for your response. I understand what you said; the part I am not understanding is the [:-1]. Why eliminate the last list item?

        • Jason Brownlee April 13, 2017 at 10:10 am

          It is a slice from the first item up to, but not including, the last item in the list, i.e. indices 0 to n-2 for a list of length n.

        • Amer April 6, 2018 at 7:22 am

          Because the last item in the weights array is the bias.
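A small illustration of how the two "last items" line up. The row ends with the class label, and the weights list ends with the bias. Slicing the label off with `row[:-1]` leaves exactly one input per non-bias weight. The numbers are illustrative:

```python
row = [2.0, 3.0, 1]            # two input values, then the class label
weights = [0.5, -0.25, 0.1]    # two input weights, then the bias

inputs = row[:-1]              # [2.0, 3.0]: the class label is not an input
bias = weights[-1]             # 0.1: the bias has no matching input
pairs = list(zip(weights[:-1], inputs))
activation = bias + sum(w * x for w, x in pairs)
print(inputs, len(pairs), activation)
```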

  36. Prem Puri April 12, 2017 at 8:18 pm

    In the function def backward_propagate_error(network, expected):
    as far as I understand, it proceeds sequentially up to
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
    My question is: which value is used for neuron['delta']?

    • Jason Brownlee April 13, 2017 at 10:01 am

      delta is set in the previous code block. It is the error signal that is being propagated backward.

      • Nishu March 25, 2018 at 11:32 am

        I’m sorry, but I still can’t find the location where delta is set, and hence the code gives an error.
        Where is the delta set for the first time?
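The key is the loop order. Layers are visited in reverse, so the output layer's deltas are set first, in the `else` branch. Each hidden layer then reads the deltas of the layer after it, which were set one iteration earlier. Below is a sketch in the tutorial's style. It assumes the Oct 2021 sign convention (error = output - expected) and that forward propagation has already stored each neuron's 'output'. The network values are illustrative:

```python
def transfer_derivative(output):
    return output * (1.0 - output)


def backward_propagate_error(network, expected):
    # Walk the layers from the output layer back to the first hidden layer
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i != len(network) - 1:
            # Hidden layer: layer i + 1 was processed in the previous iteration,
            # so its neurons already carry a 'delta'
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += neuron['weights'][j] * neuron['delta']
                errors.append(error)
        else:
            # Output layer: the first place where 'delta' gets a value
            for j, neuron in enumerate(layer):
                errors.append(neuron['output'] - expected[j])
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])


network = [[{'weights': [0.1, 0.2, 0.3], 'output': 0.7}],
           [{'weights': [0.4, 0.5], 'output': 0.6},
            {'weights': [0.6, 0.7], 'output': 0.8}]]
backward_propagate_error(network, [0, 1])
print([n['delta'] for n in network[1]], network[0][0]['delta'])
```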

  37. Prem Puri April 14, 2017 at 3:20 am

    Thanks very much!

  38. youssef oumate April 26, 2017 at 4:53 pm

    Hi Jason

    Thank you very much for this awesome implementation of neural network,
    I have a question for you: I want to replace the Sigmoid activation function with ReLU. So, what changes should I make in order to get correct predictions?

  39. Yahya Alaa April 30, 2017 at 2:38 am

    Hi Jason,
    Thank you very much for this wonderful implementation of a neural network; it really helped me a lot to understand the concept of neural networks.

    n_inputs = len(dataset[0]) - 1
    n_outputs = len(set([row[-1] for row in dataset]))
    network = initialize_network(n_inputs, 2, n_outputs)
    train_network(network, dataset, 0.5, 20, n_outputs)

    What do n_inputs and n_outputs refer to? For the small dataset used in this section, is n_inputs only 2 and n_outputs only 2 (0 or 1), or am I missing something?

    • Jason Brownlee April 30, 2017 at 5:31 am

      n_inputs is the number of input features (columns) in your data, and n_outputs is the number of distinct class values, which is also the number of output neurons.
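A quick way to see both numbers. The rows below are hypothetical, with two input columns and a 0/1 label. The second dataset shows what happens when the last column holds continuous values instead of class labels: the `set()` then counts almost every row as its own "class":

```python
# Hypothetical rows: two input columns and a class label of 0 or 1
dataset = [[2.8, 2.6, 0],
           [1.5, 2.4, 0],
           [7.6, 2.8, 1],
           [5.3, 2.1, 1]]
n_inputs = len(dataset[0]) - 1                    # columns minus the label column
n_outputs = len(set(row[-1] for row in dataset))  # distinct class values
print(n_inputs, n_outputs)                        # 2 2

# A continuous last column: every distinct value looks like a separate class
regression_like = [[0.1, 3.7], [0.2, 4.1], [0.3, 5.9]]
print(len(set(row[-1] for row in regression_like)))   # 3
```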

    • Yahya Alaa May 3, 2017 at 1:42 pm

      Is the program training the network for 500 epochs for each one of the k-folds and then testing the network with the testing data set?

      • Jason Brownlee May 4, 2017 at 8:02 am

        Hi Yahya,

        5-fold cross validation is used.

        That means that 5 models are fit and evaluated on 5 different hold out sets. Each model is trained for 500 epochs.

        I hope that makes things clearer Yahya.

        • Yahya Alaa May 4, 2017 at 8:17 am

          Yes you made things clear to me, Thank you.
          I have two other questions,
          How to know when to stop training the network to avoid overfitting?
          How to choose the number of neurons in the hidden layer?

          • Jason Brownlee May 5, 2017 at 7:27 am

            You can use early stopping to save the network weights when the skill on a validation set stops improving.

            The number of neurons can be found through trial and error.
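Here is a minimal sketch of the early-stopping rule with a "patience" window. It assumes you record the validation error after each epoch. The function name, patience value and error curve are hypothetical. In a training loop you would keep a copy of the network (for example with `copy.deepcopy`) whenever the validation error improves, and stop once `patience` epochs pass without improvement:

```python
def best_epoch_with_patience(val_errors, patience=5):
    # Return the epoch whose weights you would keep: the lowest validation error
    # seen before `patience` epochs pass without any improvement
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


curve = [0.9, 0.7, 0.6, 0.62, 0.65, 0.61, 0.7, 0.5]   # made-up validation errors
print(best_epoch_with_patience(curve, patience=3))    # 2: stops before reaching epoch 7
print(best_epoch_with_patience(curve, patience=10))   # 7
```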

          • Yahya Alaa May 6, 2017 at 8:48 am

            I am working on a program that recognizes handwritten digits. The dataset consists of pictures of 45*45 pixels each, which is 2025 input neurons. This causes me a problem in the activation function: the summation of (weight[i] * input[i]) is big, so after putting the value of the activation function into the sigmoid function I always get a result of 0.99 -> 1. Any suggestions?

          • Jason Brownlee May 7, 2017 at 5:31 am

            I would recommend using a Convolutional Neural Network rather than a Multilayer Perceptron.
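Separately from the CNN suggestion, the saturation Yahya describes is what you would expect if all weights are drawn from `random()` in [0, 1), as in the tutorial's initialize_network(). Summed over 2025 inputs, those weights give a huge activation. The sketch below compares that with a scaled initialization, uniform in [-1/sqrt(n), 1/sqrt(n)]. The scaled scheme is an assumption of this sketch and not the tutorial's method. The pixel values are random stand-ins already scaled to [0, 1]:

```python
from math import exp, sqrt
from random import random, seed


def init_weights(n_inputs, scaled):
    if scaled:
        # Hypothetical alternative: uniform in [-1/sqrt(n), 1/sqrt(n)]
        limit = 1.0 / sqrt(n_inputs)
        return [(2.0 * random() - 1.0) * limit for _ in range(n_inputs + 1)]
    # All-positive weights in [0, 1), plus a bias as the last weight
    return [random() for _ in range(n_inputs + 1)]


def sigmoid_output(weights, inputs):
    a = weights[-1] + sum(w * x for w, x in zip(weights[:-1], inputs))
    return 1.0 / (1.0 + exp(-a))


seed(1)
n = 45 * 45                                   # 2025 pixel inputs
pixels = [random() for _ in range(n)]         # stand-in inputs already scaled to [0, 1]
saturated = sigmoid_output(init_weights(n, scaled=False), pixels)
moderate = sigmoid_output(init_weights(n, scaled=True), pixels)
print(saturated, moderate)                    # saturated at (or extremely near) 1.0, then a mid-range value
```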

  40. morok April 30, 2017 at 3:56 am

    In section 3.2, Error Backpropagation, where did the output numbers for testing backpropagation come from?

    ‘output’: 0.7105668883115941
    ‘output’: 0.6213859615555266
    ‘output’: 0.6573693455986976

    Perhaps from the outputs of the forward propagation test, [0.6629970129852887, 0.7253160725279748], taking derivative = output * (1.0 - output)? The problem is they don’t match, so I’m a bit lost here…

    thanks!

    Awesome article!!!

    • Jason Brownlee April 30, 2017 at 5:34 am

      In that example, the output and weights were contrived to test back propagation of error. Note the “delta” in those outputs.

    • Massa November 25, 2017 at 7:36 am

      Hello Dr Jason,

      I was wondering about this line:

      n_outputs = len(set([row[-1] for row in dataset]))

      How does it give the number of output features?
      When I print it, it gives the size of the dataset (the number of rows, not columns).

  41. Umamaheswaran May 8, 2017 at 9:49 pm

    Hi Jason,

    I am using the MNIST dataset to implement a handwritten digit classifier. How many training examples will be needed to get a performance above 90%?

  42. Huyen May 9, 2017 at 6:32 pm

    Hi Jason,

    Your blog is totally awesome, not only for this post but also for the whole series about neural networks. Some of the posts explain much more useful things than others on the Internet. They help me a lot to understand the core of the network instead of directly applying Keras or TensorFlow.

    Just one question, if I would like to change the result from classification to regression, which part in back propagation I need to change and how?

    Thank you in advance for your answer

    • Jason Brownlee May 10, 2017 at 8:46 am

      Thanks Huyen.

      You would change the activation function in the output layer to linear (i.e. no transform).
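The following sketch shows the pieces that change for regression under a few assumptions of this sketch, not taken from the tutorial. There is a single output neuron (n_outputs = 1) with an identity transfer. The expected value is the raw target, `[row[-1]]`, instead of a one-hot vector. Hidden layers keep the sigmoid. RMSE replaces accuracy for evaluation. The error sign follows output - expected. All names are hypothetical:

```python
from math import sqrt


def transfer_linear(activation):
    # Identity transfer for the output neuron in regression
    return activation


def transfer_derivative_linear(output):
    # The identity has slope 1 everywhere
    return 1.0


def output_delta_regression(output, target):
    # One output neuron; the target is the raw value, not a one-hot vector
    return (output - target) * transfer_derivative_linear(output)


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


print(output_delta_regression(transfer_linear(2.5), 2.0))   # 0.5
print(rmse([1.0, 2.0], [1.0, 4.0]))
```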

  43. TGoritsky May 12, 2017 at 12:41 am

    Hi Jason,

    I am playing around with your code to better understand how the ANN works. Right now I am trying to make predictions with a NN that is trained on my own dataset, but the program returns one class label for all rows in a test dataset. I understand that normalizing the dataset should help, but it doesn’t work (I am using your minmax and normalize_dataset functions). Also, is there a way to return a prediction for a one-dimensional dataset?
    Here is the code (sorry for lack of formatting):
    def make_predictions():
        dataset = [[29,46,107,324,56,44,121,35,1],
                   [29,46,109,327,51,37,123,38,1],
                   [28,42,107,309,55,32,124,38,1],
                   [40,112,287,59,35,121,36,1],
                   [27,43,129,306,75,41,107,38,1],
                   [28,38,127,289,79,40,109,37,1],
                   [29,37,126,292,77,35,100,34,1],
                   [30,40,87,48,77,51,272,80,2],
                   [26,37,88,47,84,44,250,80,2],
                   [29,39,91,47,84,46,247,79,2],
                   [28,38,85,45,80,47,249,78,2],
                   [28,36,81,43,76,50,337,83,2],
                   [28,34,75,41,83,52,344,81,2],
                   [30,38,80,46,71,53,347,92,2],
                   [28,35,72,45,64,47,360,101,2]]
        network = [[{'weights': [0.09640510259345969, 0.37923370996257266, 0.5476265202749506, 0.9144446394025773, 0.837692750149296, 0.5343300438262426, 0.7679511829130964, 0.5325204151469501, 0.06532276962299033]}],
                   [{'weights': [0.040400453542770665, 0.13301701225112483]}, {'weights': [0.1665525504275246, 0.5382087395561351]}, {'weights': [0.26800994395551214, 0.3322334781304659]}]]
        # minmax = dataset_minmax(dataset)
        # normalize_dataset(dataset, minmax)
        for row in dataset:
            prediction = predict(network, row)
            print('Expected=%d, Got=%d' % (row[-1], prediction))

  44. Tomo May 18, 2017 at 6:22 pm

    Hi Jason!
    In the function “backward_propagate_error”, when you do this:

        neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

    the derivative should be applied to the activation of that neuron, not to the output. Am I right?

        neuron['delta'] = errors[j] * transfer_derivative(activate(neuron['weights'], inputs))

    And inputs is:
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in self.network[i-1]]

    Thank you! The post was really helpful!

    • Adika February 2, 2021 at 2:30 am

      I think you are right but not sure.
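A quick numerical check may help settle this: for the sigmoid, the derivative with respect to the activation can be written in terms of the output, sigmoid'(a) = sigmoid(a) * (1 - sigmoid(a)). So passing the stored output into `transfer_derivative` gives the derivative evaluated at the activation. The sketch below compares it against a central finite difference taken with respect to the activation.

```python
from math import exp

def transfer(activation):
    # Sigmoid.
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    # Sigmoid derivative expressed in terms of the sigmoid's *output*.
    return output * (1.0 - output)

def numeric_derivative(activation, h=1e-6):
    # Central difference of the sigmoid with respect to its *activation*.
    return (transfer(activation + h) - transfer(activation - h)) / (2 * h)

checks = []
for activation in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    output = transfer(activation)
    analytic = transfer_derivative(output)
    numeric = numeric_derivative(activation)
    checks.append((analytic, numeric))
    print(activation, analytic, numeric)
```

The identity is specific to the sigmoid; when swapping in another transfer function, its derivative has to be re-derived rather than reused.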

  45. Tina May 26, 2017 at 3:49 am

    Hello Jason!

    This is a very interesting contribution to the community 🙂
    Have you tried using the algorithm with other activation functions?
    I tried with Gaussian, tanh and sinx, but the accuracy was not that high, so I think that I omitted something. What I altered were the activation functions and the derivatives. Is there something else that needs to be changed?

    • Jason Brownlee June 2, 2017 at 11:49 am

      Sigmoid was the de facto standard for many years because it performs well on many different problems.

      Now the de facto standard is ReLU.

      • Manu June 6, 2017 at 8:50 pm

        Sigmoid and ReLU are transfer functions, right?
        The activation function is just the weighted sum of the inputs (plus the bias).
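To make the activation/transfer split concrete, here is a minimal sketch using the tutorial's naming (`activate` computes the weighted sum plus bias; `transfer` squashes it), with tanh swapped in for the sigmoid. Both the transfer function and its derivative change. Because tanh outputs lie in (-1, 1), how the targets are encoded may also need revisiting; that last point is an assumption about what else might matter, not a confirmed fix. The names `transfer_tanh` and `transfer_tanh_derivative` are illustrative.

```python
from math import tanh

def activate(weights, inputs):
    # "Activation": weighted sum of the inputs plus the bias (the last weight).
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer_tanh(activation):
    # "Transfer": squash the activation into (-1, 1).
    return tanh(activation)

def transfer_tanh_derivative(output):
    # d/da tanh(a) = 1 - tanh(a)**2, written in terms of the output like the sigmoid version.
    return 1.0 - output ** 2

weights = [0.2, -0.4, 0.1]  # two input weights followed by the bias
activation = activate(weights, [1.0, 0.5])
output = transfer_tanh(activation)
print(activation, output, transfer_tanh_derivative(output))
```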

  46. vishwanathan May 27, 2017 at 8:08 pm

    Thanks for the great post. Here is an observation I am not able to understand. In the backward propagation you are not taking all the weights, only the jth one. Can you kindly help me understand? I was under the impression that the delta from the output is applied across all the weights:
        for neuron in network[i + 1]:
            error += (neuron['weights'][j] * neuron['delta'])

    • vishwanathan May 27, 2017 at 8:14 pm

      I understand you do not want to include the bias weight, hence the exclusion of the last weight in each neuron. I kind of get stumped on the bias.

  47. vishwanathan May 27, 2017 at 9:12 pm

    Thanks for the great article. In backward propagation, the delta value is applied for each weight across the neuron and the error is summed. I am curious why the delta is not applied to the individual weights of the neuron and the error summed for that neuron. Can you please clarify?
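A small sketch may clarify the indexing. Hidden neuron j connects to every neuron k in the next layer through that neuron's weight j, so the error for j is the sum over k of weights_k[j] * delta_k. The bias weight (the last index) never appears because no hidden neuron feeds it. The numbers are made up for illustration, and `hidden_errors` is a hypothetical helper.

```python
def hidden_errors(layer, next_layer):
    # Error for neuron j in `layer`: sum over next-layer neurons k of weights_k[j] * delta_k.
    errors = []
    for j in range(len(layer)):
        error = 0.0
        for neuron in next_layer:
            error += neuron['weights'][j] * neuron['delta']
        errors.append(error)
    return errors

hidden = [{'output': 0.6}, {'output': 0.3}]
# Each output neuron: one weight per hidden neuron, then the bias (never indexed here).
output_layer = [{'weights': [0.5, -1.0, 0.2], 'delta': 0.1},
                {'weights': [2.0, 0.25, -0.3], 'delta': -0.2}]
errors = hidden_errors(hidden, output_layer)
print(errors)
```

So every next-layer delta does reach all of its incoming weights; the loops are simply organised to collect, for each hidden neuron, the contributions flowing back along its own outgoing connections.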

  48. Josue May 29, 2017 at 3:12 am

    Why don’t you split the data into TrainData and TestData, like 80% of the dataset for training and 20% for testing? If you train with 100% of the rows of the dataset and then test on some rows of the same dataset, the accuracy will be good. But if you add new data to seeds.csv, the model will work with less accuracy, right?
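For reference, a minimal 80/20 holdout split might look like the sketch below. The Wheat Seeds example evaluates with k-fold cross-validation, which already scores each fold with a model that was not trained on it; a single holdout split is the simpler alternative being asked about. `holdout_split` is a hypothetical name.

```python
from random import seed, shuffle

def holdout_split(dataset, train_fraction=0.8):
    # Shuffle a copy of the rows, then cut it into train and test portions.
    rows = list(dataset)
    shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

seed(1)
dataset = [[i, i % 2] for i in range(10)]  # toy rows: one feature, then the class
train, test = holdout_split(dataset)
print(len(train), len(test))
```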

  49. Josue May 29, 2017 at 11:08 am

    Thanks for the post! I have a question about cross-validation. The seeds dataset is perfect for 5 folds, but what about a dataset of 211 rows? Will I still get uniformly sized subsets (211/5)? Can you give me a suggestion on how I could handle that?
    Thanks in advance.

    • Jason Brownlee June 2, 2017 at 12:20 pm

      One way is that some records can be discarded to give even sized groups.
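Besides discarding rows, another option is to let fold sizes differ by at most one row. A sketch, assuming the rows are shuffled first; `uneven_folds` is a hypothetical helper, not the tutorial's `cross_validation_split`.

```python
from random import seed, shuffle

def uneven_folds(dataset, n_folds):
    # Deal shuffled rows round-robin so fold sizes differ by at most one.
    rows = list(dataset)
    shuffle(rows)
    return [rows[i::n_folds] for i in range(n_folds)]

seed(1)
folds = uneven_folds(list(range(211)), 5)
print([len(fold) for fold in folds])  # [43, 42, 42, 42, 42]
```

Every row is used exactly once as test data; the slightly larger first fold is usually harmless.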

  50. Sebastián May 30, 2017 at 9:35 am

    Thanks so much for the tutorial. It was really helpful!

  51. Manu June 10, 2017 at 9:00 pm

    Hello Jason,

    Any advice on how to handle multi-class classification problems when the classes have high cardinality?
    I’m thinking about search engine input data linked to chosen URLs.

    • Jason Brownlee June 11, 2017 at 8:25 am

      Ouch, consider modeling it as regression instead (e.g. a rating or recommender system).

      • Manuel June 13, 2017 at 1:17 am

        OK, thank you very much Jason.
        But it won’t work with searches unseen by the algorithm.
        I read something in the book “Programming Collective Intelligence” about a neural net from scratch for this kind of problem, but I don’t understand how it works for the moment…

        • Jason Brownlee June 13, 2017 at 8:23 am

          Consider focusing on one measure/metric that really matters in your domain, then try a suite of framings of the problem and different algorithms to get a feeling for what might work best.

  52. Yash June 18, 2017 at 6:21 pm

    I am not able to understand the above code, so I request that you explain it to me.

  53. Tathagat June 21, 2017 at 3:20 pm

    Hey Jason, I am a novice in machine learning and have a small question: how can I track the time steps involved in the algorithm in accordance with the code?

  54. bazooka June 29, 2017 at 6:52 am

    Hi Jason. I am so confused: in the result, why are there 4 sets of [output, weights, delta],

    like this:
    [{'output': 0.9999930495852168, 'weights': [0.9315463130784808, 1.0639526745114607, 0.9274685127907779], 'delta': -4.508489650980804e-09}, {'output': 0.9992087809233077, 'weights': [-2.4595353900551125, 5.153506472345162, -0.5778256160239431], 'delta': 1.940550145482836e-06}]
    [{'output': 0.01193860966265472, 'weights': [2.3512725698865053, -8.719060612965613, 1.944330467290268], 'delta': -0.0001408287858584854}, {'output': 0.988067899681387, 'weights': [-2.2568526798573116, 8.720113230271012, -2.0392501730513253], 'delta': 0.0001406761850156443}]

    After backpropagation we find the optimal weights to get the minimum error, so what do these 4 groups mean?

    • Jason Brownlee June 29, 2017 at 7:48 am

      That is the internal state of the whole trained network.

  55. hassan June 29, 2017 at 7:30 am

    Hi Jason,
    thanks for your code and the good description here, I like it very much.
    I ran your example code and encountered the same error as others who left notes here.
    The error is:
    expected[row[-1]] = 1
    IndexError: list assignment index out of range

    How can I fix this error?

    • Jason Brownlee June 29, 2017 at 7:49 am

      The code was written for Python 2.7, confirm that this is your Python version.

      Also confirm that you have copied the code exactly.
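One common cause of this `IndexError` is class values that are not the integers 0 to n_outputs - 1 (the Wheat Seeds classes are labelled 1, 2 and 3, for example, and may arrive as strings or floats). A minimal sketch of mapping labels to 0-based indices before training; it is similar in spirit to the tutorial's `str_column_to_int`, but `labels_to_indices` and the toy rows are illustrative assumptions.

```python
def labels_to_indices(dataset):
    # Replace the class value in the last column with a 0-based integer index.
    classes = sorted(set(row[-1] for row in dataset))
    lookup = {value: index for index, value in enumerate(classes)}
    for row in dataset:
        row[-1] = lookup[row[-1]]
    return lookup

dataset = [[0.1, 0.2, '1'], [0.3, 0.4, '2'], [0.5, 0.6, '3']]  # toy rows
lookup = labels_to_indices(dataset)
n_outputs = len(lookup)
expected = [0 for _ in range(n_outputs)]
expected[dataset[2][-1]] = 1  # label '3' is now index 2, so no IndexError
print(lookup, expected)
```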

  56. Jerome July 5, 2017 at 9:20 pm

    Dear Jason,

    I have a question about Back Propagate Error:

    1- derivative sigmoid = output * (1.0 - output)
    That is OK.

    2- error = (expected - output) * transfer_derivative(output)
    OK, but it also means that error == 0 for output = 1, whatever the expected value is, because transfer_derivative(1) == 0.

    So, whatever the expected value, the error is nil if the output is 1…
    Is there something rotten here?

    Thanks

    Jerome
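The observation holds for a squared-error loss with a sigmoid output: as the output saturates toward 0 or 1, output * (1 - output) approaches zero, so the delta nearly vanishes even when the prediction is badly wrong. This is the well-known saturation (vanishing gradient) effect, not an arithmetic bug. A small numeric illustration with toy values, where the target is 0 and the output drifts toward 1:

```python
def transfer_derivative(output):
    # Sigmoid derivative in terms of the output.
    return output * (1.0 - output)

expected = 0.0
deltas = []
for output in [0.5, 0.9, 0.99, 0.9999]:
    delta = (expected - output) * transfer_derivative(output)
    deltas.append(delta)
    print(f"output={output}: delta={delta:.6f}")
```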

  57. wddddds July 10, 2017 at 10:01 pm

    Thank you Jason, It’s a great tutorial and really helpful for me!

    But I have to say that trying to reimplement your code strongly improved my debugging skills 🙂

  58. Victor July 17, 2017 at 7:50 pm

    Hi Jason,

    Thanks for sharing your code. I’m a PhD candidate in machine learning, and I have a doubt about the weight update in section 4.1:

    weight = weight + learning_rate * error * input

    Shouldn’t it be as follows?

    weight = weight - learning_rate * error * input

    Thanks again for sharing this.

    Regards,
    Victor.

    • Víctor August 4, 2017 at 11:07 pm

      Never mind, my mistake in understanding.

      Thanks again for sharing your work.
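Both forms are correct as long as the sign of the error matches the sign of the update. The code discussed in these comments uses error = expected - output with `+`; the Oct/2021 update to this post reversed the sign of the error, which pairs naturally with `-` (the usual gradient-descent form). A sketch with toy values showing both conventions produce the same new weight:

```python
from math import isclose

learning_rate, weight, input_value = 0.5, 0.3, 0.8
output, expected = 0.7, 1.0
derivative = output * (1.0 - output)  # sigmoid derivative from the output

# Convention A: error = expected - output, then add.
delta_a = (expected - output) * derivative
weight_a = weight + learning_rate * delta_a * input_value

# Convention B: error = output - expected, then subtract (gradient descent form).
delta_b = (output - expected) * derivative
weight_b = weight - learning_rate * delta_b * input_value

print(weight_a, weight_b, isclose(weight_a, weight_b))
```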

  59. vishnu priya July 22, 2017 at 4:26 pm

    Hi,
    Thanks for your code. It was very helpful. Can you suggest how to use this code for classifying Tamil characters? I have tried a CNN and now I need to compare the result with BPN. Can you please suggest something?

    Thank you

  60. vishnu priya July 23, 2017 at 4:06 pm

    Thank you, sir. With this tutorial I have implemented a CNN, but for BPN I am getting an error rate of 687.203. I don’t know what to do. Can you help me, sir?

    Thank you

  61. Vishnupriya July 24, 2017 at 4:53 pm

    Classification of Tamil characters, sir. I have 144 different classes. I have extracted 7 GLCM features for each character, and I need to train on these features with backpropagation and predict which class each character belongs to.

  62. codeo July 26, 2017 at 5:37 pm

    Hi, so I wasn’t following this tutorial when implementing my neural network from scratch, and mine is in JavaScript. I just need help with the theory. How do I calculate the error for each node in the net so that I can incrementally change the weights? Great tutorial btw

    • codeo July 26, 2017 at 6:38 pm

      Hahaha never mind, it was my code.
      Multidimensional arrays and stuff boggle the mind, hah.

  63. PRABHAKARAN M July 31, 2017 at 4:31 pm

    [ 6.38491205 5.333345 4.81565798 5.43552204 9.96445304 2.57268919 4.07671018 1.5258789 6.19728301 0 1 ]
    Dear sir,
    the above numerical values are extracted from a dental X-ray image using the gray-level co-occurrence matrix [10 inputs and 1 output]. This dataset is used as the input for a BPN classifier. Can the same dataset, as a .csv file, be used as the input for a deep convolutional neural network? And can I get the output as an image? For example, if I give the dental X-ray images as numerical values, I need to get the caries-affected teeth as the output for the given dataset.

  64. PRABHAKARAN M July 31, 2017 at 4:32 pm

    Can I get example code for dental caries detection using a deep convolutional neural network, with the given dataset as X-ray images?

    • Jason Brownlee August 1, 2017 at 7:52 am

      I do not have sample code for this problem, sorry.

  65. John August 1, 2017 at 3:26 am

    Very nice explanation, thank you.
    I have some questions.

    1) weight = weight + learning_rate * error * input

    Do I really need to multiply it by the input? For example, here http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html they don’t multiply it by the input. At least, I think so…

    2) Is your method the same as in http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html?
    I think yes, but again, I’m not sure and I’m confused by that input multiplication.

    3) What exactly is the loss function in your example? (In other explanations I usually found derivatives of the loss (cost?) function, not of the transfer function.) I’m actually very confused by the notation I find around…

    4) Momentum and weight decay: in your example, can you implement them by subtracting the calculated decay and adding the calculated momentum (to the weight update)? Again, I found forms which subtract both and weight update as w + deltaW, so again I’m mega confused by the notation for backpropagation that I found…

    Sorry for the dumb questions… math is not my strong side, so many things that can be inferred through mathematical intuition are simply hidden from me.

    • John August 1, 2017 at 3:30 am

      *subtract both and weight update as w + deltaW, so again

      I found the above sentence to be nonsense; it must be a side effect of my confusion…

      • Jason Brownlee August 1, 2017 at 8:12 am

        Hang in there.

        Pick one tutorial and focus on it. Jumping from place to place will make things worse for sure.

    • Jason Brownlee August 1, 2017 at 8:10 am

      Hi John, good questions.

      According to my textbook, yes.
      I can’t speak for random sites on the internet, sorry.

      Loss is prediction error. You can change this to other forms like MAE or MSE.

      No decay or momentum in this example. Easy to add if you want. There are many ways to dial in the learning process. No hard and fast rules, just some norms that people reuse.
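On question 1: the input appears because the activation is a weighted sum, so the partial derivative of the activation with respect to weight i is input i; the bias behaves like a weight on a constant input of 1. On question 4, here is one common way to add momentum, sketched for a single neuron. The `momentum` parameter, the stored `'velocity'` key and the helper name are hypothetical additions, not part of the tutorial's network dictionaries.

```python
def update_weights_with_momentum(neuron, inputs, l_rate, momentum):
    # delta is assumed to follow the (expected - output) sign convention, hence "+=".
    velocity = neuron.setdefault('velocity', [0.0] * len(neuron['weights']))
    extended = list(inputs) + [1.0]  # the bias weight sees a constant input of 1
    for i, x in enumerate(extended):
        step = l_rate * neuron['delta'] * x  # gradient step: delta times the input on this weight
        velocity[i] = momentum * velocity[i] + step  # blend in the previous step
        neuron['weights'][i] += velocity[i]

neuron = {'weights': [0.1, -0.2, 0.05], 'delta': 0.5}
update_weights_with_momentum(neuron, [1.0, 2.0], l_rate=0.1, momentum=0.9)
after_first = list(neuron['weights'])
update_weights_with_momentum(neuron, [1.0, 2.0], l_rate=0.1, momentum=0.9)
print(after_first, neuron['weights'])
```

On the second call the same gradient produces a larger step, because the previous step is carried forward by the momentum term.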

  66. Parminder Kaur August 6, 2017 at 7:50 pm

    A very good tutorial, sir…
    Sir, I am implementing remotely sensed image classification using a BPN neural network in IDL.
    I am not finding good resources on constructing features for the input dataset, or on choosing the number of hidden layers and the number of neurons in each hidden layer.
    Do you know any resources that can help me?

    Thanks

    • Jason Brownlee August 7, 2017 at 8:41 am

      The CNN will perform feature extraction automatically, you could explore using different filters on the data to see if it helps the network.

      The number of layers and neurons/filters per layer must be found using trial and error. It is common to copy the designs from other papers as a starting point.

      I hope that helps.

  67. pero August 9, 2017 at 1:11 am

    Nice tutorial, very clean and readable code. =) thank you!

  68. Vatandas August 15, 2017 at 3:28 am

    1. I expected this code to be deep learning (many hidden layers), but it is not. Saying it in one sentence is easy (“you can add more hidden layers as explained”), but doing it is not as easy as you said.

    2. I think your code is wrong.
    neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
    but
    Error = Target - ActivatedOutputNode
    Delta = Error * Derivative(NONActivatedOutputNode)

    I mean you use the same ‘output’ variable for both the error and the delta. But in the error it must be the activated one, and in the delta it must be the non-activated one.

    • A Researcher May 2, 2019 at 3:10 am

      Exactly, this article is completely misleading :S

  69. 8CG_256 August 18, 2017 at 9:02 am

    Nice tutorial, very clean code and beginner-friendly. Thank you very much!

    • Jason Brownlee August 18, 2017 at 4:36 pm

      Thanks, I’m glad you found it useful!

    • 8CG_256 August 18, 2017 at 9:26 pm

      I only have one slight issue: I implemented this in Ruby and tried to train it using the Iris dataset, keeping the network simple (1 input layer, 1 hidden layer, 1 output layer), and after decreasing for a while, the error rate keeps increasing. I tried lowering the learning rate, even making it dynamic so it decreases whenever the error increases, but it doesn’t seem to help. Could you give me some advice? P.S. Sorry for my bad English.

  70. Derek Martins August 22, 2017 at 9:22 pm

    Hi Jason, I enjoy your tutorials so much. Can you do a tutorial implementing Backpropagation Through Time? Thanks, man.

  71. Anubhav Singh August 24, 2017 at 1:08 pm

    Hello Jason,

    Thank you for the great tutorial!

    I would like to know how I can obtain the weight*input for every single neuron in the network…

    I’ve been trying these lines –

        for layer in network:
            new_inputs = []
            for neuron in layer:
                activation = activate(neuron['weights'], inputs)
                neuron['output'] = transfer(activation)
                new_inputs.append(neuron['output'])

    but the activation variable here is a single value… What I understand is that if I have set n_hidden = 5 (number of neurons in the hidden layer), I should get N*5 (N = number of features in the dataset) outputs if I print the activation…

    Kindly help 🙂

    Thank you!
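`activate()` collapses the products into one sum per neuron, which is why only a single value is printed for each neuron. To see every weight*input term, the products can be collected before summing. A sketch; `weighted_terms` is a hypothetical helper, and the toy layer has 2 inputs and 3 hidden neurons, so it yields 2 * 3 = 6 products.

```python
def weighted_terms(weights, inputs):
    # One product per input, excluding the bias (the last weight).
    return [w * x for w, x in zip(weights[:-1], inputs)]

hidden_layer = [{'weights': [0.1, 0.2, 0.0]},
                {'weights': [0.3, -0.1, 0.5]},
                {'weights': [-0.2, 0.4, 0.1]}]
row = [1.0, 2.0]
all_terms = []
for n, neuron in enumerate(hidden_layer):
    terms = weighted_terms(neuron['weights'], row)
    activation = sum(terms) + neuron['weights'][-1]  # what activate() would return
    all_terms.append(terms)
    print(n, terms, activation)
```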

  72. Jose Panakkel August 25, 2017 at 10:45 am

    Dear Jason,

    I have a question on the delta calculation at the output layer, where
    the primary value is the difference between the neuron output and
    the expected output. We then multiply this difference
    by the transfer_derivative, where transfer_derivative is a function
    of the neuron’s output.

    My question is: is it correct to find the difference between the
    neuron’s output and the expected output?

    In this example, you have chosen binary outputs [0,1],
    so it may not have come up, but my point is that
    one is already subjected to a transfer function and one is not.

    The neuron’s output is always subjected to a transfer function and
    hence will be in a specific range, say -0.5 to +0.5 or something.
    But the expected output is the user’s choice, isn’t it?
    A user can have an expected value of, say, 488.34 when learning
    a stock price. Is it then still correct to find this primary difference
    between the expected output and the neuron output in the output
    layer delta calculation?

    Shouldn’t the expected output also be subjected to the same transfer
    function before finding the difference? Or the other way round:
    shouldn’t the neuron output be subjected to an inverse transfer function
    before comparing it with the expected output directly?

    Thanks and Regards,
    Jose Panakkel
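One common answer, consistent with the fix Liam reports further down this thread, is to rescale the targets into the sigmoid's (0, 1) range before training and to invert that scaling on the network's outputs. A sketch with min-max scaling of a hypothetical price column; the bounds should come from the training data, and the helper names are illustrative.

```python
def scale_targets(targets):
    # Map targets into [0, 1]; keep the bounds so the mapping can be undone later.
    lo, hi = min(targets), max(targets)
    return [(t - lo) / (hi - lo) for t in targets], (lo, hi)

def unscale(value, bounds):
    # Convert a network output in [0, 1] back to the original units.
    lo, hi = bounds
    return lo + value * (hi - lo)

prices = [450.0, 488.34, 512.5]  # illustrative values only
scaled, bounds = scale_targets(prices)
print(scaled)
print(unscale(scaled[1], bounds))
```

Since a sigmoid never quite reaches 0 or 1, some people scale into a narrower band such as [0.1, 0.9]; that is a design choice rather than a requirement. The alternative is a linear output unit, which needs no target scaling for range reasons.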

  73. RealUser404 September 6, 2017 at 1:36 pm

    Hello Jason, great tutorial that helped me a lot!

    I have a question concerning the back-propagation : what if instead of having an error function I only have a desired gradient for the output (in the case of an actor-critic model for example)?
    How can I change your backprop function to make it work? Or can I just use the gradient as the error?

    • Jason Brownlee September 7, 2017 at 12:49 pm

      Sorry, I don’t follow, perhaps you can restate your question with an example?

  74. user28 September 8, 2017 at 9:26 pm

    Hi Jason, thank you for providing this tutorial. I’m confused about how I can implement the same backpropagation algorithm with a non-binary output, since I noticed that your example has a binary output; for example, predicting a stock price given the open, high, low and close values. Regards.

  75. Lewis September 11, 2017 at 2:11 am

    Hi Jason,

    Great article. I have an interest in NNs but I am not that good at Python.

    What I wanted to try was to withhold, say, 5 rows from the dataset and have the trained network predict the results for those rows. This is different from what I think the example does, which is rolling predictions with the learning. Removing 5 rows from the dataset is of course easy, but my pitiful attempts at predicting with unseen data like below fail (I guess network is not in scope at the end). Any help appreciated!

    # predict unseen data
    unseendataset = [[12.37,13.47,0.8567,5.204,2.96,3.919,5.001],
                     [12.19,13.2,0.8783,5.137,2.981,3.631,4.87],
                     [11.23,12.88,0.8511,5.14,2.795,4.325,5.003],
                     [13.2,13.66,0.8883,5.236,3.232,8.315,5.056],
                     [11.84,13.21,0.8521,5.175,2.836,3.598,5.044],
                     [12.3,13.34,0.8684,5.243,2.974,5.637,5.063]]

    for row in unseendataset:
        prediction2 = predict(network, row)
        print('Predicted=%d' % (prediction2))
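The network has to exist in the same scope where `predict()` is called (for example, return it from the training function), and unseen rows must be normalized with the same min/max values used during training. Rows without a class column are fine, because `activate()` only reads as many inputs as there are non-bias weights. A self-contained sketch with a compact forward pass (it does not cache outputs the way the tutorial does) and a hand-set toy network standing in for a trained one:

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def forward_propagate(network, row):
    # Compact sigmoid forward pass, layer by layer.
    inputs = row
    for layer in network:
        inputs = [1.0 / (1.0 + exp(-activate(n['weights'], inputs))) for n in layer]
    return inputs

def predict(network, row):
    # Index of the output neuron with the largest value = predicted class.
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Toy stand-in for a trained network: 2 inputs -> 2 hidden -> 2 outputs.
network = [[{'weights': [2.0, -2.0, 0.0]}, {'weights': [-2.0, 2.0, 0.0]}],
           [{'weights': [3.0, -3.0, 0.0]}, {'weights': [-3.0, 3.0, 0.0]}]]
unseen = [[1.0, 0.0], [0.0, 1.0]]  # no class column needed
for row in unseen:
    print('Predicted=%d' % predict(network, row))
```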

  76. Karim September 14, 2017 at 1:27 pm

    Hi Jason, I am trying to generalize your implementation to work with a variable number of layers and nodes. However, whenever I increase the number of nodes too much, it stops working (the network freezes at one error rate and all output nodes are active, i.e. giving 1). The code does work if I decrease the layers, and the errors go down.
    Is there something I am missing when using too many layers? The concepts should be the same.

    I trained a network with 4 layers: [14,10,10,4] and it worked.
    I trained a network with 4 layers [14,100,40,4] and it is stuck. Same dataset.

    My code is here if you are looking in more details:
    https://github.com/KariMagdy/Implementing-a-neural-network

    Thanks
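One likely factor, offered as a hypothesis rather than a diagnosis of that repository: weights drawn from `random()` are all positive in [0, 1), so when 100 neurons feed the next layer the weighted sum is large, the sigmoid saturates near 1, and output * (1 - output) becomes tiny, which stalls learning. The sketch compares the average output of one layer under two initializations; the zero-centered scheme scaled by 1/sqrt(fan-in) is one common alternative, not what the tutorial uses, and the helper names are hypothetical.

```python
from math import exp, sqrt
from random import random, seed, uniform

def positive_init(fan_in):
    # Every weight in [0, 1), like the tutorial's random() initialization.
    return random()

def centered_init(fan_in):
    # Zero-centered, shrinking with the number of inputs to the neuron.
    return uniform(-1.0, 1.0) / sqrt(fan_in)

def mean_neuron_output(n_inputs, n_neurons, init):
    # Average sigmoid output of one layer when every input is 0.5.
    inputs = [0.5] * n_inputs
    total = 0.0
    for _ in range(n_neurons):
        weights = [init(n_inputs) for _ in range(n_inputs + 1)]  # last one is the bias
        activation = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        total += 1.0 / (1.0 + exp(-activation))
    return total / n_neurons

seed(1)
saturated = mean_neuron_output(100, 40, positive_init)
balanced = mean_neuron_output(100, 40, centered_init)
print(saturated, balanced)
```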

  77. Laksh October 4, 2017 at 11:11 pm

    Hi, Jason Brownlee,
    Can we extend this code to 2 or more hidden layers?

  78. dsliver33 October 9, 2017 at 1:52 pm

    Dear Mr. Brownlee,

    I’m trying to alter the code to represent a regression problem (sigmoid on hidden layer, linear on output layer). As far as I know, the main part of the code that would have to be modified is the FF algorithm. I’ve rewritten the code as below:

    With this code, I’m getting an “OverflowError: (34, ‘Result too large’)” error. Could you please tell what I’m doing wrong? All the other parts of the code are as you’ve written.
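For context: Python raises `OverflowError` both when `math.exp` gets too large an argument and when a float `**` result is too large, and the errno-style message above points to the latter. Either way the usual root cause is values growing without bound, for example unscaled regression targets combined with a linear output; rescaling the targets is the main fix. Separately, the sigmoid itself can be written so that `exp` never overflows. A sketch; `stable_sigmoid` is an illustrative name.

```python
from math import exp

def stable_sigmoid(activation):
    # Only ever exponentiate a non-positive number, so exp() cannot overflow.
    if activation >= 0:
        return 1.0 / (1.0 + exp(-activation))
    z = exp(activation)
    return z / (1.0 + z)

print(stable_sigmoid(0.0), stable_sigmoid(-1000.0), stable_sigmoid(1000.0))
```

The naive form `1.0 / (1.0 + exp(-activation))` would raise `OverflowError` for an activation of -1000, because it has to compute `exp(1000)`.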

    • Jason Brownlee October 9, 2017 at 4:47 pm

      What did you change exactly? Can you highlight the change for me?

      Also, try using pre tags.

      • dsliver33 October 10, 2017 at 4:08 am

        (I don’t know how to highlight the change, sorry!)

        I got the hidden layer (network[0]), and I applied your algorithm (calculate activation, transfer the activation to the output, append that to a new list called “new_inputs”).

        After that, I get the output layer (network[-1]), I calculate the activation with the “new_inputs”, but I do NOT apply the sigmoid transfer function (so, the outputs should be linear). The results are appended to a new list, which is set to be the return of the function.

        Would that be the best way to remove the sigmoid function from the output layer, making the code do regression instead of classification?

    • Liam McGoldrick October 26, 2017 at 5:23 am

      I am having the same issue with mine. I made alterations and they are just the same as yours. Did you find a solution?

    • Liam McGoldrick October 26, 2017 at 5:27 am

      I GOT IT TO WORK!!! You have to normalize your output data. Then you can apply the transfer function to the output layer just the same! After that it will work!

      • Steven August 20, 2019 at 8:40 pm

        But didn’t you change the function ‘train_network’???

      • Urvi Deole March 12, 2021 at 2:21 am

        Could you please list the functions you changed to get the code to work for regression?
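Liam's exact changes are not shown here, but a single-output regression variant typically differs from the classifier in two places: the target is the raw (scaled) value, e.g. `[row[-1]]`, instead of a one-hot vector, and with a linear output unit the derivative is 1. If the sigmoid is kept on the output (as in Liam's fix), the derivative term stays and only the target handling and scaling change. A sketch; `output_deltas_regression` is a hypothetical helper that follows the thread's `expected - output` sign convention.

```python
def output_deltas_regression(output_layer, expected, linear_output=True):
    # expected holds one target per output neuron, e.g. [row[-1]],
    # instead of the one-hot vector built for classification.
    for neuron, target in zip(output_layer, expected):
        error = target - neuron['output']
        # Linear output: derivative is 1. Sigmoid output: output * (1 - output).
        derivative = 1.0 if linear_output else neuron['output'] * (1.0 - neuron['output'])
        neuron['delta'] = error * derivative

row = [0.2, 0.7, 0.35]  # two inputs, then a scaled regression target
output_layer = [{'output': 0.5}]
output_deltas_regression(output_layer, [row[-1]])
linear_delta = output_layer[0]['delta']
output_deltas_regression(output_layer, [row[-1]], linear_output=False)
sigmoid_delta = output_layer[0]['delta']
print(linear_delta, sigmoid_delta)
```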

  79. Chris October 12, 2017 at 11:27 am

    Hi Jason, nice post, it really helps a lot.
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
    Should neuron['output'] be the output of the activation function instead of the transfer function here?

  80. Asad October 14, 2017 at 3:24 pm

    Hi Jason, nice post, it really helps a lot.

    Please tell me how we can change the number of neurons in the hidden layer and in the output layer.
    And what will be the result when we change the neurons in the hidden layer and in the output layer?
    In this tutorial you use one hidden layer, so can we use more than one hidden layer? And how?

    Please tell me, I am waiting.

  81. dsliver33 October 16, 2017 at 2:27 pm

    Dear Mr. Brownlee,

    I’m trying to adapt the code to support many hidden layers. I’ve adapted the code as below, with a new input called “n_layers”, to insert N hidden layers in the network.

    # Initialize a network with "n_layers" hidden layers
    def initialize_network3(n_inputs, n_hidden, n_layers, n_outputs):
        network = list()
        for i in range(n_layers):
            hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
            network.append(hidden_layer)
        output_layer = [{'weights':[random() for i in range(n_hidden)]} for i in range(n_outputs)]
        network.append(output_layer)
        return network

    When I try to run the code, it shows the error below. Do you have any idea why?

    in backward_propagate_error(network, expected)
         78 error = 0.0
         79 for neuron in network[i + 1]:
    ---> 80 error += (neuron['weights'][j] * neuron['delta'])
         81 errors.append(error)
         82 else:

    IndexError: list index out of range

    • dna_remaps February 3, 2018 at 10:43 pm

      This took me a minute to figure out myself.

      You need to add a conditional after your first layer to make sure your subsequent hidden layer weights have the proper dimensions (n_hidden+1, n_hidden):

      for i in range(n_layers):
          hidden_layer = [{'weights':[random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)]
          if i > 0:
              hidden_layer = [{'weights':[random() for _ in range(n_hidden + 1)]} for _ in range(n_hidden)]
          network.append(hidden_layer)
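Putting that fix together, here is a sketch of a multi-hidden-layer initializer. The first hidden layer gets n_inputs + 1 weights per neuron, every later hidden layer gets n_hidden + 1 (one per neuron in the layer below, plus a bias), and the output layer also needs n_hidden + 1, matching the tutorial's single-hidden-layer `initialize_network`. `initialize_deep_network` is a hypothetical name.

```python
from random import random, seed

def initialize_deep_network(n_inputs, n_hidden, n_layers, n_outputs):
    network = []
    n_prev = n_inputs
    for _ in range(n_layers):
        # One weight per neuron in the previous layer, plus a bias.
        network.append([{'weights': [random() for _ in range(n_prev + 1)]}
                        for _ in range(n_hidden)])
        n_prev = n_hidden
    network.append([{'weights': [random() for _ in range(n_prev + 1)]}
                    for _ in range(n_outputs)])
    return network

seed(1)
network = initialize_deep_network(n_inputs=2, n_hidden=4, n_layers=3, n_outputs=2)
print([[len(neuron['weights']) for neuron in layer] for layer in network])
```

With these shapes, `neuron['weights'][j]` in `backward_propagate_error` always exists, because every neuron in layer i + 1 has one weight per neuron in layer i.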

  82. Arijit Mukherjee October 17, 2017 at 1:40 am

    Hi,

    In the output/last layer, when we are calculating the backprop error, why are we multiplying the transfer derivative by (expected - output)? The transfer derivative is already cancelled out for the last layer, so the update should be only (expected - output) * previous_layer_input, shouldn’t it?
    Thanks

  83. Tanoh Henry October 18, 2017 at 8:54 pm

    Really good article. Thanks a lot.
    Need a little bit of clarification.
    For backward propagation starting at the output layer,
    you get the error by appending expected[j] - neuron['output'] to errors.
    Isn’t Error = 0.5 * sum(errors)?
    And then use this sum of errors for backpropagation?
    Thanks.
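Both of the last two questions come down to which loss is assumed. With the squared error E = 0.5 * sum((expected - output)^2) (the square matters; summing raw errors would let positive and negative errors cancel), dE/d(output_j) = -(expected_j - output_j), and the chain rule through the sigmoid adds the factor output_j * (1 - output_j). That factor cancels only when a cross-entropy loss is paired with the sigmoid, in which case the delta becomes simply (expected - output). A numerical check of both with toy values; the function names are illustrative.

```python
from math import exp, log

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))

def squared_error(a, target):
    return 0.5 * (target - sigmoid(a)) ** 2

def cross_entropy(a, target):
    o = sigmoid(a)
    return -(target * log(o) + (1 - target) * log(1 - o))

def numeric_grad(loss, a, target, h=1e-6):
    # Central difference of the loss with respect to the activation a.
    return (loss(a + h, target) - loss(a - h, target)) / (2 * h)

a, target = 0.3, 1.0
o = sigmoid(a)
delta_squared = (target - o) * o * (1.0 - o)  # output delta as computed in the comments above
delta_cross_entropy = target - o              # no derivative factor
# Each delta equals the negative gradient dE/da of its loss.
grad_sq = -numeric_grad(squared_error, a, target)
grad_ce = -numeric_grad(cross_entropy, a, target)
print(delta_squared, grad_sq)
print(delta_cross_entropy, grad_ce)
```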

  84. Liam October 21, 2017 at 5:41 am

    Thanks for the tutorial! I am trying to modify your code to build a regression model and I am stuck. I have an input dataset (4 columns and many rows) and a single-variable output dataset (in the range of tens of thousands). I fed them into the train procedure and I get an error when it reaches “expected = [0 for i in range(n_outputs)]” in the train portion. The error reads “only length-1 arrays can be converted to Python scalar”. I understand this is because the code was intended for a classification problem, but I am wondering what I would need to modify to get it to work. Any help would go a long way, as I have been stuck on this issue for some time now.

    Thanks, and again wonderful tutorial!

  85. Sam October 26, 2017 at 11:15 pm

    Hi
    I am implementing a 2-layer neural network using your code, with 100 hidden units in the first layer and 50
    in the next and a sigmoid activation function in each layer, and I am training/testing the
    model on a subset of the MNIST dataset.
    But it always gives the same prediction.
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=1, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=1, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=1, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=1, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=1, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1
    [0.99999999986772, 0.99999999994584]
    Expected=0, Got=1

  86. John October 28, 2017 at 6:52 pm

    Help, I don’t know why I got this error.

    Traceback (most recent call last):
      File "a.py", line 185, in <module>
        for i in range(len(dataset[0])-1):
    TypeError: 'NoneType' object has no attribute '__getitem__'

    • Jason Brownlee October 29, 2017 at 5:52 am

      You cannot have the “-1” within the call to len()

  87. João Costa November 12, 2017 at 12:46 pm

    Hey Jason, thanks for your post!

    This is helping me a lot with college work. But in this NN, how can I manually set not the number of input neurons, but the input values?

    For example, if I have 1 input neuron, I want to set its value to 0.485.

    Best regards!

  88. yesta November 17, 2017 at 8:21 am

    Hi, Jason
    Thank you for this amazing tutorial!

    I have a question that may be off topic. What do you call models, or types of DL models, where you feed the model new test data in order to make it adaptive to the environment?

    Thank you.

    • Jason Brownlee November 17, 2017 at 9:31 am

      Yes, you can update a model after it has been trained.

  89. Nil December 6, 2017 at 7:47 pm

    Hi, Dr. Jason,

    I have been studying how to develop a neural network from scratch and this tutorial is the main one I have been following because it is helping me so much.
    I have a doubt: when I study the theory, I see the neural network scheme carrying only the weights and bias. But here in practice I see that the network is also carrying the output values and the delta, i.e. (weights, bias, output and delta). Will the final model be saved like this, with all of (weights, bias, output and delta)? Would this be the rule in practice?

    I would appreciate it if you could help with this issue so that I can continue from where I left off.

    Your posts are really very good; they are where I find my way into learning Machine Learning.

    Best Regards

    • Jason Brownlee December 7, 2017 at 7:51 am

      The final model (e.g. trained) only needs to perform the forward pass.

      • Nil December 8, 2017 at 5:50 am

        Understood.
        Thank you Dr. Jason
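As a sketch of that point: `'output'` and `'delta'` are scratch values that are overwritten on every forward and backward pass, so only the weights (which already include each neuron's bias as the last entry) need to be kept. One way, using the standard library's `json` module; the helper names are illustrative, and the string could equally be written to a file.

```python
import json

def export_weights(network):
    # Keep only the learned weights; drop per-pass scratch values like 'output' and 'delta'.
    return json.dumps([[neuron['weights'] for neuron in layer] for layer in network])

def import_weights(text):
    # Rebuild the list-of-layers / list-of-neuron-dicts structure.
    return [[{'weights': weights} for weights in layer] for layer in json.loads(text)]

trained = [[{'weights': [0.1, 0.2, 0.3], 'output': 0.6, 'delta': -0.01}],
           [{'weights': [0.4, 0.5], 'output': 0.7, 'delta': 0.02}]]
text = export_weights(trained)
restored = import_weights(text)
print(text)
print(restored)
```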

  90. MohamedElshazly December 8, 2017 at 9:30 pm

    Hi, there’s something I don’t understand:

    Wouldn’t the last line be out of range, because the current i is the last one and it can’t go one beyond it? Thanks in advance.

    • Jason Brownlee December 9, 2017 at 5:41 am

      No, because of the “if” check on the 4th line down.

  91. Olu December 9, 2017 at 3:17 am

    Hi Mr Brownlee,

    Thank you for your tutorial. The training for the example worked however when I try to implement the code for the Wheat Seeds Dataset I get an error from my line 210:

        for i in range(len(dataset[0]) - 1):
            str_column_to_float(dataset, i)

    The error is: IndexError: list index out of range

    Can you please explain why it is dataset[0]? Does dataset[0] mean the 1st column of the dataset?
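`dataset[0]` is the first row, not the first column, so `len(dataset[0]) - 1` is the number of input columns. An `IndexError` on that line suggests `dataset` is empty, while a `NoneType` error (as in John's traceback above) suggests the loading function returned nothing. A sketch of a loader that skips blank lines and returns the rows, using the standard `csv` module; reading from an in-memory string stands in for `open(filename)`, and `load_csv_text` is an illustrative name.

```python
import csv
import io

def load_csv_text(text):
    # Parse CSV text into a list of rows, skipping blank lines.
    dataset = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        dataset.append(row)
    return dataset  # without this return, callers would get None

text = "1.5,2.5,1\n\n3.0,4.0,1\n"
dataset = load_csv_text(text)
print(len(dataset), len(dataset[0]) - 1)  # rows, input columns
for i in range(len(dataset[0]) - 1):
    for row in dataset:
        row[i] = float(row[i].strip())
print(dataset)
```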

  92. Jonesy December 12, 2017 at 3:17 pm

    Hello Jason,

    Fantastic stuff here. I had a question about the network configuration. This 2 input, 2 hidden and 2 output seems a bit odd to me. I’m used to seeing 2, 2, 1 for XOR – can you explain why you have two output nodes and how they work with the max function? I think it would better explain this line for me in train():

    expected[row[-1]] = 1

    And lastly, why would one choose this configuration over a 2, 2, 1?

    Thanks!

  93. Mohamed Elshazly December 17, 2017 at 9:34 pm

    Hi Jason

    In the train_network function, regarding the line “expected[row[-1]] = 1”: what I understand is that you take the Y value of every row (which is either 0 or 1), use it as an index into the expected array, and change the value at that index to 1. First, I don’t know if I understand that correctly, but if so, wouldn’t the modification to the expected array be limited to only the first and second index, because “expected[row[-1]] = 1” would only be expected[0] or expected[1]? And how would that help in our algorithm?

    Looking forward to your response, and thanks for the great tutorial.
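The encoding asked about in the last two comments can be shown directly. With one output neuron per class, `expected[row[-1]] = 1` builds a one-hot target, so the index only ever goes up to n_outputs - 1 (which is why class values must be 0-based integers), and a prediction picks the output with the largest value. For two classes, a 2-2-1 network with a 0.5 threshold would also work; one output per class simply generalizes to more classes. A sketch with illustrative helper names:

```python
def one_hot(label, n_outputs):
    # Target vector with a 1 at the class index and 0 elsewhere.
    expected = [0 for _ in range(n_outputs)]
    expected[label] = 1
    return expected

def decode(outputs):
    # Predicted class = index of the largest output.
    return outputs.index(max(outputs))

print(one_hot(0, 2), one_hot(1, 2), one_hot(2, 3))
print(decode([0.12, 0.91]), decode([0.7, 0.2, 0.1]))
```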

  94. MohamedElshazly December 18, 2017 at 4:55 pm

    Hi again Jason,

    If I’m implementing this algorithm in Python 3, what should I change in expected[row[-1]] = 1 in order for it to work? I’m getting this error: list assignment index out of range
    Thanks in advance

    • Jason Brownlee December 19, 2017 at 5:15 am

      I don’t know off the cuff, I will look into porting the example to Py3 in the new year.

  95. Tushar December 19, 2017 at 5:29 pm

    You are just awesome Jason. You are adding more value to people’s ML skills than most average graduate schools do in the US.

    Thanks a ton!

  96. mark January 6, 2018 at 3:29 am

    Wow, thanks for your code. I have a question: what if I want to add a regularisation term like L2 during backpropagation? What should I do?

    • Jason Brownlee January 6, 2018 at 5:55 am

      I would recommend moving to a platform like Keras:
      https://machinelearningmastery.com/start-here/#deeplearning

      • mark January 7, 2018 at 5:12 pm

        Thanks for replying. I know Keras and have been using it for a while. But in the problem I am focusing on, I need to make changes to the backpropagation, which is why I didn’t use Keras.
        So, back to my original question: is the error term the cost function? Thanks.

        • Jason Brownlee January 8, 2018 at 5:42 am

          Sorry, I cannot work through adding regularization to this tutorial for you.
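On mark's question: in the tutorial the reported training error is the sum of squared errors, and the output-layer error term (output - expected, in the post-Oct/2021 sign convention) is the derivative of 1/2 * sum((output - expected)^2) with respect to each output, not the cost itself. Adding an L2 penalty (lambda/2) * sum(w^2) to that cost adds lambda * w to each weight's gradient, so only the weight update changes. Below is a minimal sketch, assuming the tutorial's network layout (a list of layers, each a list of neuron dicts with 'weights', 'delta' and 'output', bias stored as the last weight) and deltas computed with error = output - expected. The function name update_weights_l2 and the l2 parameter are hypothetical, and leaving the bias unpenalised is a common but not universal choice.

```python
def update_weights_l2(network, row, l_rate, l2=0.01):
    """Gradient-descent weight update with an L2 (weight decay) penalty.

    Assumes neuron['delta'] was computed from error = output - expected,
    so each data gradient is delta * input and is subtracted.
    """
    for i in range(len(network)):
        # First layer sees the raw inputs (label in the last column excluded)
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                # Data gradient plus the L2 term lambda * w
                grad = neuron['delta'] * inputs[j] + l2 * neuron['weights'][j]
                neuron['weights'][j] -= l_rate * grad
            # Bias (last weight) is updated without the penalty
            neuron['weights'][-1] -= l_rate * neuron['delta']

# Tiny example: one layer with one neuron, two inputs plus bias, zero delta
network = [[{'weights': [0.5, -0.5, 0.1], 'delta': 0.0, 'output': 0.6}]]
update_weights_l2(network, [1.0, 2.0, 0], l_rate=0.1, l2=0.1)
print(network[0][0]['weights'])  # non-bias weights shrink toward zero
```

With a zero delta the only change is the decay term, which shows the L2 effect in isolation: each non-bias weight moves slightly toward zero on every update.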

  97. Mojo January 20, 2018 at 7:04 pm

    Hello Jason,
    Thanks for the informative tutorial. I have a question.
    If I want to change the error equation, as well as the equations between the input and hidden layers and between the hidden and output layers, how can I change them?
    Hope you will reply soon.

    Regards,
    Mojo

  98. Aliya Anil February 16, 2018 at 9:04 pm

    Hi Jason,

    It was indeed a very informative tutorial. Could you please explain the need for seed(1) in the code?
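A likely reason for seed(1), shown as a minimal sketch: the tutorial initialises weights with random() and shuffles rows into folds with randrange(), so fixing the seed makes those draws, and therefore the reported scores, repeatable from run to run on the same Python version. The helper draw_weights is hypothetical.

```python
from random import seed, random

def draw_weights(n):
    # Stand-in for weight initialisation: n uniform draws in [0, 1)
    return [random() for _ in range(n)]

seed(1)
first = draw_weights(3)
seed(1)
second = draw_weights(3)
print(first == second)  # True: same seed, same "random" weights
```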

  99. Raj February 19, 2018 at 7:10 am

    Hey there,
    I’ve been following your tutorial and I’m having problems using my own dataset with it. The outputs of the hidden neurons are always exactly 1. I’m not sure what’s wrong or how to fix it, but it’s resulting in the network not learning at all. Please let me know if you can help.
    Thanks,
    Raj

    • Jason Brownlee February 19, 2018 at 9:10 am

      Perhaps try to get the code and data in the tutorial working first and use that as a starting point for your own problem.

      Generally, I would recommend using a library like Keras for your own projects and only code methods from scratch as a learning exercise.

  100. Aliya Anil February 20, 2018 at 4:37 pm

    Hi,

    I tried the first code in the tutorial with a 4-parameter dataset, but it does not predict the way it does for the 2-parameter set. Could you explain the reason?

    Thanks,
    Aliya

  101. Nik March 2, 2018 at 1:54 pm

    Dear Jason,

    Can I use the code for handwritten digit recognition? If so, are there any special recommendations on what to change in the code, or can I use it with no changes?

    Thanks,
    Nik

  102. Filoingko March 4, 2018 at 2:58 am

    Hi,

    How can I use this trained network to make predictions on another dataset?

    Thank you.

    • Jason Brownlee March 4, 2018 at 6:04 am

      The code in this tutorial is to teach you about backprop, not for use on real problems. If you are working through a problem, I’d recommend using Keras.

  103. Jean-Michel Richer March 5, 2018 at 9:27 pm

    Dear Jason,
    I have tried to use your code on a simple XOR example, but I get a result of [0, 0, 1, 1] instead of [0, 1, 1, 0]:
    Scores: [0.0]
    Mean Accuracy: 0.000%

    The input xor.csv file is:
    0,0,0
    0,1,1
    1,0,1
    1,1,0

    For this I have modified the evaluate_algorithm function to:
    def evaluate_algorithm_no_fold(dataset, algorithm, *args):
        scores = list()
        predicted = algorithm(dataset, dataset, *args)
        print(predicted)
        accuracy = accuracy_metric(dataset, predicted)
        scores.append(accuracy)
        return scores

    and call the function like this:
    scores = evaluate_algorithm_no_fold(dataset, back_propagation, 0.1, 500, 4)

    Do you have an explanation? I cannot figure out why it is not working.
    Best regards,
    JM

    • Jason Brownlee March 6, 2018 at 6:12 am

      Perhaps the model requires tuning to your new dataset.
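One thing worth checking in the modified function above, offered as an observation rather than the author's diagnosis: the tutorial's accuracy_metric(actual, predicted) expects a list of class labels, but evaluate_algorithm_no_fold passes the whole dataset, so every comparison is a row list against an integer and never matches. That alone forces 0% accuracy; the reported predictions [0, 0, 1, 1] would otherwise score 50% against [0, 1, 1, 0]. The network not learning XOR is a separate issue that tuning (more epochs, a different learning rate) may address. A minimal corrected sketch, with accuracy_metric written in the tutorial's form and a hypothetical stand-in algorithm for testing:

```python
def accuracy_metric(actual, predicted):
    # Percentage of positions where the predicted label equals the actual label
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

def evaluate_algorithm_no_fold(dataset, algorithm, *args):
    # Train and test on the same rows (only sensible for a toy problem like XOR)
    predicted = algorithm(dataset, dataset, *args)
    actual = [row[-1] for row in dataset]  # compare labels, not whole rows
    return [accuracy_metric(actual, predicted)]

def fake_algorithm(train, test):
    # Hypothetical stand-in returning the predictions reported in the comment
    return [0, 0, 1, 1]

xor = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(evaluate_algorithm_no_fold(xor, fake_algorithm))  # [50.0]
```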

  104. Tanveer March 5, 2018 at 9:29 pm

    Thank You So Much Jason !! Wonderful Tutorial. THANKS Much !!

  105. Mojo March 9, 2018 at 10:06 pm

    If I want to calculate the training accuracy and F-measure, and want to change the activation function, how can I do it?

    • Jason Brownlee March 10, 2018 at 6:28 am

      Perhaps you would be better off using scikit-learn and Keras instead.
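For readers who want to stay with the scratch code: training accuracy can be obtained by predicting on the training rows and passing those predictions to the same accuracy function used for the test fold. An F-measure needs counts of true positives, false positives and false negatives. A minimal pure-Python sketch for one class treated as positive, assuming integer labels as produced by the tutorial's back_propagation (f1_score is a hypothetical helper; for more than two classes one option is to average it over each class):

```python
def f1_score(actual, predicted, positive=1):
    # Count true positives, false positives and false negatives for one class
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

actual = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]
print(f1_score(actual, predicted))  # approximately 0.8 (precision 1.0, recall 2/3)
```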

  106. Fahad March 12, 2018 at 8:03 pm

    Is there something wrong with this code when using MNIST data? I tried to change the structure of the data to be compatible with the code, but it gave me a huge error, and the error did not decrease during any of the training steps.

  107. Fahad March 13, 2018 at 4:06 pm

    Thanks Jason for your response. I want to apply the code without Keras. I changed the structure of the data so that each row is a vector of 784 pixels followed by a class label, but as I said, it gave a huge error that does not decrease at all.

    I am trying to develop an algorithm to enhance learning, so I need to work through the procedure step by step. So Keras or any other library does not help.

    Thanks again Jason

    • Jason Brownlee March 14, 2018 at 6:17 am

      Perhaps update the code to use numpy, it will be much faster.
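Following the suggestion to use numpy, here is a minimal sketch (a restructuring for illustration, not the tutorial's data layout) of a vectorised forward pass. Each layer is stored as one weight matrix with the bias in the last column, mirroring the tutorial's "bias is the last weight" convention, so a layer's activations become one matrix-vector product instead of nested Python loops. Unlike the tutorial, only the feature values are passed in, not the label. The backward pass would need the same restructuring, which is not shown.

```python
import numpy as np

def layer_forward(weights, inputs):
    """Sigmoid outputs for a whole layer at once.

    weights: array of shape (n_neurons, n_inputs + 1), bias in the last column
    inputs:  array of shape (n_inputs,)
    """
    activation = weights[:, :-1] @ inputs + weights[:, -1]
    return 1.0 / (1.0 + np.exp(-activation))

def forward(network, row):
    # network is a list of weight matrices, one per layer
    outputs = np.asarray(row, dtype=float)
    for weights in network:
        outputs = layer_forward(weights, outputs)
    return outputs

rng = np.random.default_rng(1)
network = [rng.uniform(-0.5, 0.5, size=(3, 2 + 1)),   # hidden: 3 neurons, 2 inputs
           rng.uniform(-0.5, 0.5, size=(2, 3 + 1))]   # output: 2 neurons
print(forward(network, [2.78, 2.55]))
```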

  108. kelvin March 15, 2018 at 2:39 am

    Hi Mr Brownlee,

    Can you teach me how to plot the error per epoch (validation error) and the accuracy for both training and validation in your scratch network?

    • kelvin March 15, 2018 at 2:44 am

      I can only find the training error, not the validation error, in the code. When I plot the accuracy, I get only a straight line.

      • kelvin March 15, 2018 at 12:19 pm

        Is there any way to do this with your scratch network? For example, which part of the code saves the training error, validation error, training accuracy and validation accuracy? Then I can plot the graphs myself, since your scratch model does not have a “model” object whose history I can save.

        • Jason Brownlee March 15, 2018 at 2:50 pm

          Yes, perhaps change it from CV to a single train/test split, then evaluate the model skill on each dataset at the end of each epoch. Save the results in a list and return the lists.
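Jason's suggestion can be sketched as follows, assuming a single train/validation split instead of k-fold cross-validation. The network functions are compact re-implementations in the tutorial's style (list of layers of neuron dicts, bias as the last weight, error = output - expected), not copies of the post's code; train_with_history and accuracy are hypothetical helpers that append one entry per epoch, and the data rows are just a handful of two-feature toy examples.

```python
from math import exp
from random import random, seed

def initialize_network(n_inputs, n_hidden, n_outputs):
    # One hidden layer; each neuron's last weight is its bias
    hidden = [{'weights': [random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)]
    output = [{'weights': [random() for _ in range(n_hidden + 1)]} for _ in range(n_outputs)]
    return [hidden, output]

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            w = neuron['weights']
            # Weighted sum plus bias; the label in row[-1] is never read
            activation = w[-1] + sum(w[i] * inputs[i] for i in range(len(w) - 1))
            neuron['output'] = 1.0 / (1.0 + exp(-activation))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        if i == len(network) - 1:
            errors = [neuron['output'] - expected[j] for j, neuron in enumerate(layer)]
        else:
            errors = [sum(n['weights'][j] * n['delta'] for n in network[i + 1])
                      for j in range(len(layer))]
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * neuron['output'] * (1.0 - neuron['output'])

def update_weights(network, row, l_rate):
    for i, layer in enumerate(network):
        inputs = row[:-1] if i == 0 else [n['output'] for n in network[i - 1]]
        for neuron in layer:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']

def accuracy(network, rows):
    # Percentage of rows whose argmax output matches the integer label
    hits = 0
    for row in rows:
        outputs = forward_propagate(network, row)
        if outputs.index(max(outputs)) == row[-1]:
            hits += 1
    return 100.0 * hits / len(rows)

def train_with_history(network, train, val, l_rate, n_epoch, n_outputs):
    history = {'train_error': [], 'train_acc': [], 'val_acc': []}
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0] * n_outputs
            expected[row[-1]] = 1
            sum_error += sum((expected[k] - outputs[k]) ** 2 for k in range(n_outputs))
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        # Record one value per epoch so the lists can be plotted directly
        history['train_error'].append(sum_error)
        history['train_acc'].append(accuracy(network, train))
        history['val_acc'].append(accuracy(network, val))
    return history

seed(1)
data = [[2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0],
        [3.396561688, 4.400293529, 0], [1.38807019, 1.850220317, 0],
        [7.627531214, 2.759262235, 1], [5.332441248, 2.088626775, 1],
        [6.922596716, 1.77106367, 1], [8.675418651, -0.242068655, 1]]
train, val = data[::2], data[1::2]  # simple split: every other row held out
net = initialize_network(2, 2, 2)
history = train_with_history(net, train, val, 0.5, 20, 2)
print(history['val_acc'])
```

Each list in history holds one value per epoch, so it can be handed straight to a plotting function such as matplotlib's plot(), one line per metric.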

      • Zahra May 6, 2019 at 9:37 am

        Hello, I’m so confused.
        I’m trying to run this code from the command prompt, but with my own dataset (not the Wheat Seeds dataset).

        Why did this happen? What’s wrong? What should I do? What should I change?
        Please help me!

        Traceback (most recent call last):
          File "journal.py", line 197, in <module>
            scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
          File "journal.py", line 81, in evaluate_algorithm
            predicted = algorithm(train_set, test_set, *args)
          File "journal.py", line 173, in back_propagation
            train_network(network, train, l_rate, n_epoch, n_outputs)
          File "journal.py", line 150, in train_network
            expected[row[-1]] = 1
        IndexError: list assignment index out of range

  109. Jack March 15, 2018 at 12:11 pm

    Can I use this model for a regression problem? For example, could I use it for the Boston house-prices dataset?

    • Jason Brownlee March 15, 2018 at 2:50 pm

      Sure, some changes would be required, such as the activation in the output layer would need to be linear.
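A minimal sketch of the change Jason describes, assuming the tutorial's neuron dicts and squared error: the output layer uses the identity (linear) transfer, whose derivative is 1, so the output delta is just output - target, while hidden layers keep the sigmoid and their deltas are computed as before. The target becomes the real-valued row[-1] instead of a one-hot vector. The helper names forward_regression and output_deltas_linear are hypothetical.

```python
from math import exp

def forward_regression(network, row):
    # Hidden layers use the sigmoid; the final layer is linear (identity transfer)
    inputs = row
    for i, layer in enumerate(network):
        new_inputs = []
        is_output = i == len(network) - 1
        for neuron in layer:
            w = neuron['weights']
            activation = w[-1] + sum(w[k] * inputs[k] for k in range(len(w) - 1))
            neuron['output'] = activation if is_output else 1.0 / (1.0 + exp(-activation))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

def output_deltas_linear(output_layer, target):
    # Derivative of a linear transfer is 1, so delta = error = output - target
    for neuron in output_layer:
        neuron['delta'] = neuron['output'] - target

network = [[{'weights': [0.2, 0.1]}],   # 1 hidden neuron: 1 input + bias
           [{'weights': [2.0, 0.5]}]]   # 1 linear output neuron
row = [1.0, 3.0]                        # input 1.0, real-valued target 3.0
pred = forward_regression(network, row)[0]
output_deltas_linear(network[-1], row[-1])
print(pred, network[-1][0]['delta'])
```

For evaluation, a regression error such as RMSE would replace the classification accuracy reported in the tutorial.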

  110. Nabil March 15, 2018 at 3:49 pm

    Are you using MSE?

    • Jason Brownlee March 16, 2018 at 6:09 am

      As mentioned in the post, we are reporting accuracy for the classification problem.

  111. Olu March 19, 2018 at 11:23 pm

    In the train section, can you please explain how expected[row[-1]] = 1 knows where to insert the 1 in the array of zeros it created?

    • Jason Brownlee March 20, 2018 at 6:23 am

      Good question.

      expected is all zeros. row[-1] is the index of the class value. Therefore, we set the index of the class value in expected to 1.

      Perhaps it is worth reading up on array indexing:
      https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/

      • kmillen November 16, 2018 at 11:27 am

        Jason,
        This is an amazing piece of code that has been very beneficial.

        Doesn’t that mean that only expected[0] and expected[1] will ever be set to 1 for this test data?

        Thank you,

        • Jason Brownlee November 16, 2018 at 1:58 pm

          Sorry, I don’t understand your question.

          • kmillen November 17, 2018 at 1:44 am

            If I understand Python (which I may not), row[-1] represents the last item in the row. Since the last value in each of the 10 rows is only either 0 or 1, expected[row[-1]] = 1 will only ever set expected[0] or expected[1] to the value of 1. Or, what am I missing?

          • Jason Brownlee November 17, 2018 at 5:50 am

            Are you referring to this: expected[row[-1]] = 1?

            If so:

            “expected” is all zeros, e.g. [0, 0]
            “row” is an example, e.g. […] where the value at -1 is either 0 or 1

            Therefore row[-1] is an index of either 0 or 1 and we are marking the value in expected at that index as 1.

            We have created a one hot vector.
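To make the thread above concrete: expected[row[-1]] = 1 works because the class column has already been converted to integers 0 .. n_outputs-1, so with two classes only expected[0] or expected[1] is ever set, which is exactly the point (one output neuron per class). A label equal to or larger than n_outputs is what produces the "list assignment index out of range" error reported in this thread. A minimal sketch of the same step as a function (one_hot is a hypothetical name):

```python
def one_hot(label, n_outputs):
    # Start from all zeros and mark the position of the (integer) class label
    expected = [0 for _ in range(n_outputs)]
    expected[label] = 1
    return expected

print(one_hot(0, 2))  # [1, 0]
print(one_hot(1, 2))  # [0, 1]
print(one_hot(2, 3))  # [0, 0, 1]
# one_hot(2, 2) raises IndexError: list assignment index out of range
```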

        • kmillen November 17, 2018 at 2:54 am

          Disregard my previous question; I found the answer in a previous reply. Thank you again for this example.

  112. kelvin March 21, 2018 at 2:14 am

    Hi, I would like to use softmax as the activation function for the output layer. However, I do not know how to write the code for the derivative of softmax. Can you show me how to change the sigmoid function in your code to softmax?

    • kelvin March 21, 2018 at 2:20 am

      I have tried a few ways to change the sigmoid to softmax, but none of them work. Can you show me how to create a softmax layer?

      For transfer():
      First case:
      def transfer(input_value):
          exp_scores = np.exp(input_value)
          return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

      Second case:
      def transfer(input_value):
          input_value -= np.max(input_value)
          return np.exp(input_value) / np.sum(np.exp(input_value))

      Third case:
      def transfer(input_value):
          input_value -= np.max(input_value)
          result = (np.exp(input_value).T / np.sum(np.exp(input_value))).T
          return result

      For transfer_derivative():
      First case:
      def transfer_derivative(output):
          s = output.reshape(-1, 1)
          return np.diagflat(s) - np.dot(s, s.T)

      Second case:
      def transfer_derivative(output):
          jacobian_m = np.diag(output)
          for i in range(len(jacobian_m)):
              for j in range(len(jacobian_m)):
                  if i == j:
                      jacobian_m[i][j] = output[i] * (1 - output[i])
                  else:
                      jacobian_m[i][j] = -output[i] * output[j]
          return jacobian_m

    • Jason Brownlee March 21, 2018 at 6:38 am

      Perhaps use Keras instead?
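For readers attempting the same change, a minimal sketch under stated assumptions: softmax cannot be dropped into the per-neuron transfer() because each output depends on every activation in the layer, so it has to be applied to the output layer's whole list of activations at once. If the loss is also switched from squared error to cross-entropy (an assumption; the tutorial uses squared error), the softmax Jacobian cancels and the output delta reduces to output - expected, so no explicit Jacobian is needed. This is pure Python with max-subtraction for numerical stability; the function names are hypothetical.

```python
from math import exp

def softmax(activations):
    # Subtract the max before exponentiating to avoid overflow
    m = max(activations)
    exps = [exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_output_deltas(outputs, expected):
    # With softmax + cross-entropy loss, dLoss/dActivation = output - expected
    return [o - y for o, y in zip(outputs, expected)]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(softmax_output_deltas(probs, [1, 0, 0]))
```

In the tutorial's structure these values would be stored as each output neuron's 'delta' in place of error * transfer_derivative(output); the hidden-layer deltas are then propagated back from them unchanged.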

  113. Suede March 29, 2018 at 6:40 am

    Hey Jason, this is very helpful. I have run the code, but I keep getting this error. Can you please help me out? The error is:

    NameError                                 Traceback (most recent call last)
    in <module>()
        186 str_column_to_float(dataset, i)
        187 # convert class column to integers
    --> 188 str_columnto_int(dataset, len(dataset[0])-1)
        189 # normalize input variables
        190 minmax = dataset_minmax(dataset)

    NameError: name 'str_columnto_int' is not defined

    • Jason Brownlee March 29, 2018 at 6:42 am

      The code was written for Python 2.7. Can you confirm that you are using this version of Python?

  114. Fahri Güreşçi April 15, 2018 at 7:03 am

    The CSV file is not working. Edited CSV file: bit.ly/2GYX2dF
    You can use Python 2 or 3.
    Results:
    python2 > Mean Accuracy: 95.238%
    python3 > Mean Accuracy: 93.333%

    Why are they different?
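A hedged guess at the difference, not confirmed in this thread: Python 3.2 changed how randrange() produces values (its documentation notes it no longer uses the int(random()*n) style), so seed(1) can give different fold splits than Python 2. In addition, the tutorial's str_column_to_int builds its lookup by iterating over a set, whose order for string labels can differ between interpreters and, with Python 3's string hash randomization, even between runs. Sorting the unique labels makes the class-to-index mapping deterministic, as in this minimal variant (the name str_column_to_int_sorted is hypothetical):

```python
def str_column_to_int_sorted(dataset, column):
    # Sorting the unique labels gives the same label -> index mapping on every run
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(unique)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

rows = [[0.1, '3'], [0.2, '1'], [0.3, '2'], [0.4, '1']]
print(str_column_to_int_sorted(rows, 1))  # {'1': 0, '2': 1, '3': 2}
print([row[1] for row in rows])           # [2, 0, 1, 0]
```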

  115. Fahad April 18, 2018 at 8:58 pm

    I have altered the code to work with MNIST (digit images). The problem I have faced is that the forward_propagate function returns [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] for each instance!

    Any help?

    • Jason Brownlee April 19, 2018 at 6:30 am

      Well done.

      The model will require tuning for the problem.

  116. Fahad April 19, 2018 at 7:09 am

    Could you explain in more detail, please?

  117. Fahad April 19, 2018 at 8:17 pm

    As I mentioned, the forward_propagate function returns [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]. What change could overcome this problem?

  118. Fahad April 23, 2018 at 5:34 am

    I altered the code to work with the XOR problem and it worked perfectly. Then I altered the code to work with the MNIST digits, but as I told you, there was a problem with the forward_propagate function: it returned [1, 1, 1, …] for all outputs instead of a probability for each output.
    I think it is not an optimization problem; there is something wrong with the forward_propagate function.

    Here is the code after the alteration (it runs, but the error stays fixed during all training epochs):

    from random import seed
    from random import randrange
    from random import random
    from csv import reader
    from math import exp
    global network
    global gl_errors

    # Load a CSV file
    def load_csv(filename):
        dataset = list()
        with open(filename, 'r') as file:
            csv_reader = reader(file)
            for row in csv_reader:
                if not row:
                    continue
                dataset.append(row)
        return dataset

    # Convert string column to float
    def str_column_to_float(dataset, column):
        for row in dataset:
            row[column] = float(row[column].strip())

    def str_column_to_intX(dataset, column):
        for row in dataset:
            row[column] = int(row[column].strip())

    # Convert string column to integer
    def str_column_to_int(dataset, column):
        class_values = [row[column] for row in dataset]
        unique = set(class_values)
        lookup = dict()
        for i, value in enumerate(unique):
            lookup[value] = i
        for row in dataset:
            row[column] = lookup[row[column]]
        return lookup

    # Find the min and max values for each column
    def dataset_minmax(dataset):
        minmax = list()
        stats = [[min(column), max(column)] for column in zip(*dataset)]
        return stats

    # Rescale dataset columns to the range 0-1
    def normalize_dataset(dataset):
        for row in dataset:
            for i in range(1, len(row)):
                # row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
                if row[i] > 10:
                    row[i] = 1
                else:
                    row[i] = 0

    # Split a dataset into k folds
    def cross_validation_split(dataset, n_folds):
        dataset_split = list()
        dataset_copy = list(dataset)
        fold_size = int(len(dataset) / n_folds)
        for i in range(n_folds):
            fold = list()
            while len(fold) epoch=%d, lrate=%.3f, error=%.3f’ % (epoch, l_rate, sum_error))

    # Calculate neuron activation for an input
    def activate(weights, inputs):
        activation = weights[-1]
        for i in range(len(weights)-1):
            activation += weights[i] * inputs[i]
        return activation

    # Transfer neuron activation
    def transfer(activation):
        return 1.0 / (1.0 + exp(-activation))

    # Forward propagate input to a network output
    def forward_propagate(network, row):
        inputs = row[1:]
        i = 0
        for layer in network:
            new_inputs = []
            i += 1
            for neuron in layer:
                activation = activate(neuron['weights'], inputs)
                neuron['output'] = transfer(activation)
                new_inputs.append(neuron['output'])
            inputs = new_inputs
        return inputs

    # Calculate the derivative of an neuron output
    def transfer_derivative(output):
        return output * (1.0 - output)

    # Backpropagate error and store in neurons
    def backward_propagate_error(network, expected):
        # err = 0
        for i in reversed(range(len(network))):
            layer = network[i]
            errors = list()
            if i != len(network)-1:
                for j in range(len(layer)):
                    error = 0.0
                    for neuron in network[i + 1]:
                        error += (neuron['weights'][j] * neuron['delta'])
                    errors.append(error)
            else:
                for j in range(len(layer)):
                    neuron = layer[j]
                    errors.append(expected[j] - neuron['output'])
            for j in range(len(layer)):
                neuron = layer[j]
                neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

    # Update network weights with error
    def update_weights(network, row, l_rate):
        for i in range(len(network)):
            inputs = row[1:]
            if i != 0:
                inputs = [neuron['output'] for neuron in network[i - 1]]
            for neuron in network[i]:
                for j in range(len(inputs)):
                    neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
                neuron['weights'][-1] += l_rate * neuron['delta']

    # Initialize a network
    def initialize_network(n_inputs, n_hidden, n_outputs):
        global network
        network = list()
        hidden_layer1 = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer1)

        hidden_layer2 = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(100)]
        network.append(hidden_layer2)

        hidden_layer3 = [{'weights':[random() for i in range(100 + 1)]} for i in range(50)]
        network.append(hidden_layer3)

        output_layer = [{'weights':[random() for i in range(50 + 1)]} for i in range(n_outputs)]
        network.append(output_layer)
        return network

    # Make a prediction with a network
    def predict(network, row):
        outputs = forward_propagate(network, row)
        return outputs.index(max(outputs))

    # Test Backprop on Seeds dataset
    seed(1)
    # load and prepare data
    filename = 'dataset/train2.csv'
    dataset = load_csv(filename)

    for i in range(1, len(dataset[0])):
        str_column_to_float(dataset, i)

    # convert class column to integers
    str_column_to_int(dataset, 0)

    normalize_dataset(dataset)

    # evaluate algorithm
    n_folds = 5
    l_rate = 0.5
    n_epoch = 100
    n_hidden = 500

    scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
    print('Scores: %s' % scores)
    print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

    • Jason Brownlee April 23, 2018 at 6:26 am

      Sorry, I don’t have the capacity to debug the modified code for you.
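One concrete mechanism that matches the symptom described above (every output exactly 1 and an error that never changes), offered as a hedged diagnosis rather than a confirmed one: with hundreds of inputs in [0, 1] and weights drawn from random() in [0, 1), a neuron's weighted sum is in the hundreds, so 1/(1 + exp(-activation)) rounds to exactly 1.0 in floating point. The sigmoid derivative output * (1 - output) is then exactly 0, every delta is 0, and no weight ever changes. The same saturation plausibly affects the other wide networks reported in this thread. A common remedy is smaller, zero-centred initial weights, for example uniform in plus or minus 1/sqrt(n_inputs); that scaling is one conventional choice, not the tutorial's. The helpers sigmoid and neuron_output are hypothetical.

```python
from math import exp, sqrt
from random import random, seed, uniform

def sigmoid(activation):
    return 1.0 / (1.0 + exp(-activation))

def neuron_output(weights, inputs):
    # Weighted sum plus bias (last weight), as in the tutorial's activate()
    activation = weights[-1] + sum(w * x for w, x in zip(weights[:-1], inputs))
    return sigmoid(activation)

seed(1)
n_inputs = 784
inputs = [1.0] * n_inputs  # e.g. binarised MNIST pixels that are all "on"

# Tutorial-style initialisation: every weight in [0, 1)
w_default = [random() for _ in range(n_inputs + 1)]
out = neuron_output(w_default, inputs)
print(out, out * (1.0 - out))   # 1.0 0.0 -> saturated, zero gradient

# Smaller, zero-centred initialisation scaled by 1/sqrt(n_inputs)
limit = 1.0 / sqrt(n_inputs)
w_scaled = [uniform(-limit, limit) for _ in range(n_inputs + 1)]
out = neuron_output(w_scaled, inputs)
print(out, out * (1.0 - out))   # derivative is now non-zero
```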

  119. Rahmad ars April 26, 2018 at 3:23 am

    Sir, can you help me?
    This is my question:

    https://stackoverflow.com/questions/50027886/need-help-for-check-my-backprop-ann-using-python

    • Jason Brownlee April 26, 2018 at 6:37 am

      Perhaps you can summarize your question in a sentence or two?

  120. Rahmad ars April 26, 2018 at 7:19 am

    Your original code, sir, has only 1 hidden layer with 2 neurons. I modified it so the ANN has 3 hidden layers of 128, 64 and 32 neurons. I also have my own dataset, so I changed that too (the dataset and the number of input neurons). When I run this code, everything looks fine, but the error value does not change…

    here’s the screen: https://i.stack.imgur.com/NQbNd.png

    modified code: https://stackoverflow.com/questions/50027886/need-help-for-check-my-backprop-ann-using-python

    Thanks, sir.

  121. Fahad April 28, 2018 at 9:20 pm

    I have the same problem as Rahmad.

    The same problem occurs when you change the original code from 5 neurons in the hidden layer to 31 neurons (the error value does not change).

    I know 31 hidden neurons is not the right number for the seeds dataset, but I would like to know what goes wrong when you increase the number of neurons.

    Logically, it should be fine and the error should decrease. But with 30 neurons it still works, while with 31 neurons the error does not decrease!

    I think if this problem is fixed, then the problem of Rahmad will be fixed too.

  122. Rocha May 2, 2018 at 9:58 pm

    Hi dude, I’m stuck on this error. Could you help me?

    # Forward propagate input to a network output
    def forward_propagate(network, row):
        inputs = row
        for layer in network:
            new_inputs = []
            for neuron in layer:
                activation = activate(neuron['weights'], inputs)  # <-- error raised here
                neuron['output'] = transfer(activation)
                new_inputs.append(neuron['output'])
            inputs = new_inputs
        return inputs

    That line is giving me this: TypeError: list indices must be integers or slices, not str

    Could it be the Python version? I’m using Python 3…

  123. Kamrun Nahar Nisha May 8, 2018 at 4:00 pm

    Hello, please help me.
    I want to use the breast cancer dataset instead of the seeds dataset.

    seed(1)
    # load and prepare data
    filename = 'seeds_dataset.csv'
    dataset = load_csv(filename)
    for i in range(len(dataset[0])-1):
        str_column_to_float(dataset, i)
    # convert class column to integers
    str_column_to_int(dataset, len(dataset[0])-1)
    # normalize input variables
    minmax = dataset_minmax(dataset)
    normalize_dataset(dataset, minmax)
    # evaluate algorithm
    n_folds = 5
    l_rate = 0.3
    n_epoch = 500
    n_hidden = 5
    scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
    print('Scores: %s' % scores)
    print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

    In this part of your code, I also want to print the error, so that the output looks like this:

    >epoch=0, lrate=0.500, error=6.350
    >epoch=1, lrate=0.500, error=5.531
    >epoch=2, lrate=0.500, error=5.221
    >epoch=3, lrate=0.500, error=4.951
    >epoch=4, lrate=0.500, error=4.519
    >epoch=5, lrate=0.500, error=4.173
    >epoch=6, lrate=0.500, error=3.835
    >epoch=7, lrate=0.500, error=3.506
    >epoch=8, lrate=0.500, error=3.192
    >epoch=9, lrate=0.500, error=2.898
    >epoch=10, lrate=0.500, error=2.626
    >epoch=11, lrate=0.500, error=2.377
    >epoch=12, lrate=0.500, error=2.153
    >epoch=13, lrate=0.500, error=1.953
    >epoch=14, lrate=0.500, error=1.774
    >epoch=15, lrate=0.500, error=1.614
    >epoch=16, lrate=0.500, error=1.472
    >epoch=17, lrate=0.500, error=1.346
    >epoch=18, lrate=0.500, error=1.233
    >epoch=19, lrate=0.500, error=1.132
    [{'output': 0.029980305604426185, 'weights': [-1.4688375095432327, 1.850887325439514, 1.0858178629550297], 'delta': -0.0059546604162323625}, {'output': 0.9456229000211323, 'weights': [0.37711098142462157, -0.0625909894552989, 0.2765123702642716], 'delta': 0.0026279652850863837}]
    [{'output': 0.23648794202357587, 'weights': [2.515394649397849, -0.3391927502445985, -0.9671565426390275], 'delta': -0.04270059278364587}, {'output': 0.7790535202438367, 'weights': [-2.5584149848484263, 1.0036422106209202, 0.42383086467582715], 'delta': 0.03803132596437354}]

    Please tell me the code for using the breast cancer dataset instead of the wheat seeds dataset. I am not very good at coding, which is why I need your help immediately.

    • Jason Brownlee May 9, 2018 at 6:09 am

      I’m eager to help, but I do not have the capacity to outline the changes or write the code for you.

  124. Akefar May 11, 2018 at 4:41 am

    Hi Jason,
    I tried your code on my dataset. The shape of my data is (576, 16). The problem is:

    IndexError: list assignment index out of range

    Do I need to change your code for a (576, 16) data shape?
    Thanks

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    in <module>()
        195 n_epoch = 500
        196 n_hidden = 1
    --> 197 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
        198 print('Scores: %s' % scores)
        199 print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

    in evaluate_algorithm(dataset, algorithm, n_folds, *args)
         79 test_set.append(row_copy)
         80 row_copy[-1] = None
    ---> 81 predicted = algorithm(train_set, test_set, *args)
         82 actual = [row[-1] for row in fold]
         83 accuracy = accuracy_metric(actual, predicted)

    in back_propagation(train, test, l_rate, n_epoch, n_hidden)
        171 n_outputs = len(set([row[-1] for row in train]))
        172 network = initialize_network(n_inputs, n_hidden, n_outputs)
    --> 173 train_network(network, train, l_rate, n_epoch, n_outputs)
        174 predictions = list()
        175 for row in test:

    in train_network(network, train, l_rate, n_epoch, n_outputs)
        148 outputs = forward_propagate(network, row)
        149 expected = [0 for i in range(n_outputs)]
    --> 150 expected[row[-1]] = 1
        151 backward_propagate_error(network, expected)
        152 update_weights(network, row, l_rate)

    IndexError: list assignment index out of range

    • Jason Brownlee May 11, 2018 at 6:39 am

      You may need to change your data to match the model or the model to match the data.
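To match the data to the model in the specific case of this IndexError, it helps to check the label column before training. In the tutorial, n_outputs is computed from the set of labels in the training rows, and expected[row[-1]] = 1 needs every label to be an integer in 0 .. n_outputs-1. Labels coded as 1, 2, 3, a class column that is not the last column, or a training fold that happens to be missing one of the lower class values all break that assumption. A hedged sketch of a pre-flight check (check_labels is a hypothetical helper):

```python
def check_labels(dataset):
    """Return n_outputs if the last column holds integer labels 0..n-1, else raise."""
    unique = sorted(set(row[-1] for row in dataset))
    n_outputs = len(unique)
    if not all(isinstance(v, int) for v in unique) or unique != list(range(n_outputs)):
        raise ValueError('labels must be the integers 0..%d, found %s'
                         % (n_outputs - 1, unique))
    return n_outputs

print(check_labels([[0.5, 0], [0.1, 1], [0.9, 1]]))   # 2
try:
    check_labels([[0.5, 1], [0.1, 2], [0.9, 3]])       # classes coded 1..3
except ValueError as err:
    print(err)   # labels must be the integers 0..2, found [1, 2, 3]
```

If the check fails, remapping the labels (for example with the tutorial's str_column_to_int on the class column) before splitting into folds is one way to make the data fit the model.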

  125. Pradeep May 14, 2018 at 3:31 am

    Hi Jason, I tried your code on the same sample dataset, and I am getting the following TypeError in the function activate. I am using Python 3.6. Hope to hear from you soon.

    Traceback (most recent call last):
      File "neural_network.py", line 94, in <module>
        train_network(network, dataset, 0.5, 20, n_outputs)
      File "neural_network.py", line 76, in train_network
        outputs = forward_propagate(network, row)
      File "neural_network.py", line 31, in forward_propagate
        activation = activate(neuron['weights'], inputs)
      File "neural_network.py", line 18, in activate
        activation += weights[i] * inputs[i]
    TypeError: can't multiply sequence by non-int of type 'float'

  126. Deepak D May 17, 2018 at 6:49 pm

    Hi Jason Brownlee,

    I tried your code and got an error when applying the Backpropagation algorithm to the wheat seeds dataset. I am using Python 2.7.

    Error type:

      File "C:\Python27\programs\backpropagation.py", line 186, in <module>
        str_column_to_float(dataset, i)
      File "C:\Python27\programs\backpropagation.py", line 22, in str_column_to_float
        row[column] = float(row[column].strip())
    ValueError: could not convert string to float:
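Both of the last two errors point at the data-loading step rather than the network, as a hedged reading of the tracebacks: "can't multiply sequence by non-int of type 'float'" is what Python raises when a string is multiplied by a float, which suggests the columns were never converted from str; "could not convert string to float:" with nothing after the colon means an empty field, which a file with doubled delimiters, a trailing delimiter or the wrong delimiter can produce. A minimal loader sketch that strips whitespace, drops empty fields and blank lines, and converts every value to float; load_numeric_rows is a hypothetical helper, and dropping empty fields assumes they are formatting debris rather than genuinely missing values.

```python
import csv

def load_numeric_rows(lines, delimiter=','):
    """Parse delimited text into rows of floats, skipping blank lines and empty fields."""
    dataset = []
    for row in csv.reader(lines, delimiter=delimiter):
        values = [field.strip() for field in row if field.strip()]
        if not values:
            continue  # blank line
        dataset.append([float(v) for v in values])
    return dataset

text = ['15.26,14.84,1\n', '\n', '14.88,14.57,,1\n']  # last row has a doubled comma
print(load_numeric_rows(text))  # [[15.26, 14.84, 1.0], [14.88, 14.57, 1.0]]
```

With a real file, pass the open file object (e.g. inside a with open(filename) block) as lines, and use delimiter='\t' for tab-separated data. The class column then still needs converting to integer indices before training.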

  127. Dhanya Hegde May 19, 2018 at 1:53 am

    Hey Jason! Great work. Really helpful. I didn’t understand one part of your code. On what basis does predict function return the predicted value as 0 or 1, after taking the maximum of the two output neuron values?

    • Jason Brownlee May 19, 2018 at 7:44 am

      The summation of the activation is passed through a sigmoid transfer function.

      • Dhanya Hegde May 21, 2018 at 3:37 am

        I didn’t understand this part of the code:

        outputs.index(max(outputs))

        Is one hot encoding used or binary classification?
        If so, how is the actual mapping done?
        And when is the iteration process stopped?

        • Jason Brownlee May 21, 2018 at 6:35 am

          As stated in the text above the code, it returns an integer for the class with the largest probability.
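To unpack outputs.index(max(outputs)): the network has one output neuron per class and is trained against one-hot targets, so the position of the largest output is taken as the predicted class label. There is no threshold, and ties go to the lowest index because list.index returns the first match. As for when iteration stops, the tutorial trains for a fixed number of epochs (n_epoch) rather than stopping on an error criterion. A minimal sketch of the decoding step (predict_from_outputs is a hypothetical name):

```python
def predict_from_outputs(outputs):
    # Index of the largest output = predicted class label (argmax)
    return outputs.index(max(outputs))

print(predict_from_outputs([0.03, 0.95]))      # 1
print(predict_from_outputs([0.7, 0.2, 0.4]))   # 0
```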

  128. Ionut May 27, 2018 at 12:05 am

    Hi,

    I’m a beginner in neural networks and I don’t understand the dataset in section “4.2. Train Network”. Can anyone explain to me what x1, x2 and y mean?

  129. Rishik Mani May 28, 2018 at 7:26 am

    Hi Jason, thank you for the highly informative post. Could you please clarify a small issue for me?

    In section 4.2 Train Network, you set n_inputs = len(dataset[0]) - 1. Why did you put a -1 here, when the number of inputs should be exactly the length of a row in the dataset?

    • Jason Brownlee May 28, 2018 at 2:32 pm

      To exclude the output variable from the number of inputs.
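Each training row holds the input values followed by the class label (in the small example, x1 and x2 are the two inputs and y is the class), so the -1 removes the label column from the input count. Each neuron then has one weight per input plus a bias stored as the last weight, which is why weight lists have n_inputs + 1 entries and the bias is updated separately via weights[-1]. A sketch consistent with the tutorial's activate(), with made-up example values:

```python
def activate(weights, inputs):
    # Bias is stored as the last weight; the remaining weights pair with inputs
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

row = [2.0, 1.0, 0]              # x1, x2, then the class label y
n_inputs = len(row) - 1          # 2: the label is not an input
weights = [0.5, -1.0, 0.25]      # n_inputs + 1 weights: w1, w2, bias
print(activate(weights, row))    # 0.25 = 0.5*2.0 + (-1.0)*1.0 + 0.25
```

Because the loop runs over len(weights) - 1 positions, the label at the end of row is never read, which is why the full row can be passed in.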

  130. Samih Eisa June 2, 2018 at 9:13 pm

    Thank you, Jason.

  131. Kie Woo Nam June 5, 2018 at 3:01 am

    Hi,

    I’m likely mistaken, but when i != 0, isn’t the last line updating the last weight a second time?

    So shouldn’t it be "inputs = [neuron['output'] for neuron in network[i - 1]][:-1]" (adding "[:-1]" at the end)?

    If I’m wrong, I’ll read the code again more carefully, so please let me know.

    • Jason Brownlee June 5, 2018 at 6:46 am

      No. There are more weights than inputs and the -1 index of the weights is the bias.

      • Kie Woo Nam June 5, 2018 at 7:06 pm

        Ah, right. Now I see it from "output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]".

        Thank you for your quick reply.

  132. Thomas Specht July 10, 2018 at 4:46 am

    Hi Jason,

    Great tutorial to get into ML coding. I have one question:

    What library would you recommend for projects and why? I want to use NN for regression problems.

    • Jason Brownlee July 10, 2018 at 6:53 am

      I recommend Keras because it is computationally efficient, fast for development and fun:
      https://mac