How to Code a Neural Network with Backpropagation In Python (from scratch)

The backpropagation algorithm is used in the classical feed-forward artificial neural network.

It is the technique still used to train large deep learning networks.

In this tutorial, you will discover how to implement the backpropagation algorithm for a neural network from scratch with Python.

After completing this tutorial, you will know:

  • How to forward-propagate an input to calculate an output.
  • How to back-propagate error and train a network.
  • How to apply the backpropagation algorithm to a real-world predictive modeling problem.

Let’s get started.

How to Implement the Backpropagation Algorithm From Scratch In Python

Photo by NICHD, some rights reserved.


This section provides a brief introduction to the Backpropagation Algorithm and the Wheat Seeds dataset that we will be using in this tutorial.

Backpropagation Algorithm

The Backpropagation algorithm is a supervised learning method for multilayer feed-forward networks from the field of Artificial Neural Networks.

Feed-forward neural networks are inspired by the information processing of one or more neural cells, called neurons. A neuron accepts input signals via its dendrites, which pass the electrical signal down to the cell body. The axon carries the signal out to synapses, which are the connections of a cell’s axon to other cells’ dendrites.

The principle of the backpropagation approach is to model a given function by modifying internal weightings of input signals to produce an expected output signal. The system is trained using a supervised learning method, where the error between the system’s output and a known expected output is presented to the system and used to modify its internal state.

Technically, the backpropagation algorithm is a method for training the weights in a multilayer feed-forward neural network. As such, it requires a network structure to be defined of one or more layers where one layer is fully connected to the next layer. A standard network structure is one input layer, one hidden layer, and one output layer.

Backpropagation can be used for both classification and regression problems, but we will focus on classification in this tutorial.

In classification problems, best results are achieved when the network has one neuron in the output layer for each class value. For example, consider a 2-class (binary) classification problem with the class values A and B. The expected outputs would have to be transformed into binary vectors with one column for each class value, such as [1, 0] and [0, 1] for A and B respectively. This is called a one hot encoding.

Wheat Seeds Dataset

The seeds dataset involves the prediction of species given measurements of seeds from different varieties of wheat.

There are 201 records and 7 numerical input variables. It is a classification problem with 3 output classes. The scale of each numeric input value varies, so some data normalization may be required for use with algorithms that weight inputs, like the backpropagation algorithm.

Below is a sample of the first 5 rows of the dataset.

Using the Zero Rule algorithm that predicts the most common class value, the baseline accuracy for the problem is 28.095%.

You can learn more and download the seeds dataset from the UCI Machine Learning Repository.

Download the seeds dataset and place it into your current working directory with the filename seeds_dataset.csv.

The dataset is in tab-separated format, so you must convert it to CSV using a text editor or a spreadsheet program.

Update, download the dataset in CSV format directly:


This tutorial is broken down into 6 parts:

  1. Initialize Network.
  2. Forward Propagate.
  3. Back Propagate Error.
  4. Train Network.
  5. Predict.
  6. Seeds Dataset Case Study.

These steps will provide the foundation that you need to implement the backpropagation algorithm from scratch and apply it to your own predictive modeling problems.

1. Initialize Network

Let’s start with something easy, the creation of a new network ready for training.

Each neuron has a set of weights that need to be maintained: one weight for each input connection and an additional weight for the bias. We will need to store additional properties for a neuron during training, therefore we will use a dictionary to represent each neuron and store properties by names such as ‘weights’ for the weights.

A network is organized into layers. The input layer is really just a row from our training dataset. The first real layer is the hidden layer. This is followed by the output layer that has one neuron for each class value.

We will organize layers as arrays of dictionaries and treat the whole network as an array of layers.

It is good practice to initialize the network weights to small random numbers. In this case, we will use random numbers in the range of 0 to 1.

Below is a function named initialize_network() that creates a new neural network ready for training. It accepts three parameters, the number of inputs, the number of neurons to have in the hidden layer and the number of outputs.
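A minimal sketch of such a function is shown here, assuming the layout described above: the network is a list of layers, each layer is a list of neurons, and each neuron is a dictionary with a 'weights' entry whose last element is the bias. Treat it as an illustration of the description rather than a definitive listing.

```python
from random import random

def initialize_network(n_inputs, n_hidden, n_outputs):
    """Create a network with one hidden layer and one output layer."""
    network = []
    # Hidden layer: each neuron has one weight per input plus one bias weight.
    hidden_layer = [{'weights': [random() for _ in range(n_inputs + 1)]}
                    for _ in range(n_hidden)]
    network.append(hidden_layer)
    # Output layer: each neuron has one weight per hidden neuron plus one bias weight.
    output_layer = [{'weights': [random() for _ in range(n_hidden + 1)]}
                    for _ in range(n_outputs)]
    network.append(output_layer)
    return network
```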

You can see that for the hidden layer we create n_hidden neurons and each neuron in the hidden layer has n_inputs + 1 weights, one for each input column in a dataset and an additional one for the bias.

You can also see that the output layer that connects to the hidden layer has n_outputs neurons, each with n_hidden + 1 weights. This means that each neuron in the output layer connects to (has a weight for) each neuron in the hidden layer.

Let’s test out this function. Below is a complete example that creates a small network.
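A small self-contained sketch of that test could look like the following. It assumes a network with 2 inputs, 1 hidden neuron, and 2 outputs, and seeds the random number generator (the seed value 1 is an arbitrary choice) so that repeated runs print the same weights.

```python
from random import seed, random

def initialize_network(n_inputs, n_hidden, n_outputs):
    """Create a network with one hidden layer and one output layer."""
    hidden_layer = [{'weights': [random() for _ in range(n_inputs + 1)]}
                    for _ in range(n_hidden)]
    output_layer = [{'weights': [random() for _ in range(n_hidden + 1)]}
                    for _ in range(n_outputs)]
    return [hidden_layer, output_layer]

seed(1)  # fix the random numbers so the run is repeatable
network = initialize_network(2, 1, 2)
for layer in network:
    print(layer)
```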

Running the example, you can see that the code prints out each layer one by one. You can see the hidden layer has one neuron with 2 input weights plus the bias. The output layer has 2 neurons, each with 1 weight plus the bias.

Now that we know how to create and initialize a network, let’s see how we can use it to calculate an output.

2. Forward Propagate

We can calculate an output from a neural network by propagating an input signal through each layer until the output layer outputs its values.

We call this forward-propagation.

It is the technique we will need to generate predictions during training that will need to be corrected, and it is the method we will need after the network is trained to make predictions on new data.

We can break forward propagation down into three parts:

  1. Neuron Activation.
  2. Neuron Transfer.
  3. Forward Propagation.

2.1. Neuron Activation

The first step is to calculate the activation of one neuron given an input.

The input could be a row from our training dataset, as in the case of the hidden layer. It may also be the outputs from each neuron in the hidden layer, in the case of the output layer.

Neuron activation is calculated as the weighted sum of the inputs, much like linear regression:

activation = sum(weight_i * input_i) + bias

Where weight is a network weight, input is an input, i is the index of a weight or an input and bias is a special weight that has no input to multiply with (or you can think of the input as always being 1.0).

Below is an implementation of this in a function named activate(). You can see that the function assumes that the bias is the last weight in the list of weights. This helps here and later to make the code easier to read.
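One way to write this function is sketched below, assuming, as stated, that the bias is stored as the last weight. Because the loop only visits len(weights) - 1 inputs, any extra trailing value in the input row (such as a class label) is never touched.

```python
def activate(weights, inputs):
    """Weighted sum of the inputs plus the bias (the last weight)."""
    activation = weights[-1]  # start from the bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation
```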

Now, let’s see how to use the neuron activation.

2.2. Neuron Transfer

Once a neuron is activated, we need to transfer the activation to see what the neuron output actually is.

Different transfer functions can be used. It is traditional to use the sigmoid activation function, but you can also use the tanh (hyperbolic tangent) function to transfer outputs. More recently, the rectifier transfer function has been popular with large deep learning networks.

The sigmoid activation function looks like an S shape and is also called the logistic function. It can take any input value and produce a number between 0 and 1 on an S-curve. It is also a function whose derivative (slope) is easy to calculate, which we will need later when backpropagating error.

We can transfer an activation using the sigmoid function as follows:

output = 1 / (1 + e^(-activation))

Where e is the base of the natural logarithms (Euler’s number).

Below is a function named transfer() that implements the sigmoid equation.
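A minimal sketch of the sigmoid transfer, using only the standard library:

```python
from math import exp

def transfer(activation):
    """Sigmoid (logistic) transfer: maps any real activation into (0, 1)."""
    return 1.0 / (1.0 + exp(-activation))
```

Note that math.exp() raises OverflowError for very large arguments, so this simple form assumes activations are not extremely negative (roughly below -709). That is a reasonable assumption for normalized inputs and small weights.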

Now that we have the pieces, let’s see how they are used.

2.3. Forward Propagation

Forward propagating an input is straightforward.

We work through each layer of our network calculating the outputs for each neuron. All of the outputs from one layer become inputs to the neurons on the next layer.

Below is a function named forward_propagate() that implements the forward propagation for a row of data from our dataset with our neural network.
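The sketch below shows one way this could be written. It repeats small versions of activate() and transfer() so that it runs on its own, and it assumes the list-of-layers, dictionary-per-neuron structure used throughout this tutorial.

```python
from math import exp

def activate(weights, inputs):
    """Weighted sum of the inputs plus the bias (the last weight)."""
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    """Propagate one row through every layer and return the final outputs."""
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)  # stored for use in backpropagation
            new_inputs.append(neuron['output'])
        inputs = new_inputs  # this layer's outputs feed the next layer
    return inputs
```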

You can see that a neuron’s output value is stored in the neuron with the name ‘output’. You can also see that we collect the outputs for a layer in an array named new_inputs that becomes the array inputs and is used as inputs for the following layer.

The function returns the outputs from the last layer, also called the output layer.

Let’s put all of these pieces together and test out the forward propagation of our network.

We define our network inline with one hidden neuron that expects 2 input values and an output layer with two neurons.
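A complete sketch of such a test follows. The weights are illustrative values chosen for this sketch (an assumption, not the output of any particular run), and the trailing None in the row stands in for the class label, which forward propagation ignores.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# One hidden neuron (2 input weights + bias) and two output neurons (1 weight + bias).
# Illustrative weights only.
network = [
    [{'weights': [0.1, 0.8, 0.7]}],
    [{'weights': [0.2, 0.5]}, {'weights': [0.4, 0.6]}],
]
row = [1, 0, None]
output = forward_propagate(network, row)
print(output)
```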

Running the example propagates the input pattern [1, 0] and produces an output value that is printed. Because the output layer has two neurons, we get a list of two numbers as output.

The actual output values are just nonsense for now, but next, we will start to learn how to make the weights in the neurons more useful.

3. Back Propagate Error

The backpropagation algorithm is named for the way in which weights are trained.

Error is calculated between the expected outputs and the outputs forward propagated from the network. These errors are then propagated backward through the network from the output layer to the hidden layer, assigning blame for the error and updating weights as they go.

The math for backpropagating error is rooted in calculus, but we will remain high level in this section and focus on what is calculated and how rather than why the calculations take this particular form.

This part is broken down into two sections.

  1. Transfer Derivative.
  2. Error Backpropagation.

3.1. Transfer Derivative

Given an output value from a neuron, we need to calculate its slope.

We are using the sigmoid transfer function, the derivative of which can be calculated as follows:

derivative = output * (1.0 - output)

Below is a function named transfer_derivative() that implements this equation.
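A minimal sketch, relying on the fact that the slope of the sigmoid can be written in terms of its own output:

```python
def transfer_derivative(output):
    """Slope of the sigmoid, expressed using the neuron's output value."""
    return output * (1.0 - output)
```

The slope is largest (0.25) when the output is 0.5 and shrinks toward zero as the output approaches 0 or 1.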

Now, let’s see how this can be used.

3.2. Error Backpropagation

The first step is to calculate the error for each output neuron. This gives us the error signal (input) to propagate backwards through the network.

The error for a given neuron can be calculated as follows:

error = (output - expected) * transfer_derivative(output)

Where expected is the expected output value for the neuron, output is the output value for the neuron and transfer_derivative() calculates the slope of the neuron’s output value, as shown above.

This error calculation is used for neurons in the output layer. The expected value is the class value itself. In the hidden layer, things are a little more complicated.

The error signal for a neuron in the hidden layer is calculated as the weighted error of each neuron in the output layer. Think of the error traveling back along the weights of the output layer to the neurons in the hidden layer.

The back-propagated error signal is accumulated and then used to determine the error for the neuron in the hidden layer, as follows:

error = sum(weight_k * error_k) * transfer_derivative(output)

Where error_k is the error signal from the kth neuron in the output layer, weight_k is the weight that connects the kth output neuron to the current neuron, the sum runs over all neurons in the output layer, and output is the output for the current neuron.

Below is a function named backward_propagate_error() that implements this procedure.
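The following is a sketch of how this procedure might be written. It assumes the sign convention in which the output error is taken as output minus expected (the opposite convention works equally well as long as the weight update uses the matching sign), and it assumes forward propagation has already stored an 'output' value in every neuron.

```python
def transfer_derivative(output):
    """Slope of the sigmoid, expressed using the neuron's output value."""
    return output * (1.0 - output)

def backward_propagate_error(network, expected):
    """Compute and store a 'delta' error signal in every neuron."""
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i != len(network) - 1:
            # Hidden layer: accumulate the deltas of the next layer,
            # weighted by the weights that connect to hidden neuron j.
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += neuron['weights'][j] * neuron['delta']
                errors.append(error)
        else:
            # Output layer: difference between the output and the expected value.
            for j, neuron in enumerate(layer):
                errors.append(neuron['output'] - expected[j])
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
```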

You can see that the error signal calculated for each neuron is stored with the name ‘delta’. You can see that the layers of the network are iterated in reverse order, starting at the output and working backwards. This ensures that the neurons in the output layer have ‘delta’ values calculated first that neurons in the hidden layer can use in the subsequent iteration. I chose the name ‘delta’ to reflect the change the error implies on the neuron (e.g. the weight delta).

You can see that the error signal for neurons in the hidden layer is accumulated from neurons in the output layer where the hidden neuron number j is also the index of the neuron’s weight in the output layer neuron[‘weights’][j].

Let’s put all of the pieces together and see how it works.

We define a fixed neural network with output values and backpropagate an expected output pattern. The complete example is listed below.
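A self-contained sketch of such an example is given here. The weights and output values in the fixed network are illustrative numbers picked for this sketch (an assumption, not values from a real forward pass), and the output error again uses the output minus expected convention.

```python
def transfer_derivative(output):
    return output * (1.0 - output)

def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i != len(network) - 1:
            # Hidden layer: weighted sum of the next layer's deltas.
            for j in range(len(layer)):
                errors.append(sum(neuron['weights'][j] * neuron['delta']
                                  for neuron in network[i + 1]))
        else:
            # Output layer: output minus expected.
            for j, neuron in enumerate(layer):
                errors.append(neuron['output'] - expected[j])
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Fixed network with illustrative weights and outputs.
network = [
    [{'output': 0.71, 'weights': [0.1, 0.8, 0.7]}],
    [{'output': 0.62, 'weights': [0.2, 0.5]},
     {'output': 0.66, 'weights': [0.4, 0.6]}],
]
expected = [0, 1]
backward_propagate_error(network, expected)
for layer in network:
    print(layer)
```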

Running the example prints the network after the backpropagation of error is complete. You can see that error values are calculated and stored in the neurons for the output layer and the hidden layer.

Now let’s use the backpropagation of error to train the network.

4. Train Network

The network is trained using stochastic gradient descent.

This involves multiple iterations of exposing a training dataset to the network and for each row of data forward propagating the inputs, backpropagating the error and updating the network weights.

This part is broken down into two sections:

  1. Update Weights.
  2. Train Network.

4.1. Update Weights

Once errors are calculated for each neuron in the network via the back propagation method above, they can be used to update weights.

Network weights are updated as follows:

weight = weight - learning_rate * error * input

Where weight is a given weight, learning_rate is a parameter that you must specify, error is the error calculated by the backpropagation procedure for the neuron and input is the input value that caused the error.

The same procedure can be used for updating the bias weight, except there is no input term, or input is the fixed value of 1.0.

Learning rate controls how much to change the weight to correct for the error. For example, a value of 0.1 will update the weight by 10% of the amount that it possibly could be updated. Small learning rates are preferred; they cause slower learning over a large number of training iterations, which increases the likelihood of the network finding a good set of weights across all layers rather than settling quickly on the first set of weights that reduces error (called premature convergence).

Below is a function named update_weights() that updates the weights for a network given an input row of data and a learning rate, assuming that a forward and backward propagation have already been performed.
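A sketch of this update step follows. It assumes each neuron already holds 'output' and 'delta' values, that the last element of the row is the class label, and that delta was computed as output minus expected, so the correction is subtracted from each weight.

```python
def update_weights(network, row, l_rate):
    """Apply one stochastic gradient descent step to every weight."""
    for i in range(len(network)):
        inputs = row[:-1]  # first layer: the row without its class label
        if i != 0:
            # Later layers: the outputs of the previous layer.
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            # Bias weight: its input is treated as the fixed value 1.0.
            neuron['weights'][-1] -= l_rate * neuron['delta']
```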

Remember that the input for the output layer is a collection of outputs from the hidden layer.

Now that we know how to update network weights, let’s see how we can do it repeatedly.

4.2. Train Network

As mentioned, the network is updated using stochastic gradient descent.

This involves first looping for a fixed number of epochs and within each epoch updating the network for each row in the training dataset.

Because updates are made for each training pattern, this type of learning is called online learning. If errors were instead accumulated across an epoch before updating the weights, it would be called batch learning or batch gradient descent.

Below is a function that implements the training of an already initialized neural network with a given training dataset, learning rate, fixed number of epochs and an expected number of output values.

The expected number of output values is used to transform class values in the training data into a one hot encoding. That is a binary vector with one column for each class value to match the output of the network. This is required to calculate the error for the output layer.

You can also see that the sum squared error between the expected output and the network output is accumulated each epoch and printed. This is helpful to create a trace of how much the network is learning and improving each epoch.
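The two bookkeeping steps described above, the one hot encoding of the class value and the sum squared error for a row, can be sketched in isolation. The helper names one_hot() and sum_squared_error() are hypothetical and used only for this illustration; in a training loop the same logic would typically sit inline.

```python
def one_hot(class_value, n_outputs):
    """Binary vector with a 1 in the position of the integer class value."""
    expected = [0 for _ in range(n_outputs)]
    expected[class_value] = 1
    return expected

def sum_squared_error(expected, outputs):
    """Sum of squared differences between expected and actual outputs."""
    return sum((expected[i] - outputs[i]) ** 2 for i in range(len(expected)))

expected = one_hot(1, 3)
print(expected)
print(sum_squared_error(expected, [0.2, 0.7, 0.1]))
```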

We now have all of the pieces to train the network. We can put together an example that includes everything we’ve seen so far including network initialization and train a network on a small dataset.

Below is a small contrived dataset that we can use to test out training our neural network.

Below is the complete example. We will use 2 neurons in the hidden layer. It is a binary classification problem (2 classes) so there will be two neurons in the output layer. The network will be trained for 20 epochs with a learning rate of 0.5, which is high because we are training for so few iterations.
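A compact end-to-end sketch under the assumptions used in this tutorial (sigmoid transfer, online updates, output error taken as output minus expected) is given below. The dataset values are illustrative stand-ins for a small two-class problem with two inputs, not a reproduction of any particular data.

```python
from math import exp
from random import seed, random

def initialize_network(n_inputs, n_hidden, n_outputs):
    hidden = [{'weights': [random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)]
    output = [{'weights': [random() for _ in range(n_hidden + 1)]} for _ in range(n_outputs)]
    return [hidden, output]

def activate(weights, inputs):
    activation = weights[-1]  # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    return output * (1.0 - output)

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
        inputs = [neuron['output'] for neuron in layer]
    return inputs

def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        if i != len(network) - 1:
            errors = [sum(n['weights'][j] * n['delta'] for n in network[i + 1])
                      for j in range(len(layer))]
        else:
            errors = [n['output'] - expected[j] for j, n in enumerate(layer)]
        for j, neuron in enumerate(layer):
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1] if i == 0 else [n['output'] for n in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']

def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for _ in range(n_outputs)]
            expected[row[-1]] = 1  # one hot encode the class value
            sum_error += sum((expected[i] - outputs[i]) ** 2 for i in range(n_outputs))
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

# Illustrative two-class dataset: [x1, x2, class].
dataset = [[2.8, 2.6, 0], [1.5, 2.4, 0], [3.4, 4.4, 0], [1.4, 1.9, 0], [3.1, 3.0, 0],
           [7.6, 2.8, 1], [5.3, 2.1, 1], [6.9, 1.8, 1], [8.7, -0.2, 1], [7.7, 3.5, 1]]
seed(1)
n_inputs = len(dataset[0]) - 1
n_outputs = len(set(row[-1] for row in dataset))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)
```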

Running the example first prints the sum squared error each training epoch. We can see a trend of this error decreasing with each epoch.

Once trained, the network is printed, showing the learned weights. Also still in the network are output and delta values that can be ignored. We could update our training function to delete these data if we wanted.

Once a network is trained, we need to use it to make predictions.

5. Predict

Making predictions with a trained neural network is easy enough.

We have already seen how to forward-propagate an input pattern to get an output. This is all we need to do to make a prediction. We can use the output values themselves directly as the probability of a pattern belonging to each output class.

It may be more useful to turn this output back into a crisp class prediction. We can do this by selecting the class value with the largest probability. This is also called the arg max function.

Below is a function named predict() that implements this procedure. It returns the index in the network output that has the largest probability. It assumes that class values have been converted to integers starting at 0.
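A sketch of predict() is given here together with the forward propagation helpers it depends on, so that it runs on its own. It assumes the same network structure and sigmoid transfer as the rest of the tutorial.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
        inputs = [neuron['output'] for neuron in layer]
    return inputs

def predict(network, row):
    """Return the index of the output neuron with the largest value (arg max)."""
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))
```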

We can put this together with our code above for forward propagating input and with our small contrived dataset to test making predictions with an already-trained network. The example hardcodes a network trained from the previous step.

The complete example is listed below.
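As a stand-in for a network saved from a training run, the sketch below hardcodes hand-picked weights that happen to separate an illustrative two-class dataset. Both the weights and the data are assumptions made for this sketch, not values produced by the training step.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
        inputs = [neuron['output'] for neuron in layer]
    return inputs

def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Illustrative two-class dataset: [x1, x2, class].
dataset = [[2.8, 2.6, 0], [1.5, 2.4, 0], [3.4, 4.4, 0], [1.4, 1.9, 0], [3.1, 3.0, 0],
           [7.6, 2.8, 1], [5.3, 2.1, 1], [6.9, 1.8, 1], [8.7, -0.2, 1], [7.7, 3.5, 1]]

# Hand-picked weights (not trained): the hidden neuron switches on when x1 > 4.5,
# and the two output neurons respond in opposite directions to it.
network = [
    [{'weights': [1.0, 0.0, -4.5]}],
    [{'weights': [-4.0, 2.0]}, {'weights': [4.0, -2.0]}],
]
for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Got=%d' % (row[-1], prediction))
```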

Running the example prints the expected output for each record in the training dataset, followed by the crisp prediction made by the network.

It shows that the network achieves 100% accuracy on this small dataset.

Now we are ready to apply our backpropagation algorithm to a real world dataset.

6. Wheat Seeds Dataset

This section applies the Backpropagation algorithm to the wheat seeds dataset.

The first step is to load the dataset and convert the loaded data to numbers that we can use in our neural network. For this we will use the helper function load_csv() to load the file, str_column_to_float() to convert string numbers to floats and str_column_to_int() to convert the class column to integer values.

Input values vary in scale and need to be normalized to the range 0 to 1. It is generally good practice to normalize input values to the range of the chosen transfer function, in this case, the sigmoid function that outputs values between 0 and 1. The dataset_minmax() and normalize_dataset() helper functions were used to normalize the input values.
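The two normalization helpers named above could be sketched as follows. This version assumes every row ends with the class label, which is left untouched, and that no input column is constant (a constant column would cause a division by zero).

```python
def dataset_minmax(dataset):
    """Return [min, max] for every column of the dataset."""
    return [[min(column), max(column)] for column in zip(*dataset)]

def normalize_dataset(dataset, minmax):
    """Rescale every input column to the range 0 to 1, in place."""
    for row in dataset:
        for i in range(len(row) - 1):  # skip the class label in the last column
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

# Tiny illustrative dataset: two inputs and a class label.
data = [[1.0, 10.0, 0], [3.0, 20.0, 1], [2.0, 30.0, 0]]
minmax = dataset_minmax(data)
normalize_dataset(data, minmax)
print(data)
```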

We will evaluate the algorithm using k-fold cross-validation with 5 folds. This means that 201/5=40.2 or 40 records will be in each fold. We will use the helper functions evaluate_algorithm() to evaluate the algorithm with cross-validation and accuracy_metric() to calculate the accuracy of predictions.
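To make the fold arithmetic concrete, here is a sketch of a random k-fold split and an accuracy calculation. The name cross_validation_split() is a hypothetical helper for this illustration; accuracy_metric() follows the description above. Integer division is used for the fold size, so with 201 records and 5 folds each fold gets 40 records and one record is left out.

```python
from random import randrange, seed

def cross_validation_split(dataset, n_folds):
    """Split a dataset into n_folds random folds of equal size."""
    dataset_copy = list(dataset)
    fold_size = len(dataset) // n_folds  # leftover rows are not used
    folds = []
    for _ in range(n_folds):
        fold = []
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))  # sample without replacement
        folds.append(fold)
    return folds

def accuracy_metric(actual, predicted):
    """Percentage of predictions that match the actual class values."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual)) * 100.0

seed(1)
folds = cross_validation_split(list(range(201)), 5)
print([len(fold) for fold in folds])
print(accuracy_metric([0, 1, 1, 0], [0, 1, 0, 0]))
```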

A new function named back_propagation() was developed to manage the application of the Backpropagation algorithm, first initializing a network, training it on the training dataset and then using the trained network to make predictions on a test dataset.

The complete example is listed below.

A network with 5 neurons in the hidden layer and 3 neurons in the output layer was constructed. The network was trained for 500 epochs with a learning rate of 0.3. These parameters were found with a little trial and error, but you may be able to do much better.

Running the example prints the classification accuracy on each fold as well as the average performance across all folds.

You can see that backpropagation and the chosen configuration achieved a mean classification accuracy of about 93%, which is dramatically better than the Zero Rule algorithm that did slightly better than 28% accuracy.


This section lists extensions to the tutorial that you may wish to explore.

  • Tune Algorithm Parameters. Try larger or smaller networks trained for longer or shorter. See if you can get better performance on the seeds dataset.
  • Additional Methods. Experiment with different weight initialization techniques (such as small random numbers) and different transfer functions (such as tanh).
  • More Layers. Add support for more hidden layers, trained in just the same way as the one hidden layer used in this tutorial.
  • Regression. Change the network so that there is only one neuron in the output layer and that a real value is predicted. Pick a regression dataset to practice on. A linear transfer function could be used for neurons in the output layer, or the output values of the chosen dataset could be scaled to values between 0 and 1.
  • Batch Gradient Descent. Change the training procedure from online to batch gradient descent and update the weights only at the end of each epoch.

Did you try any of these extensions?
Share your experiences in the comments below.


In this tutorial, you discovered how to implement the Backpropagation algorithm from scratch.

Specifically, you learned:

  • How to forward propagate an input to calculate a network output.
  • How to back propagate error and update network weights.
  • How to apply the backpropagation algorithm to a real world dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

      You must use careful experimentation to get answers to each of these questions.

    Himanshu November 15, 2018 at 6:12 pm #

    Hi Jason
    I have few more questions on this.

    First: why you assigned “activation = weights[-1]” , why not other weights or any random value.
    Second: why you are looping only for two values
    for i in range(len(weights) – 1) though we have three values.
    Third: Why you have considered only two outputs here though we have four
    output for layer one =.9643898158763548
    output for layer two =.9185258960543243
    output for layer three = .8094918973879515
    output for layer two = .7734292563511262

    why you considered only last two values why not first two or any other combination.

    Four: here i guess some problem by mistake you have update wrong value
    expected[row[-1]] = 1
    after this line you have updated expected as [1,0] from [0,0]
    and why we in this i have other question why we are updating this value.

      Jason Brownlee November 16, 2018 at 6:12 am #

      Too many questions for one comment (I’m a simple human), one at a time and can you please reference specific examples/lines of code otherwise I can’t help.

    Karim November 18, 2018 at 8:14 am #

    Hi Jason — How can i add more hidden layers? Thankx

    Himanshu November 22, 2018 at 5:24 pm #

    Hi Jason,

    How to apply mathematical implementation of gradient descent and logistic regression,classification in real time data.
    For example if i want use this in survivors of Titanic data how to start with.

    John Sald December 6, 2018 at 2:05 pm #

    Hello, could you show me an example of using one of the extensions you mentioned, which can give us a gain in performance?

    Such as using matrix operations (in the weights) and vectors (inputs, intermediate signals and outputs)

    Pipo December 17, 2018 at 7:25 am #

    How could i change the loss to mse in this code? I can’t wrap my head around it. Thanks

      Jason Brownlee December 17, 2018 at 2:13 pm #

      Calculate error between actual and predicted using a MSE function. That’s it.

        Brett August 24, 2019 at 4:07 am #

        I’m confused about why you chose MSE for a classification problem. I was trying to use this tutorial to discern the differences between a classification and function approximation implementation, and the use of MSE for classification really threw me off. I know that it technically works, but it’s probably good to mention that it’s not ideal. It would have been nice to get exposure to taking the derivative of a different loss function, so that someone who is new to back-propagation will start to grasp how different functions change the derivative, etc. Otherwise, the code is understandable and could be modified slightly to make a good tutorial for function approximation.

          Brett August 24, 2019 at 4:19 am #

          Just to clarify what I’m saying, and to answer Pipo’s question, this implementation is already using MSE. The derivative of MSE with respect to the output is: (output – expected). The fact that you’re multiplying this by the transfer derivative just means that you’re passing the MSE back through the activation of the output node. So if you want the code to work for function approximation, you simply don’t multiply by the transfer derivative. However, if you want classification to work better, you could use the derivative of a different loss function with respect to the output and predicted, and multiply that by your transfer derivative.

            Jason Brownlee August 24, 2019 at 7:59 am #


            Marcel June 7, 2021 at 6:51 pm #

            Hello Brett,

            could you please point out where exactly the MSE loss is calculated? And where you would put the Cross entropy loss? Could you please demonstrate this with a short code example based on the tutorial ? I would be very pleased. Thank you in advance.

          Jason Brownlee August 24, 2019 at 7:58 am #

          Yes, it was what we did in the 90s. Cross entropy would be preferred today, I agree.

          I really need to do a series on coding neural nets from scratch to really dig into this. Thanks for the kick!

            Brett August 24, 2019 at 9:26 am #

            Thanks for the reply. I just found this post on finding the derivative of cross entropy, and it turns out that you can do a really nice simplification of the math to basically get (output – expected) or (expected – output) for your implementation, when combining the cross entropy derivative and sigmoid derivative. So I’m pretty sure that if you simply stop multiplying by the transfer derivative to get your output error, you should see a big increase in performance of the algorithm. Worth a try at least. Here is the link, with the conclusion I mentioned at the very end:


            Jason Brownlee August 25, 2019 at 6:30 am #

            Thanks for sharing.

    muhammad December 21, 2018 at 8:48 am #

    hi, thanks for this code.
    I’m trying to understand why are u adding on the update weights, shouldnt be
    wi←wi−η∂E/∂wi like this?

    Sangeeth January 20, 2019 at 4:40 am #


    This website provides a good introduction for almost all topics in machine learning. Thanks for your work.

    In backpropagation, the error at each neuron is the product of
    1. Change in error w.r.t y_out
    2. Change in y_out w.r.t y
    3. Change in y w.r.t weight.

    Could you please tell how you just multiplied 1 and 2 in backward_propagate_error (from the last layer) and then used 3 in update_weights (from the first layer). Should we not do all steps in backward_propagate_error and then use it to update_weights?.

      Jason Brownlee January 20, 2019 at 5:41 am #

      I show exactly how in the above tutorial.

        sangeeth January 20, 2019 at 5:44 am #

        Sorry, I just realized what I said is same as what you did.

        About the error, should we not use 2*error (derivative of MSE)?

          Jason Brownlee January 21, 2019 at 5:26 am #

          No, we calculate the derivative of the error against the non linear activation function, not the derivative of the loss function itself.

            sangeeth January 21, 2019 at 12:11 pm #

            Ok. I got it. Thanks,

            I think this is online learning using SGD. Would you have an implementation for offline learning using mini Batch Gradient descent?

            Jason Brownlee January 22, 2019 at 6:16 am #

            Correct, you can modify the above example to use batch or mini-batch gradient descent.

            sangeeth January 22, 2019 at 1:14 am #

            Is the sum_error variable same as loss in output?..I get different loss values when testing the same datasets on and your model. Could you tell me why this is?

          Brett August 24, 2019 at 4:25 am #

          The loss function used in this tutorial is: (1/2)(out – expect)^2. The derivative of which with respect to the output is: (out – expect) * 1, or simply (out – expected). This is then multiplied by the transfer derivative, because the error is being passed backward via the chain rule. You always have to take the derivative with respect to the loss function itself first. I hope this clears up any confusion.

    Gary January 22, 2019 at 8:54 am #

    Hi Jason.

    In the “full” seeds example you call user defined function evaluate_algorithm(). However, the “heavy lifting” inside it is performed by the function algorithm(). That function looks like it’s a part of some standard Python library, but I can’t find it in any reference. Also you don’t comment at all at its use.

    What’s the deal here?

    Thank you,


      Jason Brownlee January 22, 2019 at 11:43 am #

      The “algorithm” is a reference to a function that is passed in as an argument.

    Gary January 22, 2019 at 1:24 pm #

    Yes, thank you, I already realized that.



    sangeeth January 28, 2019 at 8:39 am #


    For online machine learning, should we perform epochs?. Should not we update the model based only on the present time input and then predict the next time step. If we do epochs that means the model is getting updated for the whole data set up to the present time. Am I correct?. Thanks

      Jason Brownlee January 28, 2019 at 11:44 am #

      It depends on the problem and the data. Yes, it often makes sense to update the model with new data and with a little of the old data.

      Note, online gradient descent does not have to be used for online learning.

    kmillen February 12, 2019 at 10:49 am #

    Good afternoon Jason. I have thoroughly enjoyed this solution both in Python and my conversion to C#. I guess for all the learning I’ve gleaned, one thing still seems to be a mystery to me. What exactly are the five scores telling me? Do they annotate how well the data fits a curve for each fold?

      Jason Brownlee February 12, 2019 at 1:59 pm #


      The mean of the scores is our estimate of the model’s performance when making predictions on unseen data.

        kmillen February 26, 2019 at 8:10 am #

        Thank you.

    MathewP February 20, 2019 at 3:25 am #

    I think there is a mistake update_weights function.
    inputs = row[:-1]
    If we have, say 2 inputs and 1 neuron in hidden layer then only one weight is going to be updated, which is clearly wrong. Correct me if I am wrong. The code works fine just taking row as inputs.

      Jason Brownlee February 20, 2019 at 8:11 am #

      I don’t follow the possible issue, can you please elaborate?

        Romel Rudon October 20, 2019 at 11:25 pm #

        The issue is that the ‘row’ list should represent the outputs from the preceding layer (counting in the direction from input layer to output layer). having row[:-1] seems to exclude the very last output from the preceding layer, which doesn’t seem to be warranted in this case.

          Romel Rudon October 21, 2019 at 2:05 am #

          I see now why a few people (including myself) were thrown off by this line. The last element of the row list ( i.e. row[-1]) is not an actual part of the input data, but the label or the ‘correct answer’ of the input data, which is why it’s left out.


    Venkat February 22, 2019 at 4:53 am #

    Hi Jason Brownlee, the backpropagation implementation is really excellent because it uses no predefined library, just functions, lists, sets, and dictionaries. I need a suggestion on how to write code that uses different activation functions per layer, such as a sigmoid at the hidden-layer neurons and a tanh at the output neurons. Could you help me?

    Venkat February 22, 2019 at 4:49 pm #

    Hi Jason Brownlee, yes, I am not asking how to write code to implement tanh or sigmoid. My request is how to modify the code in the forward_propagate function so that, say, x is the activation function at the hidden layer and y is another activation function at the output layer.

    In the above code you are calling the transfer function for both the hidden neurons and the output neurons. I would like a suggestion on how to call different activation functions for hidden and output neurons.

      Jason Brownlee February 23, 2019 at 6:26 am #

      Change the code in the activation function itself.

      Does that help?
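One possible way to do what Venkat asks, sketched under the assumption that each layer is paired with its own transfer function. The `transfers` argument and the `tanh_transfer` helper are illustrative names and are not part of the tutorial code:

```python
from math import exp, tanh

def activate(weights, inputs):
    # Weighted sum of the inputs plus the bias (the bias is the last weight).
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def sigmoid_transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def tanh_transfer(activation):
    return tanh(activation)

def forward_propagate(network, row, transfers):
    # transfers[i] is the activation applied to layer i (hypothetical argument).
    inputs = row
    for layer, transfer in zip(network, transfers):
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

network = [
    [{'weights': [0.5, -0.5, 0.0]}],  # hidden layer: 2 inputs + bias
    [{'weights': [1.0, 0.0]}],        # output layer: 1 input + bias
]
outputs = forward_propagate(network, [1.0, 1.0, None],
                            [sigmoid_transfer, tanh_transfer])
```

If the forward pass is changed this way, the backward pass needs the matching derivative per layer as well: output * (1.0 - output) for the sigmoid and 1.0 - output**2 for tanh.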

    Danh Nguyen February 24, 2019 at 12:09 pm #

    Example is great! The totally clean CSV wheat seed dataset is here:

    I tried Jason’s link
    and the UCI Repo link and the CSVs still had double commas and so we got the str_column_to_float error

    Anyway, posting this here so others won’t run into the same problem I did! Thanks

    vartika sharma February 27, 2019 at 3:49 am #

    Hey Jason,
    While I am running the following code, I am getting this error:
    >>> scores=evaluate_algorithm(dataset,back_propagation,n_folds,l_rate,n_epoch,n_hidden)
    Traceback (most recent call last):
    File “”, line 1, in
    File “”, line 13, in evaluate_algorithm
    File “”, line 5, in back_propagation
    File “”, line 6, in train_network
    TypeError: list indices must be integers, not str

    Andy March 5, 2019 at 12:51 pm #

    Hello Jason,

    I would like to ask, can you make the data split between training data and test data, instead of using k folds variation, I would like to get some insight in this, thanks

    Andy March 5, 2019 at 6:33 pm #

    Hello Jason, it’s me again.

    I would like to ask another question, how do you predict using this trained network ?
    Lets say I have 100 data, and I split the training and test by 70:30 ratio. I’ve trained the network using 70 data, how do I predict the rest (30 data) ?
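A minimal sketch of a single 70:30 split in place of k-fold cross-validation. `train_test_split` and `accuracy` are helper names chosen here for illustration; the commented-out calls assume the tutorial's own `initialize_network`, `train_network` and `predict` functions:

```python
from random import Random

def train_test_split(dataset, split=0.7, seed=1):
    # Shuffle a copy so the original row order is left untouched.
    rows = list(dataset)
    Random(seed).shuffle(rows)
    n_train = int(round(len(rows) * split))
    return rows[:n_train], rows[n_train:]

def accuracy(actual, predicted):
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual)) * 100.0

# Toy data: one feature and a class label in the last column.
dataset = [[float(i), i % 2] for i in range(100)]
train, test = train_test_split(dataset, split=0.7)

# With the tutorial's functions one would then (not run here):
#   network = initialize_network(n_inputs, n_hidden, n_outputs)
#   train_network(network, train, l_rate, n_epoch, n_outputs)
#   predictions = [predict(network, row) for row in test]
#   print(accuracy([row[-1] for row in test], predictions))
```

The network only ever sees the 70 training rows; the 30 held-out rows are used solely for the predictions that are scored.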

    Dini M March 11, 2019 at 12:21 am #

    # Convert string column to integer
    def str_column_to_int(dataset, column):
        class_values = [row[column] for row in dataset]
        unique = set(class_values)
        lookup = dict()
        for i, value in enumerate(unique):
            lookup[value] = i
        for row in dataset:
            row[column] = lookup[row[column]]
        return lookup

    I got the error “class_values = [row[column] for row in dataset]”
    IndexError: list index out of range

    Dini M March 11, 2019 at 12:28 am #

    I am trying your code example with seeds_dataset.csv.

    giuseppe March 15, 2019 at 7:08 am #

    Hi, thanks for the code, it is amazing.
    I’ve included it in a class for a project and modified it so I can decide how many neurons to put in each layer, but I have a question: during the training process, using for example 4 neurons in the first layer and 3 in the second one, I get nice results, around 85% to 92%. At this point I save all the weights of the neurons and call another function that just loads the weights that I’ve saved (skipping the training process in this way), and using the whole dataset (the same one I used to train the network) as the test set, it gives me a really bad score, most of the time around 30%. I’m using the “IrisDataTrain” and what I’ve noticed is that the network fails to recognise one of the 3 classes. Do you have any suggestion about what it could be? Thanks 🙂

      Jason Brownlee March 15, 2019 at 2:25 pm #

      Perhaps the weights are not being loaded correctly?

        giuseppe March 28, 2019 at 8:11 pm #

        Hi, sorry for the late reply. Actually, the problem of saving and loading the network is not that important, so maybe I’ll try to solve it later. At the moment the problem is that I should reach the 99. % on the “IrisDataTrain” set. What I’ve noticed is that the accuracy can change a lot when repeating the same training process with the same configuration. In order to get a better result, I’ve repeated the same training process several times with the same configuration and chosen the configuration that gives me the best result in mean and variance. Now, in order to improve the accuracy, I’ve modified the code so that I can easily connect the output of one network to the input of another, so that I can create a cascade of neural networks connected in different ways. At this point I’m stuck at 96% on average. To improve the accuracy I’ve implemented the ReLU activation function (but I’m not sure it’s correctly implemented) and the Adam optimizer (but it doesn’t work at all).
        I’ll link the code on Pastebin (I don’t know if there is a better way to do that). In particular, what I’ve done is insert everything in a class and modify:
        1. the initialization function, so that I can choose the number of neurons in each layer
        2. the back_propagation_error function, trying to add ReLU and the Adam optimizer
        3. the update weights function, trying to implement the Adam optimizer (it doesn’t work at all)
        In the code I’m going to share I’ve removed many parts for readability; after cleaning it I will send it to you if you want. Sorry for the long message and thanks for your help 🙂

          Jason Brownlee March 29, 2019 at 8:31 am #

          Nice work!

          It might be time to graduate to Keras where everything is implemented for you and you can just use it directly and focus on tuning the model.

    MLnovice March 21, 2019 at 10:53 pm #

    Hello sir,
    I am playing with your code and I am trying to figure out this error:

    line 185, in
    str_column_to_float(dataset, i)
    line 21, in str_column_to_float
    row[column] = float(row[column].strip())
    ValueError: could not convert string to float:

    Do you have any insight into why this is happening?
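A ValueError with nothing after "could not convert string to float:" usually means float() was handed an empty string, which is what doubled delimiters in the CSV produce (as Danh Nguyen reports above). A hedged sketch of a loader that drops such empty fields follows. `load_clean_rows` is an illustrative name, and the approach assumes the empty fields are stray delimiters rather than genuinely missing values:

```python
from csv import reader
from io import StringIO

def load_clean_rows(handle):
    # Drop empty fields (e.g. from doubled delimiters) and skip blank rows.
    dataset = []
    for row in reader(handle):
        fields = [field.strip() for field in row if field.strip() != '']
        if fields:
            dataset.append(fields)
    return dataset

# In-memory stand-in for a file with doubled commas and a blank line.
raw = StringIO("15.26,14.84,,0.871,1\n\n14.88,14.57,0.8811,,2\n")
rows = load_clean_rows(raw)
values = [[float(v) for v in row] for row in rows]
```

For a real file, pass an open file handle instead of the StringIO object. If the cleaned rows end up with different lengths, the empty fields were probably real missing values and need a different treatment.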

    Matty March 25, 2019 at 9:55 am #

    Thank you for the post Jason.

    Reading this post, it seems to me that I can split the process of back propagation in large networks into multiple steps. Am I right?

    I have a large network that my current GPU runs out of memory when I try to train it. I was wondering if I can split my network into two sub-networks, and first calculate the updates for the deeper part(that has the ground truth outputs) and obtain the error that should be passed to the other sub-network. Then, use the provided error to calculate the updates for the second sub-network as well. Do you think it’s possible? Do you have any suggestion (or source that can be helpful) for implementing this back propagation with existing tensorflow or pytorch builtin functions?


      Jason Brownlee March 25, 2019 at 2:17 pm #

      Yes, by node or by layer.

      It might be possible, but also a massive pain.

      It might be cheaper (in time/money) to rent an AWS EC2 instance with more GPU RAM for a few hours?

        Matty March 26, 2019 at 1:03 pm #

        Thanks, Jason. I think I found an easy way to split the back-propagation in tensorflow.

        We can define two separate optimizations with different trainable variable lists. Something similar to:

        self.optim_last_layers = tf.train.AdamOptimizer(lr, beta1=beta1) \
        .minimize(loss, var_list=vars_of_last_layers)

        self.optim_first_layers = tf.train.AdamOptimizer(lr, beta1=beta1) \
        .minimize(loss, var_list=vars_of_first_layers)

        And in each iteration, we can call the optimizations separately.

        I did a small sanity check with a two-layer network, and it seems both the two-step optimization and the one-step optimization with all the trainable parameters results in the same updates.

    Novia Puspitasari March 28, 2019 at 12:17 am #

    Thank you so much, Jason, for your post.
    I have a problem with “‘float’ object has no attribute ‘append’”.

    Traceback (most recent call last):
    File “”, line 200, in
    scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
    File “”, line 80, in evaluate_algorithm
    predicted = algorithm(train_set, test_set, *args)
    File “”, line 172, in back_propagation
    train_network(network, train, l_rate, n_epoch, n_outputs)
    File “”, line 150, in train_network
    backward_propagate_error(network, expected)
    File “”, line 123, in backward_propagate_error
    AttributeError: ‘float’ object has no attribute ‘append’

    Kevin March 29, 2019 at 1:21 am #

    wancong zhang March 30, 2019 at 12:45 pm #

    Hi Jason, very cool tutorial.

    I notice that your neural network only has 3 layers.

    If I change your “initialize network” method to initialize multiple hidden layers with arbitrary width, will your program still work? In other words does your algorithm generalize to deeper networks?


      Jason Brownlee March 31, 2019 at 9:26 am #

      No idea – it is for educational purposes only, try it and see.

    manoj April 19, 2019 at 7:38 am #

    Hi Jason!

    It’s a really helpful post, thank you very much.

    I wanted to see plots of the training error and testing error (how they converge epoch by epoch). What would be the easiest way to plot those training and testing graphs?
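One simple approach is to append the summed error to a list at the end of every epoch and plot that list afterwards. The sketch below uses a single sigmoid neuron rather than the tutorial's full network so that it stays self-contained. `train_and_record` and `plot_history` are illustrative names, and the plotting helper assumes matplotlib is installed:

```python
from math import exp

def train_and_record(dataset, l_rate=0.5, n_epoch=50):
    # Single sigmoid neuron trained online; returns the sum squared error per epoch.
    weights = [0.0 for _ in range(len(dataset[0]))]  # one per input, bias last
    history = []
    for _ in range(n_epoch):
        sum_error = 0.0
        for row in dataset:
            activation = weights[-1] + sum(w * x for w, x in zip(weights[:-1], row[:-1]))
            output = 1.0 / (1.0 + exp(-activation))
            error = row[-1] - output
            sum_error += error ** 2
            delta = error * output * (1.0 - output)
            for j in range(len(row) - 1):
                weights[j] += l_rate * delta * row[j]
            weights[-1] += l_rate * delta
        history.append(sum_error)
    return history

def plot_history(history):
    # Requires matplotlib; imported lazily so the rest runs without it.
    import matplotlib.pyplot as plt
    plt.plot(range(1, len(history) + 1), history)
    plt.xlabel('epoch')
    plt.ylabel('sum squared error')
    plt.show()

dataset = [[0.0, 0], [1.0, 0], [3.0, 1], [4.0, 1]]
history = train_and_record(dataset)
```

The same idea carries over to the tutorial's train_network: collect sum_error per epoch for the training rows, compute the equivalent error on held-out rows after each epoch, and plot the two lists on the same axes.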

    Danial April 20, 2019 at 5:46 pm #

    Hi jason.
    My question is how I can see my CNN code is using BP framework?

      Jason Brownlee April 21, 2019 at 8:20 am #

      You can save the model weights to a file.

      Is that what you mean?

        Danial April 21, 2019 at 1:27 pm #

        Yes. How I can see model weights? How cnn use BP framework if it is not shown in code?
        for i in range(len(test)):
        # Forecast the data
        test_X, test_y = test[i, 0:-1], test[i, -1]
        X_ = test_X.reshape(1, 28, 28, 1)
        predict = model.predict(X_, batch_size=1)
        # Replacing value in test scaled with the predicted value.
        test_prediction = [predict] + test_prediction
        if len(test_prediction) > sequence_length+1:
        test_prediction = test_prediction[:-1]
        if i+1 sequence_length+1:
        test[i+1] = test_prediction
        test[i+1] = np.concatenate((test_prediction, test[i+1, i+1:]), axis=0)

        # Inverse transform
        predict = inverse_transform(scaler, test_X, predict)
        # Inverse the features
        predict = inverse_features(data_set, predict, len(test)+1-i) – maxVal
        if predict < 0:
        predict = 0
        # Round the value
        predict = np.round(predict, 2)
        # store forecast
        expected = data_set[len(train) + i + 1]
        predict_data.append(predict )
        real_data.append(expected )
        if expected != 0:

          Jason Brownlee April 22, 2019 at 6:15 am #

          You can get the model weights from a Keras model by calling the get_weights() function on a given layer.

    Hello Jason, you’re the best teacher.

    Nice Post !

    Here is another very nice tutorial with step by step Mathematical explanation and full coding.

    Zahra May 6, 2019 at 4:28 pm #

    Hello, I’m so confused.
    I tried to run this code in the command prompt, but with my own dataset (not the Wheat Seeds dataset).

    Why did this happen? What’s wrong? What should I change?
    Please help me!

    Traceback (most recent call last):
    File “”, line 197, in
    scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
    File “”, line 81, in evaluate_algorithm
    predicted = algorithm(train_set, test_set, *args)
    File “”, line 173, in back_propagation
    train_network(network, train, l_rate, n_epoch, n_outputs)
    File “”, line 150, in train_network
    expected[row[-1]] = 1
    IndexError: list assignment index out of range
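A likely cause, offered as an assumption because the dataset is not shown: the line expected[row[-1]] = 1 uses the class value as a list index, so the class values must be the integers 0 to n_outputs - 1. Labels such as 1, 2, 3 with three classes would index past the end of the list. A small sketch of remapping the labels first (`encode_labels` is an illustrative name):

```python
def encode_labels(dataset):
    # Map whatever class values appear in the last column to 0..n_classes-1.
    classes = sorted(set(row[-1] for row in dataset))
    lookup = {value: i for i, value in enumerate(classes)}
    for row in dataset:
        row[-1] = lookup[row[-1]]
    return lookup

dataset = [[2.7, 2.5, 1], [1.4, 2.3, 1], [7.6, 2.7, 3], [8.6, -0.2, 2]]
lookup = encode_labels(dataset)
n_outputs = len(lookup)

# One-hot encoding as done inside train_network now stays in range.
row = dataset[2]
expected = [0 for _ in range(n_outputs)]
expected[row[-1]] = 1
```

The tutorial's str_column_to_int function does the same remapping when it is applied to the class column.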

    Zahra May 9, 2019 at 1:11 am #

    Hello, how do I import my dataset using that code?

    For example, how do I use my own dataset (an Excel file) in this code? Can you explain in more detail, please?

    # Test training backprop algorithm
    dataset = [[2.7810836,2.550537003,0],
    n_inputs = len(dataset[0]) - 1
    n_outputs = len(set([row[-1] for row in dataset]))
    network = initialize_network(n_inputs, 2, n_outputs)
    train_network(network, dataset, 0.5, 20, n_outputs)

    Nirmala May 10, 2019 at 5:37 am #

    I got an error called->

    IndexError: list assignment index out of range.

    but I am using Python 3.

    Ido Berenbaum May 11, 2019 at 9:39 pm #

    Hi Jason,
    thanks for the great tutorial, I learned a lot from it.
    There is one thing I didn’t really understand though,
    when you update the weights you add to the weight the calculated change that is needed.
    but from what I read on other sites like Wikipedia, the change to the weight needs to be multiplied by -1 and then added to ensure it
    changes the weight in the opposite direction of the gradient and so getting it closer to the local minimum.
    like muhammad said in December 21, 2018:
    “hi, thanks for this code.
    I’m trying to understand why are u adding on the update weights, shouldnt be
    wi←wi−η∂E/∂wi like this?”

    and I tried to change line 141 to: neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
    basically doing -= and not += but it just made the sum error of the network to increase after each epoch.

    so, I will be grateful if you could explain to me why are you adding and not subtracting.


      Jason Brownlee May 12, 2019 at 6:43 am #

      There are many ways to implement the algorithm description.

      This implementation is based on the description in “neural smithing”:

      cocoa July 16, 2019 at 8:37 am #

      Jason seems to use mean squared error as the loss. The partial derivative of the loss with respect to the output is proportional to (output - expected), but in his “backward” function he uses (expected - output). That’s why he ends up with “+=” and not “-=”.
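The point can be checked numerically. For a sigmoid output neuron with the loss E = 0.5 * (expected - output)**2, the gradient with respect to a weight is (output - expected) * output * (1 - output) * input. Subtracting that gradient is the same as adding the tutorial's delta, which already carries the flipped sign. The two function names below are illustrative:

```python
def update_add(weight, l_rate, expected, output, inp):
    # Tutorial convention: error = expected - output, then add.
    delta = (expected - output) * output * (1.0 - output)
    return weight + l_rate * delta * inp

def update_subtract(weight, l_rate, expected, output, inp):
    # Textbook convention: gradient uses output - expected, then subtract.
    gradient = (output - expected) * output * (1.0 - output) * inp
    return weight - l_rate * gradient

w_add = update_add(0.4, 0.5, 1.0, 0.7, 2.0)
w_sub = update_subtract(0.4, 0.5, 1.0, 0.7, 2.0)
```

Both calls give the same new weight. Changing only += to -= while keeping (expected - output) moves the weight up the gradient, which matches the growing error Ido observed.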

    Nirmala May 16, 2019 at 4:23 pm #

    In the training code and testing code I want to load another dataset from a .txt file, but it does not work. Please, can you send the code for that?

      Jason Brownlee May 17, 2019 at 5:49 am #

      Sorry, I don’t have the capacity to develop custom code for you.

    Arthur May 22, 2019 at 4:03 pm #

    Hi Jason, first of all thank you very much for this post, I’m learning ML at the moment, and writing a neural network with backpropagation in C# to help the process.

    When using the wheat seeds dataset, and the same network layout as you suggest, I get very similar results to yours in terms of accuracy.

    I’m trying to understand why it happens with this particular data, or whether my implementation fails somehow. Note that I do get good results most of the time, but with a certain weight initialization the exploding gradients can happen.

    Zahra Nabila May 27, 2019 at 4:00 pm #

    Hello, I have a problem. Why must the output be an integer and not a float (decimal), especially in training? How do I change the output data type to float?

    TypeError Traceback (most recent call last)

    in train_network(network, train, l_rate, n_epoch, n_outputs)
    93 outputs = forward_propagate(network, row)
    94 expected = [0 for i in range(n_outputs)]
    —> 95 expected[row[-1]] = 1
    96 sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
    97 backward_propagate_error(network, expected)

    TypeError: list indices must be integers or slices, not numpy.float64
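The traceback points at expected[row[-1]] = 1, where the class label is used as a list index. If the labels were loaded as floats (for example through NumPy), casting the label to int before indexing avoids the TypeError. A minimal sketch, with `one_hot` as an illustrative helper name:

```python
def one_hot(label, n_outputs):
    # int() accepts whole-number floats such as 2.0 (numpy.float64 included)
    # and returns a plain integer that can be used as a list index.
    expected = [0 for _ in range(n_outputs)]
    expected[int(label)] = 1
    return expected

row = [5.1, 3.5, 2.0]  # class label stored as a float in the last column
expected = one_hot(row[-1], 3)
```

This only applies to classification, where the last column holds class indices. Targets that are genuinely real-valued call for a regression setup instead of one-hot encoding.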

    Jeny June 3, 2019 at 10:06 am #

    Calculate the output of a recurrent neural network with tanh activation
    and a linear layer on top
    x: input matrix [n_timesteps * n_samples * 2]
    w: non-recurrent weights
    r: recurrent weights
    b: biases
    wo: output-layer weights
    bo: output-layer biases
    h: matrix of activations (n_timesteps, n_samples, n_hiddens)
    o: final predictions

    def forward_path(x, w, r, b, wo, bo):
        h = np.empty([t_max, n, w.shape[0]], dtype=np.float32) # storage for the hidden activations
        for t in range(t_max):
            z = np.dot(x[t], w.T) + b
            if t > 0:
                z += np.dot(h[t-1], r.T)
            h[t] = np.tanh(z)
        o = np.dot(h[-1], wo.T) + bo
        return h, o

    def backward_path(x, h, w, b, r, wo, bo, o, y):
        n, t_max, _ = x.shape
        dw = np.zeros_like(w)
        db = np.zeros_like(b)
        dr = np.zeros_like(r)
        dwo = 0
        dbo = 0

        return dw, dr, db, dwo, dbo

    def loss(w, r, b, wo, bo, x, y):
        _, o = forward_path(x, w, r, b, wo, bo)
        err = 0.5*np.sum(np.square(o-y))
        return err

      Jason Brownlee June 3, 2019 at 2:34 pm #

      Sorry, I don’t have the capacity to debug your code, perhaps try posting to stackoverflow?

    Zahra Nabila Izdihar June 10, 2019 at 11:53 pm #

    How to display “predicted” value in your code?

    Because I need to display the predicted or forecast value..

      Jason Brownlee June 11, 2019 at 7:54 am #

      The forward_propagate() function makes a prediction.

        Zahra Nabila Izdihar June 13, 2019 at 2:36 am #

        I got it. So, “output” (in ForwardPropagation code)= prediction result?

        But, I don’t understand How to determine the weights in forward propagation? What is the formula?

        Thank you

          Jason Brownlee June 13, 2019 at 6:21 am #

          What do you mean exactly?

          The weights are learned during training.

            Zahra June 16, 2019 at 4:51 am #

            Did you mean that “output” (in forward propagation) is the predicted result?

            # test forward propagation
            network = [[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
            [{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381, 0.651592972722763]}]]
            row = [1, 0, None]
            output = forward_propagate(network, row)

            Jason Brownlee June 16, 2019 at 7:16 am #


            Perhaps this is too advanced. I recommend starting with Keras instead:

    Leo July 2, 2019 at 11:25 pm #

    It is crazy that nobody complains about the readability of your code. Thanks anyway

      Jason Brownlee July 3, 2019 at 8:34 am #

      Sorry that you think that the code is not readable. I thought it was very readable.

    Femi July 14, 2019 at 12:17 am #

    Ravi July 19, 2019 at 9:52 pm #

    Hi Dr. Jason

    I have developed and trained a neural network (3 layers: 1 input, 1 hidden and 1 output) for following situation

    (The code was written step by step, as i do not want to use a tool without understanding the computations)

    Data set (40 input patterns):

    Input: 40 samples 5 elements
    Output: 40 samples 1 element
    number of neurons (Input = 5; hidden = 5; output = 1)

    Using the delta rule with backpropagation algorithm, i was able to achieve error = 9.39E-06 for 1000 iterations

    My final “input to hidden layer” weight matrix size is 200 x 5 (as i have 40 samples x 5 input neurons and 5 hidden neurons)

    “hidden to output layer” weight matrix size is 200 x 1 (as i have 40 samples x 5 hidden neurons and 1 output neuron)

    Now my question is for a given test sample having 5 elements (input is 1 sample 5 elements),

    i need to run feed-forward computation to get a single element output.

    For running this which weights i need to select in “input to hidden layer” and “hidden to output layer” from the trained set??

    I have 200 x 5 and 200 x 1 weight matrices; but i require only 5 x 5 and 5 x 1 weight matrices for testing.

    Kindly let me know if i am missing something here?

    Chrissie Li July 25, 2019 at 9:06 pm #

    Femi July 25, 2019 at 9:39 pm #

    Femi July 31, 2019 at 12:39 am #

    Sir, I guess you use a SciPy environment. Am I right?

      Jason Brownlee July 31, 2019 at 6:53 am #

      For this tutorial, a simple Python environment is enough.

    Majed August 4, 2019 at 8:10 am #

    I wrote a neural network that consists of three layers as follows: [4 input neurones – 5 hidden neurones – 3 output neurones]. First, I standardized the data using the z-score. The accuracy of my model exceeded 67. Note: I haven’t used regularisation terms yet.
    Here is my implementation of both feedforward and backprop ..

      Majed August 4, 2019 at 8:11 am #

      The data set that I worked on is the Iris data set

      Jason Brownlee August 5, 2019 at 6:43 am #

      Well done!

    Ekundayo August 6, 2019 at 9:40 pm #

    Mohammed August 14, 2019 at 12:20 pm #

    Hi Dr. Jason

    Thank you for this post, it is really very helpful.

    I have one question about backpropagation in unsupervised model, e.g. extract features.
    Is it possible to apply this code for it, and only replaces loss function of unsupervised model by the loss function of supervised?


      Jason Brownlee August 14, 2019 at 2:10 pm #

      Backpropagation is for supervised learning, not unsupervised learning.

        Mohammed August 15, 2019 at 5:08 pm #

        Oh! Many thanks. So, can you help me with how to learn parameters in an unsupervised approach,
        if I need to extract features from the data as a low-dimensional representation of high-dimensional data?

    Mohammed August 15, 2019 at 5:10 pm #

    such as Unsupervised feature learning with Sparse Filtering!

      Jason Brownlee August 16, 2019 at 7:48 am #

      Sorry, I don’t have a tutorial on that topic, perhaps in the future.

    Mohammed August 16, 2019 at 11:20 am #

    Cherinet Mores August 20, 2019 at 4:44 pm #

    Jason Brownlee,
    Thank you for your continued help.
    I have one question:
    If I want to solve a regression problem (meaning I have 3 real-valued outputs computed from the input parameters), which part of the code should be modified, and how?

      Steven Pauly August 20, 2019 at 9:59 pm #

      Hi Cherinet, I’ve changed the n_outputs to 1 and the function train_network, I’ve changed the below. I’ve increased the n_epoch to a lot higher, because else it will give you the average. Be sure to normalize your input & output values, though.

        Thanks for sharing.

        Dear Steven Pauly thank you very much for your help.

        Charles September 17, 2019 at 1:01 pm #


        By normalizing input and output, do you mean modifying the forward_propagate method like this?

      Change the output to be a linear activation and the loss function to mse.
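A rough sketch of what that change could look like for the output layer. With a linear activation the derivative is 1, so the output delta is simply expected minus output, and the targets are the real values themselves rather than a one-hot vector. The helper names here are illustrative and are not the tutorial's:

```python
def linear_transfer(activation):
    # Linear activation: the neuron outputs its weighted sum unchanged.
    return activation

def linear_derivative(output):
    # The slope of the identity function is 1 everywhere.
    return 1.0

def output_deltas(expected, outputs):
    # Error signal for each output neuron under a squared-error loss.
    return [(e - o) * linear_derivative(o) for e, o in zip(expected, outputs)]

def mean_squared_error(expected, outputs):
    return sum((e - o) ** 2 for e, o in zip(expected, outputs)) / float(len(expected))

expected = [0.2, 0.5, 0.9]   # three real-valued targets for one row
outputs = [0.1, 0.5, 1.0]    # network outputs for the same row
deltas = output_deltas(expected, outputs)
loss = mean_squared_error(expected, outputs)
```

The hidden layers can keep the sigmoid transfer and its derivative. Only the output layer's transfer, its derivative, and the construction of the expected values in train_network would need to change.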

    Steven Pauly August 20, 2019 at 9:56 pm #

    Well Done, Jason! Great stuff!!!

    George Shannon September 25, 2019 at 12:09 am #

    Dear Dr. Brownlee:

    Harini October 6, 2019 at 3:45 am #

    Dear Sir,

    This tutorial is really helpful for a beginner like me. I couldn’t understand where the input and output nodes are mentioned in the code. How to change number of nodes for input and output layer. Kindly help me with it.


    Víctor October 21, 2019 at 5:06 am #

    Hello Jason,

      Jason Brownlee October 21, 2019 at 6:25 am #

      I recommend using sklearn for real projects, this code is for learning purposes only.

      That being said, you can save the “network” prepared in the backpropagation function.
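Because the network in this tutorial is a plain list of lists of dictionaries, one way to save it is the standard-library pickle module. A minimal sketch, with `save_network` and `load_network` as illustrative names and a temporary file standing in for a real path:

```python
import os
import pickle
import tempfile

def save_network(network, path):
    # Serialize the nested list/dict structure to a binary file.
    with open(path, 'wb') as handle:
        pickle.dump(network, handle)

def load_network(path):
    with open(path, 'rb') as handle:
        return pickle.load(handle)

network = [[{'weights': [0.1, 0.2, 0.3]}], [{'weights': [0.4, 0.5]}]]
path = os.path.join(tempfile.mkdtemp(), 'network.pkl')
save_network(network, path)
restored = load_network(path)
```

To use this with the tutorial code, the trained network has to be made available outside back_propagation, for example by returning it alongside the predictions.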

    chamodi October 31, 2019 at 5:53 pm #

    Jaya November 9, 2019 at 3:10 pm #

    Jean November 27, 2019 at 2:08 am #

    Hello Jason,
    Thanks for great content.
    Nevertheless, I think it would be much better if you could also write down the mathematical equations behind the code. It would be much easier to understand how all “those scary math” are implemented.

    Tobias December 7, 2019 at 7:27 am #

    It does not work for xor but it works for the first data you used. Why?

      Jason Brownlee December 8, 2019 at 6:00 am #

      The network was designed for a specific dataset.

    Tobias December 7, 2019 at 7:54 am #

    this is what it is outputting

    Expected=1, Got=0
    Expected=1, Got=0
    Expected=0, Got=0
    Expected=0, Got=0

      Jason Brownlee December 8, 2019 at 6:02 am #

    Samara Silva Santos December 8, 2019 at 1:13 am #

    Hi, I would like to know what you mean when you say that “Using the Zero Rule algorithm that predicts the most common class value, the baseline accuracy for the problem is 28.095%.”

    If I use this algorithm for another use case, is the accuracy just 28%?

    Please look at what I have:
    I need to modify this approach to use a quasi-Newton method to minimize the error instead of the gradient method. The gradient method that you used relies on partial derivatives to determine whether the error is growing. I see that you implemented the derivative this way:

    def transfer_derivative(output):
        return output * (1.0 - output)

    And what I know is that a derivative is calculated this way:

    ( f(x + h) - f(x) ) / h

    Are these two ways equivalent?

    I already have the quasi-Newton method implemented, but it is really difficult for me to make this modification.

    Please, let me know if you could help me. I really appreciate your help.

      Jason Brownlee December 8, 2019 at 6:14 am #

      I mean predicting the majority class. It is a naive classifier sometimes called the zero rule.
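As a small illustration of the zero rule, the sketch below predicts the most common training class for every test row. The baseline accuracy therefore depends on the class distribution of whichever dataset is used, so the figure quoted for the wheat seeds data does not carry over to other problems. `zero_rule_predict` and the toy rows are made up for the example:

```python
from collections import Counter

def zero_rule_predict(train, test):
    # Predict the most common class in the training data for every test row.
    most_common = Counter(row[-1] for row in train).most_common(1)[0][0]
    return [most_common for _ in test]

train = [[1.0, 0], [2.0, 1], [3.0, 1], [4.0, 2]]
test = [[5.0, 1], [6.0, 0], [7.0, 1]]
predictions = zero_rule_predict(train, test)
correct = sum(1 for row, p in zip(test, predictions) if row[-1] == p)
accuracy = correct / float(len(test)) * 100.0
```

Any model worth using on a dataset should score above this baseline on that same dataset.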

      Sorry, I don’t have the capacity to help you adapt the example to use a different optimization algorithm.

    Jeff Myzek December 9, 2019 at 9:49 am #

    Hey Jason,
    I am trying to use your code to run back propagation on MNIST with the following parameters but i am having trouble: 784 input units, a hidden layer of 100, and a Softmax group of 10 units as the output layer, cross-entropy loss objective function. I want to compute the weight update based on the entire training set, using the error backpropagation algorithm. learning rate that’s small enough for all practical purposes, but not so small that the network doesn’t learn. And I want to stop when the weight update becomes zero. Optimally i would want to see the weight vector and loss at each step. would you be able to assist me?

      Jason Brownlee December 9, 2019 at 1:43 pm #

      I would recommend using mini-batches to approximate the error gradient.

    Sabarish December 11, 2019 at 6:20 pm #

    Back Propagate Error:

    We are using the sigmoid transfer function, the derivative of which can be calculated as follows:

    derivative = output * (1.0 - output)

    What does it mean? I am not clear. Could you please help me understand?
    Sigmoid function =1/1+e**-x

      Jason Brownlee December 12, 2019 at 6:15 am #

      The gradient or slope at a point on the function.

        Job December 10, 2021 at 3:49 am #

        Yes, but the derivative of 1/1+e**-x
        is equal to (e**-x)/((e**-x)**2)
        and not x*(1-x).
        Is it so that the error rises as you get further from x = 0?

          Adrian Tam December 10, 2021 at 4:26 am #

          It is y = 1/(1+e**-x)
          and then differentiation is y’ = y*(1-y)
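The identity can be checked numerically. Differentiating y = 1/(1+e**-x) gives y' = e**-x / (1+e**-x)**2, which simplifies to y*(1-y). Note that transfer_derivative takes the neuron's output y, not the input x. The sketch below compares it with a central finite difference; `finite_difference` is an illustrative helper:

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def transfer_derivative(output):
    # Expressed in terms of the sigmoid's output y, not its input x.
    return output * (1.0 - output)

def finite_difference(f, x, h=1e-6):
    # Central difference approximation of the slope of f at x.
    return (f(x + h) - f(x - h)) / (2.0 * h)

checks = [
    (x, transfer_derivative(sigmoid(x)), finite_difference(sigmoid, x))
    for x in (-2.0, 0.0, 1.5)
]
```

For each x the analytic and numerical values agree to within the approximation error, so the two ways of computing the derivative are equivalent; the closed form is simply exact and cheaper.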

    Sylvan December 17, 2019 at 7:25 am #

    Hello here!

    I am very new with Python in Data Science and Artificial Intelligence. Can anyone here help me with this AI assignment below due by December 19 2019, please? I am seriously stuck. Here is the question:

    <> End of the question.

    Below is the indicator simple code:

    # Import Built-Ins
    import logging

    # Import Third-Party
    import pandas as pd
    import numpy as np

    # Import Homebrew

    # Init Logging Facilities
    log = logging.getLogger(__name__)

    from alpha_vantage.timeseries import TimeSeries
    import matplotlib.pyplot as plt

    # Add get_price() def from get_price_alphavantagepy code
    def get_prices():
    apikey = “BW4V00IXHSAE829D”

    ts = TimeSeries(key=apikey, output_format=’pandas’)
    data, meta_data = ts.get_intraday(symbol=’MSFT’,interval=’1min’, outputsize=’full’)
    data[‘4. close’]

    # End add get_price() def from get_price_alphavantagepy code

    #plt.title(‘Intraday Times Series for the MSFT stock (1 min)’)
    return data[‘4. close’] #return price

    #if __name__ == “__main__”:
    # get_prices()

    def rsi(price, n=14): #rsi(prices, n=14):
    deltas = np.diff(prices)
    seed = deltas[:n+1]
    up = seed[seed>=0].sum()/n
    down = -seed[seed0:
    upval = delta
    downval = 0.
    upval = 0.
    downval = -delta
    up = (up*(n-1) + upval)/n
    down = (down*(n-1) + downval)/n

    rs = up/down
    rsi[i] = 100. – 100./(1.+rs)
    return rsi
    prices = get_prices()


      Jason Brownlee December 17, 2019 at 7:34 am #

      Perhaps try posting your code and question to stackoverflow?

    bismeet December 22, 2019 at 5:12 pm #

    row = [1, 0, None]

      Jason Brownlee December 23, 2019 at 6:44 am #

      The final value in the row is the class label. Here we set None, as in no class label.

        bismeet December 24, 2019 at 12:40 pm #

        I still don’t understand, how can an input have no class label?

          Jason Brownlee December 24, 2019 at 4:58 pm #

    bismeet December 22, 2019 at 9:40 pm #

    Why are there two formulas for error?

    error = (expected - output) * transfer_derivative(output)

    error = (weight_k * error_j) * transfer_derivative(output)

      Jason Brownlee December 23, 2019 at 6:49 am #

      They are the same, but one for the output of the model and one for credit assignment for each weight.
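To make the distinction concrete, here is a small sketch with made-up numbers: the first formula gives the error signal of an output neuron directly from the expected value, while the second gives a hidden neuron's error signal by summing the output-layer signals weighted by the connecting weights. The tiny network state below is invented for illustration:

```python
def transfer_derivative(output):
    return output * (1.0 - output)

# State after a forward pass: one hidden neuron feeding two output neurons.
hidden = {'output': 0.6}
output_layer = [
    {'weights': [0.4, 0.1], 'output': 0.7},   # weights[0] connects to the hidden neuron
    {'weights': [-0.3, 0.2], 'output': 0.2},
]
expected = [1.0, 0.0]

# Output layer: error = (expected - output) * transfer_derivative(output)
for neuron, target in zip(output_layer, expected):
    neuron['delta'] = (target - neuron['output']) * transfer_derivative(neuron['output'])

# Hidden layer: error = sum(weight_k * error_j) * transfer_derivative(output)
error = sum(neuron['weights'][0] * neuron['delta'] for neuron in output_layer)
hidden['delta'] = error * transfer_derivative(hidden['output'])
```

A hidden neuron has no expected value of its own, which is why its error has to be inferred from the layer after it.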

    Vaishu December 24, 2019 at 9:23 pm #

    Ansist January 19, 2020 at 1:49 am #

    Hi, I am a little new to implementing neural networks and the underlying mathematics. I wanted to know why the target variable (y-variable) is usually binary in nature ([0 or 1]). Why can’t I have, for example, returns (usually continuous between [-1,1])?

    Secondly, is it always advised to transform your X and Y variables before feeding them into the neural network?

    rafael gamboa January 23, 2020 at 10:54 am #

    Bram January 28, 2020 at 6:00 pm #

    hi jason,
    I just finished the tutorial. It is very helpful for me as a beginner in Python and neural networks. I have a question about the k-fold validation.

    In the tutorial above I see that every fold initializes a new network. Does the neural network work like that? I thought the network would be initialized only once and then reused in the next fold, rather than initializing a new one. What if I use it in a real case?
    I might be wrong, please correct me.

    ssrinath February 19, 2020 at 3:22 am #

    hello jason brownlee

    Salvador February 20, 2020 at 9:43 am #

    Hello Jason,
    I receive this error in spyder :
    IndexError: list assignment index out of range.

    Melkamu February 22, 2020 at 1:04 pm #

    Melkamu February 22, 2020 at 1:15 pm #

    Hello Jason, I am new to Python and I am seriously following your tutorial because I want to design my own prediction model using a neural network with the backpropagation algorithm. But when I try to run this code in a Jupyter notebook on Python 3.6, a “list index out of range” error message is displayed. Could you correct me? The code I tried and the error are:

    from random import seed
    from random import random
    from math import exp

    # Initialize a network
    def initialize_network(n_inputs, n_hidden, n_outputs):
        network = list()
        hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer)
        output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
        network.append(output_layer)
        return network

    network = initialize_network(5, 6, 1)
    for layer in network:
        print(layer)

    # Calculate neuron activation for an input
    def activate(weights, inputs):
        activation = weights[-1]
        for i in range(len(weights)-1):
            activation += weights[i] * inputs[i]
        return activation

    # Transfer neuron activation
    def transfer(activation):
        return 1.0 / (1.0 + exp(-activation))

    # Forward propagate input to a network output
    def forward_propagate(network, row):
        inputs = row
        for layer in network:
            new_inputs = []
            for neuron in layer:
                activation = activate(neuron['weights'], inputs)
                neuron['output'] = transfer(activation)
                new_inputs.append(neuron['output'])
            inputs = new_inputs
        return inputs

    # test forward propagation
    network = [[{‘weights’: [0.13436424411240122,
    ‘output’: 0.7853169772903308}],
    [{‘weights’: [0.651592972722763,
    [{‘weights’: [0.762280082457942,
    [{‘weights’: [0.9014274576114836,
    [{‘weights’: [0.21659939713061338,
    [{‘weights’: [0.23308445025757263,
    [{‘weights’: [0.8375779756625729,
    row = [0, 1, 0, 0, 0]
    output = forward_propagate(network, row)
    and the error is
    IndexError Traceback (most recent call last)
    85 0.12088995980580641]}]]
    86 row = [0, 1, 0, 0, 0]
    —> 87 output = forward_propagate(network, row)
    88 print(output)

    in forward_propagate(network, row)
    33 new_inputs = []
    34 for neuron in layer:
    —> 35 activation = activate(neuron[‘weights’], inputs)
    36 neuron[‘output’] = transfer(activation)
    37 new_inputs.append(neuron[‘output’])

    in activate(weights, inputs)
    20 activation = weights[-1]
    21 for i in range(len(weights)-1):
    —> 22 activation += weights[i] * inputs[i]
    23 return activation

  222. Avatar
    Pavitra February 29, 2020 at 12:12 am #

    Hello Jason,
  223. Avatar
    Lucas March 3, 2020 at 2:20 am #

    Hello Jason,

    I’ve noticed that (in chapter 4) you use the labels (0,1) as a bias constant for the bias weight multiplication in the activation function. Shouldn’t you theoretically set all the labels temporarily to 1? Otherwise samples with label 0 will have no bias.

      Jason Brownlee March 3, 2020 at 6:01 am #


      Why do you think this?

        Lucas March 5, 2020 at 1:55 am #

        Oh nvm sorry, i missed that you add the bias explicitly “activation = weights[-1]”.

        Most of the books I’ve read add a temporary “1” to the inputs, so that the dot product doesn’t exclude the bias. So I falsely assumed you wanted to set the labels temporarily to one in “activation = activate(neuron[‘weights’], inputs)” because the inputs include the labels (I already thought this would be a weird way to do it).
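To illustrate the point above: adding the bias weight explicitly gives the same activation as appending a constant 1 to the inputs and taking a dot product. A minimal sketch, where activate() follows the tutorial's convention of storing the bias as the last weight, and activate_with_appended_one() is a hypothetical helper added only for comparison:

```python
# Bias stored as the last weight and added explicitly (tutorial convention).
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Hypothetical alternative: append a constant 1 to the inputs, then take a dot product.
def activate_with_appended_one(weights, inputs):
    extended = list(inputs) + [1.0]
    return sum(w * x for w, x in zip(weights, extended))

weights = [0.5, -0.25, 0.25]  # two input weights followed by the bias weight
inputs = [2.0, 4.0]
a = activate(weights, inputs)
b = activate_with_appended_one(weights, inputs)
print(a, b)
```

Either way the inputs passed in should contain only the feature values actually used by the weights; the class label is never multiplied by a weight because the loop stops at len(weights) - 1.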

        Btw thanks for the excellent tutorial.

        I also tried to implement a multi-layer NN which uses np.arrays for efficient matrix multiplication. But somehow my weights get really small really fast. Is this a general problem with NNs, or is it probably a problem with my activation function?
        I use ReLU for the hidden layers and sigmoid for the output layer.

          Jason Brownlee March 5, 2020 at 6:39 am #

          NN can be hard to debug, it could be a hyperparameter or it could be a bug in your implementation.

          Moving to a standard lib is highly recommended at some point.

  224. Avatar
    Heritiera fomes March 3, 2020 at 5:50 am #

    Hello Jason,

    I have to implement a placement problem, where I need to place some students in different classes, and every class has some capacity. In that case, how can I relate an ANN to this?
    If I want to add some constraints to the ANN, how can I add them? For example, when predicting which class a test case (student) is assigned to, my problem needs to check the capacity of the class, and all the students must be assigned to a class.
    It would be a great help to hear from you.

    A Kranthi Kiran March 4, 2020 at 9:06 pm #

    can I know how to build a front end for this model using flask? or
    is there any other best way to build a front end rather than flask?

      Jason Brownlee March 5, 2020 at 6:34 am #

      I don’t have an example, sorry.

      Eunike Kamase Elisabeth August 8, 2020 at 5:28 pm #

      Hello, do you already know how to build the front end for backpropagation with Flask?

  226. Avatar
    Prabhu Prasad Dev March 6, 2020 at 11:42 pm #

    Is there any code or how to implement Spiking Neural Network(SNN).. I am very much interested to know about SNN bcoz it is the 3rd generation of neural network..Can u plz help me of details about SNN???

  227. Avatar
    Namitha Dsouza March 8, 2020 at 4:43 pm #

    I am new to this field. I am sorry if you do not get my question.

    # Train a network for a fixed number of epochs
    def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
    for row in train:
    outputs = forward_propagate(network, row)
    expected = [0 for i in range(n_outputs)]
    expected[row[-1]] = 1
    backward_propagate_error(network, expected)
    update_weights(network, row, l_rate)

    What is the use of these two lines? Is it only for binary classification or any classification with 3 or more classes can use this? Because this works perfectly for binary classification. But for other classifications, it gives an error.

    expected = [0 for i in range(n_outputs)]
    expected[row[-1]] = 1
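For reference, these two lines build a one-hot target vector: a list of n_outputs zeros with a 1 at the index given by the row's class value. That works for any number of classes, provided the class values are integers from 0 to n_outputs - 1. A string or float class value, or a value such as 3 when there are only three outputs, would raise an error. A minimal sketch, using a hypothetical helper name one_hot:

```python
def one_hot(class_index, n_outputs):
    # class_index must be an int in the range 0 .. n_outputs - 1
    expected = [0 for _ in range(n_outputs)]
    expected[class_index] = 1
    return expected

print(one_hot(0, 2))  # binary problem
print(one_hot(2, 3))  # three-class problem
```

If the class column holds other values (for example 1, 2, 3 or text labels), mapping them to integers starting at 0 before training is one way to satisfy this assumption.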

  228. Avatar
    Carlos Meza March 13, 2020 at 1:47 pm #

    Hello!! I’m new to this. If I want to use 9 input variables instead of 7, what do I need to change in the code to make it work? Amazing publication!

  229. Avatar
    Alex Ramirez March 16, 2020 at 11:15 am #

    Hello! How to calculate the recall/precision/F1 score from this exercise?

  230. Avatar
    Sumanta Das March 30, 2020 at 3:01 am #

    How to modify the code to work with GPUs, without using fancy libraries?

      Jason Brownlee March 30, 2020 at 5:37 am #

      Not sure you can. Fancy libraries (Keras on TensorFlow) let you use the GPU.

  231. Avatar
    Abhishek March 30, 2020 at 10:13 pm #

    Hi Jason,

    Trying to execute, but I’m facing this Error. I’m running the Code on Spyder (Python 3.7)

    Traceback (most recent call last):

    File “”, line 1, in
    runfile(‘C:/Users/duppa/Desktop/Wheat Seed Code’, wdir=’C:/Users/duppa/Desktop’)

    File “C:\Users\duppa\Anaconda3\lib\site-packages\spyder_kernels\customize\”, line 786, in runfile
    execfile(filename, namespace)

    File “C:\Users\duppa\Anaconda3\lib\site-packages\spyder_kernels\customize\”, line 110, in execfile
    exec(compile(, filename, ‘exec’), namespace)

    File “C:/Users/duppa/Desktop/Wheat Seed Code”, line 204, in
    scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)

    File “C:/Users/duppa/Desktop/Wheat Seed Code”, line 82, in evaluate_algorithm
    train_set = sum(train_set, [])

    File “C:\Users\duppa\Anaconda3\lib\site-packages\numpy\core\”, line 2076, in sum

    File “C:\Users\duppa\Anaconda3\lib\site-packages\numpy\core\”, line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

    TypeError: ‘list’ object cannot be interpreted as an integer

  232. Avatar
    Robin April 8, 2020 at 1:04 am #

    Hi Jason,

    Just wanted to extend my thanks for your tutorial. Two months ago I wanted to learn Python and was in the middle of learning more about AI and ML and used your tutorial to help me implement my first neural network. A lot of literature and texts use a more mathematical and lower level approach to neural networks using matrices, etc which isn’t intuitive to me but your tutorial just clicked as it was easy to conceptualize and was a more simple approach.

    One of the ways I cement my own knowledge is writing about things I’m working on or worked on, and to really commit neural networks to memory, I wrote a tutorial myself for a neural network using learning rate and momentum parameters.

    I would love to put your web page in an acknowledgements section if you were okay with that as I don’t think I would have figured out neural networks if it weren’t for your site.


      Jason Brownlee April 8, 2020 at 7:55 am #

      Thanks Robin, well done on your progress!

      Yes, please link back.

  233. Avatar
    tom April 12, 2020 at 2:39 pm #

    Hey Jason, amazing tutorial on how to implement a 3 layer neural network with 200 lines of code.

    I have one question though. For the backpropagation part, why is the error of (layer j, neuron i) = summation of (weight_k * error_j) * transfer_derivative(output)? Can you explain the mathematics a little? I know the error is the derivative of the cost function, but how do you know the connection between the error of the current layer and the error of the next layer?

      Jason Brownlee April 13, 2020 at 6:11 am #


      Sorry, I don’t dive into the theory. I recommend a good textbook like the 2016 “Deep Learning” or the 1999 “Neural Smithing”.

  234. Avatar
    Quang Huy Chu April 12, 2020 at 11:51 pm #

    Hi, first of all, thank you for posting this, it helps me very much in my Master’s research. My research requires 4 outputs, and I tried the code on my dataset, which has 48 samples with 16 input nodes and 4 outputs.
    My question is:
    Because my dataset is small, I need to choose k = dataset size (48), so I applied k = 48, l_rate = 0.3, n_hidden = 42. But in the results, the prediction within each fold is always a single repeated class, e.g. [0,0,0,0,0,0] ; [2,2,2,2,2,2] ; [0,0,0,0,0,0] ; [3,3,3,3,3,3], and a different k gives the same kind of prediction result (0,2,0,3). Can you figure out why my NN gives that strange result?

    Thank you very much.

      Quang Huy Chu April 12, 2020 at 11:58 pm #

      For example, here is my result:
      Predicted: [0, 0, 0, 0, 0, 0]
      Actual: [0, 1, 0, 3, 0, 2]
      Predicted: [2, 2, 2, 2, 2, 2]
      Actual: [2, 1, 2, 3, 0, 1]
      Predicted: [0, 0, 0, 0, 0, 0]
      Actual: [0, 2, 1, 0, 1, 3]
      Predicted: [3, 3, 3, 3, 3, 3]
      Actual: [1, 1, 3, 2, 0, 3]
      Predicted: [0, 0, 0, 0, 0, 0]
      Actual: [0, 0, 0, 1, 1, 0]
      Predicted: [2, 2, 2, 2, 2, 2]
      Actual: [2, 3, 1, 2, 0, 2]
      Predicted: [2, 2, 2, 2, 2, 2]
      Actual: [3, 2, 2, 1, 3, 3]
      Predicted: [1, 1, 1, 1, 1, 1]
      Actual: [3, 3, 1, 2, 3, 2]
      scores: [50.0, 33.33333333333333, 33.33333333333333, 33.33333333333333, 66.66666666666666, 50.0, 33.33333333333333, 16.666666666666664]
      Mean Accuracy: 39.583%

      Can you figure it out why I have this problem or the problem is my dataset is not good?

      Jason Brownlee April 13, 2020 at 6:17 am #

      You’re welcome.

      Perhaps use the Keras API instead, it will be much easier for you.

        Quang Huy Chu April 13, 2020 at 11:17 am #

        Hi Jason, thanks for the reply, after using your model and test with other datasets I found on the internet, your model works properly. Maybe it is my dataset is the problem.

    Bia April 14, 2020 at 5:14 am #

    Hi Jason!

    First, Thank you for this good tutorial, it really helps a lot.
  236. Avatar
    audrey April 14, 2020 at 5:48 am #

  237. Avatar
    Alex April 16, 2020 at 4:56 am #

    If the transfer derivative is = output * (1.0 – output), when calculating errors in a hidden layer, which has a bias = 1, doesn’t that mean the transfer derivative is always 1 * (1-1) = 0 for the bias node? Therefore the error of the bias node is always 0 because error = (weight_k * error_j) * transfer_derivative(output)?

    If that’s true, then the weights from the bias node never update because you multiply by the error.

    I assume I’m missing something. How do the weights from a bias node get updated (i.e. how is the error of the bias ever not 0)?

  238. Avatar
    Alex April 16, 2020 at 7:18 am #

    I guess I’m not so much interested in why at the moment, but that in your implementation, which as far as I can tell is correct (it matches logically identically with a C/C++ I came up with), there is no way to update the weights coming off the bias.

    This line [ neuron[‘delta’] = errors[j] * transfer_derivative(neuron[‘output’]) ] seems as if it will always result in neuron[‘delta’] == 0 for the bias. I noticed it my C++ implementation and when I went looking for answers, came across your post, and it looks like yours would result in 0 also.

    So I’m more interested in if you found that to be the case. If so, it can escape detection because the network will still learn, just not as well or as fast, so with toy data this will not be noticed.

      Jason Brownlee April 16, 2020 at 1:20 pm #

      Interesting, thanks for sharing.

      I have not observed this issue, have you confirmed that indeed bias weights in the above implementation are unchanged after initialization?

        Alex June 4, 2020 at 3:47 am #

        Sorry for the long delay in answering, I got distracted, but remembered to come back to this. I did look at the weights and you are correct, they do update. However, there is still something missing (admittedly, most likely in my understanding). You create a bias node by adding 1 weight vector per layer (weights from bias to next level neurons), but I do not see anywhere where the bias node activation is explicitly set to 1. It seems that the activation of the bias node is treated like all other nodes, and is free to change value (I printed them during training and they are never 1.0). So the weights are also updating because bias does not equal 1.

        If the bias activation is forced to stay at 1 (which seems correct for the algorithm), the weights from the bias cannot update because in the transfer derivative: 1 * (1-1) = 0. I did try that as well, and it shows the weights do not update. If I’m correct, this seems like a very subtle flaw, which would be undetectable in any simple learning problem because the network will learn and predict with or without a bias (I tried that too, and it does work either way).

        With that said, I’m still not convinced I’m correct. I might still be missing something, but after several hours in the code, I can’t find any way the bias activation holds to 1.0. When it isn’t 1.0, the bias weights will update because the transfer derivative is non-zero, but that violates the role of the bias node. When the bias activation is forced to 1.0 which is the correct value for the bias, the weights do not update because 1 * (1-1) = 0. So I’m still confused, but still open to the possibility I’m just not understanding something about the code.

          Jason Brownlee June 4, 2020 at 6:30 am #

          To understand the bias consider the forward and backward pass.

          For the forward pass, see the activate() function and notice that the bias weight (stored as the last element of the list) is added to the activation first. This is the same as 1*bias_weight.

          For the backward pass, the update_weights() function updates the bias weight as well as the other weights.

          Perhaps re-read the text and code of the tutorial. This is all discussed.
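The two passes described above can be sketched for a single neuron. This is a simplified illustration, not the tutorial's full code: update_neuron is a hypothetical helper, and the sign convention (delta computed from expected minus output, then added to the weights) is an assumption. The point is that the bias has no activation of its own that passes through transfer_derivative(); its weight is updated with the neuron's delta times an implicit input of 1.

```python
from math import exp

def activate(weights, inputs):
    # The bias weight is the last element; it is added as 1 * bias_weight.
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    return output * (1.0 - output)

def update_neuron(weights, inputs, expected, l_rate):
    # Hypothetical single-neuron training step (output-layer style).
    output = transfer(activate(weights, inputs))
    delta = (expected - output) * transfer_derivative(output)
    for j in range(len(inputs)):
        weights[j] += l_rate * delta * inputs[j]
    # Bias weight: same delta, implicit input of 1.
    weights[-1] += l_rate * delta
    return delta

weights = [0.2, -0.4, 0.1]
before = weights[-1]
delta = update_neuron(weights, [1.0, 0.5], expected=1.0, l_rate=0.5)
print(before, weights[-1], delta)
```

The derivative is evaluated at the neuron's own output, which for a sigmoid lies strictly between 0 and 1, so the term 1 * (1 - 1) = 0 does not arise for the bias weight in this formulation.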

  239. Avatar
    Maria Campero April 17, 2020 at 2:15 am #

    Hi Jason,
    First of all thank you very much for this post it is really helpful,
    At the moment I’m writing code for a neural network with backpropagation in Python. I have 8 inputs and 7 outputs with one hidden layer (1 neuron). I scaled the dataset and then tried to use your code, but I get an alignment error. Can you please give me some advice on how to fix that issue?

    sameer sakkhari April 25, 2020 at 1:16 am #

    I have a dataset isolet in the form of csv file. I want to implement a Neural Network with backpropagation in python using tensorflow . How do I start ? How do I load my data?

    You said to save the dataset in csv format in current working directory. But it is not able to recognize isolet

  241. Avatar
    João Guilherme Cotta April 26, 2020 at 2:37 am #

    Hello Jason,

    Thanks for this tutorial, it is very helpful.

    I would like to modify this code to use MLP with BP to predict the velocity of a car based on different inputs, such as velocity, acceleration, pedal position, etc. I am having some difficulty adapting the ‘expected’ part of the code, since in your example you are using only zeroes and ones, and my study case would have different values of velocity given by the dataset.

    Do you have any advice regarding this?

  242. Avatar
    Sid April 26, 2020 at 1:30 pm #

  243. Avatar
    Q.H.Chu May 1, 2020 at 12:24 pm #

    Hi Jason, according to the post, your network uses only 1 hidden layer (maybe it is called a shallow feed-forward NN). Does that hidden layer represent a logistic regression step? And is there a way to add more hidden layers? Will many hidden layers improve the accuracy of the network?

    And one more question: how can I choose suitable parameters (epochs, learning rate or number of hidden layer neurons) for this network? Does it depend on the number of output and input neurons?

    Once again, thank you very much for posting this helpful post. I look forward to your reply.

      Jason Brownlee May 1, 2020 at 2:04 pm #

      Yes, one hidden layer. No not logistic regression.

      Tune the parameters of your model to your data.

  244. Avatar
    Ahmed Gad May 7, 2020 at 9:25 am #


    Please help with adapting the code to include Upper and Lower weights for each neuron without biases ==> Rough Neural Network.


  245. Avatar
    Ahmed Gad May 7, 2020 at 9:49 am #

    The RNN structure replaces the traditional neuron by two neurons (lower neuron, upper neuron ) to represent lower and upper approximations of each attribute in the CTG data set, Its structure formed from 4 layers input, 2 hidden and output layers. The hidden layers have rough neurons which overlap and exchange information between each other, While the input and output layers consists of traditional neurons as in the figure(1):

    This image illustrates the idea more:

      Jason Brownlee May 7, 2020 at 11:51 am #

      Thanks for sharing.

        Ahmed Gad May 8, 2020 at 10:44 am #

        Please i need help where and how to customize the code!

  246. Avatar
    John Pillar May 10, 2020 at 4:43 pm #

    HI Jason – thanks very much for a wonderfully clear and understandable description. I really appreciate your ‘gentle’ approach. I’ve re-coded your python into C – I find it’s the best way for me to really learn what’s going on.

    Please – I have a couple of question – to apply softmax to the output – it’s easy enough to map the outputs using softmax so that they are ‘probabilities’ that sum to one, but – what changes do I need to make to the transfer function derivative in the backpropagation code. I’ve read several descriptions that say that backpropagation of the output layer errors after softmax follows exactly the same as sigmoid – so I’m confused. I think it should be different, but I may be missing something.

    Also – cross-entropy loss is commonly described as a natural ‘partner’ to softmax, but actually, in practice, is the ‘error’ still (expected_value) – (predicted value) , just like you have in your code?

    Thanks very much if you have time to consider my question – much appreciated.

  247. Avatar
    Sandeep Kumar Dash May 29, 2020 at 9:41 pm #

  248. Avatar
    Andirian Ahmad May 30, 2020 at 5:19 pm #

  249. Avatar
    Anon June 9, 2020 at 6:59 pm #

  250. Avatar
    Zach June 11, 2020 at 5:13 am #

  251. Avatar
    Muhammad Basit Umair June 11, 2020 at 11:04 pm #

    Sir kindly guide me about difference between “multilayer feed-forward network” and deep neural network (DNN).
    Or can we say that, a multilayer feed-forward network is a deep neural network?


      Jason Brownlee June 12, 2020 at 6:10 am #

      MLP can be made deep by adding many layers, so can a CNN, LSTM or any type of network.

  252. Avatar
    I am getting this error, please help

    File “F:/khaise/”, line 80, in activate
    activation += weights[i] * inputs[i]

    TypeError: can’t multiply sequence by non-int of type ‘float’

    navid July 7, 2020 at 2:20 am #

    Hello , i have question. this code only support 1 hidden layer?

  254. Avatar
    DavidHE July 24, 2020 at 1:00 pm #

    I’m trying to implement this tutorial in a language other than Python. If, during training, the value of the variable sum_error gets stuck or even goes up a little and down again, does that mean there is an error in the implementation?

      Jason Brownlee July 24, 2020 at 1:38 pm #

      Perhaps run the same code with the same initial weights and compare the output of each step?

  255. Avatar
    How can I get a loss vs epoch graph for this code?

      Jason Brownlee August 7, 2020 at 6:22 am #

      Yes, you will have to implement it yourself though and use a train/test split instead of k-fold cross-validation.

  256. Avatar
    Hai In Initialize Network function hidden_layer variable store three random weight ,but we are given one hidden layer is used one weight for feed forward another one weight backward remaining one which propose used ??. similarly output_layer got three weight i did not under can you explain

      Jason Brownlee August 9, 2020 at 5:33 am #

      Sorry, I don’t follow. Can you please rephrase or elaborate on your question?

    arun August 8, 2020 at 8:09 pm #

    Hi, I need to understand the operation behind this line: predicted = algorithm(train_set, test_set, *args)

  258. Avatar
    do you know how to extract the “deltas” for each input and synaptic weight?

      Jason Brownlee August 14, 2020 at 1:18 pm #

      Yes, the element added to each weight would be the deltas as you describe them.

        Nate August 15, 2020 at 12:54 am #

        But how would you extract them in the code? I want to print the weight before and after the delta was added. What part of the code would you modify?

          Jason Brownlee August 15, 2020 at 6:32 am #

          You could retrieve them from the part of the code that updates the model weights, in the backward_propagate_error function I guess.
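One way to see the change applied to each weight is to copy the weights before the update step and compare afterwards. A minimal sketch, assuming the network is a list of layers of neuron dictionaries with 'weights', 'output' and 'delta' keys as in the tutorial, that the deltas have already been computed by the backward pass, and that the update adds l_rate * delta * input to each weight. The helper name update_weights_verbose and the toy values are hypothetical:

```python
from copy import deepcopy

def update_weights_verbose(network, row, l_rate):
    # Returns, per layer and neuron, the list of changes applied to each weight.
    before = deepcopy([[n['weights'] for n in layer] for layer in network])
    for i in range(len(network)):
        inputs = row[:-1]  # last column of the row is the class label
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']  # bias weight
    changes = []
    for i, layer in enumerate(network):
        layer_changes = []
        for k, neuron in enumerate(layer):
            diff = [after - b for after, b in zip(neuron['weights'], before[i][k])]
            print('layer', i, 'neuron', k, 'before', before[i][k],
                  'after', neuron['weights'])
            layer_changes.append(diff)
        changes.append(layer_changes)
    return changes

# Toy network: one hidden neuron, one output neuron, with made-up outputs and deltas.
network = [
    [{'weights': [0.5, 0.5, 0.0], 'output': 0.5, 'delta': 0.25}],
    [{'weights': [1.0, 0.0], 'output': 0.5, 'delta': -0.5}],
]
changes = update_weights_verbose(network, [1.0, 2.0, 0], l_rate=0.5)
```

The neuron's 'delta' value is the error signal; the amount actually added to a weight is that delta scaled by the learning rate and the corresponding input.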

    Niloo September 5, 2020 at 9:22 pm #

    I am interested in machine learning. I read your code on this website and now I would like to add some features, so I have a question. How can we add more layers to this neural network, or in other words, how can we make the number of hidden layers flexible? Could you please explain or send me a link to learn more?
    Thanks for your attention.

  260. Avatar
    Harsha September 25, 2020 at 8:04 am #

    Oh Legend!!

    As an aspiring ML engineer this is what all I needed. You will be remembered forever as the mentor who taught me ANN from scratch

  261. Avatar
    Jon October 15, 2020 at 3:31 am #

    I tried to convert this example to use ReLU by changing the transfer function to be:

    def transfer(activation):
    return 0.0 if activation <= 0.0 else activation

    and the transfer_derivative to be:

    def transfer_derivative(output):
    return 0.0 if output <= 0.0 else 1.0

    This seem to break the training system however and the error is never reduced.

    Any thoughts?

      Jason Brownlee October 15, 2020 at 6:16 am #

      Perhaps try cross entropy loss.
      Perhaps try changing the model architecture.
      Perhaps try changing the learning hyperparameters.

        Jon October 16, 2020 at 5:31 pm #

        Ok thanks Jason, sounds interesting, I’ll certainly take a look.

  262. Avatar
    Dinesh Kumar October 17, 2020 at 10:32 am #

    Hi Jason,

    Thanks for your help in understanding backprop concepts with Python. Could you please help me with how to implement it based on a computational graph?


    JG October 31, 2020 at 10:19 pm #

    Hi Jason,

    I decided to follow this “old” tutorial for the chance to understand, at a low level, the main functions of a neural network: the model definition (a list of weights per neuron and layer, stored in dictionaries), how to manually compute the activation and output of a neuron, forward propagation of the input through the network, and especially, for me the core of these networks, the backpropagation of error.

    Finally I decided to jump to the high-level Keras API, which wraps these detailed functions into more integrated ones such as Model/Sequential, with their methods .fit, .evaluate and .predict, and tools such as to_categorical, plus the sklearn libraries for normalization, k-fold, one-hot encoding, etc.

    Of course I got better accuracy (97.5% mean over k folds) because I could use “relu” activation functions, other layer types, etc.

    So once more, thanks for this tutorial and the chance to better understand the engine running underneath TensorFlow, and especially under the high-level Keras structures.

      Jason Brownlee November 1, 2020 at 7:30 am #

      Nice work!

      The tutorial really should be updated to use cross entropy and relu, e.g. modern ideas. I wrote this implementation like we used to in the 90s.

  264. Avatar
    JG October 31, 2020 at 10:52 pm #

    More particularly, for the ML concept of “backpropagation”, I prefer to call it, more intuitively and personally, “distribution of the output error between all the weights/biases of the neurons in all layers of the model”.

    So “error distribution” between all error contributors (the model’s weights/biases) is for me a better name and key idea than the standard “backpropagation”.

      Jason Brownlee November 1, 2020 at 7:31 am #

      Agreed. That is the key learning from this tutorial!

  265. Avatar
    Lia Jusmai Theresia November 7, 2020 at 5:19 pm #

    Hello, can you help me create a rainfall prediction code using the neural network in python?
  266. Avatar
    Pedro H November 9, 2020 at 5:31 am #

    Hi, is this code rigged for the 2:1:2 layout?

    If yes can you point me some good articles to better understand the back forward prop?

    Anyway, great work! It REALLY helped me!

      Jason Brownlee November 9, 2020 at 6:16 am #

      Not really.

      You can adapt the architecture of the model directly.

  267. Avatar
    Chris Mahoney November 9, 2020 at 9:05 am #

    Hi Jason,

    I LOVED this article. It helped me immensely in learning about the intricacies of Neural Networks and Deep Learning in recent months. Thank you so much!

    I note here that you do a node-by-node method of implementation. But there is also another method using matrix multiplication and linear algebra.

    I’ve taken these concepts and processes, and written up a similar article. Except, I’ve used R and the matrix method. I’d love to know your thoughts on it:

    Chris M

  268. Avatar
    Joey Hung November 22, 2020 at 3:26 am #

    Hi Jason,

    Thanks for your code.
    However, when I was running it, it has below problem and I don’t know how to fix it. Could you help to fix it?

    Traceback (most recent call last):
    File “”, line 187, in
    scores = evaluate_algorithm(df, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
    File “”, line 59, in evaluate_algorithm
  269. Avatar
    Chiso Buso December 1, 2020 at 7:13 am #

    Thank you very much, I would be glad if I can get python code for ANN-Using BP for a regression problem. Like the inputs of 10 parameters and outputs of the continuous value of 5 parameters.

      Jason Brownlee December 1, 2020 at 8:06 am #

      The above example can be adapted for your regression problem directly.

  270. Avatar
    Lokman Hakim December 5, 2020 at 2:49 am #

    Is it because it is related to bias?

  271. Avatar
    Sina Birecik December 21, 2020 at 9:17 am #

    Hello Jason,
    At first, many thanks for the tutorial, it has cleared a lot of things about DL in my mind.
    I would like to make a contribution. It includes the code that allows your network to get deeper. People here can upgrade your code to multi hidden layers, each neuron number in each hidden layer can be adjusted. Just follow the instructions:
    1) Copy the code in the link below and overwrite the whole “initialize_network” method.
    2) Replace any “n_hidden” term with “hidden_list”.
    3) At the end of the code, you can adjust hidden layers like the example below:
    hidden_list = [5, 3, 7]
    means there are 3 hidden layers, each hidden layer has 5, 3 and 7 neurons, respectively.
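The linked code is not reproduced here. As a rough idea of what such a change could look like (an assumed sketch, not necessarily the commenter's code), initialize_network can take a list of hidden layer sizes and chain the layers together:

```python
from random import random, seed

def initialize_network(n_inputs, hidden_list, n_outputs):
    # hidden_list holds the number of neurons in each hidden layer, e.g. [5, 3, 7]
    network = []
    n_prev = n_inputs
    for n_hidden in hidden_list:
        layer = [{'weights': [random() for _ in range(n_prev + 1)]}  # +1 for bias
                 for _ in range(n_hidden)]
        network.append(layer)
        n_prev = n_hidden
    output_layer = [{'weights': [random() for _ in range(n_prev + 1)]}
                    for _ in range(n_outputs)]
    network.append(output_layer)
    return network

seed(1)
network = initialize_network(7, [5, 3, 7], 3)
print([len(layer) for layer in network])
```

Each layer's neurons have one weight per neuron in the previous layer plus a bias weight. The forward, backward and update functions in the tutorial loop over the layers in the network, so they should not depend on the number of hidden layers, but that is worth verifying when adapting the code.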

  272. Avatar
    Frank December 22, 2020 at 12:01 pm #

    Hello, I have a simple question. Your data has 3 classes like 1-2-3. If we have a different number of classes and they are strings like YES/NO, do we need to convert them to 0-1, or does it not matter? I checked the code to find the answer but could not find it. Please answer me.

  273. Avatar
    GreakBoy December 23, 2020 at 5:28 am #

    Hi, it was a really good article, thank you for doing such good work.
    I have a question. In your CSV data all your columns are integers and your classes are in the last column as numbers, but what if

      Jason Brownlee December 23, 2020 at 5:37 am #


      Perhaps scale the values prior to model and integer encode labels.

  274. Avatar
    Gloria January 22, 2021 at 8:38 pm #

    Thanks for this useful article!
    I want to change Sigmoid function into ReLU, so I modified the following 2 functions: (1) transfer(activation) and (2) transfer_derivative(output) as follows:

    (1) return 1.0 / (1.0 + exp(-activation)) => return max(activation,0)
    (2) return output * (1.0 – output) => return 1 if output>0 else 0

    However, the network isn’t learning (accuracy ~33.3% for each fold, even when I train for more epochs). Did I get something wrong? Thanks in advance!

      Jason Brownlee January 23, 2021 at 7:03 am #

      You’re welcome.

      Nice work!

      Perhaps check you didn’t change the activation function in the output layer!
      Perhaps change the loss to cross entropy?
      Perhaps change the architecture?
      Perhaps change the learning hyperparameters?

        Gloria January 25, 2021 at 7:18 pm #

        Hi Jason, thanks for the reply!
        I’ve checked the activations and weights, and found that it is the problem of ‘dying ReLU’. Some units (in this case the 3 output unit) always output 0 and cannot recover with further training.

          Jason Brownlee January 26, 2021 at 5:50 am #

          Try an alternate weight initialization, like “he”.

          Try scaling inputs to the range 0-1.
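Both suggestions can be sketched in plain Python. This is an assumed illustration rather than part of the tutorial: he_layer is a hypothetical helper that draws weights from a Gaussian with standard deviation sqrt(2 / n_inputs) (He initialization) and starts the bias at zero, and scale_column is a hypothetical min-max scaler.

```python
from math import sqrt
from random import gauss, seed

def he_layer(n_inputs, n_neurons):
    # He initialization: weights ~ N(0, 2 / n_inputs); bias (last weight) starts at 0.
    std = sqrt(2.0 / n_inputs)
    return [{'weights': [gauss(0.0, std) for _ in range(n_inputs)] + [0.0]}
            for _ in range(n_neurons)]

def scale_column(values):
    # Min-max scaling to the range 0-1; assumes the column is not constant.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

seed(1)
hidden_layer = he_layer(7, 5)
scaled = scale_column([2.0, 4.0, 6.0])
print(scaled)
```

Unlike random(), which returns only values between 0 and 1, the Gaussian draw gives both positive and negative starting weights.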

  275. Avatar
    Unnikrishnan February 11, 2021 at 1:28 pm #

    Thanks Jason. Nice article.
  276. Avatar
    AbdulAhad February 14, 2021 at 1:00 am #

    Thanks for the great article. It is still helping those who want to know exactly what happens and how, from a coding perspective.

  277. Avatar
    David February 15, 2021 at 6:45 am #

    Hi Jason,

      Jason Brownlee February 15, 2021 at 8:11 am #

      You’re welcome David!

      Yes, the code works with Python 3.

  278. Avatar
    Lukasz March 12, 2021 at 9:25 pm #

    Hi Jason,

    Thank you for the great article! I ran the code on my own dataset, where I was predicting three label classes. The algorithm gave me great results, with prediction accuracy above 97%. Right now I am trying to use the trained network to predict the results for a new dataset, where I would not provide any labels to calculate the accuracy. Do you have any recommendation for me? Thank you!

    • Avatar
      Jason Brownlee March 13, 2021 at 5:31 am #

      You can remove the evaluation of the model, fit the model on all available data and call predict on new data.

  279. Avatar
    Hi Jason, why do we copy the row and set row[-1] to None?
    This is in the function evaluate_algorithm:

    for row in fold:
        row_copy = list(row)
        row_copy[-1] = None

    it seems like you could just do

    for row in fold:

    Thanks for your help!

      Jason Brownlee March 16, 2021 at 4:51 am #

      So that the expected output value is not available to the model.

      • Avatar
        vitor January 28, 2024 at 8:47 am #

        but why, on initialization, do we do:

        n_inputs = len(row) – 1


          James Carmichael January 29, 2024 at 7:05 am #

          Hi vitor…Please clarify the code portion you are referring to. Also, are you experiencing an error with the code provided? That will enable us to better guide you.
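
The two lines discussed in this thread fit together: with `n_inputs = len(row) - 1`, each hidden neuron gets one weight per input column plus a bias, so the activation loop never reads the last (label) column. A minimal sketch, using the `activate` function as posted in the comments and arbitrary weight values chosen for illustration:

```python
def activate(weights, inputs):
    activation = weights[-1]              # the bias is stored as the last weight
    for i in range(len(weights) - 1):     # visits indices 0 .. n_inputs-1 only
        activation += weights[i] * inputs[i]
    return activation

row = [2.0, 4.0, 1]             # two inputs followed by the class label
n_inputs = len(row) - 1         # 2
weights = [0.5, -0.25, 0.1]     # n_inputs weights plus one bias (arbitrary values)

row_copy = list(row)
row_copy[-1] = None             # hide the label, as evaluate_algorithm does

with_label = activate(weights, row)
without_label = activate(weights, row_copy)
```

Both calls give the same activation, so setting `row_copy[-1] = None` reads as a safeguard: if any code did try to use the label during prediction, it would fail loudly instead of leaking the answer.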

  280. Avatar
    mlhan March 27, 2021 at 2:58 am #

    Do u have any idea if I want the user enters the can I do it 🙁

      Jason Brownlee March 29, 2021 at 5:54 am #

      Yes, but this is a programming question, not a machine learning question.

  281. Avatar
    Ashwini April 16, 2021 at 5:18 am #


    I tried adding softmax activation function in the output layer for doing a multi-class classification.

    # Forward propagate input to a network output
    def forward_propagate(network, row):
        inputs = row
        # hidden layer
        layer = network[0]
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
        # output layer
        layer = network[1]
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = softmax(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
        return inputs

    But when I’m running this the accuracy of the model is falling drastically from 77% to 35%.
    Can you please suggest me why this is happening or any other additional changes should i do to maintain the accuracy
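
One possible cause of the drop is that softmax is applied to each neuron's activation on its own, whereas softmax normalizes across all output activations at once. A minimal sketch under that assumption follows; `forward_output_softmax` is a hypothetical helper, not part of the tutorial, and the `softmax` shown here is one common formulation.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def softmax(activations):
    """Softmax over a list of activations, shifted by the max for stability."""
    largest = max(activations)
    exps = [exp(a - largest) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def forward_output_softmax(layer, inputs):
    """Hypothetical output-layer step: gather all activations, then normalize."""
    activations = [activate(neuron['weights'], inputs) for neuron in layer]
    outputs = softmax(activations)
    for neuron, out in zip(layer, outputs):
        neuron['output'] = out
    return outputs

# Arbitrary example: 2 inputs feeding 3 output neurons
layer = [{'weights': [0.2, -0.1, 0.0]},
         {'weights': [0.4, 0.3, 0.1]},
         {'weights': [-0.5, 0.2, 0.0]}]
outputs = forward_output_softmax(layer, [1.0, 2.0])
```

The backward pass would also need to change: `transfer_derivative` is the sigmoid slope, so with a softmax output and a cross-entropy loss the output delta is usually taken as `expected[j] - output[j]` without that extra factor.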

  282. Avatar
    redouane kassa April 21, 2021 at 11:34 pm #

    The derivative of sigmoid function should be f'(x)=f(x)*(1-f(x)) and not f'(x)=x(1-x). am I right?
    Thank you

    • Avatar
      Jason Brownlee April 22, 2021 at 5:41 am #

      Yes, that is what we use. output is not x, output is f(x), e.g. f(x)=output.
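
A quick numerical check of this point, using the `transfer` and `transfer_derivative` functions as posted in the comments. The input value and step size are arbitrary choices for illustration.

```python
from math import exp

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    # Expects the neuron's output f(x), not the raw activation x
    return output * (1.0 - output)

x = 0.3
h = 1e-6
# Central-difference estimate of f'(x)
numeric = (transfer(x + h) - transfer(x - h)) / (2 * h)
# Derivative computed from the output f(x)
analytic = transfer_derivative(transfer(x))
```

The two values agree to many decimal places, which is consistent with f'(x) = f(x) * (1 - f(x)) when the argument passed in is the output.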

  283. Avatar
    Scores: [90.47619047619048, 92.85714285714286, 97.61904761904762, 92.85714285714286, 92.85714285714286]
    Mean Accuracy: 93.333%

  284. Avatar
    winter May 27, 2021 at 11:53 pm #

    Hello , Jason
    Really thank you for your effort and nice resource.

    can I get some idea to visualize error, and validation like keras ?

    It would be a great help if I get some ideas.

    • Avatar
      Jason Brownlee May 28, 2021 at 6:48 am #

      You’re welcome.

      Yes, you can create learning curves if you wish, perhaps use matplotlib to create the plots.

  285. Avatar
    sudip May 30, 2021 at 11:12 pm #

    Hello Jason,
    Thank you for the nice tutorial. I am using your tutorial to train on my time series data, where the training values are almost the same for all classes. Your tutorial gives pretty nice results. I am trying to visualize error and accuracy like Keras does, but couldn’t figure it out.
    By any chance, is there a tutorial or source that shows how I could visualize my training information?

    • Avatar
      Jason Brownlee May 31, 2021 at 5:49 am #

      You’re welcome.

  286. Avatar
    sudip May 31, 2021 at 5:09 pm #

    Hi Jason,,
    I tried to follow your tutorial to visualize error,
    Whenever I try to plot, it says that ‘expected’ is not defined.
    In my understanding ‘expected’ is defined in a function where we used backpropagate.

    Can I get some idea to solve this problem and plot error line?

    • Avatar
      Jason Brownlee June 1, 2021 at 5:29 am #

  287. Avatar
    sacin June 19, 2021 at 4:44 am #

    Had a doubt,

    in section 3.2, shouldn’t error = (expected – output) * transfer_derivative(output) be
    error = (output – expected) * transfer_derivative(output)?

    was thinking if this is flipped then the weights in the hidden layers might increase instead of decrease and vice versa.

    was referring to by andrew NG.
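
The two sign conventions give the same weight update as long as the update rule uses the matching sign: `(expected - output)` paired with `+=`, or `(output - expected)` paired with `-=`. A minimal sketch with hypothetical helper names and arbitrary numbers:

```python
def step_expected_minus_output(weight, l_rate, expected, output, inp):
    # Convention with delta = (expected - output) * slope and a += update
    delta = (expected - output) * output * (1.0 - output)
    return weight + l_rate * delta * inp

def step_output_minus_expected(weight, l_rate, expected, output, inp):
    # Convention with delta = (output - expected) * slope and a -= update
    delta = (output - expected) * output * (1.0 - output)
    return weight - l_rate * delta * inp

w_a = step_expected_minus_output(0.5, 0.1, 1.0, 0.62, 0.8)
w_b = step_output_minus_expected(0.5, 0.1, 1.0, 0.62, 0.8)
```

Mixing the two (for example `(output - expected)` with `+=`) would move the weights uphill on the error, which is the situation the question is worried about.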

  288. Avatar
    SUDIP LAUDARI June 20, 2021 at 10:57 pm #

    • Avatar
      Jason Brownlee June 21, 2021 at 5:38 am #

      You must adapt the example to not use cross-validation, but a train/test split instead.

      Then evaluate the model’s performance on a training set and validation set each epoch (iteration).

      Sorry, I don’t have the capacity to prepare an example for you.

  289. Avatar
    AIbird August 30, 2021 at 10:57 pm #

    Hi Jason,
    I am using your code for my project. Looks great. It was working perfectly before. I changed my dataset’s format so that the values look almost the same and are in the range of 0.05 to 0.12. The training CSV, which contains all the values, has around 300 rows and 120 columns.

    Now my error doesn’t converge to near 0. It is always around 89 or 91.
    I plotted the accuracy of all n_folds against epoch, and the accuracies fluctuate a lot.

    Is there any idea or suggestion to make error near to 0 so that I can expect high mean accuracy?

    Any idea or suggestion would be really appreciated.

    • Adrian Tam
      Adrian Tam September 1, 2021 at 8:13 am #

      Would you try to use a scaler at preprocessing stage?

  290. Avatar
    Hello Adrian
    Thank you for your reply,

    I tried by doing this technique:

    from sklearn.preprocessing import MinMaxScaler

    scalar = MinMaxScaler()
    normalized = scalar.fit_transform(dataset)

    Do you mean by this?
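
One thing to watch with the snippet above is that `fit_transform(dataset)` rescales every column, including the class column if it is part of `dataset`. A minimal pure-Python sketch that scales only the input columns follows; `minmax_scale_inputs` is a hypothetical helper and it assumes the class label is the last column.

```python
def minmax_scale_inputs(dataset):
    """Scale every column except the last to the range 0-1 (hypothetical helper)."""
    n_cols = len(dataset[0]) - 1
    mins = [min(row[i] for row in dataset) for i in range(n_cols)]
    maxs = [max(row[i] for row in dataset) for i in range(n_cols)]
    scaled = []
    for row in dataset:
        new_row = []
        for i in range(n_cols):
            span = maxs[i] - mins[i]
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero
            new_row.append((row[i] - mins[i]) / span if span else 0.0)
        new_row.append(row[-1])   # keep the class label untouched
        scaled.append(new_row)
    return scaled

data = [[0.05, 10.0, 0],
        [0.12, 20.0, 1],
        [0.085, 15.0, 2]]
scaled = minmax_scale_inputs(data)
```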

    • Adrian Tam
      Adrian Tam September 1, 2021 at 11:27 am #


      • Avatar
        AIbird September 6, 2021 at 8:24 pm #

        Finally solved the problem. I had issue in my code.

        I have one more question. The mean accuracy is for training set right?
        How can I calculate the accuracy in prediction?
        Can I get some help?

          Adrian Tam September 7, 2021 at 6:13 am #

          The accuracy_metric() function is to do this.

  291. Avatar
    Filip September 19, 2021 at 2:45 am #

    • Adrian Tam
      Adrian Tam September 19, 2021 at 6:10 am #

      You’re welcome. Glad you like it.

  292. Avatar
    Franck September 22, 2021 at 8:32 am #

    I tried to follow it step-by-step and ended-up with 2 questions.

    Question 1: updating the model doesn’t seem to be so costly. Is this because this is a toy program? Does TensorFlow (for instance) do trickier things that make model updates costly? On this toy example, it is not easy to understand why a model update would be costly.

    If I get it correctly, here you say that mini batch is a tradeoff between SGD and Batch GD and that mini batch is more efficient because model update is done only after it has been evaluated (back propagated). My implementation of mini batch on this toy example would be

    ~/machinelearningmastery> git diff diff --git a/ b/
    index 7b78b89..9e7bcd0 100644
    --- a/
    +++ b/
    @@ -141,15 +141,22 @@ def update_weights(network, row, l_rate):
    neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
    neuron['weights'][-1] += l_rate * neuron['delta']

    +def make_batch(iterable, batch_size=1):
    + n = len(iterable)
    + for i in range(0, n, batch_size):
    + yield iterable[i:min(i + batch_size, n)]
    # Train a network for a fixed number of epochs
    def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
    - for row in train:
    - outputs = forward_propagate(network, row)
    - expected = [0 for i in range(n_outputs)]
    - expected[row[-1]] = 1
    - backward_propagate_error(network, expected)
    - update_weights(network, row, l_rate)
    + for batch in make_batch(train, batch_size=32):
    + for row in batch: # First backpropagate
    + outputs = forward_propagate(network, row)
    + expected = [0 for i in range(n_outputs)]
    + expected[row[-1]] = 1
    + backward_propagate_error(network, expected)
    + for row in batch: # Then update model
    + update_weights(network, row, l_rate)

    Looks like the exact same cost, no ? Did I miss something ?
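
One thing that may be worth checking in the diff above: each call to `forward_propagate` and `backward_propagate_error` overwrites the `output` and `delta` values stored in the neurons, so by the time the second loop runs, only the values from the last row of the batch appear to remain. A mini-batch update usually accumulates the gradient across the batch and applies it once. The sketch below shows that idea for a single sigmoid neuron only; `minibatch_step` is a hypothetical helper and averaging over the batch is one common choice, not the tutorial's method.

```python
from math import exp

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def minibatch_step(weights, batch, l_rate):
    """One mini-batch update for a single sigmoid neuron (hypothetical helper)."""
    grads = [0.0] * len(weights)
    for row in batch:
        inputs, expected = row[:-1], row[-1]
        output = transfer(activate(weights, inputs))
        delta = (expected - output) * output * (1.0 - output)
        # Accumulate this row's contribution instead of updating immediately
        for j, x in enumerate(inputs):
            grads[j] += delta * x
        grads[-1] += delta
    # Apply the averaged gradient once per batch
    return [w + l_rate * g / len(batch) for w, g in zip(weights, grads)]

# Two rows: one input value followed by a 0/1 target
new_weights = minibatch_step([0.0, 0.0], [[1.0, 1], [-1.0, 0]], 0.5)
```

The per-row gradient work is the same as in stochastic gradient descent, which matches the observation in the question; the difference is in how often the weights change.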

    Question 2: how, when, where the loss function is supposed to be computed in this toy example? For me sum_error (in the first version of train_network, only used for printing error but not for computation / gradient descent) is useless and is why it disappeared in the final version of train_network.

    For classification, I expected cross entropy to be computed as error for output layer this way

    machinelearningmastery> git diff diff --git a/ b/
    index 7b78b89..0cb0444 100644
    --- a/
    +++ b/
    @@ -3,7 +3,7 @@ from random import seed
    from random import randrange
    from random import random
    from csv import reader
    -from math import exp
    +from math import exp, log

    # Load a CSV file
    def load_csv(filename):
    @@ -111,6 +111,9 @@ def forward_propagate(network, row):
    def transfer_derivative(output):
    return output * (1.0 - output)

    +def cross_entropy(p, q, eps=1e-15):
    + return -sum([p[i]*log(q[i]+eps) for i in range(len(p))])
    # Backpropagate error and store in neurons
    def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
    @@ -123,9 +126,8 @@ def backward_propagate_error(network, expected):
    error += (neuron['weights'][j] * neuron['delta'])
    - for j in range(len(layer)):
    - neuron = layer[j]
    - errors.append(expected[j] - neuron['output'])
    + output = [neuron['output'] for neuron in layer]
    + errors.append(cross_entropy(expected, output))
    for j in range(len(layer)):
    neuron = layer[j]
    neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

    … But the code breaks and I am not sure to get why?!…
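
A likely reason for the failure is that `errors` ends up holding a single loss value, while the loop that follows reads `errors[j]` for every output neuron. Backpropagation also needs the gradient of the loss rather than the loss itself. A minimal sketch, assuming a sigmoid output and a per-output binary cross-entropy (the `bce` helper is hypothetical), shows that the gradient with respect to the activation reduces to `output - expected`:

```python
from math import exp, log

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def bce(expected, activation, eps=1e-15):
    """Binary cross-entropy of a sigmoid output for one neuron (hypothetical helper)."""
    output = transfer(activation)
    return -(expected * log(output + eps) + (1.0 - expected) * log(1.0 - output + eps))

a, expected, h = 0.4, 1.0, 1e-6
# Central-difference estimate of d(loss)/d(activation)
numeric = (bce(expected, a + h) - bce(expected, a - h)) / (2 * h)
# Closed form for sigmoid plus cross-entropy
analytic = transfer(a) - expected
```

Under that assumption, and in the tutorial's sign convention, the output-layer delta would be `expected[j] - neuron['output']` with no multiplication by `transfer_derivative`, while the hidden-layer code stays as it is.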


    • Adrian Tam
      Adrian Tam September 23, 2021 at 3:53 am #

      It is too long for me to read it at once, but let me answer the first question here. Training is costly because (1) there are many perceptrons to update and (2) there is a lot of data to evaluate. If you consider the simplest gradient descent algorithm, your metric is the MSE function, which involves the entire dataset. If we have M perceptrons and N data points, there are M weights to train (or more if there are bias terms) and the total number of gradients you need to compute is M×N in each iteration. If your toy example is small in both M and N, you will not notice that this is a problem.

  293. Avatar
    Franck October 3, 2021 at 6:37 pm #

    Adrian, thanks for the answer: make sense!

    I tried to use cross entropy as loss function this way :

    diff --git a/ b/
    index 7b78b89..96617d7 100644
    --- a/
    +++ b/
    @@ -111,6 +112,9 @@ def forward_propagate(network, row):
    def transfer_derivative(output):
    return output * (1.0 - output)

    +def cross_entropy(p, q, eps=1e-15):
    + return -sum([p[i]*log2(q[i]+eps) for i in range(len(p))])
    # Backpropagate error and store in neurons
    def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
    @@ -124,8 +128,9 @@ def backward_propagate_error(network, expected):
    for j in range(len(layer)):
    - neuron = layer[j]
    - errors.append(expected[j] - neuron['output'])
    + neuron_onehot = [0. for neuron in layer]
    + neuron_onehot[j] = layer[j]['output']
    + errors.append(cross_entropy(expected, neuron_onehot))
    for j in range(len(layer)):
    neuron = layer[j]
    neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

    But classification results are bad: I plotted the errors (can’t attach a PNG) and I guess it’s because I am a victim of “vanishing gradient”. If you have any clue or advice, I would be glad to know 😀


    • Adrian Tam
      Adrian Tam October 6, 2021 at 8:02 am #

      Maybe try a different activation function. People have found that it is key to mitigating vanishing gradients, though it does not always work. However, the example here is not deep, so the vanishing gradient issue should not be pronounced.

  294. Avatar
    Franck October 21, 2021 at 11:05 pm #

    Got it to work with a different activation function.

    At this point, I feel like there is a bug in the code from the post : backprop should start from output with ds/ds = 1 (as far as I understood it with s = score) which is not the case unless I am wrong.

    Am I wrong ?

    • Adrian Tam
      Adrian Tam October 22, 2021 at 4:13 am #

      I understand why you’re thinking like that but that’s meaningless because ds/ds is always 1. We are looking for more interesting subjects such as ds/dw. After all, you can’t change the score. You can only change the weights in the neural network. Hence we prefer to start with ds/dw

  295. Avatar
    SLC October 24, 2021 at 4:46 am #

    Thank you for your code Mr. Jason. However, I am not understanding when you are going to predict the class using the trained network, why and from where are you giving the weights?
    Thanks in advance.

    • Adrian Tam
      Adrian Tam October 27, 2021 at 1:44 am #

  296. Avatar
    SD November 8, 2021 at 2:37 pm #

    Hello everyone ,

    I am training a huge dataset which have more than 90 features.

    Can any one give me some idea to add PCA in the existing code?

    So that I could reduce the dimension of my data and use only the main features while training.

    Any help or suggestion would be really appreciated.


  297. Avatar
    Dyah wardani November 22, 2021 at 3:22 am #

    I have a question about the function str_column_to_int(). Why the outputs from 1, 2, 3 change into 2, 0, 1 after use that function?

    • Adrian Tam
      Adrian Tam November 23, 2021 at 1:19 pm #

      I think in this case, your “1”, “2”, “3” are strings and 2, 0, 1 are integers. That’s the result of encoding strings into integers.
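
If the lookup is built by enumerating a Python set, as is assumed here for `str_column_to_int()`, the order of the integer codes is arbitrary, because sets do not keep a defined order. Sorting the unique values first gives a predictable mapping. A minimal sketch, with `str_column_to_int_sorted` as a hypothetical variant:

```python
def str_column_to_int_sorted(dataset, column):
    """Map string class values to integers in sorted order (hypothetical variant)."""
    class_values = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(class_values)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

data = [['a', '1'], ['b', '3'], ['c', '2'], ['d', '1']]
lookup = str_column_to_int_sorted(data, len(data[0]) - 1)
```

Either mapping works for training, since the network only needs the codes to be consistent; the sorted version just makes the result easier to read back.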

  298. Avatar
    Logan January 11, 2022 at 3:46 am #

    Here’s just a small correction (I’m sorry for being particular.):

    In the “3.1. Transfer Derivative” section, you’ve written
    “Given an output value from a neuron, we need to calculate it’s slope.”

    It wouldn’t make sense to say
    “Given an output value from a neuron, we need to calculate it is slope.”
    (“it is” instead of “it’s”)

    Therefore, it should be
    “Given an output value from a neuron, we need to calculate its slope.”
    (“its” instead of “it’s”)

    • Avatar
      James Carmichael January 11, 2022 at 8:40 am #

  299. Avatar
    Durga January 13, 2022 at 8:32 pm #

    Hi, Mr. Jason thanks for your great tutorial. Can you clarify the if and else conditions in the backpropagation error calculation? (If possible, please explain this code in detail.)

    # Backpropagate error and store in neurons
    def backward_propagate_error(network, expected):
        for i in reversed(range(len(network))):
            layer = network[i]
            errors = list()
            if i != len(network)-1:
                # (2) Error computed for the hidden layers: error = (weight_k * error_j) * transfer_derivative(output)
                for j in range(len(layer)):
                    error = 0.0
                    # (A) error = Sum(delta * weight linked to this delta)
                    # for each neuron[LAYER N+1] linked to this neuron[LAYER N] (current layer)
                    for neuron in network[i + 1]:
                        error += (neuron['weights'][j] * neuron['delta'])
                    errors.append(error)
            else:
                # (1) Error computed for the last layer: error = (expected - output) * transfer_derivative(output)
                # (A) Store the difference between expected and output for each output neuron in errors[]
                for j in range(len(layer)):
                    neuron = layer[j]
                    errors.append(expected[j] - neuron['output'])
            # (B) Store the error signal in delta for each neuron
            for j in range(len(layer)):
                neuron = layer[j]
                neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

    # test backpropagation of error
    network = [[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
        [{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095]}, {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763]}]]
    expected = [0, 1]
    backward_propagate_error(network, expected)
    for layer in network:
        print(layer)

    • Avatar
      James Carmichael February 21, 2022 at 2:19 pm #

      Hi Durga…Please narrow the content of your post down to a single question/comment so that I may better assist you.

  300. Avatar
    Rudra Sonkusare January 27, 2022 at 7:49 pm #

    Is there any possible way to give a string for input? The string I am trying to give as input is not a meaningful word, for example string = “zgg7AiPkY37Yvne” and I want to give two of these strings as input to the neural network, any idea how this can be achieved? The current method I use is to convert each character into its decimal code then normalize it in range 0, 1 and thus convert in into a vector of floats.

  301. Avatar
    Andrii February 13, 2022 at 8:51 am #

    I’ve run into an issue recently. I’ve implemented backpropagation using your approach in C++, however the epoch loss doesn’t go down. It may go down with a smaller learning rate and a larger number of epochs, but at some point the loss goes back up to some value again. What could be causing this? I’ve checked that the forward pass and backward pass both work fine.

  302. Avatar
    rebot333 February 17, 2022 at 2:02 pm #

    Thank you so much this is a great lesson

    • Avatar
      You are very welcome! Thank you for the support!

  303. Avatar
    CEN April 3, 2022 at 7:04 pm #

  304. Avatar
    Shraddha April 4, 2022 at 8:03 pm #

    Thank you, James. The codes were very useful.
    I tried to implement the above codes in my system. It worked as expected.
    I was modifying the above code for the MNIST dataset by increasing the number of layers in the existing code.

    # Initialize a network
    def initialize_network(n_inputs, n_hidden, n_outputs):
        network = list()
        hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer)
        output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
        network.append(output_layer)
        return network

    So, can I write this function as

    # Initialize a network
    def initialize_network(n_inputs, n_hidden, n_outputs):
        network = list()
        hidden_layer1 = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer1)
        hidden_layer2 = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer2)
        output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
        network.append(output_layer)
        return network

    Is this the correct way to do it?

    And one more question, what about the weight update function, do I need to make their changes also?

    • Avatar
      Hi Shraddha…Although I have not executed your code listing, I see no apparent deficiencies. Please let us know what you are specifically trying to accomplish with your code modifications so that we can better assist you.
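
The same idea can be written for any number of hidden layers. This is a minimal sketch, assuming each layer is fully connected to the previous one; `initialize_deep_network` and its `hidden_sizes` parameter are hypothetical names, not part of the tutorial.

```python
from random import random, seed

def initialize_deep_network(n_inputs, hidden_sizes, n_outputs):
    """Build a network with one layer per entry in hidden_sizes plus an output layer."""
    network = []
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    for n_in, n_neurons in zip(sizes[:-1], sizes[1:]):
        # Each neuron holds one weight per incoming value plus a bias weight
        layer = [{'weights': [random() for _ in range(n_in + 1)]}
                 for _ in range(n_neurons)]
        network.append(layer)
    return network

seed(1)
network = initialize_deep_network(4, [5, 3], 2)
```

On the weight update question: the versions of `backward_propagate_error` and `update_weights` posted in these comments loop over `range(len(network))`, so they appear to handle extra layers without modification.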

  305. Avatar
    Shraddha Naik April 5, 2022 at 8:00 pm #

    Thank you, James Sir. The above code is very useful.

    I tried to implement the above codes in my system. It worked as expected.
    But, when I changed the dataset to MNIST, I am getting only 10% accuracy after 1000 epochs. After using mini-batch SGD.

  306. Avatar
    Shraddha Naik April 5, 2022 at 8:09 pm #

    def train_network(network, train, l_rate, n_epoch, n_outputs, kval):
        for epoch in range(n_epoch):
            import random
            temp = random.choices(train,k=kval)
            for row in temp:
                outputs = forward_propagate(network, row)
                expected = [0 for i in range(n_outputs)]
                expected[row[-1]] = 1
                backward_propagate_error(network, expected)
                update_weights(network, row, l_rate)

    • Avatar
      James Carmichael April 6, 2022 at 8:42 am #

      Thank you for the feedback Shraddha!

  307. Avatar
    Rahul May 14, 2022 at 5:22 pm #

    Hello James,
    I am working on a project with 9 types of variables and 1 output data. I want to use ANN to get weightage for each variables. I have tried this but I got only an Error in output and the expected data did not find equivalent weights for each individual.
    Please help me.

    • Avatar
      James Carmichael May 15, 2022 at 10:57 am #

  308. Avatar
    nicolas May 18, 2022 at 10:12 pm #

    Hello James,
    I am working on a project with 20 entries. But when I change the data packet it gives me errors in the code.
    Please help me!
    the data that i tried to put
    [[1539.64, 1006.43, 1549539.885],
    [1537.79, 1004.97, 1545432.816],
    [1535.63, 1003.84, 1541526.819],
    [1533.79, 1002.87645, 1538201.87],
    [1531.65, 1001.80229, 1534410.477],
    [1530.26316, 1000.99, 1531778.121],
    [1528.75778, 1000.46, 1529461.009],
    [1527.07, 999.89813, 1526914.437],
    [1525.76684, 999.40577, 1524860.184],
    [1524.24165, 999.11715, 1522895.973],
    [1523.03339, 998.80306, 1521210.41],
    [1521.88455, 998.56537, 1519701.209],
    [1520.41, 998.26825, 1517777.03],
    [1519.46802, 998.13243, 1516630.307],
    [1518.08149, 997.87776, 1514859.757],
    [1516.89304, 997.7, 1513404.186],
    [1515.94228, 997.6, 1512304.019],
    [1514.99151, 997.48, 1511173.731],
    [1514.15959, 997.32, 1510101.642],
    [1513.24844, 997.1, 1508860.02],
    [1512.32, 996.97637, 1507747.304]]

    • Avatar
      Hi Nicolas…Please specify what errors you are encountering so that we may better assist you.

  309. Avatar
    nicolas May 19, 2022 at 8:31 am #

    Hi! James
    Thank you for your time
    This is the code that I used
    And I changed the data but I’m having some errors in it
    Could you please help me?


    from math import exp
    from random import seed
    from random import random

    # Initialize a network
    def initialize_network(n_inputs, n_hidden, n_outputs):
        network = list()
        hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer)
        output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
        network.append(output_layer)
        return network

    # Calculate neuron activation for an input
    def activate(weights, inputs):
        activation = weights[-1]
        for i in range(len(weights) - 1):
            activation += weights[i] * inputs[i]
        return activation

    # Transfer neuron activation
    def transfer(activation):
        return 1.0 / (1.0 + exp(-activation))

    # Forward propagate input to a network output
    def forward_propagate(network, row):
        inputs = row
        for layer in network:
            new_inputs = []
            for neuron in layer:
                activation = activate(neuron['weights'], inputs)
                neuron['output'] = transfer(activation)
                new_inputs.append(neuron['output'])
            inputs = new_inputs
        return inputs

    # Calculate the derivative of an neuron output
    def transfer_derivative(output):
        return output * (1.0 - output)

    # Backpropagate error and store in neurons
    def backward_propagate_error(network, expected):
        for i in reversed(range(len(network))):
            layer = network[i]
            errors = list()
            if i != len(network) - 1:
                for j in range(len(layer)):
                    error = 0.0
                    for neuron in network[i + 1]:
                        error += (neuron['weights'][j] * neuron['delta'])
                    errors.append(error)
            else:
                for j in range(len(layer)):
                    neuron = layer[j]
                    errors.append(neuron['output'] - expected[j])
            for j in range(len(layer)):
                neuron = layer[j]
                neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

    # Update network weights with error
    def update_weights(network, row, l_rate):
        for i in range(len(network)):
            inputs = row[:-1]
            if i != 0:
                inputs = [neuron['output'] for neuron in network[i - 1]]
            for neuron in network[i]:
                for j in range(len(inputs)):
                    neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
                neuron['weights'][-1] -= l_rate * neuron['delta']

    # Train a network for a fixed number of epochs
    def train_network(network, train, l_rate, n_epoch, n_outputs):
        for epoch in range(n_epoch):
            sum_error = 0
            for row in train:
                outputs = forward_propagate(network, row)
                expected = [0 for i in range(n_outputs)]
                expected[row[-1]] = 1
                sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
                backward_propagate_error(network, expected)
                update_weights(network, row, l_rate)
            print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

    # Test training backprop algorithm
    dataset = [[1539.64, 1006.43, 1549539.885],
        [1537.79, 1004.97, 1545432.816],
        [1535.63, 1003.84, 1541526.819],
        [1533.79, 1002.87645, 1538201.87],
        [1531.65, 1001.80229, 1534410.477],
        [1530.26316, 1000.99, 1531778.121],
        [1528.75778, 1000.46, 1529461.009],
        [1527.07, 999.89813, 1526914.437],
        [1525.76684, 999.40577, 1524860.184],
        [1524.24165, 999.11715, 1522895.973],
        [1523.03339, 998.80306, 1521210.41],
        [1521.88455, 998.56537, 1519701.209],
        [1520.41, 998.26825, 1517777.03],
        [1519.46802, 998.13243, 1516630.307],
        [1518.08149, 997.87776, 1514859.757],
        [1516.89304, 997.7, 1513404.186],
        [1515.94228, 997.6, 1512304.019],
        [1514.99151, 997.48, 1511173.731],
        [1514.15959, 997.32, 1510101.642],
        [1513.24844, 997.1, 1508860.02],
        [1512.32, 996.97637, 1507747.304]]
    n_inputs = len(dataset[0]) - 1
    n_outputs = len(set([row[-1] for row in dataset]))
    network = initialize_network(n_inputs, 2, n_outputs)
    train_network(network, dataset, 0.5, 20, n_outputs)
    for layer in network:
        print(layer)


    Traceback (most recent call last):
    File “C:\Users\Coder\Downloads\MLP_v1_1\”, line 119, in
    train_network(network, dataset, 0.5, 20, n_outputs)
    File “C:\Users\Coder\Downloads\MLP_v1_1\”, line 86, in train_network
    expected[row[-1]] = 1
    TypeError: list indices must be integers or slices, not float

      James Carmichael May 20, 2022 at 11:26 pm #

      Hi Nicolas…I do not see any issues from your code listing, however there could be formatting issues related to your code environment that are not readily apparent. Can you try the code in Google Colab?
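
The traceback in this thread points at `expected[row[-1]] = 1`, which uses the last column of each row as a list index. In the posted dataset that column holds floats such as 1549539.885, so one likely cause is that the data describes a continuous target rather than integer class labels. A minimal sketch reproducing the error; the two-element `expected` list and the `non_integer_labels` helper are hypothetical and used only for illustration.

```python
# One row from the posted dataset; the last value is a float, not a class index.
row = [1539.64, 1006.43, 1549539.885]
expected = [0, 0]  # arbitrary length, for illustration only

try:
    expected[row[-1]] = 1
    message = None
except TypeError as err:
    message = str(err)

def non_integer_labels(dataset):
    """Hypothetical check: class values that cannot be used as a one-hot index."""
    return [r[-1] for r in dataset if not isinstance(r[-1], int)]

bad = non_integer_labels([row, [1.0, 2.0, 0]])
```

The tutorial's training loop one-hot encodes the class, so it expects the last column to contain integers in the range 0 to n_outputs - 1. A continuous target like the one above would need a different output encoding and error calculation.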

      • Avatar
        nicolas May 25, 2022 at 8:02 pm #

  311. Avatar
    wafiq June 1, 2022 at 5:12 pm #

  312. Avatar
    NOOR AMIRAH June 2, 2022 at 10:29 am #

  313. Avatar
    Noor Amirah June 2, 2022 at 10:30 am #

  314. Avatar
    Noor Amirah June 2, 2022 at 10:32 am #

    I meant, do you have code for training on a dataset that uses the Fletcher-Reeves method?

    • Avatar
      James Carmichael June 3, 2022 at 9:14 am #

      Hi Noor…Did you try to implement the code listings that were provided in the tutorial?

      • Avatar
        Noor Amirah June 8, 2022 at 2:26 am #

        Yes, I did, but my project is about backpropagation with Fletcher-Reeves. Do you have the code for that?

    Eduardo M August 2, 2022 at 6:57 am #

    This line in the last for loop in the backpropagation function:

    neuron[‘delta’] = errors[j] * transfer_derivative(neuron[‘output’])

  316. Avatar
    Confused Coder August 5, 2022 at 3:33 am #

    Why do you add an extra weight to the hidden and output layers?

    • Avatar
      James Carmichael August 5, 2022 at 9:36 am #

  317. Avatar
    Amir Vahedi August 8, 2022 at 6:45 pm #

    I have a question:
    Why haven’t you used any Python libraries such as NumPy and Pandas for this implementation?
    Why haven’t some nested loops been simplified with vectorization?
    By doing these, I bet the implementation would become simpler and also more efficient.
    If you left these out on purpose, I am eager to know your reasons.

    Anyway, this post helped me a lot to understand the implementation behind the neural network, Thank you????

    • Avatar
      James Carmichael August 9, 2022 at 10:08 am #

  318. Avatar
    willow September 21, 2022 at 6:33 pm #

  319. Avatar
    Efemena January 13, 2023 at 11:46 pm #

  320. Avatar
    Efemena January 13, 2023 at 11:48 pm #

  321. Avatar
    Hapsoro March 21, 2023 at 10:33 am #

    Hi James. Thanks for the tutorial, I really appreciate it. There is one thing that confuses me, though: can you show how this works for an output of 3 neurons, especially the n_outputs part?

    n_outputs = len([row[-1] for row in dataset])

    and my dataset

    [[0.21,0.34, 0.65,0, 0 ,1],
    [0.55, 0.67, 0.19, 0, 1, 0],
    [0.77, 0.20, 0.31, 1, 0, 0]]

    Thanks in advance
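
One possible way to handle a dataset like this, sketched under assumptions: the training code quoted earlier in this thread indexes the expected vector with `expected[row[-1]] = 1` and counts classes with `len(set(...))`, so it appears to expect a single integer class label in the last column. The rows above instead carry three one-hot target columns. The helper `onehot_to_label` below is hypothetical (it is not part of the tutorial) and collapses those columns into one class index.

```python
# Hypothetical helper: replace the trailing n_classes one-hot columns
# of every row with a single integer class index.
def onehot_to_label(dataset, n_classes):
    converted = []
    for row in dataset:
        features = list(row[:-n_classes])
        onehot = list(row[-n_classes:])
        # Position of the 1 in the one-hot vector is the class index.
        converted.append(features + [onehot.index(1)])
    return converted

dataset = [[0.21, 0.34, 0.65, 0, 0, 1],
           [0.55, 0.67, 0.19, 0, 1, 0],
           [0.77, 0.20, 0.31, 1, 0, 0]]

converted = onehot_to_label(dataset, 3)

# Same counting approach as in the code quoted earlier in the thread.
n_inputs = len(converted[0]) - 1
n_outputs = len(set(row[-1] for row in converted))
```

Note that `len([row[-1] for row in dataset])` without `set` counts rows rather than distinct classes; it only equals 3 here because the sample happens to have three rows.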

    • Avatar
      Hi Hapsoro… Thank you for your feedback! I am trying to understand your question. Did you execute your code? If so, what were your results?

  322. Avatar
    Kentaro March 29, 2023 at 7:38 am #

    Please add indentation for things besides just the functions, as the lack of indentation makes the code very hard to read.

    • Avatar
      James Carmichael March 30, 2023 at 7:10 am #

  323. Avatar
    Bhaskar September 21, 2023 at 7:42 pm #

    I need perfect code for a feed-forward neural network in R.

  324. Avatar
    Jim October 27, 2023 at 4:21 am #

    I get the following error:

    derivative = output * (1 - output)
    TypeError: unsupported operand type(s) for -: 'int' and 'list'
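
A plausible cause, offered as a guess since the surrounding code is not shown: this message appears when `output` is a list rather than a single number, for example when a whole layer's outputs (or a nested list) is passed where one neuron's output is expected. The sketch below reproduces the error with a stand-in `transfer_derivative` and shows the per-element call that avoids it; the sample values are made up.

```python
# Stand-in for a sigmoid derivative that expects one float per call.
def transfer_derivative(output):
    return output * (1 - output)

layer_outputs = [0.7, 0.2]  # made-up outputs for two neurons

# Passing the whole list reproduces the reported error.
try:
    transfer_derivative(layer_outputs)
    message = None
except TypeError as exc:
    message = str(exc)

# Calling it once per neuron output works as intended.
per_neuron = [transfer_derivative(o) for o in layer_outputs]
```

Printing `type(output)` just before the failing line is a quick way to confirm whether a list is reaching the function.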

  325. Avatar
    Jim October 27, 2023 at 4:41 am #

    same with this code:

    error = (output - expected) * transfer_derivative(output)

  326. Avatar
    Michael Roy Ames October 29, 2023 at 10:54 am #

    Thank you very much for this well-written tutorial, Jason. I quite enjoyed figuring it all out, though it took me a couple of weeks to get up to speed on the terminology and make it happen.

    After completing the basic assignment, I updated the code and tried:
    a) different seeds, learning rates, and epochs
    b) additional transfer functions: tanh, and gaussian
    c) multiple hidden layers
    d) multiple hidden layers of different sizes (numbers of neurons)
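
For readers curious about item (b), here is a minimal sketch of what a tanh transfer function and its derivative could look like. The function names are illustrative and not taken from the commenter's code; the derivative is written in terms of the neuron's output, mirroring the `output * (1 - output)` form quoted elsewhere in this thread for the sigmoid.

```python
from math import tanh

# tanh squashes the activation into the range (-1, 1).
def transfer_tanh(activation):
    return tanh(activation)

# d/dx tanh(x) = 1 - tanh(x)^2, expressed using the neuron's output.
def transfer_tanh_derivative(output):
    return 1.0 - output ** 2

sample_output = transfer_tanh(0.5)
sample_slope = transfer_tanh_derivative(sample_output)
```

Because tanh outputs can be negative, targets encoded as 0 and 1 may train differently than with a sigmoid, so the learning rate and target encoding may need adjusting.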

    One thing that got me stuck was the lack of a good visualization tool for viewing the network of layers as they are initialized and trained. I coded a primitive one to troubleshoot and improve my understanding, but there must be something better out there… any suggestions?

    Now I am looking forward to reading more of your (many!) books – and learning as much as I can.

  327. Avatar
    givonz November 8, 2023 at 5:55 am #

  328. Avatar
    givonz November 8, 2023 at 5:58 am #

    You should add a note that readers need to toggle to the code view to get properly formatted Python code.

    • Avatar
      James Carmichael November 8, 2023 at 10:01 am #

  329. Avatar
    Matthew December 1, 2023 at 12:20 pm #
