How to Code a Neural Network with Backpropagation In Python (from scratch)

By Jason Brownlee on October 22, 2021 in Code Algorithms From Scratch

The backpropagation algorithm is used in the classical feed-forward artificial neural network.

It is the technique still used to train large deep learning networks.

In this tutorial, you will discover how to implement the backpropagation algorithm for a neural network from scratch with Python.

After completing this tutorial, you will know:

How to forward-propagate an input to calculate an output.
How to back-propagate error and train a network.
How to apply the backpropagation algorithm to a real-world predictive modeling problem.

Kick-start your project with my new book Machine Learning Algorithms From Scratch, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update Nov/2016: Fixed a bug in the activate() function. Thanks Alex!
Update Jan/2017: Fixes issues with Python 3.
Update Jan/2017: Updated small bug in update_weights(). Thanks Tomasz!
Update Apr/2018: Added direct link to CSV dataset.
Update Aug/2018: Tested and updated to work with Python 3.6.
Update Sep/2019: Updated wheat-seeds.csv to fix formatting issues.
Update Oct/2021: Reverse the sign of error to be consistent with other literature.

How to Implement the Backpropagation Algorithm From Scratch In Python
Photo by NICHD, some rights reserved.

Description

This section provides a brief introduction to the Backpropagation Algorithm and the Wheat Seeds dataset that we will be using in this tutorial.

Backpropagation Algorithm

The Backpropagation algorithm is a supervised learning method for multilayer feed-forward networks from the field of Artificial Neural Networks.

Feed-forward neural networks are inspired by the information processing of one or more neural cells, called neurons. A neuron accepts input signals via its dendrites, which pass the electrical signal down to the cell body. The axon carries the signal out to synapses, which are the connections of a cell’s axon to other cells’ dendrites.

The principle of the backpropagation approach is to model a given function by modifying internal weightings of input signals to produce an expected output signal. The system is trained using a supervised learning method, where the error between the system’s output and a known expected output is presented to the system and used to modify its internal state.

Technically, the backpropagation algorithm is a method for training the weights in a multilayer feed-forward neural network. As such, it requires a network structure to be defined of one or more layers where one layer is fully connected to the next layer. A standard network structure is one input layer, one hidden layer, and one output layer.

Backpropagation can be used for both classification and regression problems, but we will focus on classification in this tutorial.

In classification problems, best results are achieved when the network has one neuron in the output layer for each class value. For example, consider a 2-class (binary) classification problem with the class values A and B. The expected outputs would have to be transformed into binary vectors with one column for each class value, such as [1, 0] and [0, 1] for A and B respectively. This is called a one-hot encoding.
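As a minimal sketch of that encoding, the snippet below maps each class value to its binary vector. The helper name one_hot() is hypothetical and is not part of the tutorial code.

```python
# Hypothetical helper (not part of the tutorial code): one-hot encode a class label
def one_hot(label, classes):
    # One column per class value; a 1 marks the position of the given label
    return [1 if c == label else 0 for c in classes]

classes = ['A', 'B']
for label in classes:
    print(label, one_hot(label, classes))
```

Each vector has exactly one 1, in the column that belongs to its class.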

Wheat Seeds Dataset

The seeds dataset involves the prediction of species given measurements of seeds from different varieties of wheat.

There are 201 records and 7 numerical input variables. It is a classification problem with 3 output classes. The scale of each numeric input value varies, so some data normalization may be required for use with algorithms that weight inputs, like the backpropagation algorithm.

Below is a sample of the first 5 rows of the dataset.

15.26,14.84,0.871,5.763,3.312,2.221,5.22,1
14.88,14.57,0.8811,5.554,3.333,1.018,4.956,1
14.29,14.09,0.905,5.291,3.337,2.699,4.825,1
13.84,13.94,0.8955,5.324,3.379,2.259,4.805,1
16.14,14.99,0.9034,5.658,3.562,1.355,5.175,1

Using the Zero Rule algorithm that predicts the most common class value, the baseline accuracy for the problem is 28.095%.
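For readers unfamiliar with the Zero Rule baseline, here is a minimal sketch of the idea. The helper name zero_rule_accuracy() and the label list are made up for illustration; they are not the seeds data and do not reproduce the figure quoted above.

```python
from collections import Counter

# Hypothetical helper: Zero Rule baseline on a list of class labels
def zero_rule_accuracy(labels):
    # Always predict the most common class value
    majority, count = Counter(labels).most_common(1)[0]
    return majority, 100.0 * count / len(labels)

labels = [1, 1, 1, 2, 2, 3]  # made-up labels, for illustration only
majority, accuracy = zero_rule_accuracy(labels)
print('majority class=%s, baseline accuracy=%.3f%%' % (majority, accuracy))
```

Any model worth keeping should beat the accuracy this baseline reports on the same data.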

You can learn more and download the seeds dataset from the UCI Machine Learning Repository.

Download the seeds dataset and place it into your current working directory with the filename seeds_dataset.csv.

The dataset is in tab-separated format, so you must convert it to CSV using a text editor or a spreadsheet program.

Update, download the dataset in CSV format directly:

Download Wheat Seeds Dataset

Tutorial

This tutorial is broken down into 6 parts:

Initialize Network.
Forward Propagate.
Back Propagate Error.
Train Network.
Predict.
Seeds Dataset Case Study.

These steps will provide the foundation that you need to implement the backpropagation algorithm from scratch and apply it to your own predictive modeling problems.

1. Initialize Network

Let’s start with something easy, the creation of a new network ready for training.

Each neuron has a set of weights that need to be maintained: one weight for each input connection and an additional weight for the bias. We will need to store additional properties for a neuron during training, so we will use a dictionary to represent each neuron and store properties by names such as ‘weights’ for the weights.

A network is organized into layers. The input layer is really just a row from our training dataset. The first real layer is the hidden layer. This is followed by the output layer that has one neuron for each class value.

We will organize layers as arrays of dictionaries and treat the whole network as an array of layers.

It is good practice to initialize the network weights to small random numbers. In this case, we will use random numbers in the range of 0 to 1.

Below is a function named initialize_network() that creates a new neural network ready for training. It accepts three parameters, the number of inputs, the number of neurons to have in the hidden layer and the number of outputs.

You can see that for the hidden layer we create n_hidden neurons and each neuron in the hidden layer has n_inputs + 1 weights, one for each input column in a dataset and an additional one for the bias.

You can also see that the output layer that connects to the hidden layer has n_outputs neurons, each with n_hidden + 1 weights. This means that each neuron in the output layer connects to (has a weight for) each neuron in the hidden layer.

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
	network = list()
	hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
	network.append(hidden_layer)
	output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
	network.append(output_layer)
	return network

Let’s test out this function. Below is a complete example that creates a small network.

from random import seed
from random import random

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
	network = list()
	hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
	network.append(hidden_layer)
	output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
	network.append(output_layer)
	return network

seed(1)
network = initialize_network(2, 1, 2)
for layer in network:
	print(layer)

Running the example, you can see that the code prints out each layer one by one. You can see the hidden layer has one neuron with 2 input weights plus the bias. The output layer has 2 neurons, each with 1 weight plus the bias.

[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}]
[{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381, 0.651592972722763]}]
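A quick way to sanity-check a freshly created network is to count the weights in each layer and compare against n_hidden * (n_inputs + 1) and n_outputs * (n_hidden + 1). The sketch below does this; count_weights() is a hypothetical helper added for illustration only.

```python
from random import seed
from random import random

# Same initialization as in the tutorial
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Hypothetical helper: number of weights (including biases) in each layer
def count_weights(network):
    return [sum(len(neuron['weights']) for neuron in layer) for layer in network]

seed(1)
network = initialize_network(2, 1, 2)
# Hidden layer: 1 * (2 + 1) weights, output layer: 2 * (1 + 1) weights
print(count_weights(network))
```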

Now that we know how to create and initialize a network, let’s see how we can use it to calculate an output.

2. Forward Propagate

We can calculate an output from a neural network by propagating an input signal through each layer until the output layer outputs its values.

We call this forward-propagation.

It is the technique we will use to generate predictions during training, which will then need to be corrected, and it is the method we will need after the network is trained to make predictions on new data.

We can break forward propagation down into three parts:

Neuron Activation.
Neuron Transfer.
Forward Propagation.

2.1. Neuron Activation

The first step is to calculate the activation of one neuron given an input.

The input could be a row from our training dataset, as in the case of the hidden layer. It may also be the outputs from each neuron in the hidden layer, in the case of the output layer.

Neuron activation is calculated as the weighted sum of the inputs, much like linear regression.

activation = sum(weight_i * input_i) + bias

Where weight is a network weight, input is an input, i is the index of a weight or an input and bias is a special weight that has no input to multiply with (or you can think of the input as always being 1.0).

Below is an implementation of this in a function named activate(). You can see that the function assumes that the bias is the last weight in the list of weights. This helps here and later to make the code easier to read.

# Calculate neuron activation for an input
def activate(weights, inputs):
	activation = weights[-1]
	for i in range(len(weights)-1):
		activation += weights[i] * inputs[i]
	return activation
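As a small worked example, the sketch below applies activate() to the hidden neuron weights printed by the initialization example, using the input pattern [1, 0]. The choice of input is an assumption made for illustration.

```python
# Calculate neuron activation for an input (as defined in the tutorial)
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Hidden neuron weights from the initialization example; the last value is the bias
weights = [0.13436424411240122, 0.8474337369372327, 0.763774618976614]
inputs = [1, 0]
# Computes bias + weights[0] * 1 + weights[1] * 0
print(activate(weights, inputs))
```

Because the second input is 0, only the first weight and the bias contribute to the result.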

Now, let’s see how to use the neuron activation.

2.2. Neuron Transfer

Once a neuron is activated, we need to transfer the activation to see what the neuron output actually is.

Different transfer functions can be used. It is traditional to use the sigmoid activation function, but you can also use the tanh (hyperbolic tangent) function to transfer outputs. More recently, the rectifier transfer function has been popular with large deep learning networks.

The sigmoid activation function has an S shape and is also called the logistic function. It can take any input value and produce a number between 0 and 1 on an S-curve. It is also a function whose derivative (slope) we can easily calculate, which we will need later when backpropagating error.

We can transfer an activation using the sigmoid function as follows:

output = 1 / (1 + e^(-activation))

Where e is the base of the natural logarithms (Euler’s number).

Below is a function named transfer() that implements the sigmoid equation.

# Transfer neuron activation
def transfer(activation):
	return 1.0 / (1.0 + exp(-activation))
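One practical caveat: in Python, exp() raises an OverflowError for very large arguments, so transfer() can fail if an activation is a large negative number (for example with unscaled inputs). The variant below is a sketch of one common workaround, not part of the tutorial code; the name stable_transfer() is hypothetical.

```python
from math import exp

# Hypothetical alternative to transfer(): a sigmoid that avoids overflow in exp()
def stable_transfer(activation):
    if activation >= 0:
        # exp(-activation) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + exp(-activation))
    # For negative activations, use the equivalent form e^a / (1 + e^a)
    z = exp(activation)
    return z / (1.0 + z)

for activation in [-1000.0, -2.0, 0.0, 2.0, 1000.0]:
    print(activation, stable_transfer(activation))
```

Both branches compute the same mathematical function, so this can be swapped in for transfer() without changing the rest of the code.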

Now that we have the pieces, let’s see how they are used.

2.3. Forward Propagation

Forward propagating an input is straightforward.

We work through each layer of our network calculating the outputs for each neuron. All of the outputs from one layer become inputs to the neurons on the next layer.

Below is a function named forward_propagate() that implements the forward propagation for a row of data from our dataset with our neural network.

You can see that a neuron’s output value is stored in the neuron with the name ‘output‘. You can also see that we collect the outputs for a layer in an array named new_inputs that becomes the array inputs and is used as inputs for the following layer.

The function returns the outputs from the last layer also called the output layer.

# Forward propagate input to a network output
def forward_propagate(network, row):
	inputs = row
	for layer in network:
		new_inputs = []
		for neuron in layer:
			activation = activate(neuron['weights'], inputs)
			neuron['output'] = transfer(activation)
			new_inputs.append(neuron['output'])
		inputs = new_inputs
	return inputs

Let’s put all of these pieces together and test out the forward propagation of our network.

We define our network inline with one hidden neuron that expects 2 input values and an output layer with two neurons.

from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
	activation = weights[-1]
	for i in range(len(weights)-1):
		activation += weights[i] * inputs[i]
	return activation

# Transfer neuron activation
def transfer(activation):
	return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
	inputs = row
	for layer in network:
		new_inputs = []
		for neuron in layer:
			activation = activate(neuron['weights'], inputs)
			neuron['output'] = transfer(activation)
			new_inputs.append(neuron['output'])
		inputs = new_inputs
	return inputs

# test forward propagation
network = [[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
		[{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381, 0.651592972722763]}]]
row = [1, 0, None]
output = forward_propagate(network, row)
print(output)

Running the example propagates the input pattern [1, 0] and produces an output value that is printed. Because the output layer has two neurons, we get a list of two numbers as output.

The actual output values are just nonsense for now, but next, we will start to learn how to make the weights in the neurons more useful.

[0.6629970129852887, 0.7253160725279748]

3. Back Propagate Error

The backpropagation algorithm is named for the way in which weights are trained.

Error is calculated between the expected outputs and the outputs forward propagated from the network. These errors are then propagated backward through the network from the output layer to the hidden layer, assigning blame for the error and updating weights as they go.

The math for backpropagating error is rooted in calculus, but we will remain high level in this section and focus on what is calculated and how rather than why the calculations take this particular form.

This part is broken down into two sections.

Transfer Derivative.
Error Backpropagation.

3.1. Transfer Derivative

Given an output value from a neuron, we need to calculate its slope.

We are using the sigmoid transfer function, the derivative of which can be calculated as follows:

derivative = output * (1.0 - output)

Below is a function named transfer_derivative() that implements this equation.

# Calculate the derivative of a neuron output
def transfer_derivative(output):
	return output * (1.0 - output)
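If you want to convince yourself that output * (1.0 - output) really is the slope of the sigmoid, a quick numerical check is to compare it with a central finite difference. This is a sketch for illustration; the step size h and the test points are arbitrary choices.

```python
from math import exp

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    return output * (1.0 - output)

# Compare the analytic slope with a central finite difference at a few points
h = 1e-6  # step size chosen for illustration
for activation in [-2.0, 0.0, 0.5, 3.0]:
    analytic = transfer_derivative(transfer(activation))
    numeric = (transfer(activation + h) - transfer(activation - h)) / (2 * h)
    print('activation=%.1f analytic=%.6f numeric=%.6f' % (activation, analytic, numeric))
```

Note that transfer_derivative() takes the neuron's output, not its activation, which is why the output is stored in each neuron during forward propagation.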

Now, let’s see how this can be used.

3.2. Error Backpropagation

The first step is to calculate the error for each output neuron. This gives us the error signal (input) to propagate backwards through the network.

The error for a given neuron can be calculated as follows:

error = (output - expected) * transfer_derivative(output)

Where expected is the expected output value for the neuron, output is the output value for the neuron and transfer_derivative() calculates the slope of the neuron’s output value, as shown above.

This error calculation is used for neurons in the output layer. The expected value is the neuron’s entry in the one-hot encoded class value. In the hidden layer, things are a little more complicated.

The error signal for a neuron in the hidden layer is calculated as the weighted error of each neuron in the output layer. Think of the error traveling back along the weights of the output layer to the neurons in the hidden layer.

The back-propagated error signal is accumulated and then used to determine the error for the neuron in the hidden layer, as follows:

error = sum(weight_k * error_k) * transfer_derivative(output)

Where error_k is the error signal from the kth neuron in the output layer, weight_k is the weight that connects the current neuron to that kth neuron, the sum runs over all neurons k in the output layer, and output is the output for the current neuron.
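To make the accumulation concrete, the sketch below computes the error signal for the single hidden neuron of the small test network used in this section, using the weights and output-layer error signals that appear in its printed results. The variable names are chosen for illustration.

```python
def transfer_derivative(output):
    return output * (1.0 - output)

# Output of the hidden neuron in the small test network
hidden_output = 0.7105668883115941
# (weight from the hidden neuron into an output neuron, that output neuron's error signal)
connections = [
    (0.2550690257394217, 0.14619064683582808),
    (0.4494910647887381, -0.0771723774346327),
]

# Accumulate the weighted error signal arriving from the output layer
error = 0.0
for weight, delta in connections:
    error += weight * delta

# Scale by the slope of the hidden neuron's own output
hidden_delta = error * transfer_derivative(hidden_output)
print(hidden_delta)
```

The two output errors have opposite signs and largely cancel, which is why the hidden neuron ends up with a much smaller error signal than either output neuron.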

Below is a function named backward_propagate_error() that implements this procedure.

You can see that the error signal calculated for each neuron is stored with the name ‘delta’. You can see that the layers of the network are iterated in reverse order, starting at the output and working backwards. This ensures that the neurons in the output layer have ‘delta’ values calculated first that neurons in the hidden layer can use in the subsequent iteration. I chose the name ‘delta’ to reflect the change the error implies on the neuron (e.g. the weight delta).

You can see that the error signal for neurons in the hidden layer is accumulated from neurons in the output layer where the hidden neuron number j is also the index of the neuron’s weight in the output layer neuron[‘weights’][j].

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
	for i in reversed(range(len(network))):
		layer = network[i]
		errors = list()
		if i != len(network)-1:
			for j in range(len(layer)):
				error = 0.0
				for neuron in network[i + 1]:
					error += (neuron['weights'][j] * neuron['delta'])
				errors.append(error)
		else:
			for j in range(len(layer)):
				neuron = layer[j]
				errors.append(neuron['output'] - expected[j])
		for j in range(len(layer)):
			neuron = layer[j]
			neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

Let’s put all of the pieces together and see how it works.

We define a fixed neural network with output values and backpropagate an expected output pattern. The complete example is listed below.

# Calculate the derivative of a neuron output
def transfer_derivative(output):
	return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
	for i in reversed(range(len(network))):
		layer = network[i]
		errors = list()
		if i != len(network)-1:
			for j in range(len(layer)):
				error = 0.0
				for neuron in network[i + 1]:
					error += (neuron['weights'][j] * neuron['delta'])
				errors.append(error)
		else:
			for j in range(len(layer)):
				neuron = layer[j]
				errors.append(neuron['output'] - expected[j])
		for j in range(len(layer)):
			neuron = layer[j]
			neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# test backpropagation of error
network = [[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
		[{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095]}, {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763]}]]
expected = [0, 1]
backward_propagate_error(network, expected)
for layer in network:
	print(layer)

Running the example prints the network after the backpropagation of error is complete. You can see that error values are calculated and stored in the neurons for the output layer and the hidden layer.

[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614], 'delta': 0.0005348048046610517}]
[{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095], 'delta': 0.14619064683582808}, {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763], 'delta': -0.0771723774346327}]

Now let’s use the backpropagation of error to train the network.

4. Train Network

The network is trained using stochastic gradient descent.

This involves multiple iterations of exposing a training dataset to the network and for each row of data forward propagating the inputs, backpropagating the error and updating the network weights.

This part is broken down into two sections:

Update Weights.
Train Network.

4.1. Update Weights

Once errors are calculated for each neuron in the network via the backpropagation method above, they can be used to update weights.

Network weights are updated as follows:

weight = weight - learning_rate * error * input

Where weight is a given weight, learning_rate is a parameter that you must specify, error is the error calculated by the backpropagation procedure for the neuron and input is the input value that caused the error.

The same procedure can be used for updating the bias weight, except there is no input term, or input is the fixed value of 1.0.

Learning rate controls how much to change the weight to correct for the error. For example, a value of 0.1 will update the weight by 10% of the amount that it possibly could be updated. Small learning rates are preferred because they cause slower learning over a large number of training iterations. This increases the likelihood of the network finding a good set of weights across all layers, rather than quickly settling on a set of weights that only appears to minimize error (called premature convergence).
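The effect of the learning rate is easy to see on a single weight. The sketch below applies the update rule once at three different rates, borrowing a weight, error signal and input value from the small test network for illustration; the particular rates are arbitrary.

```python
# Values borrowed from the small test network, for illustration
weight = 0.2550690257394217        # a weight into an output neuron
delta = 0.14619064683582808        # error signal stored for that neuron
input_value = 0.7105668883115941   # output of the hidden neuron feeding this weight

for l_rate in [0.01, 0.1, 0.5]:
    # weight = weight - learning_rate * error * input
    new_weight = weight - l_rate * delta * input_value
    print('l_rate=%.2f weight: %.6f -> %.6f' % (l_rate, weight, new_weight))
```

The direction of the change is the same in every case; only the size of the step differs.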

Below is a function named update_weights() that updates the weights for a network given an input row of data and a learning rate. It assumes that a forward and backward propagation have already been performed.

Remember that the input for the output layer is a collection of outputs from the hidden layer.

# Update network weights with error
def update_weights(network, row, l_rate):
	for i in range(len(network)):
		inputs = row[:-1]
		if i != 0:
			inputs = [neuron['output'] for neuron in network[i - 1]]
		for neuron in network[i]:
			for j in range(len(inputs)):
				neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
			neuron['weights'][-1] -= l_rate * neuron['delta']

Now that we know how to update network weights, let’s see how we can do it repeatedly.

4.2. Train Network

As mentioned, the network is updated using stochastic gradient descent.

This involves first looping for a fixed number of epochs and within each epoch updating the network for each row in the training dataset.

Because updates are made for each training pattern, this type of learning is called online learning. If errors were instead accumulated across an epoch before updating the weights, it would be called batch learning or batch gradient descent.
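The difference between the two schemes can be sketched on a deliberately tiny problem. The example below uses a hypothetical one-weight linear model and made-up data rather than the neural network, so that each step is easy to follow by hand.

```python
# Made-up data for a one-weight model y = w * x (the true w is 2.0)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def online_epoch(w, data, l_rate):
    # Online learning: update the weight immediately after each training row
    for x, y in data:
        w -= l_rate * (w * x - y) * x
    return w

def batch_epoch(w, data, l_rate):
    # Batch learning: accumulate the error gradient over all rows, then update once
    total = 0.0
    for x, y in data:
        total += (w * x - y) * x
    return w - l_rate * total

w_online = online_epoch(0.0, data, 0.05)
w_batch = batch_epoch(0.0, data, 0.05)
print('online: %.3f, batch: %.3f' % (w_online, w_batch))
```

Both schemes move the weight toward 2.0 after one epoch, but they arrive at different values because the online version already uses the updated weight when it processes the next row.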

Below is a function that implements the training of an already initialized neural network with a given training dataset, learning rate, fixed number of epochs and an expected number of output values.

The expected number of output values is used to transform class values in the training data into a one hot encoding. That is a binary vector with one column for each class value to match the output of the network. This is required to calculate the error for the output layer.

You can also see that the sum squared error between the expected output and the network output is accumulated each epoch and printed. This is helpful to create a trace of how much the network is learning and improving each epoch.

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
	for epoch in range(n_epoch):
		sum_error = 0
		for row in train:
			outputs = forward_propagate(network, row)
			expected = [0 for i in range(n_outputs)]
			expected[row[-1]] = 1
			sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
			backward_propagate_error(network, expected)
			update_weights(network, row, l_rate)
		print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
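To make the error trace concrete, the sketch below computes the sum squared error contribution of a single row by hand. It reuses the two output values printed by the forward propagation example and assumes, purely for illustration, that the row belongs to class 1.

```python
# Network outputs from the forward propagation example
outputs = [0.6629970129852887, 0.7253160725279748]
n_outputs = 2
class_value = 1  # assumed class label, for illustration only

# One-hot encode the class value, as train_network() does
expected = [0 for i in range(n_outputs)]
expected[class_value] = 1

# Sum squared error for this single row
row_error = sum([(expected[i] - outputs[i])**2 for i in range(len(expected))])
print(expected, row_error)
```

In train_network() this per-row value is added to sum_error for every row, and the total is printed at the end of each epoch.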

We now have all of the pieces to train the network. We can put together an example that includes everything we’ve seen so far including network initialization and train a network on a small dataset.

Below is a small contrived dataset that we can use to test out training our neural network.

X1			X2			Y
2.7810836		2.550537003		0
1.465489372		2.362125076		0
3.396561688		4.400293529		0
1.38807019		1.850220317		0
3.06407232		3.005305973		0
7.627531214		2.759262235		1
5.332441248		2.088626775		1
6.922596716		1.77106367		1
8.675418651		-0.242068655		1
7.673756466		3.508563011		1

Below is the complete example. We will use 2 neurons in the hidden layer. It is a binary classification problem (2 classes) so there will be two neurons in the output layer. The network will be trained for 20 epochs with a learning rate of 0.5, which is high because we are training for so few iterations.

from math import exp
from random import seed
from random import random

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
	network = list()
	hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
	network.append(hidden_layer)
	output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
	network.append(output_layer)
	return network

# Calculate neuron activation for an input
def activate(weights, inputs):
	activation = weights[-1]
	for i in range(len(weights)-1):
		activation += weights[i] * inputs[i]
	return activation

# Transfer neuron activation
def transfer(activation):
	return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
	inputs = row
	for layer in network:
		new_inputs = []
		for neuron in layer:
			activation = activate(neuron['weights'], inputs)
			neuron['output'] = transfer(activation)
			new_inputs.append(neuron['output'])
		inputs = new_inputs
	return inputs

# Calculate the derivative of a neuron output
def transfer_derivative(output):
	return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
	for i in reversed(range(len(network))):
		layer = network[i]
		errors = list()
		if i != len(network)-1:
			for j in range(len(layer)):
				error = 0.0
				for neuron in network[i + 1]:
					error += (neuron['weights'][j] * neuron['delta'])
				errors.append(error)
		else:
			for j in range(len(layer)):
				neuron = layer[j]
				errors.append(neuron['output'] - expected[j])
		for j in range(len(layer)):
			neuron = layer[j]
			neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):
	for i in range(len(network)):
		inputs = row[:-1]
		if i != 0:
			inputs = [neuron['output'] for neuron in network[i - 1]]
		for neuron in network[i]:
			for j in range(len(inputs)):
				neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
			neuron['weights'][-1] -= l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
	for epoch in range(n_epoch):
		sum_error = 0
		for row in train:
			outputs = forward_propagate(network, row)
			expected = [0 for i in range(n_outputs)]
			expected[row[-1]] = 1
			sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
			backward_propagate_error(network, expected)
			update_weights(network, row, l_rate)
		print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

# Test training backprop algorithm
seed(1)
dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
	print(layer)

Running the example first prints the sum squared error for each training epoch. We can see a trend of this error decreasing with each epoch.

Once trained, the network is printed, showing the learned weights. The output and delta values from the last training step are still stored in the network and can be ignored. We could update our training function to delete this data if we wanted.

>epoch=0, lrate=0.500, error=6.350
>epoch=1, lrate=0.500, error=5.531
>epoch=2, lrate=0.500, error=5.221
>epoch=3, lrate=0.500, error=4.951
>epoch=4, lrate=0.500, error=4.519
>epoch=5, lrate=0.500, error=4.173
>epoch=6, lrate=0.500, error=3.835
>epoch=7, lrate=0.500, error=3.506
>epoch=8, lrate=0.500, error=3.192
>epoch=9, lrate=0.500, error=2.898
>epoch=10, lrate=0.500, error=2.626
>epoch=11, lrate=0.500, error=2.377
>epoch=12, lrate=0.500, error=2.153
>epoch=13, lrate=0.500, error=1.953
>epoch=14, lrate=0.500, error=1.774
>epoch=15, lrate=0.500, error=1.614
>epoch=16, lrate=0.500, error=1.472
>epoch=17, lrate=0.500, error=1.346
>epoch=18, lrate=0.500, error=1.233
>epoch=19, lrate=0.500, error=1.132
[{'weights': [-1.4688375095432327, 1.850887325439514, 1.0858178629550297], 'output': 0.029980305604426185, 'delta': 0.0059546604162323625}, {'weights': [0.37711098142462157, -0.0625909894552989, 0.2765123702642716], 'output': 0.9456229000211323, 'delta': -0.0026279652850863837}]
[{'weights': [2.515394649397849, -0.3391927502445985, -0.9671565426390275], 'output': 0.23648794202357587, 'delta': 0.04270059278364587}, {'weights': [-2.5584149848484263, 1.0036422106209202, 0.42383086467582715], 'output': 0.7790535202438367, 'delta': -0.03803132596437354}]
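
As mentioned above, the stored output and delta values could be removed once training is finished. A minimal sketch of one way to do that follows. The helper name strip_training_state() is hypothetical (it is not part of the tutorial's code), and the small network used to exercise it is made up purely for illustration; it only assumes the same list-of-layers, dictionary-per-neuron format.

```python
# Hypothetical helper, not part of the tutorial's code:
# return a copy of the network that keeps only each neuron's weights.
def strip_training_state(network):
	return [[{'weights': list(neuron['weights'])} for neuron in layer] for layer in network]

# A tiny made-up network in the same format as the one printed above
network = [
	[{'weights': [0.1, 0.2, 0.3], 'output': 0.5, 'delta': 0.01}],
	[{'weights': [0.4, 0.5], 'output': 0.6, 'delta': -0.02}],
]
clean = strip_training_state(network)
for layer in clean:
	print(layer)
```

The sketch returns a new structure rather than deleting keys in place, so the original network can still be inspected after the call.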

Once a network is trained, we need to use it to make predictions.

5. Predict

Making predictions with a trained neural network is easy enough.

We have already seen how to forward-propagate an input pattern to get an output. This is all we need to do to make a prediction. We can use the output values themselves directly as the probability of a pattern belonging to each output class.

It may be more useful to turn this output back into a crisp class prediction. We can do this by selecting the class value with the largest probability. This is also called the arg max function.

Below is a function named predict() that implements this procedure. It returns the index in the network output that has the largest probability. It assumes that class values have been converted to integers starting at 0.

# Make a prediction with a network
def predict(network, row):
	outputs = forward_propagate(network, row)
	return outputs.index(max(outputs))
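
To see the arg max step in isolation, the short sketch below applies the same index-of-the-maximum expression to a few made-up output lists. The function name arg_max() and the values are illustrative assumptions, not outputs from the trained network.

```python
# Same expression that predict() uses, applied to a plain list of outputs
def arg_max(outputs):
	return outputs.index(max(outputs))

# Illustrative output vectors (assumed values, not from the tutorial's network)
print(arg_max([0.24, 0.78]))        # 1
print(arg_max([0.91, 0.05, 0.33]))  # 0
print(arg_max([0.5, 0.5]))          # 0
```

In the last case the two outputs tie. Because list.index() returns the first matching position, the lower class index is chosen.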

We can put this together with our code above for forward propagating input and with our small contrived dataset to test making predictions with an already-trained network. The example hardcodes the weights of a network trained as in the previous step.

The complete example is listed below.

from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
	activation = weights[-1]
	for i in range(len(weights)-1):
		activation += weights[i] * inputs[i]
	return activation

# Transfer neuron activation
def transfer(activation):
	return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
	inputs = row
	for layer in network:
		new_inputs = []
		for neuron in layer:
			activation = activate(neuron['weights'], inputs)
			neuron['output'] = transfer(activation)
			new_inputs.append(neuron['output'])
		inputs = new_inputs
	return inputs

# Make a prediction with a network
def predict(network, row):
	outputs = forward_propagate(network, row)
	return outputs.index(max(outputs))

# Test making predictions with the network
dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]
network = [[{'weights': [-1.482313569067226, 1.8308790073202204, 1.078381922048799]}, {'weights': [0.23244990332399884, 0.3621998343835864, 0.40289821191094327]}],
	[{'weights': [2.5001872433501404, 0.7887233511355132, -1.1026649757805829]}, {'weights': [-2.429350576245497, 0.8357651039198697, 1.0699217181280656]}]]
for row in dataset:
	prediction = predict(network, row)
	print('Expected=%d, Got=%d' % (row[-1], prediction))

Running the example prints the expected output for each record in the training dataset, followed by the crisp prediction made by the network.

It shows that the network achieves 100% accuracy on this small dataset.

Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1

Now we are ready to apply our backpropagation algorithm to a real-world dataset.

6. Wheat Seeds Dataset

This section applies the Backpropagation algorithm to the wheat seeds dataset.

The first step is to load the dataset and convert the loaded data to numbers that we can use in our neural network. For this we will use the helper function load_csv() to load the file, str_column_to_float() to convert string numbers to floats and str_column_to_int() to convert the class column to integer values.

Input values vary in scale and need to be normalized to the range between 0 and 1. It is generally good practice to normalize input values to the range of the chosen transfer function, in this case the sigmoid function, which outputs values between 0 and 1. The dataset_minmax() and normalize_dataset() helper functions are used to normalize the input values.
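
As a small worked example of the min-max rescaling described here, the sketch below normalizes a single made-up column. The function name minmax_scale() and the sample values are assumptions for illustration only; the tutorial's own helpers operate on the whole dataset.

```python
# Rescale one column of values to the range 0-1 using its min and max
def minmax_scale(values):
	lo, hi = min(values), max(values)
	return [(v - lo) / (hi - lo) for v in values]

# Made-up column of input values
column = [10.0, 15.0, 20.0]
print(minmax_scale(column))  # [0.0, 0.5, 1.0]
```

Note that this sketch, like the formula it illustrates, would divide by zero on a column where every value is the same.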

We will evaluate the algorithm using k-fold cross-validation with 5 folds. With the 210 records in the dataset, this means that 210/5 = 42 records will be in each fold. We will use the helper functions evaluate_algorithm() to evaluate the algorithm with cross-validation and accuracy_metric() to calculate the accuracy of predictions.
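
The fold size is found by truncating the number of records divided by the number of folds. The sketch below shows the effect on a made-up list of 23 rows, where the division is not exact. The function name split_into_folds() is hypothetical; it follows the same random-draw approach as the cross_validation_split() helper in the listing, but also returns the rows that end up in no fold.

```python
from random import seed, randrange

# Draw rows at random into n_folds folds of equal (truncated) size
def split_into_folds(rows, n_folds):
	remaining = list(rows)
	fold_size = int(len(rows) / n_folds)
	folds = []
	for _ in range(n_folds):
		fold = []
		while len(fold) < fold_size:
			fold.append(remaining.pop(randrange(len(remaining))))
		folds.append(fold)
	return folds, remaining

seed(1)
# 23 made-up rows split into 5 folds
folds, leftover = split_into_folds(list(range(23)), 5)
print([len(fold) for fold in folds])  # [4, 4, 4, 4, 4]
print(len(leftover))                  # 3
```

When the record count is not a multiple of the number of folds, the remaining rows are simply not used in any fold.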

A new function named back_propagation() was developed to manage the application of the Backpropagation algorithm, first initializing a network, training it on the training dataset and then using the trained network to make predictions on a test dataset.

The complete example is listed below.

# Backprop on the Seeds Dataset
from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp

# Load a CSV file
def load_csv(filename):
	dataset = list()
	with open(filename, 'r') as file:
		csv_reader = reader(file)
		for row in csv_reader:
			if not row:
				continue
			dataset.append(row)
	return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
	for row in dataset:
		row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
	class_values = [row[column] for row in dataset]
	unique = set(class_values)
	lookup = dict()
	for i, value in enumerate(unique):
		lookup[value] = i
	for row in dataset:
		row[column] = lookup[row[column]]
	return lookup

# Find the min and max values for each column
def dataset_minmax(dataset):
	return [[min(column), max(column)] for column in zip(*dataset)]

# Rescale dataset columns to the range 0-1
def normalize_dataset(dataset, minmax):
	for row in dataset:
		for i in range(len(row)-1):
			row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
	dataset_split = list()
	dataset_copy = list(dataset)
	fold_size = int(len(dataset) / n_folds)
	for i in range(n_folds):
		fold = list()
		while len(fold) < fold_size:
			index = randrange(len(dataset_copy))
			fold.append(dataset_copy.pop(index))
		dataset_split.append(fold)
	return dataset_split

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		test_set = list()
		for row in fold:
			row_copy = list(row)
			test_set.append(row_copy)
			row_copy[-1] = None
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores

# Calculate neuron activation for an input
def activate(weights, inputs):
	activation = weights[-1]
	for i in range(len(weights)-1):
		activation += weights[i] * inputs[i]
	return activation

# Transfer neuron activation
def transfer(activation):
	return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
	inputs = row
	for layer in network:
		new_inputs = []
		for neuron in layer:
			activation = activate(neuron['weights'], inputs)
			neuron['output'] = transfer(activation)
			new_inputs.append(neuron['output'])
		inputs = new_inputs
	return inputs

# Calculate the derivative of a neuron output
def transfer_derivative(output):
	return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
	for i in reversed(range(len(network))):
		layer = network[i]
		errors = list()
		if i != len(network)-1:
			for j in range(len(layer)):
				error = 0.0
				for neuron in network[i + 1]:
					error += (neuron['weights'][j] * neuron['delta'])
				errors.append(error)
		else:
			for j in range(len(layer)):
				neuron = layer[j]
				errors.append(neuron['output'] - expected[j])
		for j in range(len(layer)):
			neuron = layer[j]
			neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):
	for i in range(len(network)):
		inputs = row[:-1]
		if i != 0:
			inputs = [neuron['output'] for neuron in network[i - 1]]
		for neuron in network[i]:
			for j in range(len(inputs)):
				neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
			neuron['weights'][-1] -= l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
	for epoch in range(n_epoch):
		for row in train:
			outputs = forward_propagate(network, row)
			expected = [0 for i in range(n_outputs)]
			expected[row[-1]] = 1
			backward_propagate_error(network, expected)
			update_weights(network, row, l_rate)

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
	network = list()
	hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
	network.append(hidden_layer)
	output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
	network.append(output_layer)
	return network

# Make a prediction with a network
def predict(network, row):
	outputs = forward_propagate(network, row)
	return outputs.index(max(outputs))

# Backpropagation Algorithm With Stochastic Gradient Descent
def back_propagation(train, test, l_rate, n_epoch, n_hidden):
	n_inputs = len(train[0]) - 1
	n_outputs = len(set([row[-1] for row in train]))
	network = initialize_network(n_inputs, n_hidden, n_outputs)
	train_network(network, train, l_rate, n_epoch, n_outputs)
	predictions = list()
	for row in test:
		prediction = predict(network, row)
		predictions.append(prediction)
	return predictions

# Test Backprop on Seeds dataset
seed(1)
# load and prepare data
filename = 'seeds_dataset.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
	str_column_to_float(dataset, i)
# convert class column to integers
str_column_to_int(dataset, len(dataset[0])-1)
# normalize input variables
minmax = dataset_minmax(dataset)
normalize_dataset(dataset, minmax)
# evaluate algorithm
n_folds = 5
l_rate = 0.3
n_epoch = 500
n_hidden = 5
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

A network with 5 neurons in the hidden layer and 3 neurons in the output layer was constructed. The network was trained for 500 epochs with a learning rate of 0.3. These parameters were found with a little trial and error, but you may be able to do much better.

Running the example prints the classification accuracy on each fold as well as the mean accuracy across all folds.

You can see that backpropagation and the chosen configuration achieved a mean classification accuracy of about 93%, which is dramatically better than the Zero Rule algorithm, which achieved slightly better than 28% accuracy.

Scores: [92.85714285714286, 92.85714285714286, 97.61904761904762, 92.85714285714286, 90.47619047619048]
Mean Accuracy: 93.333%
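
The mean alone does not show how much the folds differ. As a small sketch, the per-fold scores printed above can be summarized with the standard library's statistics module, which also gives the sample standard deviation. This is an optional addition and not part of the tutorial's listing.

```python
from statistics import mean, stdev

# Per-fold accuracy scores copied from the output above
scores = [92.85714285714286, 92.85714285714286, 97.61904761904762,
	92.85714285714286, 90.47619047619048]

print('Mean Accuracy: %.3f%%' % mean(scores))
print('Std Dev: %.3f' % stdev(scores))
```

A small spread across folds suggests the reported mean is not driven by a single lucky split.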

Extensions

This section lists extensions to the tutorial that you may wish to explore.

Tune Algorithm Parameters. Try larger or smaller networks trained for longer or shorter. See if you can get better performance on the seeds dataset.
Additional Methods. Experiment with different weight initialization techniques (such as small random numbers) and different transfer functions (such as tanh).
More Layers. Add support for more hidden layers, trained in just the same way as the one hidden layer used in this tutorial.
Regression. Change the network so that there is only one neuron in the output layer and that a real value is predicted. Pick a regression dataset to practice on. A linear transfer function could be used for neurons in the output layer, or the output values of the chosen dataset could be scaled to values between 0 and 1.
Batch Gradient Descent. Change the training procedure from online to batch gradient descent and update the weights only at the end of each epoch.
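
As a starting point for the transfer function extension, here is a minimal sketch of a tanh transfer function and its derivative. The names transfer_tanh() and transfer_derivative_tanh() are hypothetical. As with the sigmoid version in the tutorial, the derivative is written in terms of the neuron's output rather than its activation.

```python
from math import tanh

# Hypothetical tanh replacement for transfer()
def transfer_tanh(activation):
	return tanh(activation)

# Derivative of tanh, expressed using the neuron output: 1 - output^2
def transfer_derivative_tanh(output):
	return 1.0 - output ** 2

out = transfer_tanh(0.0)
print(out, transfer_derivative_tanh(out))  # 0.0 1.0
```

Because tanh outputs values between -1 and 1, the expected output encoding and the input normalization range would probably need to be adjusted as well. That part is left as an exercise.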

Did you try any of these extensions?
Share your experiences in the comments below.

Review

In this tutorial, you discovered how to implement the Backpropagation algorithm from scratch.

Specifically, you learned:

How to forward-propagate an input to calculate a network output.
How to back-propagate error and update network weights.
How to apply the backpropagation algorithm to a real-world dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

837 Responses to How to Code a Neural Network with Backpropagation In Python (from scratch)

Talk Data To Me November 7, 2016 at 9:28 pm #

That’s what I was looking for. Write a neural network without any libraries (scikit, keras etc.). Thank you very much!

Jason Brownlee November 8, 2016 at 9:51 am #

I’m glad to hear it!

- sari dewi August 16, 2019 at 11:55 am #
  
  Hi Mr. Jason, I tried your code to make a neural network with the backpropagation method. I am using Jupyter Notebook with Anaconda and Python 3.7 64-bit. When I try this code
  
  seed(1)
  # load and prepare data
  filename = 'datalatih.csv'
  dataset = load_csv(filename)
  for i in range(len(dataset[0])-1):
  	str_column_to_float(dataset, i)
  # convert class column to integers
  str_column_to_int(dataset, len(dataset[0])-1)
  # normalize input variables
  minmax = dataset_minmax(dataset)
  normalize_dataset(dataset, minmax)
  # evaluate algorithm
  n_folds = 5
  l_rate = 0.3
  n_epoch = 500
  n_hidden = 5
  scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
  
  print('Scores: %s' % scores)
  print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))
  
  but I get this error message:
  
  IndexError Traceback (most recent call last)
  in
  196 n_epoch =500
  197 n_hidden =5
  –> 198 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
  199
  200 print (‘Scores: %s’ % scores)
  
  in evaluate_algorithm(dataset, algorithm, n_folds, *args)
  79 test_set.append(row_copy)
  80 row_copy[-1] = None
  —> 81 predicted = algorithm(train_set, test_set, *args)
  82 actual = [row[-1] for row in fold]
  83 accuracy = accuracy_metric(actual, predicted)
  
  in back_propagation(train, test, l_rate, n_epoch, n_hidden)
  171 n_outputs = len(set([row[-1] for row in train]))
  172 network = initialize_network(n_inputs, n_hidden, n_outputs)
  –> 173 train_network(network, train, l_rate, n_epoch, n_outputs)
  174 predictions = list()
  175 for row in test:
  
  in train_network(network, train, l_rate, n_epoch, n_outputs)
  148 outputs = forward_propagate(network, row)
  149 expected = [0 for i in range(n_outputs)]
  –> 150 expected[row[-1]] = 1
  151 backward_propagate_error(network, expected)
  152 update_weights(network, row, l_rate)
  
  IndexError: list assignment index out of range
  
  What is my mistake? Is there missing code? Thank you.
  
  - Jason Brownlee August 16, 2019 at 2:11 pm #
    
    Sorry to hear that you are having trouble, I have some suggestions for you here:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
  - steven November 9, 2019 at 1:35 pm #
    
    This is the exact problem I am facing. Do you have any suggestions? Thank you so much.
    
- Febry Triyadi November 22, 2019 at 6:53 pm #
  
  Hi Mr. Jason, I have trouble with your code. Please check it. I do not understand the problem with expected[row[-1]] = 1:
  
  IndexError Traceback (most recent call last)
  in ()
  13 n_epoch = 500
  14 n_hidden = 5
  —> 15 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
  16 print(‘Scores: %s’ % scores)
  17 print(‘Mean Accuracy: %.3f%%’ % (sum(scores)/float(len(scores))))
  
  2 frames
  in train_network(network, train, l_rate, n_epoch, n_outputs)
  50 outputs = forward_propagate(network, row)
  51 expected = [0 for i in range(n_outputs)]
  —> 52 expected[row[-1]] = 1
  53 backward_propagate_error(network, expected)
  54 update_weights(network, row, l_rate)
  
  IndexError: list assignment index out of range
  
  Reply
  - Jason Brownlee November 23, 2019 at 6:50 am #
    
    Sorry to hear that, I have some suggestions here:
    https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
    
    Reply
WB February 20, 2018 at 3:07 pm #

I experienced the following applying the Backpropagation algorithm to the wheat seeds dataset. I am wondering how to resolve the errors? Thank you
—————————————————————————
ValueError Traceback (most recent call last)
in ()
184 dataset = load_csv(filename)
185 for i in range(len(dataset[0])-1):
--> 186 str_column_to_float(dataset, i)
187 # convert class column to integers
188 str_column_to_int(dataset, len(dataset[0])-1)

in str_column_to_float(dataset, column)
20 def str_column_to_float(dataset, column):
21 for row in dataset:
---> 22 row[column] = float(row[column].strip())
23
24 # Convert string column to integer

ValueError: could not convert string to float:

Reply
- Jason Brownlee February 21, 2018 at 6:35 am #
  
  Are you using Python 2?
  
  Reply
  - wb February 21, 2018 at 2:51 pm #
    
    Yes I am
    
    Reply
  - harshith October 5, 2018 at 8:28 pm #
    
    hi bro whass up
    
    Reply
- Mike Harney March 5, 2018 at 9:53 am #
  
  Hi wb, I’m on 3.6 and I found the same issue. Maybe you can answer this, Jason, but it looks like some of the data is misaligned in the sample. When opened in Excel, there are many empty cells followed by data pushed out into an extra column. I assume this is unintentional, and when I corrected the spacing, it worked for me.
  
  Reply
  - Jason Brownlee March 6, 2018 at 6:08 am #
    
    The code was written and tested with Python 2.7.
    
    Reply
    - JU April 23, 2018 at 7:24 am #
      
      Mike is right – the dataset from the UCI website is slightly defective: It has two tabs in some places where there should be only one. This needs to be corrected during the conversion to CSV. In Excel the easiest way is to use the text importer and then click the “Treat consecutive delimiters as one” checkbox.
    - Jason Brownlee April 23, 2018 at 7:37 am #
      
      Here is the dataset ready to use:
      https://raw.githubusercontent.com/jbrownlee/Datasets/master/wheat-seeds.csv
- Alexis Batyk August 29, 2018 at 6:22 am #
  
  [SOLVED]
  I had the same issue with
  
  https://raw.githubusercontent.com/jbrownlee/Datasets/master/wheat-seeds.csv
  
  That CSV is still dirty.
  
  Use a text editor -> open the search and replace tool -> search for ‘,,’ and replace with ‘,’ and it works.
  
  Reply
  - Jason Brownlee August 29, 2018 at 8:16 am #
    
    I don’t have such problems on Py 3.6.
    
    Reply
  - Jackson Scott October 1, 2018 at 9:08 am #
    
    thanks, this worked for me as well. The csv file had some tabbed over and others correct.
    
    Reply
    - Dharmendra Kumar September 3, 2019 at 7:38 pm #
      
      Thank you
    - Jason Brownlee September 4, 2019 at 5:56 am #
      
      You’re welcome.
- Deng October 14, 2018 at 5:50 pm #
  
  The data in the seeds_dataset file contains the backspace key, and it is ok to reset the data
  
  Reply
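
The doubled-tab problem described in this thread can also be handled in Python instead of Excel or a text editor. The following is a minimal sketch, assuming the raw file is whitespace-delimited and has no intentionally empty fields. `clean_rows` and `rows_to_csv` are hypothetical helper names that are not part of the tutorial, and the two sample lines are made up for illustration.

```python
import csv
import io

def clean_rows(raw_text):
    """Split each line on runs of whitespace so doubled tabs act as one delimiter."""
    rows = []
    for line in raw_text.splitlines():
        fields = line.split()  # no argument: any run of tabs or spaces is one separator
        if fields:             # skip blank lines
            rows.append(fields)
    return rows

def rows_to_csv(rows):
    """Write the cleaned rows as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()

# Made-up sample: the first line has a doubled tab, the second does not.
sample = "15.26\t14.84\t\t0.871\t1\n14.88\t14.57\t0.8811\t1\n"
cleaned = clean_rows(sample)
csv_text = rows_to_csv(cleaned)
print(csv_text)
```

Every cleaned row should end up with the same number of fields; if it does not, the file has a different defect than doubled delimiters.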

George Dong May 12, 2019 at 6:14 pm #

I echo that too!

Just one question please! In your code below, I could not understand why multiplication is used instead of division in the last line. Though division caused divide by zero problem.

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                
                errors.append(error)
            for j in range(len(layer)):
                layer[j]['error'] = errors[j]
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
                neuron['error']=expected[j] - neuron['output']
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
##            neuron['delta'] = errors[j] / transfer_derivative(neuron['output'])

My understanding is gradient = dError / dWeights. Therefore, dWeights = dError / gradient
i.e. delta = errors[j] / derivative

Did we somehow make changes here, for calculation reasons, to use arctan instead of tan for gradient?

I’d be grateful if you could help.

Dhaila November 22, 2020 at 12:36 am #

Hi Dong,

I was looking into the code and have the same question you raised above: why are we multiplying? Can I please ask whether you came to understand it?

Reply
- Francisco December 6, 2022 at 10:11 pm #
  
  Hi Dhaila, sorry if this comes a bit late, but for anyone wondering why it is multiplied and not divided, it is due to the chain rule. The core idea of backpropagation is to find the gradient of the cost function i.e. error with respect to the weights, in other words, dE/dw. However, the error we have computed is (label-output), which is equivalent to dE/dy; then, we have computed the derivative from the neuron, which is dy/dw. Hence, by multiplying, you will get dE/dy *dy/dw = dE/dw which is what we are looking for. This explanation is simplified, if you would like a more in-depth answer, I would suggest reading chapter 8 from Deep Learning by Ian Goodfellow or Machine learning by Bishop. They go into more depth about this topic. Also, Jason, feel free to correct me if you think I might have misrepresented anything
  
  Reply
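
The chain-rule argument above can be checked numerically. This is a small sketch for a single sigmoid neuron with one input, assuming a squared-error cost E = 0.5 * (expected - output)^2 (the cost function is an assumption here; for it, expected - output equals -dE/dy). All names and values are illustrative and not taken from the tutorial.

```python
from math import exp

def sigmoid(activation):
    return 1.0 / (1.0 + exp(-activation))

def cost(weight, bias, x, expected):
    # Assumed cost: half the squared error of one sigmoid neuron.
    output = sigmoid(weight * x + bias)
    return 0.5 * (expected - output) ** 2

weight, bias, x, expected = 0.4, 0.1, 1.5, 1.0
output = sigmoid(weight * x + bias)

# delta as in the thread: error multiplied by the sigmoid derivative.
delta = (expected - output) * output * (1.0 - output)

# Chain rule: dE/dw = dE/dy * dy/da * da/dw = -(expected - output) * y(1 - y) * x
analytic = -delta * x

# Central finite difference approximation of dE/dw.
eps = 1e-6
numeric = (cost(weight + eps, bias, x, expected) - cost(weight - eps, bias, x, expected)) / (2 * eps)

print(analytic, numeric)
```

The two values agree closely, which is what multiplying (rather than dividing) by the derivative predicts. It also shows where the extra `input` factor in the weight update comes from: it is da/dw.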

Maria January 12, 2020 at 5:28 pm #

Hi Jason, I need code for a backpropagation artificial neural network to predict the population dynamics of insect pests.

Reply
- Jason Brownlee January 13, 2020 at 8:19 am #
  
  Sounds like a great project. Perhaps start here:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply

MO November 8, 2016 at 9:26 am #

Where can I see your dataset? I want to see what it looks like.

Reply
- Jason Brownlee November 8, 2016 at 10:01 am #
  
  Hi MO.
  
  The small contrived dataset used for testing is listed inline in the post in section 4.2
  
  The dataset used for the full example is on the UCI ML repository, linked in the section titled “Wheat Seeds Dataset”. Here is the direct link:
  http://archive.ics.uci.edu/ml/datasets/seeds
  
  Reply
  - Solene EBA March 4, 2022 at 11:56 pm #
    
    Hello do you have any ideas to calculate the Rsquared
    
    Reply
    - James Carmichael March 5, 2022 at 12:36 pm #
      
      Hi Solene..Please clarify what code listing you have a question about so that I may better assist you.
      
      Reply
prakash November 11, 2016 at 12:40 am #

In two-class classification, the expected value for class 0 is [1,0] and for class 1 it is [0,1].
What will the output vectors look like for more than two classes?

Reply
- Jason Brownlee November 11, 2016 at 10:02 am #
  
  Hi prakash,
  
  For multi-class classification, we can extend the one hot encoding.
  
  Three class values for “red”, “green”, and “blue” can be represented as an output vector like:
  1, 0, 0 for red
  0, 1, 0 for green
  0, 0, 1 for blue
  
  I hope that helps.
  
  Reply
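
The one hot encoding described above can be sketched in a few lines. `one_hot` is a hypothetical helper name; it assumes class values are integers from 0 to n_classes - 1, in the same way that expected[row[-1]] = 1 does in the tutorial.

```python
def one_hot(class_index, n_classes):
    """Return a list of zeros with a single 1 at the position of the class."""
    expected = [0 for _ in range(n_classes)]
    expected[class_index] = 1
    return expected

# red = 0, green = 1, blue = 2
for name, index in [('red', 0), ('green', 1), ('blue', 2)]:
    print(name, one_hot(index, 3))
```

A class index that is not in the range 0 to n_classes - 1 raises an IndexError, which is the error several readers report in these comments.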
Rakesh November 13, 2016 at 3:41 pm #

Hi, Jason.
You’ve mentioned that there are 3 output classes.
How do we check the values which come under the 3 classes / clusters?
Could we print the data which fall under each class?

Reply
- Jason Brownlee November 14, 2016 at 7:35 am #
  
  Hi Rakesh,
  
  The data does belong to 3 classes. We can check the skill of our model by comparing the predicted classes to the actual/expected classes and calculate an accuracy measure.
  
  Reply
Alex November 16, 2016 at 12:35 pm #

I’m confused why the activation method iterates from 0 to len(inputs) - 1 instead of from 0 to len(weights) - 1. Am I missing something?

Reply
- Jason Brownlee November 17, 2016 at 9:47 am #
  
  Hi Alex,
  
  The length of weights is the length of the input + 1 (to accommodate the bias term).
  
  We add the bias term first, then we add the weighted inputs. This is why we iterate over input values.
  
  Does that help?
  
  Reply
  - Alex November 17, 2016 at 12:29 pm #
    
    When I step through the code above for the 'forward_propagate' test case, I see the code correctly generate the output for the single hidden node, but that output doesn’t get correctly processed when determining the outputs for the output layer. As written above, the activate function contains 'for i in range(len(inputs)-1):'. When the calculation reaches the activate function for the output node for class=0, 'inputs' has a single element in it (the output from the single hidden node), so 'len(inputs) - 1' equals 0 and the for loop never executes. I’m assuming the code is supposed to read 'for i in range(len(weights)-1):'. Does that make sense?
    
    I’m just trying to make sure I don’t fundamentally misunderstand something and improve this post for other readers. This site has been really, really helpful for me.
    
    Reply
    - Jason Brownlee November 18, 2016 at 8:27 am #
      
      I’m with you now, thanks for helping me catch-up.
      
      Nice spot. I’ll fix up the tutorial.
      
      Update: Fixed. Thanks again mate!
      
      Reply
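
The fix discussed in this thread can be sketched as follows. It assumes the convention stated above, that the last weight is the bias and the remaining weights pair up with the inputs; treat it as an illustration of the idea rather than the exact code in the post.

```python
def activate(weights, inputs):
    """Weighted sum of inputs plus bias; the bias is stored as the last weight."""
    activation = weights[-1]
    # Loop over the weights (minus the bias), so a single input is still used.
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# One hidden-layer output feeding an output neuron: the loop runs once.
single = activate([0.5, 0.1], [2.0])

# A data row with a trailing class value: the extra element is simply ignored.
with_label = activate([0.5, 0.2, 0.1], [1.0, 2.0, None])
print(single, with_label)
```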
Tomasz Panek November 21, 2016 at 1:23 am #

# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)-1):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

In this fragment:

            for j in range(len(inputs)-1):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

If the inputs length is 1, you are not updating the weights. Is that correct? You are updating only the bias, because there is only one neuron in the hidden layer.

Reply
Tomasz November 21, 2016 at 1:34 am #

Hello. In the update_weights method you loop with for j in range(len(inputs) - 1). If the inputs length is 1, you aren’t updating the weights. Is that correct? The hidden layer has one neuron, so the output layer weights aren’t updated.

Reply
- Jason Brownlee November 22, 2016 at 6:54 am #
  
  Hi Tomasz,
  
  The assumption here is that the input vector always contains at least one input value and an output value, even if the output is set to None.
  
  You may have found a bug though when updating the layers. I’ll investigate and get back to you.
  
  Reply
  - Jason Brownlee January 3, 2017 at 10:17 am #
    
    Thanks Tomasz, this was indeed a bug.
    
    I have updated the update_weights() function in the above code examples.
    
    Reply
    - Jerry Jones October 16, 2018 at 8:18 am #
      
      I don’t understand how update_weights updates the NN. There is no global variable or return from the function. What am I missing?
      
      Reply
      - Jason Brownlee October 16, 2018 at 2:33 pm #
        
        The weights are passed in by reference and modified in place.
        
        This is an advanced tutorial, I’d recommend using Keras for beginners.
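
The point about modifying in place can be seen with a tiny example. Python lists and dicts are mutable, so a function that changes them through its argument changes the caller's object too, with no return value needed. `nudge_bias` is a hypothetical function used only for illustration.

```python
def nudge_bias(network, step):
    """Add step to the bias (last weight) of every neuron, in place."""
    for layer in network:
        for neuron in layer:
            neuron['weights'][-1] += step
    # No return statement: the caller's network object has already changed.

network = [[{'weights': [0.1, 0.2]}]]
result = nudge_bias(network, 0.5)
print(network[0][0]['weights'], result)
```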
Michael December 13, 2016 at 4:15 am #

Hi, Thanks for the tutorial, I’m doing a backpropagation project at the moment so its been really useful.

I was a little confused on the back-propagation error calculation function. Does “if i != len(network)-1:” mean that if the current layer isn’t the output layer then this following code is run or does it mean that the current layer is an output layer?

Reply
- Jason Brownlee December 13, 2016 at 8:08 am #
  
  Glad to hear it Michael.
  
  The line means if the index i is not equal to the index of the last layer of the network (the output layer), then run code inside the block.
  
  Reply
Michael January 5, 2017 at 7:53 am #

I have another question.
Would it be possible to extend the code from this tutorial and create a network that trains on the MNIST handwritten digit set, using an input unit to represent each pixel in the image? I’m also not sure whether or how I could use feature extractors for the images.

I have a project where I have to implement the Backpropagation algorithm with possibly the MNIST handwritten digit training set.

I hope my question makes sense!

Reply
- Jason Brownlee January 5, 2017 at 9:42 am #
  
  Sure Michael, but I would recommend using a library like Keras instead as this code is not written for performance.
  
  Load an image as a long list of pixel integer values, convert to floats and away you go. No feature extraction needed for a simple MLP implementation. You should get performance above 90%.
  
  Reply
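
Turning an image into a row for this kind of network might look like the sketch below. It assumes the image is already available as a 2D list of integer pixel values in the range 0 to 255; dividing by 255 to scale into [0, 1] is an added assumption, not something stated in the reply. `image_to_row` is a hypothetical helper.

```python
def image_to_row(pixels, label):
    """Flatten a 2D grid of 0-255 pixel values into floats and append the class label."""
    row = [float(value) / 255.0 for line in pixels for value in line]
    row.append(label)
    return row

# A made-up 2x2 "image" labelled as the digit 7.
tiny_image = [[0, 255],
              [51, 102]]
row = image_to_row(tiny_image, 7)
print(row)
```

For a 28x28 image this yields 784 input values plus the class label, matching the row layout the tutorial code expects (inputs first, class last).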
Calin January 6, 2017 at 10:40 pm #

Hi Jason,

Great post!

I have a concern though:

In train_network method there are these two lines of code:

expected = [0 for i in range(n_outputs)]
expected[row[-1]] = 1

Couldn’t it be the case that expected[row[-1]] = 1 will throw an IndexError, as n_outputs is the size of the training set, which is a subset of the dataset, and row basically contains values from the whole dataset?

Reply
- Jason Brownlee January 7, 2017 at 8:37 am #
  
  Hi Calin,
  
  If I understand you correctly, no. The n_outputs variable is the number of possible output values.
  
  Maybe put some print() statements in to help you better understand what values variables have.
  
  Reply
  - Calin January 7, 2017 at 9:48 pm #
    
    Hmm..I ran the entire code (with the csv file downloaded from http://archive.ics.uci.edu/ml/datasets/seeds), added some breakpoints and this is what I got after a few iterations:
    
    n_outputs = 168
    row[-1] = 201
    
    which is causing IndexError: list assignment index out of range.
    
    Reply
    - Adriaan January 11, 2017 at 4:27 am #
      
      I’ve got the same error, That my list assignment index is out of range
      
      Reply
      - Jason Brownlee January 11, 2017 at 9:29 am #
        
        Sorry to hear that, did you try running the updated code?
      - Ivan January 16, 2017 at 10:28 am #
        
        This is a CSV reading error. Try reformatting the file with commas. That worked for me.
      - Jason Brownlee January 16, 2017 at 10:45 am #
        
        What was the problem and fix exactly Ivan?
      - Bob February 5, 2017 at 10:59 am #
        
        The data file (http://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt) has a few lines with double tabs (\t\t) as the delimiter — removing the double tabs and changing tabs to commas fixed it.
        
        Thanks for the good article.
      - Jason Brownlee February 6, 2017 at 9:42 am #
        
        Thanks for the note Bob.
      - Rowen Bruce October 20, 2018 at 8:52 pm #
        
        updated code
- Adriaan January 11, 2017 at 5:50 am #
  
  I’ve had the same error at the ‘train_network’ function. Is your dataset fine? I’ve had some problems because the CSV file wasn’t loaded correctly due to my regional windows settings. I’ve had to adjust my settings and everything worked out alright.
  
  http://superuser.com/questions/783060/excel-save-as-csv-options-possible-to-change-comma-to-pipe-or-tab-instead
  
  Reply
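
Since the IndexError in this thread comes from a badly parsed file rather than from the training code, a quick sanity check after loading can catch it earlier. The sketch below assumes the dataset is a list of rows with the class in the last column, already converted to integers. `check_dataset` is a hypothetical helper that is not part of the tutorial.

```python
def check_dataset(dataset):
    """Verify that every row has the same width and class values are 0..n-1."""
    n_columns = len(dataset[0])
    for index, row in enumerate(dataset):
        if len(row) != n_columns:
            raise ValueError('row %d has %d columns, expected %d'
                             % (index, len(row), n_columns))
    classes = sorted(set(row[-1] for row in dataset))
    if classes != list(range(len(classes))):
        raise ValueError('class values should be 0..%d, found %s'
                         % (len(classes) - 1, classes))
    return len(classes)

good = [[1.0, 2.0, 0], [1.5, 2.5, 1], [0.5, 1.0, 2]]
print(check_dataset(good))
```

A row that picked up an extra column from a doubled delimiter, or a class column that contains shifted feature values, fails this check with a readable message instead of a later "list assignment index out of range".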
Stanley January 8, 2017 at 3:15 pm #

Thanks for such a good article.

Just one question: in the equation “weight = weight + learning_rate * error * input”, why is there an “input”? IMO it should be “weight = weight + learning_rate * error”.

Reply
- Jason Brownlee January 9, 2017 at 7:47 am #
  
  The var names and explanation are correct.
  
  The update equation is:
  
  weight = weight + learning_rate * error * input
  
  For the input layer the input are the input data, for hidden layers the input is the output of the prior layer.
  
  Reply
  - Herman October 21, 2021 at 6:33 pm #
    
    I think the formula should be weight = weight – learning_rate * error * input instead of +. Am I right?
    
    Reply
    - Adrian Tam October 22, 2021 at 3:50 am #
      
      You’re right if you are comparing what is done here to your textbook! However, notice the line “errors.append(expected[j] - neuron['output'])”: the error is expressed as the negative of what you expect, so the + sign in the update is correct.
      
      Probably I should revise the code to make it consistent with other people’s implementation.
      
      Reply
Madwadasa January 13, 2017 at 3:31 am #

Jason,

Thanks for the code and post.
Why is “expected” in expected = [0 for i in range(n_outputs)] initialized to [0,0] ?
Should not the o/p values be taken as expected when training the model ?
i.e for example in case of Xor should not 1 be taken as the expected ?

Reply
- Jason Brownlee January 13, 2017 at 9:16 am #
  
  Hi Madwadasa,
  
  Expected is a one-hot encoding. All classes are “0” except the actual class for the row, which is marked as a “1” on the next line.
  
  Reply
Michael January 19, 2017 at 3:44 am #

Hello, I have a couple more questions. When training the network with a dataset, does the error at each epoch indicate the distance between the predicted outcomes and the expected outcomes together for the whole dataset? Also when the mean accuracy is given in my case being 13% when I used the MNIST digit set, does this mean that the network will be correct 13% of the time and would have an error rate of 87%?

Reply
- Jason Brownlee January 19, 2017 at 7:38 am #
  
  Hi Michael,
  
  The epoch error does capture how wrong the algorithm is on all training data. This may or may not be a distance depending on the error measure used. RMSE is technically not a distance measure, you could use Euclidean distance if you like, but I would not recommend it.
  
  Yes, in general, when the model makes predictions your understanding is correct.
  
  Reply
Bernardo Galvão January 24, 2017 at 3:51 am #

Hi Jason,

in the excerpt regarding error of a neuron in a hidden layer:

“Where error_j is the error signal from the jth neuron in the output layer, weight_k is the weight that connects the kth neuron to the current neuron and output is the output for the current neuron.”

is the k-th neuron a neuron in the output layer or a neuron in the hidden layer we’re “on”? What about the current neuron, are you referring to the neuron in the output layer? Sorry, english is not my native tongue.

Appreciate your work!

Bernardo

Reply
anonymous February 1, 2017 at 1:42 am #

It would have been better if recall and precision were printed. Can somebody tell me how to print them in the above code.

Reply
- Jason Brownlee February 1, 2017 at 10:51 am #
  
  You can learn more about precision and recall here:
  https://en.wikipedia.org/wiki/Precision_and_recall
  
  Reply
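
Precision and recall can be computed from the same `actual` and `predicted` lists that the accuracy metric uses. The sketch below treats one class at a time as the positive class (one-vs-rest), which is one common convention for multi-class problems. `precision_recall` is a hypothetical helper and the sample labels are made up.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall for one class, treating all other classes as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / float(tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / float(tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

actual = [0, 0, 1, 1, 2]
predicted = [0, 1, 1, 1, 2]
for label in sorted(set(actual)):
    print(label, precision_recall(actual, predicted, label))
```

Returning 0.0 when a denominator is zero is a design choice made here for simplicity; some libraries warn or return an undefined value instead.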
kehinde kolade February 6, 2017 at 8:29 pm #

Hello Jason, great tutorial. I am a developer and I do not really know much about machine learning, but I need to extend your code to incorporate momentum into the training. Can you please explain how I can achieve this extension?

Reply
- Jason Brownlee February 7, 2017 at 10:14 am #
  
  Sorry, I don’t have the capacity to write or spell out this change for you.
  
  My advice would be to read a good book on the topic, such as Neural Smithing: http://amzn.to/2ld9ds0
  
  Reply
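
For readers who want a starting point, classical momentum can be sketched as a variation on the weight update. This is not the author's code. It assumes the tutorial's data layout (each neuron is a dict with 'weights', 'delta' and 'output', the bias is the last weight, and the class value is the last item of the row), and the 'velocity' key and the `momentum` parameter are hypothetical additions.

```python
def update_weights_momentum(network, row, l_rate, momentum):
    """Weight update with classical momentum: step = momentum * previous_step + gradient_step."""
    for i in range(len(network)):
        inputs = row[:-1]  # drop the class value for the first layer
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            # Hypothetical per-weight memory of the previous step, created on first use.
            velocity = neuron.setdefault('velocity', [0.0] * len(neuron['weights']))
            for j in range(len(inputs)):
                velocity[j] = momentum * velocity[j] + l_rate * neuron['delta'] * inputs[j]
                neuron['weights'][j] += velocity[j]
            # Bias update (the bias input is implicitly 1).
            velocity[-1] = momentum * velocity[-1] + l_rate * neuron['delta']
            neuron['weights'][-1] += velocity[-1]

# One neuron, one input, a fixed delta, two consecutive updates.
network = [[{'weights': [0.0, 0.0], 'delta': 1.0, 'output': 0.0}]]
row = [2.0, None]
update_weights_momentum(network, row, 0.1, 0.5)
update_weights_momentum(network, row, 0.1, 0.5)
print(network[0][0]['weights'])
```

With momentum set to 0.0 this reduces to the plain update rule, which makes it easy to compare the two behaviours.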
ibrahim February 18, 2017 at 2:21 am #

Hi Jason,
I have my own code written in C++, which works similar to your code. My intention is to extend my code to convolutional deep neural nets, and i have actually written the convolution, Relu and pooling functions however i could not begin to apply the backpropagation i have used in my shallow neural net, to the convolutional deep net, cause i really cant imagine the transition of the backpropagation calculation between the convolutional layers and the standard shallow layers existing in the same system. I hoped to find a source for this issue however i always come to the point that there is a standard backpropagation algorithm given for shallow nets that i applied already. Can you please guide me on this problem?

Reply
- Jason Brownlee February 18, 2017 at 8:42 am #
  
  I’d love to guide you but I don’t have my own from-scratch implementation of CNNs, sorry. I’m not best placed to help at the moment.
  
  I’d recommend reading code from existing open source implementations.
  
  Good luck with your project.
  
  Reply
matias February 22, 2017 at 3:34 pm #

Thank you, I was looking for exactly this kind of ANN algorithm. A simple thanks won’t be enough though, lol.

Reply
- Jason Brownlee February 23, 2017 at 8:52 am #
  
  I’m glad it helped.
  
  The best way to help is to share the post with other people, or maybe purchase one of my books to support my ongoing work:
  https://machinelearningmastery.com/products
  
  Reply
Manohar Katam February 26, 2017 at 3:40 pm #

Great one! .. I have one doubt .. the dataset seeds contains missing features/fields for some rows.. how you are handling that …

Reply
- Jason Brownlee February 27, 2017 at 5:49 am #
  
  You could set the missing values to 0, you could remove the rows with missing values, you could impute the missing values with mean column values, etc.
  
  Try a few different methods and see what results in the best performing models.
  
  Reply
  - Manohar Katam March 1, 2017 at 2:59 pm #
    
    What if I have canonical forms like “male” or “female” in my dataset… Will this program work even with string data..
    
    Reply
    - Jason Brownlee March 2, 2017 at 8:11 am #
      
      Hi Manohar,
      
      No, you will need to convert them to integers (integer encoding) or similar.
      
      Reply
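
Two of the suggestions in this thread, imputing missing values with the column mean and integer-encoding string values, can be sketched as below. The sketch assumes missing values have been loaded as None; both function names are hypothetical and the sample rows are made up.

```python
def impute_column_mean(dataset, column):
    """Replace None in a numeric column with the mean of the values that are present."""
    present = [row[column] for row in dataset if row[column] is not None]
    mean = sum(present) / float(len(present))
    for row in dataset:
        if row[column] is None:
            row[column] = mean
    return mean

def encode_column(dataset, column):
    """Replace each distinct string in a column with an integer; return the mapping."""
    values = sorted(set(row[column] for row in dataset))
    lookup = {value: index for index, value in enumerate(values)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

data = [[1.0, 'male'], [None, 'female'], [3.0, 'male']]
mean = impute_column_mean(data, 0)
lookup = encode_column(data, 1)
print(data, lookup)
```

Sorting the distinct values before numbering them keeps the mapping the same from run to run, which makes results easier to reproduce.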
Wissal ARGOUBI February 27, 2017 at 11:12 pm #

Great job! this is what i was looking for ! thank you very much .
However i already have a data base and i didn’t know how to make it work with this code how can i adapt it on my data
Thank you

Reply
- Jason Brownlee February 28, 2017 at 8:10 am #
  
  This process will help you work through your predictive modeling problem:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
Shweta Gupta March 5, 2017 at 4:37 am #

Thanks for such a great article..
I have one question, in update_weights why you have used weight=weight+l_rate*delta*input rather than weight=weight+l_rate*delta?

Reply
- Jason Brownlee March 6, 2017 at 10:55 am #
  
  You can learn more about the math in the book on the topic.
  
  I recommend Neural Smithing: http://amzn.to/2ld9ds0
  
  Reply
Sittha March 13, 2017 at 1:23 pm #

Thanks for a good tutorial.
I have some IndexError: list assignment index out of range. And I cannot fix it with comma or full-stop separator.

Reply
- Jason Brownlee March 14, 2017 at 8:11 am #
  
  What is the full error you are getting?
  
  Did you copy-paste the full final example and run it on the same dataset?
  
  Reply
  - Sittha March 24, 2017 at 3:36 am #
    
    line 151 :
    expected[row[-1]] = 1
    IndexError : list assignment index out of range
    
    Reply
    - Jason Brownlee March 24, 2017 at 8:00 am #
      
      Is this with a different dataset?
      
      Reply
      - Benji Weiss May 11, 2017 at 5:31 am #
        
        if it is a different dataset, what do i need to do to not get this error
Karan March 16, 2017 at 6:26 pm #

The dataset that was given was for training the network. Now how do we test the network by providing the 7 features without giving the class label(1,2 or 3) ?

Reply
- Jason Brownlee March 17, 2017 at 8:27 am #
  
  You will have to adapt the example to fit the model on all of the training data, then you can call predict() to make predictions on new data.
  
  Reply
  - Karan March 19, 2017 at 7:43 pm #
    
    Ok Jason, i’ll try that and get back to you! Thank you!
    
    Reply
Karan March 19, 2017 at 7:48 pm #

Just a suggestion for people who will be using their own dataset (not the seeds_dataset) to train their network: make sure you add an if statement as follows before the 45th line:
if minmax[i][1] != minmax[i][0]:

This is because your own dataset might contain a column in which every value is the same, and that would cause a divide by zero error.

Reply
- Jason Brownlee March 20, 2017 at 8:16 am #
  
  Thanks for the tip Karan.
  
  Reply
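
Karan's guard can be sketched in context as follows. The function names follow the tutorial's naming, but the bodies are an approximation written for this example. Setting a constant column to 0.0 is an assumption; leaving the value unchanged would also avoid the division by zero.

```python
def dataset_minmax(dataset):
    """Return [min, max] for every column."""
    return [[min(column), max(column)] for column in zip(*dataset)]

def normalize_dataset(dataset, minmax):
    """Rescale input columns to the range 0-1, skipping the class in the last column."""
    for row in dataset:
        for i in range(len(row) - 1):
            low, high = minmax[i]
            if high != low:
                row[i] = (row[i] - low) / (high - low)
            else:
                # Constant column: no spread to rescale, so use 0.0 (an assumption).
                row[i] = 0.0

data = [[1.0, 5.0, 0], [3.0, 5.0, 1]]
normalize_dataset(data, dataset_minmax(data))
print(data)
```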
Li Qun March 25, 2017 at 5:45 pm #

Thanks Jason for the amazing posts on your from-scratch Python implementations! I have learned so much from you!

I have followed through both your naive bayes and backprop posts, and I have a (perhaps quite naive) question:

what is the relationship between the two? did backprop actually implement bayesian inference (after all, what i understand is that bayesian = weights being updated every cycle) already? perhaps just non-gaussian? so.. are non-gaussian PDF weight updates not bayesian inference?

i guess to put it simply : is backpropagation essentially a bayesian inference loop for an n number of epochs?

I came from the naive bayes tutorial wanting to implement backpropagation together with your naive bayes implementation but got a bit lost along the way.

sorry if i was going around in circles, i sincerely hope someone would be able to at least point me on the right direction.

Reply
- Jason Brownlee March 26, 2017 at 6:11 am #
  
  Great question.
  
  No, they are both very different. Naive bayes is a direct use of the probabilities and bayes theorem. The neural net is approximating a mapping function from inputs and outputs – a very different approach that does not directly use the joint probability.
  
  Reply
Chiraag March 26, 2017 at 10:10 pm #

How did you decide that the number of folds will be 5 ? Could you please explain the significance of this number. Thank You.

Reply
- Jason Brownlee March 27, 2017 at 7:54 am #
  
  In this case, it was pretty arbitrary.
  
  Generally, you want to split the data so that each fold is representative of the dataset. The objective measure is how closely the mean performance reflects the actual performance of the model on unseen data. We can only estimate this in practice (standard error?).
  
  Reply
Li Qun March 27, 2017 at 10:19 pm #

Dear Jason,

thank you for the reply! I read up a bit more about the differences between Naive Bayes (or Bayesian Nets in general) and Neural Networks and found this Quora answer that i thought was very clear. I’ll put it up here to give other readers a good point to go from:

https://www.quora.com/What-is-the-difference-between-a-Bayesian-network-and-an-artificial-neural-network

TL;DR:
– they look the same, but every node in a Bayesian Network has meaning, in that you can read a Bayesian network structure (like a mind map) and see what’s happening where and why.
– a Neural Network structure doesn’t have explicit meaning, its just dots that link previous dots.
– there are more reasons, but the above two highlighted the biggest difference.

Just a quick guess after playing around with backpropagation a little: would NB and a backprop NN work together by running Naive Bayes to get a good ‘first guess’ of the initial weights, which are then run through a neural network and backpropagated?

Reply
- Jason Brownlee March 28, 2017 at 8:23 am #
  
  Please note that a Bayesian network and naive bayes are very different algorithms.
  
  Reply
Melissa March 27, 2017 at 10:54 pm #

Hi Jason,
Further to this update:

Update Jan/2017: Changed the calculation of fold_size in cross_validation_split() to always be an integer. Fixes issues with Python 3.

I’m still having this same problem whilst using Python 3, on both the seeds dataset and my own. It returns an error at line 75 saying ‘list object has no attribute sum’ and also saying that ‘an integer is required’.

Any help would be very much appreciated.
Overall this code is very helpful. Thank you!

Reply
- Jason Brownlee March 28, 2017 at 8:24 am #
  
  Sorry to hear that, did you try copy-paste the complete working example from the end of the post and run it on the same dataset from the command line?
  
  Reply
  - Melissa March 28, 2017 at 9:29 am #
    
    Yes I’ve done that, but still the same problem!
    
    Reply
david March 29, 2017 at 6:16 am #

Hello jason,

please i need help on how to pass the output of the trained network into a fuzzy logic system if possible a code or link which can help understand better. Thank you

Reply
Aditya April 2, 2017 at 3:57 pm #

Awesome Explanation

Reply
- Jason Brownlee April 4, 2017 at 9:05 am #
  
  Thanks!
  
  Reply
Raunak Jain April 6, 2017 at 5:20 pm #

Hello Jason
I’m getting a list assignment index out of range error. How do I handle this error?

Reply
- Jason Brownlee April 9, 2017 at 2:37 pm #
  
  The example was developed for Python 2, perhaps this is Python version issue?
  
  Reply
Marco April 6, 2017 at 9:37 pm #

Thanks but I think python is not a good choice…

Reply
- Jason Brownlee April 9, 2017 at 2:40 pm #
  
  I think it is a good choice for learning how backprop works.
  
  What would be a better choice?
  
  Reply
Agrawal April 6, 2017 at 9:38 pm #

Hey, Jason Thanks for this wonderful lecture on Neural Network.

As I am working on iris recognition, I have extracted the features of each eye and stored them in a .csv file. Can you suggest how I can build my backpropagation code further?
When I run your code I get many errors.
Thank you

Reply
- Jason Brownlee April 9, 2017 at 2:40 pm #
  
  This process will help you work through your modeling problem:
  https://machinelearningmastery.com/start-here/#process
  
  Reply
Jack April 7, 2017 at 3:42 pm #

Could you please convert this iterative implementation into matrix implementation?

Reply
- Jason Brownlee April 9, 2017 at 2:52 pm #
  
  Perhaps in the future Jack.
  
  Reply
Jk April 12, 2017 at 5:04 am #

Hi Jason,

In section 4.1, could you please explain why you used inputs = row[:-1]?

Thanks

Reply
- Jason Brownlee April 12, 2017 at 7:58 am #
  
  Yes. By default we are back-propagating the error of the expected output vs the network output (inputs = row[:-1]), but if we are not the output layer, propagate the error from the previous layer in the network (inputs = [neuron['output'] for neuron in network[i - 1]]).
  
  I hope that helps.
  
  Reply
  - JK April 13, 2017 at 3:59 am #
    
    Thanks for your response. I understand what you said; the part I am not understanding is the [:-1]. Why eliminate the last list item?
    
    Reply
    - Jason Brownlee April 13, 2017 at 10:10 am #
      
      It is a range from 0 to the second last item in the list, e.g. (0 to n-1)
      
      Reply
    - Amer April 6, 2018 at 7:22 am #
      
      Because the last item in the weights array is the bias.
      
      Reply
Prem Puri April 12, 2017 at 8:18 pm #

In the function backward_propagate_error(network, expected), as far as I understand, execution proceeds sequentially up to:
if i != len(network)-1:
    for j in range(len(layer)):
        error = 0.0
        for neuron in network[i + 1]:
            error += (neuron['weights'][j] * neuron['delta'])
My question is: which value is used in neuron['delta']?

Reply
- Jason Brownlee April 13, 2017 at 10:01 am #
  
  delta is set in the previous code block. It is the error signal that is being propagated backward.
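
To make the order of operations concrete, here is a sketch modeled on the tutorial's backward_propagate_error(), assuming the sigmoid transfer function and the sign convention from the Oct/2021 update. The loop walks the layers in reverse, so the output layer's 'delta' values are stored on the first pass and are already available when the hidden layer reads them. The network values are the contrived ones quoted in the comments and serve only as test data.

```python
# Derivative of the sigmoid, expressed in terms of the neuron's output.
def transfer_derivative(output):
    return output * (1.0 - output)

def backward_propagate_error(network, expected):
    # Start at the last layer and move toward the first.
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i != len(network) - 1:
            # Hidden layer: combine the deltas already stored in layer i + 1.
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += neuron['weights'][j] * neuron['delta']
                errors.append(error)
        else:
            # Output layer: compare each output with its expected value.
            for j in range(len(layer)):
                errors.append(layer[j]['output'] - expected[j])
        # This is the line where 'delta' is set for every neuron in the layer.
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

network = [
    [{'output': 0.7105668883115941,
      'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
    [{'output': 0.6213859615555266,
      'weights': [0.2550690257394217, 0.49543508709194095]},
     {'output': 0.6573693455986976,
      'weights': [0.4494910647887381, 0.651592972722763]}],
]
backward_propagate_error(network, [0, 1])
for layer in network:
    print([neuron['delta'] for neuron in layer])
```

If 'delta' is reported as missing, one thing to check is that forward propagation has run first, so every neuron has an 'output', and that the reversed loop has not been changed to run front to back.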
  
  Reply
  - Nishu March 25, 2018 at 11:32 am #
    
    I’m sorry, but I still can’t find the location where delta is set, and hence the code gives an error.
    Where is the delta set for the first time?
    
    Reply
Prem Puri April 14, 2017 at 3:20 am #

Thanks very much!

Reply
- Jason Brownlee April 14, 2017 at 8:54 am #
  
  You’re welcome.
  
  Reply
youssef oumate April 26, 2017 at 4:53 pm #

Hi Jason

Thank you very much for this awesome implementation of a neural network.
I have a question for you: I want to replace the activation function, changing it from sigmoid
to ReLU. So, what are the changes that I should make in order to get
correct predictions?

Reply
- Jason Brownlee April 27, 2017 at 8:34 am #
  
  I think just a change to the transfer() and transfer_derivative() functions will do the trick.
  
  Reply
  - youssef oumate April 27, 2017 at 10:17 am #
    
    Awesome !
    
    Thank you so much
    
    Reply
    - Jason Brownlee April 28, 2017 at 7:26 am #
      
      You’re welcome.
      
      Reply
  - audrey April 14, 2020 at 5:40 am #
    
    how? please
    
    Reply
    - Jason Brownlee April 14, 2020 at 6:30 am #
      
      If you need help coding relu, see this:
      https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
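
As a rough illustration of the change discussed in this thread, the two functions might look like the sketch below. It assumes, as in the tutorial, that transfer_derivative() receives the neuron's output rather than its raw activation; for ReLU that is workable because the output is positive exactly when the activation is positive.

```python
# Rectified linear transfer: keep positive activations, clip the rest to zero.
def transfer(activation):
    return max(0.0, activation)

# Slope of ReLU, written in terms of the neuron's output so it can be
# called as transfer_derivative(neuron['output']).
def transfer_derivative(output):
    return 1.0 if output > 0.0 else 0.0

print(transfer(-2.0), transfer(3.5))                        # 0.0 3.5
print(transfer_derivative(0.0), transfer_derivative(3.5))   # 0.0 1.0
```

Whether the output layer should also use ReLU, or keep the sigmoid for classification, is a design choice this sketch does not settle, and the learning rate may need retuning after the swap.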
      
      Reply
Yahya Alaa April 30, 2017 at 2:38 am #

Hi Jason,
Thank you very much for this wonderful implementation of Neural Network, it really helped me a lot to understand neural networks concept,

n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)

What do n_inputs and n_outputs refer to? According to the small dataset used in this section, is n_inputs only 2 and n_outputs only 2 (0 or 1), or am I missing something?

Reply
- Jason Brownlee April 30, 2017 at 5:31 am #
  
  n_inputs is the number of input features (columns) in your data, and n_outputs is the number of distinct class values in the final column.
  
  Reply
- Yahya Alaa May 3, 2017 at 1:42 pm #
  
  Is the program training the network for 500 epochs for each one of the k-folds and then testing the network with the testing data set?
  
  Reply
  - Jason Brownlee May 4, 2017 at 8:02 am #
    
    Hi Yahya,
    
    5-fold cross validation is used.
    
    That means that 5 models are fit and evaluated on 5 different hold out sets. Each model is trained for 500 epochs.
    
    I hope that makes things clearer Yahya.
    
    Reply
    - Yahya Alaa May 4, 2017 at 8:17 am #
      
      Yes you made things clear to me, Thank you.
      I have two other questions,
      How to know when to stop training the network to avoid overfitting?
      How to choose the number of neurons in the hidden layer?
      
      Reply
      - Jason Brownlee May 5, 2017 at 7:27 am #
        
        You can use early stopping, to save network weights when the skill on a validation set stops improving.
        
        The number of neurons can be found through trial and error.
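
One possible shape for early stopping is sketched below. It is not part of the tutorial: train_one_epoch and validation_error are hypothetical callbacks you would supply (for example, one pass of the tutorial's training loop and an error measure on held-out rows), and patience is an assumed tuning knob.

```python
import copy

def train_with_early_stopping(network, train_one_epoch, validation_error,
                              max_epochs=500, patience=10):
    """Train until the validation error stops improving for `patience` epochs."""
    best_error = float('inf')
    best_network = copy.deepcopy(network)
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_one_epoch(network)            # hypothetical: updates weights in place
        error = validation_error(network)   # hypothetical: lower is better
        if error < best_error:
            # New best: remember a snapshot of the weights.
            best_error = error
            best_network = copy.deepcopy(network)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_network, best_error
```

The function returns the saved snapshot rather than the final weights, since the last few epochs are by construction no better than the best one seen.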
      - Yahya Alaa May 6, 2017 at 8:48 am #
        
        I am working on a program that recognizes handwritten digits. The dataset consists of pictures of 45*45 pixels each, which gives 2025 input neurons. This causes a problem in the activation function: the summation of (weight[i] * input[i]) is large, so the sigmoid function always returns a value close to 1 (0.99 -> 1). Any suggestions?
      - Jason Brownlee May 7, 2017 at 5:31 am #
        
        I would recommend using a Convolutional Neural Network rather than a Multilayer Perceptron.
morok April 30, 2017 at 3:56 am #

In section 3.2, Error Backpropagation, where did the output numbers used for testing backpropagation come from?

'output': 0.7105668883115941
'output': 0.6213859615555266
'output': 0.6573693455986976

Perhaps from the outputs of the test forward propagation [0.6629970129852887, 0.7253160725279748], taking derivative = output * (1.0 - output)? The problem is that they don’t match, so I’m a bit lost here…

thanks!

Awesome article!!!

Reply
- Jason Brownlee April 30, 2017 at 5:34 am #
  
  In that example, the output and weights were contrived to test back propagation of error. Note the “delta” in those outputs.
  
  Reply
- Massa November 25, 2017 at 7:36 am #
  
  hello Dr Jason…
  
  I was wondering …
  
  n_outputs = len(set([row[-1] for row in dataset]))
  
  this line, how does it give the number of output features?
  when I print it gives the number of the dataset(number of rows, not columns)
  
  Reply
  - Jason Brownlee November 25, 2017 at 10:25 am #
    
    The length of the set of values in the final column.
    
    Perhaps this post will help with Python syntax:
    https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
    
    Reply
    - Massa November 26, 2017 at 4:02 am #
      
      but I thought it gives the number of outputs…I mean the number of neurons in the output layer.
      
      here it’s giving the number of the dataset ….if I have 200 input/output pairs it prints 200
      
      so I am confused…how would it be?
      
      Reply
      - Jason Brownlee November 26, 2017 at 7:35 am #
        
        If there are two class values, it should print 2. It should not print the number of examples.
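
A small sketch of what that line computes, using made-up rows. The first dataset has two class values in its last column; the second is a hypothetical case where the last column holds a different continuous value on every row, which would explain a count equal to the number of rows.

```python
# Classification-style data: last column is a class label (0 or 1).
dataset = [
    [2.78, 2.55, 0],
    [1.46, 2.36, 0],
    [7.62, 2.75, 1],
    [8.67, -0.24, 1],
]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
print(n_inputs, n_outputs)  # 2 2

# Regression-style data: every value in the last column is distinct,
# so the set is as large as the dataset itself.
regression_like = [[0.1, 10.5], [0.2, 11.7], [0.3, 12.9]]
n_distinct = len(set([row[-1] for row in regression_like]))
print(n_distinct)  # 3
```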
Umamaheswaran May 8, 2017 at 9:49 pm #

Hi Jason,

I am using the MNIST data set to implement a handwritten digit classifier. How many training examples will be needed to get a performance above 90%?

Reply
- Jason Brownlee May 9, 2017 at 7:42 am #
  
  I would recommend using a CNN on MNIST. See this tutorial:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  Reply
Huyen May 9, 2017 at 6:32 pm #

Hi Jason,

Your blog is totally awesome, not only this post but the whole series about neural networks. Some of the posts explain things far more usefully than others on the Internet. They help me a lot to understand the core of a network instead of directly applying Keras or TensorFlow.

Just one question: if I would like to change the result from classification to regression, which part of the backpropagation do I need to change, and how?

Thank you in advance for your answer

Reply
- Jason Brownlee May 10, 2017 at 8:46 am #
  
  Thanks Huyen.
  
  You would change the activation function in the output layer to linear (e.g. no transform).
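
A minimal sketch of what a linear output layer could involve. The function names here are hypothetical and not from the tutorial; the assumption is that the hidden layers keep the sigmoid while only the output layer uses these, and that the delta follows the tutorial's (output - expected) convention.

```python
# Linear ("no transform") transfer for the output layer.
def transfer_linear(activation):
    return activation

# The slope of the identity function is 1 everywhere.
def transfer_linear_derivative(output):
    return 1.0

# Output-layer delta for a single neuron under these assumptions.
def output_delta(output, expected):
    return (output - expected) * transfer_linear_derivative(output)

print(output_delta(3.5, 3.0))  # 0.5
```

With a linear output the prediction is the raw output value itself, so the argmax step used for class prediction would no longer apply.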
  
  Reply
TGoritsky May 12, 2017 at 12:41 am #

Hi Jason,

I am playing around with your code to better understand how the ANN works. Right now I am trying to make predictions with a NN that is trained on my own dataset, but the program returns one class label for all rows in a test dataset. I understand that normalizing the dataset should help, but it doesn’t work (I am using your minmax and normalize_dataset functions). Also, is there a way to return a prediction for a one-dimensional dataset?
Here is the code (sorry for lack of formatting):
def make_predictions():
    dataset = [[29,46,107,324,56,44,121,35,1],
        [29,46,109,327,51,37,123,38,1],
        [28,42,107,309,55,32,124,38,1],
        [40,112,287,59,35,121,36,1],
        [27,43,129,306,75,41,107,38,1],
        [28,38,127,289,79,40,109,37,1],
        [29,37,126,292,77,35,100,34,1],
        [30,40,87,48,77,51,272,80,2],
        [26,37,88,47,84,44,250,80,2],
        [29,39,91,47,84,46,247,79,2],
        [28,38,85,45,80,47,249,78,2],
        [28,36,81,43,76,50,337,83,2],
        [28,34,75,41,83,52,344,81,2],
        [30,38,80,46,71,53,347,92,2],
        [28,35,72,45,64,47,360,101,2]]
    network = [[{'weights': [0.09640510259345969, 0.37923370996257266, 0.5476265202749506, 0.9144446394025773, 0.837692750149296, 0.5343300438262426, 0.7679511829130964, 0.5325204151469501, 0.06532276962299033]}],
        [{'weights': [0.040400453542770665, 0.13301701225112483]}, {'weights': [0.1665525504275246, 0.5382087395561351]}, {'weights': [0.26800994395551214, 0.3322334781304659]}]]
    # minmax = dataset_minmax(dataset)
    # normalize_dataset(dataset, minmax)
    for row in dataset:
        prediction = predict(network, row)
        print('Expected=%d, Got=%d' % (row[-1], prediction))

Reply
- Jason Brownlee May 12, 2017 at 7:43 am #
  
  I would suggest exploring your problem with the Keras framework:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Tomo May 18, 2017 at 6:22 pm #

Hi Jason!
In the function “backward_propagate_error”, when you do this:

neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

The derivative should be applied to the activation of that neuron, not to the output. Am I right?

neuron['delta'] = errors[j] * transfer_derivative(activate(neuron['weights'], inputs))

And inputs is:
inputs = row[-1]
if i != 0:
    inputs = [neuron['output'] for neuron in self.network[i-1]]

Thank you! The post was really helpful!

Reply
- Adika February 2, 2021 at 2:30 am #
  
  I think you are right but not sure.
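
On the question of whether the derivative should take the activation or the output: for the sigmoid, the slope with respect to the activation can be written purely in terms of the output, since s'(a) = s(a) * (1 - s(a)). The sketch below is a numerical check rather than a proof; the function names mirror the tutorial's transfer() and transfer_derivative(), and the activation value 0.8 is arbitrary.

```python
from math import exp

# Sigmoid transfer function.
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Sigmoid slope, computed from the already-transferred output.
def transfer_derivative(output):
    return output * (1.0 - output)

activation = 0.8
output = transfer(activation)

# Central finite difference of transfer() taken at the activation itself.
step = 1e-6
numeric_slope = (transfer(activation + step) - transfer(activation - step)) / (2 * step)

print(transfer_derivative(output), numeric_slope)
```

The two printed values should agree to several decimal places, which suggests that passing neuron['output'] gives the slope at the neuron's activation for this particular transfer function. The same shortcut would not hold for an arbitrary activation function.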
  
  Reply
Tina May 26, 2017 at 3:49 am #

Hello Jason!

This is a very interesting contribution to the community 🙂
Have you tried using the algorithm with other activation functions?
I tried with Gaussian, tanh and sinx, but the accuracy was not that high, so I think that I omitted something. What I altered were the activation functions and the derivatives. Is there something else that needs to be changed?

Reply
- Jason Brownlee June 2, 2017 at 11:49 am #
  
  Sigmoid was the de facto standard for many years because it performs well on many different problems.
  
  Now the de facto standard is ReLU.
  
  Reply
  - Manu June 6, 2017 at 8:50 pm #
    
    Sigmoid and ReLU are transfer functions right ?
    Activation function is just the sum of all weights and inputs
    
    Reply
    - Jason Brownlee June 7, 2017 at 7:12 am #
      
      You are correct, but in some frameworks, transfer functions are called activation functions:
      https://keras.io/activations/
      
      Reply
vishwanathan May 27, 2017 at 8:08 pm #

Thanks for the great post. Here is an observation that I am not able to understand. In the backward propagation you are not taking all the weights, only the jth. Can you kindly help me understand? I was under the impression that the delta from the output is applied across all the weights:
for neuron in network[i + 1]:
    error += (neuron['weights'][j] * neuron['delta'])

Reply
- vishwanathan May 27, 2017 at 8:14 pm #
  
  I understand you do not want to take in the bias weight hence the exclusion of the last weight in neuron. I kind of get stumped on bias.
  
  Reply
vishwanathan May 27, 2017 at 9:12 pm #

Thanks for the great article. In the backward propagate, the delta value is applied for each weight across the neuron and the error is summed. I am curious why is the delta not applied to individual weights of the neuron and the error summed for that neuron. Can you please clarify?

Reply
Josue May 29, 2017 at 3:12 am #

Why don’t you split the data into TrainData and TestData, for example 80% of the dataset for training and 20% for testing? If you train with 100% of the rows of the dataset and then test on some of those rows, the accuracy will be good. But if you add new data to seeds.csv, the model will be less accurate, right?

Reply
- Jason Brownlee June 2, 2017 at 12:16 pm #
  
  You can, k-fold cross validation generally gives a better estimate of model performance.
  
  Once we have the estimate and choose our model, we can fit the final model on all available data and make predictions on new data:
  https://machinelearningmastery.com/train-final-machine-learning-model/
  
  Reply
Josue May 29, 2017 at 11:08 am #

Thanks for the post! I have a question about cross-validation. The seeds dataset divides perfectly into 5 folds, but what about a dataset of 211 rows? Will I still have uniformly sized subsets (211/5)? Can you give me a suggestion for how to handle that?
Thanks in advance.

Reply
- Jason Brownlee June 2, 2017 at 12:20 pm #
  
  One way is that some records can be discarded to give even sized groups.
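
For illustration, a split function in the style of the tutorial's cross_validation_split() truncates the fold size to a whole number, so leftover rows are simply never assigned to a fold. The sketch below assumes that approach and uses 211 dummy rows.

```python
from random import seed, randrange

def cross_validation_split(dataset, n_folds):
    """Split rows into n_folds equal folds; any remainder rows are left out."""
    dataset_split = []
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)   # 211 / 5 -> 42
    for _ in range(n_folds):
        fold = []
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

seed(1)
rows = [[i] for i in range(211)]
folds = cross_validation_split(rows, 5)
fold_sizes = [len(fold) for fold in folds]
left_out = len(rows) - sum(fold_sizes)
print(fold_sizes)  # [42, 42, 42, 42, 42]
print(left_out)    # 1
```

Which row ends up unused depends on the random seed. An alternative, not shown here, is to let some folds hold one extra row.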
  
  Reply
Sebastián May 30, 2017 at 9:35 am #

Thanks so much for the tutorial. It was really helpful!

Reply
- Jason Brownlee June 2, 2017 at 12:31 pm #
  
  I’m glad it helped.
  
  Reply
Manu June 10, 2017 at 9:00 pm #

Hello Jason,

Any advice on how to handle multi-class classification problems when the classes have high cardinality?
I’m thinking about search engine input data linked to chosen URLs.

Reply
- Jason Brownlee June 11, 2017 at 8:25 am #
  
  Ouch, consider modeling it as regression instead (e.g. a rating or recommender system).
  
  Reply
  - Manuel June 13, 2017 at 1:17 am #
    
    Ok thank you very much Jason.
    But it won’t work with searches unseen by the algorithm.
    I read something in the book “Programming Collective Intelligence” about a neural net from scratch for this kind of problem, but I don’t understand how it works for the moment…
    
    Reply
    - Jason Brownlee June 13, 2017 at 8:23 am #
      
      Consider focusing on one measure/metric that really matters in your domain, then try a suite of framings of the problem and different algorithms to get a feeling for what might work best.
      
      Reply
Yash June 18, 2017 at 6:21 pm #

I am not able to understand the above code, so I request you to explain it to me.

Reply
- Jason Brownlee June 19, 2017 at 8:43 am #
  
  Which part do you not understand exactly?
  
  Reply
Tathagat June 21, 2017 at 3:20 pm #

Hey Jason, I am a novice in machine learning and have a small question: how can I track the time steps involved in the algorithm in accordance with the code?

Reply
- Jason Brownlee June 22, 2017 at 6:04 am #
  
  What do you mean by time steps?
  
  Reply
bazooka June 29, 2017 at 6:52 am #

Hi, Jason. I am so confused: in the result, why are there 4 sets of [output, weights, delta]

like this:
[{'output': 0.9999930495852168, 'weights': [0.9315463130784808, 1.0639526745114607, 0.9274685127907779], 'delta': -4.508489650980804e-09}, {'output': 0.9992087809233077, 'weights': [-2.4595353900551125, 5.153506472345162, -0.5778256160239431], 'delta': 1.940550145482836e-06}]
[{'output': 0.01193860966265472, 'weights': [2.3512725698865053, -8.719060612965613, 1.944330467290268], 'delta': -0.0001408287858584854}, {'output': 0.988067899681387, 'weights': [-2.2568526798573116, 8.720113230271012, -2.0392501730513253], 'delta': 0.0001406761850156443}]

After the backpropagation we find the optimal weights to get the minimum error, so what do these 4 groups mean?

Reply
- Jason Brownlee June 29, 2017 at 7:48 am #
  
  That is the internal state of the whole trained network.
  
  Reply
hassan June 29, 2017 at 7:30 am #

hi Jason
Thanks for your code and the good description here, I like it so much.
I ran your example code and encountered an error, the same as others who left notes here.
The error is:
expected[row[-1]] = 1
IndexError: list assignment index out of range

How can I fix this error?

Reply
- Jason Brownlee June 29, 2017 at 7:49 am #
  
  The code was written for Python 2.7, confirm that this is your Python version.
  
  Also confirm that you have copied the code exactly.
  
  Reply
Jerome July 5, 2017 at 9:20 pm #

Dear Jason,

i have this question about Back Propagate Error

1- derivative sigmoid = output * (1.0 - output)
That is ok

2- error = (expected - output) * transfer_derivative(output)
Ok, but it also means that error == 0 for output = 1 whatever the expected value is, because transfer_derivative(1) == 0

So, whatever the expected value, the error is nil if the output is 1 …
Is there something rotten here?

Thanks

Jerome

Reply
wddddds July 10, 2017 at 10:01 pm #

Thank you Jason, It’s a great tutorial and really helpful for me!

But I have to say that trying to reimplement your code strongly increased my ability of debugging 🙂

Reply
- Jason Brownlee July 11, 2017 at 10:32 am #
  
  Thanks.
  
  Reply
Victor July 17, 2017 at 7:50 pm #

Hi Jason,

Thanks for sharing your code. I’m a PhD candidate in machine learning, and I have a doubt about the weights update in section 4.1:

weight = weight + learning_rate * error * input

Shouldn’t it be as follows?

weight = weight - learning_rate * error * input

Thanks again for sharing this.

Regards,
Victor.

Reply
- Víctor August 4, 2017 at 11:07 pm #
  
  Please disregard my comment; it was my mistake in understanding.
  
  Thanks again for sharing your work.
  
  Reply
vishnu priya July 22, 2017 at 4:26 pm #

Hi..
Thanks for your code. It was very helpful. Can you suggest how to use this code for classifying Tamil characters? I have tried a CNN and now I need to compare the result with a BPN. Can you please advise?

thank you

Reply
- Jason Brownlee July 23, 2017 at 6:20 am #
  
  Perhaps this tutorial on classifying with a CNN would be more useful to you:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  Reply
vishnu priya July 23, 2017 at 4:06 pm #

Thank you sir. With this tutorial I have implemented the CNN. But for the BPN I am getting an error rate of 687.203. I don’t know what to do. Can you help me, sir?

Thank you

Reply
- Jason Brownlee July 24, 2017 at 6:49 am #
  
  What is the problem exactly?
  
  Reply
Vishnupriya July 24, 2017 at 4:53 pm #

Classification of Tamil characters, sir. I have 144 different classes. I have taken 7 GLCM features of each character and I need to train on these features with backpropagation and predict which class a character belongs to.

Reply
- Jason Brownlee July 25, 2017 at 9:34 am #
  
  Sounds like a great project!
  
  Reply
codeo July 26, 2017 at 5:37 pm #

Hi, so I wasn’t following this tutorial when implementing my neural network from scratch, and mine is in JavaScript. I just need help with the theory. How do I calculate the error for each node in the net so that I can incrementally change the weights? Great tutorial btw

Reply
- codeo July 26, 2017 at 6:38 pm #
  
  Hahaha nevermind, it was my code
  Multidimensional arrays and stuff boggle the mind hah
  
  Reply
  - Jason Brownlee July 27, 2017 at 7:56 am #
    
    Glad to hear you worked it out.
    
    Reply
PRABHAKARAN M July 31, 2017 at 4:31 pm #

[ 6.38491205 5.333345 4.81565798 5.43552204 9.96445304 2.57268919 4.07671018 1.5258789 6.19728301 0 1 ]
Dear sir,
The numerical values mentioned above are extracted from a dental x-ray image using a gray level co-occurrence matrix [10 inputs and 1 output]. This dataset is used as the input for a BPN classifier. Can the same dataset, as a .csv file, be used as the input for a deep convolutional neural network? And can I get the output as an image? For example, if I give the dental x-ray images as numerical values, I need to get the caries-affected teeth as the output for the given dataset.

Reply
- Jason Brownlee August 1, 2017 at 7:51 am #
  
  That sounds like a great problem. It may be possible.
  
  I would recommend using deep CNNs.
  
  Perhaps this tutorial will give you some ideas on how to get started:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  You may want to look at some papers on object localization in images. I don’t have material on it sorry.
  
  Reply
PRABHAKARAN M July 31, 2017 at 4:32 pm #

Can I get example code for dental caries detection using a deep convolutional neural network, with x-ray images as the dataset?

Reply
- Jason Brownlee August 1, 2017 at 7:52 am #
  
  I do not have sample code for this problem, sorry.
  
  Reply
John August 1, 2017 at 3:26 am #

Very nice explanation, thank you.
I have some questions.

1) weight = weight + learning_rate * error * input

Do I really need to multiply it by the input? For example, here http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html they don’t multiply it by the input. At least, I think so…

2) Is your method the same as in http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html?
I think yes, but again, I’m not sure and I’m confused by that input multiplication.

3) What exactly is the loss function in your example? In other explanations I usually find derivatives of a loss (cost?) function, not of a transfer function. I’m actually very confused by the notation I find around…

4) Momentum and weight decay. In your example, could you implement them by subtracting the calculated decay and adding the calculated momentum to the weight update? Again, I found forms which subtract both and write the weight update as w + deltaW, so again I’m mega confused by the notation for backpropagation that I found…

Sorry for the dumb questions… math is not my strong side, so many things that can be inferred with a sense for math are simply hidden from me.

Reply
- John August 1, 2017 at 3:30 am #
  
  *subtract both and weight update as w + deltaW, so again
  
  I find the above sentence to be nonsense; it must be a side effect of my confusion…
  
  Reply
  - Jason Brownlee August 1, 2017 at 8:12 am #
    
    Hang in there.
    
    Pick one tutorial and focus on it. Jumping from place to place will make things worse for sure.
    
    Reply
- Jason Brownlee August 1, 2017 at 8:10 am #
  
  Hi John, good questions.
  
  According to my textbook, yes.
  I can’t speak for random sites on the internet sorry.
  
  Loss is prediction error. You can change this to other forms like MAE or MSE.
  
  No decay or momentum in this example. Easy to add if you want. There are many ways to dial in the learning process. No hard and fast rules, just some norms that people reuse.
  
  Reply
Parminder Kaur August 6, 2017 at 7:50 pm #

A VERY GOOD TUTORIAL SIR…
Sir, I am implementing remotely sensed image classification with a BPN neural network using IDL.
I am not finding good resources on constructing features for the input dataset, or on the number of hidden layers and the number of neurons in a hidden layer.
Do you know any resources that can help me?

Thanks

Reply
- Jason Brownlee August 7, 2017 at 8:41 am #
  
  The CNN will perform feature extraction automatically, you could explore using different filters on the data to see if it helps the network.
  
  The number of layers and neurons/filters per layer must be found using trial and error. It is common to copy the designs from other papers as a starting point.
  
  I hope that helps.
  
  Reply
pero August 9, 2017 at 1:11 am #

Nice tutorial, very clean and readable code. =) thank you!

Reply
- Jason Brownlee August 9, 2017 at 6:37 am #
  
  Thanks pero.
  
  Reply
Vatandas August 15, 2017 at 3:28 am #

1. I expected this code to be deep learning (many hidden layers), but it is not. One sentence is easy (“you can add more hidden layers as explained”), but doing it is not as easy as you said.

2. I think your code is wrong.
neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
but
Error = Target - ActivatedOutputNode
Delta = Error * Derivative(NONActivatedOutputNode)

I mean you use the same 'output' variable for both the error and the delta. But in the error it must be the activated one; in the delta it must be the non-activated one.

Reply
- A Researcher May 2, 2019 at 3:10 am #
  
  Exactly, this article is completely misleading :S
  
  Reply
8CG_256 August 18, 2017 at 9:02 am #

Nice tutorial, very clean code and beginner-friendly. Thank you very much!

Reply
- Jason Brownlee August 18, 2017 at 4:36 pm #
  
  Thanks, I’m glad you found it useful!
  
  Reply
- 8CG_256 August 18, 2017 at 9:26 pm #
  
  I only have one slight issue: I implemented this in Ruby and I tried to train it using the IRIS dataset, keeping the network simple (1 input layer, 1 hidden layer, 1 output layer) and after decreasing for a while the error rate keeps increasing. I tried lowering the learning rate, even making it dynamic so it decreases whenever the error increases but it doesn’t seem to help. Could you give me some advice? P.S sorry for my bad English
  
  Reply
  - Jason Brownlee August 19, 2017 at 6:19 am #
    
    Here is an example of backprop I developed in Ruby:
    http://cleveralgorithms.com/nature-inspired/neural/backpropagation.html
    
    Reply
Derek Martins August 22, 2017 at 9:22 pm #

Hi Jason, I enjoy your tutorials so much. Can you do a tutorial implementing Backpropagation Through Time? Thanks man.

Reply
- Jason Brownlee August 23, 2017 at 6:50 am #
  
  Thanks for the suggestion.
  
  I have a few posts on the general topic, for example:
  https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
  
  Reply
Anubhav Singh August 24, 2017 at 1:08 pm #

Hello Jason,

Thank you for the great tutorial!

I would like to know how I can obtain the weight*input for every single neuron in the network…

I’ve been trying these lines –

for layer in network:
    new_inputs = []
    for neuron in layer:
        activation = activate(neuron['weights'], inputs)
        neuron['output'] = transfer(activation)
        new_inputs.append(neuron['output'])

but the activation variable here is a single value…what I understand is that if I have set n_hidden = 5 (number of hidden layers), I should get N*5 (N = number of features in the dataset) outputs if I print the activation…

Kindly help 🙂

Thank you!

Reply
Jose Panakkel August 25, 2017 at 10:45 am #

Dear Jason,

I have a question on the delta calculation at the output layer, where
the primary value is the difference between the neuron output and
the expected output. And we are then multiplying this difference
with the transfer_derivative. where transfer_derivative is a function
of neuron’s output.

My question is, is it correct to find the difference between the
neuron’s output and the expected output?

In this case of the example, you have chosen digital outputs [0,1]
and hence it may not have come up .. but my point is…
one is already subjected to a transfer function, and one is not.

The neuron’s output is always subjected to a transfer function and
hence will be in a specific range, say -.5 to +.5 or something..
But the expected output is the user’s choice, isn’t it?
A user can have an expected value of, say, 488.34 for some stock price
learning. Is it then still correct to find this primary difference
between the expected output and the neuron output in the output
layer delta calculation?

Shouldn’t the expected output also be subjected to the same transfer
function before finding the difference? Or the other way around:
shouldn’t the neuron output be subjected to a reverse transfer function
before being compared with the expected output directly?

Thanks and Regards,
Jose Panakkel

Reply
RealUser404 September 6, 2017 at 1:36 pm #

Hello Jason, great tutorial that helped me a lot!

I have a question concerning the back-propagation : what if instead of having an error function I only have a desired gradient for the output (in the case of an actor-critic model for example)?
How can I change your backprop function to make it work? Or can I just use the gradient as the error?

Reply
- Jason Brownlee September 7, 2017 at 12:49 pm #
  
  Sorry, I don’t follow, perhaps you can restate your question with an example?
  
  Reply
user28 September 8, 2017 at 9:26 pm #

Hi Jason, thank you for providing this tutorial. I’m confused about how I can implement the same backpropagation algorithm when the output is not binary, since I noticed that your example has binary output. For example, predicting a stock price given the open, high, low and close values. Regards.

Reply
- Jason Brownlee September 9, 2017 at 11:55 am #
  
  Use a library like Keras. Start here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Lewis September 11, 2017 at 2:11 am #

Hi Jason,

great article. I have an interest in NN but I am not that good at python.

What I wanted to try was to withhold, say, 5 rows from the dataset and have the trained network predict the results for those rows. This is different from what I think the example does, which is rolling predictions with the learning. Removing 5 rows from the dataset is of course easy, but my pitiful attempts at predicting with unseen data like below fail (I guess network is not in scope at the end). Any help appreciated!

# predict unseen data
unseendataset = [[12.37,13.47,0.8567,5.204,2.96,3.919,5.001],
[12.19,13.2,0.8783,5.137,2.981,3.631,4.87],
[11.23,12.88,0.8511,5.14,2.795,4.325,5.003],
[13.2,13.66,0.8883,5.236,3.232,8.315,5.056],
[11.84,13.21,0.8521,5.175,2.836,3.598,5.044],
[12.3,13.34,0.8684,5.243,2.974,5.637,5.063]]

for row in unseendataset:
    prediction2 = predict(network, row)
    print('Predicted=%d' % (prediction2))

Reply
- Jason Brownlee September 11, 2017 at 12:08 pm #
  
  I would recommend starting with Keras rather than coding the algorithm from scratch.
  
  Start here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Karim September 14, 2017 at 1:27 pm #

Hi Jason, I am trying to generalize your implementation to work with a variable number of layers and nodes. However, whenever I try to increase the number of nodes too much it stops working (the network freezes at one error rate and all output nodes are active, i.e. giving 1). Although the code would work if I decreased the layers and the errors will go down.
Is there something I am missing when using too many layers? The concepts should be the same.

I trained a network with 4 layers: [14,10,10,4] and it worked.
I trained a network with 4 layers [14,100,40,4] and it is stuck. Same dataset.

My code is here if you are looking in more details:
https://github.com/KariMagdy/Implementing-a-neural-network

Thanks

Reply
- Jason Brownlee September 15, 2017 at 12:10 pm #
  
  What problem do you get exactly?
  
  Reply
Laksh October 4, 2017 at 11:11 pm #

Hi, Jason Brownlee,
can we extend this code for 2 or more hidden layers ?

Reply
- Jason Brownlee October 5, 2017 at 5:24 am #
  
  Sure.
  
  Reply

dsliver33 October 9, 2017 at 1:52 pm #

Dear Mr. Brownlee,

I’m trying to alter the code to represent a regression problem (sigmoid on hidden layer, linear on output layer). As far as I know, the main part of the code that would have to be modified is the FF algorithm. I’ve rewritten the code as below:

# Forward propagate input to a network output
def forward_propagate_regression(network, row):
    inputs = row
    new_inputs = []
    #gets the 1st layer, applies sigmoid activation
    hiddenlayer = network[0]
    for neuron in hiddenlayer:
        activation = activate(neuron['weights'], inputs)
        neuron['output'] = transfer(activation)
        new_inputs.append(neuron['output'])
    inputs = new_inputs
    #gets the last layer, applies linear activation
    outputlayer = network[-1]
    for neuron in outputlayer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = activation
            new_inputs.append(neuron['output'])
    inputs = new_inputs
    return inputs

With this code, I’m getting an “OverflowError: (34, ‘Result too large’)” error. Could you please tell what I’m doing wrong? All the other parts of the code are as you’ve written.

Jason Brownlee October 9, 2017 at 4:47 pm #

What did you change exactly? Can you highlight the change for me?

Also, try using pre tags.

Reply
- dsliver33 October 10, 2017 at 4:08 am #
  
  (I don’t know how to highlight the change, sorry!)
  
  I got the hidden layer (network[0]), and I applied your algorithm (calculate activation, transfer the activation to the output, append that to a new list called “new_inputs”).
  
  After that, I get the output layer (network[-1]), I calculate the activation with the “new_inputs”, but I do NOT apply the sigmoid transfer function (so, the outputs should be linear). The results are appended to a new list, which is set to be the return of the function.
  
  Would that be the best way to remove the sigmoid function from the output layer, making the code a regression, instead of a classification?
  
  Reply
  - Jason Brownlee October 10, 2017 at 7:52 am #
    
    Sounds good. I don’t have any good ideas, I’d recommend stepping through some calculations to help spot where it is going wrong.
    
    You may want to consider moving to an open source neural net library, such as Keras:
    https://machinelearningmastery.com/start-here/#deeplearning
    
    Reply
Liam McGoldrick October 26, 2017 at 5:23 am #

I am having the same issue with mine. i made alterations and they are just the same as yours. Did you find a solution?

Reply
Liam McGoldrick October 26, 2017 at 5:27 am #

I GOT IT TO WORK!!! You have to normalize your output data. Then you can apply the transfer function to the output layer just the same! After that it will work!

Reply
- Steven August 20, 2019 at 8:40 pm #
  
  But didn’t you change the function ‘train_network’?
  
  Reply
- Urvi Deole March 12, 2021 at 2:21 am #
  
  Could you please mention the functions you made changes to to get the code to work for regression?
  
  Reply
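For readers attempting the same change, here is a minimal sketch of a regression forward pass with a linear output layer. It assumes the tutorial's network layout (a list of layers, each a list of dicts whose 'weights' list ends with the bias); the name output_delta_regression is hypothetical and not part of the tutorial. Two things differ from the snippet posted above: the list of layer outputs is reset for every layer, and the output delta drops the sigmoid derivative because the derivative of a linear output is 1. Rescaling the targets, as Liam suggests, is likely still needed to keep the errors small enough that exp() does not overflow.

```python
from math import exp

def activate(weights, inputs):
    # Weighted sum of the inputs plus the bias (stored as the last weight).
    activation = weights[-1]
    for w, x in zip(weights[:-1], inputs):
        activation += w * x
    return activation

def transfer(activation):
    # Sigmoid transfer, used here for hidden layers only.
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate_regression(network, row):
    inputs = row
    for index, layer in enumerate(network):
        is_output = index == len(network) - 1
        new_inputs = []  # reset for every layer
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            # Linear output layer, sigmoid everywhere else.
            neuron['output'] = activation if is_output else transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

def output_delta_regression(expected, output):
    # Hypothetical helper: with a linear output the derivative is 1,
    # so the delta is just the error.
    return expected - output

# Tiny hand-built network: 2 inputs -> 1 hidden neuron -> 1 linear output.
network = [
    [{'weights': [0.5, -0.5, 0.0]}],
    [{'weights': [2.0, 1.0]}],
]
prediction = forward_propagate_regression(network, [1.0, 1.0])
```

In backward_propagate_error() the output-layer delta would then use output_delta_regression() in place of the error multiplied by transfer_derivative(); the hidden layers stay unchanged.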

Chris October 12, 2017 at 11:27 am #

Hi Jason, nice posting and it really helps a lot
for j in range(len(layer)):
    neuron = layer[j]
    neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
Should the neuron[‘output’] be the output of the activation function instead of the transfer function here?

Reply
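A short check may help with this question. For the sigmoid, the derivative with respect to the activation can be written using only the neuron's output, since s'(a) = s(a) * (1 - s(a)); that is why transfer_derivative() is given neuron['output'] rather than the raw activation. The sketch below compares that form with a finite-difference estimate. The activation value 0.3 and the step size are arbitrary choices.

```python
from math import exp

def transfer(activation):
    # Sigmoid transfer function.
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    # Expects the sigmoid output, not the raw activation.
    return output * (1.0 - output)

activation = 0.3
output = transfer(activation)

# Derivative computed from the output alone.
analytic = transfer_derivative(output)

# Central finite difference taken on the activation itself.
h = 1e-6
numeric = (transfer(activation + h) - transfer(activation - h)) / (2 * h)

difference = abs(analytic - numeric)
```

The two values agree to within rounding error, so passing the stored output is consistent with differentiating the sigmoid at the activation.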
Asad October 14, 2017 at 3:24 pm #

Hi Jason, nice post, it really helps a lot.

Please tell me how we can change the neurons in the hidden layer and in the output layer?
And what will the result be when we change the neurons in the hidden layer and in the output layer?
In this tutorial you use one hidden layer, so can we use more than one hidden layer? And how?

Please tell me, I am waiting.

Reply
- Jason Brownlee October 15, 2017 at 5:19 am #
  
  Perhaps you would be better served by starting with a neural network library such as Keras:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
dsliver33 October 16, 2017 at 2:27 pm #

Dear Mr. Brownlee,

I’m trying to adapt the code to support many hidden layers. I’ve adapted the code as below, with a new input called “n_layers”, to insert N hidden layers in the network.

# Initialize a network with "n_layers" hidden layers
def initialize_network3(n_inputs, n_hidden, n_layers, n_outputs):
    network = list()
    for i in range(n_layers):
        hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

When I try to run the code, it shows the error below. Do you have any idea why?

in backward_propagate_error(network, expected)
78 error = 0.0
79 for neuron in network[i + 1]:
—> 80 error += (neuron[‘weights’][j] * neuron[‘delta’])
81 errors.append(error)
82 else:

IndexError: list index out of range

Reply
- dna_remaps February 3, 2018 at 10:43 pm #
  
  This took me a minute to figure out myself.
  
  You need to add a conditional after your first layer to make sure your subsequent hidden layer weights have the proper dimensions (n_hidden+1, n_hidden)
  
  for i in range(n_layers):
      hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
      if i > 0:
          hidden_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_hidden)]
      network.append(hidden_layer)
  
  Reply
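The same idea can be written without the conditional by tracking the size of the previous layer. This is a minimal sketch, assuming the tutorial's convention that each neuron stores one weight per input plus a trailing bias; the function name initialize_deep_network is hypothetical. Note that the output layer also needs the extra bias weight, which the version in the question leaves out.

```python
from random import random, seed

def initialize_deep_network(n_inputs, n_hidden, n_layers, n_outputs):
    # Hypothetical variant of initialize_network() with n_layers hidden layers.
    network = []
    n_prev = n_inputs  # number of values feeding the layer being built
    for _ in range(n_layers):
        layer = [{'weights': [random() for _ in range(n_prev + 1)]}
                 for _ in range(n_hidden)]
        network.append(layer)
        n_prev = n_hidden  # later layers are fed by n_hidden neurons
    output_layer = [{'weights': [random() for _ in range(n_prev + 1)]}
                    for _ in range(n_outputs)]
    network.append(output_layer)
    return network

seed(1)
network = initialize_deep_network(n_inputs=7, n_hidden=5, n_layers=3, n_outputs=3)
# (neurons in layer, weights per neuron including bias)
shapes = [(len(layer), len(layer[0]['weights'])) for layer in network]
```

With 7 inputs, three hidden layers of 5 neurons and 3 outputs, shapes comes out as [(5, 8), (5, 6), (5, 6), (3, 6)], which is what backward_propagate_error() needs in order to index neuron['weights'][j] safely.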
Arijit Mukherjee October 17, 2017 at 1:40 am #

Hi,

In the output/last layer, when we calculate the backprop error, why do we multiply (expected - output) by the transfer derivative? The transfer derivative is already canceled out for the last layer, so shouldn’t the update be only (expected - output) * previous_layer_input?
Thanks

Reply
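Whether the derivative cancels depends on the loss being minimized. The delta used in the code quoted in these comments, (expected - output) * transfer_derivative(output), matches the negative gradient of a squared error. The cancellation described in the question happens when a sigmoid output is paired with cross-entropy loss instead. The sketch below checks both cases numerically; the helper names and the values of a and t are illustrative assumptions.

```python
from math import exp, log

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))

def squared_error(a, t):
    # Half squared error for one output neuron with activation a, target t.
    return 0.5 * (t - sigmoid(a)) ** 2

def cross_entropy(a, t):
    # Binary cross-entropy for one sigmoid output neuron.
    o = sigmoid(a)
    return -(t * log(o) + (1 - t) * log(1 - o))

def numeric_grad(loss, a, t, h=1e-6):
    # Central finite difference of the loss with respect to the activation.
    return (loss(a + h, t) - loss(a - h, t)) / (2 * h)

a, t = 0.4, 1.0
o = sigmoid(a)

# Deltas written for a "+=" weight update, i.e. the negative gradient.
delta_squared = (t - o) * o * (1 - o)   # derivative term stays
delta_cross_entropy = t - o             # derivative term cancels

gap_squared = abs(-numeric_grad(squared_error, a, t) - delta_squared)
gap_cross_entropy = abs(-numeric_grad(cross_entropy, a, t) - delta_cross_entropy)
```

Both gaps are close to zero, so each delta is correct for its own loss; dropping the derivative term amounts to switching the loss to cross-entropy.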
Tanoh Henry October 18, 2017 at 8:54 pm #

Really good article. Thanks a lot.
Need a little bit of clarification.
For backward propagation starting at the output layer,
you get the error by appending to errors expected[j] – neuron[‘output’].
Isn’t Error = 0.5 * sum(errors)?
and then using this sum of errors for back-propagation?
Thanks.

Reply
Liam October 21, 2017 at 5:41 am #

Thanks for the tutorial! I am trying to modify your code to do a regression model and I am stuck. I have an input data set (4 columns and many rows) and a single variable output data set (in range of tens of thousands). I fed them into the train procedure and I get an error when it reaches “expected = [0 for i in range(n_outputs)]” in the train portion. The error reads “only length-1 arrays can be converted to Python scalar”. Now I understand this is because of the intended purpose for the code was a categorization problem but I am wondering what I would need to modify to get this to work? Any help would go a long way as I have been stuck on this issue for some time now.

Thanks, and again wonderful tutorial!

Reply
- Jason Brownlee October 21, 2017 at 5:45 am #
  
  Perhaps start with Keras, it will be much easier for you:
  https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
  
  Reply
Sam October 26, 2017 at 11:15 pm #

Hi
I am implementing a 2 layer neural network with 100 hidden units in the first layer and 50
in the next using your code. Implement sigmoid activation function in each layer. Train/test your
model on the MNIST dataset subset.
But it is always giving same prediction.
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1
[0.99999999986772, 0.99999999994584]
Expected=1, Got=1
[0.99999999986772, 0.99999999994584]
Expected=1, Got=1
[0.99999999986772, 0.99999999994584]
Expected=1, Got=1
[0.99999999986772, 0.99999999994584]
Expected=1, Got=1
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1
[0.99999999986772, 0.99999999994584]
Expected=1, Got=1
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1
[0.99999999986772, 0.99999999994584]
Expected=0, Got=1

Reply
- Jason Brownlee October 27, 2017 at 5:20 am #
  
  I would recommend using a framework like Keras:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  Reply
- Matthias April 15, 2018 at 2:57 am #
  
  Weights should be initialized with normally distributed random values. Try using random.gauss for weight initialization.
  
  Reply
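One likely reason for outputs stuck near 1 is saturation: random() draws weights from [0, 1), so with hundreds of non-negative inputs the weighted sum is large and positive, the sigmoid returns almost exactly 1, and its derivative is almost 0. The sketch below compares that with zero-mean Gaussian weights. Scaling the standard deviation by 1/sqrt(n_inputs) is a common heuristic assumed here, not something taken from the tutorial.

```python
from math import exp, sqrt
from random import gauss, random, seed

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))

def neuron_output(weights, inputs):
    # Last weight is the bias, as in the tutorial's activate().
    activation = weights[-1] + sum(w * x for w, x in zip(weights[:-1], inputs))
    return sigmoid(activation)

seed(1)
n_inputs = 784                 # e.g. one value per MNIST pixel
inputs = [1.0] * n_inputs      # worst case: every input switched on

# Tutorial-style initialization: uniform weights in [0, 1).
uniform_weights = [random() for _ in range(n_inputs + 1)]

# Assumed alternative: zero-mean Gaussian with a small standard deviation.
scale = 1.0 / sqrt(n_inputs)
gaussian_weights = [gauss(0.0, scale) for _ in range(n_inputs + 1)]

saturated = neuron_output(uniform_weights, inputs)
unsaturated = neuron_output(gaussian_weights, inputs)
```

With the uniform weights the neuron output is indistinguishable from 1, while the Gaussian weights keep the activation in the range where the sigmoid still has a usable gradient.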
John October 28, 2017 at 6:52 pm #

help, I dont know why i got this error.

Traceback (most recent call last):
File “a.py”, line 185, in
for i in range(len(dataset[0])-1):
TypeError: ‘NoneType’ object has no attribute ‘__getitem__’

Reply
- Jason Brownlee October 29, 2017 at 5:52 am #
  
  You cannot have the “-1” within the call to len()
  
  Reply
João Costa November 12, 2017 at 12:46 pm #

Hey Jason, thanks for your post!

This is helping me a lot with a college assignment. But in this NN, how can I manually set the input values, rather than the number of input neurons?

For example, if I have 1 input neuron, I want to set its value to 0.485.

Best regards!

Reply
- Jason Brownlee November 13, 2017 at 10:11 am #
  
  Sorry, I don’t follow.
  
  Perhaps you’d be better off using a library like Keras:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
yesta November 17, 2017 at 8:21 am #

Hi, Jason
Thank you for this amazing tutorial!

I have a question that may be out of the topic. How do you call models or type of DL models where you feed a model with new test data in order to make the model adaptive to the environment?

Thank you.

Reply
- Jason Brownlee November 17, 2017 at 9:31 am #
  
  Yes, you can update a model after it has been trained.
  
  Reply
Nil December 6, 2017 at 7:47 pm #

Hi, Dr. Jason,

I have been studying how to develop a neural network from scratch and this tutorial is the main one I have been following because it is helping me so much.
I have a doubt: When I study the theory I see the neural network scheme carrying only the weights and bias. And here in practice I see that the network is also carrying the output values and the delta i.e (weights, bias, output and delta). Will the final model be saved like this? with the latter (weights, bias, output and delta)? would this be the rule in practice?

I would appreciate it if you could help with this issue so that I could get out of where I left off.

Your posts are really very good there is where I find my way in to learning in Machine Learning.

Best Regards

Reply
- Jason Brownlee December 7, 2017 at 7:51 am #
  
  The final model (e.g. trained) only needs to perform the forward pass.
  
  Reply
  - Nil December 8, 2017 at 5:50 am #
    
    Understood.
    Thank you Dr. Jason
    
    Reply
MohamedElshazly December 8, 2017 at 9:30 pm #

Hi , there’s something i don’t understand :

for i in reversed(range(len(network))):
    layer = network[i]
    errors = list()
    if i != len(network)-1:
        for j in range(len(layer)):
            error = 0.0
            for neuron in network[i + 1]:

wouldn’t the last line be out of range because the current ‘ i ‘ is the last one and i can’t go beyond it by 1 ? thanks in advance

Reply
- Jason Brownlee December 9, 2017 at 5:41 am #
  
  No, because of the “if” check on the 4th line down.
  
  Reply
Olu December 9, 2017 at 3:17 am #

Hi Mr Brownlee,

Thank you for your tutorial. The training for the example worked however when I try to implement the code for the Wheat Seeds Dataset I get an error from my line 210:

for i in range(len(dataset[0]) - 1):
    str_column_to_float(dataset, i)

The error is: IndexError: list index out of range

Can you please explain why it is (dataset[0])? Does (dataset[0]) means the 1st column in the dataset?

Reply
- Jason Brownlee December 9, 2017 at 5:45 am #
  
  Yes, I recommend learning more about Python arrays here:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  The example was written for Python 2.7, confirm your Python version.
  
  Reply
Jonesy December 12, 2017 at 3:17 pm #

Hello Jason,

Fantastic stuff here. I had a question about the network configuration. This 2 input, 2 hidden and 2 output seems a bit odd to me. I’m used to seeing 2, 2, 1 for XOR – can you explain why you have two output nodes and how they work with the max function? I think it would better explain this line for me in train():

expected[row[-1]] = 1

And lastly, why would one choose this configuration over a 2, 2, 1.

Thanks!

Reply
- Jason Brownlee December 12, 2017 at 4:12 pm #
  
  The final model has the shape [7, 5, 3]. Perhaps check the tutorial again? It has 3 outputs, one for each of the 3 classes in the dataset.
  
  Configuration was chosen via trial and error. There is no analytical way to choose a configuration for a neural network.
  
  Finally, you can learn more about array indexing in Python here:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
Mohamed Elshazly December 17, 2017 at 9:34 pm #

Hi Jason

In the train_network function there is the line “expected[row[-1]] = 1”. What I understand is that you take the Y value of every row (which is either 0 or 1), use it as an index into the expected array, and change the value at that index to 1. First, I don’t know if I understand that correctly. If so, wouldn’t the modification to the expected array be limited to the first and second index, because “expected[row[-1]] = 1” would only ever be expected[0] or expected[1]? And how would that help in our algorithm?

looking forward to your response and thanks for the Great Tutorial

Reply
- Jason Brownlee December 18, 2017 at 5:22 am #
  
  We are one hot encoding the variable.
  
  Learn more about it here:
  https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
  
  Reply
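As a small illustration of the one hot encoding step, the sketch below wraps the two lines from the training loop in a helper. The name one_hot and the example row are illustrative. The length of the vector is the number of classes, so the index is not limited to 0 or 1 once the dataset has more than two classes.

```python
def one_hot(class_index, n_outputs):
    # Start with all zeros, one slot per class.
    expected = [0 for _ in range(n_outputs)]
    # Mark the slot that belongs to this row's class.
    expected[class_index] = 1
    return expected

# Example row: two input values followed by an integer class label.
row = [2.7810836, 2.550537003, 0]
two_class_target = one_hot(row[-1], n_outputs=2)

# With three classes the label can also be 2.
three_class_target = one_hot(2, n_outputs=3)
```

Each output neuron is then trained toward its own slot in the vector, and predict() picks the neuron with the largest output.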
MohamedElshazly December 18, 2017 at 4:55 pm #

HI again Jason

If I’m implementing this algorithm in python 3 what should i change in expected[row[-1]]=1 in order for it to work because I’m having this error : list assignment index out of range
thanks in advance

Reply
- Jason Brownlee December 19, 2017 at 5:15 am #
  
  I don’t know off the cuff, I will look into porting the example to Py3 in the new year.
  
  Reply
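One common cause of "list assignment index out of range" on that line is a class column whose values are not integers from 0 to n_outputs - 1, for example labels 1, 2, 3. The sketch below remaps labels to consecutive integers before training, in the spirit of the tutorial's str_column_to_int(); the helper name encode_labels and the sample rows are assumptions for illustration.

```python
def encode_labels(dataset, column=-1):
    # Hypothetical helper: map each distinct label to 0..n_classes-1, in place.
    labels = sorted(set(row[column] for row in dataset))
    lookup = {label: index for index, label in enumerate(labels)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Labels 1..3 would overflow a 3-element expected list at index 3.
dataset = [[0.2, 0.7, 1], [0.9, 0.1, 3], [0.4, 0.4, 2], [0.8, 0.3, 3]]
lookup = encode_labels(dataset)

n_outputs = len(lookup)
expected = [0 for _ in range(n_outputs)]
expected[dataset[1][-1]] = 1   # label 3 is now index 2
```

It is also worth checking that n_outputs is computed from all class values; a training fold that happens to miss a class would make the list too short.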
Tushar December 19, 2017 at 5:29 pm #

You are just awesome Jason. You are adding more value to people’s ML skills than most average graduate schools do in the US.

Thanks a ton!

Reply
- Jason Brownlee December 20, 2017 at 5:39 am #
  
  Thanks Tushar.
  
  Reply
mark January 6, 2018 at 3:29 am #

Wow, thanks for your codes. I have a question, what if I want to add regularisation term like L2 during back propagation, what should i do?

Reply
- Jason Brownlee January 6, 2018 at 5:55 am #
  
  I would recommend moving to a platform like Keras:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
  - mark January 7, 2018 at 5:12 pm #
    
    Thanks for replying. I know the keras and have been using keras for a while. But in the problem I am focusing on, I need to make changes on the back propagation. That’s why I didn’t use keras.
    So let’s go back to my original question, is the error term the cost function? Thanks.
    
    Reply
    - Jason Brownlee January 8, 2018 at 5:42 am #
      
      Sorry, I cannot work-through adding regularization to this tutorial for you.
      
      Reply
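For readers with the same question, one possible way to add an L2 penalty is in the weight update rather than in the error term. The cost becomes the squared error plus 0.5 * l2 * (sum of squared weights), and the penalty's gradient for each weight is l2 * weight. The sketch below assumes the sign convention used in the code quoted in these comments (delta built from expected - output, weights updated with "+="); the function name update_weights_l2 and the parameter l2 are hypothetical.

```python
def update_weights_l2(network, row, l_rate, l2):
    # Hypothetical variant of update_weights() with L2 weight decay.
    for i in range(len(network)):
        inputs = row[:-1]  # last column is the class label
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                # Error-driven step minus the pull of the penalty toward zero.
                step = neuron['delta'] * inputs[j] - l2 * neuron['weights'][j]
                neuron['weights'][j] += l_rate * step
            # The bias is left unregularized, a common but optional choice.
            neuron['weights'][-1] += l_rate * neuron['delta']

# One neuron with zero delta: only the decay term acts on the weight.
network = [[{'weights': [1.0, 0.5], 'delta': 0.0, 'output': 0.0}]]
update_weights_l2(network, [2.0, 0], l_rate=0.1, l2=0.5)
updated = network[0][0]['weights']
```

Setting l2 to 0 recovers the plain update, which makes it easy to compare runs with and without the penalty.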
Mojo January 20, 2018 at 7:04 pm #

Hello Jason,
Thanks for the informative tutorial. I have a question.
if i want to change the error equation and as well as the equation between input with hidden and hidden with output layer. How can i change it?
Hope you will reply in a short time.

Regards,
Mojo

Reply
- Jason Brownlee January 21, 2018 at 9:08 am #
  
  I would recommend using a library instead, it will be much easier for you.
  
  Here’s how to get started with Keras:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Aliya Anil February 16, 2018 at 9:04 pm #

Hi Jason,

It was indeed a very informative tutorial. Could you please explain the need for seed(1) in the code?

Reply
- Jason Brownlee February 17, 2018 at 8:44 am #
  
  I am trying to tie down the random number generator so that you get the same results as me.
  
  Learn more here:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
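A quick way to see the effect of seed() is to draw a few numbers, reset the seed and draw again. The sketch below uses only Python's random module; the seed values 1 and 2 are arbitrary.

```python
from random import random, seed

# Same seed, same sequence.
seed(1)
first_run = [random() for _ in range(3)]
seed(1)
second_run = [random() for _ in range(3)]

# A different seed gives a different sequence.
seed(2)
other_seed = [random() for _ in range(3)]
```

Because the initial weights and the fold split both come from this generator, fixing the seed makes the whole run repeatable on the same Python version.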
Raj February 19, 2018 at 7:10 am #

Hey there,
Been following your tutorial and I’m having problems with using my dataset with it. The outputs of the hidden neurons appear to only be exactly 1 constantly. I’m not sure what’s wrong exactly or how to fix it but its resulting in the network not learning at all. Please let me know if you can help.
Thanks,
Raj

Reply
- Jason Brownlee February 19, 2018 at 9:10 am #
  
  Perhaps try to get the code and data in the tutorial working first and use that as a starting point for your own problem.
  
  Generally, I would recommend using a library like Keras for your own projects and only code methods from scratch as a learning exercise.
  
  Reply
Aliya Anil February 20, 2018 at 4:37 pm #

Hi,

I tried the first code in the tutorial with 4-parameter dataset, but it is not predicting like the 2-parameter set. Could you explain the reason?

Thanks,
Aliya

Reply
- Jason Brownlee February 21, 2018 at 6:36 am #
  
  If you are looking to develop a neural network for your own data, I would recommend the Keras library:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Nik March 2, 2018 at 1:54 pm #

Dear Jason,

Can I use the codes for handwritten digits recognition? If yes, are there any special recommendations what to change in the codes or I can use them with no changes?

Thanks,
Nik

Reply
- Jason Brownlee March 2, 2018 at 3:25 pm #
  
  I would recommend this tutorial:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  Reply
  - Nik March 2, 2018 at 7:51 pm #
    
    Yes, I have seen that tutorial. But is there any way to use the codes from this tutorial? I just would like to understand why they work so well for seeds and do not work for handwritten digits…
    
    Reply
    - Jason Brownlee March 3, 2018 at 8:10 am #
      
      Yes, you can. You can develop a model with all pixels as inputs.
      
      I cannot write the modification for you, sorry, I just don’t have the capacity.
      
      Reply
Filoingko March 4, 2018 at 2:58 am #

Hi,

How can I use this trained network to predict another data set.

Thank you.

Reply
- Jason Brownlee March 4, 2018 at 6:04 am #
  
  The code in this tutorial is to teach you about backprop, not for use on real problems. If you are working through a problem, I’d recommend using Keras.
  
  Reply
Jean-Michel Richer March 5, 2018 at 9:27 pm #

Dear Jason,
I have tried to use your code on a simple XOR example but get a result of [0, 0, 1, 1] instead of [0,1,1,0]
Scores: [0.0]
Mean Accuracy: 0.000%

The input xor.csv file is
0,0,0
0,1,1
1,0,1
1,1,0

For this I have modified the evaluate_algorithm function to:
def evaluate_algorithm_no_fold(dataset, algorithm, *args):
scores = list()
predicted = algorithm(dataset, dataset, *args)
print(predicted)
accuracy = accuracy_metric(dataset, predicted)
scores.append(accuracy)
return scores

and call the function like this:
scores = evaluate_algorithm_no_fold(dataset, back_propagation, 0.1, 500, 4)

Would you have some explanation because I can not figure out why it is not working ?
Best regards,
JM

Reply
- Jason Brownlee March 6, 2018 at 6:12 am #
  
  Perhaps the model requires tuning to your new dataset.
  
  Reply
Tanveer March 5, 2018 at 9:29 pm #

Thank You So Much Jason !! Wonderful Tutorial. THANKS Much !!

Reply
- Jason Brownlee March 6, 2018 at 6:12 am #
  
  You’re welcome.
  
  Reply
Mojo March 9, 2018 at 10:06 pm #

If i want to calculate the training accuracy and F-measure and want to change the activation function, how i can do it?

Reply
- Jason Brownlee March 10, 2018 at 6:28 am #
  
  Perhaps you would be better off using scikit-learn and Keras instead.
  
  Reply
Fahad March 12, 2018 at 8:03 pm #

Is there something wrong with this code when using MNIST data? I tried to change the structure of the data to be compatible with the code, but it gave me a huge error, and the error did not decrease during any of the training steps.

Reply
- Jason Brownlee March 13, 2018 at 6:25 am #
  
  The code was not developed for MNIST. Here is an example of working with MNIST:
  https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
  
  Reply
Fahad March 13, 2018 at 4:06 pm #

Thanks Jason for your response. I want to apply the code without Keras. I tried to change the structure of the data so that each row is a vector of 784 pixels followed by a class label, but as I said, it gave a huge error that does not decrease at all.

I am trying to develop some algorithm for enhancing of learning, hence, I need to deal with the procedure as step by step. So keras or any other library does not help.

Thanks again Jason

Reply
- Jason Brownlee March 14, 2018 at 6:17 am #
  
  Perhaps update the code to use numpy, it will be much faster.
  
  Reply
kelvin March 15, 2018 at 2:39 am #

Hi Mr Brownlee,

Can you teach me how to plot the errors per epochs (validation error) and accuracy for both training and validation in your scratch network?

Reply
- kelvin March 15, 2018 at 2:44 am #
  
  I only can find the training error but not validation error in the code. For the accuracy, I plot a graph have a straight line only.
  
  Reply
- Jason Brownlee March 15, 2018 at 6:33 am #
  
  Yes, you can do it easily in Keras here:
  https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
  
  Reply
  - kelvin March 15, 2018 at 12:19 pm #
    
    Is there any possible way to do it on your scratch network? for example which part of the code save the training error, validation error, training accuracy and validation accuracy? So I can plot the graph myself since your scratch model does not have “model” for me to save the history.
    
    Reply
    - Jason Brownlee March 15, 2018 at 2:50 pm #
      
      Yes, perhaps change it from CV to a single/train test, then evaluate the model skill on each dataset at the end of each epoch. Save the results in a list and return the lists.
      
      Reply
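The suggestion above could be sketched roughly as follows. The callables train_one_epoch and evaluate are hypothetical stand-ins: in the tutorial's code the first would run one pass of the inner loop of train_network(), and the second would compute accuracy (or summed error) with predict() over a dataset. A toy threshold model is used here only so that the sketch runs on its own.

```python
def train_with_history(n_epoch, train_one_epoch, evaluate, train_set, val_set):
    # Record one score per epoch for each dataset.
    history = {'train': [], 'validation': []}
    for epoch in range(n_epoch):
        train_one_epoch(train_set)
        history['train'].append(evaluate(train_set))
        history['validation'].append(evaluate(val_set))
    return history

# Toy stand-in model: a threshold that moves by 0.25 per epoch.
state = {'threshold': 0.0}

def train_one_epoch(rows):
    state['threshold'] += 0.25

def evaluate(rows):
    # Accuracy in percent for rows of (value, label).
    correct = sum(1 for x, label in rows if int(x > state['threshold']) == label)
    return 100.0 * correct / len(rows)

train_set = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
val_set = [(0.2, 0), (0.8, 1)]
history = train_with_history(2, train_one_epoch, evaluate, train_set, val_set)
```

The two lists in history can then be passed to any plotting library, one point per epoch.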
  - Zahra May 6, 2019 at 9:37 am #
    
    Hello, I’m so confuse..
    I try to run this code in command prompt. But, I use my dataset (not Wheat Seeds dataset).
    
    And why this happened? What’s wrong? What should I do? What should I change?
    Please, help me!
    
    Traceback (most recent call last):
    File “journal.py”, line 197, in
    scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
    File “journal.py”, line 81, in evaluate_algorithm
    predicted = algorithm(train_set, test_set, *args)
    File “journal.py”, line 173, in back_propagation
    train_network(network, train, l_rate, n_epoch, n_outputs)
    File “journal.py”, line 150, in train_network
    expected[row[-1]] = 1
    IndexError: list assignment index out of range
    
    Reply
    - Jason Brownlee May 6, 2019 at 2:33 pm #
      
      Sorry, I cannot debug your dataset.
      
      Perhaps start with Keras for deep learning instead:
      https://machinelearningmastery.com/start-here/#deeplearning
      
      Reply
Jack March 15, 2018 at 12:11 pm #

Can I use this model for regression problem? For example us this model for boston house-prices dataset?

Reply
- Jason Brownlee March 15, 2018 at 2:50 pm #
  
  Sure, some changes would be required, such as the activation in the output layer would need to be linear.
  
  Reply
Nabil March 15, 2018 at 3:49 pm #

Are you using MSE?

Reply
- Jason Brownlee March 16, 2018 at 6:09 am #
  
  As mentioned in the post, we are reporting accuracy for the classification problem.
  
  Reply

Olu March 19, 2018 at 11:23 pm #

In the train section,

def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)

Can you please explain how this expected[row[-1]] = 1 knows where to insert the 1 in the arrays of zero created.

Jason Brownlee March 20, 2018 at 6:23 am #

Good question.

expected is all zeros. row[-1] is the index of the class value. therefore we set the index of the class value in expected to 1.

Perhaps it is worth reading up on array indexing:
https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/

Reply
- kmillen November 16, 2018 at 11:27 am #
  
  Jason,
  This is an amazing piece of code that has been very beneficial.
  
  Doesn’t that mean that only expected[0] and expected[1] will ever be set to 1 for this test data?
  
  Thank you,
  
  Reply
  - Jason Brownlee November 16, 2018 at 1:58 pm #
    
    Sorry, I don’t understand your question?
    
    Reply
    - kmillen November 17, 2018 at 1:44 am #
      
      If I understand Python (which I may not), row[-1] represents the last item in the row. Since the last value in each of the 10 rows is only either 0 or 1, expected[row[-1]] = 1 will only ever set expected[0] or expected[1] to the value of 1. Or, what am I missing?
    - Jason Brownlee November 17, 2018 at 5:50 am #
      
      Are you referring to this: expected[row[-1]] = 1
      
      If so:
      
      “expected” is all zeros, e.g. [0, 0]
      “row” is an example, e.g. […] where the value at -1 is either 0 or 1
      
      Therefore row[-1] is an index of either 0 or 1 and we are marking the value in expected at that index as 1.
      
      We have created a one hot vector.
  - kmillen November 17, 2018 at 2:54 am #
    
    Disregard my previous question; I found the answer in a previous reply. Thank you again for this example.
    
    Reply

kelvin March 21, 2018 at 2:14 am #

Hi, I would like to use softmax as the activation function for output layer. However, I do not know how to write the code for the derivative of softmax. Can you show me the code how to change the sigmoid function from your code to softmax?

Reply
- kelvin March 21, 2018 at 2:20 am #
  
  I do try few ways to change the sigmoid to softmax, however, all of them are not working. Can you show me how to create a softmax layer?
  
  for transfer():
  first case:
  def transfer(input_value):
      exp_scores = np.exp(input_value)
      return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
  
  second case:
  def transfer(input_value):
      input_value -= np.max(input_value)
      return np.exp(input_value) / np.sum(np.exp(input_value))
  
  third case:
  def transfer(input_value):
      input_value -= np.max(input_value)
      result = (np.exp(input_value).T / np.sum(np.exp(input_value))).T
      return result
  
  for transfer_derivative():
  first case:
  def transfer_derivative(output):
      s = output.reshape(-1, 1)
      return np.diagflat(s) - np.dot(s, s.T)
  
  second case:
  def transfer_derivative(output):
      jacobian_m = np.diag(output)
      for i in range(len(jacobian_m)):
          for j in range(len(jacobian_m)):
              if i == j:
                  jacobian_m[i][j] = output[i] * (1 - output[i])
              else:
                  jacobian_m[i][j] = -output[i] * output[j]
      return jacobian_m
  
  
  Reply
  - Jason Brownlee March 21, 2018 at 6:39 am #
    
    Here you go:
    https://en.wikipedia.org/wiki/Softmax_function#Artificial_neural_networks
    
    Reply
- Jason Brownlee March 21, 2018 at 6:38 am #
  
  Perhaps use Keras instead?
  
  Reply
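One difficulty with dropping softmax into transfer() is that softmax needs the activations of the whole output layer at once, while the tutorial's transfer() sees one neuron at a time. A minimal pure-Python sketch is given below. It assumes softmax is paired with cross-entropy loss, in which case the full Jacobian is not needed and the output-layer delta reduces to expected - output. The function names are illustrative, not part of the tutorial.

```python
from math import exp

def softmax(activations):
    # Subtract the maximum first so exp() cannot overflow.
    shift = max(activations)
    exps = [exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def output_deltas(expected, outputs):
    # Softmax with cross-entropy: delta per output neuron for a "+=" update.
    return [t - o for t, o in zip(expected, outputs)]

# Activations of a three-neuron output layer, collected before transfer.
activations = [2.0, 1.0, 0.1]
outputs = softmax(activations)
deltas = output_deltas([1, 0, 0], outputs)

# Large activations remain stable thanks to the shift.
stable = softmax([1000.0, 1000.0])
```

In the forward pass this means computing activate() for every output neuron first, then calling softmax() once on the list and storing each result in neuron['output'].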
Suede March 29, 2018 at 6:40 am #

hey Jason, this is very helpful. I have run the code but i keep on getting this error, can you please help me out? the error is:

NameError Traceback (most recent call last)
in ()
186 str_column_to_float(dataset, i)
187 # convert class column to integers
–> 188 str_columnto_int(dataset, len(dataset[0])-1)
189 # normalize input variables
190 minmax = dataset_minmax(dataset)

NameError: name ‘str_columnto_int’ is not defined

Reply
- Jason Brownlee March 29, 2018 at 6:42 am #
  
  The code was written for Python 2.7, confirm that you are using this version of Python?
  
  Reply
Fahri Güreşçi April 15, 2018 at 7:03 am #

The csv file is not working. Edited csv file > bit.ly/2GYX2dF
you can use python 2 or 3
results:
python2 > Mean Accuracy: 95.238%
python3 > Mean Accuracy: 93.333%

Why different?

Reply
- Jason Brownlee April 16, 2018 at 6:00 am #
  
  Why is the CSV file not working? The link appears to work fine.
  
  Good question re the difference, no idea. Perhaps small differences in the API? The code was written for Py2, so it may require changes for py3.
  
  Also, see this post on the stochastic nature of ml algorithms:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  
  Reply
Fahad April 18, 2018 at 8:58 pm #

I have altered the code to work with MNIST (digit numbers) , the problem I have faced that forward_propagate function returns [1 ,1 ,1 ,1 ,1 ,1 ,1 ,1 ,1 ,1 ] for each instance !

Any help

Reply
- Jason Brownlee April 19, 2018 at 6:30 am #
  
  Well done.
  
  The model will require tuning for the problem.
  
  Reply
Fahad April 19, 2018 at 7:09 am #

Could you explain more in some details please.

Reply
- Jason Brownlee April 19, 2018 at 2:45 pm #
  
  What details exactly?
  
  Reply
Fahad April 19, 2018 at 8:17 pm #

As I mentioned that forward_propagation function returns [1,1,1,1,1,1,1,1,1,1], what is the possible alter to come over this problem

Reply
- Jason Brownlee April 20, 2018 at 5:48 am #
  
  Perhaps tune the model to your specific problem. I have some suggestions here:
  https://machinelearningmastery.com/improve-deep-learning-performance/
  
  Reply
Fahad April 23, 2018 at 5:34 am #

I altered the code to work with XOR problem and it was working perfectly. Then, I altered the code to work with digit numbers MNIST, but as I told you there was a problem with the the forward_propagation function that it returned all outputs to be [1,1,1,…] instead of a probabilities for each output.
I think it is not an optimization problem, there is something wrong with the forward_propagate function.

Here is the code after alteration [it is working but with a fixed error during all training epochs

from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp
global network
global gl_errors

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

def str_column_to_intX(dataset, column):
    for row in dataset:
        row[column] = int(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Find the min and max values for each column
def dataset_minmax(dataset):
    minmax = list()
    stats = [[min(column), max(column)] for column in zip(*dataset)]
    return stats

# Rescale dataset columns to the range 0-1
def normalize_dataset(dataset):
    for row in dataset:
        for i in range(1,len(row)):
            # row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
            if row[i]>10:
                row[i]=1
            else:
                row[i]=0

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
while len(fold) epoch=%d, lrate=%.3f, error=%.3f’ % (epoch, l_rate, sum_error))

# Calculate neuron activation for an input
def activate(weights, inputs):

    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row[1:]
    i=0

    for layer in network:
        new_inputs = []
        i+=1
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs

    return inputs

# Calculate the derivative of an neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    # err =0
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])

        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):

    for i in range(len(network)):
        inputs = row[1:]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    global network
    network = list()
    hidden_layer1 = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer1)

    hidden_layer2 = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(100)]
    network.append(hidden_layer2)

    hidden_layer3 = [{'weights':[random() for i in range(100 + 1)]} for i in range(50)]
    network.append(hidden_layer3)

    output_layer = [{'weights':[random() for i in range(50 + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Make a prediction with a network
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Test Backprop on Seeds dataset
seed(1)
# load and prepare data
filename = 'dataset/train2.csv'
dataset = load_csv(filename)

for i in range(1,len(dataset[0])):
    str_column_to_float(dataset, i)

# convert class column to integers
str_column_to_int(dataset, 0)

normalize_dataset(dataset)

# evaluate algorithm
n_folds = 5
l_rate = 0.5
n_epoch = 100
n_hidden = 500

scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

Reply
- Jason Brownlee April 23, 2018 at 6:26 am #
  
  Sorry, I don’t have the capacity to debug the modified code for you.
  
  Reply
Rahmad ars April 26, 2018 at 3:23 am #

Sir, can you help me?
this is my question..

https://stackoverflow.com/questions/50027886/need-help-for-check-my-backprop-ann-using-python

Reply
- Jason Brownlee April 26, 2018 at 6:37 am #
  
  Perhaps you can summarize your question in a sentence or two?
  
  Reply
Rahmad ars April 26, 2018 at 7:19 am #

Your original code, sir, has only 1 hidden layer with 2 neurons. I modified it so that the ANN has 3 hidden layers of 128, 64 and 32 neurons. I also have my own dataset, so I changed the dataset and the number of input neurons. When I run this code, everything looks fine, but the error value does not change…

here’s the screen: https://i.stack.imgur.com/NQbNd.png

modified code: https://stackoverflow.com/questions/50027886/need-help-for-check-my-backprop-ann-using-python

thanks sir

Reply
- Jason Brownlee April 26, 2018 at 2:59 pm #
  
  If you are struggling with the code, I would recommend not coding the algorithm from scratch.
  
  Instead, I would recommend using a library like Keras. Here is a worked example:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
Fahad April 28, 2018 at 9:20 pm #

I have the same problem as Rahmad.

It also occurs when you change the original code from 5 neurons in the hidden layer to 31 neurons: the error value does not change.

I know 31 hidden neurons is not the right number for the seeds dataset, but I would like to know what goes wrong when you increase the number of neurons.

Logically, it should work and the error value should decrease. With 30 neurons it still works, but with 31 neurons the error does not decrease!

I think that if this problem is fixed, Rahmad’s problem will be fixed too.

Reply
- Jason Brownlee April 29, 2018 at 6:26 am #
  
  Perhaps the model requires tuning to your specific problem (e.g. layers, nodes, activation function, etc.)
  
  It might be better to use a library like Keras for your project:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
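
A note on the stalled error reported by Rahmad and Fahad. One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is sigmoid saturation: initialize_network() draws every weight from random(), so all weights are positive. A neuron fed by many hidden outputs then sums many positive terms, its output sits very close to 1.0, and transfer_derivative() returns almost zero, so the deltas barely move the weights. The sketch below reuses transfer() and transfer_derivative() from the tutorial; the helper output_neuron_derivative, the assumed hidden output of 0.7 and the widths 5 and 100 are hypothetical.

```python
from math import exp
from random import random, seed

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    return output * (1.0 - output)

def output_neuron_derivative(n_hidden, hidden_output=0.7):
    # hidden_output is an assumed, typical hidden activation (hypothetical value)
    weights = [random() for _ in range(n_hidden + 1)]  # same scheme as initialize_network()
    activation = weights[-1]  # bias
    for i in range(n_hidden):
        activation += weights[i] * hidden_output
    return transfer_derivative(transfer(activation))

seed(1)
narrow = output_neuron_derivative(5)
wide = output_neuron_derivative(100)
print('derivative with 5 hidden neurons:   %.6f' % narrow)
print('derivative with 100 hidden neurons: %.6g' % wide)
```

If this is the cause, drawing the initial weights from a small range centred on zero, or lowering the number of neurons per layer, may be worth trying.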
Rocha May 2, 2018 at 9:58 pm #

Hi dude, I’m stuck on this error. Could you help me?

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)  # <-- this line raises the error
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

That line gives me this: TypeError: list indices must be integers or slices, not str

Could it be the Python version? I’m using Python 3…

Reply
- Jason Brownlee May 3, 2018 at 6:33 am #
  
  This code was written for Python 2.7 sorry.
  
  Reply
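
A note on Rocha's TypeError. Python raises this message whenever a list is indexed with a string, which suggests (as an assumption, since the calling code is not shown) that each neuron reaching forward_propagate() is a plain list of weights rather than a dictionary with a 'weights' key. A minimal sketch of the difference, with hypothetical variable names:

```python
neuron_as_dict = {'weights': [0.1, 0.2, 0.3]}  # layout used in the tutorial
neuron_as_list = [0.1, 0.2, 0.3]               # hypothetical layout that triggers the error

weights = neuron_as_dict['weights']            # works: dictionary lookup by key

try:
    neuron_as_list['weights']                  # fails: a list needs an integer or slice
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Checking that the network passed in was built by initialize_network(), and not replaced by a hand-written list of lists, would be a reasonable first step.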
Kamrun Nahar Nisha May 8, 2018 at 4:00 pm #

Hello, please help me.
I want to use a breast cancer dataset instead of the seeds dataset.

seed(1)
# load and prepare data
filename = 'seeds_dataset.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
# convert class column to integers
str_column_to_int(dataset, len(dataset[0])-1)
# normalize input variables
minmax = dataset_minmax(dataset)
normalize_dataset(dataset, minmax)
# evaluate algorithm
n_folds = 5
l_rate = 0.3
n_epoch = 500
n_hidden = 5
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

In this part of your code I also want to print the error, so that the output looks like this:

>epoch=0, lrate=0.500, error=6.350
>epoch=1, lrate=0.500, error=5.531
>epoch=2, lrate=0.500, error=5.221
>epoch=3, lrate=0.500, error=4.951
>epoch=4, lrate=0.500, error=4.519
>epoch=5, lrate=0.500, error=4.173
>epoch=6, lrate=0.500, error=3.835
>epoch=7, lrate=0.500, error=3.506
>epoch=8, lrate=0.500, error=3.192
>epoch=9, lrate=0.500, error=2.898
>epoch=10, lrate=0.500, error=2.626
>epoch=11, lrate=0.500, error=2.377
>epoch=12, lrate=0.500, error=2.153
>epoch=13, lrate=0.500, error=1.953
>epoch=14, lrate=0.500, error=1.774
>epoch=15, lrate=0.500, error=1.614
>epoch=16, lrate=0.500, error=1.472
>epoch=17, lrate=0.500, error=1.346
>epoch=18, lrate=0.500, error=1.233
>epoch=19, lrate=0.500, error=1.132
[{'output': 0.029980305604426185, 'weights': [-1.4688375095432327, 1.850887325439514, 1.0858178629550297], 'delta': -0.0059546604162323625}, {'output': 0.9456229000211323, 'weights': [0.37711098142462157, -0.0625909894552989, 0.2765123702642716], 'delta': 0.0026279652850863837}]
[{'output': 0.23648794202357587, 'weights': [2.515394649397849, -0.3391927502445985, -0.9671565426390275], 'delta': -0.04270059278364587}, {'output': 0.7790535202438367, 'weights': [-2.5584149848484263, 1.0036422106209202, 0.42383086467582715], 'delta': 0.03803132596437354}]

Please tell me the code, using the breast cancer dataset rather than the wheat seeds dataset. I am not very good at coding, which is why I need your help urgently.

Reply
- Jason Brownlee May 9, 2018 at 6:09 am #
  
  I’m eager to help, but I do not have the capacity to outline the changes or write the code for you.
  
  Reply
Akefar May 11, 2018 at 4:41 am #

Hi Jason,
I tried your code on my dataset, which has shape (576, 16). The problem is:

IndexError: list assignment index out of range

Do I need to change your code for data of shape (576, 16)?
Thanks

—————————————————————————
IndexError Traceback (most recent call last)
in ()
195 n_epoch = 500
196 n_hidden = 1
–> 197 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
198 print(‘Scores: %s’ % scores)
199 print(‘Mean Accuracy: %.3f%%’ % (sum(scores)/float(len(scores))))

in evaluate_algorithm(dataset, algorithm, n_folds, *args)
79 test_set.append(row_copy)
80 row_copy[-1] = None
—> 81 predicted = algorithm(train_set, test_set, *args)
82 actual = [row[-1] for row in fold]
83 accuracy = accuracy_metric(actual, predicted)

in back_propagation(train, test, l_rate, n_epoch, n_hidden)
171 n_outputs = len(set([row[-1] for row in train]))
172 network = initialize_network(n_inputs, n_hidden, n_outputs)
–> 173 train_network(network, train, l_rate, n_epoch, n_outputs)
174 predictions = list()
175 for row in test:

in train_network(network, train, l_rate, n_epoch, n_outputs)
148 outputs = forward_propagate(network, row)
149 expected = [0 for i in range(n_outputs)]
–> 150 expected[row[-1]] = 1
151 backward_propagate_error(network, expected)
152 update_weights(network, row, l_rate)

IndexError: list assignment index out of range

Reply
- Jason Brownlee May 11, 2018 at 6:39 am #
  
  You may need to change your data to match the model or the model to match the data.
  
  Reply
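
A note on the IndexError at expected[row[-1]] = 1. That line uses the class value as an index into a list of length n_outputs, so it only works when the class labels are the integers 0 to n_outputs - 1. A likely cause (an assumption, since the dataset is not shown) is labels that start at 1 or skip values, or a training fold that is missing a class. The sketch below uses made-up labels and a hypothetical one_hot helper to show the failure and a remapping similar to what str_column_to_int() does:

```python
def one_hot(label, n_outputs):
    # Same encoding as in train_network(): a 1 at the index of the class
    expected = [0 for _ in range(n_outputs)]
    expected[label] = 1
    return expected

raw_labels = [1, 2, 3, 2, 1]       # hypothetical labels that start at 1
n_outputs = len(set(raw_labels))   # 3 classes, so valid indices are 0, 1, 2

try:
    one_hot(3, n_outputs)          # label 3 is out of range for a list of length 3
    failed = False
except IndexError:
    failed = True

# Remap the labels to 0..n_outputs-1 before training
lookup = {value: index for index, value in enumerate(sorted(set(raw_labels)))}
encoded = [lookup[value] for value in raw_labels]
targets = [one_hot(label, n_outputs) for label in encoded]
print(failed, encoded, targets[2])
```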
Pradeep May 14, 2018 at 3:31 am #

Hi Jason, I tried your code on the same sample dataset, and I am getting the following TypeError in the activate() function. I am running it with Python 3.6. Hope to hear from you soon.

Traceback (most recent call last):
File “neural_network.py”, line 94, in
train_network(network, dataset, 0.5, 20, n_outputs)
File “neural_network.py”, line 76, in train_network
outputs = forward_propagate(network, row)
File “neural_network.py”, line 31, in forward_propagate
activation = activate(neuron[‘weights’], inputs)
File “neural_network.py”, line 18, in activate
activation += weights[i] * inputs[i]
TypeError: can’t multiply sequence by non-int of type ‘float’

Reply
- Jason Brownlee May 14, 2018 at 6:39 am #
  
  The code requires Python 2.7.
  
  Reply
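
A note on Pradeep's TypeError. Python raises "can't multiply sequence by non-int of type 'float'" when a string is multiplied by a float, so one plausible cause (an assumption, since the calling code is not shown) is that the input values are still strings as read from the CSV file. A minimal sketch with made-up values:

```python
weights = [0.5, -0.25, 0.1]        # last entry is the bias
row_as_strings = ['2.0', '4.0']    # what csv.reader yields before conversion

try:
    weights[0] * row_as_strings[0]  # float * str is not defined
    message = None
except TypeError as err:
    message = str(err)

# Converting the inputs to float first, as str_column_to_float() does, avoids the error
row_as_floats = [float(value.strip()) for value in row_as_strings]
activation = weights[-1] + sum(w * x for w, x in zip(weights, row_as_floats))
print(message)
print(activation)
```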
Deepak D May 17, 2018 at 6:49 pm #

Hi Jason Brownlee,

I tried your code and got an error when applying the backpropagation algorithm to the wheat seeds dataset. I am using Python 2.7.

Error type:

File “C:\Python27\programs\backpropagation.py”, line 186,
in str_column_to_float(dataset, i)

File “C:\Python27\programs\backpropagation.py”, line 22,
in str_column_to_float row[column] = float(row[column].strip())
ValueError: could not convert string to float:

Reply
- Jason Brownlee May 18, 2018 at 6:22 am #
  
  I am sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
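
A note on the ValueError above. The message ends with nothing after the colon, which suggests that float() was given an empty string, for example from an empty field in the CSV file. The sketch below shows the failure and one possible workaround; the helper drop_incomplete_rows and the sample rows are hypothetical, and dropping rows is only one option (repairing the file is another).

```python
def drop_incomplete_rows(rows):
    # Keep only rows in which every field is non-empty after stripping whitespace
    cleaned = []
    for row in rows:
        fields = [field.strip() for field in row]
        if any(field == '' for field in fields):
            continue
        cleaned.append(fields)
    return cleaned

raw_rows = [['15.26', '14.84', '1'], ['14.88', '', '1'], ['14.29', '14.09', '1']]

try:
    float(raw_rows[1][1])   # empty field, same failure as in the traceback
    failed = False
except ValueError:
    failed = True

dataset = [[float(field) for field in row] for row in drop_incomplete_rows(raw_rows)]
print(failed, len(dataset))
```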
Dhanya Hegde May 19, 2018 at 1:53 am #

Hey Jason! Great work. Really helpful. I didn’t understand one part of your code. On what basis does predict function return the predicted value as 0 or 1, after taking the maximum of the two output neuron values?

Reply
- Jason Brownlee May 19, 2018 at 7:44 am #
  
  The summation of the activation is passed through a sigmoid transfer function.
  
  Reply
  - Dhanya Hegde May 21, 2018 at 3:37 am #
    
    I didn’t understand this part of the code
    
    outputs.index(max(outputs)
    
    Is one hot encoding used or binary classification?
    If so, how is the actual mapping done?
    And when is the iteration process stopped?
    
    Reply
    - Jason Brownlee May 21, 2018 at 6:35 am #
      
      As stated in the text above the code, it returns an integer for the class with the largest probability.
      
      Reply
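
A note on outputs.index(max(outputs)). The network has one output neuron per class, the training targets are one-hot encoded (expected[row[-1]] = 1), and the prediction is the position of the largest output. Training stops after a fixed number of epochs (n_epoch). A small sketch with made-up output values; the helper names are hypothetical:

```python
def one_hot(class_index, n_outputs):
    # Target vector used during training: 1 for the true class, 0 elsewhere
    expected = [0 for _ in range(n_outputs)]
    expected[class_index] = 1
    return expected

def predict_from_outputs(outputs):
    # Index of the largest output value is taken as the predicted class
    return outputs.index(max(outputs))

outputs = [0.23, 0.78]                 # hypothetical outputs of the two output neurons
predicted = predict_from_outputs(outputs)
expected = one_hot(predicted, len(outputs))
print(predicted, expected)
```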
Ionut May 27, 2018 at 12:05 am #

Hi,

I’m a beginner in neural networks and I don’t understand the dataset in section “4.2. Train Network”. Can anyone explain to me what x1, x2 and y mean?

Reply
- Jason Brownlee May 27, 2018 at 6:45 am #
  
  Input 1, input 2 and output.
  
  Perhaps start here:
  https://machinelearningmastery.com/start-here/#algorithms
  
  Reply
Rishik Mani May 28, 2018 at 7:26 am #

Hi Jason, thank you for the highly informative post. Could you please clarify one small issue for me?

In section 4.2, Train Network, you set n_inputs = len(dataset[0]) - 1. Why did you subtract 1 here, when the number of inputs should be exactly the length of a dataset row?

Reply
- Jason Brownlee May 28, 2018 at 2:32 pm #
  
  To exclude the output variable from the number of inputs.
  
  Reply
Samih Eisa June 2, 2018 at 9:13 pm #

Thank you, jason.

Reply
- Jason Brownlee June 3, 2018 at 6:22 am #
  
  You’re welcome.
  
  Reply

Kie Woo Nam June 5, 2018 at 3:01 am #

# Update network weights with error
def update_weights(network, row, l_rate):
	for i in range(len(network)):
		inputs = row[:-1]
		if i != 0:
			inputs = [neuron['output'] for neuron in network[i - 1]]
		for neuron in network[i]:
			for j in range(len(inputs)):
				neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
			neuron['weights'][-1] += l_rate * neuron['delta']

Hi,

I’m likely mistaken, but when i != 0, isn’t the last line updating the last weight for a second time?

So shouldn’t it be “inputs = [neuron['output'] for neuron in network[i - 1]][:-1]” (adding “[:-1]” at the end)?

If I’m wrong, I’ll read the code again more carefully, so please let me know.

Jason Brownlee June 5, 2018 at 6:46 am #

No. There are more weights than inputs and the -1 index of the weights is the bias.

Reply
- Kie Woo Nam June 5, 2018 at 7:06 pm #
  
  Ah, right. Now I see it from “output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]”.
  
  Thank you for your quick reply.
  
  Reply
  - Jason Brownlee June 6, 2018 at 6:39 am #
    
    No problem.
    
    Reply
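
A note on the bias weight discussed above. Each neuron stores one more weight than it has inputs, and the extra weight at index -1 is the bias. The loop over the inputs therefore never touches it, and the final line updates it exactly once. The sketch below counts the updates per weight; update_neuron and the sample numbers are hypothetical.

```python
def activate(weights, inputs):
    activation = weights[-1]                  # bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def update_neuron(weights, inputs, delta, l_rate):
    # Same update rule as update_weights(), plus a counter per weight
    counts = [0] * len(weights)
    for j in range(len(inputs)):
        weights[j] += l_rate * delta * inputs[j]
        counts[j] += 1
    weights[-1] += l_rate * delta             # bias: no input term
    counts[-1] += 1
    return counts

weights = [0.2, 0.4, 0.1]                     # two input weights and one bias
inputs = [1.0, 2.0]
activation = activate(weights, inputs)
counts = update_neuron(weights, inputs, delta=0.05, l_rate=0.5)
print(activation, counts)
```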

Thomas Specht July 10, 2018 at 4:46 am #

Hi Jason,

Great tutorial to get into ML coding. I have one question:

What library would you recommend for projects and why? I want to use NN for regression problems.

Reply
- Jason Brownlee July 10, 2018 at 6:53 am #
  
  I recommend Keras because it is computationally efficient, fast for development and fun:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Hugo B. August 13, 2018 at 1:29 am #

Hi Jason! Thank you for this tutorial.
I am trying to implement batch learning, but I have some questions about it…

– Computing the errors:
Do I have to accumulate the errors (‘delta’) in backward_propagate_error() during one epoch and then average them over the number of back-propagations performed?

– Updating the weights:
In train_network(), I call update_weights() once per epoch, but I don’t know which row(s) of train (the dataset) I should use. Currently I use only one row: train[0].

Reply
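
A note on Hugo's batch learning question. One common approach, sketched here under stated assumptions rather than as the tutorial's method, is to accumulate the weight changes (delta times input) for every row in the epoch and apply their average once at the end. The neuron's 'delta' itself is overwritten on each row, so it is the per-weight contribution that is accumulated, and every training row is used rather than only train[0]. The example uses a single sigmoid neuron and a made-up dataset to stay short; train_batch and the other names are hypothetical.

```python
from math import exp

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def neuron_output(weights, inputs):
    activation = weights[-1]  # bias
    for i in range(len(inputs)):
        activation += weights[i] * inputs[i]
    return transfer(activation)

def sum_squared_error(weights, rows):
    return sum((row[-1] - neuron_output(weights, row[:-1])) ** 2 for row in rows)

def train_batch(weights, rows, l_rate, n_epoch):
    for _ in range(n_epoch):
        accumulated = [0.0 for _ in weights]
        for row in rows:
            inputs, expected = row[:-1], row[-1]
            output = neuron_output(weights, inputs)
            delta = (expected - output) * output * (1.0 - output)
            for j in range(len(inputs)):
                accumulated[j] += delta * inputs[j]
            accumulated[-1] += delta
        # One update per epoch, using the average over all rows
        for j in range(len(weights)):
            weights[j] += l_rate * accumulated[j] / len(rows)
    return weights

rows = [[0.0, 0.0, 0], [0.0, 1.0, 0], [1.0, 0.0, 1], [1.0, 1.0, 1]]
weights = [0.1, -0.1, 0.0]
error_before = sum_squared_error(weights, rows)
train_batch(weights, rows, l_rate=0.5, n_epoch=200)
error_after = sum_squared_error(weights, rows)
print(error_before, error_after)
```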
elizabeth August 14, 2018 at 2:40 am #

Hello Sir
Thank you so much for this article.
Can you please tell me how i can solve this error:

File “mlp8.py”, line 186, in
str_column_to_float(dataset, i)
File “mlp8.py”, line 22, in str_column_to_float
row[column] = float(row[column].strip())
ValueError: could not convert string to float:

thankyou

Reply
- Jason Brownlee August 14, 2018 at 6:22 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
STEPHEN OLADEJI August 28, 2018 at 10:35 pm #

Dear Prof,

Thanks for this tutorial,
Sir, I’ve gone through a lot of your projects online and they are superb. God bless you, sir.
Sir, my questions are as follows.
1. I noticed that in WEKA 3.6 the Artificial Immune System was removed, although it was in version 1.8. Is this because there is no research prospect in the algorithm?
2. I want to write Python versions of AIRS, CSCA and a Genetic Algorithm. Can you help proofread them, sir, to see if what I write is correct?

Reply
- Jason Brownlee August 29, 2018 at 8:11 am #
  
  I have an implementation here that you can use:
  http://wekaclassalgos.sourceforge.net/
  
  Reply
Tamara September 11, 2018 at 5:36 am #

Thank you so much for this tutorial.
How can I see the results of a trained network in Python?

Reply
- Jason Brownlee September 11, 2018 at 6:34 am #
  
  What do you mean exactly?
  
  Reply
ritu September 23, 2018 at 9:49 am #

How should I modify the code so that it always runs with one neuron in the output layer?
E.g. if there are only 2 output classes, ‘1’ and ‘2’, the above code creates 2 neurons in the output layer, but what if I wanted the neural network to have just one neuron in the output layer?

Reply
- Jason Brownlee September 24, 2018 at 6:09 am #
  
  If you are having trouble with this tutorial, I would encourage you to use Keras to develop neural network models.
  
  You can get started here:
  https://machinelearningmastery.com/start-here/#python
  
  Reply

Parva September 27, 2018 at 9:04 am #

Why is there just one output coming from layer 1 even though it contains 2 neurons? Shouldn’t there be 2 outputs, one from each neuron?

[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614], 'delta': -0.0005348048046610517}]
[{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095], 'delta': -0.14619064683582808}, {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763], 'delta': 0.0771723774346327}]

Jason Brownlee September 27, 2018 at 2:47 pm #

Which step exactly are you having trouble with?

Brian September 28, 2018 at 1:17 am #

Thank you for this example. It has helped me get past the block I had with the mathematical based descriptions and differential calculus related to back propagation.
As I have been focusing on the back propagation portion of this example I have come up with an alternative version of the ‘backward_propagate_error’ function that I think is a much more succinct and logical way to write this function.

Please find below

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
	for i in reversed(range(len(network))):
		layer = network[i]
		for j in range(len(layer)):
			fromNeuron = layer[j]
			error = 0.0
			if i != len(network)-1:                           #This identifies all but the last (output) layer
				for toNeuron in network[i + 1]:
					error += (toNeuron['weights'][j] * toNeuron['delta'])
			else:                                             #This is the last (output) layer
				error = expected[j] - fromNeuron['output']
			fromNeuron['error'] = error
			fromNeuron['delta'] = error * transfer_derivative(fromNeuron['output'])

Jason Brownlee September 28, 2018 at 6:16 am #

Cool!

Sorry, I don’t have the capacity to review your code.

Reply

Brian October 4, 2018 at 2:59 am #

Shouldn’t the formula for the error be
sum_error += sum([0.5*(expected[i]-outputs[i])**2 for i in range(len(expected))])
as opposed to
sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
in order to back-propagate correctly with the derivative?

Reply
- Jason Brownlee October 4, 2018 at 6:20 am #
  
  Why is that Brian?
  
  Reply
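
A note on the 0.5 factor. In the code quoted in these comments, sum_error is only printed to monitor training; the backward pass uses (expected - output) directly. With the 0.5 factor, the derivative of the error with respect to the output is exactly -(expected - output); without it, the derivative is twice as large, which only rescales the step and can be absorbed into the learning rate. A small numerical check, with hypothetical helper names and made-up values:

```python
def half_squared_error(expected, output):
    return 0.5 * (expected - output) ** 2

def squared_error(expected, output):
    return (expected - output) ** 2

def numeric_derivative(func, x, eps=1e-6):
    # Central finite difference approximation of d func / d x
    return (func(x + eps) - func(x - eps)) / (2 * eps)

expected, output = 1.0, 0.3
d_half = numeric_derivative(lambda o: half_squared_error(expected, o), output)
d_full = numeric_derivative(lambda o: squared_error(expected, o), output)
print(d_half, d_full, -(expected - output))
```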
KS October 10, 2018 at 2:15 am #

Can you please tell me how to implement tanh and ReLU?
I tried to find examples on the internet, but I couldn’t find good ones.

I understand that these 2 pieces of code need to be changed:

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Calculate the derivative of an neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

1. What is the code when using tanh?
2. What is the code when using ReLu?

Reply
- Jason Brownlee October 10, 2018 at 6:15 am #
  
  Thanks for the suggestion, sorry, I don’t have the capacity to make these changes for you.
  
  Reply
- audrey April 14, 2020 at 5:42 am #
  
  Did you find anything about ReLU?
  
  Reply
  - Jason Brownlee April 14, 2020 at 6:31 am #
    
    See this tutorial:
    https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
    
    Reply
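
A note on the tanh and ReLU question. The sketch below shows drop-in candidates that keep the tutorial's convention of computing the derivative from the neuron's output rather than from its activation. The function names are hypothetical. It is an assumption, not something tested on the seeds dataset, that these work well here; in particular, the one-hot targets are 0 and 1, so keeping the sigmoid on the output layer and swapping only the hidden layers may be the safer choice.

```python
from math import tanh

def transfer_tanh(activation):
    return tanh(activation)

def transfer_derivative_tanh(output):
    # d/da tanh(a) = 1 - tanh(a)^2, written in terms of the output
    return 1.0 - output ** 2

def transfer_relu(activation):
    return max(0.0, activation)

def transfer_derivative_relu(output):
    # Slope is 1 where the neuron is active, 0 otherwise (0 chosen at exactly 0)
    return 1.0 if output > 0.0 else 0.0

for a in (-2.0, 0.5, 3.0):
    print(a, transfer_tanh(a), transfer_derivative_tanh(transfer_tanh(a)),
          transfer_relu(a), transfer_derivative_relu(transfer_relu(a)))
```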
moe October 10, 2018 at 7:08 pm #

Thank you so much, 1 – 5 helped me to build my own generic network. Now I have an MLP network which I can easily adjust with a few parameter changes. It wouldn’t have been so easy without your example.

Reply
- Jason Brownlee October 11, 2018 at 7:51 am #
  
  Well done!
  
  Reply
Anwar October 11, 2018 at 3:35 am #

I am running the same code that you provided on Python 3.6 and getting these errors. Please help me:

—————————————————————————————————————————

IndexError Traceback (most recent call last)
in ()
16 n_epoch = 500
17 n_hidden = 5
—> 18 scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
19 print(‘Scores: %s’ % scores)
20 print(‘Mean Accuracy: %.3f%%’ % (sum(scores)/float(len(scores))))

in evaluate_algorithm(dataset, algorithm, n_folds, *args)
12 test_set.append(row_copy)
13 row_copy[-1] = None
—> 14 predicted = algorithm(train_set, test_set, *args)
15 actual = [row[-1] for row in fold]
16 accuracy = accuracy_metric(actual, predicted)

in back_propagation(train, test, l_rate, n_epoch, n_hidden)
4 n_outputs = len(set([row[-1] for row in train]))
5 network = initialize_network(n_inputs, n_hidden, n_outputs)
—-> 6 train_network(network, train, l_rate, n_epoch, n_outputs)
7 predictions = list()
8 for row in test:

in train_network(network, train, l_rate, n_epoch, n_outputs)
5 outputs = forward_propagate(network, row)
6 expected = [0 for i in range(n_outputs)]
—-> 7 expected[row[-1]] = 1
8 backward_propagate_error(network, expected)
9 update_weights(network, row, l_rate)
IndexError: list assignment index out of range

Reply
- Jason Brownlee October 11, 2018 at 8:00 am #
  
  I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
KS October 12, 2018 at 11:33 am #

How do I do this:
“Regression. Change the network so that there is only one neuron in the output layer and that a real value is predicted.”

How do I get a single neuron in the output layer?

Reply
- Jason Brownlee October 13, 2018 at 6:06 am #
  
  Coding algorithms from scratch is not for beginners.
  
  I strongly encourage you to use Keras, for example:
  https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
  
  Reply

Jerry October 16, 2018 at 9:08 am #

Ok, just following up. I still don’t get how the backprop is updating:

        for row in train:
            outputs = forward_propagate(network, row, alpha)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
            backward_propagate_error(network, expected, alpha)
            update_weights(network, row, l_rate)

1. There is no return statement or ‘global network’ in backward_propagate_error or update_weights to actually incorporate the new weights.

My question: are you sure this uses back-propagation? How are the weights saved and updated for each epoch?

Jason Brownlee October 16, 2018 at 2:34 pm #

The weights are passed in by reference and modified in place.

Perhaps using Keras would be a better fit Jerry:
https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

Reply
- Jerry November 10, 2018 at 11:53 pm #
  
  You learn something new everyday! I was always under the impression that values were immutable when passing to a function in Python.
  
  https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference
  
  Thank you Jason! I’m really amazed on how active you are on this site. Just so you know, you and your work are referenced often in my MSDS program.
  
  One area that I think would also be beneficial is some work on hidden node activations and their interpretations. The majority of our work is not so much on the output/accuracy of the NN, but more of visualizing weights, activations, and determining the features that are causing nodes to activate.
  
  Reply
  - Jason Brownlee November 11, 2018 at 6:08 am #
    
    Thanks, happy to help.
    
    Why do you want to view/understand the dynamics of nodes in hidden layers?
    
    Reply
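
A note on "passed in by reference and modified in place". The network is a list of lists of dictionaries, all of which are mutable, so a function that changes their contents changes the caller's object and needs no return statement. Rebinding the parameter name inside the function, by contrast, has no effect on the caller. A minimal sketch with hypothetical function names:

```python
def nudge_weights(network, amount):
    # Mutates the existing weight lists; nothing is returned
    for layer in network:
        for neuron in layer:
            for j in range(len(neuron['weights'])):
                neuron['weights'][j] += amount

def rebind_network(network):
    # Rebinding the local name does not change the caller's object
    network = []

network = [[{'weights': [0.1, 0.2]}]]
same_list = network[0][0]['weights']

nudge_weights(network, 0.5)
rebind_network(network)
print(network)
```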
ben October 31, 2018 at 9:47 pm #

I think this is not correct:
sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])

Shouldn’t this be:
sum_error += sum([abs(0.5*(expected[i]-outputs[i])**2) for i in range(len(expected))])

see https://goo.gl/iqHJf6 page 233 (5.11).

Reply
- Jason Brownlee November 1, 2018 at 6:11 am #
  
  This implementation is based on the book “Neural Smithing”:
  https://www.amazon.com/Neural-Smithing-Supervised-Feedforward-Artificial/dp/0262527014/ref=as_li_ss_tl?ie=UTF8&linkCode=sl1&tag=inspiredalgor-20&linkId=e3db0b57249093a94ebb073983bc8b4d&language=en_US
  
  Reply

Ayhan October 20, 2018 at 1:43 am #

Hey Jason, first of all, thanks for your efforts.

Maybe one thing:
wouldn’t it be better (especially for beginners to the topic) to at least use
something like NumPy, which would make all the matrix calculations look
a bit more compact (and therefore make it possible to concentrate on the real topic, which is
backprop, I would have guessed)?

OK, the title says ‘implement from scratch’, but I would say that at some point getting
the point of backprop across is maybe more important than being able to say that
it was implemented from scratch (using nothing but plain Python).

Greets

Reply
- Jason Brownlee October 20, 2018 at 5:57 am #
  
  Thanks for the feedback.
  
  This example is not intended for writing operational code, for that I would recommend Keras:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  This tutorial is to teach how to develop a net (training and evaluation) without any dependencies other than the standard lib.
  
  Reply
Vadim November 3, 2018 at 11:10 pm #

Just Brilliant stuff

Reply
- Jason Brownlee November 4, 2018 at 6:27 am #
  
  Thanks.
  
  Reply
Himanshu November 14, 2018 at 10:38 pm #

Hi Jason,
How can I use forward and backward propagation on real-world data? I mean data with multiple fields, where the fields contain integer values, floating-point values and string values.
For example, how can I use this technique on the Titanic data?

How can I decide how many hidden layers there should be?
How can I decide on the learning rate?
How can I decide what the seed value should be?
How can I decide on the weights for such a large dataset?

Reply
- Jason Brownlee November 15, 2018 at 5:30 am #
  
  You must use careful experimentation to get answers to each of these questions.
  
  Reply
Himanshu November 15, 2018 at 6:12 pm #

Hi Jason,
I have a few more questions on this.

First: why did you assign “activation = weights[-1]”? Why not another weight or a random value?
Second: why are you looping over only two values with
for i in range(len(weights) - 1) when we have three values?
Third: why have you considered only two outputs here when we have four?
output for layer one =.9643898158763548
output for layer two =.9185258960543243
output for layer three = .8094918973879515
output for layer two = .7734292563511262

Why did you consider only the last two values, and not the first two or any other combination?

Fourth: I think there is a mistake here and the wrong value is updated:
expected[row[-1]] = 1
After this line, expected has changed from [0,0] to [1,0].
I also have another question: why are we updating this value at all?

Reply
- Jason Brownlee November 16, 2018 at 6:12 am #
  
  Too many questions for one comment (I’m a simple human), one at a time and can you please reference specific examples/lines of code otherwise I can’t help.
  
  Reply
Karim November 18, 2018 at 8:14 am #

Hi Jason, how can I add more hidden layers? Thanks

Reply
- Jason Brownlee November 19, 2018 at 6:41 am #
  
  I recommend using Keras to develop your network:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
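
A note on adding hidden layers. Since forward_propagate(), backward_propagate_error() and update_weights() all loop over whatever layers the network list contains, one way to add layers is to generalise only the initialisation. The sketch below is an assumption-laden variant, not the tutorial's code: initialize_deep_network and its hidden_sizes list are hypothetical, and back_propagation() would need to pass a list instead of the single n_hidden value.

```python
from random import random, seed

def initialize_deep_network(n_inputs, hidden_sizes, n_outputs):
    # hidden_sizes is a list such as [128, 64, 32]; each neuron gets one
    # weight per input from the previous layer plus one bias weight
    network = []
    previous = n_inputs
    for size in list(hidden_sizes) + [n_outputs]:
        layer = [{'weights': [random() for _ in range(previous + 1)]}
                 for _ in range(size)]
        network.append(layer)
        previous = size
    return network

seed(1)
network = initialize_deep_network(7, [128, 64, 32], 3)
layer_sizes = [len(layer) for layer in network]
weights_per_neuron = [len(layer[0]['weights']) for layer in network]
print(layer_sizes, weights_per_neuron)
```

Deeper or wider networks initialised with random() in the range 0 to 1 may train poorly, as several comments above report, so the initial weight range may also need attention.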
Himanshu November 22, 2018 at 5:24 pm #

Hi Jason,

How do I apply the mathematical implementation of gradient descent and logistic regression (classification) to real-world data?
For example, if I want to use this to predict survivors in the Titanic data, how do I start?

Reply
- Jason Brownlee November 23, 2018 at 7:44 am #
  
  Here’s an example:
  https://machinelearningmastery.com/implement-logistic-regression-stochastic-gradient-descent-scratch-python/
  
  Reply
John Sald December 6, 2018 at 2:05 pm #

Hello, could you show me an example of using one of the extensions you mentioned, which can give us a gain in performance?

Such as using matrix operations (in the weights) and vectors (inputs, intermediate signals and outputs)

Reply
- Jason Brownlee December 7, 2018 at 5:16 am #
  
  Thanks for the suggestion.
  
  If you’re looking to go deeper into neural nets, I recommend using a library like Keras. You can start here:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Pipo December 17, 2018 at 7:25 am #

How could I change the loss to MSE in this code? I can’t wrap my head around it. Thanks

Reply
- Jason Brownlee December 17, 2018 at 2:13 pm #
  
  Calculate error between actual and predicted using a MSE function. That’s it.
  
  Reply
  - Brett August 24, 2019 at 4:07 am #
    
    I’m confused about why you chose MSE for a classification problem. I was trying to use this tutorial to discern the differences between a classification and function approximation implementation, and the use of MSE for classification really threw me off. I know that it technically works, but it’s probably good to mention that it’s not ideal. It would have been nice to get exposure to taking the derivative of a different loss function, so that someone who is new to back-propagation will start to grasp how different functions change the derivative, etc. Otherwise, the code is understandable and could be modified slightly to make a good tutorial for function approximation.
    
    Reply
    - Brett August 24, 2019 at 4:19 am #
      
      Just to clarify what I’m saying, and to answer Pipo’s question, this implementation is already using MSE. The derivative of MSE with respect to the output is: (output – expected). The fact that you’re multiplying this by the transfer derivative just means that you’re passing the MSE back through the activation of the output node. So if you want the code to work for function approximation, you simply don’t multiply by the transfer derivative. However, if you want classification to work better, you could use the derivative of a different loss function with respect to the output and predicted, and multiply that by your transfer derivative.
      
      Reply
      - Jason Brownlee August 24, 2019 at 7:59 am #
        
        Agreed!
      - Marcel June 7, 2021 at 6:51 pm #
        
        Hello Brett,
        
        could you please point out where exactly the MSE loss is calculated? And where you would put the Cross entropy loss? Could you please demonstrate this with a short code example based on the tutorial ? I would be very pleased. Thank you in advance.
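
A minimal sketch of where the loss enters, assuming a sigmoid transfer function and the (output - expected) sign convention. The helper names output_delta_mse and output_delta_cross_entropy are hypothetical and are not part of the tutorial code. The loss never appears as a separate function; only its derivative shows up, in the error signal computed for each output neuron.

```python
from math import exp

def sigmoid(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    # Derivative of the sigmoid, written in terms of its output
    return output * (1.0 - output)

def output_delta_mse(output, expected):
    # Loss 0.5 * (output - expected)**2: the loss derivative (output - expected)
    # is passed back through the sigmoid via the chain rule.
    return (output - expected) * transfer_derivative(output)

def output_delta_cross_entropy(output, expected):
    # Binary cross entropy with a sigmoid output: the sigmoid derivative
    # cancels, leaving just (output - expected).
    return output - expected

output = sigmoid(2.0)
print(output_delta_mse(output, 1.0))
print(output_delta_cross_entropy(output, 1.0))
```

Under these assumptions, switching the output layer from MSE to cross entropy amounts to dropping the multiplication by the transfer derivative for output neurons only; hidden layers still use it.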
    - Jason Brownlee August 24, 2019 at 7:58 am #
      
      Yes, it was what we did in the 90s. Cross entropy would be preferred today, I agree.
      
      I really need to do a series on coding neural nets from scratch to really dig into this. Thanks for the kick!
      
      Reply
      - Brett August 24, 2019 at 9:26 am #
        
        Thanks for the reply. I just found this post on finding the derivative of cross entropy, and it turns out that you can do a really nice simplification of the math to basically get (output – expected) or (expected – output) for your implementation, when combining the cross entropy derivative and sigmoid derivative. So I’m pretty sure that if you simply stop multiplying by the transfer derivative to get your output error, you should see a big increase in performance of the algorithm. Worth a try at least. Here is the link, with the conclusion I mentioned at the very end:
        
        https://peterroelants.github.io/posts/cross-entropy-logistic/
      - Jason Brownlee August 25, 2019 at 6:30 am #
        
        Thanks for sharing.
muhammad December 21, 2018 at 8:48 am #

hi, thanks for this code.
I’m trying to understand why you are adding in the weight update. Shouldn’t it be
w_i ← w_i − η ∂E/∂w_i, like this?

Reply
Sangeeth January 20, 2019 at 4:40 am #

Hi,

This website provides a good introduction for almost all topics in machine learning. Thanks for your work.

In backpropagation, the error at each neuron is the product of
1. Change in error w.r.t y_out
2. Change in y_out w.r.t y
3. Change in y w.r.t weight.

Could you please explain why you multiplied only 1 and 2 in backward_propagate_error (from the last layer) and then used 3 in update_weights (from the first layer)? Should we not do all the steps in backward_propagate_error and then use the result in update_weights?

Reply
- Jason Brownlee January 20, 2019 at 5:41 am #
  
  I show exactly how in the above tutorial.
  
  Reply
  - sangeeth January 20, 2019 at 5:44 am #
    
    Sorry, I just realized what I said is same as what you did.
    
    About the error, should we not use 2*error (derivative of MSE)?
    
    Reply
    - Jason Brownlee January 21, 2019 at 5:26 am #
      
      No, we calculate the derivative of the error against the non linear activation function, not the derivative of the loss function itself.
      
      Reply
      - sangeeth January 21, 2019 at 12:11 pm #
        
        Ok. I got it. Thanks,
        
        I think this is online learning using SGD. Would you have an implementation for offline learning using mini-batch gradient descent?
      - Jason Brownlee January 22, 2019 at 6:16 am #
        
        Correct, you can modify the above example to use batch or mini-batch gradient descent.
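
A rough sketch of the mini-batch idea, shown for a single sigmoid neuron rather than the full network to keep it short. The function names predict and train_mini_batch and the tiny dataset are made up for illustration. Gradients are accumulated over a batch and the weights are updated once per batch, instead of once per row.

```python
from math import exp

def predict(weights, row):
    # weights[-1] is the bias; row[-1] is the class label and is not used here
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * row[i]
    return 1.0 / (1.0 + exp(-activation))

def train_mini_batch(weights, dataset, l_rate, n_epoch, batch_size):
    for _ in range(n_epoch):
        for start in range(0, len(dataset), batch_size):
            batch = dataset[start:start + batch_size]
            grads = [0.0] * len(weights)
            for row in batch:
                output = predict(weights, row)
                # (output - expected) times the sigmoid derivative
                delta = (output - row[-1]) * output * (1.0 - output)
                for i in range(len(weights) - 1):
                    grads[i] += delta * row[i]
                grads[-1] += delta
            # One update per batch, using the average gradient
            for i in range(len(weights)):
                weights[i] -= l_rate * grads[i] / len(batch)
    return weights

dataset = [[2.78, 2.55, 0], [1.47, 2.36, 0], [7.63, 2.76, 1], [5.33, 2.09, 1]]
weights = train_mini_batch([0.0, 0.0, 0.0], dataset, 0.5, 200, 2)
print([round(predict(weights, row)) for row in dataset])
```

Setting batch_size to 1 gives back online (stochastic) updates, and setting it to len(dataset) gives full-batch gradient descent.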
      - sangeeth January 22, 2019 at 1:14 am #
        
        Is the sum_error variable the same as the loss in the model.fit output? I get different loss values when testing the same datasets with model.fit and with your model. Could you tell me why this is?
      - Jason Brownlee January 22, 2019 at 6:25 am #
        
        Yes, see this post:
        https://machinelearningmastery.com/randomness-in-machine-learning/
    - Brett August 24, 2019 at 4:25 am #
      
      The loss function used in this tutorial is: (1/2)(out – expect)^2. The derivative of which with respect to the output is: (out – expect) * 1, or simply (out – expected). This is then multiplied by the transfer derivative, because the error is being passed backward via the chain rule. You always have to take the derivative with respect to the loss function itself first. I hope this clears up any confusion.
      
      Reply
Gary January 22, 2019 at 8:54 am #

Hi Jason.

In the “full” seeds example you call user defined function evaluate_algorithm(). However, the “heavy lifting” inside it is performed by the function algorithm(). That function looks like it’s a part of some standard Python library, but I can’t find it in any reference. Also you don’t comment at all at its use.

What’s the deal here?

Thank you,

Gary

Reply
- Jason Brownlee January 22, 2019 at 11:43 am #
  
  The “algorithm” is a reference to a function that is passed in as an argument.
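
A small illustration of passing a function as an argument. The names evaluate and majority_class are hypothetical stand-ins, not the tutorial's functions. Inside evaluate, algorithm is just a local name for whichever function the caller supplied.

```python
def evaluate(dataset, algorithm, *args):
    # 'algorithm' is a function object; calling it runs the caller's function
    return algorithm(dataset, *args)

def majority_class(dataset, default):
    # Toy "algorithm": return the most common label in the last column
    labels = [row[-1] for row in dataset]
    return max(set(labels), key=labels.count) if labels else default

print(evaluate([[1.0, 0], [2.0, 1], [3.0, 1]], majority_class, 0))
```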
  
  Reply
Gary January 22, 2019 at 1:24 pm #

Yes, thank you, I already realized that.

Regards,

Gary

Reply
sangeeth January 28, 2019 at 8:39 am #

Hi,

For online machine learning, should we perform epochs? Should we not update the model based only on the input at the present time and then predict the next time step? If we run epochs, that means the model is being updated on the whole dataset up to the present time. Am I correct? Thanks.

Reply
- Jason Brownlee January 28, 2019 at 11:44 am #
  
  It depends on the problem and the data. Yes, it often makes sense to update the model with new data and with a little of the old data.
  
  Note, online gradient descent does not have to be used for online learning.
  
  Reply
kmillen February 12, 2019 at 10:49 am #

Good afternoon Jason. I have thoroughly enjoyed this solution both in Python and my conversion to C#. I guess for all the learning I’ve gleaned, one thing still seems to be a mystery to me. What exactly are the five scores telling me? Do they annotate how well the data fits a curve for each fold?

Reply
- Jason Brownlee February 12, 2019 at 1:59 pm #
  
  Correct.
  
  The mean of the scores is our estimate of the model’s performance when making predictions on unseen data.
  
  Reply
  - kmillen February 26, 2019 at 8:10 am #
    
    Thank you.
    
    Reply
MathewP February 20, 2019 at 3:25 am #

I think there is a mistake in the update_weights function.
inputs = row[:-1]
If we have, say 2 inputs and 1 neuron in hidden layer then only one weight is going to be updated, which is clearly wrong. Correct me if I am wrong. The code works fine just taking row as inputs.

Reply
- Jason Brownlee February 20, 2019 at 8:11 am #
  
  I don’t follow the possible issue, can you please elaborate?
  
  Reply
  - Romel Rudon October 20, 2019 at 11:25 pm #
    
    The issue is that the ‘row’ list should represent the outputs from the preceding layer (counting in the direction from input layer to output layer). Having row[:-1] seems to exclude the very last output from the preceding layer, which doesn’t seem to be warranted in this case.
    
    Reply
    - Romel Rudon October 21, 2019 at 2:05 am #
      
      I see now why a few people (including myself) were thrown off by this line. The last element of the row list ( i.e. row[-1]) is not an actual part of the input data, but the label or the ‘correct answer’ of the input data, which is why it’s left out.
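
A short sketch of the point above. The helper name layer_inputs is hypothetical and the network values are made up. The slice row[:-1] only applies to the first layer, where row is a raw dataset row ending in its class label; later layers take all of the previous layer's outputs.

```python
def layer_inputs(network, row, i):
    if i == 0:
        # Raw dataset row: drop the trailing class label
        return row[:-1]
    # Deeper layers: every output of the preceding layer is an input
    return [neuron['output'] for neuron in network[i - 1]]

network = [[{'output': 0.7}, {'output': 0.2}], [{'output': 0.6}]]
row = [2.7810836, 2.550537003, 0]
print(layer_inputs(network, row, 0))
print(layer_inputs(network, row, 1))
```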
      
      Cheers.
      
      Reply
      - Jason Brownlee October 21, 2019 at 6:19 am #
        
        Yes.
Venkat February 22, 2019 at 4:53 am #

Hi Jason Brownlee, the backpropagation implementation is really excellent because it uses no predefined libraries, just functions, lists, sets, and dictionaries. I need a suggestion on how to write code that uses one activation function, such as a sigmoid, for the hidden-layer neurons and another, such as tanh, for the output neurons. Could you help me?

Reply
- Jason Brownlee February 22, 2019 at 6:25 am #
  
  Yes, you can use the above as a starting point.
  
  More on tanh here:
  https://en.wikipedia.org/wiki/Hyperbolic_function
  
  Reply

Venkat February 22, 2019 at 4:49 pm #

Hi Jason Brownlee, I am not asking how to write the code for tanh or sigmoid. My request is how to modify the forward_propagate function so that, say, x is the activation function at the hidden layer and y is a different activation function at the output layer.

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

In the above code you are calling the transfer function for both the hidden neurons and the output neurons. Please suggest how to call different activation functions for the hidden and output neurons.

Jason Brownlee February 23, 2019 at 6:26 am #

Change the code in the activation function itself.

Does that help?
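
One possible way to do this without editing the transfer function itself, sketched under assumptions: forward_propagate takes an extra, hypothetical transfers argument holding one activation function per layer. This is not how the tutorial code is written; it is only an illustration of the idea.

```python
from math import exp, tanh

def activate(weights, inputs):
    # Weighted sum of inputs; the last weight is the bias
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def sigmoid(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row, transfers):
    # transfers[i] is the activation function used by layer i
    inputs = row
    for layer, transfer in zip(network, transfers):
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

network = [[{'weights': [0.1, 0.2, 0.3]}], [{'weights': [0.5, -0.5]}]]
print(forward_propagate(network, [1, 0, None], [sigmoid, tanh]))
```

If the activation changes, the derivative used during backpropagation has to change with it for that layer; for tanh the derivative in terms of the output is 1 - output**2.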

Reply

Danh Nguyen February 24, 2019 at 12:09 pm #

Example is great! The totally clean CSV wheat seed dataset is here: https://raw.githubusercontent.com/NguyenDa18/MachineLearning_HW3/master/wheat-seeds.csv

I tried Jason’s link
https://raw.githubusercontent.com/jbrownlee/Datasets/master/wheat-seeds.csv
and the UCI Repo link and the CSVs still had double commas and so we got the str_column_to_float error

Anyway, posting this here so others won’t run into the same problem I did! Thanks
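
For anyone stuck with a copy of the file that still has double commas, here is a sketch of a more forgiving loader. The name load_rows and the sample data are made up. It assumes empty fields are stray separators and not genuinely missing values, which should be checked against the data before relying on it.

```python
import csv
from io import StringIO

def load_rows(handle):
    rows = []
    for row in csv.reader(handle):
        # Drop empty fields produced by doubled commas
        cleaned = [field.strip() for field in row if field.strip() != '']
        if cleaned:  # skip blank lines
            rows.append([float(field) for field in cleaned])
    return rows

sample = StringIO("15.26,14.84,,0.871,1\n\n14.88,14.57,0.8811,,2\n")
print(load_rows(sample))
```

With a real file, the same function can be called on an open file handle, for example one opened with open(filename, newline='').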

Reply
- Jason Brownlee February 25, 2019 at 6:37 am #
  
  Thanks for sharing.
  
  Reply
vartika sharma February 27, 2019 at 3:49 am #

Hey Jason,
While I am running the following code, I am getting this error:
>>> scores=evaluate_algorithm(dataset,back_propagation,n_folds,l_rate,n_epoch,n_hidden)
Traceback (most recent call last):
File “”, line 1, in
File “”, line 13, in evaluate_algorithm
File “”, line 5, in back_propagation
File “”, line 6, in train_network
TypeError: list indices must be integers, not str

Reply
- Jason Brownlee February 27, 2019 at 7:34 am #
  
  Perhaps try saving code to a file and run from the command line, here’s how:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
  
  Reply
Andy March 5, 2019 at 12:51 pm #

Hello Jason,

I would like to ask, can you make the data split between training data and test data, instead of using k folds variation, I would like to get some insight in this, thanks

Reply
- Jason Brownlee March 5, 2019 at 2:22 pm #
  
  Yes, I show how here:
  https://machinelearningmastery.com/implement-resampling-methods-scratch-python/
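
As a rough idea of what a single train/test split can look like in plain Python (a sketch only; the name train_test_split and the toy data are assumptions here and not necessarily the code from the linked post):

```python
from random import seed, shuffle

def train_test_split(dataset, split=0.7):
    # Shuffle a copy so the caller's dataset is left untouched
    rows = list(dataset)
    shuffle(rows)
    cut = int(len(rows) * split)
    return rows[:cut], rows[cut:]

seed(1)
data = [[i, i % 2] for i in range(10)]
train, test = train_test_split(data, 0.7)
print(len(train), len(test))
```

The network would then be trained on train only, and each row of test would be passed to the prediction function to measure accuracy.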
  
  Reply
Andy March 5, 2019 at 6:33 pm #

Hello Jason, it’s me again.

I would like to ask another question, how do you predict using this trained network ?
Let’s say I have 100 rows of data and I split them into training and test sets at a 70:30 ratio. I’ve trained the network using 70 rows; how do I predict the remaining 30?

Reply
- Jason Brownlee March 6, 2019 at 7:46 am #
  
  You can fit one final model on all data, then use it to make predictions (see the predict section).
  
  Remember, this is an example for learning only. If you want a model for your data in practice, I recommend using a robust neural network library like Keras:
  https://machinelearningmastery.com/start-here/#deeplearning
  
  Reply
Dini M March 11, 2019 at 12:21 am #

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

I got the error “class_values = [row[column] for row in dataset]”
IndexError: list index out of range

Reply
- Jason Brownlee March 11, 2019 at 6:52 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Dini M March 11, 2019 at 12:28 am #

I am trying your code example with seeds_dataset.csv.

Reply
giuseppe March 15, 2019 at 7:08 am #

Hi, thanks for the code, it is amazing.
I’ve included it in a class for a project and modified it so I can decide how many neurons to put in each layer, but I have a question. During training, using for example 4 neurons in the first layer and 3 in the second, I get nice results, around 85% to 92%. At this point I save all the weights of the neurons and call another function that just loads the saved weights (skipping the training process). Using the whole dataset (the same one I used to train the network) as the test set then gives me a really bad score, most of the time around 30%. I’m using the “IrisDataTrain” set, and what I’ve noticed is that the network fails to recognise one of the 3 classes. Do you have any suggestion about what the cause could be? Thanks 🙂

Reply
- Jason Brownlee March 15, 2019 at 2:25 pm #
  
  Perhaps the weights are not being loaded correctly?
  
  Reply
  - giuseppe March 28, 2019 at 8:11 pm #
    
    Hi, sorry for the late reply. Actually, the problem of saving and loading the NN is not that important, so maybe I’ll try to solve it later. At the moment the problem is that I should reach 99. % on the “IrisDataTrain” set. What I’ve noticed is that the accuracy can change a lot when repeating the same training process with the same configuration. To get a better result I repeated the same training process several times with each configuration and chose the configuration that gave me the best result in mean and variance. Now, to improve the accuracy, I’ve modified the code so that I can easily connect the output of one NN to the input of another, creating a cascade of neural networks connected in different ways. At this point I’m stuck at 96% on average. To improve the accuracy I’ve implemented the ReLU activation function (but I’m not sure it’s correctly implemented) and the Adam optimizer (but it doesn’t work at all).
    I’ll link the code on Pastebin (I don’t know if there is a better way to do that). In particular, what I’ve done is put everything in a class and modify:
    1. the initialization function, so that I can choose the number of neurons in each layer
    2. the back_propagation_error function, trying to add ReLU and the Adam optimizer
    3. the update weights function, trying to implement the Adam optimizer (it doesn’t work at all)
    In the code I’m going to share I’ve removed many parts for readability; after cleaning it up I will send it to you if you want. Sorry for the long message and thanks for your help 🙂
    
    https://pastebin.com/RxxuVaCD
    
    Reply
    - Jason Brownlee March 29, 2019 at 8:31 am #
      
      Nice work!
      
      It might be time to graduate to Keras where everything is implemented for you and you can just use it directly and focus on tuning the model.
      
      Reply
MLnovice March 21, 2019 at 10:53 pm #

Hello sir,
I am playing with your code and I am trying to figure out this error:

line 185, in
str_column_to_float(dataset, i)
line 21, in str_column_to_float
row[column] = float(row[column].strip())
ValueError: could not convert string to float:

Do you have any insight into why this is happening?

Reply
- Jason Brownlee March 22, 2019 at 8:28 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Matty March 25, 2019 at 9:55 am #

Thank you for the post Jason.

Reading this post, it seems to me that I can split the process of back propagation in large networks into multiple steps. Am I right?

I have a large network that my current GPU runs out of memory when I try to train it. I was wondering if I can split my network into two sub-networks, and first calculate the updates for the deeper part(that has the ground truth outputs) and obtain the error that should be passed to the other sub-network. Then, use the provided error to calculate the updates for the second sub-network as well. Do you think it’s possible? Do you have any suggestion (or source that can be helpful) for implementing this back propagation with existing tensorflow or pytorch builtin functions?

Thanks.

Reply
- Jason Brownlee March 25, 2019 at 2:17 pm #
  
  Yes, by node or by layer.
  
  It might be possible, but also a massive pain.
  
  It might be cheaper (in time/money) to rent an AWS EC2 instance with more GPU RAM for a few hours?
  
  Reply
  - Matty March 26, 2019 at 1:03 pm #
    
    Thanks, Jason. I think I found an easy way to split the back-propagation in tensorflow.
    
    We can define two separate optimizations with different trainable variable lists. Something similar to:
    
    self.optim_last_layers = tf.train.AdamOptimizer(lr, beta1=beta1) \
        .minimize(loss, var_list=vars_of_last_layers)
    
    self.optim_first_layers = tf.train.AdamOptimizer(lr, beta1=beta1) \
        .minimize(loss, var_list=vars_of_first_layers)
    
    And in each iteration, we can call the optimizations separately.
    
    I did a small sanity check with a two-layer network, and it seems the two-step optimization and the one-step optimization with all the trainable parameters result in the same updates.
    
    Reply
    - Jason Brownlee March 26, 2019 at 2:20 pm #
      
      Glad to hear it.
      
      Reply
Novia Puspitasari March 28, 2019 at 12:17 am #

Thank you so much, Jason, for your post.
I have a problem: “‘float’ object has no attribute ‘append’”

Traceback (most recent call last):
File “backprop.py”, line 200, in
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
File “backprop.py”, line 80, in evaluate_algorithm
predicted = algorithm(train_set, test_set, *args)
File “backprop.py”, line 172, in back_propagation
train_network(network, train, l_rate, n_epoch, n_outputs)
File “backprop.py”, line 150, in train_network
backward_propagate_error(network, expected)
File “backprop.py”, line 123, in backward_propagate_error
error.append(error)
AttributeError: ‘float’ object has no attribute ‘append’

Do you have a solution for that? Thank you.

Reply
Kevin March 29, 2019 at 1:21 am #

Hi there Jason, I would like to ask if I wanted to generate random weights and bias with range -1 to 1, how to do it ? Since I already import random or from uniform import random, and it said AttributeError: ‘builtin_function_or_method’. Thank you very much !

Reply
- Jason Brownlee March 29, 2019 at 8:38 am #
  
  Good question, this post shows how:
  https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/
  
  You then scale them to any range you want:
  
  result = min + value * range
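
The scaling formula above, sketched for the -1 to 1 case. The helper name random_in_range is made up for illustration. The standard library's random.uniform(a, b) does much the same thing in one call.

```python
from random import seed, random

def random_in_range(low, high):
    # random() returns a float in [0, 1); stretch and shift it into [low, high)
    return low + random() * (high - low)

seed(1)
weights = [random_in_range(-1.0, 1.0) for _ in range(3)]
print(weights)
```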
  
  Reply
  - Kevin March 29, 2019 at 4:20 pm #
    
    Thanks for your reply! Awesome guide, I’m really grateful for it. Once again, thanks a lot, Jason.
    
    Reply
    - Jason Brownlee March 30, 2019 at 6:22 am #
      
      You’re welcome.
      
      Reply
wancong zhang March 30, 2019 at 12:45 pm #

Hi Jason, very cool tutorial.

I notice that your neural network only has 3 layers.

If I change your “initialize network” method to initialize multiple hidden layers with arbitrary width, will your program still work? In other words does your algorithm generalize to deeper networks?

Thanks.

Reply
- Jason Brownlee March 31, 2019 at 9:26 am #
  
  No idea – it is for educational purposes only, try it and see.
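
For anyone who wants to try it, here is a sketch of an initializer that accepts a list of hidden-layer widths. The hidden_sizes parameter is an assumption and is not part of the tutorial's function. Whether the rest of the code trains well on deeper networks is something to test and is not claimed here.

```python
from random import seed, random

def initialize_network(n_inputs, hidden_sizes, n_outputs):
    network = []
    n_prev = n_inputs
    # One layer per hidden width, followed by the output layer
    for n_neurons in list(hidden_sizes) + [n_outputs]:
        # Each neuron has one weight per input plus a bias weight
        layer = [{'weights': [random() for _ in range(n_prev + 1)]}
                 for _ in range(n_neurons)]
        network.append(layer)
        n_prev = n_neurons
    return network

seed(1)
network = initialize_network(7, [5, 5], 3)
print([len(layer) for layer in network])
```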
  
  Reply
manoj April 19, 2019 at 7:38 am #

Hi Jason!

It’s really a helpful post, thank you very much.

I wanted to see plots of the training error and testing error (how they converge epoch by epoch). What would be the easiest way to plot those training and testing graphs?

regards
Manoj Goli

Reply
- Jason Brownlee April 19, 2019 at 3:03 pm #
  
  I’d recommend using a library like Keras which provides the history directly and ready to plot:
  https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
  
  Reply
Danial April 20, 2019 at 5:46 pm #

Hi jason.
My question is: how can I see whether my CNN code is using the BP framework?

Reply
- Jason Brownlee April 21, 2019 at 8:20 am #
  
  You can save the model weights to a file.
  
  Is that what you mean?
  
  Reply
  - Danial April 21, 2019 at 1:27 pm #
    
    Yes. How can I see the model weights? How does the CNN use the BP framework if it is not shown in the code?
    for i in range(len(test)):
    # Forecast the data
    test_X, test_y = test[i, 0:-1], test[i, -1]
    X_ = test_X.reshape(1, 28, 28, 1)
    predict = model.predict(X_, batch_size=1)
    predict = predict[0, 0]
    
    # Replacing value in test scaled with the predicted value.
    test_prediction = [predict] + test_prediction
    if len(test_prediction) > sequence_length+1:
    test_prediction = test_prediction[:-1]
    if i+1 sequence_length+1:
    test[i+1] = test_prediction
    else:
    test[i+1] = np.concatenate((test_prediction, test[i+1, i+1:]), axis=0)
    
    # Inverse transform
    predict = inverse_transform(scaler, test_X, predict)
    # Inverse the features
    predict = inverse_features(data_set, predict, len(test)+1-i) - maxVal
    if predict < 0:
    predict = 0
    # Round the value
    predict = np.round(predict, 2)
    # store forecast
    expected = data_set[len(train) + i + 1]
    predict_data.append(predict )
    real_data.append(expected )
    if expected != 0:
    prediction.append(predict)
    real.append(expected)
    
    Reply
    - Jason Brownlee April 22, 2019 at 6:15 am #
      
      You can get the model weights from a Keras model by calling the get_weights() function on a given layer.
      
      Reply
      - Danial April 22, 2019 at 6:38 am #
        
        Is it right that the above code is using the BP framework? It’s part of the CNN code that I sent.
      - Danial April 22, 2019 at 11:23 am #
        
        How does Keras use the BP framework? If you have a link, kindly share it.
      - Jason Brownlee April 22, 2019 at 2:26 pm #
        
        You can get started with Keras here:
        https://machinelearningmastery.com/start-here/#deeplearning
Idris Shareef April 24, 2019 at 4:16 pm #

Hello Jason , You’re the best teacher.

Reply
- Jason Brownlee April 25, 2019 at 8:08 am #
  
  Thanks.
  
  Reply
Abarni April 30, 2019 at 12:39 pm #

Nice Post !

Here is another very nice tutorial with step by step Mathematical explanation and full coding.

http://www.adeveloperdiary.com/data-science/machine-learning/understand-and-implement-the-backpropagation-algorithm-from-scratch-in-python/

Reply
- Jason Brownlee April 30, 2019 at 2:27 pm #
  
  Thanks for sharing.
  
  Reply
Zahra May 6, 2019 at 4:28 pm #

Hello, I’m so confused.
I tried to run this code in the command prompt, but with my own dataset (not the Wheat Seeds dataset).

Why did this happen? What’s wrong? What should I do? What should I change?
Please, help me!

Traceback (most recent call last):
File “journal.py”, line 197, in
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
File “journal.py”, line 81, in evaluate_algorithm
predicted = algorithm(train_set, test_set, *args)
File “journal.py”, line 173, in back_propagation
train_network(network, train, l_rate, n_epoch, n_outputs)
File “journal.py”, line 150, in train_network
expected[row[-1]] = 1
IndexError: list assignment index out of range
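
One common cause of this IndexError, offered as a guess and not as a diagnosis of this particular dataset: the line expected[row[-1]] = 1 treats the last column as an integer class index running from 0 to n_outputs - 1. Labels such as 1, 2, 3, or labels stored as floats, break that assumption. A sketch of remapping the label column first (the name encode_labels and the sample rows are made up):

```python
def encode_labels(dataset):
    # Map each distinct value in the last column to an integer 0..n_classes-1
    values = sorted(set(row[-1] for row in dataset))
    lookup = {value: index for index, value in enumerate(values)}
    for row in dataset:
        row[-1] = lookup[row[-1]]
    return lookup

dataset = [[5.1, 3.5, 1.0], [6.2, 2.9, 2.0], [5.9, 3.0, 3.0]]
lookup = encode_labels(dataset)
n_outputs = len(lookup)
expected = [0 for _ in range(n_outputs)]
expected[dataset[2][-1]] = 1  # now a valid integer index
print(lookup, expected)
```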

Reply
- Jason Brownlee May 7, 2019 at 6:12 am #
  
  I cannot know, I can give you some advice on debugging your problem here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Zahra May 9, 2019 at 1:11 am #

Hello, how do I import my dataset using this code?

For example, how do I use my own dataset (an Excel file) with the code below? Can you explain in more detail, please?

# Test training backprop algorithm
seed(1)
dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)

Reply
- Jason Brownlee May 9, 2019 at 6:46 am #
  
  Perhaps you should start with Keras, it is much easier for beginners:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
  - Aziz Ahmad July 22, 2019 at 4:42 am #
    
    Sir! I really appreciate your work.
    
    Reply
    - Jason Brownlee July 22, 2019 at 8:28 am #
      
      Thanks.
      
      Reply
Nirmala May 10, 2019 at 5:37 am #

I got this error:

IndexError: list assignment index out of range.

But I am using Python 3.

Reply
- Jason Brownlee May 10, 2019 at 8:20 am #
  
  Sorry to hear that, I have some suggestions here:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Ido Berenbaum May 11, 2019 at 9:39 pm #

Hi Jason,
thanks for the great tutorial, I learned a lot from it.
There is one thing I didn’t really understand, though:
when you update the weights, you add the calculated change to the weight.
But from what I read on other sites like Wikipedia, the change to the weight needs to be multiplied by -1 and then added, to ensure it
changes the weight in the opposite direction of the gradient and so gets it closer to the local minimum.
As muhammad said on December 21, 2018:
“hi, thanks for this code.
I’m trying to understand why are u adding on the update weights, shouldnt be
wi←wi−η∂E/∂wi like this?”

I tried changing line 141 to: neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
basically using -= instead of +=, but it just made the sum error of the network increase after each epoch.

So I would be grateful if you could explain why you are adding and not subtracting.

thanks

Reply
- Jason Brownlee May 12, 2019 at 6:43 am #
  
  There are many ways to implement the algorithm description.
  
  This implementation is based on the description in “neural smithing”:
  https://amzn.to/2pW6hjI
  
  Reply
- cocoa July 16, 2019 at 8:37 am #
  
  Jason seems to use mean squared error as the loss. The partial derivative of the loss should be (output-expected). In his “backward” function, he used (expected-output). That’s why he ended up with “+=” and not “-=”.
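
A tiny numeric check of this point, with made-up values for a single weight. The two sign conventions produce the same updated weight as long as the sign of the error and the sign of the update are flipped together.

```python
l_rate = 0.5
weight = 0.2
output, expected, inp = 0.8, 1.0, 1.5
transfer_derivative = output * (1.0 - output)

# Convention 1: error is (expected - output), and the step is added
delta_plus = (expected - output) * transfer_derivative
w_plus = weight + l_rate * delta_plus * inp

# Convention 2: error is (output - expected), and the step is subtracted
delta_minus = (output - expected) * transfer_derivative
w_minus = weight - l_rate * delta_minus * inp

print(w_plus, w_minus)
```

Changing only += to -= while keeping (expected - output) moves the weight up the gradient, which would explain the error growing after each epoch.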
  
  Reply
  - Jason Brownlee July 16, 2019 at 2:19 pm #
    
    Nice explanation.
    
    Reply
Nirmala May 16, 2019 at 4:23 pm #

In the training code and testing code I want to load another dataset from a .txt file, but it does not work. Please can you send code for that?

Reply
- Jason Brownlee May 17, 2019 at 5:49 am #
  
  Sorry, I don’t have the capacity to develop custom code for you.
  
  Reply
Arthur May 22, 2019 at 4:03 pm #

Hi Jason, first of all thank you very much for this post, I’m learning ML at the moment, and writing a neural network with backpropagation in C# to help the process.

When using the wheat seeds dataset, and the same network layout as you suggest, I get very similar results to yours in terms of accuracy.

When I scale the network, however, to have 2 (or more) hidden layers of 5 neurons, I sometimes get exploding gradients (and NaN output results because of it). Is this something you’d expect given the dataset? I.e., does this dataset not allow for much more than 1 hidden layer for some reason? (Some context: I normalize the data just like you do, and I use the same dataset as you do. The reason I test with more than 1 layer is just curiosity, and to see whether the accuracy improves; it doesn’t.)

I’m trying to understand why it happens with this particular data, or whether my implementation fails somehow. Note that I do get good results most of the time, but with a certain weight initialization the exploding gradients can happen.

Reply
- Jason Brownlee May 23, 2019 at 5:54 am #
  
  Perhaps try scaling the data prior to modeling, see this post:
  https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
  
  Reply
Zahra Nabila May 27, 2019 at 4:00 pm #

Hello, I have a problem. Why must the output be an integer and not a float (decimal), especially in training? How do I change the output data type to float?

TypeError Traceback (most recent call last)

in train_network(network, train, l_rate, n_epoch, n_outputs)
93 outputs = forward_propagate(network, row)
94 expected = [0 for i in range(n_outputs)]
---> 95 expected[row[-1]] = 1
96 sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
97 backward_propagate_error(network, expected)

TypeError: list indices must be integers or slices, not numpy.float64

Reply
- Jason Brownlee May 28, 2019 at 8:08 am #
  
  The example is classification, you can change it to regression if you wish.
  
  Perhaps this tutorial will be more helpful:
  https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
  
  Reply
  - Zahra May 28, 2019 at 1:09 pm #
    
    I want to ask, how do I display the execution time (running time) in the training code?
    
    Reply
    - Jason Brownlee May 28, 2019 at 2:44 pm #
      
      Sorry, I don’t have an example of calculating clock time for code examples.
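
For readers with the same question, a minimal timing sketch using the standard library's time.perf_counter(). The wrapper name timed is made up, and sum stands in for whatever function is being measured, such as the training call.

```python
from time import perf_counter

def timed(function, *args):
    # Run function(*args) and return its result plus elapsed wall-clock seconds
    start = perf_counter()
    result = function(*args)
    return result, perf_counter() - start

result, seconds = timed(sum, range(1000000))
print('result=%d, elapsed=%.3f seconds' % (result, seconds))
```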
      
      Reply
Jeny June 3, 2019 at 10:06 am #

'''
Calculate the output of a recurrent neural network with tanh activation
and a linear layer on top
Params:
    x: input matrix [n_timesteps * n_samples * 2]
    w: non-recurrent weights
    r: recurrent weights
    b: biases
    wo: output-layer weights
    bo: output-layer biases
Returns:
    h: matrix of activations (n_timesteps, n_samples, n_hiddens)
    o: final predictions
'''

def forward_path(x, w, r, b, wo, bo):
    h = np.empty([t_max, n, w.shape[0]], dtype=np.float32) # storage for the hidden activations
    for t in range(t_max):
        z = np.dot(x[t], w.T) + b
        if t > 0:
            z += np.dot(h[t-1], r.T)
        h[t] = np.tanh(z)
    o = np.dot(h[-1], wo.T) + bo
    return h, o

def backward_path(x, h, w, b, r, wo, bo, o, y):
    n, t_max, _ = x.shape
    dw = np.zeros_like(w)
    db = np.zeros_like(b)
    dr = np.zeros_like(r)
    dwo = 0
    dbo = 0

    return dw, dr, db, dwo, dbo

def loss(w, r, b, wo, bo, x, y):
    _, o = forward_path(x, w, r, b, wo, bo)
    err = 0.5*np.sum(np.square(o-y))
    return err

Can you help me implement the backpropagation? Please.

Reply
- Jason Brownlee June 3, 2019 at 2:34 pm #
  
  Sorry, I don’t have the capacity to debug your code, perhaps try posting to stackoverflow?
  
  Reply
Zahra Nabila Izdihar June 10, 2019 at 11:53 pm #

Hello…
How do I display the “predicted” value in your code?

I need to display the predicted or forecast value.

Thank you

Reply
- Jason Brownlee June 11, 2019 at 7:54 am #
  
  The forward_propagate() function makes a prediction.
  
  Reply
  - Zahra Nabila Izdihar June 13, 2019 at 2:36 am #
    
    I got it. So, “output” (in ForwardPropagation code)= prediction result?
    
    But, I don’t understand How to determine the weights in forward propagation? What is the formula?
    
    Thank you
    
    Reply
    - Jason Brownlee June 13, 2019 at 6:21 am #
      
      What do you mean exactly?
      
      The weights are learned during training.
      
      Reply
      - Zahra June 16, 2019 at 4:51 am #
        
        Did you mean that “output” (in forward propagation) is the predicted result?
        
        # test forward propagation
        network = [[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
        [{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381, 0.651592972722763]}]]
        row = [1, 0, None]
        output = forward_propagate(network, row)
        print(output)
      - Jason Brownlee June 16, 2019 at 7:16 am #
        
        Yes.
        
        Perhaps this is too advanced. I recommend starting with Keras instead:
        https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
Leo July 2, 2019 at 11:25 pm #

It is crazy that nobody complains about the readability of your code. Thanks anyway.

Reply
- Jason Brownlee July 3, 2019 at 8:34 am #
  
  Sorry that you think that the code is not readable. I thought it was very readable.
  
  What do you believe the problem is exactly?
  
  Reply
Femi July 14, 2019 at 12:17 am #

Good day Dr. I am new to machine learning but have an interest in it. My question is this: can I use Python 2.7 in Miniconda to run your examples?

Reply
- Jason Brownlee July 14, 2019 at 8:15 am #
  
  Yes.
  
  Reply
Ravi July 19, 2019 at 9:52 pm #

Hi Dr. Jason

I have developed and trained a neural network (3 layers: 1 input, 1 hidden and 1 output) for following situation

(The code was written step by step, as i do not want to use a tool without understanding the computations)

Data set (40 input patterns):

Input: 40 samples 5 elements
Output: 40 samples 1 element
number of neurons (Input = 5; hidden = 5; output = 1)

Using the delta rule with backpropagation algorithm, i was able to achieve error = 9.39E-06 for 1000 iterations

My final “input to hidden layer” weight matrix size is 200 x 5 (as i have 40 samples x 5 input neurons and 5 hidden neurons)

“hidden to output layer” weight matrix size is 200 x 1 (as i have 40 samples x 5 hidden neurons and 1 output neuron)

Now my question is for a given test sample having 5 elements (input is 1 sample 5 elements),

i need to run feed-forward computation to get a single element output.

For running this which weights i need to select in “input to hidden layer” and “hidden to output layer” from the trained set??

I have 200 x 5 and 200 x 1 weight matrices; but i require only 5 x 5 and 5 x 1 weight matrices for testing.

Kindly let me know if i am missing something here?

Thanks in advance

Ravi

Reply
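
On the shape question above: in a standard feed-forward network the weight matrices depend only on the layer sizes, not on the number of training samples, so 5 inputs, 5 hidden neurons and 1 output give a 5 x 5 and a 5 x 1 matrix regardless of whether 40 samples or 1 sample is passed through. The sketch below is only an illustration under assumptions: sigmoid activations, random untrained weights, and the names `W1`, `W2` and `predict` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 5, 5, 1

# One weight matrix per layer; sizes depend only on the layer widths.
W1 = rng.normal(size=(n_inputs, n_hidden))   # input -> hidden, 5 x 5
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))  # hidden -> output, 5 x 1
b2 = np.zeros(n_outputs)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X):
    """Feed-forward pass for a batch X of shape (n_samples, n_inputs)."""
    hidden = sigmoid(X @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

X_train = rng.normal(size=(40, n_inputs))  # 40 training samples
x_test = rng.normal(size=(1, n_inputs))    # 1 test sample

print(predict(X_train).shape)  # (40, 1)
print(predict(x_test).shape)   # (1, 1)
```

The same `W1` and `W2` serve both calls; a 200 x 5 weight matrix usually indicates that the sample dimension was folded into the weights by mistake.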
Chrissie Li July 25, 2019 at 9:06 pm #

hi, how can i download the code? copy the tip’s script?

Reply
- Jason Brownlee July 26, 2019 at 8:22 am #
  
  I show you how right here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-copy-code-from-a-tutorial
  
  Reply
Femi July 25, 2019 at 9:39 pm #

Prof Sir, pls i have been finding it difficult to implement your samples when i can not prepare environment that accept all the command. Pls help

Reply
- Jason Brownlee July 26, 2019 at 8:23 am #
  
  This tutorial will show you how to set up your environment:
  https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/
  
  Reply
Femi July 31, 2019 at 12:39 am #

Sir, I guessed you use scipy environment. am i right?

Reply
- Jason Brownlee July 31, 2019 at 6:53 am #
  
  For this tutorial, a simple Python environment is enough.
  
  Reply

Majed August 4, 2019 at 8:10 am #

I wrote a neural network that consists of three layers as follows: [4 input neurons – 5 hidden neurons – 3 output neurons]. First, I standardized the data using the z-score. The accuracy of my model exceeded 67. Note: I didn’t use the regularisation terms yet.
Here is my implementation of both feed-forward and backprop:

    while iteration < number_of_iterations:
        z2 = np.dot(x, w1)
        a2 = sigmoid(z2)
        z3 = np.dot(a2, w2)
        a3 = sigmoid(z3)  # this represents the output of the network
        error = loss_function(a3, label_matrix)
        delta_3 = np.multiply(-(label_matrix-a3), sigmoid_prime(z3))
        dJW2 = np.dot(a2.transpose(), delta_3)
        delta_2 = np.dot(delta_3, w2.transpose()) * sigmoid_prime(z2)
        dJW1 = x.transpose() @ delta_2
        w2 = w2 - (learning_rate * dJW2)
        w1 = w1 - (learning_rate * dJW1)
        iteration = iteration + 1  # update the counter :')
    return w1, w2


Thanks …

Majed August 4, 2019 at 8:11 am #

The data set that I worked on is the Iris data set

Reply
Jason Brownlee August 5, 2019 at 6:43 am #

Well done!

Reply

Ekundayo August 6, 2019 at 9:40 pm #

Can I still get help this one time? Please sir, my project is to use 14 extracted features for plant leaf classification. I need to recognize one leaf out of 36 leaves, all with 14 features. Sir, can I use your code to achieve this?
Thanks

Reply
- Jason Brownlee August 7, 2019 at 7:52 am #
  
  I would recommend this tutorial, e.g. transfer learning:
  https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
  
  Reply
Mohammed August 14, 2019 at 12:20 pm #

Hi Dr. Jason

Thank you for this post, it is really very helpful.

I have one question about backpropagation in unsupervised model, e.g. extract features.
Is it possible to apply this code for it, and only replaces loss function of unsupervised model by the loss function of supervised?

Regards

Reply
- Jason Brownlee August 14, 2019 at 2:10 pm #
  
  Backpropagation is for supervised learning, not unsupervised learning.
  
  Reply
  - Mohammed August 15, 2019 at 5:08 pm #
    
    Oh! many thanks,So, can help me what is the way for learning parameters in unsupervised approach.
    if i need to extract the features from data as low dimension nested of data with large dimension.
    
    Reply
    - Jason Brownlee August 16, 2019 at 7:47 am #
      
      There are specialized techniques for unsupervised learning neural nets, perhaps start with the SOM:
      http://cleveralgorithms.com/nature-inspired/neural/som.html
      
      Reply
Mohammed August 15, 2019 at 5:10 pm #

such as Unsupervised feature learning with Sparse Filtering!

Reply
- Jason Brownlee August 16, 2019 at 7:48 am #
  
  Sorry, I don’t have a tutorial on that topic, perhaps in the future.
  
  Reply
Mohammed August 16, 2019 at 11:20 am #

Thank you so much Dr. Jason.

Reply
- Jason Brownlee August 16, 2019 at 2:10 pm #
  
  You’re welcome.
  
  Reply

Cherinet Mores August 20, 2019 at 4:44 pm #

Jason Brownlee
Thank you for your continued help.
Here I have one question.
If I want to solve a regression problem (meaning, I have 3 real-valued outputs from the input parameters), which part of the code should be modified, and how?

Thank you very much

Steven Pauly August 20, 2019 at 9:59 pm #

Hi Cherinet, I’ve changed the n_outputs to 1 and the function train_network, I’ve changed the below. I’ve increased the n_epoch to a lot higher, because else it will give you the average. Be sure to normalize your input & output values, though.

def train_network(network, train, l_rate, n_epoch, n_outputs):
	for epoch in range(n_epoch):
		sum_error = 0
		for row in train:
			outputs = forward_propagate(network, row)
			expected = [row[-1] for i in range(n_outputs)]
			sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
			backward_propagate_error(network, expected)
			update_weights(network, row, l_rate)
		print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))


Jason Brownlee August 21, 2019 at 6:42 am #

Thanks for sharing.

Reply
Cherinet Mores August 21, 2019 at 8:06 am #

Dear Steven Pauly thank you very much for your help.

Reply

Charles September 17, 2019 at 1:01 pm #

Hi,

By normalizing input and output, do you mean modifying the forward_propagate method like this?

# Forward propagate input to a network output
def forward_propagate_regression(network, row):
    inputs = row
    new_inputs = []
    # gets the 1st layer, applies sigmoid activation
    hiddenlayer = network[0]
    for neuron in hiddenlayer:
        activation = activate(neuron['weights'], inputs)
        neuron['output'] = transfer(activation)
        new_inputs.append(neuron['output'])
    inputs = new_inputs
    new_inputs = []  # reset so hidden-layer outputs are not carried into the result
    # gets the last layer, applies linear activation
    outputlayer = network[-1]
    for neuron in outputlayer:
        activation = activate(neuron['weights'], inputs)
        neuron['output'] = activation
        new_inputs.append(neuron['output'])
    inputs = new_inputs
    return inputs


Jason Brownlee August 21, 2019 at 6:36 am #

Change the output to be a linear activation and the loss function to mse.

Reply

Steven Pauly August 20, 2019 at 9:56 pm #

Well Done, Jason! Great stuff!!!

Reply
- Jason Brownlee August 21, 2019 at 6:41 am #
  
  Thanks, I’m glad it helped.
  
  Reply
George Shannon September 25, 2019 at 12:09 am #

Dear Dr. Brownlee:

Do you have an example of doing the same thing but with backprop using momentum?

George

Reply
- Jason Brownlee September 25, 2019 at 5:59 am #
  
  You can easily update the example to add momentum.
  
  Reply
  - shahrul December 8, 2020 at 7:29 pm #
    
    Can you show me how to add momentum in this tutorial? I am quite confused about how to apply the momentum calculation.
    
    Reply
    - Jason Brownlee December 9, 2020 at 6:13 am #
      
      This is a common question that I answer here:
      https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
      
      Reply
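
For readers asking about momentum in this thread, one possible adaptation is sketched below. It assumes the tutorial's data layout (each neuron is a dict with `weights` where the bias is last, plus `output` and `delta`) and that `delta` was computed from (output - expected), so weights move against the gradient. The function name `update_weights_momentum` and the `velocity` key are hypothetical additions, not part of the tutorial.

```python
def update_weights_momentum(network, row, l_rate, momentum=0.9):
    """Gradient-descent weight update with classical momentum.

    Assumes each neuron is a dict with 'weights' (bias last), 'output'
    and 'delta', and that delta was computed from (output - expected).
    """
    for i, layer in enumerate(network):
        # Inputs to the first layer are the row minus its class label;
        # inputs to later layers are the previous layer's outputs.
        if i == 0:
            inputs = row[:-1]
        else:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in layer:
            # Lazily create one velocity term per weight (hypothetical key).
            velocity = neuron.setdefault('velocity', [0.0] * len(neuron['weights']))
            # Gradient for each input weight, then for the bias.
            grads = [neuron['delta'] * x for x in inputs] + [neuron['delta']]
            for j, grad in enumerate(grads):
                velocity[j] = momentum * velocity[j] - l_rate * grad
                neuron['weights'][j] += velocity[j]

# Toy single-neuron network with made-up values, updated twice.
demo_network = [[{'weights': [0.5, -0.5, 0.1], 'output': 0.6, 'delta': 0.2}]]
demo_row = [1.0, 2.0, None]
update_weights_momentum(demo_network, demo_row, l_rate=0.1, momentum=0.9)
update_weights_momentum(demo_network, demo_row, l_rate=0.1, momentum=0.9)
print(demo_network[0][0]['weights'])
```

With `momentum=0.0` the update reduces to plain gradient descent, which is a quick way to check an adaptation like this against the original behaviour.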
Harini October 6, 2019 at 3:45 am #

Dear Sir,

This tutorial is really helpful for a beginner like me. I couldn’t understand where the input and output nodes are mentioned in the code. How to change number of nodes for input and output layer. Kindly help me with it.

Regards

Reply
- Jason Brownlee October 6, 2019 at 8:17 am #
  
  Perhaps start with this even simpler example:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
Víctor October 21, 2019 at 5:06 am #

Hello Jason,

Thanks for this great tutorial. How can the trained model be saved with this example? I mean with pickle or joblib in a .sav file like in other scikit classifiers. Thanks.

Regards

Reply
- Jason Brownlee October 21, 2019 at 6:25 am #
  
  I recommend using sklearn for real projects, this code is for learning purposes only.
  
  That being said, you can save the “network” prepared in the backpropagation function.
  
  Reply
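
Because the network in this tutorial is a plain list of lists of dicts, it can be serialised with the standard-library `pickle` module. The sketch below uses a small stand-in network with made-up weights and a hypothetical file name; in the tutorial's code the trained network would first need to be returned from the function that builds it.

```python
import os
import pickle
import tempfile

# A tiny stand-in for a trained network (values are made up).
network = [
    [{'weights': [0.13, 0.85, 0.76]}],
    [{'weights': [0.26, 0.50]}, {'weights': [0.45, 0.65]}],
]

# Hypothetical file name in the system temp directory.
path = os.path.join(tempfile.gettempdir(), 'network.pkl')

# Save the network to disk.
with open(path, 'wb') as f:
    pickle.dump(network, f)

# Load it back later, for example in another script.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == network)  # True
```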
chamodi October 31, 2019 at 5:53 pm #

Thank you very much sir..your articles are always very clear,greatly supporting in coding.

Reply
- Jason Brownlee November 1, 2019 at 5:26 am #
  
  Thanks!
  
  Reply
Jaya November 9, 2019 at 3:10 pm #

Thanks for the great tutorial. How can we determine the number of neurons to use in each hidden layer? Thanks

Reply
- Jason Brownlee November 10, 2019 at 8:16 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/how-many-layers-and-nodes-do-i-need-in-my-neural-network
  
  Reply
  - Jaya November 13, 2019 at 12:20 am #
    
    Thanks for the answer. I mean, in your code, is there a variable that we can set to determine the number of neurons in each hidden layer?
    One more question: is there a case where the epoch loop stops once the error value is small, or does it always run until the maximum number of epochs?
    Sorry for my bad English; I’m a newbie in neural networks.
    Thank you so much.
    
    Reply
    - Jason Brownlee November 13, 2019 at 5:45 am #
      
      No, in this example we run for a fixed number of epochs.
      
      Reply
Jean November 27, 2019 at 2:08 am #

Hello Jason,
Thanks for great content.
Nevertheless, I think it would much better if you could also write down the mathematical equation behind the code(s). It would be much easier to understand how all “those scary math” are implemented.
Anyway very good job!

Regards,
Jean

Reply
- Jason Brownlee November 27, 2019 at 6:09 am #
  
  Thanks for the suggestion.
  
  Reply
Tobias December 7, 2019 at 7:27 am #

It does not work for xor but it works for the first data you used. Why?

Reply
- Jason Brownlee December 8, 2019 at 6:00 am #
  
  The network was designed for a specific dataset.
  
  Reply
Tobias December 7, 2019 at 7:54 am #

this is what it is outputting

Expected=1, Got=0
Expected=1, Got=0
Expected=0, Got=0
Expected=0, Got=0

Reply
- Jason Brownlee December 8, 2019 at 6:02 am #
  
  Perhaps try tuning the architecture and training of the model to your specific dataset.
  
  Reply
Samara Silva Santos December 8, 2019 at 1:13 am #

Hi, I would like to know what you mean when you say that “Using the Zero Rule algorithm that predicts the most common class value, the baseline accuracy for the problem is 28.095%.”

If I use this algorithm for another use case, is the accuracy just 28%?

Please, look what I have:
I need to modify this approach to use a quasi-Newton method instead of the gradient method. The gradient method that you have used relies on partial derivatives to tell whether the error is growing. I see that you implemented the derivative this way:

def transfer_derivative(output):
    return output * (1.0 - output)

And what I know is that a derivative is calculated this way:

(f(x + h) - f(x)) / h

Are these two ways equivalent?

I already have a quasi-Newton method implemented, but it is really difficult for me to make this modification.

Please, let me know if you could help me. I really appreciate your help.

Reply
- Jason Brownlee December 8, 2019 at 6:14 am #
  
  I mean predicting the majority class. It is a naive classifier sometimes called the zero rule.
  
  Sorry, I don’t have the capacity to help you adapt the example to use a different optimization algorithm.
  
  Reply
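
On the question of whether the two derivative forms agree: `output * (1.0 - output)` is the exact derivative of the sigmoid written in terms of its output, while `(f(x + h) - f(x)) / h` is a numerical approximation of the same quantity for small `h`. The comparison below is a small sketch; `finite_difference` and the step size `h=1e-6` are choices made for this example only.

```python
from math import exp

def transfer(activation):
    """Sigmoid transfer function, as in the tutorial."""
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    """Exact sigmoid derivative, expressed via the sigmoid's output."""
    return output * (1.0 - output)

def finite_difference(f, x, h=1e-6):
    """Forward-difference approximation of f'(x)."""
    return (f(x + h) - f(x)) / h

for x in (-2.0, 0.0, 0.5, 3.0):
    analytic = transfer_derivative(transfer(x))
    numeric = finite_difference(transfer, x)
    print(x, analytic, numeric)
```

The two columns should agree to several decimal places; the analytic form is preferred in training because it is exact and reuses the output that forward propagation already stored.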
Jeff Myzek December 9, 2019 at 9:49 am #

Hey Jason,
I am trying to use your code to run back propagation on MNIST with the following parameters but i am having trouble: 784 input units, a hidden layer of 100, and a Softmax group of 10 units as the output layer, cross-entropy loss objective function. I want to compute the weight update based on the entire training set, using the error backpropagation algorithm. learning rate that’s small enough for all practical purposes, but not so small that the network doesn’t learn. And I want to stop when the weight update becomes zero. Optimally i would want to see the weight vector and loss at each step. would you be able to assist me?

Reply
- Jason Brownlee December 9, 2019 at 1:43 pm #
  
  I would recommend using mini-batches to approximate the error gradient.
  
  Reply
Sabarish December 11, 2019 at 6:20 pm #

Back Propagate Error:

We are using the sigmoid transfer function, the derivative of which can be calculated as follows:

derivative = output * (1.0 - output)

What does it mean? I am not clear. Could you please help me understand?
Sigmoid function = 1/(1+e**-x)
How can its derivative be output * (1.0 - output)?

Reply
- Jason Brownlee December 12, 2019 at 6:15 am #
  
  The gradient or slope at a point on the function.
  https://en.wikipedia.org/wiki/Logistic_function
  
  Reply
  - Job December 10, 2021 at 3:49 am #
    
    Yes, but the derivative of 1/(1+e**-x)
    is equal to (e**-x)/((e**-x)**2)
    and not x*(1-x).
    Is it so that the error rises as you get further from x = 0?
    
    Reply
    - Adrian Tam December 10, 2021 at 4:26 am #
      
      It is y = 1/(1+e**-x)
      and then differentiation is y’ = y*(1-y)
      
      Reply
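
The step between the two forms discussed in this thread can be written out as follows. The symbol σ for the sigmoid is introduced here for convenience; note that the denominator of the derivative is (1 + e^{-x})^2, and that the result is expressed in terms of the output y = σ(x), not the input x.

```latex
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}} \\
\sigma'(x) &= \frac{e^{-x}}{(1 + e^{-x})^{2}}
  = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
  &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr),
  \qquad \text{since } 1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}.
\end{aligned}
```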
Sylvan December 17, 2019 at 7:25 am #

Hello here!

I am very new with Python in Data Science and Artificial Intelligence. Can anyone here help me with this AI assignment below due by December 19 2019, please? I am seriously stuck. Here is the question:

<> End of the question.
———————————————————————

Below is the indicator simple code:

#Indicators.py

# Import Built-Ins
import logging

# Import Third-Party
import pandas as pd
import numpy as np

# Import Homebrew

# Init Logging Facilities
log = logging.getLogger(__name__)

#
from alpha_vantage.timeseries import TimeSeries
import matplotlib.pyplot as plt

# Add get_price() def from get_price_alphavantagepy code
def get_prices():
apikey = “BW4V00IXHSAE829D”

ts = TimeSeries(key=apikey, output_format=’pandas’)
data, meta_data = ts.get_intraday(symbol=’MSFT’,interval=’1min’, outputsize=’full’)
data[‘4. close’]

# End add get_price() def from get_price_alphavantagepy code

#plt.title(‘Intraday Times Series for the MSFT stock (1 min)’)
#plt.show()
return data[‘4. close’] #return price

#if __name__ == “__main__”:
# get_prices()

def rsi(price, n=14): #rsi(prices, n=14):
deltas = np.diff(prices)
seed = deltas[:n+1]
up = seed[seed>=0].sum()/n
down = -seed[seed0:
upval = delta
downval = 0.
else:
upval = 0.
downval = -delta
up = (up*(n-1) + upval)/n
down = (down*(n-1) + downval)/n

rs = up/down
rsi[i] = 100. – 100./(1.+rs)
return rsi
prices = get_prices()
print(“\n”)
print(rsi(prices))
print(“\n”)

——————————–

Very thank you in advance.

*Sylvan

Reply
- Jason Brownlee December 17, 2019 at 7:34 am #
  
  Perhaps try posting your code and question to stackoverflow?
  
  Reply
bismeet December 22, 2019 at 5:12 pm #

row = [1, 0, None]
I can’t understand the use of None here.

Reply
- Jason Brownlee December 23, 2019 at 6:44 am #
  
  The final value in the row is the class label. Here we set None, as in no class label.
  
  Reply
  - bismeet December 24, 2019 at 12:40 pm #
    
    I still don’t understand , how can an input have no class label?
    
    Reply
    - Jason Brownlee December 24, 2019 at 4:58 pm #
      
      In the case where we want to make a prediction.
      
      Reply
bismeet December 22, 2019 at 9:40 pm #

Why are there two formulas for error?

error = (expected - output) * transfer_derivative(output)

error = (weight_k * error_j) * transfer_derivative(output)

Reply
- Jason Brownlee December 23, 2019 at 6:49 am #
  
  They are the same, but one for the output of the model and one for credit assignment for each weight.
  
  Reply
Vaishu December 24, 2019 at 9:23 pm #

Why can’t we use same backpropagation algorithm code for wheatseed_dataset as the one used in previous case?

Reply
- Jason Brownlee December 25, 2019 at 10:36 am #
  
  You can. We do.
  
  Reply
Ansist January 19, 2020 at 1:49 am #

Hi, I am a little new to the implementing neural networks and the underlying mathematics. I wanted to know why the target variable (y-variable) is usually binary in nature ([0 or 1]). Why can’t I have, say for example returns (usually between [-1,1] continuous)?

Secondly, is it always advised to transform your X and Y variables before feeding them into the neural network?

Reply
- Jason Brownlee January 19, 2020 at 7:17 am #
  
  You can, it is common to use 0 and 1 with a sigmoid activation function in the output layer.
  
  It is good practice to scale data:
  https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
  
  Reply
rafael gamboa January 23, 2020 at 10:54 am #

It functions perfectly!!!

Reply
- Jason Brownlee January 23, 2020 at 12:56 pm #
  
  Thanks, I’m happy to hear that.
  
  Reply
Bram January 28, 2020 at 6:00 pm #

hi jason,
I just finished the tutorial. This tutorial is very helpful for me as a beginner in Python and neural networks. I have a question about the k-fold validation.

In the tutorial above I see that every fold needs to initialize a new network. Does the neural network work like that? I thought the network would be initialized only once and then reused in the next fold, not re-initialized. What if I use it in a real case?
I might be wrong, please correct me.

Reply
- Jason Brownlee January 29, 2020 at 6:31 am #
  
  Yes, k-fold cross-validation means fitting and evaluating k different models and averaging their performance.
  
  You can learn more here:
  https://machinelearningmastery.com/k-fold-cross-validation/
  
  Reply
ssrinath February 19, 2020 at 3:22 am #

hello jason brownlee

I am a CSE student and can you help us in predicting weather using neural network backpropagation

Reply
- Jason Brownlee February 19, 2020 at 8:07 am #
  
  Perhaps start here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Salvador February 20, 2020 at 9:43 am #

Hello Jason,
I receive this error in spyder :
IndexError: list assignment index out of range.
Do you know where is the error?

Reply
- Jason Brownlee February 20, 2020 at 11:28 am #
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Melkamu February 22, 2020 at 1:04 pm #

Hey Jason! Thank you so much for your best tutorial

Reply
- Jason Brownlee February 23, 2020 at 7:22 am #
  
  You’re welcome.
  
  Reply
Melkamu February 22, 2020 at 1:15 pm #

Hello Jason, I am new to Python and I seriously follow your tutorial because I want to design my own prediction model using a neural network with the backpropagation algorithm. But when I try to run this code in a Jupyter notebook on Python 3.6, a “list index out of range” error message is displayed. Could you correct me? The code I tried and the error are:

from random import seed
from random import random
from math import exp

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

seed(1)
network = initialize_network(5, 6, 1)
for layer in network:
    print(layer)

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# test forward propagation
network = [[{‘weights’: [0.13436424411240122,
0.8474337369372327,
0.763774618976614,
0.2550690257394217,
0.49543508709194095,
0.4494910647887381],
‘output’: 0.7853169772903308}],
[{‘weights’: [0.651592972722763,
0.7887233511355132,
0.0938595867742349,
0.02834747652200631,
0.8357651039198697,
0.43276706790505337]}],
[{‘weights’: [0.762280082457942,
0.0021060533511106927,
0.4453871940548014,
0.7215400323407826,
0.22876222127045265,
0.9452706955539223]}],
[{‘weights’: [0.9014274576114836,
0.030589983033553536,
0.0254458609934608,
0.5414124727934966,
0.9391491627785106,
0.38120423768821243]}],
[{‘weights’: [0.21659939713061338,
0.4221165755827173,
0.029040787574867943,
0.22169166627303505,
0.43788759365057206,
0.49581224138185065]}],
[{‘weights’: [0.23308445025757263,
0.2308665415409843,
0.2187810373376886,
0.4596034657377336,
0.28978161459048557,
0.021489705265908876]}],
[{‘weights’: [0.8375779756625729,
0.5564543226524334,
0.6422943629324456,
0.1859062658947177,
0.9925434121760651,
0.8599465287952899,
0.12088995980580641]}]]
row = [0, 1, 0, 0, 0]
output = forward_propagate(network, row)
print(output)
and the error is
IndexError Traceback (most recent call last)
in
85 0.12088995980580641]}]]
86 row = [0, 1, 0, 0, 0]
—> 87 output = forward_propagate(network, row)
88 print(output)

in forward_propagate(network, row)
33 new_inputs = []
34 for neuron in layer:
—> 35 activation = activate(neuron[‘weights’], inputs)
36 neuron[‘output’] = transfer(activation)
37 new_inputs.append(neuron[‘output’])

in activate(weights, inputs)
20 activation = weights[-1]
21 for i in range(len(weights)-1):
—> 22 activation += weights[i] * inputs[i]
23 return activation
24

IndexError: list index out of range

Reply
- Jason Brownlee February 23, 2020 at 7:22 am #
  
  Perhaps don’t use a notebook. See this:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
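
A note on the IndexError in the comment above: the traceback points at `inputs[i]` inside `activate()`, and the hand-written test network appears to be the cause rather than the notebook. As pasted, it has seven layers of one neuron each, so after the first layer only one value is passed forward while the next neuron still expects five inputs. The helper below is a hypothetical sanity check (the name `check_network` is made up) that assumes each neuron's `weights` list holds one weight per input plus a trailing bias.

```python
def check_network(network, n_inputs):
    """Return a list of shape problems; an empty list means the layers line up.

    Assumes each neuron is a dict whose 'weights' list holds one weight per
    input plus a trailing bias.
    """
    problems = []
    expected_inputs = n_inputs
    for i, layer in enumerate(network):
        for j, neuron in enumerate(layer):
            got = len(neuron['weights']) - 1
            if got != expected_inputs:
                problems.append(
                    'layer %d, neuron %d: %d input weights, expected %d'
                    % (i, j, got, expected_inputs))
        # The next layer receives one input per neuron in this layer.
        expected_inputs = len(layer)
    return problems

# Shape of the network pasted in the comment: 7 layers of one neuron each.
bad = [[{'weights': [0.0] * 6}] for _ in range(6)] + [[{'weights': [0.0] * 7}]]
# Shape produced by initialize_network(5, 6, 1): 6 hidden neurons, 1 output neuron.
good = [[{'weights': [0.0] * 6} for _ in range(6)], [{'weights': [0.0] * 7}]]

print(check_network(good, 5))      # []
print(len(check_network(bad, 5)))  # 6
```

Grouping the six hidden neurons into a single inner list, followed by a second list holding the output neuron, matches the structure that `initialize_network(5, 6, 1)` prints.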
Pavitra February 29, 2020 at 12:12 am #

Hello Jason,
I’m new to machine learning and I want to build a model using backpropagation. I am using this code for my project and it works perfectly. But I want a prediction when I input a value, so can you please tell me how I can get the prediction?

Reply
- Jason Brownlee February 29, 2020 at 7:15 am #
  
  Yes, you can change the example to just train the network then call the predict function on new data.
  
  If this is challenging for you, perhaps use a library instead, like keras:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
  - Pavitra February 29, 2020 at 11:12 am #
    
    Thank you so much sir.
    
    Reply
    - Jason Brownlee March 1, 2020 at 5:19 am #
      
      You’re welcome.
      
      Reply
  - Pavitra February 29, 2020 at 2:50 pm #
    
    Could you please tell me how to use the predict function.
    
    Reply
    - Jason Brownlee March 1, 2020 at 5:21 am #
      
      You can adapt the example in the tutorial directly. Sorry, I cannot prepare a custom code example for you.
      
      If it is challenging for you (which it sounds like it is), I recommend using a library instead:
      https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
      
      Reply
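
As a rough illustration of the advice in this thread, a prediction is a forward pass followed by picking the output neuron with the largest value. The sketch below is simplified relative to the tutorial (its `forward_propagate` does not store each neuron's output) and reuses the small fixed-weight network quoted earlier in these comments instead of a trained one; with a trained network the call would look the same.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # bias is stored last
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    # Simplified: returns the outputs without caching them on the neurons.
    inputs = row
    for layer in network:
        inputs = [transfer(activate(neuron['weights'], inputs)) for neuron in layer]
    return inputs

def predict(network, row):
    """Return the index of the output neuron with the highest activation."""
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

network = [[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
           [{'weights': [0.2550690257394217, 0.49543508709194095]},
            {'weights': [0.4494910647887381, 0.651592972722763]}]]

new_row = [1, 0, None]  # None stands in for the unknown class label
print(predict(network, new_row))  # 1
```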
Lucas March 3, 2020 at 2:20 am #

Hello Jason,

I’ve noticed that (in chapter 4) you use the labels (0, 1) as a bias constant for the bias weight multiplication in the activation function. Shouldn’t you theoretically set all the labels temporarily to 1? Otherwise samples with label 0 will have no bias.

Reply
- Jason Brownlee March 3, 2020 at 6:01 am #
  
  No.
  
  Why do you think this?
  
  Reply
  - Lucas March 5, 2020 at 1:55 am #
    
    Oh nvm sorry, i missed that you add the bias explicitly “activation = weights[-1]”.
    
    Most of the books I’ve read add a temporary “1” to the inputs, so that the dot product doesn’t exclude the bias. So I falsely assumed you wanted to set the labels temporarily to one in “activation = activate(neuron['weights'], inputs)” because the inputs include the labels (I already thought this would be a weird way to do it).
    
    Btw thanks for the excellent tutorial.
    
    I also tried to implement a multi-layer NN which uses np.arrays for efficient matrix multiplication. But somehow my weights get really small really fast. Is this a general problem with NNs, or is it probably a problem with my activation function?
    I use ReLU for the hidden layers and sigmoid for the output layer.
    
    Reply
    - Jason Brownlee March 5, 2020 at 6:39 am #
      
      NN can be hard to debug, it could be a hyperparameter or it could be a bug in your implementation.
      
      Moving to a standard lib is highly recommended at some point.
      
      Reply
Heritiera fomes March 3, 2020 at 5:50 am #

Hello Jason,

I have to implement a placement problem, where I need to place some students in different classes, and every class has some capacity. In that case, how can I relate an ANN to this?
If I want to add some constraints to the ANN, how can I add them? For example, when predicting which class a test case (student) is assigned to, my problem needs to check the capacity of the class, and all the students must be assigned to a class.
It would be a great help to hear from you.

Reply
- Jason Brownlee March 3, 2020 at 6:05 am #
  
  Sounds like a constraint satisfaction / optimization problem. Look at operations research / dynamic programming.
  
  Reply
  - Heritiera fomes March 4, 2020 at 1:44 am #
    
    Thanks for your reply.
    
    If i want to add some constraints during prediction how should I do this?
    
    Reply
    - Jason Brownlee March 4, 2020 at 5:57 am #
      
      Use the Keras API:
      https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-neural-networks-with-weight-constraints-in-keras/
      
      Reply
A Kranthi Kiran March 4, 2020 at 9:06 pm #

can I know how to build a front end for this model using flask? or
is there any other best way to build a front end rather than flask?

Reply
- Jason Brownlee March 5, 2020 at 6:34 am #
  
  I don’t have an example, sorry.
  
  Reply
- Eunike Kamase Elisabeth August 8, 2020 at 5:28 pm #
  
  hello, have you already know how to build the front end for backpropagation with flask?
  
  Reply
Prabhu Prasad Dev March 6, 2020 at 11:42 pm #

Is there any code for, or guidance on, implementing a Spiking Neural Network (SNN)? I am very interested in SNNs because they are the third generation of neural networks. Can you please help me with details about SNNs?

Reply
- Jason Brownlee March 7, 2020 at 7:17 am #
  
  I don’t have an example, sorry.
  
  Reply
Namitha Dsouza March 8, 2020 at 4:43 pm #

I am new to this field. I am sorry if you do not get my question.

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)

What is the use of these two lines? Is it only for binary classification or any classification with 3 or more classes can use this? Because this works perfectly for binary classification. But for other classifications, it gives an error.

expected = [0 for i in range(n_outputs)]
expected[row[-1]] = 1

Reply
- Jason Brownlee March 9, 2020 at 7:15 am #
  
  We are one hot encoding the target class.
  
  Reply
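
To expand on the one hot encoding mentioned above: the two lines build a vector of zeros with a single 1 at the position given by the class value, so they work for any number of classes as long as `row[-1]` is an integer from 0 to `n_outputs - 1`. A likely cause of errors with three or more classes is labels that are strings or that start at 1. The sketch below maps raw labels to integer indices first; `one_hot`, `lookup` and the sample labels are illustrative only.

```python
def one_hot(class_index, n_outputs):
    """Return a list of zeros with a single 1 at class_index."""
    expected = [0 for _ in range(n_outputs)]
    expected[class_index] = 1
    return expected

# Raw class labels as they might appear in a CSV file (made-up sample).
labels = ['1', '2', '3', '2']

# Map each distinct label to an integer index 0..n_classes-1.
lookup = {value: i for i, value in enumerate(sorted(set(labels)))}

encoded = [one_hot(lookup[value], len(lookup)) for value in labels]
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```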
Carlos Meza March 13, 2020 at 1:47 pm #

Hello!! I’m new to this. If I want to use 9 input variables instead of 7, what do I need to change in the code to make it work? Amazing publication!

Reply
- Jason Brownlee March 13, 2020 at 1:50 pm #
  
  Perhaps start with this much simpler tutorial:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
  - Carlos Meza March 13, 2020 at 2:14 pm #
    
    Hello, thanks for your reply. I looked into it; you use [:,0:8] to define the input variables. However, this code is different, which is why I’m confused. Any other clue?
    
    Reply
    - Jason Brownlee March 14, 2020 at 8:04 am #
      
      Ah, if you are new to list/array slicing, this will help:
      https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
      
      Reply
Alex Ramirez March 16, 2020 at 11:15 am #

Hello! How can I calculate the recall/precision/F1 score for this exercise?

Reply
- Jason Brownlee March 16, 2020 at 1:31 pm #
  
  You could use the sklearn library to calculate the required metrics:
  https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics
  
  I don’t have the capacity to implement this for you. If it is too advanced, I strongly recommend using Keras instead:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
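For readers who want to stay in plain Python, here is a rough sketch of the three metrics computed from the lists of actual and predicted class labels. The function name precision_recall_f1 is made up for this example, and it scores one class at a time (one-vs-rest), which is only one of several ways to report multi-class metrics:

```python
def precision_recall_f1(actual, predicted, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up labels for a three-class problem
actual = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 2, 1, 1]
for label in sorted(set(actual)):
    print(label, precision_recall_f1(actual, predicted, label))
```

The actual and predicted lists here are the same kind of lists that each fold produces during evaluation.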
  
  Reply
Sumanta Das March 30, 2020 at 3:01 am #

How to modify the code to work with GPUs, without using fancy libraries?

Reply
- Jason Brownlee March 30, 2020 at 5:37 am #
  
  Not sure you can. Fancy libraries (keras on tensorflow) let you use the GPU.
  
  Reply
Abhishek March 30, 2020 at 10:13 pm #

Hi Jason,

Trying to execute, but I’m facing this Error. I’m running the Code on Spyder (Python 3.7)

Traceback (most recent call last):

File “”, line 1, in
runfile(‘C:/Users/duppa/Desktop/Wheat Seed Code New.py’, wdir=’C:/Users/duppa/Desktop’)

File “C:\Users\duppa\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 786, in runfile
execfile(filename, namespace)

File “C:\Users\duppa\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 110, in execfile
exec(compile(f.read(), filename, ‘exec’), namespace)

File “C:/Users/duppa/Desktop/Wheat Seed Code New.py”, line 204, in
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)

File “C:/Users/duppa/Desktop/Wheat Seed Code New.py”, line 82, in evaluate_algorithm
train_set = sum(train_set, [])

File “C:\Users\duppa\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py”, line 2076, in sum
initial=initial)

File “C:\Users\duppa\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py”, line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

TypeError: ‘list’ object cannot be interpreted as an integer

Reply
- Abhishek March 31, 2020 at 6:02 am #
  
  Earlier, I tried running the code on Spyder (Python 3.7) and faced the Error of (TypeError: ‘list’ object cannot be interpreted as an integer). But when I executed the same code on Jupyter notebook I haven’t faced any Error. Output was successful
  
  Reply
  - Jason Brownlee March 31, 2020 at 8:19 am #
    
    Happy to hear that.
    
    I recommend not using an IDE or notebook in general:
    https://machinelearningmastery.com/faq/single-faq/why-dont-use-or-recommend-notebooks
    
    Reply
- Jason Brownlee March 31, 2020 at 8:09 am #
  
  I’m sorry to hear that, this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
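One observation on the traceback above: the call sum(train_set, []) ends up inside numpy/core/fromnumeric.py, which suggests that in that Spyder session the name sum referred to NumPy's sum rather than Python's built-in one (this is an assumption; a wildcard import in the console could cause it). A sketch of a way to flatten the folds that does not depend on which sum is in scope, using a hypothetical helper named flatten_folds:

```python
from itertools import chain

def flatten_folds(folds):
    """Concatenate a list of folds (each a list of rows) into one list of rows."""
    return list(chain.from_iterable(folds))

folds = [[[1, 2, 0]], [[3, 4, 1], [5, 6, 0]]]
train_set = flatten_folds(folds)
print(train_set)  # [[1, 2, 0], [3, 4, 1], [5, 6, 0]]
```

Running the script in a fresh interpreter from the command line would be another way to rule out a shadowed built-in.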
  
  Reply
Robin April 8, 2020 at 1:04 am #

Hi Jason,

Just wanted to extend my thanks for your tutorial. Two months ago I wanted to learn Python and was in the middle of learning more about AI and ML and used your tutorial to help me implement my first neural network. A lot of literature and texts use a more mathematical and lower level approach to neural networks using matrices, etc which isn’t intuitive to me but your tutorial just clicked as it was easy to conceptualize and was a more simple approach.

One of the ways I cement my own knowledge is writing about things I’m working on or worked on, and to really commit neural networks to memory, I wrote a tutorial myself for a neural network using learning rate and momentum parameters.

https://github.com/stratzilla/neural-network-tutorial/blob/master/neural-network-tutorial.ipynb

I would love to put your web page in an acknowledgements section if you were okay with that as I don’t think I would have figured out neural networks if it weren’t for your site.

Regards,
Robin

Reply
- Jason Brownlee April 8, 2020 at 7:55 am #
  
  Thanks Robin, well done on your progress!
  
  Yes, please link back.
  
  Reply
tom April 12, 2020 at 2:39 pm #

Hey Jason, amazing tutorial on how to implement a 3 layer neural network with 200 lines of code.

I have one question though. For the back propagation part, why is the error of (layer j, neuron i) = summation of (weight_k * error_j) * transfer_derivative(output)? Can you explain the mathematics a little? I know the error is the derivative of the cost function, but how do you know the connection between the error of the current layer and the error of the next layer?

Reply
- Jason Brownlee April 13, 2020 at 6:11 am #
  
  Thanks.
  
  Sorry, I don’t dive into the theory. I recommend a good textbook like “Deep Learning” (2016) or “Neural Smithing” (1999).
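For readers who want the link between the two layers, a short sketch of the chain-rule step follows. It assumes a differentiable cost E and a sigmoid transfer function. The notation is introduced here for illustration and is not from the tutorial: a_j is the weighted input of hidden neuron j, o_j is its output, and w_{kj} is the weight from hidden neuron j to neuron k in the next layer.

```latex
\begin{align*}
\delta_k &\equiv \frac{\partial E}{\partial a_k}
  && \text{(error signal of neuron $k$ in the next layer)} \\
a_k &= \sum_j w_{kj}\, o_j + b_k, \qquad o_j = \sigma(a_j) \\
\delta_j = \frac{\partial E}{\partial a_j}
  &= \sum_k \frac{\partial E}{\partial a_k}\,
            \frac{\partial a_k}{\partial o_j}\,
            \frac{\partial o_j}{\partial a_j}
   = \Big(\sum_k w_{kj}\, \delta_k\Big)\, \sigma'(a_j),
  \qquad \sigma'(a_j) = o_j\,(1 - o_j)
\end{align*}
```

The sum over k appears because the output of hidden neuron j feeds every neuron in the next layer, so its error is the weighted total of theirs. The final factor is the term the code calls transfer_derivative(output).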
  
  Reply
Quang Huy Chu April 12, 2020 at 11:51 pm #

Hi, first of all, thank you for posting this; it helps me very much in my Master’s research. My research requires 4 outputs, and I am trying to use my own dataset, which has 48 samples with 16 input nodes and 4 outputs.
My question is:
Because my dataset is small, I need to choose k = dataset size (48), so I applied k = 48, l_rate = 0.3, n_hidden = 42. But according to the result, the prediction within each fold is always one repeated class, e.g. [0,0,0,0,0,0] ; [2,2,2,2,2,2] ; [0,0,0,0,0,0] ; [3,3,3,3,3,3], and a different k gives the same prediction result (0,2,0,3). Can you figure out why my NN gives this strange result?

Thank you very much.

Reply
- Quang Huy Chu April 12, 2020 at 11:58 pm #
  
  For example, here is my result:
  \NN_BP_1.py
  Predicted: [0, 0, 0, 0, 0, 0]
  Actual: [0, 1, 0, 3, 0, 2]
  Predicted: [2, 2, 2, 2, 2, 2]
  Actual: [2, 1, 2, 3, 0, 1]
  Predicted: [0, 0, 0, 0, 0, 0]
  Actual: [0, 2, 1, 0, 1, 3]
  Predicted: [3, 3, 3, 3, 3, 3]
  Actual: [1, 1, 3, 2, 0, 3]
  Predicted: [0, 0, 0, 0, 0, 0]
  Actual: [0, 0, 0, 1, 1, 0]
  Predicted: [2, 2, 2, 2, 2, 2]
  Actual: [2, 3, 1, 2, 0, 2]
  Predicted: [2, 2, 2, 2, 2, 2]
  Actual: [3, 2, 2, 1, 3, 3]
  Predicted: [1, 1, 1, 1, 1, 1]
  Actual: [3, 3, 1, 2, 3, 2]
  scores: [50.0, 33.33333333333333, 33.33333333333333, 33.33333333333333, 66.66666666666666, 50.0, 33.33333333333333, 16.666666666666664]
  Mean Accuracy: 39.583%
  
  Can you figure out why I have this problem, or is the problem that my dataset is not good?
  
  Reply
  - Jason Brownlee April 13, 2020 at 6:18 am #
    
    Perhaps try alternate model configurations?
    Perhaps try alternate training configurations?
    Perhaps try scaling the data?
    Perhaps try monitoring loss during training.
    
    See these tutorials for debugging neural nets:
    https://machinelearningmastery.com/start-here/#better
    
    Reply
- Jason Brownlee April 13, 2020 at 6:17 am #
  
  You’re welcome.
  
  Perhaps use the Keras API instead, it will be much easier for you:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
  - Quang Huy Chu April 13, 2020 at 11:17 am #
    
    Hi Jason, thanks for the reply. After using your model and testing it with other datasets I found on the internet, I can see that your model works properly. Maybe my dataset is the problem.
    
    Reply
    - Jason Brownlee April 13, 2020 at 1:50 pm #
      
      Thanks.
      
      Reply
Bia April 14, 2020 at 5:14 am #

Hi Jason!

First, Thank you for this good tutorial, it really helps a lot.
My question is that your code uses just one hidden layer; how can I add more hidden layers to the code above? Kindly guide me as I am a beginner. Thank you :)

Reply
- Jason Brownlee April 14, 2020 at 6:29 am #
  
  You’re welcome.
  
  If you are a beginner, I recommend starting here:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
audrey April 14, 2020 at 5:48 am #

Did you find something about ReLU? I need this too, thanks.

Reply
- Jason Brownlee April 14, 2020 at 6:31 am #
  
  See this tutorial:
  https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
  
  Reply
Alex April 16, 2020 at 4:56 am #

If the transfer derivative is = output * (1.0 – output), when calculating errors in a hidden layer, which has a bias = 1, doesn’t that mean the transfer derivative is always 1 * (1-1) = 0 for the bias node? Therefore the error of the bias node is always 0 because error = (weight_k * error_j) * transfer_derivative(output)?

If that’s true, then the weights from the bias node never update because you multiply by the error.

I assume I’m missing something. How do the weights from a bias node get updated (i.e. how is the error of the bias ever not 0)?

Reply
- Jason Brownlee April 16, 2020 at 6:09 am #
  
  The implementation is based on the description in this book if you’d like to know more:
  https://amzn.to/3en9SPL
  
  Reply
Alex April 16, 2020 at 7:18 am #

I guess I’m not so much interested in why at the moment, but that in your implementation, which as far as I can tell is correct (it matches logically identically with a C/C++ I came up with), there is no way to update the weights coming off the bias.

This line [ neuron[‘delta’] = errors[j] * transfer_derivative(neuron[‘output’]) ] seems as if it will always result in neuron[‘delta’] == 0 for the bias. I noticed it in my C++ implementation and, when I went looking for answers, came across your post; it looks like yours would result in 0 also.

So I’m more interested in if you found that to be the case. If so, it can escape detection because the network will still learn, just not as well or as fast, so with toy data this will not be noticed.

Reply
- Jason Brownlee April 16, 2020 at 1:20 pm #
  
  Interesting, thanks for sharing.
  
  I have not observed this issue, have you confirmed that indeed bias weights in the above implementation are unchanged after initialization?
  
  Reply
  - Alex June 4, 2020 at 3:47 am #
    
    Sorry for the long delay in answering, I got distracted, but remembered to come back to this. I did look at the weights and you are correct, they do update. However, there is still something missing (admittedly, most likely in my understanding). You create a bias node by adding 1 weight vector per layer (weights from bias to next level neurons), but I do not see anywhere where the bias node activation is explicitly set to 1. It seems that the activation of the bias node is treated like all other nodes, and is free to change value (I printed them during training and they are never 1.0). So the weights are also updating because bias does not equal 1.
    
    If the bias activation is forced to stay at 1 (which seems correct for the algorithm), the weights from the bias cannot update because in the transfer derivative: 1 * (1-1) = 0. I did try that as well, and it shows the weights do not update. If I’m correct, this seems like a very subtle flaw, which would be undetectable in any simple learning problem because the network will learn and predict with or without a bias (I tried that too, and it does work either way).
    
    With that said, I’m still not convinced I’m correct. I might still be missing something, but after several hours in the code, I can’t find any way the bias activation holds to 1.0. When it isn’t 1.0, the bias weights will update because the transfer derivative is non-zero, but that violates the role of the bias node. When the bias activation is forced to 1.0 which is the correct value for the bias, the weights do not update because 1 * (1-1) = 0. So I’m still confused, but still open to the possibility I’m just not understanding something about the code.
    
    Reply
    - Jason Brownlee June 4, 2020 at 6:30 am #
      
      To understand the bias consider the forward and backward pass.
      
      For the forward pass, see the activate() function and notice that the bias activation (stored as the last element of the list) is added to the activation first. The same as 1*bias_weight.
      
      For the backward pass, the update_weights() function updates the bias weight first, then the other weights.
      
      Perhaps re-read the text and code of the tutorial. This is all discussed.
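To make the bias handling concrete, here is a single-neuron sketch in the spirit of the tutorial's activate() function. The update_neuron helper is hypothetical and written only for this illustration. It assumes a sigmoid transfer and squared error. The point it tries to show is that there is no separate bias node with its own output and derivative: the bias is just one more weight whose input is a constant 1, and its update uses the delta of the neuron it belongs to.

```python
from math import exp

def activate(weights, inputs):
    """Weighted sum of inputs; the last weight is the bias (its input is implicitly 1)."""
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + exp(-activation))

def update_neuron(weights, inputs, expected, l_rate):
    """One gradient step for a single sigmoid neuron with squared error."""
    output = transfer(activate(weights, inputs))
    # delta uses the derivative of this neuron's own output, not of any "bias node"
    delta = (output - expected) * output * (1.0 - output)
    for j in range(len(inputs)):
        weights[j] -= l_rate * delta * inputs[j]
    # bias weight: same rule with a constant input of 1
    weights[-1] -= l_rate * delta * 1.0
    return weights

weights = [0.1, -0.2, 0.05]   # two input weights plus one bias weight
before = weights[-1]
update_neuron(weights, [1.0, 0.5], expected=1.0, l_rate=0.5)
print(before, weights[-1])    # the bias weight has changed
```

Under this reading, the term 1 * (1 - 1) never arises, because the transfer derivative is taken at the receiving neuron's output rather than at the constant bias input.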
      
      Reply
Maria Campero April 17, 2020 at 2:15 am #

Hi Jason,
First of all thank you very much for this post it is really helpful,
At the moment I’m writing code for a neural network with backpropagation in Python. I have 8 inputs and 7 outputs with one hidden layer (1 neuron). I scaled the dataset and then tried to use your code, but I get an alignment error. Can you please give me some advice on how to fix that issue?

Reply
- Jason Brownlee April 17, 2020 at 6:24 am #
  
  This is an advanced tutorial and it sounds like you are having trouble.
  
  I recommend using the Keras API instead, it’s much simpler:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
sameer sakkhari April 25, 2020 at 1:16 am #

I have a dataset isolet in the form of csv file. I want to implement a Neural Network with backpropagation in python using tensorflow . How do I start ? How do I load my data?

You said to save the dataset in csv format in current working directory. But it is not able to recognize isolet

Reply
- Jason Brownlee April 25, 2020 at 7:00 am #
  
  This will show you how to load your data:
  https://machinelearningmastery.com/load-machine-learning-data-python/
  
  Reply
João Guilherme Cotta April 26, 2020 at 2:37 am #

Hello Jason,

Thanks for this tutorial, it is very helpful.

I would like to modify this code to use MLP with BP to predict the velocity of a car based on different inputs, such as velocity, acceleration, pedal position, etc. I am having some difficulty adapting the ‘expected’ part of the code, since in your example you are using only zeroes and ones, and my study case would have different values of velocity given by the dataset.

Do you have any advice regarding this?

Thanks in advance.

Reply
- Jason Brownlee April 26, 2020 at 6:16 am #
  
  Yes, as a beginner, I strongly recommend using Keras instead:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
Sid April 26, 2020 at 1:30 pm #

Hi, I used the updated dataset provided and copied the code exactly (copy/paste) to test. However, when I run the code I get the error “TypeError: ‘list’ object cannot be interpreted as an integer”. Do you know why this may be happening?

Reply
- Jason Brownlee April 27, 2020 at 5:26 am #
  
  Sorry to hear that you are having trouble, perhaps this will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Q.H.Chu May 1, 2020 at 12:24 pm #

Hi Jason, according to the post, your network uses only 1 hidden layer (maybe it is called a shallow feed-forward NN). Does that hidden layer represent a logistic regression step? And is there a way to add more hidden layers? Will many hidden layers improve the accuracy of the network?

One more question: how can I choose suitable parameters (epochs, learning rate or number of hidden layer neurons) for this network? Does it depend on the number of output and input neurons?

Once again, thank you very much for this helpful post. I look forward to your reply.

Reply
- Jason Brownlee May 1, 2020 at 2:04 pm #
  
  Yes, one hidden layer. No not logistic regression.
  
  Tune the parameters of your model to your data.
  
  Reply
Ahmed Gad May 7, 2020 at 9:25 am #

Hi,

Please help with adapting the code to include Upper and Lower weights for each neuron without biases ==> Rough Neural Network.

Thanks!

Reply
- Jason Brownlee May 7, 2020 at 11:51 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-change-the-code-in-the-tutorial-to-___
  
  Reply
Ahmed Gad May 7, 2020 at 9:49 am #

The RNN (Rough Neural Network) structure replaces the traditional neuron with two neurons (a lower neuron and an upper neuron) to represent the lower and upper approximations of each attribute in the CTG data set. Its structure is formed from 4 layers: an input layer, 2 hidden layers and an output layer. The hidden layers have rough neurons which overlap and exchange information with each other, while the input and output layers consist of traditional neurons, as in figure (1):

This image illustrates the idea more: https://imgur.com/AZ0FTbY.

Reply
- Jason Brownlee May 7, 2020 at 11:51 am #
  
  Thanks for sharing.
  
  Reply
  - Ahmed Gad May 8, 2020 at 10:44 am #
    
    Please, I need help with where and how to customize the code!
    
    Reply
John Pillar May 10, 2020 at 4:43 pm #

Hi Jason – thanks very much for a wonderfully clear and understandable description. I really appreciate your ‘gentle’ approach. I’ve re-coded your Python into C – I find it’s the best way for me to really learn what’s going on.

Please – I have a couple of questions. To apply softmax to the output, it’s easy enough to map the outputs using softmax so that they are ‘probabilities’ that sum to one, but what changes do I need to make to the transfer function derivative in the backpropagation code? I’ve read several descriptions that say that backpropagation of the output layer errors after softmax follows exactly the same as sigmoid – so I’m confused. I think it should be different, but I may be missing something.

Also – cross-entropy loss is commonly described as a natural ‘partner’ to softmax, but actually, in practice, is the ‘error’ still (expected_value) – (predicted value) , just like you have in your code?

Thanks very much if you have time to consider my question – much appreciated.

Reply
- Jason Brownlee May 11, 2020 at 5:56 am #
  
  You’re welcome.
  
  Yes, softmax error would be the same, except calculated for each output node.
  
  Yes, you can see an implementation of cross entropy here:
  https://machinelearningmastery.com/cross-entropy-for-machine-learning/
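A small numerical sketch may help with the softmax question. It assumes a softmax output layer combined with cross-entropy loss and a one-hot expected vector; all function names are chosen for this example only. In that combination, the gradient of the loss with respect to each raw output activation reduces to (output - expected), so the error term looks the same as before but is not multiplied by a sigmoid derivative. The finite-difference check at the end is just a sanity test of that claim:

```python
from math import exp, log

def softmax(activations):
    """Numerically stable softmax over a list of raw output activations."""
    m = max(activations)
    exps = [exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(outputs, expected):
    """Cross-entropy loss for a one-hot expected vector."""
    return -sum(t * log(o) for o, t in zip(outputs, expected))

def output_deltas(outputs, expected):
    """Gradient of the loss w.r.t. the raw activations: output - expected."""
    return [o - t for o, t in zip(outputs, expected)]

activations = [2.0, 1.0, 0.1]
expected = [1, 0, 0]
outputs = softmax(activations)
analytic = output_deltas(outputs, expected)

# Finite-difference check of the first component of the gradient
eps = 1e-6
bumped = [activations[0] + eps] + activations[1:]
numeric = (cross_entropy(softmax(bumped), expected)
           - cross_entropy(outputs, expected)) / eps
print(analytic[0], numeric)  # the two values should agree closely
```

The hidden layers would keep their own transfer derivative; only the output-layer delta changes in this setup.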
  
  Reply
  - John May 18, 2020 at 6:36 am #
    
    Thanks Jason – appreciate your help.
    
    Reply
    - Jason Brownlee May 18, 2020 at 6:37 am #
      
      You’re welcome.
      
      Reply
Sandeep Kumar Dash May 29, 2020 at 9:41 pm #

Hi. Thanks for the great tutorial. How can I save the network diagram in file?

Reply
- Jason Brownlee May 30, 2020 at 5:59 am #
  
  Thanks.
  
  Perhaps use Keras and see this tutorial:
  https://machinelearningmastery.com/visualize-deep-learning-neural-network-model-keras/
  
  Reply
Andirian Ahmad May 30, 2020 at 5:19 pm #

Hello sir, sorry for disturbance, may i ask, can we use other datasets instead of seeds dataset for this BPNN algorithm?

Reply
- Jason Brownlee May 31, 2020 at 6:20 am #
  
  Sure!
  
  Reply
Anon June 9, 2020 at 6:59 pm #

Hi Jason. Thank you so much for this tutorial. I would just like to know if this would work with the iris dataset?

Reply
- Jason Brownlee June 10, 2020 at 6:10 am #
  
  Sure.
  
  Reply
Zach June 11, 2020 at 5:13 am #

Do you perhaps have a Java or C# version of this code? I’m trying to understand it in OOP principles and have done up to the end of prediction, but the last portion just confuses me

Reply
- Jason Brownlee June 11, 2020 at 6:06 am #
  
  Sorry I do not.
  
  Reply
Muhammad Basit Umair June 11, 2020 at 11:04 pm #

Sir kindly guide me about difference between “multilayer feed-forward network” and deep neural network (DNN).
Or can we say that, a multilayer feed-forward network is a deep neural network?

Thanks

Reply
- Jason Brownlee June 12, 2020 at 6:10 am #
  
  MLP can be made deep by adding many layers, so can a CNN, LSTM or any type of network.
  
  Reply
Nasimul June 28, 2020 at 8:05 pm #

I am getting this error, please help

File “F:/khaise/neuralnet.py”, line 80, in activate
activation += weights[i] * inputs[i]

TypeError: can’t multiply sequence by non-int of type ‘float’

Reply
- Jason Brownlee June 29, 2020 at 6:31 am #
  
  I’m sorry to hear that, this may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
navid July 7, 2020 at 2:20 am #

Hello, I have a question. Does this code only support 1 hidden layer?

Reply
- Jason Brownlee July 7, 2020 at 6:43 am #
  
  You can extend it to add more layers.
  
  Reply
DavidHE July 24, 2020 at 1:00 pm #

I’m trying to implement this tutorial in a language other than Python. If during training the value of the variable sum_error gets stuck, or even goes up a little and down again, does that mean there is an error in the implementation?

Reply
- Jason Brownlee July 24, 2020 at 1:38 pm #
  
  Perhaps run the same code with the same initial weights and compare the output of each step?
  
  Reply
Shyam August 6, 2020 at 3:52 pm #

Hi,

How can I get a loss vs epoch graph for this code?

Reply
- Jason Brownlee August 7, 2020 at 6:22 am #
  
  Yes, you will have to implement it yourself though and use a train/test split instead of k-fold cross-validation.
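As a rough sketch of the idea, the summed error can be appended to a list once per epoch and plotted afterwards. To keep the example self-contained it trains a single sigmoid neuron on made-up data rather than the tutorial's full network; the function name train_and_track is hypothetical. The same pattern (append sum_error inside the epoch loop) should carry over to a full training function.

```python
from math import exp

def train_and_track(train, l_rate, n_epoch):
    """Train one sigmoid neuron and return the summed squared error per epoch."""
    n_inputs = len(train[0]) - 1
    weights = [0.0] * (n_inputs + 1)  # last weight is the bias
    history = []
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            activation = weights[-1] + sum(w * x for w, x in zip(weights, row[:-1]))
            output = 1.0 / (1.0 + exp(-activation))
            error = output - row[-1]
            sum_error += error ** 2
            delta = error * output * (1.0 - output)
            for j in range(n_inputs):
                weights[j] -= l_rate * delta * row[j]
            weights[-1] -= l_rate * delta
        history.append(sum_error)  # one loss value per epoch
    return history

# Toy data: the class equals the first input
train = [[0.0, 0.0, 0], [0.0, 1.0, 0], [1.0, 0.0, 1], [1.0, 1.0, 1]]
history = train_and_track(train, l_rate=0.5, n_epoch=50)
for epoch in (0, 24, 49):
    print('epoch=%d, error=%.3f' % (epoch, history[epoch]))
# With matplotlib installed, the curve could be drawn with:
#   from matplotlib import pyplot; pyplot.plot(history); pyplot.show()
```

To compare train and test curves, a second list could be filled by evaluating the held-out rows at the end of each epoch.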
  
  Reply
arun August 8, 2020 at 4:01 pm #

Hi. In the initialize_network function, the hidden_layer variable stores three random weights, but we are given only one hidden layer. If one weight is used for the feed-forward pass and another for the backward pass, what is the remaining one used for? Similarly, output_layer gets three weights. I did not understand this; can you explain?

Reply
- Jason Brownlee August 9, 2020 at 5:33 am #
  
  Sorry, I don’t follow. Can you please rephrase or elaborate on your question?
  
  Reply
arun August 8, 2020 at 8:09 pm #

Hi, I need to know what operation happens behind this line: predicted = algorithm(train_set, test_set, *args)

Reply
- Jason Brownlee August 9, 2020 at 5:40 am #
  
  Sorry, what do you mean exactly?
  
  Reply
Nate August 14, 2020 at 9:38 am #

do you know how to extract the “deltas” for each input and synaptic weight?

Reply
- Jason Brownlee August 14, 2020 at 1:18 pm #
  
  Yes, the element added to each weight would be the deltas as you describe them.
  
  Reply
  - Nate August 15, 2020 at 12:54 am #
    
    But how would you extract them in the code? I want to print the weight before and after the delta was added. What part of the code would you modify?
    
    Reply
    - Jason Brownlee August 15, 2020 at 6:32 am #
      
      You could retrieve them from the part of the code that updates the model weights, in the backward_propagate_error function I guess.
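One possible way to see the change applied to every weight is to copy the network before the update step and subtract afterwards. The sketch below assumes the weights are modified in an update_weights()-style function like the one called from train_network, and that each neuron is a dictionary with 'weights', 'output' and 'delta' keys. The weight_changes helper and the hand-made network values are invented for this example.

```python
from copy import deepcopy

def update_weights(network, row, l_rate):
    """Apply one weight update; assumes each neuron already has 'output' and 'delta'."""
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']  # bias weight

def weight_changes(before, after):
    """Return per-layer, per-neuron lists of (after - before) for every weight."""
    return [[[a - b for a, b in zip(n_after['weights'], n_before['weights'])]
             for n_before, n_after in zip(l_before, l_after)]
            for l_before, l_after in zip(before, after)]

# Hand-made 2-1-1 network with outputs and deltas already filled in
network = [
    [{'weights': [0.1, 0.2, 0.3], 'output': 0.6, 'delta': 0.05}],
    [{'weights': [0.4, 0.5], 'output': 0.7, 'delta': -0.1}],
]
row = [1.0, 2.0, 0]
snapshot = deepcopy(network)              # weights before the update
update_weights(network, row, l_rate=0.5)
changes = weight_changes(snapshot, network)
print(changes)
```

Printing snapshot and network side by side would show the weights before and after the update.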
      
      Reply
Niloo September 5, 2020 at 9:22 pm #

Hi
I am interested in machine learning. I read your code on this website and would now like to add some features, so I have a question. How can we add more layers to this neural network, or in other words, how can we make the number of hidden layers flexible? Could you please explain or send me a link to learn more?
Thanks for your attention.

Reply
- Jason Brownlee September 6, 2020 at 6:04 am #
  
  Perhaps it would be easier for you to start with the Keras API here:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
Harsha September 25, 2020 at 8:04 am #

Oh Legend!!

You are extremely awesome – I can’t thank you enough.

As an aspiring ML engineer this is what all I needed. You will be remembered forever as the mentor who taught me ANN from scratch

Reply
- Jason Brownlee September 25, 2020 at 9:29 am #
  
  Thanks.
  
  Reply
Jon October 15, 2020 at 3:31 am #

I tried to convert this example to use ReLU by changing the transfer function to be:

def transfer(activation):
    return 0.0 if activation <= 0.0 else activation

and the transfer_derivative to be:

def transfer_derivative(output):
    return 0.0 if output <= 0.0 else 1.0

This seems to break the training system however and the error is never reduced.

Any thoughts?

Thanks for a great article anyway.

Reply
- Jason Brownlee October 15, 2020 at 6:16 am #
  
  Perhaps try cross entropy loss.
  Perhaps try changing the model architecture.
  Perhaps try changing the learning hyperparameters.
  
  Reply
  - Jon October 16, 2020 at 5:31 pm #
    
    Ok thanks Jason, sounds interesting, I’ll certainly take a look.
    
    Reply
Dinesh Kumar October 17, 2020 at 10:32 am #

Hi Jason,

Thanks for your help in understanding the backprop concepts with Python. Could you please help me with how to implement it based on a computational graph?

ex: https://i.imgur.com/0xUaxy6.png

Reply
- Jason Brownlee October 17, 2020 at 1:43 pm #
  
  I recommend using the Keras framework:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
JG October 31, 2020 at 10:19 pm #

Hi Jason,

I decided to follow this “old” tutorial for the chance to understand, at a low level, the main functions of a neural network: the network model definition (a list of layers, each a list of neurons stored as dictionaries of weights), how to manually activate the inputs and output of a neuron, forward propagation of the input through the network and, especially for me, the core of neural nets: the backpropagation of error.

Finally I decided to jump to the high-level Keras API, which wraps these detailed functions into more integrated ones such as Model/Sequential, with their methods .fit, .evaluate and .predict, and tools such as to_categorical, besides the sklearn libraries for normalization, k-fold, one-hot encoding, etc.

Of course I got better accuracy (97.5% mean over the k folds) because I could use “relu” activation functions, other layer types, etc…

So once more, thanks for this tutorial and the chance to better understand the engine running beneath TensorFlow, and especially under the Keras high-level structures…

Reply
- Jason Brownlee November 1, 2020 at 7:30 am #
  
  Nice work!
  
  The tutorial really should be updated to use cross entropy and relu, e.g. modern ideas. I wrote this implementation like we used to in the 90s.
  
  Reply
JG October 31, 2020 at 10:52 pm #

More particularly, for the “backpropagation” ML concept, I prefer to call it, more intuitively and personally, the “distribution of output error between all the weights / biases of neurons of all layers of the model”.

So “error distribution” between all error contributors (the model’s weights/biases) is for me a better name and key idea than the standard “backpropagation”…

Reply
- Jason Brownlee November 1, 2020 at 7:31 am #
  
  Agreed. That is the key learning from this tutorial!
  
  Reply
Lia Jusmai Theresia November 7, 2020 at 5:19 pm #

Hello, can you help me create a rainfall prediction code using the neural network in python?
I do not understand. I have monthly data spanning 10 years, 120 data points in total. What should the input, hidden and output layers be? How do you choose the neurons? And what parameters should be used? Thank you in advance.

Reply
- Jason Brownlee November 8, 2020 at 6:38 am #
  
  I recommend starting with the tutorials here:
  https://machinelearningmastery.com/start-here/#deep_learning_time_series
  
  Reply
Arnav November 7, 2020 at 6:32 pm #

Can you fix my code please?
https://colab.research.google.com/drive/1Skfq3A1u7Mwdo72YBRWOm4x0SCp8mIFn?usp=sharing

Reply
- Jason Brownlee November 8, 2020 at 6:39 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Pedro H November 9, 2020 at 5:31 am #

Hi, is this code rigged for the 2:1:2 layout?

If yes can you point me some good articles to better understand the back forward prop?

Anyway, great work! It REALLY helped me!

Reply
- Jason Brownlee November 9, 2020 at 6:16 am #
  
  Not really.
  
  You can adapt the architecture of the model directly.
  
  Reply
Chris Mahoney November 9, 2020 at 9:05 am #

Hi Jason,

I LOVED this article. It helped me immensely in learning about the intricacies of Neural Networks and Deep Learning in recent months. Thank you so much!

I note here that you do a node-by-node method of implementation. But there is also another method using matrix multiplication and linear algebra.

I’ve taken these concepts and processes, and written up a similar article. Except, I’ve used R and the matrix method. I’d love to know your thoughts on it:
https://towardsdatascience.com/vanilla-neural-networks-in-r-43b028f415?sk=f47b3d6f9f539e907d272966fa88bcb8

Thank you again for your assistance. It has helped me greatly!

Cheers,
Chris M

Reply
- Jason Brownlee November 9, 2020 at 1:15 pm #
  
  Thanks.
  
  Well done.
  
  Reply
Joey Hung November 22, 2020 at 3:26 am #

Hi Jason,

Thanks for your code.
However, when I ran it, I got the problem below and I don’t know how to fix it. Could you help me fix it?

Traceback (most recent call last):
File “MDSHW3-2.py”, line 187, in
scores = evaluate_algorithm(df, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
File “MDSHW3-2.py”, line 59, in evaluate_algorithm
train_set.remove(fold)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Reply
- Jason Brownlee November 22, 2020 at 6:57 am #
  
  Sorry to hear that, these tips may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Chiso Buso December 1, 2020 at 7:13 am #

Thank you very much. I would be glad if I could get Python code for an ANN using BP for a regression problem, for example with 10 input parameters and 5 continuous output values.

Look forward to hearing from you.

Reply
- Jason Brownlee December 1, 2020 at 8:06 am #
  
  The above example can be adapted for your regression problem directly.
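As a hedged sketch of what such an adaptation might look like at the output layer, the example below replaces the sigmoid with an identity (linear) transfer, whose derivative is 1, and uses the real-valued target directly instead of a one-hot vector. It trains a single linear neuron on made-up data; the function name train_linear_neuron is hypothetical. A full regression network would keep non-linear hidden layers in front of it.

```python
def train_linear_neuron(train, l_rate, n_epoch):
    """Fit one linear output neuron (identity transfer) with squared error."""
    n_inputs = len(train[0]) - 1
    weights = [0.0] * (n_inputs + 1)  # last weight is the bias
    for _ in range(n_epoch):
        for row in train:
            output = weights[-1] + sum(w * x for w, x in zip(weights, row[:-1]))
            # identity transfer: its derivative is 1, so delta is just the error
            delta = (output - row[-1]) * 1.0
            for j in range(n_inputs):
                weights[j] -= l_rate * delta * row[j]
            weights[-1] -= l_rate * delta
    return weights

# Toy data generated from y = 2x + 1; the target is a real value, not a class
train = [[0.0, 1.0], [0.5, 2.0], [1.0, 3.0]]
weights = train_linear_neuron(train, l_rate=0.1, n_epoch=2000)
print(weights)  # close to [2.0, 1.0]
```

For several continuous outputs, there would be one such linear neuron per target, and the accuracy metric would be swapped for an error measure such as RMSE.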
  
  Reply
Lokman Hakim December 5, 2020 at 2:49 am #

Can I ask why we pass ‘None’ in the code “row = [1,0, None]” in the forward propagation phase?
Is it related to the bias?

Reply
- Jason Brownlee December 5, 2020 at 8:09 am #
  
  We mark the label as None.
  
  Reply
Sina Birecik December 21, 2020 at 9:17 am #

Hello Jason,
First, many thanks for the tutorial; it has cleared up a lot of things about DL in my mind.
I would like to make a contribution. It includes code that allows your network to get deeper. People here can upgrade your code to multiple hidden layers, and the number of neurons in each hidden layer can be adjusted. Just follow the instructions:
1) Copy the code in the link below and overwrite the whole “initialize_network” method.
https://pastebin.pl/view/fc96c453
2) Replace any “n_hidden” term with “hidden_list”.
3) At the end of the code, you can adjust hidden layers like the example below:
Example:
hidden_list = [5, 3, 7]
means there are 3 hidden layers, each hidden layer has 5, 3 and 7 neurons, respectively.
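The linked code is not reproduced here. As an independent sketch of the same idea, an initialize_network that accepts a list of hidden layer sizes could look like the following. It assumes the neuron layout used in the tutorial (a dictionary with a 'weights' list whose last entry is the bias) and that the forward and backward functions loop over every layer in the network.

```python
from random import random, seed

def initialize_network(n_inputs, hidden_list, n_outputs):
    """Build a network with one hidden layer per entry in hidden_list."""
    network = []
    n_prev = n_inputs
    for n_neurons in list(hidden_list) + [n_outputs]:
        # each neuron has one weight per input from the previous layer, plus a bias
        layer = [{'weights': [random() for _ in range(n_prev + 1)]}
                 for _ in range(n_neurons)]
        network.append(layer)
        n_prev = n_neurons
    return network

seed(1)
network = initialize_network(7, [5, 3, 7], 3)
print([len(layer) for layer in network])                # [5, 3, 7, 3]
print([len(layer[0]['weights']) for layer in network])  # [8, 6, 4, 8]
```

An empty hidden_list would give a network with only an output layer.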

Best regards.

Reply
- Jason Brownlee December 21, 2020 at 1:53 pm #
  
  You’re welcome.
  
  Thanks for sharing!
  
  Reply
Frank December 22, 2020 at 12:01 pm #

Hello, I have a simple question. Your data has 3 classes like 1-2-3. If we have a different number of classes and they are strings like YES/NO, do we need to convert them to 0-1, or does it not matter? I checked the code to find the answer but I was not able to find it. Please answer me.

Reply
- Jason Brownlee December 22, 2020 at 1:36 pm #
  
  Yes, class labels should be integer encoded then one hot encoded prior to modeling for neural networks.
  
  Perhaps start here:
  https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
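A minimal sketch of the integer-encoding step for string labels such as YES/NO is shown below. The dataset rows are made up, and the class values are sorted here only so that the mapping is the same on every run:

```python
def str_column_to_int(dataset, column):
    """Replace string class labels in `column` with integers; return the mapping."""
    class_values = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(class_values)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

dataset = [[5.1, 3.5, 'YES'], [4.9, 3.0, 'NO'], [6.2, 2.9, 'YES']]
lookup = str_column_to_int(dataset, len(dataset[0]) - 1)
print(lookup)                        # {'NO': 0, 'YES': 1}
print([row[-1] for row in dataset])  # [1, 0, 1]
```

The resulting integers can then be used as indices when building the one-hot expected vector during training.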
  
  Reply
GreakBoy December 23, 2020 at 5:28 am #

Hi, it was a really good article, thank you for doing such good work.
I have a question. In your CSV data all your columns are integers and the classes, in the last column, are numbers. But what if we have CSV data like the rows below, where one class is Mercedes and the other is Porsche? In this situation do we need to add any extra code to your example to convert the labels? I tried it and it still works, but I am not sure about the answer.

5001700134,2053150024,961776886,88349551,15594,793434083,Mercedes
4363829956,1773486023,8596657562,874662638,12190,763556063,Porsche

Reply
- Jason Brownlee December 23, 2020 at 5:37 am #
  
  Thanks.
  
  Perhaps scale the values prior to modeling and integer encode the labels.
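For the scaling part, a min-max normalization sketch is given below. The first two rows borrow two of the large values from the example above, the third row is invented, and the labels are assumed to be integer encoded already. Inputs this large would saturate a sigmoid neuron, which is one reason rescaling to the range 0-1 is commonly suggested.

```python
def dataset_minmax(dataset):
    """Return [min, max] for every column."""
    return [[min(column), max(column)] for column in zip(*dataset)]

def normalize_dataset(dataset, minmax):
    """Rescale every column except the last (the class label) to the range 0-1."""
    for row in dataset:
        for i in range(len(row) - 1):
            low, high = minmax[i]
            row[i] = (row[i] - low) / (high - low) if high > low else 0.0

dataset = [[5001700134, 15594, 0], [4363829956, 12190, 1], [4700000000, 14000, 0]]
normalize_dataset(dataset, dataset_minmax(dataset))
print(dataset)
```

The minimum and maximum values would ideally be computed on the training data only and then reused for any new rows.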
  
  Reply
Gloria January 22, 2021 at 8:38 pm #

Thanks for this useful article!
I want to change the sigmoid function to ReLU, so I modified the following 2 functions, (1) transfer(activation) and (2) transfer_derivative(output), as follows:

(1) return 1.0 / (1.0 + exp(-activation)) => return max(activation,0)
(2) return output * (1.0 - output) => return 1 if output>0 else 0

However, the network isn’t learning (accuracy ~33.3% for each fold, even when I train for more epochs). Did I get something wrong? Thanks in advance!

Reply
- Jason Brownlee January 23, 2021 at 7:03 am #
  
  You’re welcome.
  
  Nice work!
  
  Perhaps check you didn’t change the activation function in the output layer!
  Perhaps change the loss to cross entropy?
  Perhaps change the architecture?
  Perhaps change the learning hyperparameters?
  
  Reply
  - Gloria January 25, 2021 at 7:18 pm #
    
    Hi Jason, thanks for the reply!
    I’ve checked the activations and weights, and found that the problem is ‘dying ReLU’. Some units (in this case the 3 output units) always output 0 and cannot recover with further training.
    
    Reply
    - Jason Brownlee January 26, 2021 at 5:50 am #
      
      Try an alternate weight initialization, like “he”.
      
      Try scaling inputs to the range 0-1.
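
A minimal sketch of the "he" suggestion, under stated assumptions: He initialization draws each weight from a zero-mean Gaussian with variance 2 / n_inputs. The name `he_layer` is hypothetical, and starting the bias at 0 is a common choice rather than something stated in this thread.

```python
from math import sqrt
from random import gauss, seed

def relu(activation):
    return max(0.0, activation)

def relu_derivative(output):
    # Slope is 1 where the unit is active and 0 where it is switched off.
    return 1.0 if output > 0.0 else 0.0

def he_layer(n_inputs, n_neurons):
    """Hypothetical replacement for one layer built by initialize_network()."""
    std = sqrt(2.0 / n_inputs)
    # n_inputs weights drawn from N(0, 2 / n_inputs), plus a bias that starts at 0.
    return [{'weights': [gauss(0.0, std) for _ in range(n_inputs)] + [0.0]}
            for _ in range(n_neurons)]

seed(1)
hidden_layer = he_layer(n_inputs=7, n_neurons=5)
print(len(hidden_layer), len(hidden_layer[0]['weights']))  # 5 8
```

Because transfer() is shared by every layer in the posted code, swapping it for ReLU also changes the output layer. Keeping a sigmoid on the output layer while using ReLU only in the hidden layer is one way to follow the first hint above.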
      
      Reply
Unnikrishnan February 11, 2021 at 1:28 pm #

Thanks Jason. Nice article.
Concepts of Backpropagation became clear now.

Reply
- Jason Brownlee February 12, 2021 at 5:43 am #
  
  Thank you, I’m happy it helps!
  
  Reply
AbdulAhad February 14, 2021 at 1:00 am #

Thanks for the great article. It is still helping those who want to know exactly what happens, and how, from a coding perspective.

Reply
- Jason Brownlee February 14, 2021 at 5:11 am #
  
  You’re welcome!
  
  Reply
David February 15, 2021 at 6:45 am #

Hi Jason,

Thanks a bunch for this. Please, can this code work in Python 3.x?

Thanks

Reply
- Jason Brownlee February 15, 2021 at 8:11 am #
  
  You’re welcome David!
  
  Yes, the code works with Python 3.
  
  Reply
Lukasz March 12, 2021 at 9:25 pm #

Hi Jason,

Thank you for the great article! I ran the code on my own dataset, where I was predicting three label classes. The algorithm gave me great results, with a prediction accuracy above 97%. Right now I am trying to use the trained network to predict results for a new dataset, where I would not provide any labels to calculate the accuracy. Do you have any recommendations for me? Thank you!

Reply
- Jason Brownlee March 13, 2021 at 5:31 am #
  
  You can remove the evaluation of the model, fit the model on all available data and call predict on new data.
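
A sketch of the final step, calling predict on new rows. The `network` below is a stand-in with made-up weights for a model that has already been fit on all labelled data, and `new_rows` is hypothetical unlabeled input.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # the last weight is the bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        inputs = [transfer(activate(neuron['weights'], inputs)) for neuron in layer]
    return inputs

def predict(network, row):
    # The predicted class is the index of the largest output.
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Stand-in for a trained network (weights are made up for this example).
network = [
    [{'weights': [4.0, -4.0, 0.0]}],                        # one hidden neuron: two inputs + bias
    [{'weights': [-6.0, 3.0]}, {'weights': [6.0, -3.0]}],   # two output neurons
]
new_rows = [[0.9, 0.1], [0.1, 0.9]]  # unlabeled rows: features only
predictions = [predict(network, row) for row in new_rows]
print(predictions)  # [1, 0]
```

If the training data was normalized, the new rows need the same scaling (using the minimum and maximum values from the training data) before they are passed to predict().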
  
  Reply
Gordon March 16, 2021 at 2:38 am #

Hi Jason, why do we copy the row and set row[-1] to None?
This is in the function evaluate_algorithm:

for row in fold:
    row_copy = list(row)
    test_set.append(row_copy)
    row_copy[-1] = None

it seems like you could just do

for row in fold:
    test_set.append(row)

Thanks for your help!

Reply
- Jason Brownlee March 16, 2021 at 4:51 am #
  
  So that the expected output value is not available to the model.
  
  Reply
  - vitor January 28, 2024 at 8:47 am #
    
    But why, at initialization, do we do:
    
    n_inputs = len(row) - 1
    
    then?
    
    Reply
    - James Carmichael January 29, 2024 at 7:05 am #
      
      Hi vitor…Please clarify the code portion you are referring to. Also, are you experiencing an error with the code provided? That will enable us to better guide you.
      
      Reply
mlhan March 27, 2021 at 2:58 am #

Do you have any idea how I can do it if I want the user to enter the input? 🙁

Reply
- Jason Brownlee March 29, 2021 at 5:54 am #
  
  Yes, but this is a programming question, not a machine learning question.
  
  Perhaps you can develop a program/interface around your model.
  
  Reply
Ashwini April 16, 2021 at 5:18 am #

Hi,

I tried adding softmax activation function in the output layer for doing a multi-class classification.

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    # hidden layer
    layer = network[0]
    new_inputs = []
    for neuron in layer:
        activation = activate(neuron['weights'], inputs)
        neuron['output'] = transfer(activation)
        new_inputs.append(neuron['output'])
    inputs = new_inputs
    # output layer
    layer = network[1]
    new_inputs = []
    for neuron in layer:
        activation = activate(neuron['weights'], inputs)
        neuron['output'] = softmax(activation)
        new_inputs.append(neuron['output'])
    inputs = new_inputs
    return inputs

But when I run this, the accuracy of the model falls drastically from 77% to 35%.
Can you please suggest why this is happening, or what additional changes I should make to maintain the accuracy?

Reply
- Jason Brownlee April 16, 2021 at 5:34 am #
  
  This may help you with understanding softmax:
  https://machinelearningmastery.com/softmax-activation-function-with-python/
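
One possible cause, offered as a guess since the softmax() function itself was not posted: softmax is defined over the whole vector of output-layer activations, while the code above calls it on one scalar activation at a time. A minimal sketch of a layer-wide softmax, with made-up activations:

```python
from math import exp

def softmax(activations):
    """Normalize a whole layer's activations into probabilities that sum to 1."""
    shift = max(activations)  # subtracting the max avoids overflow in exp()
    exps = [exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

activations = [2.0, 1.0, 0.1]  # made-up output-layer activations
probabilities = softmax(activations)
print(probabilities)
print(sum(probabilities))
```

With this form, the output layer first collects every neuron's activation and only then assigns each neuron its share of the result. The backward pass would also need to change: for softmax combined with cross entropy, the output-layer delta is output - expected, without the extra multiplication by the sigmoid derivative.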
  
  Reply
redouane kassa April 21, 2021 at 11:34 pm #

Hello,
The derivative of sigmoid function should be f'(x)=f(x)*(1-f(x)) and not f'(x)=x(1-x). am I right?
Thank you

Reply
- Jason Brownlee April 22, 2021 at 5:41 am #
  
  Yes, that is what we use. output is not x, output is f(x), e.g. f(x)=output.
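
A quick numerical check of this point: because transfer_derivative() receives the neuron's output f(x), output * (1 - output) matches the slope of the sigmoid at x. The value x = 0.5 and the step size h are arbitrary choices for the check.

```python
from math import exp

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    return output * (1.0 - output)

x = 0.5
output = transfer(x)  # output is f(x), not x
h = 1e-6
numeric = (transfer(x + h) - transfer(x - h)) / (2 * h)  # central difference estimate of f'(x)
analytic = transfer_derivative(output)
print(analytic, numeric)
```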
  
  Reply
James May 22, 2021 at 11:31 am #

Scores: [90.47619047619048, 92.85714285714286, 97.61904761904762, 92.85714285714286, 92.85714285714286]
Mean Accuracy: 93.333%

Reply
- Jason Brownlee May 23, 2021 at 5:22 am #
  
  Well done.
  
  Reply
winter May 27, 2021 at 11:53 pm #

Hello , Jason
Really thank you for your effort and nice resource.

Can I get some ideas on how to visualize error and validation like Keras does?

It would be a great help if I get some ideas.

Reply
- Jason Brownlee May 28, 2021 at 6:48 am #
  
  You’re welcome.
  
  Yes, you can create learning curves if you wish, perhaps use matplotlib to create the plots.
  
  Reply
sudip May 30, 2021 at 11:12 pm #

Hello Jason,
Thank you very much for the nice tutorial. I am using your tutorial to train on my time series data, where the training values are very similar across all the classes. Your tutorial gives pretty nice results. I am trying to visualize error and accuracy like Keras does, but couldn’t figure it out.
Is there by chance any tutorial or source that would help me visualize my training information?

Reply
- Jason Brownlee May 31, 2021 at 5:49 am #
  
  You’re welcome.
  
  Perhaps you can plot the expected time series values as a line plot and plot the predicted values for the same time period on the same plot to provide a visual comparison of the values.
  
  Reply

sudip May 31, 2021 at 5:09 pm #

Hi Jason,
I tried to follow your tutorial to visualize the error.
Whenever I try to plot, it says that ‘expected’ is not defined.
In my understanding, ‘expected’ is defined in the function where we call backpropagation.

Can I get some idea to solve this problem and plot error line?

from matplotlib import pyplot
from sklearn.metrics import mean_squared_error
# calculate errors

errors = list()
for i in range(len(expected)):
	# calculate error
	err = (expected[i] - predicted[i])**2
	# store error
	errors.append(err)
	# report error
	print('>%.1f, %.1f = %.3f' % (expected[i], predicted[i], err))
# plot errors
pyplot.plot(errors)
pyplot.xticks(ticks=[i for i in range(len(errors))], labels=predicted)
pyplot.xlabel('Predicted Value')
pyplot.ylabel('Mean Squared Error')
pyplot.show()

- Jason Brownlee June 1, 2021 at 5:29 am #
  
  You must make predictions before you can plot them and their error.
  
  Reply

sacin June 19, 2021 at 4:44 am #

Had a doubt,

in section 3.2 shouldn’t, error = (expected - output) * transfer_derivative(output)
be,
error = (output - expected) * transfer_derivative(output)

was thinking if this is flipped then the weights in the hidden layers might increase instead of decrease and vice versa.

was referring to https://coursera.org/share/3046ebc8c09a4bf792b4a00848f23c6c by andrew NG.

Reply
- Jason Brownlee June 19, 2021 at 5:56 am #
  
  It can be, it just changes the direction/sign.
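
To illustrate why only the sign changes: using expected - output and adding the update gives the same new weight as using output - expected and subtracting it. The post was later updated (Oct/2021) to the second convention. The numbers and function names below are made up for the comparison.

```python
def step_expected_minus_output(weight, expected, output, derivative, x, l_rate):
    # Older convention: error = expected - output, weights updated with +=
    delta = (expected - output) * derivative
    return weight + l_rate * delta * x

def step_output_minus_expected(weight, expected, output, derivative, x, l_rate):
    # Convention after the Oct/2021 update: error = output - expected, weights updated with -=
    delta = (output - expected) * derivative
    return weight - l_rate * delta * x

args = dict(weight=0.4, expected=1.0, output=0.7, derivative=0.21, x=0.5, l_rate=0.5)
a = step_expected_minus_output(**args)
b = step_output_minus_expected(**args)
print(a, b)  # both move the weight up, because the output was too low
```

The weights only move the wrong way if the two conventions are mixed, for example output - expected combined with +=.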
  
  Reply
SUDIP LAUDARI June 20, 2021 at 10:57 pm #

Hi Jason, I still couldn’t plot a smooth graph like Keras does. It would be a great help if we could get an example from you.

Thanks

Reply
- Jason Brownlee June 21, 2021 at 5:38 am #
  
  You must adapt the example to not use cross-validation, but a train/test split instead.
  
  Then evaluate the model’s performance on a training set and validation set each epoch (iteration).
  
  Sorry, I don’t have the capacity to prepare an example for you.
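
For readers after the same thing, here is a rough sketch of the approach described above, not an official example. It is a compact rewrite of the tutorial's functions for a single hidden layer; `train_with_history` and `accuracy` are hypothetical names, and the two-class data is made up.

```python
from math import exp
from random import random, seed

def initialize_network(n_inputs, n_hidden, n_outputs):
    return [[{'weights': [random() for _ in range(n_inputs + 1)]} for _ in range(n_hidden)],
            [{'weights': [random() for _ in range(n_hidden + 1)]} for _ in range(n_outputs)]]

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        outputs = []
        for neuron in layer:
            w = neuron['weights']
            activation = w[-1] + sum(w[i] * inputs[i] for i in range(len(w) - 1))
            neuron['output'] = 1.0 / (1.0 + exp(-activation))
            outputs.append(neuron['output'])
        inputs = outputs
    return inputs

def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        for j, neuron in enumerate(network[i]):
            if i == len(network) - 1:
                error = neuron['output'] - expected[j]
            else:
                error = sum(n['weights'][j] * n['delta'] for n in network[i + 1])
            neuron['delta'] = error * neuron['output'] * (1.0 - neuron['output'])

def update_weights(network, row, l_rate):
    for i, layer in enumerate(network):
        inputs = row[:-1] if i == 0 else [n['output'] for n in network[i - 1]]
        for neuron in layer:
            for j, value in enumerate(inputs):
                neuron['weights'][j] -= l_rate * neuron['delta'] * value
            neuron['weights'][-1] -= l_rate * neuron['delta']

def accuracy(network, rows):
    hits = 0
    for row in rows:
        outputs = forward_propagate(network, row)
        hits += outputs.index(max(outputs)) == row[-1]
    return 100.0 * hits / len(rows)

def train_with_history(network, train, valid, l_rate, n_epoch, n_outputs):
    history = []  # one (train accuracy, validation accuracy) pair per epoch
    for epoch in range(n_epoch):
        for row in train:
            forward_propagate(network, row)
            expected = [0] * n_outputs
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        history.append((accuracy(network, train), accuracy(network, valid)))
    return history

seed(1)
# Made-up two-class data: [feature_1, feature_2, class]
train = [[0.1, 0.2, 0], [0.2, 0.1, 0], [0.3, 0.3, 0], [0.8, 0.9, 1], [0.9, 0.7, 1], [0.7, 0.8, 1]]
valid = [[0.2, 0.3, 0], [0.8, 0.8, 1]]
network = initialize_network(2, 2, 2)
history = train_with_history(network, train, valid, l_rate=0.5, n_epoch=50, n_outputs=2)
for epoch, (train_acc, valid_acc) in enumerate(history):
    print('>epoch=%d, train=%.1f%%, valid=%.1f%%' % (epoch, train_acc, valid_acc))
```

The two columns of history can then be passed to matplotlib's pyplot.plot() as two lines to get a learning curve per epoch.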
  
  Reply
AIbird August 30, 2021 at 10:57 pm #

Hi Jason,
I am using your code for my project. It looks great, and it was working perfectly before. I changed my dataset's format so that the values are all very similar, in the range of 0.05 to 0.12. The training CSV, which contains all the values, has around 300 rows and 120 columns.

Now my error doesn’t converge to near 0. It always stays around 89 or 91.
I plotted the accuracy for all n_folds against epoch, and the accuracies fluctuate a lot.

Is there any way to bring the error near 0 so that I can expect a high mean accuracy?

Any idea or suggestion would be really appreciated.

Thank you

Reply
- Adrian Tam September 1, 2021 at 8:13 am #
  
  Would you try to use a scaler at preprocessing stage?
  
  Reply
AIbird September 1, 2021 at 10:46 am #

Hello Adrian
Thank you for your reply,

I tried by doing this technique:

from sklearn.preprocessing import MinMaxScaler

……………………….
scalar = MinMaxScaler()
normalized = scalar.fit_transform(dataset)

Is this what you mean?

Thank you.

Reply
- Adrian Tam September 1, 2021 at 11:27 am #
  
  Yes.
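
One caveat, assuming the class label sits in the last column of `dataset`: MinMaxScaler rescales every column it is given, so fit_transform(dataset) would also turn the integer class labels into floats between 0 and 1. A plain-Python sketch that scales only the feature columns (the function names and rows are made up):

```python
def column_minmax(dataset):
    """Return [min, max] for every feature column (all columns except the last)."""
    features = [row[:-1] for row in dataset]
    return [[min(col), max(col)] for col in zip(*features)]

def normalize_features(dataset, minmax):
    """Rescale feature columns to 0-1 in place; the class label is left alone."""
    for row in dataset:
        for i, (low, high) in enumerate(minmax):
            span = high - low
            row[i] = (row[i] - low) / span if span else 0.0

dataset = [[0.05, 0.12, 0], [0.08, 0.06, 1], [0.12, 0.09, 2]]  # made-up rows
normalize_features(dataset, column_minmax(dataset))
print(dataset)
```

With scikit-learn, the equivalent is to pass only the feature columns to the scaler and keep the label column aside.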
  
  Reply
  - AIbird September 6, 2021 at 8:24 pm #
    
    I finally solved the problem. I had an issue in my code.
    
    I have one more question. The mean accuracy is for the training set, right?
    How can I calculate the accuracy of the predictions?
    Can I get some help?
    
    Reply
    - Adrian Tam September 7, 2021 at 6:13 am #
      
      The accuracy_metric() function does this.
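
A sketch of what such a function computes: the percentage of predictions that match the actual labels. The labels below are made up, and the body is a compact rewrite rather than the tutorial's exact listing.

```python
def accuracy_metric(actual, predicted):
    # Count matching positions and express them as a percentage.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual)) * 100.0

actual = [0, 1, 2, 1, 0]      # made-up held-out labels
predicted = [0, 1, 1, 1, 0]   # made-up model predictions
print('Accuracy: %.1f%%' % accuracy_metric(actual, predicted))  # Accuracy: 80.0%
```

As the evaluate_algorithm snippet quoted earlier in this thread suggests, each fold is used as a test set with its labels hidden, so the reported scores are computed on held-out rows rather than on the training rows.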
      
      Reply
Filip September 19, 2021 at 2:45 am #

This was great, thank you!

Reply
- Adrian Tam September 19, 2021 at 6:10 am #
  
  You’re welcome. Glad you like it.
  
  Reply
Franck September 22, 2021 at 8:32 am #

New to the field : this is exactly what I was looking for! Jason, thanks for this post!

I tried to follow it step by step and ended up with 2 questions.

Question 1: updating the model doesn’t seem to be so costly. Is this because this is a toy program? Does TensorFlow (for instance) do trickier things that make model updates costly? In this toy example, it is not easy to understand why a model update would be costly.

If I get it correctly, here https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/ you say that mini-batch is a tradeoff between SGD and batch GD, and that mini-batch is more efficient because the model update is done only after the batch has been evaluated (backpropagated). My implementation of mini-batch on this toy example would be

~/machinelearningmastery> git diff
diff --git a/wheat_seeds.py b/wheat_seeds.py
index 7b78b89..9e7bcd0 100644
--- a/wheat_seeds.py
+++ b/wheat_seeds.py
@@ -141,15 +141,22 @@ def update_weights(network, row, l_rate):
                 neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
             neuron['weights'][-1] += l_rate * neuron['delta']
 
+def make_batch(iterable, batch_size=1):
+    n = len(iterable)
+    for i in range(0, n, batch_size):
+        yield iterable[i:min(i + batch_size, n)]
+
 # Train a network for a fixed number of epochs
 def train_network(network, train, l_rate, n_epoch, n_outputs):
     for epoch in range(n_epoch):
-        for row in train:
-            outputs = forward_propagate(network, row)
-            expected = [0 for i in range(n_outputs)]
-            expected[row[-1]] = 1
-            backward_propagate_error(network, expected)
-            update_weights(network, row, l_rate)
+        for batch in make_batch(train, batch_size=32):
+            for row in batch: # First backpropagate
+                outputs = forward_propagate(network, row)
+                expected = [0 for i in range(n_outputs)]
+                expected[row[-1]] = 1
+                backward_propagate_error(network, expected)
+            for row in batch: # Then update model
+                update_weights(network, row, l_rate)

Looks like the exact same cost, no? Did I miss something?

Question 2: how, when, and where is the loss function supposed to be computed in this toy example? To me, sum_error (in the first version of train_network) is only used for printing the error, not for the computation / gradient descent, so it is useless, which is why it disappeared in the final version of train_network.

For classification, I expected cross entropy to be computed as the error for the output layer this way

machinelearningmastery> git diff
diff --git a/wheat_seeds.py b/wheat_seeds.py
index 7b78b89..0cb0444 100644
--- a/wheat_seeds.py
+++ b/wheat_seeds.py
@@ -3,7 +3,7 @@ from random import seed
 from random import randrange
 from random import random
 from csv import reader
-from math import exp
+from math import exp, log
 
 # Load a CSV file
 def load_csv(filename):
@@ -111,6 +111,9 @@ def forward_propagate(network, row):
 def transfer_derivative(output):
     return output * (1.0 - output)
 
+def cross_entropy(p, q, eps=1e-15):
+    return -sum([p[i]*log(q[i]+eps) for i in range(len(p))])
+
 # Backpropagate error and store in neurons
 def backward_propagate_error(network, expected):
     for i in reversed(range(len(network))):
@@ -123,9 +126,8 @@ def backward_propagate_error(network, expected):
                     error += (neuron['weights'][j] * neuron['delta'])
                 errors.append(error)
         else:
-            for j in range(len(layer)):
-                neuron = layer[j]
-                errors.append(expected[j] - neuron['output'])
+            output = [neuron['output'] for neuron in layer]
+            errors.append(cross_entropy(expected, output))
         for j in range(len(layer)):
             neuron = layer[j]
             neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
… But the code breaks and I am not sure why.

Franck

Reply
- Adrian Tam September 23, 2021 at 3:53 am #
  
  It is too long for me to read all at once, but let me answer the first question here. Training is costly because (1) there are many perceptrons to update and (2) there is a lot of data to evaluate. If you consider the simplest gradient descent algorithm, your metric is the MSE function, which involves the entire dataset. If we have M perceptrons and N data samples, there are M weights to train (or more if there are bias terms), and the total number of gradients you need to compute is M×N in each iteration. If your toy example is small in both M and N, you will not notice that this is a problem.
  
  Reply
Franck October 3, 2021 at 6:37 pm #

Adrian, thanks for the answer: makes sense!

I tried to use cross entropy as loss function this way :

diff --git a/wheat_seeds.py b/wheat_seeds.py
index 7b78b89..96617d7 100644
--- a/wheat_seeds.py
+++ b/wheat_seeds.py
@@ -111,6 +112,9 @@ def forward_propagate(network, row):
 def transfer_derivative(output):
     return output * (1.0 - output)
 
+def cross_entropy(p, q, eps=1e-15):
+    return -sum([p[i]*log2(q[i]+eps) for i in range(len(p))])
+
 # Backpropagate error and store in neurons
 def backward_propagate_error(network, expected):
     for i in reversed(range(len(network))):
@@ -124,8 +128,9 @@ def backward_propagate_error(network, expected):
                 errors.append(error)
         else:
             for j in range(len(layer)):
-                neuron = layer[j]
-                errors.append(expected[j] - neuron['output'])
+                neuron_onehot = [0. for neuron in layer]
+                neuron_onehot[j] = layer[j]['output']
+                errors.append(cross_entropy(expected, neuron_onehot))
         for j in range(len(layer)):
             neuron = layer[j]
             neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

But the classification results are bad: I plotted the errors (can’t attach a PNG) and I guess it’s because I am a victim of “vanishing gradient”. If you have any clue or advice, I would be glad to know 😀

Franck

Reply
- Adrian Tam October 6, 2021 at 8:02 am #
  
  Maybe try a different activation function. People have found that it is the key to mitigating vanishing gradients, although it does not always work. However, the example here is not deep, so the vanishing gradient issue should not be pronounced.
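
A further observation on the diffs above, offered as a likely explanation rather than a confirmed diagnosis: both versions store the cross-entropy loss value in errors, whereas backpropagation needs the derivative of the loss. In the first diff, errors also ends up with a single entry while the following loop reads errors[j] for every output neuron, which would raise an IndexError. For a sigmoid output with binary cross entropy, the derivative with respect to the activation reduces to output - expected, which can be checked numerically (the values of z and expected are arbitrary):

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def binary_cross_entropy(expected, output):
    return -(expected * log(output) + (1 - expected) * log(1 - output))

z, expected = 0.3, 1.0
output = sigmoid(z)
h = 1e-6
# Central difference estimate of d(loss)/dz
numeric = (binary_cross_entropy(expected, sigmoid(z + h))
           - binary_cross_entropy(expected, sigmoid(z - h))) / (2 * h)
delta = output - expected  # analytic gradient with respect to the activation z
print(delta, numeric)
```

Under that assumption, the output-layer delta would be set to output - expected directly, without multiplying by transfer_derivative().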
  
  Reply
Franck October 21, 2021 at 11:05 pm #

Got it to work with a different activation function.

At this point, I feel like there is a bug in the code from the post: backprop should start from the output with ds/ds = 1 (as far as I understand it, with s = score), which is not the case unless I am wrong.

Am I wrong?

Reply
- Adrian Tam October 22, 2021 at 4:13 am #
  
  I understand why you’re thinking like that but that’s meaningless because ds/ds is always 1. We are looking for more interesting subjects such as ds/dw. After all, you can’t change the score. You can only change the weights in the neural network. Hence we prefer to start with ds/dw
  
  Reply
SLC October 24, 2021 at 4:46 am #

Thank you for your code Mr. Jason. However, I am not understanding when you are going to predict the class using the trained network, why and from where are you giving the weights?
Thanks in advance.

Reply
- Adrian Tam October 27, 2021 at 1:44 am #
  
  Weights are in the neural networks. Every neuron is a function y=f(Wx) where x is the input (usually expressed as a vector of many values) and y is a single value (i.e., scalar). The W is the weight and it is the key thing we need to find during training.
  
  Reply
SD November 8, 2021 at 2:37 pm #

Hello everyone ,

I am training on a huge dataset which has more than 90 features.

Can anyone give me some idea of how to add PCA to the existing code?

That way I could reduce the dimension of my data and use only the main features while training.

Any help or suggestion would be really appreciated.

Thanks.

Reply
- Adrian Tam November 14, 2021 at 12:08 pm #
  
  90 features is not too much, but if you want to use PCA, you can check out this post on dimensionality reduction:
  https://machinelearningmastery.com/principal-components-analysis-for-dimensionality-reduction-in-python/
  
  Reply
Dyah wardani November 22, 2021 at 3:22 am #

I have a question about the function str_column_to_int(). Why do the outputs change from 1, 2, 3 into 2, 0, 1 after using that function?

Reply
- Adrian Tam November 23, 2021 at 1:19 pm #
  
  I think in this case, your “1”, “2”, “3” are strings and 2, 0, 1 are integers. That’s the result of encoding strings into integers.
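
A likely reason for the unexpected order, assuming the function builds its lookup by enumerating a set() of the class values as in the tutorial listing: Python sets have no guaranteed ordering, so the string "1" is not necessarily assigned 0. The variant below is a hypothetical alternative that sorts the values first so the mapping is predictable; the rows are made up.

```python
def str_column_to_int_sorted(dataset, column):
    """Hypothetical variant: number the classes in sorted order instead of set order."""
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: index for index, value in enumerate(unique)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

dataset = [[15.26, '1'], [14.88, '2'], [13.84, '3'], [14.29, '1']]  # made-up rows
lookup = str_column_to_int_sorted(dataset, 1)
print(lookup)                       # {'1': 0, '2': 1, '3': 2}
print([row[1] for row in dataset])  # [0, 1, 2, 0]
```

Either mapping works for training, as long as the same lookup is used to translate predictions back to the original labels.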
  
  Reply
Logan January 11, 2022 at 3:46 am #

Here’s just a small correction (I’m sorry for being particular.):

In the “3.1. Transfer Derivative” section, you’ve written
“Given an output value from a neuron, we need to calculate it’s slope.”

It wouldn’t make sense to say
“Given an output value from a neuron, we need to calculate it is slope.”
(“it is” instead of “it’s”)

Therefore, it should be
“Given an output value from a neuron, we need to calculate its slope.”
(“its” instead of “it’s”)

Reply
- James Carmichael January 11, 2022 at 8:40 am #
  
  Thank you for the feedback, Logan!
  
  Reply
Durga January 13, 2022 at 8:32 pm #

Hi Mr. Jason, thanks for your great tutorial. Can you clarify the if and else conditions in the backpropagation error calculation? (If possible, please explain this code in detail.)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        # -- (2) Error computed for the hidden layers: error = (weight_k * error_j) * transfer_derivative(output)
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                # -- (A) error = Sum(delta * weight linked to this delta)
                # for each neuron[LAYER N+1] linked to this neuron[LAYER N] (current layer)
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        # -- (1) Error computed for the last layer: error = (expected - output) * transfer_derivative(output)
        else:
            # -- (A) Store the difference between expected and output for each output neuron in errors[]
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        # -- (B) Store the error signal in delta for each neuron
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# test backpropagation of error
network = [[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
    [{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095]}, {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763]}]]
expected = [0, 1]
backward_propagate_error(network, expected)
for layer in network:
    print(layer)

Reply
- James Carmichael February 21, 2022 at 2:19 pm #
  
  Hi Durga…Please narrow the content of your post down to a single question/comment so that I may better assist you.
  
  Reply
Rudra Sonkusare January 27, 2022 at 7:49 pm #

Is there any possible way to give a string for input? The string I am trying to give as input is not a meaningful word, for example string = “zgg7AiPkY37Yvne” and I want to give two of these strings as input to the neural network, any idea how this can be achieved? The current method I use is to convert each character into its decimal code then normalize it in range 0, 1 and thus convert in into a vector of floats.

Reply
- James Carmichael January 28, 2022 at 10:38 am #
  
  Hi Rudra…You may find the following of interest:
  
  https://machinelearningmastery.com/develop-n-gram-multichannel-convolutional-neural-network-sentiment-analysis/
  
  Reply
Andrii February 13, 2022 at 8:51 am #

Hello!
I ran into an issue recently. I’ve implemented backpropagation using your approach in C++, but the epoch loss doesn’t go down. It may go down with a smaller learning rate and a larger number of epochs, but at some point the loss goes back up to some value again. What could be causing this? I’ve checked that the forward pass and backward pass both work fine.

Reply
- James Carmichael February 13, 2022 at 12:58 pm #
  
  Hi Andrii…While I cannot speak to the C++ implementation, I would recommend the following to move forward with improving your model performance:
  
  https://machinelearningmastery.com/better-deep-learning-neural-networks-crash-course/
  
  Reply
rebot333 February 17, 2022 at 2:02 pm #

Thank you so much this is a great lesson

Reply
- James Carmichael February 18, 2022 at 12:55 pm #
  
  You are very welcome! Thank you for the support!
  
  Reply
CEN April 3, 2022 at 7:04 pm #

Hello, Mr. James Carmichael. I used the code you created, and it was very useful. Can you help me by providing forecasting plot code for the backpropagation algorithm with a bipolar sigmoid?

Reply
- James Carmichael April 4, 2022 at 8:59 am #
  
  Hi CEN…Thank you for the feedback! The following resource will be a tremendous help regarding backpropagation.
  
  https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
  
  Reply
Shraddha April 4, 2022 at 8:03 pm #

Thank you, James. The codes were very useful.
I tried to implement the above codes in my system. It worked as expected.
I was modifying the above code for the MNIST dataset by increasing the number of layers in the existing code.

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

So, can I write this function as

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer1 = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer1)
    hidden_layer2 = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer2)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

Is this the correct way to do it?

And one more question: what about the weight update function, do I need to make changes there as well?

Reply
- James Carmichael April 5, 2022 at 7:03 am #
  
  Hi Shraddha…Although I have not executed your code listing, I see no apparent deficiencies. Please let us know what you are specifically trying to accomplish with your code modifications so that we can better assist you.
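
For any number of hidden layers, the same idea can be written as a loop. This is a sketch with a hypothetical signature (a `hidden_list` of layer sizes instead of a single n_hidden), not the tutorial's function.

```python
from random import random, seed

def initialize_network(n_inputs, hidden_list, n_outputs):
    """Hypothetical variant taking a list of hidden-layer sizes, e.g. [5, 3, 7]."""
    network = []
    sizes = [n_inputs] + list(hidden_list) + [n_outputs]
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # Each neuron has one weight per input from the previous layer, plus a bias.
        network.append([{'weights': [random() for _ in range(n_in + 1)]}
                        for _ in range(n_out)])
    return network

seed(1)
network = initialize_network(n_inputs=7, hidden_list=[5, 3, 7], n_outputs=3)
print([len(layer) for layer in network])                # [5, 3, 7, 3]
print([len(layer[0]['weights']) for layer in network])  # [8, 6, 4, 8]
```

On the weight update question: the update_weights() listing posted in this thread loops over range(len(network)) and takes each layer's inputs from the previous layer's outputs, so it should handle the extra layer without changes.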
  
  Reply
Shraddha Naik April 5, 2022 at 8:00 pm #

Thank you, James Sir. The above code is very useful.

I tried to implement the above codes in my system. It worked as expected.
But when I changed the dataset to MNIST, I got only 10% accuracy after 1000 epochs, even after using mini-batch SGD.

Kindly help me with this. Thank You

Reply
- James Carmichael April 6, 2022 at 8:40 am #
  
  Hi Shraddha…You may find the following helpful:
  
  https://machinelearningmastery.com/optimization-for-machine-learning-crash-course/
  
  Reply
Shraddha Naik April 5, 2022 at 8:09 pm #

def train_network(network, train, l_rate, n_epoch, n_outputs, kval):
    import random
    for epoch in range(n_epoch):
        temp = random.choices(train, k=kval)
        for row in temp:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)

Reply
- James Carmichael April 6, 2022 at 8:42 am #
  
  Thank you for the feedback Shraddha!
  
  Reply
Rahul May 14, 2022 at 5:22 pm #

Hello James,
I am working on a project with 9 types of variables and 1 output. I want to use an ANN to get a weighting for each variable. I have tried this, but I only got an error in the output, and I could not find the equivalent weight for each individual variable from the expected data.
Please help me.

Reply
- James Carmichael May 15, 2022 at 10:57 am #
  
  Hi Rahul…Please provide more detail of the nature of the error or errors so that we may better assist you.
  
  Reply
nicolas May 18, 2022 at 10:12 pm #

Hello James,
I am working on a project with 20 entries. But when I change the data packet it gives me errors in the code.
Please help me!
the data that i tried to put
[[1539.64, 1006.43, 1549539.885],
[1537.79, 1004.97, 1545432.816],
[1535.63, 1003.84, 1541526.819],
[1533.79, 1002.87645, 1538201.87],
[1531.65, 1001.80229, 1534410.477],
[1530.26316, 1000.99, 1531778.121],
[1528.75778, 1000.46, 1529461.009],
[1527.07, 999.89813, 1526914.437],
[1525.76684, 999.40577, 1524860.184],
[1524.24165, 999.11715, 1522895.973],
[1523.03339, 998.80306, 1521210.41],
[1521.88455, 998.56537, 1519701.209],
[1520.41, 998.26825, 1517777.03],
[1519.46802, 998.13243, 1516630.307],
[1518.08149, 997.87776, 1514859.757],
[1516.89304, 997.7, 1513404.186],
[1515.94228, 997.6, 1512304.019],
[1514.99151, 997.48, 1511173.731],
[1514.15959, 997.32, 1510101.642],
[1513.24844, 997.1, 1508860.02],
[1512.32, 996.97637, 1507747.304]]

Reply
- James Carmichael May 19, 2022 at 6:24 am #
  
  Hi Nicolas…Please specify what errors you are encountering so that we may better assist you.
  
  Reply
nicolas May 19, 2022 at 8:31 am #

Hi! James
Thank you for your time
This is the code that I used
And I changed the data but I’m having some errors in it
Could you please help me?

___________________________________________________________________

from math import exp
from random import seed
from random import random

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of an neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(neuron['output'] - expected[j])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

# Test training backprop algorithm
seed(1)
dataset = [[1539.64, 1006.43, 1549539.885],
[1537.79, 1004.97, 1545432.816],
[1535.63, 1003.84, 1541526.819],
[1533.79, 1002.87645, 1538201.87],
[1531.65, 1001.80229, 1534410.477],
[1530.26316, 1000.99, 1531778.121],
[1528.75778, 1000.46, 1529461.009],
[1527.07, 999.89813, 1526914.437],
[1525.76684, 999.40577, 1524860.184],
[1524.24165, 999.11715, 1522895.973],
[1523.03339, 998.80306, 1521210.41],
[1521.88455, 998.56537, 1519701.209],
[1520.41, 998.26825, 1517777.03],
[1519.46802, 998.13243, 1516630.307],
[1518.08149, 997.87776, 1514859.757],
[1516.89304, 997.7, 1513404.186],
[1515.94228, 997.6, 1512304.019],
[1514.99151, 997.48, 1511173.731],
[1514.15959, 997.32, 1510101.642],
[1513.24844, 997.1, 1508860.02],
[1512.32, 996.97637, 1507747.304]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)

—————————————————————————————————————————————–

Traceback (most recent call last):
  File "C:\Users\Coder\Downloads\MLP_v1_1\1.py", line 119, in <module>
    train_network(network, dataset, 0.5, 20, n_outputs)
  File "C:\Users\Coder\Downloads\MLP_v1_1\1.py", line 86, in train_network
    expected[row[-1]] = 1
TypeError: list indices must be integers or slices, not float

Reply
- James Carmichael May 20, 2022 at 11:26 pm #
  
  Hi Nicolas…I do not see any issues from your code listing, however there could be formatting issues related to your code environment that are not readily apparent. Can you try the code in Google Colab?
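
A possible reading of the traceback, offered as a suggestion: the failing line is expected[row[-1]] = 1, which uses the last column of each row as a list index. The tutorial code expects that column to hold an integer class label (0, 1, 2, ...), while the posted data has continuous values such as 1549539.885 there. The snippet below reproduces the error with the first two rows from the comment:

```python
dataset = [[1539.64, 1006.43, 1549539.885],
           [1537.79, 1004.97, 1545432.816]]  # first two rows from the comment above
n_outputs = len(set(row[-1] for row in dataset))

row = dataset[0]
expected = [0 for _ in range(n_outputs)]
try:
    expected[row[-1]] = 1  # row[-1] is the float 1549539.885, not a class index
    failed = False
except TypeError as error:
    failed = True
    print(error)  # list indices must be integers or slices, not float
```

If the last column is a quantity to be predicted rather than a class, this looks like a regression problem, and the classification setup used here (one output neuron per class, one hot expected values) would need to be adapted rather than just fed different data.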
  
  Reply
  - nicolas May 25, 2022 at 8:02 pm #
    
    James Thank you for everything
    
    Reply
nicolas May 19, 2022 at 10:47 pm #

Hi! James
Thank you for your time
This is the code that I used
And I changed the data but I’m having some errors in it
Could you please help me?

_______________________

from math import exp
from random import seed
from random import random

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of an neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(neuron['output'] - expected[j])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
for epoch in range(n_epoch):
sum_error = 0
for row in train:
outputs = forward_propagate(network, row)
expected = [0 for i in range(n_outputs)]
expected[row[-1]] = 1
sum_error += sum([(expected[i] – outputs[i]) ** 2 for i in range(len(expected))])
backward_propagate_error(network, expected)
update_weights(network, row, l_rate)
print(‘>epoch=%d, lrate=%.3f, error=%.3f’ % (epoch, l_rate, sum_error))

# Test training backprop algorithm
seed(1)
dataset = [[1539.64, 1006.43, 1549539.885],
[1537.79, 1004.97, 1545432.816],
[1535.63, 1003.84, 1541526.819],
[1533.79, 1002.87645, 1538201.87],
[1531.65, 1001.80229, 1534410.477],
[1530.26316, 1000.99, 1531778.121],
[1528.75778, 1000.46, 1529461.009],
[1527.07, 999.89813, 1526914.437],
[1525.76684, 999.40577, 1524860.184],
[1524.24165, 999.11715, 1522895.973],
[1523.03339, 998.80306, 1521210.41],
[1521.88455, 998.56537, 1519701.209],
[1520.41, 998.26825, 1517777.03],
[1519.46802, 998.13243, 1516630.307],
[1518.08149, 997.87776, 1514859.757],
[1516.89304, 997.7, 1513404.186],
[1515.94228, 997.6, 1512304.019],
[1514.99151, 997.48, 1511173.731],
[1514.15959, 997.32, 1510101.642],
[1513.24844, 997.1, 1508860.02],
[1512.32, 996.97637, 1507747.304]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)

—————————————————————————————————————————————–

Traceback (most recent call last):
  File "C:\Users\Coder\Downloads\MLP_v1_1\1.py", line 119, in <module>
    train_network(network, dataset, 0.5, 20, n_outputs)
  File "C:\Users\Coder\Downloads\MLP_v1_1\1.py", line 86, in train_network
    expected[row[-1]] = 1
TypeError: list indices must be integers or slices, not float
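
The traceback points at expected[row[-1]] = 1, which uses the last column of each row as a list index, so that column has to hold integer class labels. In the dataset above the last column is a float. A minimal sketch of one workaround, assuming each distinct value in the last column should be treated as a class; the helper name encode_labels and the small sample rows are hypothetical and not part of the listing:

```python
def encode_labels(dataset):
    """Replace the last column of each row with an integer class index.

    Returns the value-to-index lookup so predictions can be mapped back.
    """
    unique = sorted(set(row[-1] for row in dataset))
    lookup = {value: index for index, value in enumerate(unique)}
    for row in dataset:
        row[-1] = lookup[row[-1]]
    return lookup

# Made-up rows: two inputs followed by a float "label"
rows = [[2.7, 1.1, 10.5], [1.4, 2.3, 10.5], [3.3, 4.4, 20.0]]
lookup = encode_labels(rows)
n_outputs = len(lookup)

expected = [0 for _ in range(n_outputs)]
expected[rows[2][-1]] = 1  # the label is now an int, so indexing works
print(rows, expected)
```

With 21 distinct float targets, as in the listing above, this would create 21 classes with one example each, which suggests the problem is probably regression rather than classification.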

Reply
wafiq June 1, 2022 at 5:12 pm #

Is there any code for backpropagation regression? I can't find anything better than this page.

Reply
- Adrian Tam June 1, 2022 at 11:30 pm #
  
  Maybe you can take a look at: https://machinelearningmastery.com/application-of-differentiations-in-neural-networks/
  For regression, the activation at the last layer needs to be the linear function f(x) = x, so its derivative is f'(x) = 1. Just make this change to the code (either from this post or from the link above) and everything else should stay the same.
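
A minimal sketch of the change described in this reply, assuming the output layer swaps the sigmoid for a linear transfer while the hidden layers keep the sigmoid. The names transfer_linear, derivative_linear and output_delta are hypothetical and do not appear in the tutorial:

```python
from math import exp

def transfer_sigmoid(activation):
    # Sigmoid transfer, as used for the hidden layers
    return 1.0 / (1.0 + exp(-activation))

def derivative_sigmoid(output):
    return output * (1.0 - output)

def transfer_linear(activation):
    # Linear transfer f(x) = x for a regression output neuron
    return activation

def derivative_linear(output):
    # f'(x) = 1 regardless of the output value
    return 1.0

def output_delta(output, target, derivative):
    # Same form as the tutorial's delta: error times transfer derivative
    return (output - target) * derivative(output)

output = transfer_linear(2.5)
delta = output_delta(output, 3.0, derivative_linear)
print(output, delta)
```

For regression the inputs and target usually need to be scaled to a small range as well, otherwise the sigmoid hidden units saturate.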
  
  Reply
NOOR AMIRAH June 2, 2022 at 10:29 am #

Can I have the train data that uses conjugate gradient method (fletcher-reeves)?

Reply
Noor Amirah June 2, 2022 at 10:30 am #

Hi, can I have the train data that uses conjugate gradient method (fletcher-reeves)?

Reply
Noor Amirah June 2, 2022 at 10:32 am #

I meant, do you have code for training on a dataset that uses the Fletcher-Reeves method?

Reply
- James Carmichael June 3, 2022 at 9:14 am #
  
  Hi Noor…Did you try to implement the code listings that were provided in the tutorial?
  
  Reply
  - Noor Amirah June 8, 2022 at 2:26 am #
    
    Yes, I did, but my project is about backpropagation with Fletcher-Reeves, not stochastic. Do you have the code for that?
    
    Reply
Eduardo M August 2, 2022 at 6:57 am #

This line in the last for loop in the backpropagation function:

neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

I thought for the last layer (the first layer in the outer for loop), the delta is just actual – expected (what errors[j] is equal to). Isn’t the transfer_derivative term not supposed to be applied in this case ?

Reply
- James Carmichael August 2, 2022 at 9:02 am #
  
  Hi Eduardo…The following resource is another perspective that may help add clarity:
  
  https://pyimagesearch.com/2021/05/06/backpropagation-from-scratch-with-python/
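
Whether the transfer derivative belongs in the output delta depends on the loss function. A small numerical check, assuming a single sigmoid output neuron: with a squared-error loss the gradient with respect to the activation is (output - expected) * output * (1 - output), while with a cross-entropy loss it reduces to output - expected. The tutorial's training loop reports a summed squared error, so the first case is the one that matches it. The function names here are illustrative only:

```python
from math import exp, log

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))

def squared_error(a, t):
    # 0.5 * (output - expected)^2 for one sigmoid output
    return 0.5 * (sigmoid(a) - t) ** 2

def cross_entropy(a, t):
    # Binary cross-entropy for one sigmoid output
    o = sigmoid(a)
    return -(t * log(o) + (1.0 - t) * log(1.0 - o))

def numeric_gradient(loss, a, t, eps=1e-6):
    # Central finite difference of the loss w.r.t. the activation
    return (loss(a + eps, t) - loss(a - eps, t)) / (2.0 * eps)

a, t = 0.3, 1.0
o = sigmoid(a)
delta_squared = (o - t) * o * (1.0 - o)   # includes the transfer derivative
delta_cross_entropy = o - t               # derivative term cancels out

print(delta_squared, numeric_gradient(squared_error, a, t))
print(delta_cross_entropy, numeric_gradient(cross_entropy, a, t))
```

Each printed pair should agree to several decimal places, which is why both forms of the delta show up in different references.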
  
  Reply
Confused Coder August 5, 2022 at 3:33 am #

Why do you add an extra weight to the hidden and output layers?

Reply
- James Carmichael August 5, 2022 at 9:36 am #
  
  Hello…Please specify the code listing portion you are referring to so that we may better assist you.
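
If the question is about the n_inputs + 1 and n_hidden + 1 in initialize_network, the extra weight in each neuron acts as the bias: activate() starts from weights[-1] and then adds the weighted inputs. A minimal sketch with made-up weight values:

```python
def activate(weights, inputs):
    # The last weight is the bias; it has no matching input
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

weights = [0.5, -0.25, 0.1]  # two input weights plus one bias (made-up values)
with_inputs = activate(weights, [2.0, 4.0])
zero_inputs = activate(weights, [0.0, 0.0])
print(with_inputs, zero_inputs)
```

With all inputs at zero the activation is just the bias, so the bias lets a neuron produce a non-zero activation even when its inputs carry no signal.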
  
  Reply
Amir Vahedi August 8, 2022 at 6:45 pm #

Hi,
I have a question:
Why haven't you used any Python libraries such as NumPy and Pandas for this implementation?
Why haven't some of the nested loops been simplified with vectorization?
I bet doing so would make the implementation simpler and more efficient.
If you left these out on purpose, I am eager to know your reasons.

Anyway, this post helped me a lot to understand the implementation behind the neural network. Thank you!

Reply
- James Carmichael August 9, 2022 at 10:08 am #
  
  Hi Amir…This tutorial and others on our site that are “from scratch” are meant to show you how to code in Python without the libraries so that you may gain understanding and appreciation for libraries such as NumPy and Pandas. After gaining this knowledge you may gain more confidence in utilizing available libraries as opposed to writing the code from scratch.
  
  Reply
willow September 21, 2022 at 6:33 pm #

Hi, please correct me if I am wrong, but in this example the conventional gradient descent algorithm is used and not stochastic gradient descent, since in the training loop for each training sample the weights are being updated.

Reply
- James Carmichael September 22, 2022 at 5:25 am #
  
  Hi Willow…You may find the following of interest:
  
  https://machinelearningmastery.com/difference-between-backpropagation-and-stochastic-gradient-descent/
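
For reference, updating the weights after every training row, as train_network does by calling update_weights inside the loop over rows, is usually called online or stochastic gradient descent. Batch gradient descent accumulates the gradient over all rows and updates once per epoch. A minimal sketch of the difference on a one-weight linear model y = w * x; the model, data and function names are made up for illustration:

```python
def online_epoch(w, data, l_rate):
    """Update the weight after every row (per-sample updates)."""
    for x, y in data:
        gradient = (w * x - y) * x
        w -= l_rate * gradient
    return w

def batch_epoch(w, data, l_rate):
    """Accumulate the gradient over all rows, then update once."""
    total = 0.0
    for x, y in data:
        total += (w * x - y) * x
    return w - l_rate * total / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated from y = 2 * x
w_online = 0.0
w_batch = 0.0
for _ in range(50):
    w_online = online_epoch(w_online, data, 0.05)
    w_batch = batch_epoch(w_batch, data, 0.05)
print(w_online, w_batch)
```

Both variants approach w = 2 here; they differ in how many updates happen per pass over the data.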
  
  Reply
Efemena January 13, 2023 at 11:46 pm #

Hi James. Thanks for the tutorial, and I'm even more grateful for your responses. You have been amazing all these years. I am currently working on a project to predict long-term future electricity load demand, and I intend to use a BPNN for the forecasting. How do I write code that uses only previously available data to predict load demand, say, 7 years ahead?

Reply
Efemena January 13, 2023 at 11:48 pm #

The available data is monthly load demand and some economic factors. I am interested in predicting load demand 7 years into the future, using historical monthly load demand and those factors as inputs to the neural network.

Reply
- James Carmichael January 14, 2023 at 8:11 am #
  
  Hi Efemena…The following resource is a great starting point:
  
  https://machinelearningmastery.com/how-to-develop-lstm-models-for-multi-step-time-series-forecasting-of-household-power-consumption/
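
One common way to use only past values is to reframe the series as supervised rows: a window of lagged observations as inputs and a value some steps ahead as the target. A minimal sketch, assuming a univariate monthly series; the function name make_windows and the numbers are made up for illustration:

```python
def make_windows(series, n_lags, horizon):
    """Turn a series into rows of [lag_1, ..., lag_n, target].

    The target is the value `horizon` steps after the last lag.
    """
    rows = []
    for i in range(len(series) - n_lags - horizon + 1):
        inputs = series[i:i + n_lags]
        target = series[i + n_lags + horizon - 1]
        rows.append(inputs + [target])
    return rows

monthly_load = [10, 12, 13, 15, 14, 16, 18, 17]  # made-up values
rows = make_windows(monthly_load, n_lags=3, horizon=2)
for row in rows:
    print(row)
```

With monthly data, a 7-year horizon corresponds to 84 steps, so the history has to be much longer than the horizon before any training rows exist.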
  
  Reply
Hapsoro March 21, 2023 at 10:33 am #

Hi James. Thanks for the tutorial, I really appreciate it. One thing confuses me, though: can you show how to handle an output of 3 neurons, especially the n_outputs part?

n_outputs = len([row[-1] for row in dataset])

Here is my dataset:

[[0.21, 0.34, 0.65, 0, 0, 1],
[0.55, 0.67, 0.19, 0, 1, 0],
[0.77, 0.20, 0.31, 1, 0, 0]]

Thanks in advance.

Reply
- James Carmichael March 22, 2023 at 10:03 am #
  
  Hi Hapsoro…Thank you for your feedback! I am trying to understand your question. Did you execute your code? If so, what were your results?
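
A minimal sketch for a dataset like the one in the question, assuming the last three columns are already a one-hot target. The training code indexes expected with a single integer label taken from row[-1], so one option is to collapse the one-hot columns into a class index first. The helper names split_row and to_class_index are hypothetical:

```python
dataset = [[0.21, 0.34, 0.65, 0, 0, 1],
           [0.55, 0.67, 0.19, 0, 1, 0],
           [0.77, 0.20, 0.31, 1, 0, 0]]
n_target_columns = 3  # assumption: the last three columns are one-hot

def split_row(row, n_target_columns):
    # Separate the input features from the one-hot target
    inputs = row[:-n_target_columns]
    expected = row[-n_target_columns:]
    return inputs, expected

def to_class_index(row, n_target_columns):
    # Replace the one-hot target with the position of the 1
    inputs, expected = split_row(row, n_target_columns)
    return inputs + [expected.index(1)]

converted = [to_class_index(row, n_target_columns) for row in dataset]
n_inputs = len(converted[0]) - 1
n_outputs = len(set(row[-1] for row in converted))
print(converted, n_inputs, n_outputs)
```

Note that len([row[-1] for row in dataset]) without set() counts rows rather than classes; it only happens to give 3 here because the dataset has three rows.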
  
  Reply
Kentaro March 29, 2023 at 7:38 am #

Please add indentation for things besides just the functions, as the lack of indentation makes the code very hard to read.

Reply
- James Carmichael March 30, 2023 at 7:10 am #
  
  Thank you for your feedback and suggestions Kentaro!
  
  Reply
Bhaskar September 21, 2023 at 7:42 pm #

Hi
I need working code for a feed-forward neural network in R.
Please help me.

Reply
- James Carmichael September 22, 2023 at 9:20 am #
  
  Hi Bhaskar…The following resource is a great starting point:
  
  https://scientistcafe.com/ids/r/ch12dnn
  
  Reply
Jim October 27, 2023 at 4:21 am #

I get the following error:

derivative = output * (1 - output)
~~^~~~~~~~
TypeError: unsupported operand type(s) for -: 'int' and 'list'

How do I make the parameters of the same type?

Thanks.

Reply
Jim October 27, 2023 at 4:41 am #

Same with this code:

error = (output - expected) * transfer_derivative(output)

TypeError: unsupported operand type(s) for -: 'list' and 'list'

Thanks.

Reply
- James Carmichael October 27, 2023 at 9:27 am #
  
  Hi Jim…The following discussion may be of interest to you:
  
  https://stackoverflow.com/questions/26685679/typeerror-unsupported-operand-types-for-list-and-list
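
Both errors suggest that output and expected are whole lists, while the arithmetic in transfer_derivative works on one number at a time. A minimal sketch of applying it element by element, assuming two output neurons with made-up values:

```python
def transfer_derivative(output):
    # Expects a single float, not a list
    return output * (1.0 - output)

outputs = [0.5, 0.8]   # one value per output neuron (made-up)
expected = [0.0, 1.0]

# Pair each output with its expected value and compute one error each
errors = [(o - e) * transfer_derivative(o) for o, e in zip(outputs, expected)]
print(errors)
```

In the tutorial code the same effect comes from looping over the neurons and reading neuron['output'] for each one.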
  
  Reply
Michael Roy Ames October 29, 2023 at 10:54 am #

Thank you very much for this well-written tutorial, Jason. I quite enjoyed figuring it all out, though it took me a couple of weeks to get up to speed on the terminology and make it happen.

After completing the basic assignment, I updated the code and tried:
a) different seeds, learning rates, and epochs
b) additional transfer functions: tanh, and gaussian
c) multiple hidden layers
d) multiple hidden layers of different sizes (numbers of neurons)

QUESTION:
One thing that got me stuck was the lack of a good visualization tool for viewing the network of layers as they are initialized and trained. I coded a primitive one to troubleshoot and improve my understanding, but there must be something better out there… any suggestions?

Now I am looking forward to reading more of your (many!) books – and learning as much as I can.

Thanks again.

Reply
- James Carmichael October 30, 2023 at 8:04 am #
  
  Hi Michael…You are very welcome! The following resource may be of interest to you:
  
  https://www.analyticsvidhya.com/blog/2022/03/visualize-deep-learning-models-using-visualkeras/
  
  Reply
givonz November 8, 2023 at 5:55 am #

You didn’t indent your code. This leaves room for lots of logic errors. Where exactly do your for loops end?

Reply
givonz November 8, 2023 at 5:58 am #

You should add a note that readers need to toggle to the code view to get the properly formatted Python code.

Reply
- James Carmichael November 8, 2023 at 10:01 am #
  
  Thank you for your feedback givonz!
  
  Reply
Matthew December 1, 2023 at 12:20 pm #

Should the 500 epochs be repeating 5 times?

Reply
- James Carmichael December 2, 2023 at 11:32 am #
  
  Hi Matthew…The following resource may be of interest to you:
  
  https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/
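
If the run in question evaluates the network with 5-fold cross-validation, then the full set of training epochs repeating five times is expected: one network is trained from scratch for each held-out fold. A minimal sketch of that loop structure, assuming 5 folds and a toy dataset; the training call itself is left as a comment:

```python
from random import randrange, seed

def cross_validation_split(dataset, n_folds):
    """Split a dataset into n_folds random folds of equal size."""
    remaining = list(dataset)
    fold_size = len(dataset) // n_folds
    folds = []
    for _ in range(n_folds):
        fold = [remaining.pop(randrange(len(remaining))) for _ in range(fold_size)]
        folds.append(fold)
    return folds

seed(1)
dataset = [[i] for i in range(10)]  # toy rows for illustration
folds = cross_validation_split(dataset, 5)

training_runs = 0
for held_out in folds:
    # Train on every fold except the one held out for testing
    train = [row for fold in folds if fold is not held_out for row in fold]
    training_runs += 1  # a full n_epoch training loop would run here
print(training_runs, len(train))
```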
  
  Reply
