How To Implement The Perceptron Algorithm From Scratch In Python

By Jason Brownlee on August 13, 2019 in Code Algorithms From Scratch

The Perceptron algorithm is the simplest type of artificial neural network.

It is a model of a single neuron that can be used for two-class classification problems and provides the foundation for later developing much larger networks.

In this tutorial, you will discover how to implement the Perceptron algorithm from scratch with Python.

After completing this tutorial, you will know:

How to train the network weights for the Perceptron.
How to make predictions with the Perceptron.
How to implement the Perceptron algorithm for a real-world classification problem.

Kick-start your project with my new book Machine Learning Algorithms From Scratch, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update Jan/2017: Changed the calculation of fold_size in cross_validation_split() to always be an integer. Fixes issues with Python 3.
Update Aug/2018: Tested and updated to work with Python 3.6.

Photo by Les Haines, some rights reserved.

Description

This section provides a brief introduction to the Perceptron algorithm and the Sonar dataset to which we will later apply it.

Perceptron Algorithm

The Perceptron is inspired by the information processing of a single neural cell called a neuron.

A neuron accepts input signals via its dendrites, which pass the electrical signal down to the cell body.

In a similar way, the Perceptron receives input signals from examples of training data that we weight and combine in a linear equation called the activation.

activation = sum(weight_i * x_i) + bias

The activation is then transformed into an output value or prediction using a transfer function, such as the step transfer function.

prediction = 1.0 if activation >= 0.0 else 0.0

In this way, the Perceptron is a classification algorithm for problems with two classes (0 and 1) where a linear equation (like a line or hyperplane) can be used to separate the two classes.
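Here is a minimal sketch of these two equations as two small helpers. The helper names activation() and step() are illustrative only. The weights are the values used with the small contrived dataset later in this tutorial. A point on each side of the line defined by the weights falls into a different class:

```python
# Activation: weighted sum of the inputs plus the bias
def activation(weights, bias, inputs):
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Step transfer function: 1.0 if the activation is >= 0.0, else 0.0
def step(a):
    return 1.0 if a >= 0.0 else 0.0

weights = [0.20653640140000007, -0.23418117710000003]  # w1, w2
bias = -0.1

# One point from each class of the contrived dataset
print(step(activation(weights, bias, [2.7810836, 2.550537003])))   # 0.0
print(step(activation(weights, bias, [7.627531214, 2.759262235])))  # 1.0
```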

It is closely related to linear regression and logistic regression, which make predictions in a similar way (e.g. a weighted sum of inputs).

The weights of the Perceptron algorithm are estimated from your training data using stochastic gradient descent.

Stochastic Gradient Descent

Gradient Descent is the process of minimizing a function by following the gradients of the cost function.

This involves knowing the form of the cost as well as the derivative so that from a given point you know the gradient and can move in that direction, e.g. downhill towards the minimum value.
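Here is a minimal sketch of plain gradient descent that makes "following the gradient downhill" concrete. It minimizes a one-variable cost, f(w) = (w - 3)^2, whose derivative is 2 * (w - 3). The cost function, starting point, step size and number of steps are arbitrary choices for illustration and are not part of the Perceptron itself:

```python
# Cost function with its minimum at w = 3
def cost(w):
    return (w - 3.0) ** 2

# Derivative (gradient) of the cost with respect to w
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting point
learning_rate = 0.1  # arbitrary step size
for _ in range(50):
    # Move against the gradient, i.e. downhill on the cost
    w = w - learning_rate * gradient(w)

print(w, cost(w))
```

Each step moves w a fixed fraction of the remaining distance toward 3, the minimum of the cost. After 50 steps, w ends up very close to 3.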

In machine learning, we can use a technique called stochastic gradient descent, which evaluates and updates the weights every iteration, to minimize the error of a model on our training data.

The way this optimization algorithm works is that each training instance is shown to the model one at a time. The model makes a prediction for a training instance, the error is calculated and the model is updated in order to reduce the error for the next prediction.

This procedure can be used to find the set of weights in a model that result in the smallest error for the model on the training data.

For the Perceptron algorithm, at each iteration the weights (w) are updated using the equation:

w = w + learning_rate * (expected - predicted) * x

Where w is the weight being optimized, learning_rate is a learning rate that you must configure (e.g. 0.01), (expected - predicted) is the prediction error for the model on the training data attributed to the weight, and x is the input value.
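Here is a worked example of a single update. It assumes a learning rate of 0.1, a current weight of 0.0, and the input value 7.627531214 from the contrived dataset used later in this tutorial. For that input the expected class is 1 but the model predicted 0:

```python
learning_rate = 0.1
w = 0.0          # current weight
x = 7.627531214  # input value paired with this weight
expected, predicted = 1.0, 0.0

error = expected - predicted       # prediction error: 1.0
w = w + learning_rate * error * x  # the update rule above
print(w)  # approximately 0.7627531214
```

When the prediction is correct, (expected - predicted) is 0 and the weight is left unchanged. Only mistakes move the weights.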

Sonar Dataset

The dataset we will use in this tutorial is the Sonar dataset.

This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

It is a well-understood dataset. All of the variables are continuous and generally in the range of 0 to 1. As such, we will not have to normalize the input data, which is often a good practice with the Perceptron algorithm. The output variable is a string “M” for mine and “R” for rock, which will need to be converted to the integers 0 and 1.
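The Sonar inputs already sit roughly in the range 0 to 1, so this tutorial does not rescale them. Some datasets have inputs on very different scales. For those, a minimal min-max normalization sketch might look like the following. normalize_columns() is a hypothetical helper, not part of this tutorial's code. It assumes the class value is in the last column and leaves that column untouched:

```python
# Rescale every input column (all but the last) to the range [0, 1], in place
def normalize_columns(dataset):
    n_inputs = len(dataset[0]) - 1
    for i in range(n_inputs):
        column = [row[i] for row in dataset]
        lo, hi = min(column), max(column)
        span = hi - lo
        for row in dataset:
            # Guard against a constant column (span of zero)
            row[i] = (row[i] - lo) / span if span > 0 else 0.0
    return dataset

data = [[2.0, 10.0, 0], [4.0, 30.0, 1], [6.0, 20.0, 1]]
normalize_columns(data)
print(data)  # [[0.0, 0.0, 0], [0.5, 1.0, 1], [1.0, 0.5, 1]]
```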

By predicting the class with the most observations in the dataset (M or mines), the Zero Rule Algorithm can achieve an accuracy of 53%.
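The 53% figure is simply the share of the majority class. The sketch below assumes the class counts given in the UCI description of the dataset: 111 mine returns and 97 rock returns, 208 in total. zero_rule_accuracy() is an illustrative helper, not part of this tutorial's code:

```python
from collections import Counter

# Accuracy (in percent) of always predicting the most frequent class
def zero_rule_accuracy(labels):
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / float(len(labels)) * 100.0

labels = ['M'] * 111 + ['R'] * 97  # class counts from the UCI description
print('%.1f%%' % zero_rule_accuracy(labels))  # 53.4%
```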

You can learn more about this dataset at the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.all-data.csv.

Tutorial

This tutorial is broken down into 3 parts:

Making Predictions.
Training Network Weights.
Modeling the Sonar Dataset.

These steps will give you the foundation to implement and apply the Perceptron algorithm to your own classification predictive modeling problems.

1. Making Predictions

The first step is to develop a function that can make predictions.

This will be needed both in the evaluation of candidate weight values in stochastic gradient descent, and after the model is finalized and we wish to start making predictions on test data or new data.

Below is a function named predict() that predicts an output value for a row given a set of weights.

The first weight is always the bias as it is standalone and not responsible for a specific input value.

# Make a prediction with weights
def predict(row, weights):
	activation = weights[0]
	for i in range(len(row)-1):
		activation += weights[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0
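The bias is stored at weights[0], so the same activation can also be computed as a plain dot product by prepending a constant input of 1.0 to the row. The sketch below checks that this formulation agrees with the loop in predict(). predict_dot() is a hypothetical name. Like predict(), it ignores the last element of the row, which holds the class value:

```python
# Make a prediction with weights (as defined in this tutorial)
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Same prediction: prepend 1.0 so the bias is multiplied like any other weight
def predict_dot(row, weights):
    inputs = [1.0] + row[:-1]  # drop the class value, add the bias input
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if activation >= 0.0 else 0.0

weights = [-0.1, 0.20653640140000007, -0.23418117710000003]
rows = [[2.7810836, 2.550537003, 0], [7.627531214, 2.759262235, 1]]
for row in rows:
    print(predict(row, weights), predict_dot(row, weights))
```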

We can contrive a small dataset to test our prediction function.

X1			X2			Y
2.7810836		2.550537003		0
1.465489372		2.362125076		0
3.396561688		4.400293529		0
1.38807019		1.850220317		0
3.06407232		3.005305973		0
7.627531214		2.759262235		1
5.332441248		2.088626775		1
6.922596716		1.77106367		1
8.675418651		-0.242068655		1
7.673756466		3.508563011		1

We can also use previously prepared weights to make predictions for this dataset.

Putting this all together we can test our predict() function below.

# Make a prediction with weights
def predict(row, weights):
	activation = weights[0]
	for i in range(len(row)-1):
		activation += weights[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0

# test predictions
dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]
weights = [-0.1, 0.20653640140000007, -0.23418117710000003]
for row in dataset:
	prediction = predict(row, weights)
	print("Expected=%d, Predicted=%d" % (row[-1], prediction))

There are two input values (X1 and X2) and three weight values (bias, w1 and w2). The activation equation we have modeled for this problem is:

activation = (w1 * X1) + (w2 * X2) + bias

Or, with the specific weight values used above:

activation = (0.206 * X1) + (-0.234 * X2) + -0.1

Running this example, we get predictions that match the expected output (Y) values.

Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=0, Predicted=0
Expected=1, Predicted=1
Expected=1, Predicted=1
Expected=1, Predicted=1
Expected=1, Predicted=1
Expected=1, Predicted=1
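To see where one of these predictions comes from, here is the activation for the first row (X1 = 2.7810836, X2 = 2.550537003, Y = 0) worked through with the weights above. This is illustrative arithmetic only:

```python
bias, w1, w2 = -0.1, 0.20653640140000007, -0.23418117710000003
x1, x2 = 2.7810836, 2.550537003  # first row of the contrived dataset

activation = (w1 * x1) + (w2 * x2) + bias
prediction = 1.0 if activation >= 0.0 else 0.0
print(round(activation, 4))  # -0.1229
print(prediction)            # 0.0, matching Expected=0
```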

Now we are ready to implement stochastic gradient descent to optimize our weight values.

2. Training Network Weights

We can estimate the weight values for our training data using stochastic gradient descent.

Stochastic gradient descent requires two parameters:

Learning Rate: Used to limit the amount each weight is corrected each time it is updated.
Epochs: The number of times to run through the training data while updating the weights.

These, along with the training data, will be the arguments to the function.

There are 3 loops we need to perform in the function:

Loop over each epoch.
Loop over each row in the training data for an epoch.
Loop over each weight and update it for a row in an epoch.

As you can see, we update each weight for each row in the training data, each epoch.

Weights are updated based on the error the model made. The error is calculated as the difference between the expected output value and the prediction made with the candidate weights.

There is one weight for each input attribute, and these are updated in a consistent way, for example:

w(t+1) = w(t) + learning_rate * (expected(t) - predicted(t)) * x(t)

The bias is updated in a similar way, except without an input as it is not associated with a specific input value:

bias(t+1) = bias(t) + learning_rate * (expected(t) - predicted(t))

Now we can put all of this together. Below is a function named train_weights() that calculates weight values for a training dataset using stochastic gradient descent.

# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
	weights = [0.0 for i in range(len(train[0]))]
	for epoch in range(n_epoch):
		sum_error = 0.0
		for row in train:
			prediction = predict(row, weights)
			error = row[-1] - prediction
			sum_error += error**2
			weights[0] = weights[0] + l_rate * error
			for i in range(len(row)-1):
				weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
		print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
	return weights

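You can see that we also keep track of the sum of the squared error (a non-negative value) each epoch so that we can print out a nice message each outer loop. Because the error for each row is always -1, 0 or 1, this sum is simply the number of rows misclassified during that epoch.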

We can test this function on the same small contrived dataset from above.

# Make a prediction with weights
def predict(row, weights):
	activation = weights[0]
	for i in range(len(row)-1):
		activation += weights[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0

# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
	weights = [0.0 for i in range(len(train[0]))]
	for epoch in range(n_epoch):
		sum_error = 0.0
		for row in train:
			prediction = predict(row, weights)
			error = row[-1] - prediction
			sum_error += error**2
			weights[0] = weights[0] + l_rate * error
			for i in range(len(row)-1):
				weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
		print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
	return weights

# Calculate weights
dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]
l_rate = 0.1
n_epoch = 5
weights = train_weights(dataset, l_rate, n_epoch)
print(weights)

We use a learning rate of 0.1 and train the model for only 5 epochs, or 5 exposures of the weights to the entire training dataset.

Running the example prints a message each epoch with the sum squared error for that epoch and the final set of weights.

>epoch=0, lrate=0.100, error=2.000
>epoch=1, lrate=0.100, error=1.000
>epoch=2, lrate=0.100, error=0.000
>epoch=3, lrate=0.100, error=0.000
>epoch=4, lrate=0.100, error=0.000
[-0.1, 0.20653640140000007, -0.23418117710000003]

You can see how the problem is learned very quickly by the algorithm.
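An epoch that finishes with zero error changed no weights, so the remaining epochs cannot change anything either. Training could therefore stop at that point and report how many epochs it used. The sketch below does this. train_weights_early_stop() is a hypothetical variant, not part of the tutorial's code:

```python
# Make a prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Like train_weights(), but stop after the first epoch with no mistakes
def train_weights_early_stop(train, l_rate, max_epoch):
    weights = [0.0 for _ in range(len(train[0]))]
    for epoch in range(max_epoch):
        sum_error = 0.0
        for row in train:
            error = row[-1] - predict(row, weights)
            sum_error += error**2
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        if sum_error == 0.0:
            # No weight changed this epoch, so later epochs would be identical
            return weights, epoch + 1
    return weights, max_epoch

dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]
weights, epochs_run = train_weights_early_stop(dataset, 0.1, 100)
print(epochs_run, weights)
```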

Now, let’s apply this algorithm to a real dataset.

3. Modeling the Sonar Dataset

In this section, we will train a Perceptron model using stochastic gradient descent on the Sonar dataset.

The example assumes that a CSV copy of the dataset is in the current working directory with the file name sonar.all-data.csv.

The dataset is first loaded, the string values are converted to numeric values, and the output column is converted from strings to the integer values 0 and 1. This is achieved with the helper functions load_csv(), str_column_to_float() and str_column_to_int().
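One detail worth knowing about str_column_to_int() is that it numbers the classes in whatever order a set yields them. In Python 3, the iteration order of a set of strings can differ between runs because string hashing is randomized by default. Which of M and R becomes 0 is therefore not guaranteed to be repeatable. The sketch below is a deterministic variant that sorts the unique values first. str_column_to_int_sorted() is a hypothetical name:

```python
# Convert string column to integer, sorting class names so the mapping is repeatable
def str_column_to_int_sorted(dataset, column):
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(unique)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

rows = [[0.02, 'R'], [0.45, 'M'], [0.13, 'R']]
lookup = str_column_to_int_sorted(rows, 1)
print(lookup)  # {'M': 0, 'R': 1}
print(rows)    # [[0.02, 1], [0.45, 0], [0.13, 1]]
```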

We will use k-fold cross validation to estimate the performance of the learned model on unseen data. This means that we will construct and evaluate k models and estimate the performance as the mean model accuracy. Classification accuracy will be used to evaluate each model. These behaviors are provided in the cross_validation_split(), accuracy_metric() and evaluate_algorithm() helper functions.

We will use the predict() and train_weights() functions created above to train the model and a new perceptron() function to tie them together.

Below is the complete example.

# Perceptron Algorithm on the Sonar Dataset
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
	dataset = list()
	with open(filename, 'r') as file:
		csv_reader = reader(file)
		for row in csv_reader:
			if not row:
				continue
			dataset.append(row)
	return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
	for row in dataset:
		row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
	class_values = [row[column] for row in dataset]
	unique = set(class_values)
	lookup = dict()
	for i, value in enumerate(unique):
		lookup[value] = i
	for row in dataset:
		row[column] = lookup[row[column]]
	return lookup

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
	dataset_split = list()
	dataset_copy = list(dataset)
	fold_size = int(len(dataset) / n_folds)
	for i in range(n_folds):
		fold = list()
		while len(fold) < fold_size:
			index = randrange(len(dataset_copy))
			fold.append(dataset_copy.pop(index))
		dataset_split.append(fold)
	return dataset_split

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		test_set = list()
		for row in fold:
			row_copy = list(row)
			test_set.append(row_copy)
			row_copy[-1] = None
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores

# Make a prediction with weights
def predict(row, weights):
	activation = weights[0]
	for i in range(len(row)-1):
		activation += weights[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0

# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
	weights = [0.0 for i in range(len(train[0]))]
	for epoch in range(n_epoch):
		for row in train:
			prediction = predict(row, weights)
			error = row[-1] - prediction
			weights[0] = weights[0] + l_rate * error
			for i in range(len(row)-1):
				weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
	return weights

# Perceptron Algorithm With Stochastic Gradient Descent
def perceptron(train, test, l_rate, n_epoch):
	predictions = list()
	weights = train_weights(train, l_rate, n_epoch)
	for row in test:
		prediction = predict(row, weights)
		predictions.append(prediction)
	return(predictions)

# Test the Perceptron algorithm on the sonar dataset
seed(1)
# load and prepare data
filename = 'sonar.all-data.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
	str_column_to_float(dataset, i)
# convert string class to integers
str_column_to_int(dataset, len(dataset[0])-1)
# evaluate algorithm
n_folds = 3
l_rate = 0.01
n_epoch = 500
scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

A k value of 3 was used for cross-validation. Since 208/3 = 69.3 and cross_validation_split() truncates the fold size to an integer, each fold holds 69 records to be evaluated upon each iteration (one record is left unused). A learning rate of 0.01 and 500 training epochs were chosen with a little experimentation.

You can try your own configurations and see if you can beat my score.

Running this example prints the scores for each of the 3 cross-validation folds then prints the mean classification accuracy.

We can see that the mean accuracy is about 73%, higher than the baseline of just over 50% (53%) achieved by predicting only the majority class with the Zero Rule Algorithm.

Scores: [76.81159420289855, 69.56521739130434, 72.46376811594203]
Mean Accuracy: 72.947%
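Each fold holds 69 rows (int(208 / 3) = 69), so every score is the percentage of correct predictions out of 69. The reported scores therefore correspond to 53, 48 and 50 correct predictions. The check below reproduces the scores and their mean. The per-fold counts are inferred from the scores above, not taken from the code:

```python
# Correct predictions per fold, inferred from the reported scores
correct = [53, 48, 50]
fold_size = 69  # int(208 / 3)

# Same calculation as accuracy_metric(): correct / total * 100
scores = [c / float(fold_size) * 100.0 for c in correct]
mean = sum(scores) / float(len(scores))
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % mean)  # Mean Accuracy: 72.947%
```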

Extensions

This section lists extensions to this tutorial that you may wish to consider exploring.

Tune The Example. Tune the learning rate, number of epochs and even data preparation method to get an improved score on the dataset.
Batch Stochastic Gradient Descent. Change the stochastic gradient descent algorithm to accumulate updates across each epoch and only update the weights in a batch at the end of the epoch.
Additional Classification Problems. Apply the technique to other classification problems on the UCI machine learning repository.
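Here is a minimal sketch to start the batch extension. It assumes the same row layout as above, with inputs first and the class value last. train_weights_batch() is a hypothetical name. It predicts every row with the weights from the start of the epoch, sums the updates, and applies them once at the end of the epoch. The example runs it on the small contrived dataset from earlier in the tutorial:

```python
# Make a prediction with weights
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

# Accumulate Perceptron updates over an epoch and apply them once per epoch
def train_weights_batch(train, l_rate, n_epoch):
    weights = [0.0 for _ in range(len(train[0]))]
    for epoch in range(n_epoch):
        updates = [0.0 for _ in weights]
        for row in train:
            # Every row is predicted with the weights from the start of the epoch
            error = row[-1] - predict(row, weights)
            updates[0] += l_rate * error
            for i in range(len(row)-1):
                updates[i + 1] += l_rate * error * row[i]
        # Apply the summed updates in one batch
        weights = [w + u for w, u in zip(weights, updates)]
    return weights

dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]
weights = train_weights_batch(dataset, 0.1, 10)
print([predict(row, weights) for row in dataset])
```

Whether the batch version converges faster or slower than the per-row version depends on the data and the learning rate. It is worth comparing both on the Sonar dataset.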

Did you explore any of these extensions?
Let me know about it in the comments below.

Review

In this tutorial, you discovered how to implement the Perceptron algorithm using stochastic gradient descent from scratch with Python.

You learned:

How to make predictions for a binary classification problem.
How to optimize a set of weights using stochastic gradient descent.
How to apply the technique to a real classification predictive modeling problem.

Do you have any questions?
Ask your question in the comments below and I will do my best to answer.

170 Responses to How To Implement The Perceptron Algorithm From Scratch In Python

Philip Brierley November 2, 2016 at 7:07 am

There is a derivation of the backprop learning rule at http://www.philbrierley.com/code.html and also similar code in a bunch of other languages from Fortran to c to php.

With help we did get it working in Python, with some nice plots that show the learning proceeding.

https://github.com/gavrol/NeuralNets

- Jason Brownlee November 2, 2016 at 9:10 am
  
  Thanks for sharing Philip.
  
  - Misge November 4, 2016 at 3:44 pm
    
    Sorry to bother you but I want to understand what’s wrong in using your code? I think you also used someone else’s code right? At least you read and reimplemented it. I hope my question will not offend you.
    
    - Jason Brownlee November 5, 2016 at 7:28 am
      
      I wrote the code from scratch myself.
      
      The code works, what problem are you having exactly?
      
    - Mohsin July 10, 2019 at 1:40 am
      
      Thanks it worked for me with python3.7
      
      - Jason Brownlee July 10, 2019 at 8:14 am
        
        Nice work.
    - anjali September 12, 2019 at 8:17 pm
      
      nice
      
- hinna September 15, 2018 at 10:30 pm
  
  sir I used ,
  dataset=[[1,1,6,1],
  [1,7,2,1],
  [1,8,9,1],
  [1,9,9,1],
  [1,4,8,1],
  [1,8,5,1],
  [1,2,1,0],
  [1,3,3,0],
  [1,2,4,0],
  [1,7,1,0],
  [1,1,3,0],
  [1,5,2,1]
  ]
  this dataset and code was:
  # Make a prediction with weights
  def predict(row, weights):
  activation = weights[0]
  for i in range(len(row)-2):
  activation += weights[i + 1] * row[i+1]
  return 1.0 if activation >= 0.0 else 0.0
  
  # Estimate Perceptron weights using stochastic gradient descent
  
  def train_weights(train, l_rate, n_epoch):
  weights = [0.0 for i in range(len(train[0]))]
  for epoch in range(n_epoch):
  print(“Epoch no “,epoch)
  for row in train:
  print(“\n\nrow is “,row)
  print(weights)
  prediction = predict(row, weights)
  error = row[-1] – prediction
  weights[0] = weights[0] + l_rate * error
  for i in range(len(row)-2):
  weights[i + 1] = weights[i + 1] + l_rate * error * row[i+1]
  return weights
  
  # Perceptron Algorithm With Stochastic Gradient Descent
  def perceptron(train,l_rate, n_epoch):
  predictions = list()
  weights = train_weights(train, l_rate, n_epoch)
  for row in train:
  prediction = predict(row, weights)
  predictions.append(prediction)
  return(predictions)
  
  p=perceptron(dataset,l_rate,n_epoch)
  print(p)
  but output m getting is biased for the last entry of my dataset…so code not working well on this dataset 🙁
  
  - Jason Brownlee September 16, 2018 at 6:01 am
    
    Perhaps use Keras instead, this code is for learning how perceptron works rather than for solving problems.
    
Andre Logunov November 3, 2016 at 12:06 pm

Hi, Jason!

A very informative web-site you’ve got! I’m thinking of making a compilation of ML materials including yours. I wonder if I could use your wonderful tutorials in a book on ML in Russian provided of course your name will be mentioned? It’s just a thought so far.

- Jason Brownlee November 4, 2016 at 9:03 am
  
  No Andre, please do not use my materials in your book.
  
Stefan November 4, 2016 at 3:12 am

Thanks for the interesting lesson. I’m reviewing the code now but I’m confused, where are the train and test values in the perceptron function coming from? I can’t find their origin.

- Stefan Lop November 4, 2016 at 6:38 am
  
  I’m also receiving a ValueError(“empty range for randrange()”) error, the script seems to loop through a couple of randranges in the cross_validation_split function before erroring, not sure why. Was the script you posted supposed to work out of the box? Because I cannot get it to work and have been using the exact same data set you are working with.
  
  - Jason Brownlee November 4, 2016 at 11:15 am
    
    Hi Stefan, sorry to hear that you are having problems.
    
    Yes, the script works out of the box on Python 2.7.
    
    Perhaps there was a copy-paste error?
    Perhaps you are on a different platform like Python 3 and the script needs to be modified slightly?
    
    Are you able to share more details?
    
    - Stefan November 5, 2016 at 12:15 am
      
      Was running Python 3, works fine in 2 haha thanks!
      
      - Jason Brownlee November 5, 2016 at 7:28 am
        
        Glad to hear it.
  - Jason Brownlee January 3, 2017 at 9:52 am
    
    I have updated the cross_validation_split() function in the above example to address issues with Python 3.
    
- Jason Brownlee November 4, 2016 at 9:13 am
  
  In the full example, the code is not using train/test but instead k-fold cross validation, which is like multiple train/test evaluations.
  
  Learn more about the test harness here:
  https://machinelearningmastery.com/create-algorithm-test-harness-scratch-python/
  
  - Stefan November 5, 2016 at 12:22 am
    
    But the train and test arguments in the perceptron function must be populated by something, where is it? I can’t find anything that would pass a value to those train and test arguments.
    
    - Jason Brownlee November 5, 2016 at 7:31 am
      
      Hi Stefan,
      
      The train and test arguments come from the call in evaluate_algorithm to algorithm() on line 67.
      
      Algorithm is a parameter which is passed in on line 114 as the perceptron() function.
      
      So, this means that each loop on line 58 that the train and test lists of observations come from the prepared cross-validation folds.
      
      To deeply understand this test harness code see the blog post dedicated to it here:
      https://machinelearningmastery.com/create-algorithm-test-harness-scratch-python/
      
      - Stefan November 8, 2016 at 1:42 am
        
        Oh boy, big time brain fart on my end I see it now. Thanks so much for your help, I’m really enjoying all of the tutorials you have provided so far.
      - Jason Brownlee November 8, 2016 at 9:54 am
        
        I’m glad to hear you made some progress Stefan.
Amita misra November 12, 2016 at 6:34 pm

Thanks for such a simple and basic introductory tutorial for deep learning. I had been trying to find something for months but it was all theano and tensor flow and left me intimidated. This is really a good place for a beginner like me.

- Jason Brownlee November 14, 2016 at 7:30 am
  
  I’m glad to hear that Amita.
  
vedhavyas November 20, 2016 at 10:42 pm

Hi Jason,

Implemented in Golang. Here are my results

Id 2, predicted 53, total 70, accuracy 75.71428571428571
Id 1, predicted 53, total 69, accuracy 76.81159420289855
Id 0, predicted 52, total 69, accuracy 75.36231884057972
mean accuracy 75.96273291925466

no. of folds: 3
learningRate: 0.01
epochs: 500

- Jason Brownlee November 22, 2016 at 6:48 am
  
  Very nice work vedhavyas!
  
  Do you have a link to your golang version you can post?
  
  Reply
Tim November 22, 2016 at 8:32 pm #

Hi Jason!

Thanks for the great tutorial! A ‘from-scratch’ implementation always helps to increase the understanding of a mechanism.

I have a question though: I thought I had read somewhere that in ‘stochastic’ gradient descent, the weights have to be initialised to a small random value (hence the “stochastic”) instead of zero, to prevent some nodes in the net from becoming or remaining inactive due to zero multiplication. I see that in your gradient descent algorithm you initialise the weights to zero. Could you elaborate on the choice of zero as the initial value? My understanding may be incomplete, but this question popped up as I was reading.

Thanks!

Reply
- Jason Brownlee November 23, 2016 at 8:57 am #
  
  This can help with convergence, Tim, but it is not strictly required, as the example above demonstrates.
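Tim's concern mainly applies to networks with hidden layers. There, units that start with identical weights receive identical updates and never differentiate. In a single perceptron, each weight is multiplied by its own input in the update, so starting from zero does not leave any weight stuck. Also, the "stochastic" in stochastic gradient descent refers to updating on one example at a time, not to random initialization. If you still want to try small random starting values, a helper like the one below could replace the zero initialization. `init_weights`, `scale` and `rng_seed` are hypothetical names, not part of the tutorial.

```python
from random import seed, uniform


def init_weights(n_weights, scale=0.0, rng_seed=None):
    """Return starting weights: all zeros by default, or small random values.

    scale > 0 draws each weight uniformly from [-scale, scale].
    rng_seed, if given, makes the random draw repeatable.
    """
    if scale == 0.0:
        return [0.0] * n_weights          # zero start, as in the tutorial
    if rng_seed is not None:
        seed(rng_seed)
    return [uniform(-scale, scale) for _ in range(n_weights)]


print(init_weights(3))                    # [0.0, 0.0, 0.0]
print(init_weights(3, scale=0.01, rng_seed=1))
```

Inside train_weights(), `weights = init_weights(len(train[0]), scale=0.01)` would then stand in for the list of zeros.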
  
  Reply
  - Tim November 23, 2016 at 7:40 pm #
    
    Thanks Jason! That clears it up!
    
    Reply
kero hakem December 22, 2016 at 1:55 am #

Thanks for the great tutorial! But how can I use this perceptron to predict multiple classes?

Reply
- Jason Brownlee December 22, 2016 at 6:36 am #
  
  You can use a one-vs-all approach for multi-class classification:
  https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest
  
  Generally, I would recommend moving on to something like a multilayer perceptron with backpropagation.
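Below is a minimal sketch of the one-vs-rest idea from the link. It trains one binary perceptron per class, with that class relabelled 1 and every other class 0. It then predicts the class whose perceptron gives the largest activation. The sketch assumes the class label is in the last column and that each class can be separated from the rest by a straight line. The helper names and the toy data are hypothetical.

```python
def activation(row, weights):
    """Weighted sum of the inputs (all columns except the last) plus the bias."""
    total = weights[0]
    for i in range(len(row) - 1):
        total += weights[i + 1] * row[i]
    return total


def train_weights(train, l_rate, n_epoch):
    """Binary perceptron training in the tutorial's style (labels 0.0 / 1.0)."""
    weights = [0.0] * len(train[0])
    for _ in range(n_epoch):
        for row in train:
            prediction = 1.0 if activation(row, weights) >= 0.0 else 0.0
            error = row[-1] - prediction
            weights[0] += l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] += l_rate * error * row[i]
    return weights


def one_vs_rest_fit(train, l_rate, n_epoch):
    """Train one perceptron per class: that class -> 1.0, every other class -> 0.0."""
    models = {}
    for c in sorted(set(row[-1] for row in train)):
        relabelled = [row[:-1] + [1.0 if row[-1] == c else 0.0] for row in train]
        models[c] = train_weights(relabelled, l_rate, n_epoch)
    return models


def one_vs_rest_predict(models, row):
    """Pick the class whose perceptron produces the largest activation."""
    return max(models, key=lambda c: activation(row, models[c]))


# Hypothetical 3-class toy data: x1, x2, class label.
data = [[0, 5, 0], [1, 6, 0], [0, 6, 0],
        [5, 0, 1], [6, 1, 1], [6, 0, 1],
        [5, 5, 2], [6, 6, 2], [5, 6, 2]]
models = one_vs_rest_fit(data, l_rate=0.1, n_epoch=5000)
print([one_vs_rest_predict(models, row) for row in data])
```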
  
  Reply
PN February 22, 2017 at 5:52 am #

Thanks for your great website. May I use parts of your tutorials in my machine learning class?

Reply
- Jason Brownlee February 22, 2017 at 10:06 am #
  
  Yes, use them any way you want, please credit the source.
  
  Reply
  - Maryiam December 18, 2019 at 8:51 am #
    
    Which command should I use at the command prompt to run this code?
    
    Reply
    - Jason Brownlee December 18, 2019 at 1:29 pm #
      
      Perhaps this will help:
      https://machinelearningmastery.com/faq/single-faq/how-do-i-run-a-script-from-the-command-line
      
      Reply
      - Maryiam December 19, 2019 at 8:43 pm #
        
        Thank you so much sir
      - Jason Brownlee December 20, 2019 at 6:45 am #
        
        You’re welcome.
Aniket Saxena March 20, 2017 at 3:01 am #

Hello Sir, please tell me how I can use matplotlib to output an image for each iteration of the algorithm, so that I can visualize the progress and final result of my program.

Reply
- Jason Brownlee March 20, 2017 at 8:17 am #
  
  You could create and save the image within the epoch loop.
  
  Reply
Aniket Saxena March 22, 2017 at 1:20 am #

Hello Sir, I have gone through the above code and found the epoch loop in two functions, train_weights() and perceptron(). Since I’m a beginner in machine learning, please guide me on how I can create and save an image within the epoch loop to visualize the output of the perceptron algorithm at each iteration.

Reply
- Jason Brownlee March 22, 2017 at 8:08 am #
  
  Sorry, I do not have an example of graphing performance. Consider using matplotlib.
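One way to follow this suggestion is to record the training error at the end of each epoch and plot the whole curve afterwards. In the sketch below, the training loop is written in the tutorial's style and needs only the standard library. The plotting helper imports matplotlib inside the function, so matplotlib only has to be installed if you call it. The function names, the file name and the AND-style data are hypothetical.

```python
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0


def train_weights_with_history(train, l_rate, n_epoch):
    """Tutorial-style training that also records the sum of squared errors per epoch."""
    weights = [0.0] * len(train[0])
    history = []
    for _ in range(n_epoch):
        sum_error = 0.0
        for row in train:
            error = row[-1] - predict(row, weights)
            sum_error += error ** 2
            weights[0] += l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] += l_rate * error * row[i]
        history.append(sum_error)
    return weights, history


def save_history_plot(history, filename="training_error.png"):
    """Save a line plot of the error history (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no window needed
    import matplotlib.pyplot as plt
    plt.plot(range(1, len(history) + 1), history)
    plt.xlabel("epoch")
    plt.ylabel("sum of squared errors")
    plt.savefig(filename)
    plt.close()


# Hypothetical AND-style data: x1, x2, label.
data = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
weights, history = train_weights_with_history(data, l_rate=0.1, n_epoch=100)
print(history[:5])
# save_history_plot(history)  # uncomment once matplotlib is installed
```

For one image per epoch, the save call could go inside the epoch loop with the epoch number in the filename. That writes many files, though, so one plot of the whole history is usually easier to read.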
  
  Reply
Sahiba March 24, 2017 at 4:43 am #

Hi Jason,

Thank you for this explanation. I have a question – why isn’t the bias updating along with the weights?

Reply
- Jason Brownlee March 24, 2017 at 8:01 am #
  
  It is. What do you mean exactly?
  
  Reply
Aniket Saxena March 29, 2017 at 4:33 am #

Hello Jason,
Here in the above code, I didn’t understand a few lines in the evaluate_algorithm function. Please explain why we use these lines for train_set and row_copy:

train_set.remove(fold)
train_set = sum(train_set, [])

and,

row_copy[-1] = None

Reply
- Jason Brownlee March 29, 2017 at 9:11 am #
  
  We clear the known outcome so the algorithm cannot cheat when being evaluated.
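Here is what those lines do, run on three tiny made-up folds. `remove` drops the fold currently being tested, and `sum(..., [])` concatenates the remaining folds into one flat list of rows. The copy with its label set to None is what the algorithm sees at prediction time. The statements follow the lines quoted above; only the data is invented.

```python
# Three tiny hypothetical folds; each row is [x, label].
folds = [[[1.0, 0], [2.0, 0]], [[3.0, 1], [4.0, 1]], [[5.0, 1], [6.0, 0]]]
fold = folds[1]                  # the fold currently held out for testing

train_set = list(folds)          # shallow copy of the list of folds
train_set.remove(fold)           # drop the test fold; two folds remain
train_set = sum(train_set, [])   # [] + fold_a + fold_b: one flat list of rows
print(train_set)                 # [[1.0, 0], [2.0, 0], [5.0, 1], [6.0, 0]]

test_set = list()
for row in fold:
    row_copy = list(row)         # copy, so the original row keeps its label
    test_set.append(row_copy)
    row_copy[-1] = None          # hide the known outcome from the algorithm
print(test_set)                  # [[3.0, None], [4.0, None]]
print(fold)                      # still [[3.0, 1], [4.0, 1]] for scoring later
```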
  
  Reply
Aniket Saxena March 29, 2017 at 1:13 pm #

Sir,
One more question: after adding row_copy to test_set, why do we set the last element of row_copy to None, i.e.,
row_copy[-1] = None

Reply
- Jason Brownlee March 30, 2017 at 8:46 am #
  
  So that the outcome variable is not made available to the algorithm used to make a prediction.
  
  Reply
Aniket Saxena March 31, 2017 at 2:11 am #

Another question: the lookup dictionary is filled in on every iteration of the first for loop in the function str_column_to_int(), and we return the lookup dictionary, so why do we use a second for loop to update the rows of the dataset in the following lines?
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

Does this change the dataset values after the lookup dictionary has been built? If so, is the dataset that is passed to evaluate_algorithm() in the following function call also altered?

scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)
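Loop 1 only fills in the dictionary; loop 2 is what actually writes the integers into the rows. Each row is a list, and lists are modified in place, so the function changes the caller's dataset directly. The dataset later passed to evaluate_algorithm() therefore already contains integer labels. The returned lookup is only there if you want to map predictions back to 'R'/'M'. The function below is written along the lines of the tutorial's version (its second half is quoted above), with made-up rows. One caveat: the iteration order of a set of strings can change between runs, so which label becomes 0 is not fixed. Building `unique` with sorted(set(class_values)) instead would make the mapping repeatable.

```python
def str_column_to_int(dataset, column):
    """Replace the string labels in `column` with integers, in place."""
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    # Loop 1: build the mapping, e.g. {'R': 0, 'M': 1} (order not guaranteed).
    for i, value in enumerate(unique):
        lookup[value] = i
    # Loop 2: use the mapping to overwrite each row's label.
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup


dataset = [[0.02, 'R'], [0.45, 'M'], [0.03, 'R']]   # hypothetical rows
lookup = str_column_to_int(dataset, 1)
print(lookup)    # mapping from label string to int
print(dataset)   # the same list object now holds ints in column 1
```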

Reply
Michel May 17, 2017 at 7:17 am #

Hello, I would like to understand 2 points in the code:
1° Why, on line 10, do you use train[0]?
2° According to the formula for the weights, w(t + 1) = w(t) + learning_rate * (expected(t) - predicted(t)) * x(t), so why does the code use “weights[i + 1] = weights[i + 1] + l_rate * error * row[i]”?
Where does the + 1 in the weights index come from?

Reply
- Jason Brownlee May 17, 2017 at 8:44 am #
  
  Because the weight at index zero contains the bias term.
  
  Reply
  - Michel May 25, 2017 at 2:36 am #
    
    Sorry, I still do not get it. Can you explain it a little better?
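Both questions come from the same layout. train[0] is the first training row, and len(train[0]) counts its columns: the inputs plus the label. That gives exactly one weight per input plus one spare, and the spare is used for the bias at index 0. Every input weight is therefore shifted up by one position, so input row[i] pairs with weights[i + 1]. Here is a small check with made-up numbers:

```python
row = [0.5, 0.8, 1.0]            # x1, x2, then the class label in the last column
weights = [0.1, 0.2, -0.3]       # [bias, w1, w2], hypothetical values

# One weight per column of the training data: the label's slot is reused for the bias.
assert len(weights) == len(row)

activation = weights[0]                      # start from the bias
for i in range(len(row) - 1):                # i = 0, 1 walks over the inputs only
    activation += weights[i + 1] * row[i]    # input i pairs with weight i + 1

by_hand = 0.1 + 0.2 * 0.5 + (-0.3) * 0.8     # bias + w1*x1 + w2*x2
print(activation, by_hand)                   # the same number, about -0.04
```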
    
    Reply
Sri June 4, 2017 at 3:36 pm #

Hi, I just finished coding the perceptron algorithm using stochastic gradient descent, and I have some questions:

1) When I train the perceptron on the entire sonar dataset with the goal of minimizing “the sum of squared errors of prediction”, with learning rate = 0.1 and number of epochs = 500, the error gets stuck at 40.

What can I do to reduce this error?

2) This question is about the k-fold cross-validation test. A model trained on k folds must be less generalized than a model trained on the entire dataset. If this is true, then how valid is the k-fold cross-validation test?

3) Finding the best combination of “learning rate” and “number of epochs” looks like the real trick behind the learning process. How do I find this best combination?

Reply
- Jason Brownlee June 5, 2017 at 7:39 am #
  
  You could try different configurations of learning rate and epochs.
  
  k-fold cross validation gives a more robust estimate of the skill of the model when making predictions on new data compared to a train/test split, at least in general.
  
  There is no “best” anything in machine learning, just lots of empirical trial and error to see what works well enough for your problem domain:
  https://machinelearningmastery.com/a-data-driven-approach-to-machine-learning/
  
  Reply
Vaibhav Rai July 18, 2017 at 5:44 pm #

Hello sir!
Can you help me fix an error in the randrange function?
ValueError: empty range for randrange()

Reply
- Jason Brownlee July 19, 2017 at 8:21 am #
  
  This may be a Python 2 vs. Python 3 issue. I used Python 2 in the development of the example.
  
  Reply
  - Vaibhav Rai July 19, 2017 at 4:07 pm #
    
    Actually, I replaced mydata_copy with mydata in cross_validation_split to correct that error, but now a KeyError: 137 is occurring there.
    
    Reply
    - Jason Brownlee July 19, 2017 at 4:11 pm #
      
      Are you able to post more information about your environment (Python version) and the error (the full trace)?
      
      Reply
      - Vaibhav Rai July 19, 2017 at 5:13 pm #
        
        Sir, my Python version is 3.6 and the error is:
        KeyError: 137
      - Jason Brownlee July 20, 2017 at 6:17 am #
        
        Sorry, the example was developed for Python 2.7.
        
        I believe the code requires modification to work in Python 3.
Vaibhav Rai July 21, 2017 at 4:38 pm #

Can you please tell me which other function we can use in place of randrange to generate the indices?

Reply
- Jason Brownlee July 22, 2017 at 8:31 am #
  
  What is wrong with randrange()? It is supported in both Python 2 and Python 3:
  https://docs.python.org/3/library/random.html#random.randrange
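randrange() is available in both versions. The “empty range” message means it was called as randrange(0), i.e. the list being sampled was already empty. One plausible cause, and likely the issue behind the January 2017 update noted at the top of the post, is a non-integer fold_size under Python 3. The arithmetic below walks through it, assuming the 208-row sonar file and 3 folds. If the data failed to load at all, len(dataset_copy) would also be 0 and give the same error.

```python
from math import ceil
from random import randrange

n_rows, n_folds = 208, 3            # sonar rows and 3 folds, as assumed above

# Without int(), Python 3's "/" gives 69.333..., and `while len(fold) < fold_size`
# keeps adding rows until a fold holds 70 of them.
fold_size = n_rows / n_folds
rows_requested = n_folds * ceil(fold_size)
print(rows_requested, ">", n_rows)  # 210 > 208: the last fold runs out of rows

# Once the copy of the dataset is empty, randrange(len(dataset_copy)) is randrange(0):
try:
    randrange(0)
except ValueError as exc:
    print("ValueError:", exc)

# With int(), each fold takes 69 rows, and 3 * 69 = 207 rows are always available.
print(n_folds * int(n_rows / n_folds), "<=", n_rows)
```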
  
  Reply
Alex Godfrey August 29, 2017 at 12:31 am #

How is the baseline value of just over 50% arrived at?

Reply
- Jason Brownlee August 29, 2017 at 5:08 pm #
  
  By predicting the majority class, or the first class in this case.
  
  Learn about the Zero Rule algorithm here:
  https://machinelearningmastery.com/implement-baseline-machine-learning-algorithms-scratch-python/
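The Zero Rule baseline ignores the inputs and predicts the most common class for every row. Its accuracy is therefore simply that class's share of the data. Here is a minimal sketch on a made-up label column; the function name is illustrative.

```python
def zero_rule_accuracy(labels):
    """Predict the most common label for every row and score that as a percentage."""
    majority = max(set(labels), key=labels.count)
    correct = sum(1 for label in labels if label == majority)
    return majority, 100.0 * correct / len(labels)


labels = ['M', 'M', 'M', 'R', 'R']   # hypothetical class column
print(zero_rule_accuracy(labels))    # ('M', 60.0)
```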
  
  Reply
Carla Meyer Castro September 28, 2017 at 4:24 am #

Hi, I have a question about this function:

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

What does it return?

Reply
- Jason Brownlee September 28, 2017 at 5:28 am #
  
  Nothing, it modifies the provided column directly.
  
  Reply
Dhruv October 31, 2017 at 2:53 pm #

I want to find similar records by comparing one row with all the rest in a file. How should I implement this using sklearn and Python? Please help me out.

Reply
- Jason Brownlee October 31, 2017 at 2:59 pm #
  
  Perhaps you can calculate the Euclidean distance between rows.
  
  You may have to implement it yourself in Python.
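Here is a minimal standard-library sketch of that suggestion: compute the Euclidean distance from one row to every other row and keep the k closest. It assumes all columns are numeric inputs, so drop any label column first. Columns on very different scales should usually be normalized before comparing. For larger data, scikit-learn's NearestNeighbors class does the same job. The function names and data are illustrative.

```python
from math import sqrt


def euclidean_distance(row1, row2):
    """Straight-line distance between two rows of numbers."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(row1, row2)))


def most_similar(dataset, target_index, k=3):
    """Return (distance, index) pairs for the k rows closest to dataset[target_index]."""
    target = dataset[target_index]
    distances = [(euclidean_distance(target, row), i)
                 for i, row in enumerate(dataset) if i != target_index]
    distances.sort()             # smallest distance first; ties broken by index
    return distances[:k]


rows = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [10.0, 10.0]]
print(most_similar(rows, 0, k=2))   # row 2 (distance about 1.41), then row 1 (5.0)
```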
  
  Reply
Deva November 3, 2017 at 12:11 am #

row[column] = float(row[column].strip()) is causing an error:
ValueError: could not convert string to float: R

Reply
- Jason Brownlee November 3, 2017 at 5:19 am #
  
  Sorry to hear that. Are you using the code and data from the post exactly?
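The message says float() was handed the string 'R'. That is a class label rather than a number, so the conversion was probably run on the last column. Below is a sketch of the intended pattern: convert every column except the last to float, and leave the label column for str_column_to_int(). It also shows that str_column_to_float() returns nothing and changes the rows in place. The two rows are made up.

```python
def str_column_to_float(dataset, column):
    """Convert one column from string to float, in place (returns None)."""
    for row in dataset:
        row[column] = float(row[column].strip())


dataset = [["0.02", "0.0371", "R"], ["0.0453", "0.0523", "M"]]   # all strings, as read from CSV

# Convert only the input columns; the last column holds the class label.
for i in range(len(dataset[0]) - 1):
    str_column_to_float(dataset, i)
print(dataset)   # [[0.02, 0.0371, 'R'], [0.0453, 0.0523, 'M']]

try:
    str_column_to_float(dataset, len(dataset[0]) - 1)   # the label column
except ValueError as exc:
    print(exc)   # could not convert string to float: 'R'
```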
  
  Reply
Olu March 20, 2018 at 10:07 pm #

How would you extend this code to Recurrent Net without the Keras library?

Reply
- Jason Brownlee March 21, 2018 at 6:32 am #
  
  An RNN would require a completely new implementation.
  
  Reply
Bala April 6, 2018 at 3:32 pm #

Hey Jason,
A very great and detailed article indeed.
I just wanted to ask: when I run your code, my accuracy and values differ slightly, i.e. I get about 74.396%, and the values also change ever so slightly every time I run the code again. Sometimes I also hit 75%.
Why does this happen?
My reasoning is that k-fold cross-validation randomly creates 3 splits of the dataset, and the learning depends on these splits, since the test data changes randomly. Is my reasoning right?
Thanks Jason.

Reply
- Jason Brownlee April 6, 2018 at 3:54 pm #
  
  This can happen; see this post on why:
  https://machinelearningmastery.com/randomness-in-machine-learning/
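The variation comes from the random fold assignment. Each run draws different randrange() values, so each model is trained and tested on different rows. Calling random.seed() with a fixed value before the split makes the draws, and therefore the scores, repeatable on a given Python version. The sequence may still differ between Python 2 and Python 3. Here is a small sketch using the same randrange-and-pop pattern as cross_validation_split(); the function name is hypothetical.

```python
from random import randrange, seed


def shuffled_order(n, rng_seed=None):
    """Draw the indices 0..n-1 in random order with randrange() + pop(), like the fold split."""
    if rng_seed is not None:
        seed(rng_seed)
    remaining = list(range(n))
    order = []
    while remaining:
        order.append(remaining.pop(randrange(len(remaining))))
    return order


print(shuffled_order(10))                # usually different on every run
print(shuffled_order(10, rng_seed=1) == shuffled_order(10, rng_seed=1))  # True
```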
  
  Reply
Martin April 30, 2018 at 11:51 pm #

Hello Jason,
Very nice tutorial it really helped me understand the idea behind the perceptron! But my question to you is, how is this different from a normal gradient descent? I cannot see where the stochastic part comes in? Are you not supposed to sample the dataset and perform your calculations on subsets?

Thanks in advance,
Martin

Reply
- Jason Brownlee May 1, 2018 at 5:34 am #
  
  Gradient descent is just the optimization algorithm.
  
  Here we apply it to solving the perceptron weights.
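To see where the “stochastic” (or “online”) part lives, compare one epoch of the tutorial-style per-row update with a batch version. The batch version keeps the weights fixed for the whole pass and applies the summed change once at the end. Both functions are illustrative sketches, not code from the tutorial, and the data is the AND truth table.

```python
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0


def online_epoch(train, weights, l_rate):
    """One epoch of online (stochastic) updates: the weights change after every row."""
    weights = list(weights)
    for row in train:
        error = row[-1] - predict(row, weights)   # uses the latest weights
        weights[0] += l_rate * error
        for i in range(len(row) - 1):
            weights[i + 1] += l_rate * error * row[i]
    return weights


def batch_epoch(train, weights, l_rate):
    """One epoch of batch updates: all rows see the same weights; changes are summed, then applied once."""
    deltas = [0.0] * len(weights)
    for row in train:
        error = row[-1] - predict(row, weights)   # weights do not change inside the loop
        deltas[0] += l_rate * error
        for i in range(len(row) - 1):
            deltas[i + 1] += l_rate * error * row[i]
    return [w + d for w, d in zip(weights, deltas)]


data = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]   # AND: x1, x2, label
start = [0.0, 0.0, 0.0]
print(online_epoch(data, start, 0.1))   # approximately [0.0, 0.1, 0.1]
print(batch_epoch(data, start, 0.1))    # approximately [-0.3, -0.1, -0.1]
```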
  
  Reply
Ravi June 25, 2018 at 9:49 am #

in ‘Training Network Weights’
the formula is defined as
w(t+1) = w(t) + learning_rate * learning_rate *(expected(t)- predicted(t)) * x(t)
bias(t+1) = bias(t) + learning_rate *(expected(t)- predicted(t)) * x(t)

so t=0, w(1) = w(0) + learning_rate * learning_rate *(expected(0)- predicted(0)) * x(0)
this is conflicting with the code in ‘train_weights’ function

In the ‘train_weights’ function (snapshot below):
# Estimate Perceptron weights using stochastic gradient descent
def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
    return weights

Question:
Iteration 1 (i=0):
for i in range(len(row)-1):
    weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
so weights[0 + 1] = weights[0 + 1] + l_rate * error * row[0], i.e. weights[1] = weights[1] + l_rate * error * row[0]. Do we need to use weights[1] and row[0] to calculate weights[1] (rather than weights[1] and row[1])?
My confusion is that row[0] is used to calculate weights[1].

Per the formula in ‘Training Network Weights’, my understanding is:

weights[0] = bias term
but the formula pattern must be followed

weights[1] = weights[0] + l_rate * error * row[0]
weights[2] = weights[1] + l_rate * error * row[1]

Instead of (‘train_weights’)
weights[1] = weights[1] + l_rate * error * row[0]
weights[2] = weights[2] + l_rate * error * row[1]

Could you please explain why it is different in the ‘train_weights’ function?

Reply
- Jason Brownlee June 25, 2018 at 2:38 pm #
  
  How so? Where is the conflict exactly?
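In the formula, t is not a position in the weight list. It counts update steps, one per training row presented. w(t) is the whole weight vector before step t, and x(t) is the input of the row used at that step, with a constant 1 in front for the bias. Each weight is updated from its own previous value, so weights[1] is never computed from weights[0]. Because weights[0] holds the bias, the input row[i] pairs with weights[i + 1]. Here is a small sketch of the vector form on made-up rows:

```python
l_rate = 0.1
train = [[1.0, 2.0, 0], [2.0, 1.0, 1]]   # hypothetical rows: x1, x2, label
w = [0.0, 0.0, 0.0]                      # w(0) = [bias, w1, w2]
steps = [list(w)]

for t, row in enumerate(train):          # t counts updates, one per row presented
    x_t = [1.0] + row[:-1]               # x(t): constant 1 for the bias, then the inputs
    activation = sum(w_j * x_j for w_j, x_j in zip(w, x_t))
    prediction = 1.0 if activation >= 0.0 else 0.0
    error = row[-1] - prediction         # expected(t) - predicted(t)
    # w(t+1) = w(t) + l_rate * error * x(t), position by position:
    # w[j] only combines its own old value with x_t[j] (x_t[i + 1] is row[i]).
    w = [w_j + l_rate * error * x_j for w_j, x_j in zip(w, x_t)]
    steps.append(list(w))

for t, weights in enumerate(steps):
    print(f"w({t}) = {weights}")
```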
  
  Reply
Ben Hine July 20, 2018 at 10:14 am #

Love your tutorials. I do have a nit-picky question though. Why do you include x in your weight update formula? That is, if you include x, ‘weight update’ would be a misnomer. It should be called an input update formula? Am I off base here? Thanks.

Reply
- Jason Brownlee July 21, 2018 at 6:28 am #
  
  We are changing/updating the weights of the model, not the input. Input is immutable. Therefore, it is a weight update formula.
  
  Reply
Ben Hine July 21, 2018 at 10:41 pm #

Thank you for the reply. I guess I am having a hard time understanding what role x plays in the formula. Also, regarding your “contrived” dataset… how did you come up with it? Are you randomly creating x1 and x2 values and then arbitrarily assigning zeroes and ones as outputs, then using the neural network to find the weights that satisfy the “expected” outputs, with the given bias and weights as the starting point?

Reply
- Jason Brownlee July 22, 2018 at 6:24 am #
  
  The network learns a set of weights that correctly maps inputs to outputs.
  
  This is the foundation of all neural networks.
  
  Reply
Ben Hine July 22, 2018 at 12:10 pm #

I probably did not word my question correctly, but thanks. I think I now understand the role the variable x plays in the weight update formula. Before I go into that, let me say that I think a neural network could still learn without it. Here goes: 1. The difference between the expected and predicted values (each zero or one) will always be 1, 0 or -1. The weight changes by the product of that difference, the learning rate, and the input variable. If we omit the input variable, the weight changes by the product of just the difference and the learning rate, so the neuron can still update its weights. So I don’t really see the need for the input variable. Perhaps there is a solid reason? One possible reason I see is this: if input values are always larger than the weights in neural network datasets, then x makes the update larger, given that the input values are always greater than 1. Sorry if this is obvious, but I did not see it right away, and I like to know the purpose of every component in a formula. Thanks. Having fun with your code though. So far so good!

Reply
Ben Hine July 24, 2018 at 3:34 pm #

Sorry if my previous question is too convoluted to understand, but I am wondering if you agree that the input x is not needed for the weight formula to work in your code. Anyway, the code works in Python 3.6 (Jupyter Notebook), and with no changes to it yet, my numbers are:

Scores: [81.15942028985508, 69.56521739130434, 62.31884057971014]
Mean Accuracy: 71.014%

I will play with the parameters and report back to see if I can improve upon it. I, for one, would not think 71.014 would give a mine sweeping manager a whole lot of confidence.

Reply
- Jason Brownlee July 25, 2018 at 6:11 am #
  
  If you remove x from the equation you no longer have the perceptron update algorithm. That is fine if it works for you.
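Here is one way to see what x contributes. Without it, every weight, the bias included, receives exactly the same increment l_rate * error on every row. Weights that start at zero therefore stay identical forever, and the activation collapses to w * (1 + x1 + x2 + ...). The decision boundary can then never tilt toward the inputs that actually matter. Multiplying by x_i makes each weight's change proportional to its own input, which is also the derivative of the activation with respect to that weight. The sketch compares the two on a tiny made-up dataset. `use_input` is a hypothetical flag, not part of the tutorial.

```python
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0


def train_weights(train, l_rate, n_epoch, use_input=True):
    """Perceptron training; use_input=False drops x from the update (hypothetical variant)."""
    weights = [0.0] * len(train[0])
    for _ in range(n_epoch):
        for row in train:
            error = row[-1] - predict(row, weights)
            weights[0] += l_rate * error
            for i in range(len(row) - 1):
                x = row[i] if use_input else 1.0
                weights[i + 1] += l_rate * error * x
    return weights


def accuracy(data, weights):
    return sum(predict(row, weights) == row[-1] for row in data) / len(data)


# Hypothetical data (x1, x2, label): the label is 1 when x1 > x2.
data = [[1.0, 3.0, 0], [3.0, 1.0, 1], [2.0, 4.0, 0], [4.0, 2.0, 1]]
with_x = train_weights(data, 0.1, 50)
without_x = train_weights(data, 0.1, 50, use_input=False)
print(with_x, accuracy(data, with_x))        # weights differ; all 4 rows correct
print(without_x, accuracy(data, without_x))  # all weights identical; half the rows wrong
```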
  
  Reply
Ben Hine July 25, 2018 at 2:00 pm #

This is really great code for people like me, who are just getting to know perceptrons. I’d like to point out though, for ultra beginners, that the code:
lookup[value] = i is somewhat unintuitive and potentially confusing. As you know, ‘lookup’ is defined as a dict, and dicts store data in key-value pairs. But this snippet actually makes the variable ‘value’ (‘R’ and ‘M’) the keys and ‘i’ (0, 1) the values. Just thought it was worth noting. Please don’t hate me :). I could never have written this myself.

Reply
- Jason Brownlee July 25, 2018 at 2:41 pm #
  
  Thanks for the note Ben, sorry I didn’t explain it clearly.
  
  Reply
  - Ben Hine July 26, 2018 at 12:26 pm #
    
    No worries.
    
    Reply
Ben Hine July 26, 2018 at 12:26 pm #

Jason, there is so much to admire about this code, but there is something that is unusual. The cross_validation_split generates random indexes, but indexes are repeated either in the same fold or across all three folds. What we are left with is repeated observations, while leaving out others. This is acceptable? I have not seen a folding method like this before.

Reply
- Jason Brownlee July 26, 2018 at 2:28 pm #
  
  I don’t think that is the case Ben.
  
  You can see more on this implementation of k-fold CV here:
  https://machinelearningmastery.com/implement-resampling-methods-scratch-python/
  
  You can learn more about CV in general here:
  https://machinelearningmastery.com/faq/single-faq/how-does-k-fold-cross-validation-work
  
  Reply
Ben Hine July 27, 2018 at 1:50 pm #

Thanks Jason, I did go through the code in the first link. It does help solidify my understanding of the cross-validation split. So your result for the 10 data points after running the cross-validation split implies that each of the four folds always has unique numbers from the 10 data points. Wouldn’t it be even more random, especially for a large dataset, to shuffle the entire set of points before selecting data points for the next fold? Yes, data would repeat, but there would be another element of randomness.

Going back to my question about repeating indexes output by the cross-validation split function in the neural network code, I printed out each index number for each fold. In fold zero, I got the index number ‘7’ three times. This is what I ran:

from random import randrange

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    print("fold_size = %s" % int(len(dataset) / n_folds))
    for i in range(n_folds):
        fold = list()
        print("fold = %s" % i)
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            print("index = %s" % index)
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

There were other repeats in this fold too. Repeats are also in fold one and two. Am I not understanding something here? Sorry to be the devil's advocate, but I am perplexed.

Reply
- Ben Hine July 27, 2018 at 2:16 pm #
  
  Actually, after some more research, I’m convinced randrange is not the way to go here if you want unique values, especially for progressively larger datasets. For example, the following site used randrange(100), and their code produced at least one repeating value. I think this might work:
  import random
  random.sample(range(interval), count)
  
  in the first pass, interval = 69, count = 69
  in the second pass, interval = 70-138, count = 69
  in the third pass, interval = 139-208, count =69
  
  I’ll implement this when I return to look at your page and tell you how it goes.
  
  I don’t take any pleasure in pointing this out, I just want to understand everything. I am really enjoying the act of taking your algorithm apart and putting it back together. I admire its sophisticated simplicity and hope to code like this in future. I plan to look at the rest of this and keep looking at your other examples if they have the same qualities. 🙂
  
  Reply
  - Ben Hine July 27, 2018 at 2:17 pm #
    
    I forgot to post the site: https://www.geeksforgeeks.org/randrange-in-python/
    
    Reply
- Jason Brownlee July 28, 2018 at 6:26 am #
  
  Note that we are reducing the size of dataset_copy with each selection by removing the selection.
  
  This means that the index will repeat but will point to different data.
  
  You can confirm this by testing the function on a small contrived dataset of 10 examples of integer values as in the post I linked and see that no values are repeated in the folds.
  
  Perhaps take a moment to study the function again?
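Here is the check Jason suggests, as a sketch. The function follows the cross_validation_split() listing quoted earlier in this thread, minus the prints, and runs on 10 rows holding the integers 1 to 10. A repeated index is harmless because pop() has already removed the row it pointed to last time. Note also that int(10 / 4) = 2, so the four folds use 8 rows and 2 rows are left out.

```python
from random import randrange, seed


def cross_validation_split(dataset, n_folds):
    """Tutorial-style k-fold split: draw rows without replacement by popping them."""
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for _ in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            # The same index can come up again, but pop() has already removed the
            # row it pointed to last time, so it now refers to a different row.
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split


seed(1)
dataset = [[i] for i in range(1, 11)]     # 10 rows holding the values 1..10
folds = cross_validation_split(dataset, 4)
values = [row[0] for fold in folds for row in fold]
print(folds)
print(len(values), len(set(values)))      # 8 8: no value appears twice
```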
  
  Reply
  - Ben Hine July 28, 2018 at 1:56 pm #
    
    Wow. Yep. That’s easy to see. I just got put in my place. There is a lot going on but orderly. I missed it. Thanks. Sorry about that.
    
    Reply
    - Jason Brownlee July 29, 2018 at 6:04 am #
      
      Sorry Ben, I don’t want to put anyone in their place, just to help.
      
      Perhaps the code is too complicated.
      
      Reply
Ben Hine July 30, 2018 at 1:39 am #

Please don’t be sorry. Code is great. If it’s too complicated that is my shortcoming, but I love learning something new every day. I am really enjoying it. I really find it interesting that you use lists instead of dataframes too. This is gold. I just want to know it really well and understand all the function and methods you are using.

Reply
- Jason Brownlee July 30, 2018 at 5:53 am #
  
  I chose lists instead of numpy arrays or data frames in order to stick to the Python standard library.
  
  These examples are for learning, not optimized for performance.
  
  Reply
Gop September 10, 2018 at 11:59 am #

How do we show whether test data points are linearly separable or not?

Reply
- Jason Brownlee September 10, 2018 at 2:09 pm #
  
  Whether you can draw a line to separate them or fit them for classification and regression respectively.
  
  Reply
  - Gop September 11, 2018 at 4:40 am #
    
    Thanks Jason, Could you please elaborate on this as I am new to this?
    
    Reply
    - Jason Brownlee September 11, 2018 at 6:32 am #
      
      Plot your data and see if you can separate it or fit it with a line.
      
      Or don’t, assume it can be and evaluate the performance of the model. If it performs poorly, it is likely not separable.
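Here is a practical version of the “assume it and evaluate” approach: run the perceptron on the training data and see whether some epoch finishes with zero mistakes. For linearly separable data, the perceptron is guaranteed to get there eventually, so reaching zero errors proves separability. Failing to reach it within a fixed budget only suggests the data is not separable, since a hard but separable problem can need more epochs. The sketch uses AND (separable) and XOR (not separable) as test cases. The function name and epoch budget are arbitrary choices.

```python
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0


def reaches_zero_errors(dataset, l_rate=0.1, max_epochs=1000):
    """Train a perceptron; return True as soon as one full epoch makes no mistakes."""
    weights = [0.0] * len(dataset[0])
    for _ in range(max_epochs):
        n_errors = 0
        for row in dataset:
            error = row[-1] - predict(row, weights)
            if error != 0.0:
                n_errors += 1
                weights[0] += l_rate * error
                for i in range(len(row) - 1):
                    weights[i + 1] += l_rate * error * row[i]
        if n_errors == 0:
            return True   # these weights separate the data: it is linearly separable
    return False          # not a proof, but suggests the data is not separable


and_data = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
xor_data = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(reaches_zero_errors(and_data))   # True
print(reaches_zero_errors(xor_data))   # False
```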
      
      Reply
Felipe September 21, 2018 at 12:18 pm #

Hi, I tried your tutorial and had a lot of fun changing the learning rate, I got to:
lRate: 1.875000, n_epoch: 300 Scores:
[82.6086956521739, 72.46376811594203, 73.91304347826086]
Mean Accuracy: 76.329%

I don’t know if this would help anybody… but I thought I’d share.

Keep posting more tutorials!

Reply
- Jason Brownlee September 21, 2018 at 2:21 pm #
  
  Very nice work!
  
  Reply
muluken October 1, 2018 at 6:04 pm #

Hi, I am muluken from Ethiopia. I want to do my MSc thesis on predicting the geolocation of GSM users using Python and a regression-based method. However, I can’t work out the best training method in Python, or how to normalize the data so that it fits the model as a training dataset. Please say something about it.

Reply
- Jason Brownlee October 2, 2018 at 6:22 am #
  
  I recommend using scikit-learn for your project, you can get started here:
  https://machinelearningmastery.com/start-here/#python
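A common way to normalize the input columns is min-max scaling, which rescales each column to the range 0 to 1 using that column's minimum and maximum. A standard-library sketch is below; the function names are illustrative. If you follow Jason's suggestion and use scikit-learn, its MinMaxScaler class does the same job. Compute the minimum and maximum on the training data only, and reuse them for new data.

```python
def dataset_minmax(dataset):
    """(min, max) for every column."""
    return [(min(column), max(column)) for column in zip(*dataset)]


def normalize_dataset(dataset, minmax):
    """Rescale each column to [0, 1] in place; constant columns become 0.0."""
    for row in dataset:
        for i, (lo, hi) in enumerate(minmax):
            row[i] = (row[i] - lo) / (hi - lo) if hi > lo else 0.0


train = [[1.0, 10.0], [3.0, 20.0], [2.0, 15.0]]   # hypothetical input rows
minmax = dataset_minmax(train)                    # computed on training data only
normalize_dataset(train, minmax)
print(train)   # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```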
  
  Reply
Jackson Scott October 2, 2018 at 12:19 pm #

Why does changing the learning rate not seem to affect the mean accuracy?
Currently, I have the learning rate at 9000 and I am still getting the same accuracy as before.

Reply
- Jason Brownlee October 3, 2018 at 6:13 am #
  
  Perhaps the problem is very simple and the model will learn it regardless.
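There is also a concrete reason specific to this setup. With the weights starting at zero and a step transfer function, the learning rate cannot change the predictions at all. Every weight is a sum of terms l_rate * error * x, so changing l_rate rescales every weight, and therefore every activation, by the same positive factor. That never changes an activation's sign. The sketch below checks this with exact integer arithmetic. With real-valued data, floating-point rounding could in principle flip an activation sitting extremely close to zero, so the scores should match apart from rare rounding effects. The data is made up.

```python
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0


def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for _ in range(len(train[0]))]   # zero start, as in the tutorial
    for _ in range(n_epoch):
        for row in train:
            error = row[-1] - predict(row, weights)
            weights[0] += l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] += l_rate * error * row[i]
    return weights


# Hypothetical integer data (x1, x2, label), so every product below is exact.
data = [[1, 3, 0], [3, 1, 1], [2, 4, 0], [4, 2, 1], [1, 1, 1], [5, 5, 0]]
small = train_weights(data, l_rate=1.0, n_epoch=20)
large = train_weights(data, l_rate=9000.0, n_epoch=20)
print(small)
print(large)    # each weight is 9000 times the one above
print([predict(r, small) for r in data] == [predict(r, large) for r in data])  # True
```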
  
  Reply
James October 28, 2018 at 12:33 am #

Thanks for your time, sir. Can you tell me where I can find this kind of code written in MATLAB?
thank you.

Reply
- Jason Brownlee October 28, 2018 at 6:13 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/do-you-have-tutorials-in-octave-or-matlab
  
  Reply
ahmed khalifa November 21, 2018 at 3:12 am #

This is very simple and excellent, thanks man.

Reply
- Jason Brownlee November 21, 2018 at 7:53 am #
  
  Thanks. I’m glad it helped.
  
  Reply
Rukshar February 4, 2019 at 9:03 pm #

Can you please suggest some datasets from the UCI ML repository to use for example 3?

Reply
- Jason Brownlee February 5, 2019 at 8:18 am #
  
  Perhaps some of those listed here:
  https://machinelearningmastery.com/tour-of-real-world-machine-learning-problems/
  
  Reply
Aman Dhiman February 8, 2019 at 6:58 pm #

Very good guide for a beginner like me! Could you help with the weights you mentioned in the above example? I am writing my own perceptron using your example as a guide. I don’t want to use the same weight vector as yours, but I would like to get the same 100% accurate predictions on your example dataset. Please guide me on how to initialize the best random weights for an efficient perceptron. Thank you.

Reply
- Jason Brownlee February 9, 2019 at 5:54 am #
  
  You can change the random number seed to get a different random set of weights.
  
  Reply
Toufik February 18, 2019 at 10:29 pm #

Thanks Jason. I would like to classify more than two classes in the iris classification problem using a single layer. Can you help me?

Reply
- Jason Brownlee February 19, 2019 at 7:25 am #
  
  Perhaps start with this tutorial instead:
  https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
  
  Reply
Toufik February 19, 2019 at 9:21 pm #

Hello, but I would like to use just the perceptron, with 3 classes in the output.

Reply
- Jason Brownlee February 20, 2019 at 8:03 am #
  
  It is designed for binary classification, perhaps use an MLP instead?
  
  Reply
Aya March 2, 2019 at 8:31 pm #

In the fourth line of your code, which is
for i in range(len(row)-1):

I think there is a mistake here; it should be for i in range(len(weights)-1):
The same mistake appears in line 18.

And many thanks for sharing your knowledge. A well-organized and well-explained topic.

Reply
- Jason Brownlee March 3, 2019 at 8:00 am #
  
  Thanks, why do you think it is a mistake?
  
  Reply
Yavuzhan ERDEM April 10, 2019 at 9:01 pm #

Hello Mr.Jason,

I didn’t understand why you are sending three inputs to the predict function.
We have two inputs, x1 and x2, so shouldn’t we send two inputs to predict? Could you explain?

Reply
- Jason Brownlee April 11, 2019 at 6:37 am #
  
  Where in the tutorial exactly?
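The rows passed to predict() hold both the inputs and the label, so a row from the contrived dataset has three elements: x1, x2 and the expected class. predict() only loops over range(len(row) - 1), so the label is never used. That is also why test rows can have their label set to None. The weight list is created with one entry per column, with the bias taking the label's slot. So len(row) - 1 and len(weights) - 1 are the same number, and either expression gives the same loop. The function below follows the tutorial's predict(); the weights are made up.

```python
def predict(row, weights):
    activation = weights[0]
    for i in range(len(row) - 1):          # stops before the last column, the label
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0


weights = [-0.1, 0.2, -0.1]                # hypothetical [bias, w1, w2]
row = [2.0, 1.5, 0]                        # x1, x2, expected label

print(predict(row, weights))               # the label plays no part
print(predict([2.0, 1.5, 1], weights))     # same inputs, different label: same result
print(predict([2.0, 1.5, None], weights))  # even a hidden label (None) works
print(len(row) - 1 == len(weights) - 1)    # True: both count the two inputs
```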
  
  Reply
Saurabh Kmar August 2, 2019 at 7:40 pm #

Jason Brownlee, bro, why were you even born??

Reply
- Jason Brownlee August 3, 2019 at 7:49 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-do-some-consulting
  
  Reply
Dereje August 26, 2019 at 1:12 am #

Dear Jason, thank you very much for the code on the Perceptron algorithm on the Sonar dataset. Where exactly in the code do we use the function str_column_to_int? I could not find it. Thank you in advance.

Reply
- Jason Brownlee August 26, 2019 at 6:20 am #
  
  Good question, line 109 of the final example.
  
  Reply
  - Dereje August 26, 2019 at 8:56 am #
    
    Thank you for your reply. Maybe I didn’t understand the code. I was expecting the output of str_column_to_int to be assigned to a variable, like dataset_int = str_column_to_int, which is not the case. That is why I asked you.
    
    I have tried 4 folds, l_rate = 0.1 and n_epoch = 500. Here is the output:
    
    Scores: [80.76923076923077, 82.6923076923077, 73.07692307692307, 71.15384615384616]
    Mean Accuracy: 76.923%
    
    Reply
    - Jason Brownlee August 26, 2019 at 2:14 pm #
      
      Nice work!
      
      Reply
Rukshar Alam September 9, 2019 at 6:57 pm #

Do give us more exercises to practice. Your tutorials are concise, easy-to-understand.

Reply
- Jason Brownlee September 10, 2019 at 5:40 am #
  
  Thanks for the suggestion!
  
  Reply
Mike September 12, 2019 at 1:59 am #

Hi Jason,

I have tried your Perceptron example with the sonar.all-data.csv dataset, but I am not getting the same Scores and Mean Accuracy you got, as you can see here:

Scores: [0.0, 1.4492753623188406, 0.0]
Mean Accuracy: 0.483%

I went step by step through the previous code in your tutorial, and it runs fine. Could you please give me a hand with this?

Thank you

Reply
- Jason Brownlee September 12, 2019 at 5:21 am #
  
  That is a very low score. I have some suggestions here that may help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
Sangeeth September 16, 2019 at 3:31 am #

Hi,
The last element of each row in the dataset is either 0 or 1. I don’t see the bias in the weights. Shouldn’t we add 1 as the first element of the X dataset when updating the weights?
Thanks

Reply
- Jason Brownlee September 16, 2019 at 6:39 am #
  
  The first weight (index 0) is the bias.
  
  The bias weight is learned, not 1.
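Adding a constant 1 to each input row and treating the bias as an ordinary weight is equivalent to what the tutorial does. With x0 = 1, the generic update w0 = w0 + l_rate * error * x0 reduces exactly to the tutorial's bias update, weights[0] = weights[0] + l_rate * error. The tutorial simply stores the bias in weights[0] instead of adding the extra column. Here is a small check, with made-up numbers, that both layouts give the same activation:

```python
def activation_with_bias(x, weights):
    """Tutorial layout: weights[0] is the bias, weights[1:] pair with the inputs."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def activation_with_constant_input(x, weights):
    """Same weights, but the bias is the weight of an extra input fixed at 1.0."""
    augmented = [1.0] + list(x)
    return sum(w * xi for w, xi in zip(weights, augmented))


weights = [0.5, -1.0, 2.0]   # hypothetical [bias, w1, w2]
x = [3.0, 1.5]
print(activation_with_bias(x, weights))            # 0.5 - 3.0 + 3.0 = 0.5
print(activation_with_constant_input(x, weights))  # same value
```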
  
  Reply
Efrain October 31, 2019 at 7:54 am #

Hi Jason
I ran your code, but I got different results from yours. Why?

Scores: [50.0, 66.66666666666666, 50.0]
Mean Accuracy: 55.556%

Reply
- Jason Brownlee October 31, 2019 at 8:17 am #
  
  Perhaps confirm you are using Python 2.7 or 3.6?
  Perhaps try running the example a few times?
  
  Reply
Home January 26, 2020 at 1:07 am #

Hi Jason

I am confused about what gets entered into the function on line 19 of the code in section 2?

weights[i + 1] = weights[i + 1] + l_rate * error * row[i]

I’m new to neural networks and am trying to get this code working to understand a Perceptron better before I move on to a Mask R-CNN for body part recognition (for combat sports).

The code works in Python; I have confirmed that. However, as in section 1, I want to understand your math fully. I confirmed it was correct using Excel, but I’m finding it difficult to know what exactly gets plugged into the formula above (as I can’t discern it from the code).

I have the Excel file I’d love to send you, or maybe you can make line 19 clearer to me in a response, possibly by giving me an example.

I appreciate your work here; it has really helped me to date. Looking forward to your response

Thanks

Rory

Reply
- Home January 26, 2020 at 1:22 am #
  
  Could you define the elements in that function for me?
  
  thanks
  
  Reply
  - Jason Brownlee January 26, 2020 at 5:23 am #
    
    Yes.
    
    – weights are the parameters of the model.
    – weights[0] is the bias, like an intercept in regression. An offset.
    – weights[i+1] is a weight for one input variable/column.
    – l_rate is the learning rate, a hyperparameter we set to tune how fast the model learns from the data.
    – error is the prediction error made by the model on a sample
    – row[i] is the value of one input variable/column
    
    I hope that helps.
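To make the list concrete, here is one update with made-up numbers: a single row, a single step, and the arithmetic spelled out in the comments. The statements mirror the tutorial's predict() and the update lines in train_weights(). The weights, row and learning rate are hypothetical.

```python
l_rate = 0.1
weights = [0.1, 0.2, -0.3]      # [bias, w1, w2], hypothetical current values
row = [2.0, 1.0, 0]             # x1 = 2.0, x2 = 1.0, expected label = 0

# Prediction, as in predict():
activation = weights[0]                         # 0.1
for i in range(len(row) - 1):
    activation += weights[i + 1] * row[i]       # + 0.2*2.0 + (-0.3)*1.0  ->  0.2
prediction = 1.0 if activation >= 0.0 else 0.0  # 1.0
error = row[-1] - prediction                    # 0 - 1.0 = -1.0

# Updates, as in train_weights():
weights[0] = weights[0] + l_rate * error        # 0.1 + 0.1*(-1.0) = 0.0
for i in range(len(row) - 1):
    # i = 0: weights[1] = 0.2 + 0.1*(-1.0)*2.0 = 0.0
    # i = 1: weights[2] = -0.3 + 0.1*(-1.0)*1.0 = -0.4
    weights[i + 1] = weights[i + 1] + l_rate * error * row[i]

print(activation, prediction, error, weights)
```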
    
    Reply
- Jason Brownlee January 26, 2020 at 5:20 am #
  
  What are you confused about in that line exactly? Perhaps I can answer your specific question?
  
  Reply
Home January 26, 2020 at 2:52 am #

Jason,

I may have resolved my difficulty understanding the code. Working from the formula, I printed certain variables within the function to understand the math better, and got the following in my Excel sheet:

Wt 0.722472523 0
W[t+1] 0.116618823 0
W[t+2] -0.234181177 1
W[t+3] -0.234181177 1
W[t+4] -0.234181177 1

That is after five epochs. Does this look correct?

Reply
Home January 28, 2020 at 5:18 am #

Jason

I got through the code and implemented it with Python 3.8.1.

Scores: [81.15942028985508, 69.56521739130434, 62.31884057971014]
Mean Accuracy: 71.014%

Increasing the learning rate and the number of epochs increases accuracy.

Great tutorial, thanks.

Reply
- Jason Brownlee January 28, 2020 at 7:58 am #
  
  Well done!
  
  Reply
nn April 3, 2020 at 12:20 pm #

LevelOfViolence CriticsRating Watched
0 1 1.2 -1
1 1 3.5 1
2 1 4.2 1
3 2 3.9 1
4 2 2.8 -1
5 3 3.0 -1
6 5 4.5 -1
7 4 1.8 -1
8 1 2.1 -1
9 3 4.8 1
10 5 4.9 1
11 3 1.5 -1
12 3 2.6 -1

Reply
nn April 3, 2020 at 12:21 pm #

There are three columns: the last one is the label and the first two are xn and yn. How do I implement a perceptron for this?

Reply
- Jason Brownlee April 3, 2020 at 1:16 pm #
  
  Perhaps start with this much simpler library:
  https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
  
  Reply
nn April 10, 2020 at 8:34 am #

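I'm not able to solve the problem; I'm sharing my code here:

import numpy as np

def misclasscified(w_vector, x_vector, train_label):
    # Split the two input features into separate lists
    X1_train = [i[0] for i in x_vector]
    X2_train = [i[1] for i in x_vector]
    mis_classified_list = []
    for j in range(len(train_label)):
        # Activation: bias w_vector[0] plus one weight per input feature
        activation = w_vector[0] + w_vector[1] * X1_train[j] + w_vector[2] * X2_train[j]
        # Step transfer function with class labels -1 and 1
        predicted_label = 1 if activation >= 0 else -1
        if predicted_label != train_label[j]:
            mis_classified_list.append([X1_train[j], X2_train[j]])
    return mis_classified_list

# (LevelOfViolence, CriticsRating) pairs from the table above
train_data = [[1, 1.2], [1, 3.5], [1, 4.2], [2, 3.9], [2, 2.8], [3, 3.0], [5, 4.5],
              [4, 1.8], [1, 2.1], [3, 4.8], [5, 4.9], [3, 1.5], [3, 2.6]]
w_vector = np.random.rand(3)
x_vector = train_data
train_label = [-1, 1, 1, 1, -1, -1, -1, -1, -1, 1, 1, -1, -1]
obj = misclasscified(w_vector, x_vector, train_label)
print(obj)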
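As written, the function only lists the misclassified points; the weights are never updated, so nothing is learned. A minimal sketch of the missing training loop follows, assuming the classic perceptron update for labels in {-1, 1}: for each misclassified row, add l_rate * y to the bias and l_rate * y * x to each input weight. With these labels the tutorial's error-based update points the same way, because error = y - prediction is either 0 or 2 * y. The rows come from the table above; the names predict_pm1 and train_pm1, the learning rate, and the epoch cap are hypothetical choices.

```python
# (LevelOfViolence, CriticsRating, Watched) rows from the table above
data = [
    [1, 1.2, -1], [1, 3.5, 1], [1, 4.2, 1], [2, 3.9, 1], [2, 2.8, -1],
    [3, 3.0, -1], [5, 4.5, -1], [4, 1.8, -1], [1, 2.1, -1], [3, 4.8, 1],
    [5, 4.9, 1], [3, 1.5, -1], [3, 2.6, -1],
]

def predict_pm1(row, weights):
    # weights[0] is the bias; labels are -1 or 1 instead of 0 or 1
    activation = weights[0] + weights[1] * row[0] + weights[2] * row[1]
    return 1 if activation >= 0 else -1

def train_pm1(data, l_rate, n_epoch):
    weights = [0.0, 0.0, 0.0]
    for _ in range(n_epoch):
        mistakes = 0
        for row in data:
            y = row[-1]
            if predict_pm1(row, weights) != y:
                # Move the boundary toward the correct side of this row
                weights[0] += l_rate * y
                weights[1] += l_rate * y * row[0]
                weights[2] += l_rate * y * row[1]
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with every row classified correctly
    return weights

weights = train_pm1(data, l_rate=0.1, n_epoch=10000)
misclassified = [row[:2] for row in data if predict_pm1(row, weights) != row[-1]]
print(weights, misclassified)
```

This particular table happens to be linearly separable (every Watched = 1 row lies above the line CriticsRating = 2.325 + 0.475 * LevelOfViolence and every other row lies below it), so the loop should eventually reach an epoch with no mistakes, although it may take many epochs. On data that is not separable, the epoch cap is what stops training.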

Reply
- Jason Brownlee April 10, 2020 at 8:41 am #
  
  This is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
  
  Reply
Bertie April 11, 2020 at 11:43 pm #

Thanks for a great tutorial! Just a quick question here:
Is it really called Stochastic Gradient Descent when you do not randomly pick a row to update your parameters with? I was under the impression that rows should be picked randomly for it to be correct…
Thanks a bunch =)

Reply
- Jason Brownlee April 12, 2020 at 6:21 am #
  
  Technically “stochastic” GD or “online” GD refers to updating the weights after each row of data, and shuffling the data after each epoch.
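A minimal sketch of the variant described in the reply, where the row order is reshuffled at the start of every epoch and the weights are updated after each individual row. The function name train_weights_shuffled, its seed argument, and the OR-gate training data are hypothetical; the prediction and update steps follow the tutorial's rule (bias in weights[0], class label last in each row).

```python
import random

def predict(row, weights):
    # weights[0] is the bias; the last value in row is the class label
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

def train_weights_shuffled(train, l_rate, n_epoch, seed=1):
    rng = random.Random(seed)
    rows = list(train)  # shuffle a copy so the caller's list order is untouched
    weights = [0.0 for _ in range(len(train[0]))]
    for _ in range(n_epoch):
        rng.shuffle(rows)  # new random visiting order each epoch
        for row in rows:
            error = row[-1] - predict(row, weights)
            # Online update: the weights change after every single row
            weights[0] += l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] += l_rate * error * row[i]
    return weights

# Hypothetical example data: the OR gate (two inputs, then the label)
or_data = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
weights = train_weights_shuffled(or_data, l_rate=0.5, n_epoch=50)
print(weights, [predict(row, weights) for row in or_data])
```

Using a dedicated random.Random instance keeps the shuffling reproducible without touching the global random state.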
  
  Reply
Sai Srinivas August 5, 2020 at 3:46 pm #

Why do we need to multiply by x in the weight update rule?
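One way to see why the input appears in the update: the activation is linear in each weight, so the partial derivative of the activation with respect to weight_i is x_i. Scaling each weight's change by its own input therefore pushes the activation in the direction of the error, and a zero input leaves its weight unchanged because that weight did not contribute to the mistake. The sketch below checks this for one update with made-up numbers; the helper name activation is hypothetical.

```python
def activation(weights, x):
    # weights[0] is the bias; x holds the input values only (no label)
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

weights = [0.0, -0.5, -1.0]  # hypothetical starting weights
x = [2.0, 0.0]               # the second input is zero
l_rate, error = 0.25, 1.0    # expected 1, predicted 0 (activation below zero)

before = activation(weights, x)
# Bias acts like a weight on a constant input of 1; each other weight
# moves by l_rate * error * (its own input value)
new_weights = [weights[0] + l_rate * error] + [
    w + l_rate * error * xi for w, xi in zip(weights[1:], x)
]
after = activation(new_weights, x)

print(before, after, new_weights)
```

With these numbers the activation moves from -1.0 to 0.25, an increase of l_rate * error * (1 + 2.0**2 + 0.0**2) = 1.25, where the 1 comes from the bias. The weight paired with the zero input stays at -1.0.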

Reply
john September 26, 2020 at 3:19 am #

Hello,
Thanks for the good work!

In lines 75-78:

activation = weights[0]
for i in range(len(row)-1):
    activation += weights[i + 1] * row[i]

I believe you should start with activation = weights[0]*row[0], and then activation += weights[i + 1] * row[i+1], otherwise, the dot-product is shifted.

Correct me if I am wrong.

Reply
- Jason Brownlee September 26, 2020 at 6:22 am #
  
  No, 0 is reserved for the bias that has no input.
  
  Perhaps re-read the part of the tutorial where this is mentioned.
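To illustrate the reply: the tutorial's indexing is equivalent to an ordinary dot product if each row is given a constant first input of 1 that pairs with the bias in weights[0]. A small sketch with hypothetical numbers comparing the two forms:

```python
weights = [0.5, 2.0, -1.0]   # bias, then one weight per input (hypothetical)
row = [3.0, 4.0, 1]          # two inputs, then the class label

# Tutorial style: the bias has no input, and weights[i + 1] pairs with row[i]
activation = weights[0]
for i in range(len(row) - 1):
    activation += weights[i + 1] * row[i]

# Equivalent dot product: prepend a constant input of 1 for the bias
# and drop the class label at the end of the row
augmented = [1.0] + row[:-1]
dot = sum(w * x for w, x in zip(weights, augmented))

print(activation, dot)
```

Both forms give 2.5. Starting with weights[0] * row[0] instead would pair the bias with the first input and, at the end of the loop, multiply the last weight by the class label in row[-1].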
  
  Reply
Vaishnavi October 21, 2020 at 6:41 am #

I want to implement an XOR gate using a perceptron in Python. Can I try using a multilayer perceptron where NAND and OR gates are in the hidden layer and an AND gate gives the output? A single-layer perceptron is not giving me the right output. Or is there a faster method?
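As a sketch of the idea in this question: XOR(a, b) equals AND(NAND(a, b), OR(a, b)), and each of those three gates is linearly separable, so each can be a single perceptron even though XOR itself cannot. The weights below are hand-picked assumptions (one of many valid choices), not learned values; training each gate separately with the tutorial's update rule is one way to have the program find its own. The function names are hypothetical.

```python
def step(activation):
    return 1 if activation >= 0 else 0

def perceptron(weights, inputs):
    # weights[0] is the bias, weights[1:] pair with the inputs
    return step(weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)))

# Hand-chosen weights for each gate (assumptions for illustration)
NAND = [1.5, -1.0, -1.0]
OR = [-0.5, 1.0, 1.0]
AND = [-1.5, 1.0, 1.0]

def xor(a, b):
    # Hidden layer: NAND and OR; output layer: AND of the hidden outputs
    hidden = [perceptron(NAND, [a, b]), perceptron(OR, [a, b])]
    return perceptron(AND, hidden)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```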

Reply
- Jason Brownlee October 21, 2020 at 6:47 am #
  
  Why do you want to use logic gates in the perceptron algorithm?
  
  I don’t understand, sorry.
  
  Reply
Vaishnavi October 22, 2020 at 4:36 pm #

I’m a student. I got an assignment to write code for a perceptron network that solves the XOR problem and to analyse the effect of the learning rate. I calculated the weights myself, but I need to write code so that the program updates the weights itself.

Reply
- Jason Brownlee October 23, 2020 at 6:03 am #
  
  Perhaps you can use the above as a starting point.
  
  Reply
Armando February 6, 2021 at 9:14 am #

Thanks for the awesome article. After learning about it, I decided to implement a simpler version for clarity.

It also has 2 parameters and 3 weights, and its purpose is to check whether a point (x, y) is above or below a line.

I added graphical visualizations to see the model learning in action.

https://gist.github.com/amaynez/012f6adab976246e8f8a9e77e00d7989

Reply
- Jason Brownlee February 6, 2021 at 2:22 pm #
  
  Well done.
  
  Reply
Mohamed Ahmed Mohamedzain March 19, 2021 at 8:13 am #

Please,
I’d like code for Perceptron learning of the logic operation AND.

Reply
- Jason Brownlee March 20, 2021 at 5:14 am #
  
  Great, the above tutorial is a good starting point. You will need to prepare a dataset, e.g. rows of data with input and output for the AND logic.
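Following that suggestion, a minimal sketch: the AND truth table written as rows of two inputs followed by the expected output, trained with a perceptron update in the same form as the tutorial's (bias in weights[0], label last in each row). The learning rate of 0.5 and the 10 epochs are hypothetical choices for this tiny dataset.

```python
def predict(row, weights):
    # weights[0] is the bias; the last value in row is the class label
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for _ in range(len(train[0]))]
    for _ in range(n_epoch):
        for row in train:
            error = row[-1] - predict(row, weights)
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
    return weights

# AND logic: two inputs, then the expected output
and_data = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
weights = train_weights(and_data, l_rate=0.5, n_epoch=10)
for row in and_data:
    print(row[:2], predict(row, weights))
```

With this row order the weights stop changing after a few epochs, at [-1.5, 1.0, 0.5], which classifies all four rows correctly.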
  
  Reply
xiaoou wang April 15, 2021 at 7:35 am #

Great tutorial, just passing by to say hello.

l_rate = 0.05 gives 75.36% accuracy, yeah

Reply
- Jason Brownlee April 16, 2021 at 5:26 am #
  
  Well done!
  
  Reply
joey May 22, 2021 at 1:27 am #

Run perceptron_sonar_001.py
Scores: [10.135135135135135, 12.837837837837837, 17.56756756756757]
Mean Accuracy: 13.514%
(on my random dataset)

Reply
- Jason Brownlee May 22, 2021 at 5:34 am #
  
  Ouch.
  
  Reply
Magda September 30, 2021 at 3:31 am #

Thanks for this great tutorial. It helped me to understand and implement my own NN.

Reply
Yasukawa May 30, 2023 at 8:39 am #

Thanks to your code, I could understand the concept of the Perceptron. Thank you very much. However, with the following dataset (which I created myself)
datasets = [
[1.23412321, 2.45612453, 0],
[2.45678234, 4,23411223, 0],
[3.12309812, 1.45615523, 0],
[1.12453278, 3.16782391, 0],
[1.34123190, -0.98713654, 0],
[2.34590238, 3.140243321, 1],
[6.89761151, 1.87644201, 1],
[6.986309341, 2.91381529, 1],
[4.08977865, 1.09812765, 1],
[3.56743452, 1.87655543, 1]
]

the following error occurred: ‘IndexError: list index out of range’

Please kindly give me your advice. Thank you very much

Reply
- James Carmichael May 30, 2023 at 8:51 am #
  
  You are very welcome Yasukawa! The following resource may provide some insight into this error type:
  
  https://www.freecodecamp.org/news/list-index-out-of-range-python-error-solved/
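A likely cause, judging from the dataset as posted: the second row reads 4,23411223 rather than 4.23411223, so Python parses it as two numbers and that row has four values instead of three. If, as in the tutorial, the weights list is sized from the first row, the extra value makes weights[i + 1] run past the end of the list. The helper below, find_bad_rows, is a hypothetical check that flags rows of unexpected length before training.

```python
def find_bad_rows(dataset):
    # Hypothetical helper: report (index, length) for rows whose length
    # differs from the first row's length
    expected = len(dataset[0])
    return [(index, len(row)) for index, row in enumerate(dataset)
            if len(row) != expected]

dataset = [
    [1.23412321, 2.45612453, 0],
    [2.45678234, 4, 23411223, 0],   # comma typo: this row has four values
    [3.12309812, 1.45615523, 0],
]
print(find_bad_rows(dataset))
```

Replacing the comma with a decimal point (4.23411223) should make every row three values long again.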
  
  Reply
sudhir December 6, 2024 at 7:37 pm #

Create a perceptron with the appropriate number of inputs and outputs. Train it using the fixed-increment learning algorithm until no further change in the weights is required.
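A minimal sketch of the fixed-increment rule described here, assuming 0/1 targets and an increment of 1, so every update adds or subtracts the whole input vector (plus 1 for the bias). Training stops after the first full pass over the data in which no weight changes. The NAND gate used as training data, the function names, and the max_epochs safeguard for non-separable data are hypothetical choices.

```python
def predict(weights, inputs):
    # weights[0] is the bias; output 1 if activation >= 0, else 0
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if activation >= 0 else 0

def train_fixed_increment(data, max_epochs=100):
    n_inputs = len(data[0]) - 1
    weights = [0] * (n_inputs + 1)
    for epoch in range(1, max_epochs + 1):
        changed = False
        for *inputs, target in data:
            error = target - predict(weights, inputs)
            if error != 0:
                # Fixed increment of 1: add or subtract the input vector
                weights[0] += error
                for i, x in enumerate(inputs):
                    weights[i + 1] += error * x
                changed = True
        if not changed:
            return weights, epoch  # a full pass with no weight change
    return weights, max_epochs

# NAND gate: two inputs, then the target output
nand_data = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
weights, epochs = train_fixed_increment(nand_data)
print(weights, epochs)
```

Because the increment is a fixed integer and the inputs are 0 or 1, all the arithmetic stays in integers, so the stopping test "no weight changed" is exact.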

Reply

Navigation

How To Implement The Perceptron Algorithm From Scratch In Python

Description

Perceptron Algorithm

Stochastic Gradient Descent

Sonar Dataset

Tutorial

1. Making Predictions

2. Training Network Weights

3. Modeling the Sonar Dataset

Extensions

Review

170 Responses to How To Implement The Perceptron Algorithm From Scratch In Python
