A Gentle Introduction to Particle Swarm Optimization

Particle swarm optimization (PSO) is one of the bio-inspired algorithms, and a simple one, for searching for an optimal solution in the solution space. It differs from other optimization algorithms in that it needs only the objective function and does not depend on the gradient or any differential form of the objective. It also has very few hyperparameters.

In this tutorial, you will learn the rationale of PSO and its algorithm with an example. After completing this tutorial, you will know:

  • What a particle swarm is and how it behaves under the PSO algorithm
  • What kind of optimization problems can be solved by PSO
  • How to solve a problem using particle swarm optimization
  • What are the variations of the PSO algorithm

Kick-start your project with my new book Optimization for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Particle Swarms

Particle Swarm Optimization was proposed by Kennedy and Eberhart in 1995. As mentioned in the original paper, sociobiologists believe a school of fish or a flock of birds that moves in a group “can profit from the experience of all other members”. In other words, while a bird flies around searching randomly for food, for instance, all birds in the flock can share their discoveries and help the entire flock get the best hunt.

Particle Swarm Optimization.
Photo by Don DeBold, some rights reserved.

While we can simulate the movement of a flock of birds, we can also imagine that each bird is helping us find the optimal solution in a high-dimensional solution space, and that the best solution found by the flock is the best solution in the space. This is a heuristic, because we can never prove that the true global optimum is found, and usually it is not. However, we often find that the solution found by PSO is quite close to the global optimum.

Example Optimization Problem

PSO is best used to find the maximum or minimum of a function defined on a multidimensional vector space. Assume we have a function $f(X)$ that produces a real value from a vector parameter $X$ (such as a coordinate $(x,y)$ in a plane), and that $X$ can take on virtually any value in the space (for example, $f(X)$ is the altitude, which is defined for every point on the plane); then we can apply PSO. The PSO algorithm will return the parameter $X$ it found that produces the minimum $f(X)$.

Let’s start with the following function:

$$
f(x,y) = (x-3.14)^2 + (y-2.72)^2 + \sin(3x+1.41) + \sin(4y-1.73)
$$

Plot of f(x,y)

As we can see from the plot above, this function looks like a curved egg carton. It is not a convex function, and therefore it is hard to find its minimum: a local minimum is not necessarily the global minimum.

So how can we find the minimum point of this function? Certainly, we can resort to exhaustive search: if we check the value of $f(x,y)$ for every point on the plane, we can find the minimum point. Or, if we think it is too expensive to search every point, we can randomly sample some points on the plane and see which one gives the lowest value of $f(x,y)$. However, we also note from the shape of $f(x,y)$ that once we have found a point with a small value of $f(x,y)$, it is easier to find an even smaller value in its proximity.

This is what particle swarm optimization does. Similar to the flock of birds looking for food, we start with a number of random points on the plane (call them particles) and let them look for the minimum point in random directions. At each step, every particle should search around the minimum point it has ever found as well as around the minimum point found by the entire swarm of particles. After a certain number of iterations, we take the minimum point ever explored by this swarm of particles as the minimum point of the function.


Algorithm Details

Assume we have $P$ particles, and we denote the position of particle $i$ at iteration $t$ as $X^i(t)$, which in the example above is a coordinate, $X^i(t) = (x^i(t), y^i(t))$. Besides the position, each particle also has a velocity, denoted as $V^i(t)=(v_x^i(t), v_y^i(t))$. At the next iteration, the position of each particle is updated as
$$
X^i(t+1) = X^i(t)+V^i(t+1)
$$
or, equivalently,
$$
\begin{aligned}
x^i(t+1) &= x^i(t) + v_x^i(t+1) \\
y^i(t+1) &= y^i(t) + v_y^i(t+1)
\end{aligned}
$$
and at the same time, the velocities are also updated by the rule
$$
V^i(t+1) =
w V^i(t) + c_1 r_1 (pbest^i - X^i(t)) + c_2 r_2 (gbest - X^i(t))
$$
where $r_1$ and $r_2$ are random numbers between 0 and 1, the constants $w$, $c_1$, and $c_2$ are parameters of the PSO algorithm, $pbest^i$ is the position with the best $f(X)$ value ever explored by particle $i$, and $gbest$ is the best position ever explored by all the particles in the swarm.

Note that $pbest^i$ and $X^i(t)$ are two position vectors, and the difference $pbest^i - X^i(t)$ is a vector subtraction. Adding this difference to the original velocity $V^i(t)$ pulls the particle back toward the position $pbest^i$. The same applies to the difference $gbest - X^i(t)$.

Vector subtraction.
Diagram by Benjamin D. Esham, public domain.

We call the parameter $w$ the inertia weight constant. It is between 0 and 1 and determines how much the particle should keep its previous velocity (i.e., the speed and direction of the search). The parameters $c_1$ and $c_2$ are called the cognitive and the social coefficients, respectively. They control how much weight is given to refining the particle’s own search result versus following the search result of the swarm. We can consider these parameters as controlling the trade-off between exploration and exploitation.

The positions $pbest^i$ and $gbest$ are updated in each iteration to reflect the best position ever found thus far.

One interesting property of this algorithm, which distinguishes it from other optimization algorithms, is that it does not depend on the gradient of the objective function. In gradient descent, for example, we look for the minimum of a function $f(X)$ by moving $X$ in the direction of $-\nabla f(X)$, as that is where the function decreases the fastest. For any particle at position $X$ at the moment, how it moves does not depend on which direction is downhill but only on where $pbest$ and $gbest$ are. This makes PSO particularly suitable when differentiating $f(X)$ is difficult.

Another property of PSO is that it can be parallelized easily. As we are manipulating multiple particles to find the optimal solution, each particle can be updated in parallel, and we only need to collect the updated value of $gbest$ once per iteration. This makes a map-reduce architecture a perfect candidate for implementing PSO.
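As a rough, self-contained illustration of this idea (the `evaluate` helper and the pool setup below are assumptions of this sketch, not part of the tutorial’s code), the objective can be evaluated for all particles in parallel as a “map” step, with only the reduction to $gbest$ done serially once per iteration:

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def f(x, y):
    "Objective function (same as the example above)"
    return (x - 3.14)**2 + (y - 2.72)**2 + np.sin(3*x + 1.41) + np.sin(4*y - 1.73)

def evaluate(position):
    # "map" step: score one particle independently of all the others
    x, y = position
    return f(x, y)

if __name__ == "__main__":
    X = np.random.rand(2, 20) * 5            # 20 particles in the region [0,5]x[0,5]
    with ProcessPoolExecutor() as pool:
        obj = np.array(list(pool.map(evaluate, X.T)))
    # "reduce" step: collect the single global best once per iteration
    gbest = X[:, obj.argmin()]
    gbest_obj = obj.min()
```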

Implementation

Here we show how we can implement PSO to find the optimal solution.

For the same function we showed above, we can first define it as a Python function and show it in a contour plot:
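The following is a minimal sketch of this step, assuming NumPy and Matplotlib (the 100×100 grid resolution is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    "The objective function defined above"
    return (x - 3.14)**2 + (y - 2.72)**2 + np.sin(3*x + 1.41) + np.sin(4*y - 1.73)

# Evaluate f on a grid over 0 <= x, y <= 5 and draw the contour plot
x, y = np.meshgrid(np.linspace(0, 5, 100), np.linspace(0, 5, 100))
z = f(x, y)
plt.contourf(x, y, z, levels=20)
plt.colorbar()
plt.show()
```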

Contour plot of f(x,y)

Here we plotted the function $f(x,y)$ in the region of $0\le x,y\le 5$. We can create 20 particles at random locations in this region, together with random velocities sampled from a normal distribution with mean 0 and standard deviation 0.1, as follows:
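A sketch of this initialization, continuing from the code above (the fixed random seed is only to make the run reproducible):

```python
# 20 particles at random positions in [0,5]x[0,5];
# velocities drawn from a normal distribution with mean 0 and std 0.1
n_particles = 20
np.random.seed(100)
X = np.random.rand(2, n_particles) * 5     # positions, one column per particle
V = np.random.randn(2, n_particles) * 0.1  # velocities
```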

We can then show their positions on the same contour plot:

Initial position of particles

From this, we can already find $gbest$ as the best position found by all the particles. Since the particles have not explored at all, their current positions are also their $pbest^i$:
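A sketch of this step, keeping the same array layout as above:

```python
# Each particle's personal best starts at its initial position
pbest = X.copy()
pbest_obj = f(X[0], X[1])             # objective value at each personal best
gbest = pbest[:, pbest_obj.argmin()]  # best position found by the whole swarm
gbest_obj = pbest_obj.min()           # best objective value found so far
```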

The vector pbest_obj is the best value of the objective function found by each particle. Similarly, gbest_obj is the best scalar value of the objective function ever found by the swarm. We are using the min() and argmin() functions here because we set this up as a minimization problem. The position of gbest is marked as a star below.

Position of gbest is marked as a star

Let’s set $c_1=c_2=0.1$ and $w=0.8$. Then we can update the positions and velocities according to the formula we mentioned above, and then update $pbest^i$ and $gbest$ afterwards:
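A sketch of one such iteration follows; as in the formula above, a single pair of random numbers $r_1, r_2$ is drawn for the whole swarm in each iteration (drawing them per particle is also common):

```python
# One iteration of PSO
c1 = c2 = 0.1
w = 0.8
r1, r2 = np.random.rand(2)
V = w * V + c1*r1*(pbest - X) + c2*r2*(gbest.reshape(-1, 1) - X)
X = X + V
obj = f(X[0], X[1])
# Update personal bests where the new position improves on the old one
pbest[:, (pbest_obj >= obj)] = X[:, (pbest_obj >= obj)]
pbest_obj = np.array([pbest_obj, obj]).min(axis=0)
# Update the global best
gbest = pbest[:, pbest_obj.argmin()]
gbest_obj = pbest_obj.min()
```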

The following shows the positions after the first iteration. We mark the best position of each particle with a black dot to distinguish it from their current positions, which are shown in blue.

Positions of particles after one iteration

We can repeat the above code segment multiple times and see how the particles explore. This is the result after the second iteration:

Positions of particles after two iterations

and this is after the 5th iteration; note that the position of $gbest$, denoted by the star, has changed:

Positions of particles after 5 iterations

and after the 20th iteration, we are already very close to the optimum:

Positions of particles after 20 iterations

This is an animation showing how we found the optimal solution as the algorithm progressed. See if you can find some resemblance to the movement of a flock of birds:

Animation of particle movements

So how close is our solution? In this particular example, the global minimum we found by exhaustive search is at the coordinate $(3.182,3.131)$, and the one found by the PSO algorithm above is at $(3.185,3.130)$.

Variations

All PSO algorithms are mostly the same as described above. In the example above, we set PSO to run for a fixed number of iterations. It is trivial to decide the number of iterations dynamically in response to the progress. For example, we can make it stop once we cannot see any improvement to the global best solution $gbest$ for a number of iterations.
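A minimal sketch of such a stopping rule, assuming the `update()` function and the `gbest_obj` variable from the complete example below (the patience of 20 iterations is an arbitrary choice):

```python
# Stop once gbest has not improved for `patience` consecutive iterations
patience = 20
stall = 0
best_so_far = gbest_obj
while stall < patience:
    update()                    # one PSO iteration, as in the complete example
    if gbest_obj < best_so_far:
        best_so_far = gbest_obj
        stall = 0
    else:
        stall += 1
```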

Research on PSO has mostly focused on how to determine the hyperparameters $w$, $c_1$, and $c_2$, or on varying their values as the algorithm progresses. For example, there are proposals that make the inertia weight decrease linearly. There are also proposals that try to decrease the cognitive coefficient $c_1$ while increasing the social coefficient $c_2$, to bring more exploration at the beginning and more exploitation at the end. See, for example, Shi and Eberhart (1998) and Eberhart and Shi (2000).
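As an illustration of such schedules (the start and end values below are common choices from the literature, not prescriptions):

```python
# Linearly varying hyperparameters over max_iter iterations
max_iter = 100
for t in range(max_iter):
    w  = 0.9 - (0.9 - 0.4) * t / max_iter   # inertia weight decreases from 0.9 to 0.4
    c1 = 2.5 - (2.5 - 0.5) * t / max_iter   # cognitive coefficient decreases from 2.5 to 0.5
    c2 = 0.5 + (2.5 - 0.5) * t / max_iter   # social coefficient increases from 0.5 to 2.5
    # ... perform the PSO velocity and position update with these values ...
```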

Complete Example

It should be easy to see how we can change the above code to solve a higher-dimensional objective function, or to switch from minimization to maximization. The following is a complete example of finding the minimum point of the function $f(x,y)$ proposed above, together with code to generate the plot animation:
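The listing below is a self-contained sketch that puts the pieces above together; the animation settings (49 frames, the "pillow" writer, the output file name PSO.gif) and the figure styling are choices of this sketch rather than fixed parts of the algorithm:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

def f(x, y):
    "Objective function"
    return (x - 3.14)**2 + (y - 2.72)**2 + np.sin(3*x + 1.41) + np.sin(4*y - 1.73)

# Grid for contour plotting and for locating the minimum by exhaustive search
x, y = np.meshgrid(np.linspace(0, 5, 100), np.linspace(0, 5, 100))
z = f(x, y)
x_min = x.ravel()[z.argmin()]
y_min = y.ravel()[z.argmin()]

# Hyperparameters
c1 = c2 = 0.1
w = 0.8

# Initialize the swarm: positions in [0,5]x[0,5], small random velocities
n_particles = 20
np.random.seed(100)
X = np.random.rand(2, n_particles) * 5
V = np.random.randn(2, n_particles) * 0.1

# Initialize personal bests and the global best
pbest = X.copy()
pbest_obj = f(X[0], X[1])
gbest = pbest[:, pbest_obj.argmin()]
gbest_obj = pbest_obj.min()

def update():
    "One iteration of PSO"
    global V, X, pbest, pbest_obj, gbest, gbest_obj
    r1, r2 = np.random.rand(2)
    V = w * V + c1*r1*(pbest - X) + c2*r2*(gbest.reshape(-1, 1) - X)
    X = X + V
    obj = f(X[0], X[1])
    pbest[:, (pbest_obj >= obj)] = X[:, (pbest_obj >= obj)]
    pbest_obj = np.array([pbest_obj, obj]).min(axis=0)
    gbest = pbest[:, pbest_obj.argmin()]
    gbest_obj = pbest_obj.min()

# Set up the contour plot that the animation will draw on
fig, ax = plt.subplots(figsize=(8, 6))
ax.contourf(x, y, z, levels=20, alpha=0.8)
ax.plot([x_min], [y_min], marker='x', markersize=8, color="white")
p_plot = ax.scatter(X[0], X[1], marker='o', color='blue')
pbest_plot = ax.scatter(pbest[0], pbest[1], marker='o', color='black', alpha=0.5)
gbest_plot = ax.scatter([gbest[0]], [gbest[1]], marker='*', s=100, color='red')
ax.set_xlim([0, 5])
ax.set_ylim([0, 5])

def animate(i):
    "Run one iteration and redraw the particles"
    update()
    ax.set_title('Iteration {:02d}'.format(i))
    p_plot.set_offsets(X.T)
    pbest_plot.set_offsets(pbest.T)
    gbest_plot.set_offsets(gbest.reshape(1, -1))
    return p_plot, pbest_plot, gbest_plot

anim = FuncAnimation(fig, animate, frames=list(range(1, 50)),
                     interval=250, blit=False, repeat=True)
anim.save("PSO.gif", dpi=120, writer="pillow")

print("PSO found best at f({})={}".format(gbest, gbest_obj))
print("Global optimal at f([{} {}])={}".format(x_min, y_min, f(x_min, y_min)))
```

Running this script saves the animation to PSO.gif and prints the best position found by the swarm alongside the grid minimum for comparison.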

Further Reading

These are the original papers that proposed the particle swarm optimization, and the early research on refining its hyperparameters:

  • Kennedy J. and Eberhart R. C. Particle swarm optimization. In Proceedings of the International Conference on Neural Networks; Institute of Electrical and Electronics Engineers. Vol. 4. 1995. pp. 1942–1948. DOI: 10.1109/ICNN.1995.488968
  • Eberhart R. C. and Shi Y. Comparing inertia weights and constriction factors in particle swarm optimization. In Proceedings of the 2000 Congress on Evolutionary Computation (CEC ‘00). Vol. 1. 2000. pp. 84–88. DOI: 10.1109/CEC.2000.870279
  • Shi Y. and Eberhart R. A modified particle swarm optimizer. In Proceedings of the IEEE International Conference on Evolutionary Computation, 1998. pp. 69–73. DOI: 10.1109/ICEC.1998.699146

Summary

In this tutorial, you learned:

  • How particle swarm optimization works
  • How to implement the PSO algorithm
  • Some possible variations in the algorithm

As particle swarm optimization does not have many hyperparameters and is very permissive about the objective function, it can be used to solve a wide range of problems.


63 Responses to A Gentle Introduction to Particle Swarm Optimization

  1. Iraj September 16, 2021 at 2:01 pm

    Very good article Jason. thanks for sharing.
    We can use it to find and set the initial weights of a neural network to save training time. And use bep to find global minimum where weights are optimized and the error in minimum.
    Gradient descent can do that but we have error vanishing issue and training time is a lot in complex networks, so instead of initializing the network with some random weights we can start with the optimized weights from pso and bep will do rest of the job.
    I have done it on several projects and worked fine.

    • Kiran Lonikar March 7, 2022 at 5:42 pm

      I agree, you can use it to initialize weights of neural networks, and hopefully train in less epochs. Can you tell if you have done it using some popular libraries like tensorflow or pytorch?

  2. Bibhuti September 17, 2021 at 12:24 pm

    How to combine pso with ann Or with svm for regression task. Or any other machine learning algorithm.
    I guess can we use loss function as object function .

    • Adrian Tam September 25, 2021 at 5:35 am

      Yes, you wrap the entire model as the objective function then apply your favorite optimization algorithm to it.

      • Amr Rashed September 17, 2023 at 10:51 pm

        Thank you, sir. This article has been incredibly informative. Do you have an example illustrating how we can apply PSO to our ML model?

  3. Adrian Tam September 19, 2021 at 6:09 am

    PSO is an optimization method. For example, you can use it to search for optimal hyperparameters: simply wrap your machine learning model as the objective function and apply PSO to its parameters.

  4. Jeremy September 24, 2021 at 9:23 am

    Adrian,

    Outstanding article! Particle swarming techniques are becoming a big area of interest at work and this is a fantastic primer on the topic!

    • Adrian Tam September 24, 2021 at 9:36 am

      Glad you like it!

  5. Firas Obeid September 24, 2021 at 10:02 am

    Thank you Jason! I have been searching for this for a long time to replace it instead of backprop.

    • Adrian Tam September 25, 2021 at 4:13 am

      Hope you like it!

  6. Ron Johnson September 25, 2021 at 2:02 am

    Thank YOU!

    Once again, you not only covered the topic very precisely but you also created an impressive demonstration on how the PSO algorithm functions. Hats off to you for breaking it down the way you did!!

    • Adrian Tam September 25, 2021 at 4:36 am

      Thanks. Glad you enjoyed it.

  7. Dipesh Somvanshi October 9, 2021 at 9:15 pm

    Can you explain this fancy boolean indexing?

    pbest[:, (pbest_obj >= obj)] = X[:, (pbest_obj >= obj)]

    • Adrian Tam October 13, 2021 at 5:45 am

      If X is a vector, (X > n) will give you a boolean vector for “true” whenever element of X greater than n and “false” otherwise. Then Y[(X > n)] will select the elements from Y on the position that (X > n) is “true”. This essentially picks a sub-vector or sub-matrix from Y according to the boolean value.

  8. Jack October 11, 2021 at 11:26 am

    Dear Adrian Tam
    Thank you for your wonderful your blog and how to combine pso and machine learning algorithm in other dataset? like xgboost

    • Adrian Tam October 13, 2021 at 7:19 am

      Depends on why you want to do that. If you are trying to explore the hyperparameters for XGBoost, you can wrap your XGBoost model as a function, where the inputs are the hyperparameters and the output is the score. Then feed this function into the optimization to find the best-performing hyperparameters.

  9. Jeremy January 7, 2022 at 8:15 am

    Adrian,

    Good afternoon! If you have a minute, I’m wondering if you can help expand on what you have. Say I want to have the particles, instead of searching for the ‘x’, chase after a moving ‘x’. Is there a relatively efficient way to make that happen? It seems like I could keep the data initialization section the same for simplicity. Then, maybe define points of a square, for example, and use those four points for pbest_obj and gbest_obj?

    This is a great piece of code! I’ve gotten a lot out of it by modifying the objective function and hyperparameters to get a good idea of how the algorithm works. Thank you very much for posting this and for your time!

  10. Erfan March 11, 2022 at 4:16 am

    Dear Adrian,

    absolutely useful article. Especially the graphical part! I only noticed a very simple correction in the following line:
    pbest_obj = np.array([pbest_obj, obj]).max(axis=0)
    I guess max(axis=0) should change to min(axis=0) as we are going to find the minimum of the objective function. You already have corrected it in the “Complete Example” part.

    Thanks again,

    • James Carmichael March 11, 2022 at 1:01 pm

      Great feedback Erfan!

  11. Betty April 12, 2022 at 8:31 am

    How can I set the constraint and number of iterations?

  12. Ryan April 21, 2022 at 10:56 pm

    Hi,
    Thanks for the tutorial. Good job!
    I am working on a dataset to find the accuracy of Hand Gesture classification using different algorithms like KNN and SVM (using the sklearn libraries in python). Now I have to optimize the accuracy found by the SVM using Particle Swarm optimization algorithm. But I don’t know where to begin with with this optimization problem following your guide to PSO.

    I have my implementation of the SVM part done, and now I want to optimize it using the PSO.

    Any help regarding this matter will be really appreciated. Or even if you can point me in the right direction, that would also be helpful.

    Thanks again.

    • James Carmichael April 22, 2022 at 1:02 am

      Hi Ryan…Please specify exactly what you are trying to accomplish and/or items from the tutorial that you are not clear how to apply.

      • Ryan April 22, 2022 at 8:54 am

        Thanks for the answer,

        I will try to explain from the beginning.

        I have a dataset in a CSV format, that contains information about the hand gestures in the American sign language, extracted in the form of features and it contains decimal values(no strings). As a reference I have attached the reference to a paper for better understanding or explain what I am trying to do.

        https://revistas.usal.es/index.php/2255-2863/article/view/ADCAIJ2021102123136

        Now, I want to calculate that which ANN algorithm variant (like k nearest neighbours, Support Vector Machines, Naive Bayes, Decision Tree etc.) gives me the best accuracy on this dataset. So, using the built in libraries in Python(numpy, pandas, sklearn), I created a python code, split my data into training and testing, and applied the algorithms e.g. SVM on my dataset and got the accuracy of 75%. Now, I would like to improve this accuracy using optimization algorithms like PSO or Genetic Algorihtm. For that part I need help. Like.., how do I optimize the accuracy of SVM using particle swarm optimazation in the python code.

        You have a sample PSO code in python above. How do I follow your example and get the best possible results for my dataset on which I have applied the SVM algorithm…?

        Thanks again for taking your time out to read my comment.

  13. Ryan April 23, 2022 at 8:32 pm

    Awaiting your kind response.

    Thanks

  14. Monish April 24, 2022 at 3:56 pm

    How can i add constraints to the function?

  15. Ryan May 5, 2022 at 7:59 am

    Hi James,

    Can you please take some time to comment on my question that I asked above. Would really appreciate your help.

    Thanks

  16. Majid May 27, 2022 at 4:38 am

    Amazing demonstration, Thank you kind sir
    a small problem under 3rd plot, in the code snippet I believe in line 9, max(axis=0) should be changed to min(axis=0), like the complete example code, line 42

    much appreciated again, excellent explanation.

  17. Keith Anthony August 30, 2022 at 7:29 am

    Do you have an example multiobjective problem using PSO (or another, like NSGA-III, or MOEA/D),
    or perhaps using pyMoo, playpus, or jMetalPy?

    • James Carmichael August 31, 2022 at 5:45 am

      Hi Keith…We do not currently have examples directly related to those concepts.

      • Peter January 23, 2023 at 8:36 am

        Do you have any suggestions, or references, on how PSO could be applied to a multi-objective optimisation problem please?

  18. noor September 28, 2022 at 6:29 am

    Thank you very much , i have question please tell me how can i use this example to make optimization on schedule on Gantt chart.

    • James Carmichael September 28, 2022 at 6:47 am

      Hi noor…You are very welcome! Please clarify the specific goals of your model and the nature of your input so that we may better assist you.

      • Noor September 28, 2022 at 6:35 pm

        How do I apply the code above as an optimization to the Gantt Chart

      • noor September 29, 2022 at 11:53 pm

        Or, How can the code be combined with the Gantt chart?

      • noor September 30, 2022 at 12:19 am

        ؟؟؟؟ help me

  19. sara September 30, 2022 at 12:23 am

    I am need to use the code with Gantt chart scheduling of PSO method , how can i do it ??

  20. sara September 30, 2022 at 9:48 pm

    thank you so much, it’s helpful but can I get the code for this pepper ?

  21. sara September 30, 2022 at 9:50 pm

    ***paper

  22. Siddharth Agrawal October 30, 2022 at 7:24 pm

    Hi, I have a question when it comes to solving any optimization problem. I am currently working on the budget spend optimization for various medias. I have historical data in the format x1, x2, x3,x4 and y where x1, x2, x3 and x4 are the input variables and are the spend per channel and y is the output that is the total revenue achieved based on the spend of x1 + x2 + x3 + x4. The biggest question is how to derive the objective function from the historical performance data ? Could you please help me with this. I have been trying to figure out from the last few days but no luck yet. 🙁

    • Sandra December 2, 2022 at 1:28 pm

      Hi
      I need to use pso to Model keras conv1d
      How add this to model ? To optimize hyper prameter

  23. Siddharth Agrawal October 30, 2022 at 7:28 pm

    Hi, I have been trying to solve the marketing spend optimization problem using cvxpy library. I have the input as x1,x2,x3 and x4 with output y. So, x1, x2,x3 and x4 are the spend per media channel (we have 4 channels) and then y is the total revenue achieved due to the x1 + x2 + x3 + x4 spend combined. I also have individual channel level revenue for all the 4 channels. Now my biggest challenge is how to derive the objective function to solve such problems ? Could you share any thoughts here or any guidance will be really very helpful.

  24. Sandra December 3, 2022 at 7:24 pm

    Hi
    I need to use pso to Model keras conv1d
    How add this to model ? To optimize hyper prameter

  25. Firman December 10, 2022 at 4:22 pm

    Do you have Python code for this case but using the simulated annealing algorithm?

  26. Lakshmi Kanta Barman January 16, 2023 at 4:10 am

    Can you help me by discussing All Members Based Optimization technique in step by step with a small example. Thanks in advance.

  27. Elia January 23, 2023 at 11:21 am

    Hi
    Thanks ???? for everything
    I ask how I can use PSO to optimize the hyperparameter filter for a CNN1D? What is the code for this, if you can help me please?

    • Adrian Tam January 24, 2023 at 2:27 am

      It may not be easily done in a few lines of code, but I can share the method: create a function for the CNN such that it takes the hyperparameters as input and some evaluation metric as output (e.g., the MSE on a fixed test dataset). In the function, you build the CNN using the hyperparameters, do the training, and perform the evaluation. Then apply PSO to this function, where PSO changes the hyperparameters and observes the function’s output.

      Note that, since each iteration of PSO creates multiple CNNs and evaluates them, there is a lot of computation involved. You need to be patient.

  28. Dean February 24, 2023 at 7:28 pm

    Thank you, Jason, it’s an awesome article, and I began to learn the algorithm from your work.
    I am curious about the code below:

    pbest_obj = np.array([pbest_obj, obj]).max(axis=0)

    I think it’s maybe a mistake, and the function you want to use is min(axis=0)

    • James Carmichael February 25, 2023 at 10:16 am

      Hi Dean…You are very welcome! Please elaborate on why you disagree with the original code.

  29. MBT February 26, 2023 at 4:59 am

    This website is by far the best website for learning optimization, Python, and machine learning. I found that there are at least 12 optimization algorithms in Python. Is there any code for ICA? Imperialist competitive algorithm?

    • Mebarka Allaoui November 4, 2023 at 11:07 pm

    Thank you, Adrian Tam, for this post. It is really helpful. I have a question. Could you please help me apply the PSO algorithm to the kullback-leibler divergence function?
