Differential Evolution from Scratch in Python

Differential evolution is a heuristic approach for the global optimisation of nonlinear and non- differentiable continuous space functions.

The differential evolution algorithm belongs to a broader family of evolutionary computing algorithms. Similar to other popular direct search approaches, such as genetic algorithms and evolution strategies, the differential evolution algorithm starts with an initial population of candidate solutions. These candidate solutions are iteratively improved by introducing mutations into the population, and retaining the fittest candidate solutions that yield a lower objective function value.

The differential evolution algorithm is advantageous over the aforementioned popular approaches because it can handle nonlinear and non-differentiable multi-dimensional objective functions, while requiring very few control parameters to steer the minimisation. These characteristics make the algorithm easier and more practical to use.

In this tutorial, you will discover the differential evolution algorithm for global optimisation.

After completing this tutorial, you will know:

  • Differential evolution is a heuristic approach for the global optimisation of nonlinear and non- differentiable continuous space functions.
  • How to implement the differential evolution algorithm from scratch in Python.
  • How to apply the differential evolution algorithm to a real-valued 2D objective function.

Kick-start your project with my new book Optimization for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • June/2021: Fixed mutation operation in the code to match the description.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Differential Evolution
  2. Differential Evolution Algorithm From Scratch
  3. Differential Evolution Algorithm on the Sphere Function

Differential Evolution

Differential evolution is a heuristic approach for the global optimisation of nonlinear and non- differentiable continuous space functions.

For a minimisation algorithm to be considered practical, it is expected to fulfil five different requirements:

(1) Ability to handle non-differentiable, nonlinear and multimodal cost functions.
(2) Parallelizability to cope with computation intensive cost functions.
(3) Ease of use, i.e. few control variables to steer the minimization. These variables should
also be robust and easy to choose.
(4) Good convergence properties, i.e. consistent convergence to the global minimum in
consecutive independent trials.

A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997.

The strength of the differential evolution algorithm stems from the fact that it was designed to fulfil all of the above requirements.

Differential Evolution (DE) is arguably one of the most powerful and versatile evolutionary optimizers for the continuous parameter spaces in recent times.

Recent advances in differential evolution: An updated survey, 2016.

The algorithm begins by randomly initiating a population of real-valued decision vectors, also known as genomes or chromosomes. These represent the candidates solutions to the multi- dimensional optimisation problem.

At each iteration, the algorithm introduces mutations into the population to generate new candidate solutions. The mutation process adds the weighted difference between two population vectors to a third vector, to produce a mutated vector. The parameters of the mutated vector are again mixed with the parameters of another predetermined vector, the target vector, during a process known as crossover that aims to increase the diversity of the perturbed parameter vectors. The resulting vector is known as the trial vector.

DE generates new parameter vectors by adding the weighted difference between two population vectors to a third vector. Let this operation be called mutation.
In order to increase the diversity of the perturbed parameter vectors, crossover is introduced.

A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997.

These mutations are generated according to a mutation strategy, which follows a general naming convention of DE/x/y/z, where DE stands for Differential Evolution, while x denotes the vector to be mutated, y denotes the number of difference vectors considered for the mutation of x, and z is the type of crossover in use. For instance, the popular strategies:

  • DE/rand/1/bin
  • DE/best/2/bin

Specify that vector x can either be picked randomly (rand) from the population, or else the vector with the lowest cost (best) is selected; that the number of difference vectors under consideration is either 1 or 2; and that crossover is performed according to independent binomial (bin) experiments. The DE/best/2/bin strategy, in particular, appears to be highly beneficial in improving the diversity of the population if the population size is large enough.

The usage of two difference vectors seems to improve the diversity of the population if the number of population vectors NP is high enough.

A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997.

A final selection operation replaces the target vector, or the parent, by the trial vector, its offspring, if the latter yields a lower objective function value. Hence, the fitter offspring now becomes a member of the newly generated population, and subsequently participates in the mutation of further population members. These iterations continue until a termination criterion is reached.

The iterations continue till a termination criterion (such as exhaustion of maximum functional evaluations) is satisfied.

Recent advances in differential evolution: An updated survey, 2016.

The differential evolution algorithm requires very few parameters to operate, namely the population size, NP, a real and constant scale factor, F ∈ [0, 2], that weights the differential variation during the mutation process, and a crossover rate, CR ∈ [0, 1], that is determined experimentally. This makes the algorithm easy and practical to use.

In addition, the canonical DE requires very few control parameters (3 to be precise: the scale factor, the crossover rate and the population size) — a feature that makes it easy to use for the practitioners.

Recent advances in differential evolution: An updated survey, 2016.

There have been further variants to the canonical differential evolution algorithm described above,
which one may read on in Recent advances in differential evolution – An updated survey, 2016.

Now that we are familiar with the differential evolution algorithm, let’s look at how to implement it from scratch.

Want to Get Started With Optimization Algorithms?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Differential Evolution Algorithm From Scratch

In this section, we will explore how to implement the differential evolution algorithm from scratch.
The differential evolution algorithm begins by generating an initial population of candidate solutions. For this purpose, we shall use the rand() function to generate an array of random values sampled from a uniform distribution over the range, [0, 1).

We will then scale these values to change the range of their distribution to (lower bound, upper bound), where the bounds are specified in the form of a 2D array with each dimension corresponding to each input variable.

It is within these same bounds that the objective function will also be evaluated. An objective function of choice and the bounds on each input variable may, therefore, be defined as follows:

We can evaluate our initial population of candidate solutions by passing it to the objective function as input argument.

We shall be replacing the values in obj_all with better ones as the population evolves and converges towards an optimal solution.

We can then loop over a predefined number of iterations of the algorithm, such as 100 or 1,000, as specified by parameter, iter, as well as over all candidate solutions.

The first step of the algorithm iteration performs a mutation process. For this purpose, three random candidates, a, b and c, that are not the current one, are randomly selected from the population and a mutated vector is generated by computing: a + F * (b – c). Recall that F ∈ [0, 2] and denotes the mutation scale factor.

The mutation process is performed by the function, mutation, to which we pass a, b, c and F as input arguments.

Since we are operating within a bounded range of values, we need to check whether the newly mutated vector is also within the specified bounds, and if not, clip its values to the upper or lower limits as necessary. This check is carried out by the function, check_bounds.

The next step performs crossover, where specific values of the current, target, vector are replaced by the corresponding values in the mutated vector, to create a trial vector. The decision of which values to replace is based on whether a uniform random value generated for each input variable falls below a crossover rate. If it does, then the corresponding values from the mutated vector are copied to the target vector.

The crossover process is implemented by the crossover() function, which takes the mutated and target vectors as input, as well as the crossover rate, cr ∈ [0, 1], and the number of input variables.

A final selection step replaces the target vector by the trial vector if the latter yields a lower objective function value. For this purpose, we evaluate both vectors on the objective function and subsequently perform selection, storing the new objective function value in obj_all if the trial vector is found to be the fittest of the two.

We can tie all steps together into a differential_evolution() function that takes as input arguments the population size, the bounds of each input variable, the total number of iterations, the mutation scale factor and the crossover rate, and returns the best solution found and its evaluation.

Now that we have implemented the differential evolution algorithm, let’s investigate how to use it to optimise an objective function.

Differential Evolution Algorithm on the Sphere Function

In this section, we will apply the differential evolution algorithm to an objective function.
We will use a simple two-dimensional sphere objective function specified within the bounds, [-5, 5]. The sphere function is continuous, convex and unimodal, and is characterised by a single global minimum at f(0, 0) = 0.0.

We will minimise this objective function with the differential evolution algorithm, based on the strategy DE/rand/1/bin.

In order to do so, we must define values for the algorithm parameters, specifically for the population size, the number of iterations, the mutation scale factor and the crossover rate. We set these values empirically to, 10, 100, 0.5 and 0.7 respectively.

We also define the bounds of each input variable.

Next, we carry out the search and report the results.

Tying this all together, the complete example is listed below.

Running the example reports the progress of the search including the iteration number, and the response from the objective function each time an improvement is detected.

At the end of the search, the best solution is found and its evaluation is reported.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the algorithm converges very close to f(0.0, 0.0) = 0.0 in about 33 improvements out of 100 iterations.

We can plot the objective function values returned at every improvement by modifying the differential_evolution() function slightly to keep track of the objective function values and return this in the list, obj_iter.

We can then create a line plot of these objective function values to see the relative changes at every improvement during the search.

Tying this together, the complete example is listed below.

Running the example creates a line plot.

The line plot shows the objective function evaluation for each improvement, with large changes initially and very small changes towards the end of the search as the algorithm converged on the optima.

Line Plot of Objective Function Evaluation for Each Improvement During the Differential Evolution Search

Line Plot of Objective Function Evaluation for Each Improvement During the Differential Evolution Search

Want to Get Started With Optimization Algorithms?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

Books

Articles

Summary

In this tutorial, you discovered the differential evolution algorithm.
Specifically, you learned:

  • Differential evolution is a heuristic approach for the global optimisation of nonlinear and non- differentiable continuous space functions.
  • How to implement the differential evolution algorithm from scratch in Python.
  • How to apply the differential evolution algorithm to a real-valued 2D objective function.

Get a Handle on Modern Optimization Algorithms!

Optimization for Maching Learning

Develop Your Understanding of Optimization

...with just a few lines of python code

Discover how in my new Ebook:
Optimization for Machine Learning

It provides self-study tutorials with full working code on:
Gradient Descent, Genetic Algorithms, Hill Climbing, Curve Fitting, RMSProp, Adam, and much more...

Bring Modern Optimization Algorithms to
Your Machine Learning Projects


See What's Inside

30 Responses to Differential Evolution from Scratch in Python

  1. Avatar
    Jim Winburn June 16, 2021 at 11:58 am #

    Hi, I’m getting a list index out of range on pyplot.plot(solution[2], ‘.-‘)
    Any ideas?

  2. Avatar
    JG June 18, 2021 at 8:07 pm #

    Hi Jason,

    Thank you for the tutorial. I see so much future on these Differential Evolution and Genetic Algorithms to look for global optimization !

    I did not have so much time to play with this tutorial about Differential Evolution code, but I already play with the scipy API : differential_evolution(). So I think this tutorial is just talking about buiding your own code (from scratch) algorithm. ok?

    Anyway I give more attention to your previous: Genetic Algorithm (from scratch) :https://machinelearningmastery.com/simple-genetic-algorithm-from-scratch-in-python/

    So, the first question I see both of them (Genetic Algorithm and Differential Evolution code) very very similar in process if not completely equal.
    The only thing I see is that the mutation process is performed over a binary version on variables in genetic algorithm and here in DE not . Also I see the pattern strategy DE/x/y/z ..Am I right?

    The second question is due to this process (genetic & diff evolution) are focus into get global optimum …and are so quickly, stables and efficient convergents to the global optimum…. that I guest someone must be playing with them to “replace” back-propagation or Stochastic Gradient descent to have a new option to get ML/DL model weights’s update, am I right?

    thanks

    • Avatar
      Jason Brownlee June 19, 2021 at 5:53 am #

      How new solutions are created and the representation used are the main differences.

      Yes, but generally backprop/SGD is a better solution for neural bets.

  3. Avatar
    WF June 18, 2021 at 9:15 pm #

    Hi,

    the text says for the mutation a + F * (b – c), but the code is written as

    def mutation(x, F):
    return x[0] + F * x[1] – x[2]

    cheers

  4. Avatar
    Anthony The Koala June 18, 2021 at 9:34 pm #

    Dear Dr Jason,
    Another way is to go to the “complete code’. At the top of the code, there is an icon which looks like “” which opens the plain code. Copy the the code by pressing CTRL+A and then CTRL+C.

    Then open a DOS window and type python to run the python interpreter under DOS.

    Paste code and at the end of the last line at pyplot.show(), press enter.

    I tried it and the code works.

    Alternatively, run an interpreter such as “Spyder”, NOT the Idle IDE.

    Paste the code = Ctrl+V.

    It works.
    Thank you,
    Anthony of Sydney.

    • Avatar
      Anthony The Koala June 18, 2021 at 10:42 pm #

      Dear Dr Jason,
      The icon looks like that toggles between plain code and line-numbered code.
      Don’t know why the disappeared when enclosed in quote marks.
      So the toggle between plain code and line-numbered code by clicking on the icon.
      Thank you,
      Anthony of Sydney

      • Avatar
        Anthony The Koala June 19, 2021 at 2:14 am #

        Sorry I was referring to the icon that looked like which toggles plain coder and line-numbered code. It disappeared.

        Thank you,
        Anthony of Sydney

        • Avatar
          Anthony The Koala June 19, 2021 at 2:20 am #

          The icon is a combination of the gt (greater than) sign and lt (less than). The image of the icon is lt gt.

          For an inexpllicable reason, the actual gt and lt signs don’t appear. The use of the signs look like the toggle between plain code and numbered code.

          Thank you,
          Anthony of Sydney

    • Avatar
      Jason Brownlee June 19, 2021 at 5:54 am #

      Thanks for sharing.

  5. Avatar
    Anthony The Koala June 20, 2021 at 5:37 pm #

    Dear Dr Jason,
    In your tutorial, you had an objective function

    What if you have a set of data and don’t know the objective function? How does one determine the objective function from raw data?

    Thank you,
    Anthony of Sydney

    • Avatar
      Jason Brownlee June 21, 2021 at 5:36 am #

      In machine learning, the objective function is often a loss function, e.g. minimizing loss of the model when making predictions on a training dataset.

  6. Avatar
    Anthony The Koala June 24, 2021 at 8:19 am #

    Dear Dr Jason,
    How do we extract the loss function to use in this differential evolution algorithm?
    Thank you,
    Anthony of Sydney

    • Avatar
      Jason Brownlee June 25, 2021 at 6:08 am #

      What do you mean by “extract the loss function”?

      Generally, you define a loss function you want to minimize in machine learning, typically an error function for a model against a training dataset.

      • Avatar
        Anthony The Koala June 26, 2021 at 2:02 pm #

        Dear Dr Jason,
        I want to be able to use the loss function from any of the ML functions xgboost, keras and plug that into the differential evolution algorithm in case the cost functions from the ML algorithms in xgboost and keras have a non-differentiable function.

        That is why I want to be able to extract the cost functions from the xgboost and keras and plug it in the differential evolution algorithm.

        Thank you,
        Anthony of Sydney

        • Avatar
          Jason Brownlee June 27, 2021 at 4:34 am #

          Yes, it is as I stated, the loss between the predictions and expected values in the training dataset.

          You may need to write custom code to use arbitrary optimization algorithms with GBDT or neural nets.

          • Avatar
            Anthony The Koala June 27, 2021 at 3:05 pm #

            Dear Dr Jason,
            Thank you for your reply.
            Three ways of asking the same question – no need to concern using differential evolution if using packages such as Keras, XGBoost, and Torch which have their own optimization techniques.

            * So I can conclude that when using ML algorithms in Keras, XGBoost and Torch to name a few ML packages, there is no real need to use differential evolution?

            * In other words, the differential evolution algorithms in this tutorial and at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html are for known functions that cannot be differentiated.

            * In other words, it’s pointless using the differential evolution for known packages, Keras, XGBoostr and Torch etc.

            Always appreciating your answers,

            Thank you,

            Anthony of Sydney

          • Avatar
            Jason Brownlee June 28, 2021 at 7:57 am #

            Not pointless, but you need a good reason. The conventional methods used for each algorithm are the best known methods.

  7. Avatar
    Anthony The Koala June 28, 2021 at 2:09 pm #

    Dear Dr Jason,
    Thank you for your reply.
    Anthony of Sydney

    • Avatar
      Jason Brownlee June 29, 2021 at 4:45 am #

      You’re welcome.

      • Avatar
        Eri August 26, 2021 at 10:49 pm #

        If I want to optimize more than one output at the same time, how can I do it?

        • Stefania Cristina
          Stefania Cristina August 31, 2021 at 5:38 pm #

          If what you are after is solving a multi-objective optimization problem using DE, then there are several ways to do it: (paper). In essence, and quoting from the paper, “Before the evaluation of the objective function, the latter method multiplies weight to each of the objectives based on the relative importance of the objectives. Then form a new objective function by summing the weighted objectives. Thus it converts multi objectives in to single objective.

  8. Avatar
    Eyal December 6, 2021 at 12:55 am #

    Suppose my objective function is relative – I can tell that x is less fit than y, but I cannot score them directly with a scalar – which adjustments will be required for this to work? Or is there another evolutionary algorithm I should try?

    • Adrian Tam
      Adrian Tam December 8, 2021 at 7:38 am #

      That’s difficult but I would try to ask myself, why I would tell x is less fit than y? If you can give a logic on this, you can come up with a scalar measure.

      • Avatar
        Eyal December 10, 2021 at 12:13 am #

        Suppose you are learning how to play chess, and you compare players by letting them play against each other for a few games. You can say “X won against Y in 8/10 games so X is better”, but giving them a scalar value is rather meaningless – especially because this ordering is not necessarily transitive(the ordering X > Y, Y > Z, Z > X is possible). You can have the entire population play against the entire population – in this scheme the fitness is, say, 1 point for victory and -1 for a defeat. The problem is that it requires population_size^2 * k games each generation.

        In real life tournaments, chess players are given ELO that represents an estimation for their likelihood to win. I experimented a little with that but since unfit genomes get eliminated, the average ELO keeps creeping up, and normalizing it rendered the whole scheme almost useless. I also suspect that the relatively small size of the simulated population also plays a part here. Perhaps that is possible, but it is so far from being trivial that I wonder if it’s worth it.

        This problem is also tricky because there is no measurable objective, so It’s hard to compare different experiments, and hard to evaluate your solution (except for local convergence, which might be a “red queen” problem).

        • Adrian Tam
          Adrian Tam December 10, 2021 at 4:23 am #

          Every model has some assumption. Should this assumption violated, you can’t expect the model to work. I think it is the situation here. If X>Y, Y>Z, Z>X, then why not I can claim Y>X? This hints that you may run into a problem under the assumptions.

          • Avatar
            Eyal December 10, 2021 at 5:43 am #

            If I understood you correctly – you assumed X > Y, which contradicts your claim that Y > X.

            Imagine a game of rock-paper-scissors.
            X always plays rock
            Y always plays paper
            Z always player scissors

            X > Z > Y > X

            Which goes to show that ordering this relation through a scalar will not hold.

            Note that rock-paper-scissors is an uninteresting game in which – without prior knowledge of the opponent – all strategies are equally fit, but the same situation can happen in all sort of games.

  9. Avatar
    Agrover112 March 8, 2022 at 4:44 pm #

    Hey Jason how can I adapt this for more than 2 X , like if my X has X1,X2…Xn?Ex: [1,2,3,4]

    That would be great!? If you could help me out on that.

    • Avatar
      James Carmichael March 9, 2022 at 5:53 am #

      Hi Agrover112…Please specify the portion of the coding you wish to modify and why.

Leave a Reply