**Stochastic optimization** refers to the use of randomness in the objective function or in the optimization algorithm.

Challenging optimization algorithms, such as high-dimensional nonlinear objective problems, may contain multiple local optima in which deterministic optimization algorithms may get stuck.

Stochastic optimization algorithms provide an alternative approach that permits less optimal local decisions to be made within the search procedure that may increase the probability of the procedure locating the global optima of the objective function.

In this tutorial, you will discover a gentle introduction to stochastic optimization.

After completing this tutorial, you will know:

- Stochastic optimization algorithms make use of randomness as part of the search procedure.
- Examples of stochastic optimization algorithms like simulated annealing and genetic algorithms.
- Practical considerations when using stochastic optimization algorithms such as repeated evaluations.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- What Is Stochastic Optimization?
- Stochastic Optimization Algorithms
- Practical Considerations for Stochastic Optimization

## What Is Stochastic Optimization?

Optimization refers to optimization algorithms that seek the inputs to a function that result in the minimum or maximum of an objective function.

Stochastic optimization or stochastic search refers to an optimization task that involves randomness in some way, such as either from the objective function or in the optimization algorithm.

Stochastic search and optimization pertains to problems where there is randomness noise in the measurements provided to the algorithm and/or there is injected (Monte Carlo) randomness in the algorithm itself.

— Page xiii, Introduction to Stochastic Search and Optimization, 2003.

Randomness in the objective function means that the evaluation of candidate solutions involves some uncertainty or noise and algorithms must be chosen that can make progress in the search in the presence of this noise.

Randomness in the algorithm is used as a strategy, e.g. stochastic or probabilistic decisions. It is used as an alternative to deterministic decisions in an effort to improve the likelihood of locating the global optima or a better local optima.

Standard stochastic optimization methods are brittle, sensitive to stepsize choice and other algorithmic parameters, and they exhibit instability outside of well-behaved families of objectives.

— The Importance Of Better Models In Stochastic Optimization, 2019.

It is more common to refer to an algorithm that uses randomness than an objective function that contains noisy evaluations when discussing stochastic optimization. This is because randomness in the objective function can be addressed by using randomness in the optimization algorithm. As such, stochastic optimization may be referred to as “*robust optimization*.”

A deterministic algorithm may be misled (e.g. “*deceived*” or “*confused*“) by the noisy evaluation of candidate solutions or noisy function gradients, causing the algorithm to bounce around or get stuck (e.g. fail to converge).

Methods for stochastic optimization provide a means of coping with inherent system noise and coping with models or systems that are highly nonlinear, high dimensional, or otherwise inappropriate for classical deterministic methods of optimization.

— Stochastic Optimization, 2011.

Using randomness in an optimization algorithm allows the search procedure to perform well on challenging optimization problems that may have a nonlinear response surface. This is achieved by the algorithm taking locally suboptimal steps or moves in the search space that allow it to escape local optima.

Randomness can help escape local optima and increase the chances of finding a global optimum.

— Page 8, Algorithms for Optimization, 2019.

The randomness used in a stochastic optimization algorithm does not have to be true randomness; instead, pseudorandom is sufficient. A pseudorandom number generator is almost universally used in stochastic optimization.

Use of randomness in a stochastic optimization algorithm does not mean that the algorithm is random. Instead, it means that some decisions made during the search procedure involve some portion of randomness. For example, we can conceptualize this as the move from the current to the next point in the search space made by the algorithm may be made according to a probability distribution relative to the optimal move.

Now that we have an idea of what stochastic optimization is, let’s look at some examples of stochastic optimization algorithms.

## Stochastic Optimization Algorithms

The use of randomness in the algorithms often means that the techniques are referred to as “heuristic search” as they use a rough rule-of-thumb procedure that may or may not work to find the optima instead of a precise procedure.

Many stochastic algorithms are inspired by a biological or natural process and may be referred to as “metaheuristics” as a higher-order procedure providing the conditions for a specific search of the objective function. They are also referred to as “*black box*” optimization algorithms.

Metaheuristics is a rather unfortunate term often used to describe a major subfield, indeed the primary subfield, of stochastic optimization.

— Page 7, Essentials of Metaheuristics, 2011.

There are many stochastic optimization algorithms.

Some examples of stochastic optimization algorithms include:

- Iterated Local Search
- Stochastic Hill Climbing
- Stochastic Gradient Descent
- Tabu Search
- Greedy Randomized Adaptive Search Procedure

Some examples of stochastic optimization algorithms that are inspired by biological or physical processes include:

- Simulated Annealing
- Evolution Strategies
- Genetic Algorithm
- Differential Evolution
- Particle Swarm Optimization

Now that we are familiar with some examples of stochastic optimization algorithms, let’s look at some practical considerations when using them.

## Practical Considerations for Stochastic Optimization

There are important considerations when using stochastic optimization algorithms.

The stochastic nature of the procedure means that any single run of an algorithm will be different, given a different source of randomness used by the algorithm and, in turn, different starting points for the search and decisions made during the search.

The pseudorandom number generator used as the source of randomness can be seeded to ensure the same sequence of random numbers is provided each run of the algorithm. This is good for small demonstrations and tutorials, although it is fragile as it is working against the inherent randomness of the algorithm.

Instead, a given algorithm can be executed many times to control for the randomness of the procedure.

This idea of multiple runs of the algorithm can be used in two key situations:

- Comparing Algorithms
- Evaluating Final Result

Algorithms may be compared based on the relative quality of the result found, the number of function evaluations performed, or some combination or derivation of these considerations. The result of any one run will depend upon the randomness used by the algorithm and alone cannot meaningfully represent the capability of the algorithm. Instead, a strategy of repeated evaluation should be used.

Any comparison between stochastic optimization algorithms will require the repeated evaluation of each algorithm with a different source of randomness and the summarization of the probability distribution of best results found, such as the mean and standard deviation of objective values. The mean result from each algorithm can then be compared.

In cases where multiple local minima are likely to exist, it can be beneficial to incorporate random restarts after our terminiation conditions are met where we restart our local descent method from randomly selected initial points.

— Page 66, Algorithms for Optimization, 2019.

Similarly, any single run of a chosen optimization algorithm alone does not meaningfully represent the global optima of the objective function. Instead, a strategy of repeated evaluation should be used to develop a distribution of optimal solutions.

The maximum or minimum of the distribution can be taken as the final solution, and the distribution itself will provide a point of reference and confidence that the solution found is “*relatively good*” or “*good enough*” given the resources expended.

**Multi-Restart**: An approach for improving the likelihood of locating the global optima via the repeated application of a stochastic optimization algorithm to an optimization problem.

The repeated application of a stochastic optimization algorithm on an objective function is sometimes referred to as a multi-restart strategy and may be built in to the optimization algorithm itself or prescribed more generally as a procedure around the chosen stochastic optimization algorithm.

Each time you do a random restart, the hill-climber then winds up in some (possibly new) local optimum.

— Page 26, Essentials of Metaheuristics, 2011.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Related Tutorials

### Papers

### Books

- Algorithms for Optimization, 2019.
- Introduction to Stochastic Search and Optimization, 2003.
- Essentials of Metaheuristics, 2011.

### Articles

- Stochastic optimization, Wikipedia.
- Heuristic (computer science), Wikipedia.
- Metaheuristic, Wikipedia.

## Summary

In this tutorial, you discovered a gentle introduction to stochastic optimization.

Specifically, you learned:

- Stochastic optimization algorithms make use of randomness as part of the search procedure.
- Examples of stochastic optimization algorithms like simulated annealing and genetic algorithms.
- Practical considerations when using stochastic optimization algorithms such as repeated evaluations.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Do you have a book yet. If so, let mlm.com help me find it easier.

Thanks,

Not yet.

Hi, thank you for this intro, very interesting subject! Would you happen to have a few examples of real life application of stochastic optimization? (Besides it’s use in ML algorithms) What (problems) would benefit from integrating randomness into an objective function?

You’re welcome.

Any optimization problems that may have a noisy objective function or a non-differentiable objective function.

Hi Jason,

I am not really able to follow this particular content. What maths background should Iook for learning so that i am up to it? Thanks!

Sorry to hear that.

No maths background really, just a developer background.

This may help:

https://machinelearningmastery.com/basics-mathematical-notation-machine-learning/