When to Use MLP, CNN, and RNN Neural Networks

What neural network is appropriate for your predictive modeling problem?

It can be difficult for a beginner to the field of deep learning to know what type of network to use. There are so many types of networks to choose from, and new methods are published and discussed every day.

To make things worse, most neural networks are flexible enough that they work (make a prediction) even when used with the wrong type of data or prediction problem.

In this post, you will discover the suggested use for the three main classes of artificial neural networks.

After reading this post, you will know:

  • Which types of neural networks to focus on when working on a predictive modeling problem.
  • When to use, not use, and possibly try using an MLP, CNN, and RNN on a project.
  • To consider the use of hybrid models and to have a clear idea of your project goals before selecting a model.

Let’s get started.

Photo by PRODAVID S. FERRY III,DDS, some rights reserved.

Overview

This post is divided into five sections; they are:

  1. What Neural Networks to Focus on?
  2. When to Use Multilayer Perceptrons?
  3. When to Use Convolutional Neural Networks?
  4. When to Use Recurrent Neural Networks?
  5. Hybrid Network Models

What Neural Networks to Focus on?

Deep learning is the application of artificial neural networks using modern hardware.

It allows the development, training, and use of neural networks that are much larger (more layers) than was previously thought possible.

Researchers have proposed thousands of specific types of neural networks, usually as modifications or tweaks to existing models and sometimes as wholly new approaches.

As a practitioner, I recommend waiting until a model emerges as generally applicable. It is hard to tease out the signal of what works well generally from the noise of the vast number of publications released daily or weekly.

There are three classes of artificial neural networks that I recommend you focus on in general. They are:

  • Multilayer Perceptrons (MLPs)
  • Convolutional Neural Networks (CNNs)
  • Recurrent Neural Networks (RNNs)

These three classes of networks provide a lot of flexibility and have proven themselves over decades to be useful and reliable in a wide range of problems. They also have many subtypes to help specialize them to the quirks of different framings of prediction problems and different datasets.

Now that we know what networks to focus on, let’s look at when we can use each class of neural network.

When to Use Multilayer Perceptrons?

Multilayer Perceptrons, or MLPs for short, are the classical type of neural network.

They are composed of one or more layers of neurons. Data is fed to the input layer, there may be one or more hidden layers providing levels of abstraction, and predictions are made on the output layer, also called the visible layer.
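To make the layer structure concrete, here is a minimal sketch of a forward pass through a tiny MLP in plain Python, with no deep learning library. The weights are hypothetical, hand-picked values rather than learned ones; in practice a library such as Keras learns them with backpropagation. The layer sizes (3 inputs, 2 hidden neurons, 1 output) and the ReLU and sigmoid activations are illustrative assumptions.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron takes a weighted sum of all inputs."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x):
    # Hidden layer: 3 inputs -> 2 neurons (hypothetical weights).
    hidden = dense(x, weights=[[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
                   biases=[0.0, 0.1], activation=relu)
    # Output (visible) layer: 2 hidden values -> 1 probability.
    output = dense(hidden, weights=[[1.0, -1.0]], biases=[0.2], activation=sigmoid)
    return output[0]

prob = mlp_forward([1.0, 2.0, 3.0])
print(prob)
```

Every neuron in a layer sees every output of the previous layer; stacking such fully connected layers is all an MLP is. The sigmoid output suits a binary classification problem; for regression, the output neuron would typically use no activation (a linear output).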

Figure: Model of a Simple Network

MLPs are suitable for classification prediction problems where inputs are assigned a class or label.

They are also suitable for regression prediction problems where a real-valued quantity is predicted given a set of inputs. Data is often provided in a tabular format, such as you would see in a CSV file or a spreadsheet.

Use MLPs For:

  • Tabular datasets
  • Classification prediction problems
  • Regression prediction problems

They are very flexible and can be used generally to learn a mapping from inputs to outputs.

This flexibility allows them to be applied to other types of data. For example, the pixels of an image can be flattened into one long row of data and fed into an MLP. The words of a document can also be reduced to one long row of data and fed to an MLP. Even the lag observations for a time series prediction problem can be reduced to a long row of data and fed to an MLP.
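A minimal sketch of two of these reductions in plain Python, assuming an image stored as a list of pixel rows and a univariate time series stored as a list. The function names `flatten` and `to_lag_rows` are hypothetical helpers, not a library API.

```python
def flatten(image):
    """Turn a 2D grid of pixels (a list of rows) into one long row."""
    return [pixel for row in image for pixel in row]

def to_lag_rows(series, n_lags):
    """Frame a univariate series as (lag observations, next value) pairs."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows

image = [[0, 255, 0],
         [255, 255, 255],
         [0, 255, 0]]
print(flatten(image))  # [0, 255, 0, 255, 255, 255, 0, 255, 0]

series = [10, 20, 30, 40, 50]
for lags, target in to_lag_rows(series, n_lags=3):
    print(lags, "->", target)
# [10, 20, 30] -> 40
# [20, 30, 40] -> 50
```

Flattening discards the two-dimensional neighborhood structure of the image: the MLP sees nine separate inputs with no notion of which pixels are adjacent. That lost structure is what CNNs are designed to exploit.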

As such, if your data is in a form other than a tabular dataset, such as an image, document, or time series, I would recommend at least testing an MLP on your problem. The results provide a baseline for comparison, to confirm that other models that may appear better suited actually add value.

Try MLPs On:

  • Image data
  • Text data
  • Time series data
  • Other types of data

When to Use Convolutional Neural Networks?

Convolutional Neural Networks, or CNNs, were designed to map image data to an output variable.

They have proven so effective that they are the go-to method for any type of prediction problem involving image data as an input.

The benefit of using CNNs is their ability to develop an internal representation of a two-dimensional image. This allows the model to learn position- and scale-invariant structures in the data, which is important when working with images.
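The position part of this comes from weight sharing: the same small filter is slid across every location of the image. The sketch below, in plain Python, applies one hand-picked 2x2 filter (a hypothetical vertical-edge detector; a real CNN learns its filters during training) to two images that contain the same pattern in different places, then takes a global max over each resulting feature map. It demonstrates translation only, not scale.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, which deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel at offset (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map

def global_max(feature_map):
    """Global max pooling: keep the strongest response, discard its location."""
    return max(max(row) for row in feature_map)

# Hypothetical filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

# The same short vertical line, top-left in one image and bottom-right in the other.
top_left = [[0, 1, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
bottom_right = [[0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1]]

print(global_max(conv2d(top_left, kernel)))      # 2
print(global_max(conv2d(bottom_right, kernel)))  # 2
```

The filter responds with the same strength wherever the pattern appears; only the location of the peak in the feature map changes, and pooling then discards that location.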

Use CNNs For:

  • Image data
  • Classification prediction problems
  • Regression prediction problems

More generally, CNNs work well with data that has a spatial relationship.

The CNN input is traditionally two-dimensional, a field or matrix, but can also be changed to be one-dimensional, allowing it to develop an internal representation of a one-dimensional sequence.

This allows the CNN to be used more generally on other types of data that have a spatial relationship. For example, there is an ordered relationship between the words in a document of text, and an ordered relationship between the time steps of a time series.
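In one dimension the same idea becomes a kernel slid along a sequence. A minimal plain-Python sketch, using a hypothetical two-step kernel whose output is positive on a rise and negative on a fall between consecutive time steps. For text, each position would hold a word embedding vector rather than a single number, and the kernel would span several words.

```python
def conv1d(sequence, kernel):
    """Valid 1D cross-correlation: slide the kernel along the sequence."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]

# Hypothetical kernel: positive output on a rise, negative on a fall.
change_detector = [-1, 1]
series = [3, 3, 3, 7, 7, 7, 2, 2]
print(conv1d(series, change_detector))  # [0, 0, 4, 0, 0, -5, 0]
```

As in the 2D case, one small set of weights is reused at every position, so a pattern is detected wherever it occurs in the sequence.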

Although not specifically developed for non-image data, CNNs have achieved state-of-the-art results on problems such as document classification, as used in sentiment analysis and related problems.

Try CNNs On:

  • Text data
  • Time series data
  • Sequence input data

When to Use Recurrent Neural Networks?

Recurrent Neural Networks, or RNNs, were designed to work with sequence prediction problems.

Sequence prediction problems come in many forms and are best described by the types of inputs and outputs supported.

Some examples of sequence prediction problems include:

  • One-to-Many: An observation as input mapped to a sequence with multiple steps as an output.
  • Many-to-One: A sequence of multiple steps as input mapped to a class or quantity prediction.
  • Many-to-Many: A sequence of multiple steps as input mapped to a sequence with multiple steps as output.

The Many-to-Many problem is often referred to as sequence-to-sequence, or seq2seq for short.
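The three framings differ only in how a recurrent network is unrolled and where its outputs are read off. A minimal sketch with a scalar Elman-style RNN cell in plain Python; the weights in `step` are hypothetical, and feeding each output back in as the next input in `one_to_many` is one common choice among several. Here many-to-many produces one output per input step; producing an output sequence of a different length needs the encoder-decoder arrangement described later in this post.

```python
import math

def step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state mixes the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h + b)

def many_to_one(xs):
    h = 0.0
    for x in xs:
        h = step(x, h)
    return h  # a single value summarizing the whole sequence

def many_to_many(xs):
    h = 0.0
    outputs = []
    for x in xs:
        h = step(x, h)
        outputs.append(h)  # one output per input step
    return outputs

def one_to_many(x, n_steps):
    h = step(x, 0.0)
    outputs = [h]
    for _ in range(n_steps - 1):
        h = step(h, h)  # feed the previous output back in as the next input
        outputs.append(h)
    return outputs

sequence = [1.0, 2.0, 3.0]
print(many_to_one(sequence))
print(many_to_many(sequence))
print(one_to_many(1.0, n_steps=4))
```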

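Recurrent neural networks were traditionally difficult to train, because the error gradients tend to vanish or explode as they are propagated back through many time steps.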

The Long Short-Term Memory, or LSTM, network is perhaps the most successful RNN because it overcomes the problems of training a recurrent network and in turn has been used on a wide range of applications.
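The LSTM's remedy is an additive cell state guarded by gates. A minimal sketch of a single scalar LSTM cell in plain Python, following the standard gate equations (forget, input, and output gates plus a tanh candidate). The parameter values are hypothetical, chosen so that the forget gate stays close to 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps each gate name to (w_input, w_recurrent, bias)."""
    def pre(gate):
        w, u, b = p[gate]
        return w * x + u * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    i = sigmoid(pre("i"))    # input gate: how much of the candidate to write
    o = sigmoid(pre("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre("g"))  # candidate value
    c = f * c_prev + i * g   # additive update of the cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c

# Hypothetical parameters: a large forget-gate bias keeps f close to 1.
params = {"f": (0.0, 0.0, 5.0), "i": (1.0, 0.0, 0.0),
          "o": (1.0, 0.0, 0.0), "g": (1.0, 0.0, 0.0)}

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, 0.0]:  # one pulse of input, then silence
    h, c = lstm_step(x, h, c, params)
    print(round(c, 3))
```

Because the cell state is updated by addition rather than by repeated squashing, a forget gate near 1 lets information (and, during training, gradients) pass across many steps largely intact. That is the usual explanation for why LSTMs mitigate the training problems of simple RNNs.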

RNNs in general and LSTMs in particular have had the most success when working with sequences of words and paragraphs, generally called natural language processing.

This includes both sequences of text and sequences of spoken language represented as a time series. They are also used as generative models that require a sequence output, not only for text but also for applications such as generating handwriting.

Use RNNs For:

  • Text data
  • Speech data
  • Classification prediction problems
  • Regression prediction problems
  • Generative models

Recurrent neural networks are not appropriate for tabular datasets such as you would see in a CSV file or spreadsheet. They are also not appropriate for image data input.

Don’t Use RNNs For:

  • Tabular data
  • Image data

RNNs and LSTMs have been tested on time series forecasting problems, but the results have been poor, to say the least. Autoregression methods, even linear ones, often perform much better, and LSTMs are often outperformed by simple MLPs applied to the same data.

Nevertheless, it remains an active area of research.
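The linear autoregression methods mentioned above can be very cheap to build as a baseline. A minimal sketch of an AR(1) model, y[t] = a + b * y[t-1], fit by ordinary least squares in plain Python. The series is synthetic, generated from known coefficients so the fit can be checked; a real project would typically use a statistics library and choose the number of lags by validation.

```python
def fit_ar1(series):
    """Ordinary least squares fit of y[t] = a + b * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def forecast_ar1(a, b, last_value):
    """One-step-ahead forecast from the most recent observation."""
    return a + b * last_value

# Synthetic series generated by y[t] = 2 + 0.5 * y[t-1], starting from 10.
series = [10.0]
for _ in range(5):
    series.append(2.0 + 0.5 * series[-1])

a, b = fit_ar1(series)
print(round(a, 6), round(b, 6))
print(forecast_ar1(a, b, series[-1]))
```

The same lag observations can be fed to an MLP or an LSTM, so a baseline like this gives a fair yardstick for whether the more complex model adds value.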

Perhaps Try RNNs on:

  • Time series data

Hybrid Network Models

A CNN or RNN model is rarely used alone.

These types of networks are used as layers in a broader model that also has one or more MLP layers. Technically, these are a hybrid type of neural network architecture.

Perhaps the most interesting work comes from mixing different types of networks together into hybrid models.

For example, consider a model that uses a stack of layers with a CNN on the input, LSTM in the middle, and MLP at the output. A model like this can read a sequence of image inputs, such as a video, and generate a prediction. This is called a CNN LSTM architecture.
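A toy sketch of that data flow in plain Python: a convolutional feature extractor is applied to each frame independently, a recurrent layer reads the per-frame features in order, and a dense output turns the final state into a prediction. To stay short, it uses a simple tanh recurrence as a stand-in for the LSTM, and every weight, as well as the edge-detecting kernel, is hypothetical. In Keras, applying the same CNN to every frame is typically done by wrapping the convolutional layers in `TimeDistributed`.

```python
import math

def conv2d_max(frame, kernel):
    """CNN part: slide a 2D kernel over one frame, then global-max-pool to one feature."""
    kh, kw = len(kernel), len(kernel[0])
    best = -math.inf
    for i in range(len(frame) - kh + 1):
        for j in range(len(frame[0]) - kw + 1):
            total = sum(frame[i + a][j + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            best = max(best, total)
    return best

def rnn_over_frames(features, w_x=0.5, w_h=0.5):
    """LSTM stand-in: a simple tanh recurrence across the per-frame features."""
    h = 0.0
    for f in features:
        h = math.tanh(w_x * f + w_h * h)
    return h

def dense_output(h, w=3.0, b=-1.0):
    """MLP part: map the final recurrent state to a probability."""
    return 1.0 / (1.0 + math.exp(-(w * h + b)))

def cnn_lstm_predict(video, kernel):
    per_frame = [conv2d_max(frame, kernel) for frame in video]  # same CNN on every frame
    return dense_output(rnn_over_frames(per_frame))

kernel = [[-1, 1], [-1, 1]]  # hypothetical vertical-edge filter
blank = [[0] * 4 for _ in range(4)]
edge = [[0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

print(cnn_lstm_predict([blank, blank, blank], kernel))
print(cnn_lstm_predict([edge, edge, edge], kernel))
```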

The network types can also be stacked in specific architectures to unlock new capabilities. Examples include reusable image recognition models, built from very deep CNN and MLP layers, that can be added to a new LSTM model and used for captioning photos, and encoder-decoder LSTM networks that allow the input and output sequences to have different lengths.
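The encoder-decoder idea can be sketched in the same toy style: an encoder reads an input sequence of any length into a single state, and a decoder unrolls from that state for as many steps as required, independent of the input length. In this plain-Python sketch the weights are hypothetical, a tanh recurrence stands in for the LSTM, and the output length `n_out` is passed in explicitly; real decoders commonly stop when they emit an end-of-sequence token instead.

```python
import math

def encode(inputs, w_x=0.5, w_h=0.5):
    """Encoder: compress a variable-length input sequence into one fixed-size state."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
    return h

def decode(state, n_out, w_y=0.9, w_h=0.5):
    """Decoder: unroll from the encoder state, feeding each output back in."""
    h, y = state, 0.0
    outputs = []
    for _ in range(n_out):
        h = math.tanh(w_y * y + w_h * h)
        y = h  # in this toy decoder, the output is the hidden state itself
        outputs.append(y)
    return outputs

state = encode([0.2, 0.4, 0.6, 0.8, 1.0])  # 5 input steps
print(len(decode(state, n_out=2)))         # 2 output steps
```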

It is important to think clearly about what you and your stakeholders require from the project first, then seek out a network architecture (or develop one) that meets your specific project needs.

Summary

In this post, you discovered the suggested use for the three main classes of artificial neural networks.

Specifically, you learned:

  • Which types of neural networks to focus on when working on a predictive modeling problem.
  • When to use, not use, and possibly try using an MLP, CNN, and RNN on a project.
  • To consider the use of hybrid models and to have a clear idea of your project goals before selecting a model.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

28 Responses to When to Use MLP, CNN, and RNN Neural Networks

  1. John W July 25, 2018 at 5:31 am #

    Very nice article on neural networks.

    I love to work on data using neural networks. The human brain is clearly the baseline for many computer programs and artificial intelligence approaches. Artificial neural network algorithms are focused on replicating the thought and reasoning patterns of the human brain, which makes them intriguing algorithms to use.

  2. Alan Preciado July 27, 2018 at 7:32 am #

    Hi Dr. Brownlee,

    Firstly, thanks for all your posts, they’ve been a useful reference for me since I began getting involved with ML problems about a year ago.

    Right now I’m working with the problem of audio classification using conventional and neural network approaches. I’m actually using the three NNs you mention above. The idea of a hybrid model fascinates me but 1. I don’t know if it can work properly for my audio problem and 2. I don’t have any experience designing these hybrid models.

    I appreciate any advice on this!

  3. Koura July 27, 2018 at 6:32 pm #

    Thanks Jason, it is useful.

  4. Omachi July 27, 2018 at 11:06 pm #

    Thank you Dr. Jason.
    Your tutorials have given me an inroad to ML and Data Mining.
    Thank you

  5. Richard S Zipper July 29, 2018 at 11:13 am #

    As my first application of DL I was given 480 sequences of ssDNA and an indication of whether each crystallizes or not. My goal was to predict crystallization given a sequence.

    First I embedded each sequence (generally 5 – 28 in length) into a blank sequence 40 characters in length … this allowed me to
    1. make the lengths uniform and
    2. repeat ssDNA sequences into different positions in the 40 character blank … to increase my dataset size from 480 to 6400

    Next I translated A = 10001 T = 010000 C = 00100 G = 00011 Blank = -1, -1, -1, -1, -1

    After much testing and reading I created 3 models
     Model 01: Multi Layered 1D Convolutional Networks + Multilayer Perceptron
     Model 02: Time Distributed 1D Convolutional Layers + LSTM + Multilayer Perceptron
     Model 03: Stacked LSTM – 1 Perceptron

    The final prediction is arrived at as follows – if 2 of the 3 models predict crystallization, predict crystallization.
    The accuracy will be tested in the lab over the next couple of months.

    I have two questions:
    Why not use the combined scores of several models?
    Do you have any suggestions or pointers on my first real project?

    Thanks again for being a guiding light for the ML community

  6. vishnu priya July 30, 2018 at 3:30 am #

    Thank you Dr.Jason
    All these tutorials related to neural networks are very good and useful for learning the basics of neural networks in an easy way.

    Sir, can you please give an explanation of the backpropagation algorithm,
    how the filter weights are randomly selected, and whether there is any possibility to change the weights of a filter?

  7. anvin ps July 31, 2018 at 8:55 pm #

    Sir, which is the best neural network to predict a lottery number? (Not really a random number, because some numbers repeat many times within a month.)

  8. Abdullahi Mohammad August 8, 2018 at 10:49 pm #

    Hi Jason, I’m one of your online fans and students. I have been very consistent in following your blogs, and they are pretty great. Please, I was wondering if you could help me with the idea of how to train two models that have the same network structure, with the weights of one model initialized by the learned weights of the other. However, during the training of the second model, the layers of the first model are kept fixed. I really need your help as it’s part of my final year project. Thank you.

    • Jason Brownlee August 9, 2018 at 7:41 am #

      You can save the weights to file, then load the weights into the new model.

  9. Geeta September 21, 2018 at 11:43 am #

    Hi,

    I have one use case where I need to do log mining, classify logs, and also predict whether the classified logs can produce some undesirable behavior in the system. Given my experience with the system, I know which logs can produce what kind of cascading effects.

    Since it is a combination of classification and prediction, I am not sure which algorithms to apply.
    Could you please help?

    Thanks,
    Geeta

  10. Tim September 21, 2018 at 11:29 pm #

    Thank you Jason. I just started to learn ML, and there are so many concepts that it is really hard for me to figure out when I should use the different ML algorithms for my problems. I’m going to try to make a prediction system for PM2.5 and PM10 in several cities, and I want to know what kind of ML algorithms would probably suit the system when I use observed PM concentration and other weather info like wind speed, wind direction, temperature, and humidity. So could you give me some advice? Thank you very much.

  11. Seungman Kang October 19, 2018 at 11:31 am #

    Hi! Jason. Appreciate your work.
    Could I translate and post it into Korean on scimonitors.com with the source? It would help us understand the subject.

    • Jason Brownlee October 19, 2018 at 2:50 pm #

      Please do not translate and repost my material.

  12. Romain November 14, 2018 at 2:37 am #

    Hello Jason,

    Just a message to thank you for your site, I really appreciate all the materials as a total beginner and in my opinion that’s really well written.

    Cheers from France.

  13. Maryam MV November 14, 2018 at 2:48 am #

    Hey, hope you’re doing great. Is anyone here using neural networks as function estimators in reinforcement learning? If so, some questions have been on my mind, and I’d like to know the answers. My questions are as follows:

    1) Do you use an ANN for each action to estimate the value of state S, i.e., one ANN is used to calculate Q(s,a1) , another ANN for Q(s,a2) and so forth? Or a single ANN has been exploited to calculate the Q-value of current state with respect to all actions?

    2) Are there any other useful resources that may assist me to fully understand this?
    thanks in advance,
    Maryam

  14. soha December 3, 2018 at 6:52 pm #

    Hi, can I use an RNN for symbol-by-symbol detection?
