How to Configure the Number of Layers and Nodes in a Neural Network

Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer.

You must specify values for these parameters when configuring your network.

The most reliable way to configure these hyperparameters for your specific predictive modeling problem is via systematic experimentation with a robust test harness.

This can be a tough pill to swallow for beginners to the field of machine learning who are looking for an analytical way to calculate the optimal number of layers and nodes, or for easy rules of thumb to follow.

In this post, you will discover the roles of layers and nodes and how to approach the configuration of a multilayer perceptron neural network for your predictive modeling problem.

After reading this post, you will know:

  • The difference between single-layer and multiple-layer perceptron networks.
  • The value of having one, and more than one, hidden layer in a network.
  • Five approaches for configuring the number of layers and nodes in a network.

Kick-start your project with my new book Better Deep Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.


Overview

This post is divided into four sections; they are:

  1. The Multilayer Perceptron
  2. How to Count Layers?
  3. Why Have Multiple Layers?
  4. How Many Layers and Nodes to Use?

The Multilayer Perceptron

A node, also called a neuron or Perceptron, is a computational unit that has one or more weighted input connections, a transfer function that combines the inputs in some way, and an output connection.

Nodes are then organized into layers to form a network.
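
To make this concrete, below is a minimal sketch of a single node in plain NumPy; the input, weight, and bias values are illustrative assumptions, and a step function stands in for the transfer function:

import numpy as np

# one node: a weighted sum of the inputs plus a bias,
# passed through a step transfer function
def node_output(inputs, weights, bias):
    activation = np.dot(inputs, weights) + bias
    return 1.0 if activation >= 0.0 else 0.0

# illustrative values only
print(node_output(np.array([1.0, 0.5]), np.array([0.4, -0.2]), bias=-0.1))  # 1.0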

A single-layer artificial neural network, also called a single-layer network, has a single layer of nodes, as its name suggests. Each node in the single layer connects directly to an input variable and contributes to an output variable.

Single-layer networks have just one layer of active units. Inputs connect directly to the outputs through a single layer of weights. The outputs do not interact, so a network with N outputs can be treated as N separate single-output networks.

— Page 15, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.

A single-layer network can be extended to a multiple-layer network, referred to as a Multilayer Perceptron. A Multilayer Perceptron, or MLP for short, is an artificial neural network with more than a single layer.

It has an input layer that connects to the input variables, one or more hidden layers, and an output layer that produces the output variables.

The standard multilayer perceptron (MLP) is a cascade of single-layer perceptrons. There is a layer of input nodes, a layer of output nodes, and one or more intermediate layers. The interior layers are sometimes called “hidden layers” because they are not directly observable from the system’s inputs and outputs.

— Page 31, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.

We can summarize the types of layers in an MLP as follows:

  • Input Layer: Input variables, sometimes called the visible layer.
  • Hidden Layers: Layers of nodes between the input and output layers. There may be one or more of these layers.
  • Output Layer: A layer of nodes that produce the output variables.

Finally, there are terms used to describe the shape and capability of a neural network; for example:

  • Size: The number of nodes in the model.
  • Width: The number of nodes in a specific layer.
  • Depth: The number of layers in a neural network.
  • Capacity: The type or structure of functions that can be learned by a network configuration. Sometimes called “representational capacity”.
  • Architecture: The specific arrangement of the layers and nodes in the network.

How to Count Layers?

Traditionally, there is some disagreement about how to count the number of layers.

The disagreement centers around whether or not the input layer is counted. There is an argument to suggest it should not be counted because the inputs are not active; they are simply the input variables. We will use this convention; this is also the convention recommended in the book “Neural Smithing”.

Therefore, an MLP that has an input layer, one hidden layer, and one output layer is a 2-layer MLP.

The structure of an MLP can be summarized using a simple notation.

This convenient notation summarizes both the number of layers and the number of nodes in each layer. The number of nodes in each layer is specified as an integer, in order from the input layer to the output layer, with the size of each layer separated by a forward-slash character (“/”).

For example, a network with two variables in the input layer, one hidden layer with eight nodes, and an output layer with one node would be described using the notation: 2/8/1.

I recommend using this notation when describing the layers and their size for a Multilayer Perceptron neural network.
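
For example, below is a sketch of the 2/8/1 network described above using the Keras Sequential API; the choice of activations and loss here is an assumption for a binary classification problem and is not part of the notation itself:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 2/8/1: two inputs, one hidden layer with eight nodes, one output node
model = Sequential()
model.add(Dense(8, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()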

Why Have Multiple Layers?

Before we look at how many layers to specify, it is important to think about why we would want to have multiple layers.

A single-layer neural network can only be used to represent linearly separable functions. This means very simple problems where, say, the two classes in a classification problem can be neatly separated by a line. If your problem is relatively simple, perhaps a single layer network would be sufficient.

Most problems that we are interested in solving are not linearly separable.

A Multilayer Perceptron can be used to represent convex regions. This means that in effect, they can learn to draw shapes around examples in some high-dimensional space that can separate and classify them, overcoming the limitation of linear separability.
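
The XOR function is the classic example of a problem that is not linearly separable. As a small sketch of the difference, the snippet below shows a single layer of weights failing on XOR while an MLP with one hidden layer succeeds; the model settings (solver, hidden layer size, random seed) are assumptions:

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

# a single layer of weights: no line can separate XOR
print(Perceptron().fit(X, y).score(X, y))  # at most 0.75

# one hidden layer: can draw a non-linear boundary around the classes
mlp = MLPClassifier(hidden_layer_sizes=(4,), solver='lbfgs', random_state=1)
print(mlp.fit(X, y).score(X, y))  # 1.0 (may vary with the seed)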

In fact, there is a theoretical finding by Lippmann in the 1987 paper “An introduction to computing with neural nets” that shows that an MLP with two hidden layers is sufficient for creating classification regions of any desired shape. This is instructive, although it should be noted that no indication of how many nodes to use in each layer or how to learn the weights is given.

A further theoretical finding and proof has shown that MLPs are universal approximators: with one hidden layer, an MLP can approximate any function that we require.

Specifically, the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any Borel measurable function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units.

— Page 198, Deep Learning, 2016.

This is an often-cited theoretical finding and there is a ton of literature on it. In practice, we again have no idea how many nodes to use in the single hidden layer for a given problem nor how to learn or set their weights effectively. Further, many counterexamples have been presented of functions that cannot directly be learned via a single one-hidden-layer MLP or require an infinite number of nodes.
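
As a small illustration of the theorem in practice, the sketch below fits sin(x) with a single hidden layer of tanh (squashing) units; the numbers of hidden units, the solver, and the seed are assumptions, and the fit generally improves as the layer is made wider:

import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# widen the single hidden layer and watch the approximation improve
for n_hidden in [2, 10, 50]:
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation='tanh',
                       solver='lbfgs', max_iter=5000, random_state=1)
    print(n_hidden, round(mlp.fit(X, y).score(X, y), 4))  # R^2 on the training data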

Even for those functions that can be learned via a sufficiently large one-hidden-layer MLP, it can be more efficient to learn it with two (or more) hidden layers.

Since a single sufficiently large hidden layer is adequate for approximation of most functions, why would anyone ever use more? One reason hangs on the words “sufficiently large”. Although a single hidden layer is optimal for some functions, there are others for which a single-hidden-layer-solution is very inefficient compared to solutions with more layers.

— Page 38, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.

How Many Layers and Nodes to Use?

With the preamble of MLPs out of the way, let’s get down to your real question.

How many layers should you use in your Multilayer Perceptron and how many nodes per layer?

In this section, we will enumerate five approaches to solving this problem.

1) Experimentation

In general, when I’m asked how many layers and nodes to use for an MLP, I often reply:

I don’t know. Use systematic experimentation to discover what works best for your specific dataset.

I still stand by this answer.

In general, you cannot analytically calculate the number of layers or the number of nodes to use per layer in an artificial neural network to address a specific real-world predictive modeling problem.

The number of layers and the number of nodes in each layer are model hyperparameters that you must specify.

You are likely to be the first person to attempt to address your specific problem with a neural network. No one has solved it before you. Therefore, no one can tell you the answer of how to configure the network.

You must discover the answer using a robust test harness and controlled experiments.

Regardless of the heuristics you might encounter, all answers will come back to the need for careful experimentation to see what works best for your specific dataset.
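
As a sketch of what such a test harness might look like, the snippet below evaluates one candidate configuration with repeated stratified k-fold cross-validation; the synthetic dataset and the single-hidden-layer configuration are placeholders for your own data and candidates:

from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# placeholder dataset; substitute your own problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# one candidate configuration, evaluated many times for a reliable estimate
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
print('mean=%.3f std=%.3f' % (scores.mean(), scores.std()))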

2) Intuition

The network can be configured via intuition.

For example, you may have an intuition that a deep network is required to address a specific predictive modeling problem.

A deep model provides a hierarchy of layers that build up increasing levels of abstraction from the space of the input variables to the output variables.

Given an understanding of the problem domain, we may believe that a deep hierarchical model is required to sufficiently solve the prediction problem, in which case we may choose a network configuration that has many layers of depth.

Choosing a deep model encodes a very general belief that the function we want to learn should involve composition of several simpler functions. This can be interpreted from a representation learning point of view as saying that we believe the learning problem consists of discovering a set of underlying factors of variation that can in turn be described in terms of other, simpler underlying factors of variation.

— Page 201, Deep Learning, 2016.

This intuition can come from experience with the domain, experience with modeling problems with neural networks, or some mixture of the two.

In my experience, intuitions are often invalidated via experiments.

3) Go For Depth

In their important textbook on deep learning, Goodfellow, Bengio, and Courville highlight that empirically, on problems of interest, deep neural networks appear to perform better.

Specifically, they frame the choice of a deep neural network as a statistical argument in cases where depth may be intuitively beneficial.

Empirically, greater depth does seem to result in better generalization for a wide variety of tasks. […] This suggests that using deep architectures does indeed express a useful prior over the space of functions the model learns.

— Page 201, Deep Learning, 2016.

We may use this argument to suggest that using deep networks, those with many layers, may be a heuristic approach to configuring networks for challenging predictive modeling problems.

This is similar to the advice for starting with Random Forest and Stochastic Gradient Boosting on a predictive modeling problem with tabular data to quickly get an idea of an upper-bound on model skill prior to testing other methods.

4) Borrow Ideas

A simple, but perhaps time-consuming, approach is to leverage findings reported in the literature.

Find research papers that describe the use of MLPs on instances of prediction problems similar in some way to your problem. Note the configuration of the networks used in those papers and use them as a starting point for the configurations to test on your problem.

Transferability of model hyperparameters that result in skillful models from one problem to another is a challenging open problem and the reason why model hyperparameter configuration is more art than science.

Nevertheless, the number of layers and the number of nodes used on related problems are a good starting point for testing ideas.

5) Search

Design an automated search to test different network configurations.

You can seed the search with ideas from literature and intuition.

Some popular search strategies include:

  • Random: Try random configurations of layers and nodes per layer.
  • Grid: Try a systematic search across the number of layers and nodes per layer (see the sketch after this list).
  • Heuristic: Try a directed search across configurations such as a genetic algorithm or Bayesian optimization.
  • Exhaustive: Try all combinations of layers and the number of nodes; this may only be feasible for small networks and datasets.
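
As a sketch of the grid strategy, the snippet below searches a small, assumed grid of depths and widths with scikit-learn; the dataset and the candidate configurations are placeholders to seed from the literature and your intuition:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# each tuple is one configuration: (nodes in layer 1, nodes in layer 2, ...)
param_grid = {'hidden_layer_sizes': [(8,), (16,), (8, 8), (16, 8), (16, 16, 8)]}
search = GridSearchCV(MLPClassifier(max_iter=1000, random_state=1),
                      param_grid, scoring='accuracy', cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))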

This can be challenging with large models, large datasets and combinations of the two. Some ideas to reduce or manage the computational burden include:

  • Fit models on a smaller subset of the training dataset to speed up the search (sketched after this list).
  • Aggressively bound the size of the search space.
  • Parallelize the search across multiple server instances (e.g. use Amazon EC2 service).
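
For example, a minimal sketch of the first idea: run the search on a stratified subsample, then refit the winning configuration on all of the training data. The 20% fraction and the small grid are assumptions to adjust to your budget:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=10000, n_features=20, random_state=1)

# search on a 20% stratified subsample to speed things up
X_small, _, y_small, _ = train_test_split(X, y, train_size=0.2, stratify=y, random_state=1)

grid = {'hidden_layer_sizes': [(8,), (16,), (16, 8)]}
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=1), grid, cv=3, n_jobs=-1)
search.fit(X_small, y_small)

# refit the best configuration on the full training data
best = search.best_estimator_.fit(X, y)
print(search.best_params_)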

I recommend being systematic if time and resources permit.

More

I have seen countless heuristics of how to estimate the number of layers and either the total number of neurons or the number of neurons per layer.

I do not want to enumerate them; I’m skeptical that they add practical value beyond the special cases on which they are demonstrated.

If this area is interesting to you, perhaps start with “Section 4.4 Capacity versus Size” in the book “Neural Smithing”. It summarizes a ton of findings in this area. The book dates from 1999, so there are nearly 20 more years of ideas to wade through in this area if you’re up for it.

Also, see the resources in the Further Reading section (below).

Did I miss your favorite method for configuring a neural network? Or do you know a good reference on the topic?
Let me know in the comments below.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

  • An Introduction to Computing with Neural Nets, 1987.

Books

  • Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.
  • Deep Learning, 2016.

Summary

In this post, you discovered the role of layers and nodes and how to configure a multilayer perceptron neural network.

Specifically, you learned:

  • The difference between single-layer and multiple-layer perceptron networks.
  • The value of having one, and more than one, hidden layer in a network.
  • Five approaches for configuring the number of layers and nodes in a network.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Develop Better Deep Learning Models Today!

Better Deep Learning

Train Faster, Reduce Overfitting, and Make Better Predictions

...with just a few lines of Python code

Discover how in my new Ebook:
Better Deep Learning

It provides self-study tutorials on topics like:
weight decay, batch normalization, dropout, model stacking and much more...

Bring better deep learning to your projects!

Skip the Academics. Just Results.

See What's Inside

76 Responses to How to Configure the Number of Layers and Nodes in a Neural Network

  1. Marius Lindauer July 27, 2018 at 6:45 pm #

    Thanks for the blog post.
    There is indeed a large body of recent research aiming to answer this question automatically, dubbed (neural) architecture search. Here is a list of papers which I maintain:
    https://www.automl.org/automl/literature-on-neural-architecture-search/

    • Jason Brownlee July 28, 2018 at 6:32 am #

      Thanks.

    • James Washington April 26, 2023 at 10:47 am #

      Hello Mr. Lindauer,

      In regards to the concept of “counting the input layer or not”, how about this as a thought:

      I would have thought the process of “application initialization and connecting data sources” would be considered the “first process”, so why wouldn’t the initialization of the model be considered? Is it because the input layer isn’t consistently active? Or because the input layer doesn’t change?

      One of the things I’ve seen a lot of in researching neural networks is that they always describe the first layer as the “input layer”, which sort of makes a 2-layer MLP a little misleading?

      Trying to figure out the pros and cons!
      Not looking to take you back through the argument process again! Sorry

  2. Salim July 29, 2018 at 7:56 am #

    Great! Thanks for the blog post 🙂 There is also an interesting post here which tries to address the same question.

    https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e

  3. Aditi Machine July 29, 2018 at 8:26 pm #

    This is a great blog and nice information.

  4. Adam November 6, 2018 at 11:01 am #

    Hi, Very nice summary. Thank you very much! I’m a deep learning researcher working in an inter-disciplinary team in Univ Edi. May I ask about the template you used to create this site? It looks quite professional and great!

  5. Hooman February 16, 2019 at 9:38 pm #

    Thank you for such a great post, but I have a question;
    Let’s say our image size is 64*64*3 so what would be the number of nodes in our input layer?

    • Jason Brownlee February 17, 2019 at 6:32 am #

      I would recommend using a CNN and perhaps try 32 filters?

      • Hooman February 18, 2019 at 3:48 am #

        Sorry for the ambiguity in my question… Suppose I’m using a CNN and I have a picture of size 64*64*3, and my question is: what would be the number of nodes in my input layer?

        • Jason Brownlee February 18, 2019 at 6:31 am #

          It would be input_shape=(64,64,3) if using channels last format.

          • Hooman February 18, 2019 at 4:22 pm #

            Thanks

      • Kamal July 22, 2019 at 1:42 am #

        Will the number of neurons in the hidden layer mentioned here work for an RNN/LSTM as well?

        • Jason Brownlee July 22, 2019 at 8:27 am #

          Perhaps test and compare results with different configurations?

  6. William Armstrong February 25, 2019 at 8:52 am #

    Hi Jason,
    There is practically no way to know ahead of time how many layers or nodes you will need for a certain neural network learning task. I have a solution: a neural network, called ALNfitDeep, which can automatically *grow* during training to fit the problem. Software to do this is at https://github.com/Bill-Armstrong/ . There is a new executable release available, which you can get by clicking where it says “4 releases” near the top of the main page. You can forget the source code for now. Use the Help button. Since the release is new, I would appreciate any feedback on problems you encounter. From my point of view, neural nets which can’t learn automatically based on the problem are a total waste of time. Also having to use a lot of valuable data for validation is a waste. My nets measure the noise variance and then train on all of the data not used in testing. My nets can grow to tens of thousands of nodes, yet the execution of the learned function remains very fast (because very little has to actually be computed for a given input). The secret is that all computation of linear functions is in the first layer. Instead of a one-input squashing function, there are two-input non-linearities: max, and min. People have to have the courage to try it. I will help.

    • Jason Brownlee February 25, 2019 at 2:11 pm #

      Thanks for the note.

      I have played with “growing” and “pruning” nets since the late 1990s, and I remain skeptical.

      A sensitivity analysis of model capacity vs skill is reliable and repeatable for me.

    • AndoStat January 31, 2023 at 9:47 am #

      Totally agree with the statement “waste of time”! Creating a NN is no big deal, but creating an actually useful one is almost “mission impossible”. You have to have a big “horde” of employees doing one task only – finding the right shape of the NN by testing it against the dataset manually. That is how underdeveloped these software solutions are.

  7. John March 30, 2019 at 12:04 pm #

    Hi Jason,

    I have 2 questions.

    1. What if I want to predict financial time series (e.g. Forex, stock prices) and I have decided to use an MLP with 2 hidden layers? I have also decided to use 4 neurons for each of the hidden layers. How exactly do I split up my dataset, including the input? I’m assuming that the 4 neurons will be the Open, Close, High, and Low values.

    Will my input values be the majority of my dataset in total? And then for the first hidden layer, a subset of the dataset, so some of the high values for one of the neurons, some of the low values for the second neuron, and so on, and then the same in the other hidden layer?

    2. Do you have a python script example for iterating through different example layers.

    Thank you very much!

  8. james April 29, 2019 at 9:31 pm #

    Hi, may I ask whether the number of layers and nodes is related to the number of training inputs?
    If there are more inputs, should there be more nodes?

    Thanks

    • Jason Brownlee April 30, 2019 at 6:55 am #

      Not really. They are unrelated.

      • Mohamad Jaber July 12, 2021 at 9:46 pm #

        Hi Jason,

        Thanks again for your wonderful tutorials.

        Rephrasing the question by “James”: Does having more training samples require an increase in the number of neurons?
        I have designed a neural network model based on the “input shapes”, which is also advised by many empirical rules of thumb. My question is: if I have datasets of 100 or 100k samples (each representative enough), shall I leave the model shape fixed, or grow it (perhaps linearly) with the increase in samples?

        Because I’ve noticed some complexity arises in the dataset as it grows.

        Thanks again.

        • Jason Brownlee July 13, 2021 at 5:18 am #

          It may or it may not. We cannot know for sure for a given dataset and model combination.

    • SHABBEER BASHA June 18, 2019 at 11:01 am #

      There is an interrelation between the number of layers and the number of nodes per layer. Please have a look at our paper https://arxiv.org/abs/1902.02771. Thank you.

  9. Saad May 2, 2019 at 8:30 am #

    Thank you, Jason, for the post.
    Is there a rule of thumb for the number of units when you want to increase the number of hidden layers? Let’s say, for example, that your model has decent performance with 1 hidden layer and 30 units; would choosing 2 hidden layers mean you would decrease the number of units for each of these layers, or could you even increase it?

    • Jason Brownlee May 2, 2019 at 2:02 pm #

      Not really, sorry.

      Test and use a robust test harness so that the results are reliable.

  10. carlos July 8, 2019 at 8:07 pm #

    Let’s say we want to differentiate between clear and blurry images. Can a CNN train a model to do that, and how do you go about it?

    • Jason Brownlee July 9, 2019 at 8:08 am #

      Yes, perhaps a classification problem with a binary prediction (blur vs no-blur).

  11. Daniel J. Dick July 27, 2019 at 6:49 pm #

    MLP? Or MLFFN? Is there a way to use the simpler perceptron update algorithm without using derivatives or backprop, or without separating the layers with a non-linear activation, without having the whole thing collapse into the equivalent of a single linear layer, as Minsky pointed out way back?

    • Jason Brownlee July 28, 2019 at 6:42 am #

      There may be, I don’t have material on it, sorry.

      We moved away from the simple Perceptron because backprop on an MLP works really well in general.

  12. erfan basiri October 14, 2019 at 12:22 am #

    Hi Jason. I have a network with 4 hidden layers, each of which has 32 nodes. I use the Adam optimizer and leaky ReLU. I want to know what the name of my network is. Is it a simple MLP? Can I call it a deep network?

  13. Bayangmbe October 23, 2019 at 9:45 am #

    Hello to you, Jason,
    I have a forecasting project using machine learning to predict agricultural crops.
    I need to build an algorithm to predict agricultural crops based on field size, local climate, season, and soil chemical components (such as mineral salts, phosphorus ions, potassium and nitrates, moisture, and gases in the air) at the input. The output will be a list of optimized crops. Which method should I use? Any link to guide me would be useful.

    • Jason Brownlee October 23, 2019 at 1:47 pm #

      That sounds like a fun project!

      I recommend following this process as a first step:
      https://machinelearningmastery.com/start-here/#process

      • Bayangmbe October 23, 2019 at 4:23 pm #

        Thanks so much!
        I will give you feedback.
        I will make this prototype to present it in Panama City in a competition. I am selected as a finalist.
        I wouldn’t want us to see this project as just a fun project! If you can redirect it to make it look more interesting, that would be great!

  14. Kamil November 20, 2019 at 8:36 am #

    “A single-layer neural network can only be used to represent linearly separable functions.” I think this statement is wrong. I understand that “A feedforward network with a single layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly.” Do you mean a single layer with only one neuron?

    • Jason Brownlee November 20, 2019 at 1:50 pm #

      No, a network with a single layer of nodes.

      • Kamil November 24, 2019 at 2:20 am #

        Oh yes, I checked it and everything is correct; I should have said “A feedforward network with a single **hidden** layer”, which would be the universal approximation theorem. But a single-layer neural network has no hidden layers at all, so it can’t do anything more than linear separation; the simplest example is that it can’t compute XOR.

  15. Habib Kedir March 21, 2020 at 12:16 am #

    Hi Jason, I am working on neural machine translation. One of my examiners asked how many input, hidden, and output layers are in my experiment, and I had nothing to answer. The parallel corpus used has 7,050 sentences, with a maximum input length of 25 and output length of 20.
    I used 100 dimensions, and the vocabulary (unique words in the input) is 12,700. Would you help me?

    • Jason Brownlee March 21, 2020 at 8:26 am #

      Perhaps test different configurations and discover what works best for your model and dataset?

  16. Mike Janson April 8, 2020 at 2:58 am #

    The reference to Deep Learning and the universal approximation theorem is incorrect–while the above reference states “p.198”, it’s actually on p.192 of the 2016 edition.

  17. Robin Scott April 14, 2020 at 1:25 pm #

    Hi Jason,

    Thanks for the info regarding hidden layer structure selection.

    I wanted to ask if you were familiar with using metaheuristics to train a network and whether different training strategies need differing model structures. For example, if you were using the Iris data set with 5 hidden neurons (one layer) when training with backpropagation, do you think it would be appropriate to use the same number of hidden layers and neurons if you were to train using PSO or SA?

    In other words, does the training technique influence the number of hidden neurons or layers?

    Cheers,
    Rob

    • Jason Brownlee April 14, 2020 at 1:38 pm #

      Great question!

      Backprop remains the most efficient training algorithm, regardless of choice of architecture.

      Metaheuristics could be useful in finding the architecture to train though. I have seen many automl and NAS (network architecture search) algorithms that use an evolutionary algorithm at their core.

      • Robin Scott April 14, 2020 at 2:17 pm #

        It’s funny because there’s a lot of resources on using metaheuristics to find optimal network hyperparameters or hidden structure, not a whole lot on training. Part of my current interest is in that area and I can see why BP is generally preferred for training. I’ve implemented a GA trained NN in lieu of BP and while it seems to converge nicely, it sure is slow (I’m talking 50x slower for equivalent networks). I’ve still yet to implement a PSO-NN but it’s still interesting to think about. A bioinspired network trained by a bioinspired metaheuristic has a nice ring to it.

  18. Some Dude July 5, 2020 at 9:09 pm #

    Here’s my question: if we have a summation function that takes the sum of the weighted inputs and forwards it to the activation function, how do we count the layers? E.g., a single-layer perceptron with 2 inputs and 2 weights, where the question specifically mentions that we have a summation function and an activation function. Do we count the summation and activation as 1 layer or 2 layers?

  19. Usama July 5, 2020 at 10:35 pm #

    1. Construct a dataset having 4 inputs against two input variables. You also have to assume a target output for each input.

    2. Construct a topology for a neural network having at least 5 neurons (the number of hidden layers and the number of neurons in each layer will be your own choice).

    3. Assume initial weights of your own choice and run a complete iteration (for all four inputs).

    I have to submit this assignment. Please, can anyone help?

  20. mohamed August 10, 2020 at 7:43 am #

    What is the minimum number of layers in deep learning?

    • Jason Brownlee August 10, 2020 at 11:03 am #

      The minimum number of layers would be 0 hidden layers, e.g. connect inputs/visible layer directly to the output layer.

  21. Goona Faramarzi October 29, 2020 at 3:31 am #

    Hello, thanks for your good tutorial. I’m working on breast cancer detection using deep learning. I’m a beginner but have studied many different articles, yet I can’t improve my CNN’s performance. What should I do?

  22. Seeven Amic November 3, 2020 at 10:03 pm #

    Dear Jason

    Great Tutorial!

    Should hidden layers have the same number of neurons? If yes, why?

    A link towards resources is OK! Thanks.

    • Jason Brownlee November 4, 2020 at 6:40 am #

      Thanks.

      No, you can have any number of nodes in each layer.

      See the “Further Reading” section for resources.

  23. SriHarsha December 14, 2020 at 4:12 am #

    Hi Jason, the content is great. I have a doubt: I want to use a 2-output regression model with an input size of 5. How many hidden layers and how many nodes do I need to use?

    • Jason Brownlee December 14, 2020 at 6:24 am #

      There is no standard way to configure the model, use some trial and error and discover what works best for your dataset.

      • SriHarsha December 16, 2020 at 8:12 pm #

        Yes, the training loss was 2.99 and the validation loss was 4.7, and it was not decreasing further. I have used 2 hidden layers with 4 neurons each (1st hidden layer ReLU, 2nd hidden layer exponential), 4 input nodes each normalised to (0,1), and 2 output nodes. Any suggestions or modifications to the network so that both losses can come below 1.5 or so? Thanks in advance.

  24. Andrew Hoerner February 15, 2021 at 7:26 am #

    Two simpleminded questions:
    Is updating of neuron weights done locally by impulses that propagate backwards from outcome success, or by a separate process running alongside the neural net?

    Is the flow of neuron output signals between layers necessarily one-way? If not, what can we say about the desirability and configuration of such feedback loop connections?

  25. Fatima March 1, 2021 at 10:07 am #

    Hi Dr. Jason, I’m working with MLP and LSTM deep learning algorithms. To tune the best structure for these algorithms, I started by tuning the number of hidden neurons in each hidden layer; I selected three hidden layers to start with, then kept the best number of neurons that worked toward my goal (high specificity). Then I tuned the number of hidden layers from 3 to 8, kept the best number of hidden layers for my goal, and continued with the other hyperparameters.

    Is this way of choosing the number of hidden neurons first and then the number of hidden layers correct?
    And do you have papers that support this flow of choosing?

    Regards

    • Jason Brownlee March 1, 2021 at 1:45 pm #

      Ideally we would optimize all aspects of the model at once, but it is very computationally expensive.

      Instead, in practice we often have to optimize one thing at a time.

  26. ivan July 17, 2021 at 3:43 pm #

    Hi, I have a question: how many nodes can the output layer have? Is it necessary to have just 1 node, or can I have more?

    • Jason Brownlee July 18, 2021 at 5:20 am #

      If you are predicting one value, then it must have one node. Predicting multiple values, then multiple nodes.

  27. Daniel Blanck July 28, 2021 at 3:53 am #

    I really appreciate the information, it was very clear and understandable. I would like to cite your work in one of my projects. How should I do that?

  28. Evan March 21, 2022 at 2:37 pm #

    Thank you very much for your work!! In your book, we have many examples where we have # of neurons in the first hidden layer equal to # of neurons in the input layer, which is the number of input features. Is this a common practice? Is there any good reason for that?

  29. AJ July 14, 2022 at 7:53 pm #

    Can anybody guide me to a tutorial related to multivariate optimization in deep learning through MATLAB?

  30. Steve November 10, 2024 at 12:37 pm #

    There is no such thing as easy NN construction. Everybody who can use a computer can create a NN, but then comes the big “but” (whatever else you try isn’t working well, or works poorly…). Lately I found that the base dataset, from which one tries to create a NN topology, can be analyzed with PCA: within a reasonable explained variance, the number of hidden layers can be determined, and k-means clustering can then provide the number of neurons for each PCA cluster. It would be a great help to see how to achieve that on a practical binary dataset (don’t give a fish to the hungry, teach them to fish). As I understand it, this method will provide more than one candidate solution when deciding which NN shape to use (but still, better than fishing in a pond with no fish). That’s why people do not achieve success with NNs. There is also a deeper explanation of why this method works: with PCA we achieve dimensionality reduction, and how that connects to the number of hidden layers and neurons is essential for proper NN performance and accuracy.

    • James Carmichael November 11, 2024 at 8:20 am #

      Hi Steve… The approach you’ve outlined combines Principal Component Analysis (PCA) and K-means clustering to design neural network (NN) topology, emphasizing a structured and data-driven method to determine hidden layers and neurons. This is a practical and insightful strategy to overcome the trial-and-error problem that many face when designing neural networks. Let’s break it down step by step, both conceptually and practically, using a binary dataset as an example.

      Conceptual Overview

      1. Dimensionality Reduction with PCA:
         – PCA reduces the dataset’s dimensions while retaining most of the variance.
         – Each principal component (PC) captures a portion of the dataset’s variance. By selecting PCs that explain a “reasonable” variance (e.g., 95%), we define the essential complexity of the data.
         – The number of retained PCs suggests the number of hidden layers, as each layer captures a degree of abstraction corresponding to the reduced dimensions.

      2. Clustering with K-means:
         – Within the reduced-dimensional space, clustering the data with K-means groups similar data points.
         – The number of clusters within each PC dimension informs the number of neurons in the corresponding hidden layer. This step ensures that each layer captures meaningful groupings or patterns within the data.

      3. Iterative Optimization:
         – By testing multiple configurations based on PCA and K-means outputs, you can evaluate and refine the NN topology for better accuracy and performance.

      Why This Works

      – PCA and Hidden Layers: Each PC defines a meaningful, lower-dimensional representation of the data. Hidden layers mirror this abstraction process, progressively reducing the raw input data’s complexity to essential features.
      – K-means and Neurons: Clusters in the reduced space represent distinct patterns or characteristics. Neurons are assigned to these clusters, ensuring the network learns relevant features efficiently.

      Practical Implementation with a Binary Dataset

      1. Prepare the Dataset

      Load your binary classification dataset and preprocess it by scaling the features.

      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.cluster import KMeans

      # Example: load a binary dataset
      # Replace this with your actual dataset
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

      # Standardize features so PCA is not dominated by large-scale variables
      scaler = StandardScaler()
      X_scaled = scaler.fit_transform(X)

      2. Perform PCA

      Use PCA to reduce dimensions while retaining a reasonable amount of variance.

      # Retain 95% of the variance
      pca = PCA(n_components=0.95)
      X_pca = pca.fit_transform(X_scaled)

      print(f"Number of Principal Components: {pca.n_components_}")

      The number of principal components (pca.n_components_) suggests the number of hidden layers.

      3. Cluster Using K-means

      Apply K-means in the reduced space to determine the number of neurons per hidden layer.

      # Determine a number of clusters for each PC dimension
      kmeans_results = []
      for i in range(pca.n_components_):
          kmeans = KMeans(n_clusters=i + 2, random_state=42)  # example range of cluster counts
          clusters = kmeans.fit_predict(X_pca[:, :i + 1])
          kmeans_results.append((i + 2, clusters))

      print("Cluster results per PC dimension:", kmeans_results)

      The number of clusters in each PC corresponds to the neurons in the respective hidden layer.

      4. Construct the Neural Network

      Design the NN topology based on the PCA and K-means results.

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense

      # Example: use the PCA and K-means results
      hidden_layers = pca.n_components_         # number of hidden layers
      neurons = [k[0] for k in kmeans_results]  # neurons per layer from K-means

      # Build the NN
      model = Sequential()
      model.add(Dense(neurons[0], input_dim=X_scaled.shape[1], activation='relu'))

      for n in neurons[1:]:
          model.add(Dense(n, activation='relu'))

      model.add(Dense(1, activation='sigmoid'))  # output layer for binary classification

      model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
      model.fit(X_scaled, y, epochs=20, batch_size=32, validation_split=0.2)

      Key Points to Refine

      1. Explainable Variance: Adjust the percentage of variance retained (e.g., 90%, 95%, 99%) and evaluate performance.
      2. Clustering Heuristics: Experiment with different numbers of clusters for K-means, or even other clustering algorithms (e.g., DBSCAN, Gaussian Mixture Models).

      Why This May Yield Multiple Solutions

      – Variance Thresholds: Different thresholds for explainable variance may yield different numbers of PCs.
      – Cluster Algorithms: Different clustering results may suggest different neuron numbers.
      – Regularization: Use dropout and batch normalization to prevent overfitting in deeper networks.

      Further Reading and References

      1. Books:
         – “Deep Learning” by Ian Goodfellow: for insights into NN design principles.
         – “Pattern Recognition and Machine Learning” by Christopher Bishop: for PCA and clustering.

      2. Research Papers:
         – “Neural Network Topology Design Using Dimensionality Reduction” (check arXiv).
         – “Using PCA for Neural Network Architecture Design” in SpringerLink.

      3. Blogs:
         – Towards Data Science: articles on PCA and NN design.
         – Medium: tutorials on clustering and NN topology.

      This method encourages structured experimentation, guiding NN design with an intuitive connection to data structure and dimensionality. Let me know how else I can assist!
