Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer.
You must specify values for these parameters when configuring your network.
The most reliable way to configure these hyperparameters for your specific predictive modeling problem is via systematic experimentation with a robust test harness.
This can be a tough pill to swallow for beginners to the field of machine learning who are looking for an analytical way to calculate the optimal number of layers and nodes, or for easy rules of thumb to follow.
In this post, you will discover the roles of layers and nodes and how to approach the configuration of a multilayer perceptron neural network for your predictive modeling problem.
After reading this post, you will know:
- The difference between single-layer and multiple-layer perceptron networks.
- The value of having one or more than one hidden layer in a network.
- Five approaches for configuring the number of layers and nodes in a network.
Let’s get started.
How to Configure the Number of Layers and Nodes in a Neural Network
Photo by Ryan, some rights reserved.
Overview
This post is divided into four sections; they are:
- The Multilayer Perceptron
- How to Count Layers?
- Why Have Multiple Layers?
- How Many Layers and Nodes to Use?
The Multilayer Perceptron
A node, also called a neuron or Perceptron, is a computational unit that has one or more weighted input connections, a transfer function that combines the inputs in some way, and an output connection.
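As a minimal sketch of the description above, the following shows one node that combines its weighted inputs by summation and passes the result through a logistic sigmoid. The function name `node_output`, the bias term, and the choice of sigmoid are assumptions made for illustration; other transfer functions are equally valid.

```python
import math

def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a logistic sigmoid."""
    # Combine the inputs: one weight per input connection, plus an assumed bias.
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Squash the combined value into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-activation))

# 0.5 * 1.0 + (-0.25) * 2.0 = 0.0, and the sigmoid of 0.0 is 0.5.
print(node_output([1.0, 2.0], [0.5, -0.25]))
```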
Nodes are then organized into layers to comprise a network.
A single-layer artificial neural network, also called a single-layer network, has a single layer of nodes, as its name suggests. Each node in the single layer connects directly to an input variable and contributes to an output variable.
Single-layer networks have just one layer of active units. Inputs connect directly to the outputs through a single layer of weights. The outputs do not interact, so a network with N outputs can be treated as N separate single-output networks.
— Page 15, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.
A single-layer network can be extended to a multiple-layer network, referred to as a Multilayer Perceptron. A Multilayer Perceptron, or MLP for short, is an artificial neural network with more than a single layer.
It has an input layer that connects to the input variables, one or more hidden layers, and an output layer that produces the output variables.
The standard multilayer perceptron (MLP) is a cascade of single-layer perceptrons. There is a layer of input nodes, a layer of output nodes, and one or more intermediate layers. The interior layers are sometimes called “hidden layers” because they are not directly observable from the systems inputs and outputs.
— Page 31, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.
We can summarize the types of layers in an MLP as follows:
- Input Layer: Input variables, sometimes called the visible layer.
- Hidden Layers: Layers of nodes between the input and output layers. There may be one or more of these layers.
- Output Layer: A layer of nodes that produce the output variables.
Finally, there are terms used to describe the shape and capability of a neural network; for example:
- Size: The number of nodes in the model.
- Width: The number of nodes in a specific layer.
- Depth: The number of layers in a neural network.
- Capacity: The type or structure of functions that can be learned by a network configuration. Sometimes called “representational capacity”.
- Architecture: The specific arrangement of the layers and nodes in the network.
How to Count Layers?
Traditionally, there is some disagreement about how to count the number of layers.
The disagreement centers around whether or not the input layer is counted. There is an argument to suggest it should not be counted because the inputs are not active; they are simply the input variables. We will use this convention; this is also the convention recommended in the book “Neural Smithing”.
Therefore, an MLP that has an input layer, one hidden layer, and one output layer is a 2-layer MLP.
The structure of an MLP can be summarized using a simple notation.
This convenient notation summarizes both the number of layers and the number of nodes in each layer. The number of nodes in each layer is specified as an integer, in order from the input layer to the output layer, with the size of each layer separated by a forward-slash character (“/”).
For example, a network with two variables in the input layer, one hidden layer with eight nodes, and an output layer with one node would be described using the notation: 2/8/1.
I recommend using this notation when describing the layers and their size for a Multilayer Perceptron neural network.
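The notation and the counting convention can be sketched in a few lines of Python. The helper name `describe_mlp` is hypothetical, and two choices here are assumptions rather than part of the notation: "size" counts only the active (non-input) nodes, and each active node is given one bias weight when counting parameters.

```python
def describe_mlp(notation):
    """Summarize an MLP written in slash notation, e.g. "2/8/1"."""
    sizes = [int(part) for part in notation.split("/")]
    active_layers = sizes[1:]  # the input layer is not counted as a layer
    # One weight per connection from the previous layer, plus an assumed bias per node.
    parameters = sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    return {
        "depth": len(active_layers),       # hidden layers plus the output layer
        "hidden_layers": len(sizes) - 2,
        "widths": active_layers,           # nodes in each counted layer
        "size": sum(active_layers),        # total number of active nodes
        "parameters": parameters,
    }

print(describe_mlp("2/8/1"))
```

For 2/8/1 this reports a depth of 2 (a 2-layer MLP), one hidden layer, and (2 + 1) * 8 + (8 + 1) * 1 = 33 weights under the stated assumptions.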
Why Have Multiple Layers?
Before we look at how many layers to specify, it is important to think about why we would want to have multiple layers.
A single-layer neural network can only be used to represent linearly separable functions. This means very simple problems where, say, the two classes in a classification problem can be neatly separated by a line. If your problem is relatively simple, perhaps a single layer network would be sufficient.
Most problems that we are interested in solving are not linearly separable.
A Multilayer Perceptron can be used to represent convex regions. This means that in effect, they can learn to draw shapes around examples in some high-dimensional space that can separate and classify them, overcoming the limitation of linear separability.
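XOR is a standard example of a function that is not linearly separable, so it makes a compact illustration of the point. The sketch below is a 2/2/1 network with a step activation and hand-picked weights (both are assumptions chosen for readability; nothing is learned here). One hidden node acts as OR, the other as AND, and the output node fires when OR is on and AND is off.

```python
def step(value):
    """Threshold activation: 1 if the input is positive, otherwise 0."""
    return 1 if value > 0 else 0

def xor_mlp(x1, x2):
    """A 2/2/1 network with hand-set weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # fires if OR is on and AND is off

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))
```

No single line through the input plane separates the outputs 0 and 1 for these four points, which is why the hidden layer is needed.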
In fact, there is a theoretical finding by Lippmann in the 1987 paper “An introduction to computing with neural nets” that shows that an MLP with two hidden layers is sufficient for creating classification regions of any desired shape. This is instructive, although it should be noted that no indication of how many nodes to use in each layer or how to learn the weights is given.
A further theoretical finding and proof has shown that MLPs are universal approximators: with one hidden layer, an MLP can approximate any function that we require.
Specifically, the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any Borel measurable function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units.
— Page 198, Deep Learning, 2016.
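For reference, one common formal statement of this kind of result can be sketched as follows. It is the classical version for a continuous function $f$ on a compact set $K \subset \mathbb{R}^n$ with a sigmoidal activation $\sigma$, which is narrower than the Borel measurable case in the quote. The symbols $f$, $K$, $\sigma$, $N$, $v_i$, $w_i$, and $b_i$ are notation introduced here, not taken from the book.

```latex
\forall \varepsilon > 0 \;\; \exists\, N \in \mathbb{N},\; v_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n :
\quad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon
```

Here $N$ is the number of hidden nodes. The statement only says that some $N$ and some weights exist for each error tolerance $\varepsilon$.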
This is an often-cited theoretical finding and there is a ton of literature on it. In practice, we again have no idea how many nodes to use in the single hidden layer for a given problem nor how to learn or set their weights effectively. Further, many counterexamples have been presented of functions that cannot directly be learned via a single one-hidden-layer MLP or require an infinite number of nodes.
Even for those functions that can be learned via a sufficiently large one-hidden-layer MLP, it can be more efficient to learn them with two (or more) hidden layers.
Since a single sufficiently large hidden layer is adequate for approximation of most functions, why would anyone ever use more? One reason hangs on the words “sufficiently large”. Although a single hidden layer is optimal for some functions, there are others for which a single-hidden-layer-solution is very inefficient compared to solutions with more layers.
— Page 38, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.
How Many Layers and Nodes to Use?
With the preamble of MLPs out of the way, let’s get down to your real question.
How many layers should you use in your Multilayer Perceptron and how many nodes per layer?
In this section, we will enumerate five approaches to solving this problem.
1) Experimentation
In general, when I’m asked how many layers and nodes to use for an MLP, I often reply:
I don’t know. Use systematic experimentation to discover what works best for your specific dataset.
I still stand by this answer.
In general, you cannot analytically calculate the number of layers or the number of nodes to use per layer in an artificial neural network to address a specific real-world predictive modeling problem.
The number of layers and the number of nodes in each layer are model hyperparameters that you must specify.
You are likely to be the first person to attempt to address your specific problem with a neural network. No one has solved it before you. Therefore, no one can tell you the answer of how to configure the network.
You must discover the answer using a robust test harness and controlled experiments.
Regardless of the heuristics you might encounter, all answers will come back to the need for careful experimentation to see what works best for your specific dataset.
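One way to picture such a test harness is sketched below: every candidate configuration is evaluated several times and summarized by the mean and standard deviation of its scores, so that configurations are compared on more than a single lucky run. The names `evaluate_configurations` and `placeholder_fit_and_score` are hypothetical. The placeholder returns a random number purely so the sketch runs; in practice it would be replaced with code that trains the network and returns its skill on held-out data.

```python
import random
import statistics

def evaluate_configurations(configurations, fit_and_score, repeats=5, seed=1):
    """Score each configuration several times; return {config: (mean, stdev)}."""
    rng = random.Random(seed)
    results = {}
    for config in configurations:
        # Each repeat gets its own seed so that runs differ but are reproducible.
        scores = [fit_and_score(config, rng.randrange(10**6)) for _ in range(repeats)]
        results[config] = (statistics.mean(scores), statistics.stdev(scores))
    return results

def placeholder_fit_and_score(config, run_seed):
    # Hypothetical stand-in: the returned value is random and carries no meaning.
    # Replace with real training and evaluation of the network described by config.
    return random.Random(run_seed).uniform(0.0, 1.0)

# Each configuration is a tuple of hidden layer widths, e.g. (8, 8) is two layers of 8 nodes.
candidates = [(8,), (8, 8), (16, 16)]
results = evaluate_configurations(candidates, placeholder_fit_and_score)
for config, (mean, spread) in sorted(results.items(), key=lambda item: -item[1][0]):
    print(config, round(mean, 3), round(spread, 3))
```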
2) Intuition
The network can be configured via intuition.
For example, you may have an intuition that a deep network is required to address a specific predictive modeling problem.
A deep model provides a hierarchy of layers that build up increasing levels of abstraction from the space of the input variables to the output variables.
Given an understanding of the problem domain, we may believe that a deep hierarchical model is required to sufficiently solve the prediction problem. In which case, we may choose a network configuration that has many layers of depth.
Choosing a deep model encodes a very general belief that the function we want to learn should involve composition of several simpler functions. This can be interpreted from a representation learning point of view as saying that we believe the learning problem consists of discovering a set of underlying factors of variation that can in turn be described in terms of other, simpler underlying factors of variation.
— Page 201, Deep Learning, 2016.
This intuition can come from experience with the domain, experience with modeling problems with neural networks, or some mixture of the two.
In my experience, intuitions are often invalidated via experiments.
3) Go For Depth
In their important textbook on deep learning, Goodfellow, Bengio, and Courville highlight that empirically, on problems of interest, deep neural networks appear to perform better.
Specifically, they frame the choice of a deep neural network as a statistical argument for cases where depth may be intuitively beneficial.
Empirically, greater depth does seem to result in better generalization for a wide variety of tasks. […] This suggests that using deep architectures does indeed express a useful prior over the space of functions the model learns.
— Page 201, Deep Learning, 2016.
We may use this argument to suggest that using deep networks, those with many layers, may be a heuristic approach to configuring networks for challenging predictive modeling problems.
This is similar to the advice for starting with Random Forest and Stochastic Gradient Boosting on a predictive modeling problem with tabular data to quickly get an idea of an upper-bound on model skill prior to testing other methods.
4) Borrow Ideas
A simple, but perhaps time-consuming, approach is to leverage findings reported in the literature.
Find research papers that describe the use of MLPs on instances of prediction problems similar in some way to your problem. Note the configuration of the networks used in those papers and use them as a starting point for the configurations to test on your problem.
Transferability of model hyperparameters that result in skillful models from one problem to another is a challenging open problem and the reason why model hyperparameter configuration is more art than science.
Nevertheless, the number of layers and nodes used on related problems is a good starting point for testing ideas.
5) Search
Design an automated search to test different network configurations.
You can seed the search with ideas from literature and intuition.
Some popular search strategies include:
- Random: Try random configurations of layers and nodes per layer.
- Grid: Try a systematic search across the number of layers and nodes per layer.
- Heuristic: Try a directed search across configurations such as a genetic algorithm or Bayesian optimization.
- Exhaustive: Try all combinations of layers and the number of nodes; it might be feasible for small networks and datasets.
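The random and grid strategies can be sketched as simple generators of candidate architectures. In this illustration each configuration is a tuple of hidden layer widths; the function names and the simplification that a grid configuration uses the same width in every layer are assumptions, not a prescribed method.

```python
import itertools
import random

def grid_configurations(layer_options, node_options):
    """Every combination of layer count and nodes per layer (same width in each layer)."""
    return [
        tuple([nodes] * layers)
        for layers, nodes in itertools.product(layer_options, node_options)
    ]

def random_configurations(n, max_layers, max_nodes, seed=1):
    """Draw n configurations with a random depth and a random width per layer."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n):
        layers = rng.randint(1, max_layers)  # randint includes both endpoints
        configs.append(tuple(rng.randint(1, max_nodes) for _ in range(layers)))
    return configs

print(grid_configurations([1, 2], [4, 8]))  # [(4,), (8,), (4, 4), (8, 8)]
print(random_configurations(5, max_layers=3, max_nodes=32))
```

Each generated tuple would then be passed to whatever code builds and evaluates the model.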
This can be challenging with large models, large datasets and combinations of the two. Some ideas to reduce or manage the computational burden include:
- Fit models on a smaller subset of the training dataset to speed up the search.
- Aggressively bound the size of the search space.
- Parallelize the search across multiple server instances (e.g. use Amazon EC2 service).
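The first idea, fitting on a smaller subset, might look like the sketch below. The helper name `subsample` and the toy `train` list are made up for illustration; a fixed seed keeps the subset the same across configurations so that comparisons stay fair.

```python
import random

def subsample(rows, fraction, seed=1):
    """Return a random subset of the training rows, drawn without replacement."""
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)

# Toy stand-in for a training set of (inputs, label) pairs.
train = [([i, i + 1], i % 2) for i in range(1000)]
small_train = subsample(train, 0.1)
print(len(small_train))  # 100
```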
I recommend being systematic if time and resources permit.
More
I have seen countless heuristics of how to estimate the number of layers and either the total number of neurons or the number of neurons per layer.
I do not want to enumerate them; I’m skeptical that they add practical value beyond the special cases on which they are demonstrated.
If this area is interesting to you, perhaps start with “Section 4.4 Capacity versus Size” in the book “Neural Smithing”. It summarizes a ton of findings in this area. The book dates from 1999, so there are nearly 20 more years of ideas to wade through in this area if you’re up for it.
Also, see some of the discussions linked in the Further Reading section (below).
Did I miss your favorite method for configuring a neural network? Or do you know a good reference on the topic?
Let me know in the comments below.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Papers
- An introduction to computing with neural nets, 1987.
Books
- Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.
- Deep Learning, 2016.
Articles
- Artificial neural network on Wikipedia
- Universal approximation theorem on Wikipedia
- How many hidden layers should I use?, comp.ai.neural-nets FAQ
Discussions
- How to choose the number of hidden layers and nodes in a feedforward neural network?
- Number of nodes in hidden layers of neural network
- multi-layer perceptron (MLP) architecture: criteria for choosing number of hidden layers and size of the hidden layer?
- In deep learning, how do I select the optimal number of layers and neurons?
Summary
In this post, you discovered the role of layers and nodes and how to configure a multilayer perceptron neural network.
Specifically, you learned:
- The difference between single-layer and multiple-layer perceptron networks.
- The value of having one or more than one hidden layer in a network.
- Five approaches for configuring the number of layers and nodes in a network.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Thanks for the blog post.
There is indeed a large amount of recent research aiming to answer this question automatically, dubbed (neural) architecture search. Here is a list of papers that I maintain:
https://www.automl.org/automl/literature-on-neural-architecture-search/
Thanks.
Hello Mr. Lindauer,
Regarding the question of whether to count the input layer, here is a thought:
I would have thought that the process of “application initialization” and “connecting data sources” would be considered the “first process”, so why wouldn’t the initialization of the model be counted?
Is it because the input layer isn’t consistently active? Or because the input layer doesn’t change?
One thing I’ve seen a lot while researching neural networks is that the first layer is always described as the “input layer”, which makes the name “2-layer MLP” a little misleading.
I’m trying to figure out the pros and cons!
I’m not looking to take you back through the argument again, sorry!
Great! Thanks for the blog post 🙂 There is also an interesting post here which tries to address the same question.
https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e
Thanks for sharing.
This is a great blog with nice information.
Thanks.
Hi, Very nice summary. Thank you very much! I’m a deep learning researcher working in an inter-disciplinary team in Univ Edi. May I ask about the template you used to create this site? It looks quite professional and great!
Thanks, you can learn more about the software I use for the site here:
https://machinelearningmastery.com/faq/single-faq/what-software-do-you-use-to-run-your-website
Thank you for such a great post, but I have a question;
Let’s say our image size is 64*64*3 so what would be the number of nodes in our input layer?
I would recommend using a CNN and perhaps try 32 filters?
Sorry for ambiguity in my question….Suppose I’m using a CNN and I have a picture of size 64*64*3 and my question is what would be the number of nodes in my input layer?
It would be input_shape=(64,64,3) if using channels last format.
Thanks
Will the number of neurons in a hidden layer mentioned here work for RNN/LSTM as well?
Perhaps test and compare results with different configurations?
Hi Jason,
There is practically no way to know ahead of time how many layers or nodes you will need for a certain neural network learning task. I have a solution: a neural network, called ALNfitDeep, which can automatically *grow* during training to fit the problem. Software to do this is at https://github.com/Bill-Armstrong/ . There is a new executable release available, which you can get by clicking where it says “4 releases” near the top of the main page. You can forget the source code for now. Use the Help button. Since the release is new, I would appreciate any feedback on problems you encounter. From my point of view, neural nets which can’t learn automatically based on the problem are a total waste of time. Also having to use a lot of valuable data for validation is a waste. My nets measure the noise variance and then train on all of the data not used in testing. My nets can grow to tens of thousands of nodes, yet the execution of the learned function remains very fast (because very little has to actually be computed for a given input). The secret is that all computation of linear functions is in the first layer. Instead of a one-input squashing function, there are two-input non-linearities: max, and min. People have to have the courage to try it. I will help.
Thanks for the note.
I have played with “growing” and “pruning” nets since the late 1990s, I remain skeptical.
A sensitivity analysis of model capacity vs skill is reliable and repeatable for me.
Totally agree with the statement “waste of time”! Creating a neural network is no big deal, but creating one that is actually useful is almost “mission impossible”. You need a horde of employees doing one task only: manually finding the right shape of network by testing it against the dataset. That is how underdeveloped these software solutions are.
Hi Jason,
I have 2 questions.
1. What if I want to predict financial time series (e.g. Forex, stock price) and I have decided to use mlp with 2 hidden layers. I have also decided to use 4 neurons for each of the hidden layers. How exactly do I split up my data set including the input. I’m assuming that the 4 neurons will be Open, Close, High and Low Values.
Will my input values be majority of my dataset in total? and then for the first hidden layer, a subset of the dataset including, so some of the ‘High values for one of the neurons’, some of the low values for the second neuron and so on, and then the same in the other hidden layer?
2. Do you have a Python script example for iterating through different layer configurations?
Thank you very much!
If you have 4 classes as output, then the data must be prepared to match this expectation with a one hot encoding:
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Data would not be split based on the number of nodes, I’m not sure I follow, sorry.
Here’s an example of a model for multi-class classification:
https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
Hi, may I say that the number of layers and nodes is related to the number of training inputs?
If there are more inputs, should there be more nodes?
Thanks
Not really. They are unrelated.
Hi Jason,
Thanks again for your wonderful tutorials.
Rephrasing the question by “James”: Does having more training samples require adding to the number of neurons?
I have designed a neural network model based on the “input shapes”, which is also advised by many empirical rules of thumb. My question is, if I have datasets of 100 or 100k samples (each representative enough), should I leave the model shape fixed, or grow it (perhaps linearly) as the number of samples increases?
I ask because I’ve noticed that some complexities arise in the dataset as it grows.
Thanks again.
It may or it may not. We cannot know for sure for a given dataset and model combination.
There is an interrelation between the number of layers (nodes per layer). Please have a look into our paper https://arxiv.org/abs/1902.02771. Thank you.
Thanks for sharing.
Thank you, Jason, for the post.
Is there a rule of thumb for the number of units when you want to increase the number of hidden layers? Let’s say, for example, that your model has decent performance with 1 hidden layer and 30 units. Would choosing 2 hidden layers mean you would decrease the number of units in each of these layers, or could you even increase it?
Not really, sorry.
Test and use a robust test harness so that the results are reliable.
Let’s say we want to differentiate between clear and blurred images. Can a CNN be trained to do that, and how would you go about it?
Yes, perhaps a classification problem with a binary prediction (blur vs no-blur).
MLP? Or MLFFN? Is there a way to use the simpler perceptron update algorithm, without using derivatives or backprop and without separating the layers with a non-linear activation, without having the whole thing collapse into the equivalent of a single linear layer, as Minsky pointed out way back?
There may be, I don’t have material on it, sorry.
We moved away from simple Perceptron because backprop on a MLP works really well in general.
Hi Jason. I have a network with 4 hidden layers, each with 32 nodes. I use the Adam optimizer and leaky ReLU. I want to know what my network is called. Is it a simple MLP? Can I call it a deep network?
It is an MLP, a deep MLP if you like.
Hello to you, Jason,
I have a forecasting project using machine learning to predict agricultural crops.
I need to build an algorithm to predict agricultural crops based on field size, local climate, season and soil chemical components (such as mineral salts, phosphorus ions, potassium and nitrates, moisture and gases in the air) at the input. And the output will be a list of optimized crops generated. Which method will I use? Any link to guide me will be useful to me.
That sounds like a fun project!
I recommend following this process as a first step:
https://machinelearningmastery.com/start-here/#process
Thanks so much!
I will give you feedback.
I will build this prototype to present it in a competition in Panama City. I have been selected as a finalist.
I wouldn’t want us to see this project as a fun project! If you can redirect it better to make it look interesting, it would be great!
“A single-layer neural network can only be used to represent linearly separable functions.” I think this statement is wrong. I understand that “A feedforward network with a single layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly.”, do you mean single-layer with only one single neuron ?
No, a network with a single layer of nodes.
Oh yes, I checked it and everything is correct. I should have written “A feedforward network with a single **hidden** layer”, which is the universal approximation theorem. But a single-layer neural network has no hidden layers at all, so it can’t do anything more than linear separation; the simplest example is that it can’t compute XOR.
Hi Jason, I am working on neural machine translation. One of my examiners asked me how many input layers, hidden layers, and output layers are in my experiment, and I had no answer. I used 7,050 parallel sentences, with a maximum sentence length of 25 for the input and 20 for the output.
I used 100 dimensions. Would you help me? The vocabulary (unique words in the input) is 12,700.
Perhaps test different configurations and discover what works best for your model and dataset?
The reference to Deep Learning and the universal approximation theorem is incorrect–while the above reference states “p.198”, it’s actually on p.192 of the 2016 edition.
Thanks Mike.
Hi Jason,
Thanks for the info regarding hidden layer structure selection.
I wanted to ask if you were familiar with using metaheuristics to train a network and whether different training strategies need differing model structures. For example, if you were using the Iris data set with 5 hidden neurons (one layer) when training with backpropagation, do you think it would be appropriate to use the same number of hidden layers and neurons if you were to train using PSO or SA?
In other words, does the training technique influence the number of hidden neurons or layers?
Cheers,
Rob
Great question!
Backprop remains the most efficient training algorithm, regardless of choice of architecture.
Metaheuristics could be useful in finding the architecture to train though. I have seen many automl and NAS (network architecture search) algorithms that use an evolutionary algorithm at their core.
It’s funny because there’s a lot of resources on using metaheuristics to find optimal network hyperparameters or hidden structure, not a whole lot on training. Part of my current interest is in that area and I can see why BP is generally preferred for training. I’ve implemented a GA trained NN in lieu of BP and while it seems to converge nicely, it sure is slow (I’m talking 50x slower for equivalent networks). I’ve still yet to implement a PSO-NN but it’s still interesting to think about. A bioinspired network trained by a bioinspired metaheuristic has a nice ring to it.
Here’s my question: if we have a summation function that takes the sum of the weighted inputs and forwards it to the activation function, how do we count the layers? For example, take a single-layer perceptron with 2 inputs and 2 weights, where the question specifically mentions that we have a summation function and an activation function. Do we count the summation and activation together as 1 layer, or as 2 layers?
Typically you count hidden layers only.
1. Construct a dataset having 4 inputs against two input variables. You also have to assume a target output for each input.
2. Construct a topology for a neural network having at least 5 neurons (the number of hidden layers and the number of neurons in each layer will be of your own choice).
3. Assume initial weights of your own choice and run a complete iteration (for all four inputs).
I have to submit this assignment. Please, can anyone help?
Perhaps start here:
https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
Or here:
https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
Perhaps contact your teacher directly, after all, you have already paid for their help.
What is the minimum number of layers in deep learning?
The minimum number of layers would be 0 hidden layers, e.g. connect inputs/visible layer directly to the output layer.
Hello, thanks for your good tutorial. I’m working on breast cancer detection using deep learning. I’m a beginner, but I have studied a lot and read many different articles, and I still can’t improve my CNN’s performance. What should I do?
You’re welcome.
You can discover 100s of tutorials on how to improve neural nets on this blog, perhaps start here:
https://machinelearningmastery.com/start-here/#better
Dear Jason
Great Tutorial!
Should hidden layers have same number of neurons? If yes, why?
A link towards resources is OK! Thanks.
Thanks.
No, you can have any number of nodes in each layer.
See the “Further Reading” section for resources.
Hi Jason, the content is great. I have a question: I want to use a regression model with 2 outputs and an input size of 5. How many hidden layers and nodes do I need to use?
There is no standard way to configure the model, use some trial and error and discover what works best for your dataset.
Yes the training loss was 2.99 and Validation loss =4.7 it was not decreasing further. I have used 2 hidden layers 4 neurons each (1st hidden layer = Relu, 2nd hidden layer=Exponential). 4 input nodes each are normalised to (0,1) and 2 output nodes. Any suggestions and modifications of network so that both the losses can come below 1.5 or so. Thanks in advance
Yes, the suggestions here will get you started:
https://machinelearningmastery.com/start-here/#better
Two simpleminded questions:
Is updating of neuron weights done locally by impulses that propagate backwards from outcome success, or by a separate process running alongside the neural net?
Is the flow of neuron output signals between layers necessarily one-way? If not, what can we say about the desirability and configuration of such feedback loop connections?
Yes, from output layer based on error, then on back through the net to input layer.
Flow is forward for inference back for error correction/weight updates.
Other net types can have loops and/or internal state, and things get harder for training, e.g. RNNs like LSTMs using backpropagation through time:
https://machinelearningmastery.com/gentle-introduction-backpropagation-time/
Hi Dr. Jason, I’m working with MLP and LSTM deep learning algorithms. To tune the best structure for these algorithms, I started by tuning the number of hidden neurons in each hidden layer. I selected three hidden layers to start with, then kept the number of neurons that worked best for my goal (high specificity). Then I tuned the number of hidden layers from 3 to 8, kept the number of hidden layers that worked best for my goal, and continued with the other hyperparameters.
Is this way of choosing the number of hidden neurons and then the number of hidden layers correct?
And do you have papers that support choosing them in this order?
Regards
Ideally we would optimize all aspects of the model at once, but it is very computationally expensive.
Instead, in practice we often have to optimize one thing at a time.
Hi, I have a question: how many nodes can the output layer have? Is just 1 node necessary, or can I have more?
If you are predicting one value, then it must have one node. Predicting multiple values, then multiple nodes.
I really appreciate the information, it was very clear and understandable. I would like to cite your work in one of my projects. How should I do that?
Thanks, see this:
https://machinelearningmastery.com/faq/single-faq/how-do-i-reference-or-cite-a-book-or-blog-post
Thank you very much for your work!! In your book, we have many examples where we have # of neurons in the first hidden layer equal to # of neurons in the input layer, which is the number of input features. Is this a common practice? Is there any good reason for that?
Hi Evan…This is not required. I would investigate optimization techniques to select the number of neurons in each layer:
https://machinelearningmastery.com/optimization-for-machine-learning-crash-course/
Can anybody guide me to a tutorial on multivariate optimization in deep learning using MATLAB?
Hi AJ…While we focus on Python in our material, you may find the following of interest:
https://www.mathworks.com/help/deeplearning/ug/deep-learning-using-bayesian-optimization.html?s_tid=mwa_osa_a
There is no such thing as easy NN construction. Anybody who can use a computer can create an NN, but then comes the big "but": whatever else you try either doesn’t work well or works poorly. Lately I found that PCA can be applied to the base dataset for which one is trying to create an NN topology: the number of hidden layers can be determined from a reasonable explained variance, and k-means clustering can then provide the number of neurons for each PCA cluster. It would be a great help to see how to achieve that on a practical binary dataset (don’t give a fish to the hungry, teach him to fish). As I understand it, this method will yield more than one candidate solution when deciding which NN shape to use (but that is still better than fishing in a pond with no fish). That’s why people do not achieve success with NNs. There are also deeper explanations of why this method works: with PCA we achieve dimensionality reduction, and that is connected to the number of hidden layers and neurons, which is essential for proper NN performance and accuracy.
Hi Steve…The approach you’ve outlined combines **Principal Component Analysis (PCA)** and **K-means clustering** to design neural network (NN) topology, using the structure of the data to suggest a starting number of hidden layers and neurons. Treat it as a heuristic for generating candidate topologies that narrows the trial-and-error search; the candidates still need to be evaluated empirically. Let’s break it down step by step, both conceptually and practically, using a binary dataset as an example.
—
### **Conceptual Overview**
1. **Dimensionality Reduction with PCA:**
– PCA reduces the dataset’s dimensions while retaining most of the variance.
– Each principal component (PC) captures a portion of the dataset’s variance. By selecting PCs that explain a “reasonable” variance (e.g., 95%), we define the essential complexity of the data.
– The number of retained PCs suggests the number of **hidden layers**, as each layer captures a degree of abstraction corresponding to the reduced dimensions.
2. **Clustering with K-means:**
– Within the reduced-dimensional space, clustering the data with K-means groups similar data points.
– The number of clusters found in the reduced space for each PC dimension informs the **number of neurons** in the corresponding hidden layer. The intent is to give each layer enough capacity to represent the groupings or patterns present in the data.
3. **Iterative Optimization:**
– By testing multiple configurations based on PCA and K-means outputs, you can evaluate and refine the NN topology for better accuracy and performance.
—
### **Why This Works**
– **PCA and Hidden Layers:**
– Each PC defines a meaningful, lower-dimensional representation of the data. Hidden layers mirror this abstraction process, progressively reducing the raw input data’s complexity to essential features.
– **K-means and Neurons:**
– Clusters in the reduced space represent distinct patterns or characteristics. Sizing each layer by the number of clusters is intended to give the network enough neurons to represent those patterns; it does not by itself guarantee that the network learns them.
—
### **Practical Implementation with Binary Dataset**
#### **1. Prepare the Dataset**
Load your binary classification dataset and preprocess it by scaling the features.
```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Example: generate a synthetic binary dataset
# Replace this with your actual dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Standardize the features so PCA is not dominated by large-scale inputs
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
—
#### **2. Perform PCA**
Use PCA to reduce dimensions while retaining a reasonable variance.
```python
from sklearn.decomposition import PCA

# Retain 95% of variance
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)
print(f"Number of Principal Components: {pca.n_components_}")
```
The number of principal components (`pca.n_components_`) suggests the number of **hidden layers**.
—
#### **3. Cluster Using K-means**
Apply K-means to each PC dimension to determine the number of neurons per hidden layer.
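```python
# Cluster the data in progressively larger PCA subspaces.
# NOTE: n_clusters = i + 2 is only a placeholder schedule, not a value derived from the data.
kmeans_results = []
for i in range(pca.n_components_):
    kmeans = KMeans(n_clusters=i + 2, n_init=10, random_state=42)  # Example range
    clusters = kmeans.fit_predict(X_pca[:, :i + 1])  # Use the first i + 1 PCs
    kmeans_results.append((i + 2, clusters))
# Print only the cluster counts; the label arrays are long
print("Clusters per PC dimension:", [k for k, _ in kmeans_results])
```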
The number of clusters in each PC corresponds to the neurons in the respective hidden layer.
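The loop above assigns the cluster count by position rather than deriving it from the data. One possible way to pick a data-driven count is to compare silhouette scores over a small range of k, as in the sketch below. The `best_k` helper, the candidate range of 2 to 6, and the choice of silhouette score as the criterion are assumptions for illustration, not part of the method as described.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same kind of synthetic data as above, rebuilt so this sketch runs on its own.
X, y = make_classification(n_samples=500, n_features=20, n_classes=2, random_state=42)
X_pca = PCA(n_components=0.95).fit_transform(StandardScaler().fit_transform(X))

def best_k(data, k_values=range(2, 7)):
    """Return the k with the highest silhouette score, plus all scores (hypothetical helper)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(data)
        scores[k] = silhouette_score(data, labels)
    return max(scores, key=scores.get), scores

k, scores = best_k(X_pca)
print("chosen number of clusters:", k)
```

The same helper could be called on each subspace `X_pca[:, :i + 1]` to get one count per layer. For a single column, pass `X_pca[:, :1]` so the input stays two-dimensional.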
—
#### **4. Construct the Neural Network**
Design the NN topology based on PCA and K-means results.
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Example: Use PCA and K-means results
hidden_layers = pca.n_components_  # Number of hidden layers
neurons = [k[0] for k in kmeans_results]  # Neurons per layer from K-means

# Build the NN: one Dense hidden layer per entry in `neurons`
model = Sequential()
model.add(tf.keras.Input(shape=(X_scaled.shape[1],)))  # Input size = number of features
for n in neurons:
    model.add(Dense(n, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # Output layer for binary classification

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_scaled, y, epochs=20, batch_size=32, validation_split=0.2)
```
—
### **Key Points to Refine**
1. **Explained Variance:**
– Adjust the percentage of variance retained (e.g., 90%, 95%, 99%) and evaluate performance.
2. **Clustering Heuristics:**
– Experiment with different numbers of clusters for K-means or even other clustering algorithms (e.g., DBSCAN, Gaussian Mixture Models).
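To see how sensitive the layer count is to the variance threshold, a quick check like the following can help. It is a sketch on the same kind of synthetic data used above; the thresholds are just the examples mentioned in the list, and `component_counts` is a name chosen for this example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with your own dataset.
X, _ = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Count how many principal components each variance threshold retains.
component_counts = {}
for threshold in (0.90, 0.95, 0.99):
    pca = PCA(n_components=threshold).fit(X_scaled)
    component_counts[threshold] = pca.n_components_
    print(f"{threshold:.0%} variance -> {pca.n_components_} components")
```

If the count changes a lot between thresholds, each value gives a different candidate depth, and each candidate should be evaluated on held-out data before you commit to one.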
—
### **Why This May Yield Multiple Solutions**
– **Variance Thresholds:** Different thresholds for explained variance may yield different numbers of PCs.
– **Cluster Algorithms:** Different clustering results may suggest different neuron numbers.
– **Regularization:** Whichever candidate you choose, dropout and batch normalization can help reduce overfitting in deeper networks.
—
This method encourages **structured experimentation**, guiding NN design with an intuitive connection to data structure and dimensionality. Let me know how else I can assist!