# Autoencoder Feature Extraction for Classification

Autoencoder is a type of neural network that can be used to learn a compressed representation of raw data.

An autoencoder is composed of an encoder and a decoder sub-models. The encoder compresses the input and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded.

The encoder can then be used as a data preparation technique to perform feature extraction on raw data that can be used to train a different machine learning model.

In this tutorial, you will discover how to develop and evaluate an autoencoder for classification predictive modeling.

After completing this tutorial, you will know:

• An autoencoder is a neural network model that can be used to learn a compressed representation of raw data.
• How to train an autoencoder model on a training dataset and save just the encoder part of the model.
• How to use the encoder as a data preparation step when training a machine learning model.

Let’s get started.

How to Develop an Autoencoder for Classification
Photo by Bernd Thaller, some rights reserved.

## Tutorial Overview

This tutorial is divided into three parts; they are:

1. Autoencoders for Feature Extraction
2. Autoencoder for Classification
3. Encoder as Data Preparation for Predictive Model

## Autoencoders for Feature Extraction

An autoencoder is a neural network model that seeks to learn a compressed representation of an input.

An autoencoder is a neural network that is trained to attempt to copy its input to its output.

— Page 502, Deep Learning, 2016.

They are an unsupervised learning method, although technically, they are trained using supervised learning methods, referred to as self-supervised.

Autoencoders are typically trained as part of a broader model that attempts to recreate the input.

For example:

• X = model.predict(X)

The design of the autoencoder model purposefully makes this challenging by restricting the architecture to a bottleneck at the midpoint of the model, from which the reconstruction of the input data is performed.

There are many types of autoencoders, and their use varies, but perhaps the more common use is as a learned or automatic feature extraction model.

In this case, once the model is fit, the reconstruction aspect of the model can be discarded and the model up to the point of the bottleneck can be used. The output of the model at the bottleneck is a fixed-length vector that provides a compressed representation of the input data.

Usually they are restricted in ways that allow them to copy only approximately, and to copy only input that resembles the training data. Because the model is forced to prioritize which aspects of the input should be copied, it often learns useful properties of the data.

— Page 502, Deep Learning, 2016.

Input data from the domain can then be provided to the model and the output of the model at the bottleneck can be used as a feature vector in a supervised learning model, for visualization, or more generally for dimensionality reduction.

Next, let’s explore how we might develop an autoencoder for feature extraction on a classification predictive modeling problem.

## Autoencoder for Classification

In this section, we will develop an autoencoder to learn a compressed representation of the input features for a classification predictive modeling problem.

First, let’s define a classification predictive modeling problem.

We will use the make_classification() scikit-learn function to define a synthetic binary (2-class) classification task with 100 input features (columns) and 1,000 examples (rows). Importantly, we will define the problem in such a way that most of the input variables are redundant (90 of the 100 or 90 percent), allowing the autoencoder later to learn a useful compressed representation.

The example below defines the dataset and summarizes its shape.

Running the example defines the dataset and prints the shape of the arrays, confirming the number of rows and columns.

Next, we will develop a Multilayer Perceptron (MLP) autoencoder model.

The model will take all of the input columns, then output the same values. It will learn to recreate the input pattern exactly.

The autoencoder consists of two parts: the encoder and the decoder. The encoder learns how to interpret the input and compress it to an internal representation defined by the bottleneck layer. The decoder takes the output of the encoder (the bottleneck layer) and attempts to recreate the input.

Once the autoencoder is trained, the decoder is discarded and we only keep the encoder and use it to compress examples of input to vectors output by the bottleneck layer.

In this first autoencoder, we won’t compress the input at all and will use a bottleneck layer the same size as the input. This should be an easy problem that the model will learn nearly perfectly and is intended to confirm our model is implemented correctly.

We will define the model using the functional API; if this is new to you, I recommend this tutorial:

Prior to defining and fitting the model, we will split the data into train and test sets and scale the input data by normalizing the values to the range 0-1, a good practice with MLPs.

We will define the encoder to have two hidden layers, the first with two times the number of inputs (e.g. 200) and the second with the same number of inputs (100), followed by the bottleneck layer with the same number of inputs as the dataset (100).

To ensure the model learns well, we will use batch normalization and leaky ReLU activation.

The decoder will be defined with a similar structure, although in reverse.

It will have two hidden layers, the first with the number of inputs in the dataset (e.g. 100) and the second with double the number of inputs (e.g. 200). The output layer will have the same number of nodes as there are columns in the input data and will use a linear activation function to output numeric values.

The model will be fit using the efficient Adam version of stochastic gradient descent and minimizes the mean squared error, given that reconstruction is a type of multi-output regression problem.

We can plot the layers in the autoencoder model to get a feeling for how the data flows through the model.

The image below shows a plot of the autoencoder.

Plot of Autoencoder Model for Classification With No Compression

Next, we can train the model to reproduce the input and keep track of the performance of the model on the hold-out test set.

After training, we can plot the learning curves for the train and test sets to confirm the model learned the reconstruction problem well.

Finally, we can save the encoder model for use later, if desired.

As part of saving the encoder, we will also plot the encoder model to get a feeling for the shape of the output of the bottleneck layer, e.g. a 100 element vector.

An example of this plot is provided below.

Plot of Encoder Model for Classification With No Compression

Tying this all together, the complete example of an autoencoder for reconstructing the input data for a classification dataset without any compression in the bottleneck layer is listed below.

Running the example fits the model and reports loss on the train and test sets along the way.

Note: if you have problems creating the plots of the model, you can comment out the import and call the plot_model() function.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we see that loss gets low, but does not go to zero (as we might have expected) with no compression in the bottleneck layer. Perhaps further tuning the model architecture or learning hyperparameters is required.

A plot of the learning curves is created showing that the model achieves a good fit in reconstructing the input, which holds steady throughout training, not overfitting.

Learning Curves of Training the Autoencoder Model Without Compression

So far, so good. We know how to develop an autoencoder without compression.

Next, let’s change the configuration of the model so that the bottleneck layer has half the number of nodes (e.g. 50).

Tying this together, the complete example is listed below.

Running the example fits the model and reports loss on the train and test sets along the way.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we see that loss gets similarly low as the above example without compression, suggesting that perhaps the model performs just as well with a bottleneck half the size.

A plot of the learning curves is created, again showing that the model achieves a good fit in reconstructing the input, which holds steady throughout training, not overfitting.

Learning Curves of Training the Autoencoder Model With Compression

The trained encoder is saved to the file “encoder.h5” that we can load and use later.

Next, let’s explore how we might use the trained encoder model.

## Encoder as Data Preparation for Predictive Model

In this section, we will use the trained encoder from the autoencoder to compress input data and train a different predictive model.

First, let’s establish a baseline in performance on this problem. This is important as if the performance of a model is not improved by the compressed encoding, then the compressed encoding does not add value to the project and should not be used.

We can train a logistic regression model on the training dataset directly and evaluate the performance of the model on the holdout test set.

The complete example is listed below.

Running the example fits a logistic regression model on the training dataset and evaluates it on the test set.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model achieves a classification accuracy of about 89.3 percent.

We would hope and expect that a logistic regression model fit on an encoded version of the input to achieve better accuracy for the encoding to be considered useful.

We can update the example to first encode the data using the encoder model trained in the previous section.

First, we can load the trained encoder model from the file.

We can then use the encoder to transform the raw input data (e.g. 100 columns) into bottleneck vectors (e.g. 50 element vectors).

This process can be applied to the train and test datasets.

We can then use this encoded data to train and evaluate the logistic regression model, as before.

Tying this together, the complete example is listed below.

Running the example first encodes the dataset using the encoder, then fits a logistic regression model on the training dataset and evaluates it on the test set.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model achieves a classification accuracy of about 93.9 percent.

This is a better classification accuracy than the same model evaluated on the raw dataset, suggesting that the encoding is helpful for our chosen model and test harness.

This section provides more resources on the topic if you are looking to go deeper.

## Summary

In this tutorial, you discovered how to develop and evaluate an autoencoder for classification predictive modeling.

Specifically, you learned:

• An autoencoder is a neural network model that can be used to learn a compressed representation of raw data.
• How to train an autoencoder model on a training dataset and save just the encoder part of the model.
• How to use the encoder as a data preparation step when training a machine learning model.

Do you have any questions?

## Develop Deep Learning Projects with Python!

#### What If You Could Develop A Network in Minutes

...with just a few lines of Python

Discover how in my new Ebook:
Deep Learning With Python

It covers end-to-end projects on topics like:
Multilayer PerceptronsConvolutional Nets and Recurrent Neural Nets, and more...

### 117 Responses to Autoencoder Feature Extraction for Classification

1. Mike December 7, 2020 at 12:33 pm #

Thanks Jason,

Can you explain again why we would expect the results of a compressed dataset with the encoder to give better results than the raw dataset? Aren’t we just losing information by compressing?

I thought that the value of the compression would be that we would be dealing with a smaller dataset with less features.

I guess somehow it’s learned more useful latent features similar to how embeddings work? Is that the case?

Thanks

• Jason Brownlee December 7, 2020 at 1:36 pm #

We don’t expect it to give better performance, but if it does, it’s great for our project.

It is similar to an embedding for discrete data.

Yes – similar to dimensionality reduction or feature selection, but using less features is only useful if we get same or better performance.

• Jerry April 27, 2021 at 7:24 am #

I think what he was asking was, why on Earth would this compression EVER improve a result?

• Jason Brownlee April 28, 2021 at 5:54 am #

Thanks.

The bottleneck may be able to compress large/complex input features into a lower-dimensional space. This can be helpful as dimensionality reduction. This compression may or may not be helpful to predictive models, often it is.

• mustafa June 21, 2021 at 12:11 am #

Because input dimensions may be too large for our model to fit with the training data we have. By compressing input data, we can fit the model with less likelihood of overfitting.

2. John December 10, 2020 at 7:29 pm #

Thanks for this tutorial!

How does encoder.save(‘encoder.h5’) get the learned weights from the model object? How does instantiating a new model object using encoder = Model(inputs=visible, outputs=bottleneck) allow us to keep the weights?

• Jason Brownlee December 11, 2020 at 6:36 am #

Saving the model involves saving both the architecture and weights into a single file. We can then load it and use it directly.

In that line we define a new model with layers now shared between two models – the encoder-decoder model and the encoder model. We only keep the encoder model.

• David March 14, 2021 at 8:03 pm #

Dear Jason,
Sir I can’t see how did you eliminate the decodeing part and just extracting features from the encoding part, from the code !
I only see you using the whole model !

Please, I need to extract features from the decoding part then feed them to a classifier like the SVM !

• Jason Brownlee March 15, 2021 at 5:56 am #

We define an encoder model and save it by itself. We also define a complete model that re-uses some of the layers of the encoder. We don’t save this complete model.

Does that help?

• Yash Sharma May 12, 2021 at 4:39 am #

Hello sir… I just wanted to know which type of autoencoder you used in this .. like is it sparse, stacked or multilayer autoencoder.

• Jason Brownlee May 12, 2021 at 6:14 am #

I don’t know how it might fit into a taxonomy sorry. It’s just an autoencoder, nothing fancy.

3. Hilal December 11, 2020 at 6:45 am #

Dear Jason, thank you for all informative sharings. I confused in one point like John. How does new encoder model learns weights from the autoencoder or why don’t we compile encoder model?

• Jason Brownlee December 11, 2020 at 7:42 am #

You’re welcome.

We train the encoder as part of the autoencoder, but then only save the encoder part. The weights are shared between the two models.

No beed need to compile the encoder as it is not trained directly.

4. Usman December 11, 2020 at 1:44 pm #

Dear Jason,
Thanks for the nice tutorial. Is there an efficient way to see how the data is projected on the bottleneck? I would like to compare the projection with PCA.

• Jason Brownlee December 12, 2020 at 6:19 am #

Thanks!

Yes. You can create a PCA projection of the encoded bottleneck vectors if you like.

• Usman December 12, 2020 at 8:58 pm #

Alright. Do you have a tutorial for visualizing the principal components?

• Jason Brownlee December 13, 2020 at 6:02 am #

I don’t think so.

It’s pretty straightforward, retrieve the vectors, run a PCA and then scatter plot the result.

5. Sampa December 11, 2020 at 4:07 pm #

Thank you so much for this informative tutorial. Please let me know the required version of keras and tensorflow to implement this codes.

• Jason Brownlee December 12, 2020 at 6:22 am #

6. Siddheshwar Harkal December 11, 2020 at 5:35 pm #

Dear Jason
this is a classification problem then why we take the loss as MSE

• Jason Brownlee December 12, 2020 at 6:23 am #

We use MSE loss for the reconstruction error for the inputs – which are numeric.

7. arkadia December 11, 2020 at 8:45 pm #

Thanks for this tutorial. Is it possible to make a single prediction? Which transformation should do we apply?

• Jason Brownlee December 12, 2020 at 6:26 am #

Yes, encode the input with the encoder, then pass the input to the predict() function of the trained model.

• Meharunnisa M April 23, 2021 at 5:31 am #

Jason, now a days you are showing the code only in python.. kindly show the same in R language for R users too.. thank you

8. Abdelrahim December 11, 2020 at 10:36 pm #

Dear Jason, I think there is a typo mistake in
# train autoencoder for classification with no compression in the bottleneck layer
in filt calling
you writ “history = model.fit(X_train, X_train, epochs=200, batch_size=16, verbose=2, validation_data=(X_test,X_test)) ”
I think y_train Not 2 of X_train
with best regards
thnks for tutorial

• Jason Brownlee December 12, 2020 at 6:28 am #

No, it is correct.

The autoencoder is being trained to reconstruct the input – that is the whole idea of the autoencoder.

9. Igors Papka December 12, 2020 at 10:24 pm #

Dear Dr. Jason,
Thank you for the tutorial.

The method looks good for determining the number of clusters in unsupervised learning. I tried to reduce the dimensions with it and estimate the number of clusters first on the large synthetic dataset (more than 25000 instances and 100 features) with 10 informative features and then repeat it on the same real noisy data. I achieved good results in both cases by reducing the number of features to less than the informative ones, five in my case. This method helps to see the clear “elbows” of AIC, BIC informative criteria in the plot of the Gaussian Mixture Model, and fasten the work of algorithm in times.

• Jason Brownlee December 13, 2020 at 6:04 am #

You’re welcome.

Nice work, thanks for sharing your finding!

• Hareem Ayesha December 17, 2020 at 6:51 pm #

Hi… can we use this tutorial for multi label classification problem??

• Jason Brownlee December 18, 2020 at 7:15 am #

The autoencoder can be used directly, just change the predictive model that makes use of the encoded input.

10. JG December 13, 2020 at 7:41 am #

Hi Jason:

Thank you very much for all your free great tutorial catalog … one of the best in the world !.that serves as inspiration to my following work!

I share my conclusions after applying several modification to your baseline autoencoder classification code:

1.) Code Modifications:

1.1) I decided to compare accuracies results from 5 different classification models:
(LogisticRegression, SVC, ExtratreesClassifier, RandomForestClassifier, XGBClassifier)
1.2) I apply statistical evaluation to model results trough well known “KFold()” and “cross_val_score()” functions of SKLearn library
1.3) and very important I apply several rates of autoencoding features compression such as 1 (no compression at all), 1/2 (your election) , 1/4 (even more compressed) and of course not autoencoding and even expand features to double to see what happen (some kind of embedding?)) …

2.) my conclusion, after obtaining the same approach results as your LogisticRegression model, are the results are more sensitive to the model chosen:
sometimes autoencoding it is no better results that not autoencoding, and sometines 1/4 compression is the best …so a lot of variations that indicate you have to work in a heuristic way for every particular problem!
In particular my best results are chosen SVC classification model and not autoencoding bu on logistic regression model it is true the best results are achieved by autoencoding and feature compression (1/2).

It is a pity that I can no insert here (I do not know how?) my graphs results to visualize it!

As I said you provide us with the basic tools and concepts and then we can experiment variations on those ideas

• Jason Brownlee December 13, 2020 at 1:03 pm #

Thanks!

Well done, that sounds like a great experiment.

Likely results are limited by the synthetic dataset. Perhaps the results would be more interesting/varied with a larger and more realistic dataset where feature extraction can play an important role.

11. JG December 13, 2020 at 11:48 pm #

As a matter of fact I applied the same autoencoder analysis to a more “realistic” dataset as “breast cancer” and “diabetes pima india” and I got similar results of previous one, but with less accuracy around 75% for Cancer and 77% for Diabetes, probably because of few samples (286 for cancer and 768 for diabetes)…

In both cases cases LogisticRegression is now the best model with and without autoencoding and compression… I remember got same results using ‘onehotencoding’ in the cancer case …

So “trial and error” with different models and different encoding methods for each particular problema seem to be the only way-out…

• Jason Brownlee December 14, 2020 at 6:18 am #

Very nice work!

No silver bullet for feature extraction, and all that. Just another method in our toolbox.

12. Selma December 18, 2020 at 12:29 am #

Hello
I need a matlab code for this tutorial

• Jason Brownlee December 18, 2020 at 7:17 am #

Sorry, I do not have any matlab code.

13. Robert December 31, 2020 at 5:44 am #

Hi Jason,

Thank you very much for this insightful guide.

When using an AE solely for feature creation, can you skip the steps on decoding and fitting? i.e. just use the encoder part:

# define encoder
visible = Input(shape=(n_inputs,))

# encoder level 1
e = Dense(n_inputs*2)(visible)
e = BatchNormalization()(e)
e = LeakyReLU()(e)

# encoder level 2
e = Dense(n_inputs)(e)
e = BatchNormalization()(e)
e = LeakyReLU()(e)

# bottleneck
n_bottleneck = n_inputs
bottleneck = Dense(n_bottleneck)(e)

And then ‘create’ the new features by jumping to:

encoder = Model(inputs=visible, outputs=bottleneck)
X_train_encode = encoder.predict(X_train)
X_test_encode = encoder.predict(X_test)

In other words, is there any need to encode and fit when only using the AE to create features?

Thank you very much.

• Jason Brownlee December 31, 2020 at 9:23 am #

This is exactly what we do at the end of the tutorial.

14. Robert December 31, 2020 at 8:30 pm #

But you load and use the saved encoder at the end of the tutorial – encoder = load_model(‘encoder.h5’). Just wondering if encoding and fitting prior to saving the encoder has any impact at the end when creating. Thanks

• Robert December 31, 2020 at 8:31 pm #

* decoding and fitting

• Jason Brownlee January 1, 2021 at 5:24 am #

The encoder model must be fit before it can be used.

You can choose to save the fit encoder model to file or not, it does not make a difference to its performance.

The decoder is not saved, it is discarded.

• Robert January 9, 2021 at 2:25 am #

Why do we fit the encoder model in feature creation, if fitting is just used to reconstruct the input (which we don’t need)?

• Jason Brownlee January 9, 2021 at 6:44 am #

It is fit on the reconstruction project, then we discard the decoder and are left with just the encoder that knows how to compress input data in a useful way.

• Robert January 9, 2021 at 7:00 am #

Got it, thank you very much. Just wanted to ensure that the loss and val_loss are still relevant when using the latent representation, even though the decoder is discarded.

• Jason Brownlee January 9, 2021 at 8:35 am #

The loss is only relevant to the task of reconstructing input.

The encoding achieved at the bottleneck layer may or may not be helpful to a prediction task using the input data, it depends on the specific dataset.

Generally, it can be helpful – the whole idea of the tutorial is to teach you how to do this so you can test it on your data and find out.

• Robert January 9, 2021 at 7:35 pm #

Ok so loss is not relevant when only taking the encoded representation. I am trying to compare different (feature extraction) autoencoders. I was hoping to do so by comparing the loss and val_loss, but I guess doing so is only relevant when fitting a model for classification, after extracting the AE features.
Thanks

• Jason Brownlee January 10, 2021 at 5:38 am #

Yes, the only relevant comparison (for predictive modeling) is the effect on a classifier/regressor that uses the encoded input.

15. Abdelrahman January 11, 2021 at 9:56 am #

Dear Jason,
I am going to use the encoder part as a tool that generates a new features and I will combine them with the original data set.
So, How can I control the number of new features I want to get, in the code?

• Jason Brownlee January 11, 2021 at 10:28 am #

Good question.

Control over the number of features in the encoding is via the number of nodes in the bottleneck layer.

• Abdelrahman Fayed January 12, 2021 at 11:32 am #

I already did, But it always gives me number of features like equal my original input.
Here is the code I changed.
Or if you have time please send me the modified version which gave me 10 new featues.
[email protected]

# define encoder
visible = Input(shape=(n_inputs,))
# encoder level 1
e = Dense(round(float(n_inputs) / 2.0))(visible)
e = BatchNormalization()(e)
e = LeakyReLU()(e)
# encoder level 2
e = Dense(round(float(n_inputs) / 2.0))(e)
e = BatchNormalization()(e)
e = LeakyReLU()(e)
# bottleneck
n_bottleneck = 10
bottleneck = Dense(n_bottleneck)(e)

• Jason Brownlee January 12, 2021 at 12:35 pm #

Sorry, I don’t have the capacity to customize the tutorial for you.

16. MONIKA KUMARI January 24, 2021 at 10:44 am #

Can we use this code for multi-class classification? Which lines will be tweaked in that case?

• MONIKA KUMARI January 27, 2021 at 5:09 am #

Jason Brownlee, please give a hint at least, I am searching an article on autoencoder for multiclass classification for weeks.

17. Mohamed Yousuff January 25, 2021 at 6:08 pm #

Dear Jason,
Thank you very much for your tutorials!
I need some clarification in this following code

# encode the train data
X_train_encode = encoder.predict(X_train)
# encode the test data
X_test_encode = encoder.predict(X_test)

My first query is, what actually we do in this code?
My second query is, if we have the embedding (i.e compressed data ) of dataset then we can proceed directly from the bottleneck layer output to logistic regression classification model. why we need this above code, I mean why we predict for new x_train

• Jason Brownlee January 26, 2021 at 5:49 am #

We are using the trained encoder to encode the input data for train and test sets.

The output of the encoder is the bottleneck.

This encoded data is fed as input to the logistic regression model.

• Mohamed Yousuff January 27, 2021 at 4:15 am #

Dear Jason,

Please tell me have to extract the latent space features given by the bottle neck layer as CSV file. Is it possible?.The type(encoder) is tensorflow.python.keras.engine.functional.Functional

• Jason Brownlee January 27, 2021 at 6:16 am #

You can if you like, predict each input via the encoder and save results to csv.

• Mohamed Yousuff January 30, 2021 at 12:37 am #

Dear Jason,

The features extracted from the bottleneck layer has NAN values, please give some suggestions to get rid of them.

Thank you

• Jason Brownlee January 30, 2021 at 6:35 am #

That is surprising, perhaps these tips will help:
https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me

• Mohamed Yousuff January 30, 2021 at 11:24 am #

Dear Jason,

Sorry, your code is working perfectly fine for me but I tried this with my own problem then I got these NAN values so I asked you to suggest some good practices or may be the reason or solution to avoid it.

Thank you

• Jason Brownlee January 30, 2021 at 12:35 pm #

Perhaps check that you scaled your data prior to modeling and that your data does not contain nan values.

18. Saurabh Sawhney January 27, 2021 at 1:12 am #

Good stuff. Tried exploding the number of features with polynomials, and then passing them through the autoencoder to get rid of the useless ones. Worked like a charm. Thanks.

• Jason Brownlee January 27, 2021 at 6:10 am #

Wow, very cool idea to test!

(why didn’t I think of that???)

Well done!

19. angel March 13, 2021 at 6:34 pm #

Dear Jason,
Can Auto Encoder be used to classify multiple classes? For example, 5 classes?

20. maryam March 13, 2021 at 6:46 pm #

Dear Dr. Jason,
Thank you for the tutorial.
I looked at the code but did not know where to apply my database.
in which section and how can I do this?

21. Atefeh March 27, 2021 at 4:24 am #

Hello Dr. Jason,

I have a problem with my input shape when I want to define encoder and decoder part.
My input shape is : (75, 75, 3). I have no idea how should adjust conv layer according to my input.

22. Ka246 March 28, 2021 at 9:27 pm #

Dear Dr. Jason,

I am working on time-series data. I know the input data is compressed in the encoded state and the features can be visualized on that compressed data. But, I want to visualize my original input data on those encoded features (like we can visualize in PCA and clusters). Is it possible to do so? If yes, please suggest!

23. Emmanuel Adjei April 7, 2021 at 2:31 pm #

Hi, Jason,
Thank you so much for this tutorial. I’m working on a fault detection classification. There two datasets involved. One is fault free data (normal) and the other is on faulty operations which contains the labels of the faults. Can I train, the autoencoder on the fault free(normal) and later use the encoder on the “fault” data for SVM classification?

• Jason Brownlee April 8, 2021 at 5:05 am #

Perhaps try it and compare results to a model operating on both datasets.

24. Emmanuel Adjei April 8, 2021 at 6:16 am #

so as long as the encoder is applied to a similar data it should be good right (sort of transfer learning)?

25. Tridib April 12, 2021 at 6:27 am #

Hello Jason,

Your tutorials have been a lot of help to me when I was learning this stuff. In fact, even now, when I am looking up something related to implementing something using Python, particularly neural net related, first thing I try is to look for one of your tutorials. Having said that, recently I found no such information on using TFP (tensorflow’s probability module) for doing probabilistic programming, particularly VAEs. Although this may not be a good place to ask about VAEs, but I would give it a try nonetheless.
Could you do a small tutorial on this subject matter using TFP ?

• Jason Brownlee April 13, 2021 at 6:00 am #

Thanks!

Thank you for the suggestion.

26. Pami April 15, 2021 at 6:50 am #

Hey Jason! Thanks. Can you please how would we modify modify.fit() when using own image dataset? Perhaps using ImageDataGenerator of Keras, but how do we use it in model.fit()?

27. Pami April 15, 2021 at 6:51 am #

Sorry, typo.
model.fit()

28. Stella April 25, 2021 at 6:23 pm #

How do you reshape the encoded data (in your case this is X_train_encode) so that it is two dimensional and aligns with the y train. I trained an autoencoder and my resulting x_train_encode had a latent space of 32 X 32 X 32 though I originally had 5900 images and each were 254 X 254 X 254. Now, how do I match this matrix of 32 x 32 x32 with my y_train and the photos for training with classifiers like KNN or SVM? Thank you!

29. Yahya May 7, 2021 at 9:45 am #

Hi,

Thank you so much for the post!
I am wondering why the validation loss is lower than the training loss?
Is it related to the way tensorflow computes losses?

Thanks,

• Jason Brownlee May 8, 2021 at 6:28 am #

It may be a statistical fluke.
It may be that the validation dataset is small and not representative of the training set.

• Etron June 26, 2021 at 9:44 pm #

Hi,

I’m training a model with a similar architecture, and I also found that the validation loss is much lower than the training loss. Although it doesn’t affect the result of my model, I’d like to figure out why such nonsense situation happens all the time. I first thought there may be data leakage, and therefore I have used different ways to split my training and validation dataset. However, it is still the same case. After a few more attempts, I soon solve the problem when I changed the loss function from mean squared error to (1 – structural similarity). So I’m wondering if the computation of the losses cause the problem just as Yahya mentioned. Please let me know if you have any new thought on the issue after seeing my reply.

Thanks

• Jason Brownlee June 27, 2021 at 4:37 am #

Perhaps the validation dataset is too small or not representative of the training dataset.

• Etron June 27, 2021 at 6:46 pm #

I’d tried to split my dataset into half, with 50% of it as training set and the another half as validation set. And I also tried not to shuffle the dataset, but these doesn’t change much. I’ve never seen that other than autoencoder. It’s just weird, and I’m not sure if I should ignore the issue.

• Jason Brownlee June 28, 2021 at 7:57 am #

Perhaps explore alternate model configs too?

30. bella May 10, 2021 at 7:52 pm #

can we use the encoder as a data preparation step to train a neural network model?

• Jason Brownlee May 11, 2021 at 6:40 am #

31. bella May 14, 2021 at 7:30 pm #

i want to pretrained the model using autoencoder to get weight inisialization, and then use the weight for neural network model. do you have the tutorial for me?

32. goveror May 22, 2021 at 5:54 am #

how we can find accuracy for this classifier I have need the values of accuracy not graphical representation from x-ray images in python I need source code

33. debmalya sarkar June 17, 2021 at 9:24 am #

Hi Jason

Thanks for the amazing tutorial.
When I ran the code for my dataset, the model ran and the loss decreased with the epochs. But a warning came-

WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. model.compile_metrics will be empty until you train or evaluate the model.

Do you know why this coming? I couldn’t find anything online. Thanks in advance

• Jason Brownlee June 18, 2021 at 5:36 am #

You’re welcome.

You can probably safely ignore that warning for now.

34. karim June 21, 2021 at 2:58 am #

Hey, Jason,
Thank you so much for this tutorial. i wanna create a model for authorship verification form text to determine if two document are writing by the same author or not, using the autoencoder but i have a lot of problems i don’t understand the dataset and how can i train and build my model.

this is the architecture of data set:
the first .jsonl file is as below :

{“id”: “6cced668-6e51-5212-873c-717f2bc91ce6”, “fandoms”: [“Fandom 1”, “Fandom 2”], “pair”: [“Text 1…”, “Text 2…”]}
{“id”: “ae9297e9-2ae5-5e3f-a2ab-ef7c322f2647”, “fandoms”: [“Fandom 3”, “Fandom 4”], “pair”: [“Text 3…”, “Text 4…”]}

and the second .jsonl truth file is :

{“id”: “6cced668-6e51-5212-873c-717f2bc91ce6”, “same”: true, “authors”: [“1446633”, “1446633”]}
{“id”: “ae9297e9-2ae5-5e3f-a2ab-ef7c322f2647”, “same”: false, “authors”: [“1535385”, “1998978”]}

• Jason Brownlee June 21, 2021 at 5:40 am #

Perhaps you could experiment with different framings of the problem?

Perhaps you could model it as a binary classification task with a model that takes text from two sources?

• karim June 22, 2021 at 6:01 am #

I ask you for a favour sir if you can propose for me some Techniques of deep learning for this data

• Jason Brownlee June 22, 2021 at 6:34 am #

Test a suite of techniques and discover what works well or best.

35. Dammie July 8, 2021 at 10:57 pm #

Hi Jason, thanks for sharing your knowledge with the community. i have really leant a lot from you. infact your blogs and books are my go-to when i have doubts. i just completed your tutorial on autoencoder and would like your expert guide on a problem stated below:

dataframe_a =ID, col1, col2, col3, col4,col5,col6,col7,col8,col9 …..col21,label

dataframe_b = ID, col_A, col_B, col_C, col_D

dataframe_a has shape (3250, 23) while dataframe_b has shape (64911, 5). note: dataframe_b has no label.

performing inner join to merge both data on ‘ID’ gave a small dataof shape (274, 27) and the model perform badly

i have already trained a binary classification model on the first data (dataframe_a) and achieved an accuracy of ~70% to predict the label. it is my believe that there are some information in the second data(dataframe_b) which would help improve my model performance but as i mentioned above mapping both data on feature ‘ID’ gave a really small data.

how i can use autoencoder in combination with the model i had already trained on dataframe_a to achieve a better accuracy. Also, if you have a use-case of related to my question, please share it.

• Jason Brownlee July 9, 2021 at 5:11 am #

Perhaps you can mark missing values and then impute them or use a model that can ignore them.

Perhaps you can use a multi-input model that takes additional data when available or all zeros otherwise.

36. Fernanda July 15, 2021 at 4:34 pm #

Thank you Jason for this very well explained tutorial, as usual.

I just wonder why did you choose Adam optimizer, is there a reason behind? Which nother opotimizer would you choose here and why?

Thank you!

• Jason Brownlee July 16, 2021 at 5:21 am #

You’re welcome.

I chose Adam because it works well in most cases. You can use any optimize you like.

37. Sepi July 26, 2021 at 8:27 pm #

Hi Jason,

Thank you very much for your great tutorial. I would like to use an autoencoder for dimension reduction of some 1D data (light spectrums). I have only 180 samples (from 17 patients) which each of which includes 1000 points, so the input dimension is 180*1000, and this is raw data with no feature extraction done before. I need to classify these data into two classes (cancer, non-cancer) but as the number of samples is low (180), I think it is better that I reduce the dimension from raw data=1000 to for example 50 and then apply classification for example a fully connected dense network.

I was thinking to do such a raw data dimension reduction with autoencoder as I have no idea what features I can manually extract from raw data and I thought autoencoder could do automatic feature extraction for me, and then I can use the feature vectors (e.g 180*50) as an input for any classifier.

In your tutorial you did dimension reduction from 1000*100 > 1000*50, Would you please tell me if you think I can use your approach for my data considering the little sample size I have? I would like to reach for example 180*1000 > 180*50. And also would be very kind of you if you recommend me, in general, any solution to solve this classification problem considering that I have a little data set and I also do not know which features to extract (That is why I am thinking of neural networks and possibly deep learning).

Best regards
Sepi

• Jason Brownlee July 27, 2021 at 5:06 am #

Perhaps you can try it and compare results to fitting a model on the raw data directly then use whatever works best.

38. Sepi July 28, 2021 at 3:56 pm #

Hi Jason,

Thanks for your answer. Could you please tell me what do you mean by ‘fitting a model on the raw data directly’? Do you mean for example applying a fully connected network (dense) for classification using raw data (no feature extraction)?

Best
Sepi

39. Reini October 10, 2021 at 8:46 am #

Dear Jason,

love your work, thanks a lot for everything!

Do you know if there is a possibility to retrieve the weights of the encoder, so that you can remap it on the original data to investigate which features were selected?

Best,
reini

• Adrian Tam October 13, 2021 at 6:00 am #

If you have a layer, you can do layer.get_weights(); but that’s only for one layer at a time. You can’t do that with a model at once.

40. Bean October 12, 2021 at 4:17 am #

Greeting Dr. Jason,

Can autoencoder work with all types of datasets?
I am working on student performance data that including the student demographic information, performance in classes (final scores), and the final result (pass or no pass). When I use autoencoder, I get very weird results. I do know where was my mistake but sometimes I wonder can autoencoder deal with this kind of data!

• Adrian Tam October 13, 2021 at 7:31 am #

What do you expect for an autoencoder in this case? My favorite explanation of an autoencoder is a lossy compression of the input. If you can related the compression to your problem, then autoencoder is a good model for it.

41. Bean October 19, 2021 at 3:02 am #

I am just trying to see how the autoencoder (feature extraction) can help to increase the performance of a predictive model that uses any traditional classifier. That would be by comparing it to the same classifier without using extract the salient features. I asked because I didn’t see any example of an autoencoder working on the same type of data!