Frequently Asked Questions

Do you have a question? Perhaps I have answered it before.

Browse Questions:

Questions are organized by category. If you cannot find your question, see the “Need More Help?” section at the bottom of the page.

General Questions (8)

There are a few ways that you can give back and support me and this website.

1. Spread the Word

Word of mouth is still a really big deal.

Share a note on social media about Machine Learning Mastery.

For example:

I am loving the machine learning tutorials on https://machinelearningmastery.com

2. Purchase an Ebook

My best advice on applied machine learning and deep learning is captured in my Ebooks.

They are designed to help you learn and get the results the fastest way I know how.

All Ebook sales directly fund and support my continued work on this website and creating new tutorials.

3. Make a Donation or Become a Patron

If you want to support this site financially but cannot afford an Ebook, then consider making a one-time donation.

Think of it like buying me a coffee or adding a few dollars to the tip jar. Every little bit helps.

You may also want to become a patron and donate a small amount each month.

Thank You!

This site would not exist without the support of generous readers like you!

I write every day.

I wrote (almost) all of the posts on this site and I wrote all of the books in the catalog.

I make it a priority to write and dedicate time every single day to replying to blog comments, replying to emails from readers, and to writing and editing tutorials on applied machine learning.

I very rarely take a break and even write while on vacation.

I aim to be super responsive and accessible, so much so, that some readers and customers get frustrated when I take a long haul flight.

I’m also fortunate that working on Machine Learning Mastery is my full-time job, and has been since May 2016. Before that, it was a side project where I wrote early in the morning before work (4:30 am to 5:30am) and on weekends.

If you want to get a lot done, you need to put in the hours.

I don’t think I’m a machine learning master.

I do believe in striving for mastery.

Mastery is a journey. I think it is beyond mere competence; it is a journey without an end.

Applied machine learning is endlessly fascinating. It is an enormous field, always with new methods or more detail to learn. Machine learning mastery means continuous self-study.

I want to help you on this journey by pointing out a few things I have learned that I think will short-cut parts of this journey, such as:

  • How to get started.
  • How to get results.
  • What to focus on.

I hope that I can help you on your journey.

There are many ways to study machine learning.

For example, a classical academic approach is to start by learning the mathematical prerequisite subjects, then learn general machine learning theories and their derivations, then the derivations of machine learning algorithms. One day you might be able to run an algorithm on real data.

This approach can crush the motivation of a developer interested in learning how to get started and use machine learning to add value in business.

An alternative is to study machine learning as a suite of tools for generating results on problems.

In this way, machine learning can be broken down into a suite of tutorials for studying the tools of machine learning. Specifically, applied machine learning, because methods that are not useful or do not generate results are not considered, at least not at first.

By focusing on how to generate results, a practitioner can start adding value very quickly. They can also filter possible areas to study in broader machine learning and narrow their focus to those areas that produce results that are directly useful and relevant to their project or goals.

I call this approach to studying machine learning “results-first“, as opposed to “theory-first“.

Undergraduate and postgraduate courses on machine learning are generally designed to teach you theoretical machine learning. They are training academics.

This also applies to machine learning textbooks, designed to be used in these courses.

These courses are great if you want to be a machine learning academic. They may not be great if you want to be a machine learning practitioner.

The approach starts from first principles and is rooted in theory and math. I refer to this as bottom-up machine learning.

An alternate approach is to focus on what practitioners need to know in order to add value using the tools of machine learning in business. Specifically, how to work through predictive modeling problems end-to-end.

Theory and math may be used, but they are touched on later, in the context of the process of working through a project, and only in ways that make working through a project clearer or allow the practitioner to achieve better results.

I recommend that developers who are interested in being machine learning practitioners use this approach.

I refer to it as top-down machine learning.

You can discover more about how you are already using the top-down approach to learning in this post:

Learn more about the contrast between bottom-up and top-down machine learning in this post:

Learn how to get started with top-down machine learning here:

Thanks for your interest.

I have a bunch of degrees in computer science and artificial intelligence and I have worked many years in the tech industry in teams where your code has to work and be maintainable.

If you want to see a detailed resume, I have a version on LinkedIn:

I also have a blog post that gives some narrative on my background.

Academia was a bad fit for me, but I loved to research and to write.

I think of myself as a good engineer that really wants to help other people get started and get good at machine learning, without wasting years of their life “getting ready to get started“.

Learn more about my approach to teaching applied machine learning here:

Just me, Jason Brownlee.

  • No big team of writers.
  • No faceless company.
  • No help desk staff.

I’m a real person with a passion for machine learning, a passion for helping others with machine learning, and I sell Ebooks to keep this site going and to support my family.

Note that I do hire contractors, like a copy editor to catch typos and technical editors to test the code in my books.

 

I started this site for two reasons:

1) I think machine learning is endlessly interesting.

I’ve studied and worked in a few different areas, including artificial intelligence, computational intelligence, multi-agent systems, and severe weather forecasting, but I keep coming back to applied machine learning.

2) I want to help developers get started and get good at machine learning.

I see so many developers wasting so much time. Studying the wrong way, focusing on the wrong things, getting ready to get started in machine learning but never pulling the trigger. It’s a waste and I hate it.

Learn more here:

Practitioner Questions (123)

All code examples were designed to run on your workstation.

If you need help setting up your Python development environment, a tutorial is provided in the appendix of most books showing you exactly how to do this.

You can also see a tutorial on this topic here:

You can also run deep learning examples on AWS EC2 instances that provide cheap access to GPUs. Again, all deep learning books provide an appendix with a tutorial on how to run code on EC2.

You can also see a tutorial on this topic here:

I understand that Google Colab is a cloud-based environment for running code in notebooks.

I have not used Google Colab and I have not tested the code examples in Google Colab.

I generally recommend against using notebooks if you are a beginner as they can introduce confusion and additional problems.

Nevertheless, some readers report that they have run the code examples on Google Colab successfully.

Generally, you do not need special hardware for developing deep learning models.

You can use a sample of your data to develop and test small models on your workstation with the CPU. You can then develop larger models and run long experiments on server hardware that has GPU support.

I write more about this approach in this post:

For running experiments with large models or large datasets, I recommend using the Amazon EC2 service. It offers GPU support and is very cheap.

This tutorial shows you how to get started with EC2:

I do not have examples of Restricted Boltzmann Machine (RBM) neural networks.

This is a type of neural network that was popular in the 2000s and was one of the first methods to be referred to as “deep learning”.

These methods are, in general, no longer competitive and their use is not recommended.

In their place I would recommend using deep Multilayer Perceptrons (MLPs) with the rectified linear activation function.

I generally do not have material on big data, or on the platforms used for big data (e.g. Spark, Hadoop, etc.).

Although, I do have the odd post on the topic, for example:

I focus on teaching applied machine learning, mostly on small data.

For beginners, I recommend learning machine learning on small data first, before tackling machine learning on big data. I think that you can learn the processes and methods fast using small in-memory datasets. Learn more here:

In practice, there are sufficient tools to load large datasets into memory or progressively from disk for training large models, such as large deep learning models. For example:

Machine learning is still required with big data. You are still working with a random sample, with all that this entails when it comes to approximating functions (the job of machine learning methods).

Further, machine learning on big data often requires specialized versions of standard machine learning algorithms that can operate at scale on the big-data platforms (e.g. see Mahout and Spark MLlib).

 

Sorry, I do not have material on time series forecasting in R.

I do have a book on time series in Python.

There are already some great books on time series forecasting in R. For example, see this post:

I do not have tutorials in Octave or Matlab.

I believe Octave and Matlab are excellent platforms for learning how machine learning algorithms work in an academic setting.

I do not think that they are good platforms for applied machine learning in industry, which is the focus of my website.

Sorry, I do not have tutorials on AI or AGI.

I focus on predictive modeling with supervised learning, and maybe a little unsupervised learning.

These are the areas of machine learning that the average developer may need to use “at work“.

For a good layman introduction to AI, I recommend:

For a good technical introduction to AI, I recommend:

Sorry, I don’t currently have any tutorials on chatbots.

I may cover the topic in the future if there is significant demand.

I do have tutorials on preparing text data for modeling and modeling text problems with deep learning.

You can get started here:

Sorry, I do not have tutorials on deep learning in R.

I have focused my deep learning tutorials on the Keras library in Python.

The main reason for this is that skills in machine learning and deep learning in Python are in huge demand. You can learn more in this post:

I believe Keras is now supported in R, and perhaps much of the library has the same API function calls and arguments.

You may be able to port my Python-based tutorials to R with little effort.

Sorry, I don’t currently have any tutorials on deep reinforcement learning.

I may write about the topic in the future.

At this stage, I am focused on predictive modeling with supervised learning, and maybe a little unsupervised learning.

I am not convinced that deep reinforcement learning is useful to the average developer working on predictive modeling problems “at work“.

The results are remarkable but might currently be limited to boardgames (Go, Shogi, Chess, etc.) and video games (Atari, Quake3, StarCraft II, DOTA2, etc.).

Some recommended reading on DeepMind’s and OpenAI’s impressive results:

Some skeptical comments:

Machine Learning or ML is the study of systems that can learn from experience (e.g. data that describes the past). You can learn more about the definition of machine learning in this post:

Predictive Modeling is a subfield of machine learning that is what most people mean when they talk about machine learning. It has to do with developing models from data with the goal of making predictions on new data. You can learn more about predictive modeling in this post:

Artificial Intelligence or AI is a subfield of computer science that focuses on developing intelligent systems, where intelligence is comprised of all types of aspects such as learning, memory, goals, and much more.

Machine Learning is a subfield of Artificial Intelligence.

Machine learning is a subfield of computer science and artificial intelligence concerned with developing systems that learn from experience.

You can learn more about the definition of machine learning in this post:

When most people talk about machine learning, they really mean predictive modeling. That is, developing models trained on historical data used to make predictions on new data.

You can learn more about predictive modeling in this post:

Big data refers to very large datasets.

Big data involves methods and infrastructure for working with data that is too large to fit on a single computer, such as a single hard drive or in RAM.

An exciting aspect of big data is that simple statistical methods can reveal surprising insights, and simple models can produce surprising results when trained on big data. An example is the use of simple word frequencies prepared on a very big dataset instead of the use of sophisticated spelling correction algorithms. For some problems data can be more valuable than complex hand-crafted models. You can learn more about this here:

Even so, a “big data” dataset is still a random sample, and can benefit from the methods from applied machine learning.

An important consideration when using machine learning on big data is that the algorithms often require modification to operate at scale on infrastructure such as Hadoop and Spark.

Two examples of libraries of machine learning methods on big data include Mahout and Spark MLlib.

Machine learning is a subfield of computer science and artificial intelligence concerned with developing systems that learn from experience.

You can learn more about the definition of machine learning in this post:

When most people talk about machine learning, they really mean predictive modeling. That is, developing models trained on historical data used to make predictions on new data.

You can learn more about predictive modeling in this post:

Data science is a new term that means using computational and scientific methods to learn from and harness data.

A data scientist is someone with skills in software development and machine learning who may be tasked with both discovering ways to better harness data within the organization toward decision making and developing models and systems to capitalize on those discoveries.

A data scientist uses the tools of machine learning, such as predictive modeling.

I have a few posts on data science here:

I generally try not to use the term “data science” or “data scientist” as I think they are ill defined. I prefer to focus on and describe the required skill of “applied machine learning” or “predictive modeling” that can be used in a range of roles within an organization.

There are many roles in an organization where machine learning may be used. For a fuller explanation, see the post:

Machine Learning or ML is the study of systems that can learn from experience (e.g. data that describes the past). You can learn more about the definition of machine learning in this post:

Predictive Modeling is a subfield of machine learning that is what most people mean when they talk about machine learning. It has to do with developing models from data with the goal of making predictions on new data. You can learn more about predictive modeling in this post:

Deep Learning is the application of artificial neural networks in machine learning. As such, it is a subfield of machine learning. You can learn more about deep learning in this post:

There is a lot of overlap between statistics and machine learning.

We may explore predictive modeling in statistics. Machine learning may use methods developed and used in statistics, e.g. linear regression or logistic regression.

Statistics is mostly focused on understanding or explaining data. Models are designed to be descriptive and interpretable, to have goodness of fit. For more on statistics, see:

Machine learning is mostly focused on predictive skill. Models are chosen based on how well they make skillful predictions (and somewhat to maximize parsimony).  More on machine learning here:

Actually, practitioners talk about machine learning, but often mean a sub-field called predictive modeling. More on predictive modeling here:

A good example that makes the difference clear: in statistics, we start with the idea of using a linear regression or logistic regression model, then beat the data into shape to meet the expectations or requirements of our pre-chosen model. In machine learning, we are agnostic to the model and only care about what works well or best, allowing us to explore a suite of different approaches.

For more information on this distinction, see:

A neural network designed for a regression problem can easily be changed to classification.

It requires two changes to the code:

  1. A change to the output layer.
  2. A change to the loss function.

A neural network designed for regression will likely have an output layer with one node to output one value and a linear activation function, for example, in Keras this would be:
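    # sketch: assumes a Keras Sequential model named `model` is already defined
    model.add(Dense(1, activation='linear'))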

We can change this to a binary classification problem (two classes) by changing the activation to sigmoid, for example:
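    # sketch: one node with a sigmoid activation for binary classification
    model.add(Dense(1, activation='sigmoid'))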

We can change this to a multi-class classification problem (more than two classes) by changing the number of nodes in the layer to the number of classes (e.g. 3 in this example) and the activation function to softmax, for example:
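    # sketch: one node per class (3 here) with a softmax activation
    model.add(Dense(3, activation='softmax'))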

Finally, the model will have an error based loss function, such as ‘mse‘ or ‘mae‘, for example:
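    # sketch: the optimizer choice is illustrative
    model.compile(loss='mse', optimizer='adam')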

We must change the loss function for a binary classification problem (two classes) to binary_crossentropy, for example:
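    model.compile(loss='binary_crossentropy', optimizer='adam')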

We must change the loss function for a multi-class classification problem (more than two classes) to categorical_crossentropy, for example:
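    model.compile(loss='categorical_crossentropy', optimizer='adam')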

That is it.

Generally, I would recommend re-tuning the hyperparameters of the neural network for your specific predictive modeling problem.

For some examples of neural networks for classification, see the posts:

I recommend running large models or long-running experiments on a server.

I recommend only using your workstation for small experiments and for figuring out what large experiments to run. I talk more about this approach here:

I recommend using Amazon EC2 service as it provides access to Linux-based servers with lots of RAM, lots of CPU cores, and lots of GPU cores (for deep learning).

You can learn how to setup an EC2 instance for machine learning in these posts:

You can learn useful commands when working on the server instance in this post:

Text must be converted to numbers before you can use it as input to a machine learning model.

The first step is to determine your vocabulary of words, then assign a unique integer to each word.

You control the complexity of your modeling task by controlling the size of the vocabulary of supported words. Words that are not supported by your chosen vocabulary can be mapped to the integer value 0, which stands for “unknown“.

This is generally referred to as cleaning text data, you can learn more about it here:

Documents of integer encoded words can then be encoded to a vector representation to be fed into a machine learning model.

There are many approaches to encoding documents and the general approach is referred to as a bag of words. Common encodings include boolean (word is present or not), count of words, frequency of words (TF-IDF), and more.
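As a rough sketch, scikit-learn provides vectorizers for these encodings (the toy documents below are illustrative):

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ['the quick brown fox', 'the lazy dog']  # toy documents

    # count of words per document
    print(CountVectorizer().fit_transform(docs).toarray())

    # TF-IDF weighted word frequencies
    print(TfidfVectorizer().fit_transform(docs).toarray())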

For more on the bag of words model, see the tutorial:

A more modern approach to encoding integer encoded documents is to use a word embedding, also referred to by one of the specific techniques called word2vec.

This encoding is commonly used with neural networks that take text, but it can be used with any machine learning model.

For more on word embeddings, see the tutorial:

I am not an expert in modeling Covid-19 data.

The fact the whole world is focused on the problem of the Covid-19 pandemic means some of the best modelers in the world are already working on this problem, and may make their data, models, and analysis available to the public directly. I recommend seeking out these sources.

If you are interested in modeling the number of cases per day for a location, a simple exponential function can be used, such as the GROWTH() function in Excel.

This is a deep question.

From a high level, algorithms learn by generalizing from many historical examples. For example:

Inputs like this usually come before outputs like that.

The generalization, e.g. the learned model, can then be used on new examples in the future to predict what is expected to happen or what the expected output will be.

Technically, we refer to this as induction or inductive decision making.

Also see this post:

This is an open question, but I have some ideas.

1) Perhaps you can formulate an existing problem from your industry as a supervised learning problem and see if machine learning algorithms can perform well or better than other methods.

This framework may help:

2) Perhaps you can search the literature for applications of machine learning to your domain to see what is common or popular and use that as inspiration or a starting point.

You can search the machine learning literature here:

3) Perhaps you can search for datasets from your industry that you can use for inspiration or practice.

You can search for machine learning datasets here:

You can’t.

Accuracy is a measure for classification.

You calculate the error for regression.

Learn more here:

Classification Accuracy is a performance metric for classification predictive modeling problems.

It is the percentage of correct predictions made by a model, calculated as the total number of correct predictions made divided by the total predictions that were made:
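    accuracy = (correct predictions / total predictions) * 100

For example, a minimal sketch with scikit-learn (the labels are made up for illustration):

    from sklearn.metrics import accuracy_score

    y_true = [0, 1, 1, 0]  # expected labels (illustrative)
    y_pred = [0, 1, 0, 0]  # predicted labels (illustrative)
    print(accuracy_score(y_true, y_pred))  # 0.75, i.e. 75% of predictions were correct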

Classification accuracy cannot be calculated for one class. It can only be calculated for all predictions made by a model.

To get an idea of the types of class prediction errors made by a model, you may want to calculate a Confusion Matrix for all predictions made:

Alternately, you might instead want to calculate the Precision and Recall for each class:

For more help on choosing a performance metric for classification problems, see the tutorial:

It is possible to overfit the training data.

This means that the model is learning the specific random variations in the training dataset at the cost of poor generalization of the model to new data.

This can be seen by improved skill of the model at making predictions on the training dataset and worse skill of the model at making predictions on the test dataset.

A good general approach to reducing the likelihood of overfitting the training dataset is to use k-fold cross-validation to estimate the skill of the model when making predictions on new data.

A second good general approach in addition to using k-fold cross-validation is to have a hold-out test set that is only used once at the end of your project to help choose between finalized models.

Most Python libraries have a “__version__” attribute that you can query to get the current installed version.

For example, the script below can be used to print the version of many Python libraries used in machine learning:
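A minimal sketch of such a script (the exact set of libraries checked is illustrative):

    # print the versions of common machine learning libraries
    import scipy
    print('scipy: %s' % scipy.__version__)
    import numpy
    print('numpy: %s' % numpy.__version__)
    import pandas
    print('pandas: %s' % pandas.__version__)
    import sklearn
    print('sklearn: %s' % sklearn.__version__)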

Running this script will print the version numbers.

For example, you may see results like the following:

You can discover what version of Python you are using by typing a command at the command line.

On the command line, type:
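    python --version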

You should then see the Python version printed.

For example, you may see something like:

The Keras deep learning library allows you to define the input layer on the same line as the first hidden layer of the network.

This can be confusing for beginners as you might expect one line of code for each network layer including the input layer.

Let’s make this concrete with an example.

Imagine this was the first line in a defined neural network:
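    # a sketch; the relu activation is illustrative
    model.add(Dense(12, input_dim=8, activation='relu'))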

In this case the input layer for the network is defined by the “input_dim” argument and the network expects 8 input variables. This means that your dataset (X) must have 8 columns.

The first hidden layer of the network will have 12 nodes, defined by the first argument to the Dense() layer.

This sounds like an engineering question, not a machine learning question.

If you need help deploying a machine learning model, I have some general tips here that may help:

Making an application depends on the requirements of the project, such as who it is for, who will be operating the application, in what environment, etc.

The engineering question of how to deploy a python file to an operational environment is not the focus of this website, but I may have some suggestions to help get you started:

Distribute as Python Package

Perhaps you want to distribute your code as a Python package that can be installed on other systems, if so, this may help:

Make Available as Web API

Perhaps you want to make your code available via a web API, if so this may help:

Standalone Executable

Perhaps you want to embed your code within a standalone executable, if so this may help:

Embedded in Application

Perhaps you want to embed your code within an existing application; if so, you will need to coordinate with the developers of the existing application.

Clustering algorithms are a type of unsupervised machine learning algorithm that automatically discover natural groupings in data.

For examples of how to use clustering, see the tutorial:

There are many ways to evaluate the performance of a clustering algorithm on a dataset based on the clusters that were discovered.

A good starting point is to think about how you want to use the clusters in your application or project and then use that to frame or think about metrics that you can use to evaluate clusters that were discovered.

There are standard metrics that you can use to evaluate clusters that were discovered and most are based around the idea that you have a dataset where you already know what clusters should have been discovered.

The scikit-learn Python machine learning library provides a number of standard clustering evaluation metrics, you can learn more about them here:

You can evaluate a machine learning algorithm on your specific predictive modeling problem.

I assume that you have already collected a dataset of observations from your problem domain.

Your objective is to estimate the skill of a model trained on a dataset of a given size by making predictions on a new test dataset of a given size. Importantly, the test set must contain observations not seen during training. This is so that we get a fair idea of how the model will perform when making predictions on new data.

The size of the train and test datasets should be sufficiently large to be representative of the problem domain. You can learn more about how much data is required in this post:

A simple way to estimate the skill of the model is to split your dataset into two parts (e.g. a 67%/33% train/test split), train on the training set and evaluate on the test set. This approach is fast, and is suitable if your model is very slow to train or if you have a lot of data with suitably large and representative train and test sets.

  • train/test split.
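As a rough sketch with scikit-learn (synthetic data and model for illustration):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # synthetic data standing in for your dataset
    X, y = make_classification(n_samples=1000, random_state=1)

    # 67%/33% train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

    model = LogisticRegression()
    model.fit(X_train, y_train)
    print('test accuracy: %.3f' % model.score(X_test, y_test))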

Often we don’t have enough data for the train and test sets to be representative. There are statistical methods called resampling methods that allow us to economically reuse the one dataset and split it multiple times. We can use the multiple splits to train and evaluate multiple models, then calculate the average performance across each model to get a more robust estimate of the skill of the model on unseen data.

Two popular statistical resampling methods are:

  • k-fold cross-validation.
  • bootstrap.

I have tutorials on how to create train/test splits and use resampling methods for a suite of machine learning platforms on the blog. Use the search feature. Here are some tutorials that may help to get started:

I have tutorials on how resampling methods work on the blog. Use the search feature. Here is a good place to get started:

I teach a top-down and results-first approach to machine learning.

This means that you very quickly learn how to work through predictive modeling problems and deliver results.

As part of this process, I teach a method of developing a portfolio of completed projects. This demonstrates your skill and gives you a platform from which to take on ever more challenging projects.

It is this ability to deliver results, and the projects that demonstrate it, that will get you a position.

Businesses use credentials as a shortcut for hiring, but they want results more than anything else. Smaller companies are more likely to value results above credentials. Perhaps focus your attention on smaller companies and start-ups seeking developers with skills in machine learning.

Here’s more information on the portfolio approach:

Here’s more information on why you don’t need a degree:

There are many ways to get started in machine learning.

You need to find the one way that works best for your preferred learning style.

I teach a top-down and results-first approach to machine learning.

The core of my approach is to get you to focus on the end-to-end process of working through a predictive modeling problem. In this context everything (or most things) starts to make sense.

My best advice for getting started is broken down into a 5-step process:

I’m here to help and answer your questions along the way.

I focus on teaching machine learning, not programming.

My code tutorials generally assume that you already know some Python programming.

If you don’t know how to code, I recommend getting started with the Weka machine learning workbench. It allows you to learn and practice applied machine learning without writing a line of code. You can learn more here:

If you know how to program in another language, then you will be able to pick up Python programming very quickly.

I have a Python programming tutorial that might help:

Working with NumPy arrays is a big part of machine learning programming in Python. You can get started with NumPy arrays here:

I think the best way to learn a programming language is by using it.

Three good books for learning Python include:

You may have a predictive modeling problem where an input or output variable has a large number of levels or values (high cardinality), such as thousands, tens of thousands, or hundreds of thousands.

This may introduce issues when using sparse representations such as a one hot encoding.

Some ideas for handling data with a large number of categories include:

  • Perhaps the variable can be removed?
  • Perhaps use the data as-is (some NLP problems have hundreds of thousands of words that are treated like categories).
  • Perhaps you can try an integer encoding?
  • Perhaps you can try using a hash of the categories?
  • Perhaps you can remove or group some levels? (e.g. in NLP you can remove all words with a low frequency)
  • Perhaps you can add some flag variables (boolean) to indicate a subset of important levels.
  • Perhaps you can use methods like trees that better lend themselves to many levels.

It is common to have a different number of observations for each class in a classification predictive modeling problem.

This is called a class imbalance.

There are many ways to handle this, from resampling the data to choosing alternate performance measures, to data generation and more.

I recommend trying a suite of approaches to see what works best for your project.

I list some ideas here:

Your data may have a column that contains string values.

Specifically, the string values are labels or categories. For example, a variable or column called “color” with the values “red”, “green”, or “blue”.

If you have string data, such as addresses or free text, you may need to look into feature engineering or natural language processing respectively.

Your categorical data may be a variable that will be an input to your model, or a variable that you wish to predict, called the class in classification predictive modeling.

Generally, in Python, we must convert all string inputs to numbers.

We can do this two ways:

  • Convert the string values to integers, called an integer encoding.
  • Convert the strings to binary vectors, called a one hot encoding (you must integer encode first).
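A minimal scikit-learn sketch (the color values are illustrative):

    from numpy import array
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    colors = array(['red', 'green', 'blue', 'green'])  # illustrative string values

    # integer encoding
    integers = LabelEncoder().fit_transform(colors)
    print(integers)

    # one hot encoding of the integer encoded values
    onehot = OneHotEncoder().fit_transform(integers.reshape(-1, 1)).toarray()
    print(onehot)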

For explanation why, see the post:

For a tutorial with an example of integer encoding and one hot encoding, see the post:

For a tutorial on how to prepare the data using a one hot encoding, see the post:

Some time series data is discontiguous.

This means that the interval between the observations is not consistent, but may vary.

You can learn more about contiguous vs discontiguous time series datasets in this post:

There are many ways to handle data in this form and you must discover the approach that works well or best for your specific dataset and chosen model.

The most common approach is to frame the discontiguous time series as contiguous and to mark the newly introduced observation times as missing (e.g. a contiguous time series with missing values).

Some ideas you may want to explore include:

  • Ignore the discontiguous nature of the problem and model the data as-is.
  • Resample the data (e.g. upsample) to have a consistent interval between observations.
  • Impute the observations to form a consistent interval.
  • Pad the observations to form a consistent interval and use a Masking layer to ignore the padded values.

Imbalanced classification techniques are demonstrated on binary classification tasks with one minority and one majority class for simplicity, but this is not a limitation of the techniques.

Most of the techniques developed for imbalanced classification work for both binary and multi-class classification problems.

This includes techniques such as:

  • Cost sensitive learning algorithms (e.g. class weighting).
  • Data sampling methods (e.g. oversampling and undersampling).
  • Performance metrics (e.g. precision and recall)

You will likely need to specify how to handle each class.

For example, you will need to specify the weight for each class in cost sensitive learning, the amount to over or under sample each class in data sampling, or which classes are the minority and which are the majority for performance metrics.

For worked examples of multi-class imbalanced classification, see the tutorial:

For an example of configuring metrics like precision and recall for multi-class classification, see the tutorial:

For an example of using some imbalanced classification techniques on a multi-class classification problem, see the section “improved models” in the tutorial:

You can handle rows with missing data by removing those rows or by imputing the missing values.

For help handling missing data in tabular data with Python, see:

For help handling missing data in time series with Python, see:

For help handling missing data in tabular data in Weka, see:

I have a great checklist of things to try in order to improve the skill of your predictive model:

I have a deep learning specific version here:

For beginners, I recommend installing the Anaconda Python platform.

It is free and comes with Python and libraries needed for machine learning, such as scikit-learn, pandas, and much more.

You can also easily install deep learning libraries with Anaconda such as TensorFlow and Keras.

I provide a step-by-step tutorial on how to install Python for machine learning and deep learning here:

A p-value is the probability of observing the result, or a more extreme result, given that the null hypothesis is true (e.g. no change, no difference, or no result):

A p-value is not the probability of the hypothesis being true, given the result.

The p-value is interpreted in the context of a pre-chosen significance level, called alpha. A common value for alpha is 0.05, or 5%. It can also be thought of as a confidence level of 95% calculated as (1.0 – alpha).

The p-value can be interpreted with the significance level as follows:

  • p-value <= alpha: significant result, reject null hypothesis (H0), distributions differ.
  • p-value > alpha: not significant result, do not reject null hypothesis (H0), distributions same.

A significance level of 5% means that, if there is no real effect, we accept a 5% likelihood of finding one anyway (rejecting H0). This is called a false positive or, more technically, a Type I error.
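As a rough sketch of interpreting a p-value against alpha, using a t-test from SciPy on two synthetic samples:

    from numpy.random import seed, randn
    from scipy.stats import ttest_ind

    seed(1)
    data1 = 5 * randn(100) + 50  # synthetic sample 1
    data2 = 5 * randn(100) + 51  # synthetic sample 2

    stat, p = ttest_ind(data1, data2)
    alpha = 0.05
    if p <= alpha:
        print('significant result, reject H0 (distributions differ), p=%.3f' % p)
    else:
        print('not significant, fail to reject H0, p=%.3f' % p)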

For more information see the post:

Statistics, specifically applied statistics, is concerned with using models that are well understood, such that it can clearly be shown why a specific prediction was made by the model.

Explaining why a prediction is made by a model for a given input is called model interpretability.

Examples of predictive models where it is straightforward to interpret a prediction by the model are linear regression and logistic regression. These are simple and well understood methods from a theoretical perspective.

Note, interpreting a prediction does not (only) mean showing “how” the prediction was made (e.g. the equation for how the output was arrived at), it means “why“, as in the theoretical justification for why the model made a prediction. An interpretable model can show the relationship between input and output, or cause and effect.

It’s worse than this: there are great claims for the need for model interpretability, but little definition of what it is or why it is so important. See the paper:

In applied machine learning, we typically sacrifice model interpretability in favor of model predictive skill.

This may mean using methods that cannot easily (or at all) explain why a specific prediction is made.

In fact, this is the focus of the sub-field of machine learning referred to as “predictive modeling“.

Examples of models where there are no good ways to interpret “why” a prediction was made include support vector machines, ensembles of decision trees, and artificial neural networks.

Traditionally, statisticians have referred to such methods as “black box” methods, given their opacity in explaining why specific predictions are made.

Nevertheless, there is work on developing methods for interpreting predictions from machine learning, for example see:

I think the goal of model interpretability may be misguided by machine learning practitioners.

In medicine, we use drugs that give a quantifiable result using mechanisms that are not understood. The how may (or may not) be demonstrated, but the cause and effect for individuals is not. We allow the use of poorly understood drugs through careful and systematic experimental studies (clinical trials) demonstrating efficacy and limited harm. It mostly works too.

As a pragmatist, I would recommend that you focus on model skill, on delivering results that add value, and on a high level of rigor in the evaluation of models in your domain.

For more, see this post:

Machine learning model performance is relative, not absolute.

Start by evaluating a baseline method, for example:

  • Classification: Predict the most common class value.
  • Regression: Predict the average output value.
  • Time Series: Predict the previous time step as the current time step.
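For the classification case, a minimal sketch using scikit-learn’s DummyClassifier (synthetic data for illustration):

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier

    # synthetic, imbalanced data standing in for your dataset
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

    baseline = DummyClassifier(strategy='most_frequent')
    baseline.fit(X, y)
    print('baseline accuracy: %.3f' % baseline.score(X, y))  # any skillful model must beat this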

Evaluate the performance of the baseline method.

A model has skill if its performance is better than the performance of the baseline model. This is what we mean when we talk about model skill being relative, not absolute: it is relative to the skill of the baseline method.

Additionally, model skill is best interpreted by experts in the problem domain.

For more on this topic, see the post:

The first step to making predictions is to develop a finalized version of your chosen model and model configuration, trained on all available data. Learn more about this here:

You will want to save the trained model to file for later use. You can then load it up and start making predictions on new data.

Most modeling APIs provide a predict() function that takes one row of data or an array/list of rows of data as input for your model and for which your model will generate a prediction.
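A minimal sketch of this workflow with scikit-learn and joblib (data and model are illustrative):

    import joblib
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # train the final model on all available data
    X, y = make_classification(n_samples=1000, random_state=1)
    model = LogisticRegression()
    model.fit(X, y)

    # save the finalized model for later use
    joblib.dump(model, 'final_model.joblib')

    # later: load the model and make a prediction for one new row of data
    loaded = joblib.load('final_model.joblib')
    print(loaded.predict(X[:1]))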

For some examples of making predictions with final models with standard libraries, see the posts:

An anomaly is an example that does not fit with the rest of the data.

It is an exception, an outlier, abnormal, etc.

Anomaly detection is a large field of study and I hope to write more about it in the future. I recommend checking the literature.

It is possible to treat anomalies as outliers and detect them using statistical methods or so-called one-class classification techniques.

For more on statistical methods for outlier detection, see the tutorial:

For more on model-based approaches for automatic outlier detection, see the tutorial:

Time series data may contain outliers.

Again, there are specialized techniques for detecting outliers in time series data and I hope to write about them in the future. I recommend checking the literature.

It is possible to use statistical methods to detect outliers in time series. It is also possible to treat outlier detection in time series as a time series classification task. The input to the model would be a sequence of observations and the target would be whether or not the input sequence contains an outlier.

Classification machine learning models as well as deep learning neural network models can be used such as MLPs, CNNs, LSTMs and hybrid models.

For more on using deep learning models for time series classification, you can see examples listed here:

The LSTM expects data to be provided as a three-dimensional array with the dimensions [samples, time steps, features].
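For example, a minimal sketch of reshaping a univariate series of 100 observations into this form (the split into 10 samples of 10 time steps is illustrative):

    from numpy import array

    series = array(range(100))           # univariate series of 100 observations
    samples = series.reshape(10, 10, 1)  # [samples, time steps, features]
    print(samples.shape)                 # (10, 10, 1)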

Learn more about how to reshape your data in this tutorial:

For a reusable function that you can use to transform a univariate or multivariate time series dataset into a supervised learning problem (useful for preparing data for an LSTM), see the post:

If you have a long time series that you wish to reshape for an LSTM, see this tutorial:

If you have missing data in your sequence, see the tutorial:

If you have a large number of time steps and are looking for ideas on how to split up your data, see the tutorials:

The command line is the prompt where you can type commands to execute them.

It may be called different things depending on your platform, such as:

  • Terminal (Linux and macOS)
  • Command Prompt (Windows)

I recommend running scripts from the command line if you are a beginner.

This means first you must save your script to a file with the appropriate extension in a directory.

For example, we may save a Python script with the .py extension in the /code directory. The full path would be:
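    /code/your_script.py    # hypothetical filename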

To run the script, first open the command prompt or terminal.

Change directory to the location where you saved the script.

For example, if you saved the script in /code, you would type:
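    cd /code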

Use the language interpreter to run the script.

For example, in Python, the interpreter is “python” and you would run your script as:
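    python your_script.py    # hypothetical filename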

If the script is dependent upon a data file, the data file often must be in the same directory as the code.

For example:

 

 

I strongly believe that self-study is the path to getting started and getting good at applied machine learning.

I have dedicated this site to help you with your self-study journey toward machine learning mastery (hence the name of the site).

I teach an approach to machine learning that is different to the way it is taught in universities and textbooks. I refer to the approach as top-down and results-first. You can learn more about this approach here:

You do not need a degree to get started, or learn machine learning, or even to get a job applying machine learning and adding value in business. I write about this more here:

A big part of doing well at school, especially the higher degrees, is hacking your own motivation. This, and the confidence it brings, was what I learned at university.

You must learn how to do the work, even when you don’t feel like it, even when the stakes are low, even when the work is boring. It is part of the learning process. This is called meta-learning, or learning how to learn effectively: how you specifically learn, not how humans learn in general.

Learning how you learn effectively is a big part of self-study.

  • Find and use what motivates you.
  • Find and use the mediums that help you learn better.
  • Find and listen to teachers and material to which you relate strongly.

External motivators like getting a coach or an accountability partner sound short-term to me. You want to solve “learning how to self-study” in a way that you have the mental tools for the rest of your life.

A big problem with self-directed learning is that it is curiosity-driven. It means you are likely to jump from topic to topic based on whim. It also offers a great benefit, because you read broadly and deeply depending on your interests, making you a semi-expert on disparate topics.

An approach I teach to keep this on track is called “small projects“.

You design a project that takes a set time and has a defined endpoint, such as a few man-hours and a report, blog post or code example. The projects are small in scope, have a clear endpoint and result in a work product at the end to add to the portfolio or knowledge base that you’re building up.

You then repeat the process of designing and executing small projects. Projects can be on a theme of interest, such as “Deep Learning for NLP” or on questions you have along the way “How do you use SVM in scikit-learn?“.

I write more about this approach to self-study here:

The very next question I get is:

  • Do you have some examples of portfolios or small projects?

Yes, they are everywhere, search for machine learning on GitHub, YouTube, Blogs, etc.

This is how people learn. You may even be using this strategy already, just in a less systematic way.

  • You could think of each lecture in a course as a small project, although often poorly conceived.
  • You could think of each API call for a method in an open source library as a small project.
  • You could think of my blog as a catalog of small projects.

Don’t get hung-up on what others are doing.

Pick a medium that best suits you and build your own knowledge base. It’s for you, remember. Its use in interviews is a side benefit, not the goal.

Training machine learning models can be slow for many reasons.

This means there may be many opportunities to speed up their training.

Some high-level ideas include:

  • Try training your model on a faster machine (e.g. AWS EC2 instance, etc.)
  • Try using less training data (e.g. random sample, under sampling, etc.)
  • Try using a smaller model (e.g. fewer layers, fewer ensemble members, etc.)
  • Try using a more efficient implementation (e.g. different open source project, etc.)

Before you say “I can’t“, think carefully and do your homework.

  • Maybe you can use free credits on a cloud platform (ask).
  • Maybe your model is less sensitive to the size of the training dataset than you think (test it).
  • Maybe a smaller model will perform with little loss in skill (try it).
  • Maybe a lesser-known implementation performs much faster than the common libraries (try it).

Also, the training of some machine learning algorithms can be accelerated via specific changes to hyperparameters or model architecture.

  • Learning rate on many algorithms controls the speed of learning, and therefore training.
  • Batch size in gradient algorithms also influences the speed of learning.
  • Neural net layers like batch normalization can dramatically accelerate the learning.
  • Scaling of the input data can greatly simplify the complexity of many prediction problems.

 

Generally, you cannot use k-fold cross-validation to estimate the skill of a model for time series forecasting.

The k-fold cross-validation method will randomly shuffle the observations, which will cause you to lose the temporal dependence in the data, e.g. the ordering of observations by time. The model will no longer be able to learn how prior time steps influence the current time step. Finally, the evaluation will not be fair, as the model will be able to cheat by looking at future observations during training.

The recommended method for estimating the skill of a model for time series forecasting is to use walk-forward validation.
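A minimal sketch of walk-forward validation on a univariate series, using a naive persistence forecast for illustration:

    from numpy import array, mean

    series = array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # illustrative series
    n_test = 3
    errors = []
    for i in range(len(series) - n_test, len(series)):
        history = series[:i]   # only data observed before time step i
        yhat = history[-1]     # persistence forecast: repeat the last observed value
        errors.append(abs(series[i] - yhat))
    print('MAE: %.1f' % mean(errors))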

You can learn more about walk-forward validation in this post:

I also have many posts that demonstrate how to use this method, search the site.

Early stopping is a regularization technique used by iterative machine learning algorithms such as neural networks and gradient boosting.

It reduces the likelihood of a model overfitting the training data by monitoring performance of the model during training on a validation dataset and stopping training as soon as performance starts to get worse.
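In Keras, this is typically configured with the EarlyStopping callback; a minimal sketch, assuming a compiled model and training data are already defined:

    from tensorflow.keras.callbacks import EarlyStopping

    # stop training when the validation loss has not improved for 10 epochs
    es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
    # model.fit(X, y, validation_split=0.3, epochs=1000, callbacks=[es])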

For more on early stopping, see the tutorial:

k-Fold Cross-Validation is a resampling technique used to estimate the performance of a predictive model.

It works by splitting a training dataset into k non-overlapping folds, then using one fold as the hold out test dataset and all other folds as the training dataset. The process is repeated and the mean performance of the models on all hold out folds is used as the estimate of model performance when making predictions on data not seen during training.

For more on k-fold cross-validation, see the tutorial:

Using early-stopping with k-fold cross-validation can be tricky.

It requires that each training set within each cross-validation run be further split into a portion of the dataset to fit the model and a portion of the dataset to use by early stopping to monitor the training process.

This may require you running the k-fold cross-validation process manually in a for loop so that you can further split the training portion and configure early stopping. This would be my general recommendation to give you complete control over the process.
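A rough sketch of this manual loop, using scikit-learn for the splits and a small Keras model for illustration:

    from numpy import mean
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold, train_test_split
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.callbacks import EarlyStopping

    X, y = make_classification(n_samples=1000, n_features=10, random_state=1)  # synthetic data

    scores = []
    for train_ix, test_ix in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
        X_train, X_test, y_train, y_test = X[train_ix], X[test_ix], y[train_ix], y[test_ix]
        # further split the training portion so early stopping has its own validation set
        X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=1)
        model = Sequential([Dense(16, activation='relu', input_dim=10), Dense(1, activation='sigmoid')])
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
        model.fit(X_fit, y_fit, validation_data=(X_val, y_val), epochs=200, verbose=0, callbacks=[es])
        _, acc = model.evaluate(X_test, y_test, verbose=0)
        scores.append(acc)
    print('mean accuracy: %.3f' % mean(scores))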

Grid Search is a type of model hyperparameter tuning.

It involves defining a grid of hyperparameter values to consider and evaluating the model on each combination in turn using a resampling technique such as k-fold cross-validation. The combination of hyperparameters that results in the best performance can then be chosen for the final model.
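For example, a minimal sketch with scikit-learn’s GridSearchCV (the model and grid are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, random_state=1)  # synthetic data
    grid = {'n_estimators': [50, 100, 200], 'max_depth': [3, 5, None]}
    search = GridSearchCV(RandomForestClassifier(random_state=1), grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)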

Using early-stopping with grid search can also be tricky.

One approach might be to run early stopping on the training dataset a number of times, or with different splits of the dataset into train/validation sets, and use the mean number of iterations or epochs ran as a fixed hyperparameter when grids searching other hyperparameters.

Another approach would be to treat the model with early stopping as a modeling pipeline and grid search other hyperparameters directly on this pipeline. This does assume that you are able to split a training set into a subset for fitting the model and a subset for early stopping as part of the grid search, which might require custom code.

Ensembles are a class of machine learning method that combines the predictions from two or more other machine learning models.

There are many types of ensemble machine learning methods, such as:

  • Stacked Generalization (stacking or blending)
  • Voting
  • Bootstrap Aggregation (bagging)
  • Boosting (e.g. AdaBoost)
  • Stochastic Gradient Boosting (e.g. xgboost)
  • Random Forest
  • And many more…

You can learn more about ensemble methods in general here:

You can learn how to code these methods from scratch, which can be a fun way to learn how they work:

You can use ensemble methods with standard machine learning platforms, for example:
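A minimal scikit-learn sketch comparing a few ensemble methods on a synthetic dataset:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=1)  # synthetic data
    for model in [BaggingClassifier(), RandomForestClassifier(), AdaBoostClassifier()]:
        scores = cross_val_score(model, X, y, cv=10)
        print(type(model).__name__, round(scores.mean(), 3))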

Many studies have found that ensemble methods achieve the best results when averaged across multiple classification and regression type predictive modeling problems. For example:

You can also get great results with state-of-the-art implementations of ensemble methods, such as the XGBoost library that is often used to win machine learning competitions.

You can get started with XGBoost here:

LSTMs and other types of neural networks can be used to make multi-step forecasts on time series datasets.

To get started with using deep learning methods (MLPs, CNNs, and LSTMs) for time series forecasting, start here:

For a specific tutorial on using LSTMs for time series forecasting, including multi-step forecasting, see this post:

For a specific tutorial on LSTMs applied to a multivariate input and multi-step forecast problem, see this tutorial:

For more help on multi-step forecasting strategies in general, see the post:

You can get started using deep learning methods such as MLPs, CNNs and LSTMs for univariate, multivariate and multi-step time series forecasting here:

 

I have a recommended process for working through a new predictive modeling project that will help you tackle your project systematically.

You can read about it here:

I hope that helps as a start.

I generally recommend working with a small sample of your dataset that will fit in memory.

I recommend this because it will accelerate the pace at which you learn about your problem:

  • It is fast to test different framings of the problem.
  • It is fast to summarize and plot the data.
  • It is fast to test different data preparation methods.
  • It is fast to test different types of models.
  • It is fast to test different model configurations.

The lessons that you learn with a smaller sample often (but not always) translate to modeling with the larger dataset.

You can then scale up your model later to use the entire dataset, perhaps trained on cloud infrastructure such as Amazon EC2.

There are also many other options if you are interested in training with a large dataset. I list 7 ideas in this post:

The k-fold cross-validation method is used to estimate the skill of a model when making predictions on new data.

It is a resampling method, which makes efficient use of your small training dataset to evaluate a model.

It works by first splitting your training dataset into k groups of the same size. A model is trained on all but one of these groups, and then is evaluated on the hold-out group. This process is repeated so that each of the k sub-groups of the training dataset is given a chance to be used as the hold-out test set.

This means that k-fold cross-validation will train and evaluate k models and give you k skill scores (e.g. accuracy or error). You can then calculate the average and standard deviation of these scores to get a statistical impression of how well the model performs on your data.
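A minimal sketch with scikit-learn (synthetic data and model for illustration):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, random_state=1)  # synthetic data
    cv = KFold(n_splits=10, shuffle=True, random_state=1)
    scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
    print('accuracy: mean=%.3f std=%.3f' % (scores.mean(), scores.std()))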

You can learn more about how k-fold cross-validation works here:

You can learn more about how to implement k-fold cross-validation in these posts:

The models created during k-fold cross-validation are discarded. When you choose a model and set of parameters, you can train a final model using all of the training dataset.

Learn more about training final models here:

Consider an LSTM layer.

What do we know about this vanilla LSTM layer:

  • The layer has multiple nodes or units (e.g. 10).
  • The layer will receive an input sequence (e.g. 100 time steps with 1 feature per step).
  • The layer will output a vector (e.g. 10 elements).

How does the layer process the input sequence?

Each node in the layer is like a mini-network. Each node is exposed to the input sequence and produces an output. This is the biggest point of confusion for beginners, and it means that the number of nodes in the layer is unrelated to the number of time steps in the input sequence.

The input sequence is processed one time step at a time. We may provide the entire input sequence to the layer via the API for efficiency reasons, or we may provide the input sequence to the model one step at a time, a so-called dynamic RNN, which is less efficient but more flexible. The former is more common and is the approach I use in almost all of my tutorials.

Each step of input for a node results in an output and an internal state. Both are used in the processing of the subsequent time step and are held within the layer. By default, only the last output from the end of the sequence is the actual output provided by the node. This can be changed to return the output from each input time step (e.g. by setting return_sequences=True), and it must be changed to do this when stacking LSTM layers.

Nevertheless, by default, each node in the layer outputs one value after processing the sequence. Therefore, the number of nodes in the layer determines the number of elements in the vector output from the layer.
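A minimal sketch of this idea (using the standalone Keras imports used elsewhere on this site; the specific sizes are arbitrary):

```python
# Sketch: a vanilla LSTM layer with 10 units reading 100 time steps of 1 feature.
# The 10 units set the size of the output vector; they are unrelated to the 100 time steps.
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(10, input_shape=(100, 1)))  # output shape: (batch, 10)
model.summary()
```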

More nodes and layers mean more capacity for the network to learn, but result in a model that is more challenging and slower to train.

You must find the right balance of network capacity and trainability for your specific problem.

There is no reliable analytical way to calculate the number of nodes or the number of layers required in a neural network for a specific predictive modeling problem.

My general suggestion is to use experimentation to discover what configuration works best for your problem.

This post has advice on systematically evaluating neural network models:

Some further ideas include:

  • Use intuition about the domain or about how to configure neural networks.
  • Use deep networks, as empirically, deeper networks have been shown to perform better on hard problems.
  • Use ideas from the literature, such as papers published on predictive problems similar to your problem.
  • Use a search across network configurations, such as a random search, grid search, heuristic search, or exhaustive search.
  • Use heuristic methods to configure the network; there are hundreds of published methods, though none appear reliable to me.

More information here:

Regardless of the configuration you choose, you must carefully and systematically evaluate the configuration of the model on your dataset and compare it to a baseline method in order to demonstrate skill.

The amount of training data that you need depends both on the complexity of your problem and on the complexity of your chosen algorithm.

I provide a comprehensive answer to this question in the following post:

There are time series forecasting problems where you may have data from multiple sites.

For example, forecasting temperature for multiple cities.

We can think of this as multi-site time series forecasting.

Some general approaches that you could explore for multi-site forecasting include:

  • Develop one model per site.
  • Develop one model per group of sites.
  • Develop one model for all sites.
  • Hybrid of the above.
  • Ensemble of the above.

It may not be clear which approach may be most suitable to your problem. I recommend prototyping a few different approaches in order to discover what works best for your specific dataset.

Operations like sum and mean on NumPy arrays can be performed array-wise, column-wise and row-wise.

This is achieved by setting the “axis” argument when calling the function, e.g. sum(axis=0).

There are three axis values you may want to use, as follows:

  • axis=None: Apply operation array-wise.
  • axis=0: Apply operation column-wise, across all rows for each column.
  • axis=1: Apply operation row-wise, across all columns for each row.
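For example, a small sketch of all three cases on a 2x3 array:

```python
# Sketch: array-wise, column-wise, and row-wise sums with the axis argument.
from numpy import array

data = array([[1, 2, 3],
              [4, 5, 6]])
print(data.sum(axis=None))  # 21, array-wise
print(data.sum(axis=0))     # [5 7 9], column-wise (across all rows for each column)
print(data.sum(axis=1))     # [ 6 15], row-wise (across all columns for each row)
```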

For more information see:

Generally, I don’t read the latest bleeding edge papers.

This is for a few reasons:

  • Most papers are not reproducible.
  • Most papers are badly written.
  • Most papers will not be referenced or used by next year.

I try to focus my attention on the methods that prove useful and relevant after a few years.

These are the methods that:

  • Are used to do well in machine learning competitions.
  • Appear in open source libraries and packages.
  • Are used and discussed broadly by other practitioners.

Please note that “pandas.plotting.scatter_matrix” is the correct API for the most recent version of Pandas.

If you are using “pandas.tools.plotting.scatter_matrix” you will need to update your version of Pandas.
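A minimal sketch of the updated import and call (the toy DataFrame is a placeholder):

```python
# Sketch: the updated scatter_matrix API in recent versions of Pandas.
from pandas import DataFrame
from pandas.plotting import scatter_matrix
from matplotlib import pyplot

df = DataFrame({'a': [1, 2, 3, 4, 5], 'b': [5, 4, 3, 2, 1]})
scatter_matrix(df)
pyplot.show()
```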

You can learn more about the updated API here.

Probably not.

If you have a problem to solve, do not code machine learning algorithms from scratch. Your implementation will probably be slow and full of bugs. Software engineering is really hard.

If you are working on a project that has to be used in production, does it make sense to:

  • Implement your own compiler?
  • Implement your own quick sort?
  • Implement your own graphical user interface toolkit?

In almost all cases, the answer is: no, are you crazy?

Use an open source library with an efficient and battle-tested implementation of the algorithms you need.

I write more about this here:

If you want to learn more about how an algorithm works, yes, implementing algorithms from scratch is a wonderful idea. Go for it!

In fact, for developers, I think coding algorithms from scratch is the best way to learn about machine learning algorithms and how they work. I even have a few books on the topic showing you how.

I write more about this here:

I recommend an approach to self-study that I call “small projects” or the “small project methodology” (early customers may remember that I even used to sell a guide by this name).

The small project methodology is an approach that you can use to very quickly build up practical skills in technical fields of study, like machine learning. The general idea is that you design and execute small projects that target a specific question you want to answer. You can learn more about this approach to studying machine learning here:

Small projects are small in a few dimensions to ensure that they are completed and that you extract the learning benefits and move onto the next project.

Below are constraints you should consider imposing on your projects:

  • Small in time.
  • Small in scope.
  • Small in resources.

You can learn more about small projects here:

I recommend that each project have a well defined deliverable, such as a program, blog post, API function, etc.

Having a deliverable allows you to build up a portfolio of “work product” that you can both leverage on future small projects and use to demonstrate your growing capabilities.

You can learn more about building a portfolio here:

I generally recommend that developers start in machine learning by focusing on how to work through predictive modeling problems end-to-end.

Everything makes sense through this lens, and it focuses you on the parts of machine learning you need to know in order to be able to deliver results.

I outline this approach here:

Nevertheless, there are some machine learning basics that you need to know. They can come later, but some practitioners may prefer to start with them.

You need to know what machine learning is:

You need to know that machine learning finds solutions to complex problems that you probably cannot solve with custom code and lots of if-statements:

You need to get a feeling for the types of problems that can be addressed with machine learning:

You need to know about the ways that algorithms can learn:

You need to know about the types of machine learning algorithms that you can use:

You need to know how machine learning algorithms work, in general:

That is a good coverage of the absolute basics. Here are some additional good overviews:

The rest, such as how to work through a predictive modeling problem, algorithm details, and platform details, can be found here:

It really depends on your goals.

If you want to be a machine learning practitioner or machine learning engineer, then what you need to know is very different from a machine learning academic.

The huge problem is that universities are training machine learning academics, not machine learning practitioners.

When a developer thinks about getting started in machine learning, they look to the university curriculum of math, and they think that they need to study math.

This is not true.

Start by figuring out your goals, what you want to do with machine learning. I write about this here:

This website is about helping developers get started and get good at applied machine learning.

As such, I assume you know about being a developer, which means:

  • You probably know how to write code (but this is not required).
  • You probably know how to install and manage software on your workstation.
  • You probably use computational thinking when approaching problems.

Does this describe you? If so, I’m one of you. This is who we are. We’re developers.

You’re in the right place! Now, it’s time to get started.

Machine learning algorithms learn how to map examples of input to examples of output.

This is useful because in the future we can give new examples of input and the model can predict the output.

Therefore, when we train a model, we must separate our data (rows) into input and output elements (columns).

Input is referred to as “X”, output is referred to as “y”, and predictions made by the model are its approximation of “y” that we call “yhat”.

  • X: The input component of rows of data.
  • y: The output component of rows of data.
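For example, a small sketch of separating a toy dataset into X and y with NumPy (the values are placeholders):

```python
# Sketch: split rows of data into input (X) and output (y) columns.
from numpy import array

data = array([[0.1, 0.2, 0],
              [0.3, 0.4, 1],
              [0.5, 0.6, 0]])
X, y = data[:, :-1], data[:, -1]  # all columns but the last are inputs, the last is the output
print(X.shape, y.shape)  # (3, 2) (3,)
```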

Data preparation involves transforming raw data into a form or format such that it is ready to use as input for fitting a model.

For example, in machine learning and deep learning with tabular datasets (e.g. like data in a spreadsheet), each column of data might be standardized or normalized.

There are many types of data preparation that you might perform, and some algorithms expect some data preparation.

For example, algorithms that use distance calculations, such as kNN and SVM, may perform better if input variables with different scales (e.g. feet, hours, etc.) are normalized to the range between 0 and 1. Algorithms that weight inputs, such as linear regression, logistic regression, and neural networks, may also prefer input variables to be normalized.

Some input variables or output variables may have a specific data distribution, such as a Gaussian distribution. For those algorithms that prefer scaled input variables, it may be better to center or standardize the variable instead of normalizing it.

Further, some algorithms assume a specific data distribution. For example, linear algorithms like linear regression may assume input variables have a Gaussian distribution.

As we can see, the type of data preparation depends both on the choice of model and on the specific data that you are modeling.

To make things more complicated, sometimes better or best results can be achieved by ignoring the assumptions or expectations of an algorithm.

Therefore, in general, I recommend testing (prototyping or spot-checking) a suite of different data preparation techniques with a suite of different algorithms in order to learn about what might work well for your specific predictive modeling problem.
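One hedged sketch of this kind of spot-checking with scikit-learn, assuming a generic model and a synthetic dataset as placeholders:

```python
# Sketch: spot-check a few data preparation methods with one model and cross-validation.
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_classification(n_samples=100, random_state=1)
for name, scaler in [('none', None), ('normalize', MinMaxScaler()), ('standardize', StandardScaler())]:
    steps = ([('scale', scaler)] if scaler else []) + [('model', KNeighborsClassifier())]
    scores = cross_val_score(Pipeline(steps), X, y, scoring='accuracy', cv=10)
    print('%s: %.3f' % (name, mean(scores)))
```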

For more on what data preparation methods to use, see the tutorial:

I recommend the Keras library for deep learning.

It provides an excellent trade-off of power and ease-of-use.

Keras wraps powerful computational engines, such as Google’s TensorFlow library, and allows you to create sophisticated neural network models such as Multilayer Perceptrons, Convolutional Neural Networks and Recurrent Neural Networks with just a few lines of code.
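For example, a minimal sketch of a small Multilayer Perceptron (the layer sizes are arbitrary placeholders):

```python
# Sketch: a small MLP for binary classification defined in a few lines of Keras.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=8))  # 8 input features, 10 hidden nodes
model.add(Dense(1, activation='sigmoid'))             # 1 output node for a binary label
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```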

You can get started with Keras in Python here:

You may be working on a regression problem and achieve zero prediction errors.

Alternately, you may be working on a classification problem and achieve 100% accuracy.

This is unusual and there are many possible reasons for this, including:

  • You are evaluating model performance on the training set by accident.
  • Your hold-out dataset (test or validation) is too small or unrepresentative.
  • You have introduced a bug into your code and it is doing something different from what you expect.
  • Your prediction problem is easy or trivial and may not require machine learning.

The most common reason is that your hold out dataset is too small or not representative of the broader problem.

This can be addressed by:

  • Using k-fold cross-validation to estimate model performance instead of a train/test split.
  • Gathering more data.
  • Using a different split of data for train and test, such as 50/50.

Verbose is an argument in Keras on functions such as fit(), evaluate(), and predict().

It controls the output printed to the console during the operation of your model.

Verbose takes three values:

  • verbose=0: Turn off all verbose output.
  • verbose=1: Show a progress bar for each epoch.
  • verbose=2: Show one line of output for each epoch.

When verbose output is turned on, it will include a summary of the loss for the model on the training dataset; it may also show other metrics if they have been configured via the metrics argument.

The verbose argument does not affect the training of the model. It is not a hyperparameter of the model.

Note that if you are using an IDE or a notebook, verbose=1 can cause issues or even errors during the training of your model. I recommend turning off verbose output if you are using an IDE or a notebook.
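A small sketch of the argument in use (the data and model below are random placeholders):

```python
# Sketch: the effect of the verbose argument when fitting a Keras model.
from numpy import random
from keras.models import Sequential
from keras.layers import Dense

X, y = random.rand(100, 8), random.randint(0, 2, 100)  # placeholder data
model = Sequential([Dense(10, activation='relu', input_dim=8), Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5, verbose=0)  # silent
model.fit(X, y, epochs=5, verbose=2)  # one line per epoch, a good choice in IDEs and notebooks
```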

Feature importance methods suggest the relative importance of each feature to the target variable.

The importance scores could be calculated using a statistical method (such as correlation or mutual information) or by a model such as an ensemble of decision trees.

Each feature importance method provides a different “view” on the relative importance of input variables that might be relevant to your predictive modeling problem.

As such, there is no objectively “best” feature importance method.

Further, the scores are relative, not absolute. As such, scores calculated by one method cannot be meaningfully compared to scores calculated by another method.
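As a hedged sketch, here are two different views of importance calculated on the same placeholder dataset:

```python
# Sketch: two "views" of feature importance on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=5, random_state=1)
# View 1: importance from an ensemble of decision trees.
print(RandomForestClassifier(random_state=1).fit(X, y).feature_importances_)
# View 2: importance from a statistical method (mutual information).
print(mutual_info_classif(X, y, random_state=1))
# The two sets of scores are on different scales and cannot be compared directly.
```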

Your task as a machine learning practitioner is to best use these suggestions in the development of your model.

One approach might be to create a model from each view of your data and ensemble the predictions of these models together.

Another approach might be to evaluate the skill of a model developed from each view of the data and to use the features that result in a model with the best skill.

If you need to describe the importance of input variables to project stakeholders, perhaps you can look for how the performance of your specific final model varies with and without each input variable via an ablation study. Alternately, perhaps you can report on how a suite of different feature importance methods comment on your input data.

For more help on feature importance, see the post:

Feature selection methods suggest a subset of input features that you may use to model your predictive modeling problem.

Each feature selection method provides a different “view” on the input variables that might be relevant to your predictive modeling problem.

As such, there is no objectively “best” feature selection method.

Your task as a machine learning practitioner is to best use these suggestions in the development of your model.

One approach might be to create a model from each view of your data and ensemble the predictions of these models together.

Another approach might be to evaluate the skill of a model developed from each view of the data and to use the features that result in a model with the best skill.

For more help on feature selection, see the post:

If you are interested in applied machine learning but don’t know how to write code, a great place for you to start would be with Weka machine learning workbench.

Weka provides a graphical user interface and does not require any programming. You can use it to work through predictive modeling problems end-to-end very quickly.

You can get started with Weka here:

You can get started and get good at applied machine learning, even if you are not a developer.

I describe my approach to teaching machine learning as an “approach for developers”, but it works just as well for non-developers, such as engineers, analysts, designers, etc.

When I describe the material as being “for developers”, I mean a few things:

  • You are comfortable with computational thinking, such as abstraction, decomposition, procedures, etc.
  • You are comfortable with challenging open problems, such as designing and implementing a solution.
  • You are interested in solving problems for stakeholders, delivering results, working on projects.

Developers have these properties, but so do non-developers.

I have material that does not require any skill in programming, for example:

  • You can use Weka to work through predictive modeling problems without writing a line of code. (learn more)
  • You can learn how machine learning algorithms work using simple arithmetic and worked examples in spreadsheets. (learn more)

Many of my tutorials are written for practitioners who are comfortable with programming.

A good place to start if you know a little programming is machine learning with Python. Python is really easy to use and reads like pseudocode.

Typically, a model is overfit if the skill of the model is better on the training dataset than on the test dataset.

If the model skill is poor on both the training and the test datasets, the model may be underfit.

Sometimes, it can be the case that the skill of the model is better on the test dataset than on the training dataset.

This is likely because the test dataset is not representative of the broader prediction problem; for example, the test dataset may be too small.

To remedy this problem, I would recommend experimenting with different variations of the test harness, including different or differently sized train and test datasets and different model configurations.

A confusion matrix is a way of presenting the results of a classifier in the context of what was really observed.

Results are presented in a table showing the breakdown of the categorical outcome variable by value or level, comparing the frequency of observed values to the frequency of predicted values.

Often the observed frequencies are presented in columns and predicted frequencies are presented as rows, for example:
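A minimal sketch with scikit-learn, which uses the opposite orientation (observed frequencies as rows, predicted frequencies as columns); the labels below are placeholders:

```python
# Sketch: a confusion matrix for a small set of observed vs. predicted labels.
from sklearn.metrics import confusion_matrix

observed = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]
print(confusion_matrix(observed, predicted))
# [[2 1]
#  [0 3]]
```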

Ideally, the predicted frequencies would match the expected frequencies, showing numbers on the diagonal from the top left to the bottom right of the table and zeros everywhere else.

A confusion matrix is useful as it allows you to quickly see the distribution of the types of errors made by a classifier on a classification predictive modeling problem.

For more information and worked examples see the post:

Machine learning models learn how to map examples of input to examples of output from historical training data.

In mathematics, this is referred to as function approximation. That is, machine learning models approximate a mapping function from inputs to outputs.

The target variable in a dataset is the variable (column or feature) that will be the output of the model. It is the thing to be predicted.

  • If the target is numeric (like 44.2), the prediction problem is referred to as regression.
  • If the target variable is a category (like “blue”), the prediction problem is referred to as classification.

Sometimes a target variable may be numeric but represent a nominal category, such as if “yes” was mapped to 1 and “no” was mapped to 0. The problem would still be considered a classification problem.

Sometimes a target variable may be numeric and may represent an ordinal category. This means we could choose to model the problem as either classification or regression. An example might be a target variable that is comprised of integer measurements from 1 to 10, where each value could be treated as a number for regression or a label for classification.

In a tabular dataset, such as a table or matrix of data, a target variable is a column of values.

It is common for the target variable to be in the same file as the input variables. It is also common for the target variable to be the last column in the file. This is best practice when storing datasets in files, like CSV files, but is not always the case.

If you collect the data yourself, you will know the target variable, because you have defined what the inputs and outputs for the problem will be.

If you are using a standard dataset, it is common for the dataset to have a description accompanying it, or for the dataset to be well described in the first publication or paper that uses the dataset. This description will mention the name or column in the dataset that is the target variable.

Generally, algorithms are divided up into the type of problem you are trying to solve, such as classification and regression.

There are then specializations on these types of problems, such as the type of data (text, audio, images) and the relationship between observations (such as sequences and time series).

1. Start By Defining Your Problem

The first step in selecting an algorithm is having a clear definition of your problem. This post can help you define your supervised learning problem:

Once you know the general type of problem you have and the inputs and outputs, you are ready to start testing algorithms on it.

2. Use Spot-Checking

No one can tell you which specific algorithm to use or which will work best on your problem.

You must discover what works best for your specific dataset through a process of careful experimentation. You can learn more about this approach here:

This is the challenge of applied machine learning.

I would recommend testing a suite of methods to see what works best for your specific data. I call this spot checking. See these posts on the topic:

3. Try Some Advanced Methods

Some algorithms, such as Random Forest and Stochastic Gradient Boosting, have been shown to work well, if not the best, on a large number of classification and regression predictive modeling problems. Perhaps try these methods first? You can learn more about this here:

4. Get the Most From Your Data

You must also explore different framings of your data, different schemes to prepare your data and more. You can learn more about this here:

5. Applied Machine Learning is a Search Problem

Applied machine learning, really, your specific predictive modeling problem, is a big combinatorial search problem. You can learn more about this perspective here:

Classification involves assigning an observation a label.

Some examples include:

  • Assigning an email a label of “spam” or “not spam”.
  • Assigning a credit card transaction a label of “fraud” or “ok”.
  • Assigning a medical record a label of “healthy” or “sick”.

Regression involves predicting a numerical quantity for an observation.

Some examples include:

  • Predicting the price of a house from its description.
  • Predicting the number of bugs given a sample of code.
  • Predicting the number of pageviews for a given news article.

Classification and regression predictive modeling problems are two high-level types of problems, although there are many specializations, such as recommender systems, time series forecasting, and much more.

You can learn more here:

Consider a sequence prediction problem where we wish to extract features from each timestep before processing the sequence of extracted features.

Two examples include:

  • 1D: Sequence of daily subsequences of observations over a month in a time series prediction problem.
  • 2D: Sequence of still images in the case of video classification.

Data of this form can be addressed with a CNN-LSTM or a ConvLSTM model.

The CNN-LSTM will use a CNN model to extract features from each step in the input sequence, resulting in a sequence of extracted features. This sequence of extracted features can be interpreted by an LSTM model.

The ConvLSTM is different in that each step in the input sequence is processed directly by the LSTM units using a convolutional operation inside the unit, rather than being passed through a separate feature extraction step beforehand, as in the CNN-LSTM.

For more on how the ConvLSTM works, see the paper:

For tutorials on using ConvLSTM and CNN-LSTMs for time series forecasting, start here:

Dimensionality Reduction generally refers to techniques that reduce the number of input variables in a dataset.

More specifically, this often refers to techniques that change the data and project it into a lower dimensional space.

This could be achieved using algorithms from linear algebra (matrix factorization) such as PCA or SVD, or it could be achieved using techniques from manifold learning such as tSNE or MDS.
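For example, a small sketch of projecting a placeholder dataset into a lower dimensional space with PCA:

```python
# Sketch: reduce 20 input variables to 5 components with PCA.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=100, n_features=20, random_state=1)
X_reduced = PCA(n_components=5).fit_transform(X)
print(X.shape, X_reduced.shape)  # (100, 20) (100, 5)
```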

For more on dimensionality reduction, see the tutorial:

Feature Selection refers to algorithms that choose input variables (features) to delete from a dataset or to keep in a dataset.

Features might be chosen based on their statistical relationship with the target variable (filter methods) such as correlation or mutual information, or based on the effect they have on model performance (wrapper methods) such as RFE.
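A hedged sketch of one filter method and one wrapper method on a placeholder dataset (see also the feature selection tutorials referenced below):

```python
# Sketch: a filter method (mutual information) and a wrapper method (RFE) for feature selection.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=1)
X_filter = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)                    # filter
X_wrapper = RFE(DecisionTreeClassifier(), n_features_to_select=5).fit_transform(X, y)   # wrapper
print(X_filter.shape, X_wrapper.shape)  # (200, 5) (200, 5)
```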

For more on feature selection, see the tutorial:

Both dimensionality reduction techniques and feature selection technically reduce the dimensionality of a dataset and project the data into a lower dimensional space. As such the names could be used interchangeably in some literature.

The useful difference is that dimensionality reduction methods create new features from the data, whereas feature selection chooses existing features from the data.

I prefer the above definitions to separate the techniques.

Feature selection refers to techniques for selecting a subset of input features to delete or a subset of input features to keep in a dataset.

Irrelevant and redundant features can negatively impact the performance of some machine learning algorithms on some datasets. As such, using only the most relevant features can result in faster training, more efficient execution of algorithms, and better model skill.

For more on feature selection, see the tutorial:

Feature importance refers to techniques that report on the relative utility of each input variable (feature) to the target variable.

The scores calculated by a feature importance method can be used as input to a feature selection method. They can also be used to report to project stakeholders about what may or may not be most relevant or important in the data to a predictive model.

For more on feature importance, see the tutorial:

There is a distinct difference between function approximation and function optimization.

Function Optimization is the process of finding the optima of a target function. This is a set of input variables provided to a function that results in a minimum or maximum evaluation or score from the optimization function. Each variable often has a defined range of values, and the function is often complex, nonlinear, discontinuous, nonconvex, noisy, slow to evaluate, and all manner of other mathematical properties that mean that we cannot simply calculate optima analytically. Typically function optimization is solved using procedures that navigate or search the variable space in an efficient manner.

Function Approximation is the process of estimating the probability distribution of a target function. This is achieved using a limited set of samples from the target function, where the model seeks to minimize the error between the estimate of the function and the actual observations. Function approximation is the general framing for supervised learning in machine learning, where we seek to approximate the mapping function between input data and output for regression and classification predictive modeling tasks.

Function optimization is simpler than function approximation.

Different algorithms are used for function optimization and function approximation.

Function approximation is often solved using function optimization, e.g. minimizing loss or error.

There is a difference between Keras and tf.keras.

Keras is a standalone open source Python library for deep learning. It allows you to define, train, and evaluate deep learning models using a simple interface, but the computation can be performed using any one of a number of efficient backend mathematical libraries, such as TensorFlow, Theano, and CNTK.

tf.keras is an implementation of the Keras interface in TensorFlow. Historically, deep learning models were cumbersome to define and use in TensorFlow. Version 2.0 of TensorFlow introduced a new Keras interface for defining, training, and evaluating deep learning models. The developer of the standalone Keras Python library was involved in the implementation. It is called tf.keras because “tf” stands for TensorFlow, and tf.keras is the programming idiom for using the API in Python.

The standalone Keras remains independent and allows you to use backends other than TensorFlow.

tf.keras locks you into TensorFlow and Google’s direction for the platform.

As of late 2019, the standalone Keras project recommends using tf.keras going forward, although this statement was made by the developer who started the Keras standalone project and who was part of the implementation of tf.keras and may have a conflict of interest.

At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features (eager execution, distribution support and other).

Keras Project Homepage, Accessed December 2019.

They both do the same thing. You can use the platform you think is best.

I recommend using standalone Keras instead of tf.keras, at least for now. It has been around for 4+ years and works well.

All of my tutorials were written using the standalone version of Keras.

If you choose to use the tf.keras platform, my tutorials can easily be adapted by changing the import statements from “import keras” to “import tensorflow.keras”.
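For example, a sketch of the change in imports:

```python
# Sketch: adapting imports from standalone Keras to tf.keras.
# Standalone Keras (as used in the tutorials on this site):
# from keras.models import Sequential
# from keras.layers import Dense
# tf.keras equivalents:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```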

Regularization refers to methods used to modify an objective function in order to reduce model overfitting.

L1 and L2 regularization refer to methods of calculating the length of a vector of model parameters (called the vector norm) in order that this length can be minimized as part of fitting the model.

  • L1 or the L1-norm is calculated as the sum of the absolute vector values. This form of regularization is used in Lasso Regression.
  • L2 or the L2-norm is calculated as the sum of the squared vector values. This form of regularization is used in Ridge Regression.

The ElasticNet Regression algorithm uses a combination of both L1 and L2 regularization.
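For example, a small sketch of the two penalty terms as described above, calculated with NumPy on an arbitrary vector of weights:

```python
# Sketch: L1 and L2 penalty terms for a vector of model parameters.
from numpy import abs, array, sum

weights = array([1.0, -2.0, 3.0])
l1_penalty = sum(abs(weights))  # sum of absolute values = 6.0
l2_penalty = sum(weights ** 2)  # sum of squared values = 14.0
print(l1_penalty, l2_penalty)
```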

For more information on how to calculate the L1 and L2 vector norms, see the post:

A multi-headed CNN is a model that has more than one input or “head” for reading input. It often allows the model to access the same input image multiple times using different sized kernels, in turn allowing different features to be extracted from the data in parallel.

A multi-channel CNN is a model that receives input that has multiple variables or “channels“, such as red, green and blue channels for an image for a 2D CNN, or parallel time series in the case of a 1D CNN. All channels will be read together using the same filters.

For examples of multi-headed and multi-channel CNNs for time series forecasting, see the post:

Object recognition and object detection are problems from computer vision that can be addressed using deep learning convolutional neural networks.

  • Object detection is a specific computer vision task that involves identifying, locating, and classifying one or more objects in a photograph.
  • Object recognition is the broader problem of recognizing objects in photographs and includes the subproblems of image classification, object localization, object detection, object segmentation, and more.

Object recognition is the general class of problems or field of study, whereas object detection is one specific type of object recognition task.

Traditionally, the result of running a learning algorithm on a dataset is called a fit.

We may also refer to the fit as the model.

An overfit model is one that fits the random noise in the data sample.

This means that the model may perform well on the training data, but it does not generalize and performs poorly on new or test data.

An underfit model is one that does not capture enough of the structure in the data sample.

This means that the model will perform poorly on both the training and test datasets. A better fit is required.

A good fit refers to a model that finds a suitable balance of capturing the structure in the dataset and generalizing to new data. It performs well on the training and test datasets.

You can learn more about overfitting and underfitting here:

In Keras, an LSTM input (the input_shape argument) must be defined as a tuple of (timesteps, features), for example:
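A minimal sketch, where the layer size and the shape of 10 time steps with 2 features are arbitrary placeholders:

```python
# Sketch: input_shape is (timesteps, features); the number of samples is not specified here.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 2)))  # 10 time steps, 2 features per step
model.add(Dense(1))
```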

Neural network models are typically fit on multiple examples of input and output, called samples. This is one row of data in a spreadsheet for example, divided into X and y elements. If this is a new idea for you see this post:

In a sequence prediction problem, a sample involves a sequence of input (e.g. time steps) where there are one or more observations at each time step (e.g. features).

Therefore, the input part of your training data (X) must be a three-dimensional array with the dimensions [samples][timesteps][features], and you must have at least one sample, one time step and one feature.

In practice, you will likely have thousands or millions of samples, perhaps between 10 and 400 time steps, and at least 1 feature. Based on what I have read, LSTMs don’t perform well with more than 200-400 time steps.

A model may operate on input with 1 time step (unusual in practice), but may have memory across the samples within a batch, before the internal state of the model is reset.

As an example, you may track temperature (a feature) and pressure (another feature) every second (time steps) for a week (604,800 seconds). You must split this week of data into discrete input examples (samples), e.g. input/output pairs, in order to train a supervised learning model like an LSTM or CNN. Perhaps you choose non-overlapping sequences of 60 seconds; therefore you would have thousands of samples, 60 time steps, and 2 features, or an LSTM input shape of [10080, 60, 2] (recall that 604,800 seconds can be split into 10,080 60-second blocks).

As another example, if you have 20 time steps of data for 2 features, you might represent it as 20 samples with 1 time step [20, 1, 2] or as 1 sample with 20 time steps [1, 20, 2]. In the first case, you will have 20 samples, meaning 20 input and output examples, and your model will make 20 predictions. In the latter case, you have 1 sample, with 1 input and 1 output, and the model will make 1 prediction that is accumulated over 20 time steps. Note that this is an extreme example, as a dataset with 20 examples is very small and a representation with 1 time step per sample might not make sense.

In summary:

  • Sample: One input example that has 1 or more time steps with one or more features at each time step.
  • Timestep: One part of a single input example that has one or more features.
  • Feature: One of possibly many observations for a given time step.
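For example, a sketch of reshaping the 20-step, 2-feature case above into the two alternative 3D representations:

```python
# Sketch: reshape a flat sequence of 20 observations with 2 features for an LSTM.
from numpy import random

data = random.rand(20, 2)                    # 20 rows, 2 features per row (placeholder values)
one_sample = data.reshape((1, 20, 2))        # 1 sample, 20 time steps, 2 features
many_samples = data.reshape((20, 1, 2))      # 20 samples, 1 time step, 2 features
print(one_sample.shape, many_samples.shape)  # (1, 20, 2) (20, 1, 2)
```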

For an example of splitting time series into samples, timesteps and features, see the posts:

For discussion and reusable code for converting a time series into a supervised learning problem in general, see:

For more on using LSTMs in general, see:

For worked examples on using LSTMs for time series forecasting, see:

Standardization refers to scaling a variable that has a Gaussian distribution such that it has a mean of zero and a standard deviation of one.

Normalization refers to scaling a variable that has any distribution so that all values are between zero and one.

It is possible to normalize after standardizing a variable.

For more information on how to standardize data and normalize data, see the tutorial:

 

Supervised learning is used on problems where the goal is to learn a mapping from inputs to outputs.

The methods are referred to as “supervised” because the learning process operates like a teacher supervising a student. The model continually makes predictions, the predictions are compared to the expected outcomes, error is calculated, and the model is corrected using these errors.

Examples of supervised machine learning problems include:

  • Classification or the mapping of input variables to a label.
  • Regression or the mapping of input variables to a quantity.

Examples of supervised machine learning algorithms include:

  • k-nearest neighbours.
  • support vector machines.
  • multilayer perceptron neural networks.

Unsupervised methods are used on a problem where there are only the inputs, and the goal is to learn or capture the inherent interesting structure in the data.

The methods are referred to as “unsupervised” to distinguish them from the “supervised” methods. There is no teacher; instead, the models are updated based on repeated exposure to examples from the problem domain.

Examples of unsupervised machine learning problems include:

  • Clustering or the learning of the groups in the data.
  • Association or the learning of relationships in the data.

Examples of unsupervised machine learning algorithms include:

  • k-means.
  • apriori.
  • self-organizing map neural network.

You can learn more about supervised vs unsupervised methods in this post:

A tolerance interval describes the expected range of observations in a distribution. It could be used to identify outliers.

A confidence interval describes the expected range for a distribution parameter. It could be used to describe the accuracy or error of a model on average.

A prediction interval describes the expected range for an observation. It could be used to describe the uncertainty in a prediction.

I have tutorials showing how to calculate each type of interval on the blog.

A training dataset is used to train or fit a model.

A test dataset has observations that do not overlap with the training dataset and is used to evaluate a trained model. Specifically, to estimate the skill of the model on a new data sample.

A validation dataset typically refers to a portion of the training dataset, separated and used as a test dataset while the hyperparameters of the model are tuned.

You can learn more in this post:

Larger weights in a neural network (weights larger than they need to be) are a sign of overfitting and can make the model unstable.

Both weight regularization and weight constraints are regularization approaches intended to reduce overfitting (improve the generalization) of a neural network model.

Weight regularization updates the loss function used during training to penalize the model based on the size of the weights, calculated as the vector norm (magnitude) like L1 (sum of absolute weights) or L2 (sum of squared weights). Use of the L2 vector norm is often called “weight decay“.

Weight constraint is an if-then check during optimization for the size of the weights. If triggered, e.g. if the size of the weights calculated as the vector norm (often max norm) is larger than a pre-defined value, all weights are scaled so that the norm of the weights is below the desired level.

So “weight regularization” encourages the model to have small weights, whereas a “weight constraint” forces the model to have small weights.
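As a hedged sketch in Keras (the layer sizes and thresholds are arbitrary):

```python
# Sketch: weight regularization vs. a weight constraint on a Keras Dense layer.
from keras.layers import Dense
from keras.regularizers import l2
from keras.constraints import max_norm

# Regularization: penalize large weights via the loss function (L2, i.e. weight decay).
regularized = Dense(32, activation='relu', kernel_regularizer=l2(0.01))
# Constraint: rescale the weights whenever their max norm exceeds a threshold.
constrained = Dense(32, activation='relu', kernel_constraint=max_norm(3.0))
```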

In neural networks, the batch size and the number of epochs are two hyperparameters that you must choose when training the network.

They are used in stochastic gradient descent.

A sample is a single row of data, including the inputs for the network and the expected output.

A batch is a collection of samples that the network will process, after which the model weights will be updated. The model will make predictions for each sample in the batch, the error will be calculated by comparing the prediction to the expected value, an error gradient will be estimated and the weights will be updated. A training dataset is split into one or more batches.

An epoch involves one pass over the training dataset. One epoch is comprised of one or more batches, depending on the chosen batch size.
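For example, a sketch with placeholder data: 100 samples trained with a batch size of 10 for 5 epochs gives 10 weight updates per epoch and 50 updates in total:

```python
# Sketch: batch size and epochs when fitting a small Keras model on random data.
from numpy import random
from keras.models import Sequential
from keras.layers import Dense

X, y = random.rand(100, 8), random.rand(100, 1)  # placeholder data
model = Sequential([Dense(10, activation='relu', input_dim=8), Dense(1)])
model.compile(optimizer='sgd', loss='mse')
model.fit(X, y, epochs=5, batch_size=10, verbose=2)  # 100 / 10 = 10 batches (updates) per epoch
```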

You can learn more about the difference between a batch and an epoch here:

You can learn more about stochastic gradient descent for training neural networks here:

A descriptive model is a model trained on historical data with the objective of understanding something about the problem, such as cause and effect.

Simpler models with a strong theoretical foundation are used, as they aid in the understanding of the problem.

Descriptive modeling is often the goal in applied statistics and econometrics.

A predictive model is a model trained on historical data with the objective of making accurate predictions.

Complex models may be used that give little, if any, idea of why specific predictions are made.

Predictive modeling is often the goal of applied machine learning, or a subfield referred to as predictive modeling.

A machine learning algorithm is a procedure that is run on training data to create a model.

It is the process that does the learning.

Some examples include:

  • Linear Regression.
  • Stochastic Gradient Descent with Backpropagation in a neural network.
  • Random Forest.

A machine learning model is the result of the learning process of a machine learning algorithm.

A model is the “program” that is saved after training, later loaded and used to make predictions on new data.

Some examples include:

  • The coefficients from a linear regression.
  • The weights and structure of an artificial neural network.
  • The decision trees from a random forest.

For more information, see this tutorial:

Model parameters are internal to the model and learned by the training algorithm.

Examples of model parameters are the coefficients in a regression, weights in a neural network, and the split points in a decision tree.

The model parameters are the thing saved after training. They are the model.

Model hyperparameters are specified by you, the practitioner, and often control the learning process. They are parameters that cannot be learned.

Examples of model hyperparameters are the number of epochs or training iterations in stochastic gradient descent and the maximum depth of decision trees.

The model hyperparameters are found by trial and error, by grid/random searching, or by copying examples that have worked in the past.

For more, see the post:

In statistics, a sample and a population are used to refer to data.

A sample is comprised of one or more individual observations drawn from a domain.

A population is the idealized notion of all possible observations from which specific samples of observations can be drawn.

In statistics, we often estimate parameters of the distribution of data in the population given a sample.

Generally, the no free lunch theorem suggests that no single machine learning method will perform better than any other when averaged across all possible problems.

The theorem concerns optimization and search, although it has implications for predictive modeling with machine learning, as most methods solve an optimization problem in order to approximate a function.

The implication is that there is no single algorithm that will be the best, let alone perform well, across all problems. Stated another way, all algorithms perform equally well when their performance is averaged across all possible problems. There is no silver bullet.

It is theoretical and assumes that we do not know anything about the problem, therefore cannot use knowledge of the problem to narrow the selection of algorithms. This is not true in practice.

Although there are no silver bullets, there are algorithms (such as Random Forest and Stochastic Gradient Boosting) that perform surprisingly well across many predictive modeling problems that we are interested in, i.e. a subset of all possible problems.

For more, see the tutorial:

Don’t start with the maths.

My mission is to help you get started and get good at applied machine learning using a “learn by doing” philosophy. I teach using a top-down and results-first approach.

Starting with the math is the classical, university way: the way of training a machine learning academic, not a modern practitioner. It does not have to be that hard. See this post:

If you want the bottom-up theory-first approach to machine learning, I would recommend a textbook or a multi-year graduate program. It is a path to theoretical machine learning and academia.

That being said, it is generally recognized that eventually you will need to know your way around the intersection of these fields of math:

Not all of these areas of math are relevant, only parts. Also, you need the intersections of these fields. For example, Linear Algebra + Statistics is Multivariate Analysis, which is needed to get into PCA and other projection methods; Linear Algebra + Calculus is Multivariate Calculus, which is needed for the learning algorithms in deep learning. These are really hard postgraduate topics and not the place to start for developers who have a problem to solve.

Don’t make the beginner’s mistake of thinking you need to start here. Circle back after you know how to work through a predictive modeling problem end-to-end.

Learn math when you’re ready and only learn the relevant parts to help you get the most out of a method, or better results on your next project.

Read this:

Also, read this post:

I don’t have much experience in modeling sports datasets.

Nevertheless, I have seen some success in the use of rating systems to score teams in team sports.

You can learn more about rating systems here:

A classical algorithm would be the Elo rating system; a modern example would be Microsoft’s TrueSkill.

I love books.

I read a few books per week, I have a large reference library, and I’m always buying more books.

I would encourage you to read widely in machine learning.

To get started, here is a good list of books on machine learning:

I also have lists of books on specific subjects, for example:

Reading a book may not be enough; you have to know how to get the most out of it.

This post will give you some ideas on how to get the most out of the books that you are reading:

The specific programming language or platform that you use does not matter.

  • I strongly believe that the best thing to focus on is how to work through machine learning problems end-to-end (learn more).
  • That being said, I think if you’re not a strong programmer, that Weka is the best place to start because you can work through problems without writing a line of code (learn more).
  • I think Python is excellent for developing models that can run in production, it is a growing platform (learn more).
  • I think R might be the most powerful platform, but it requires learning a new programming language. I think the sweet spot for R is one-off projects and R&D projects (learn more).

Also, this post might help:

If pressed to answer, I would recommend that you start with Python (learn more).

It is possible for the performance of your model to get stuck during training.

This can happen with neural networks that achieve a specific loss, error or accuracy and no longer improve, showing the same score at the end of each subsequent epoch.

In the simplest case, if you have a fixed random number seed for the code example, then try changing the random seed, or do not specify the seed so that different random numbers are used for each run of the code.

Then try running the example a few times.

To learn more about randomness in machine learning see this post:

To learn more about pseudorandom number generators in machine learning, see this post:

If the problem is not the randomness, it is possible that your model has converged. If the skill of the model is not good after convergence, this is referred to as premature convergence.

You can address premature convergence by slowing down the learning by the model. You can do this by changing different hyperparameters for the learning process, such as:

  • Using a smaller learning rate.
  • Using a larger network (nodes or layers).
  • Using a training dataset with more examples.
  • And so on…

Premature convergence is a big area of study and I recommend reading up further on the topic if you need further ideas.

Machine learning algorithms use randomness, such as in the initialization, during learning, and in the evaluation of the algorithm.

Random numbers are calculated using a pseudorandom number generator.

Pseudorandom number generators can be seeded such that they produce the same sequence of numbers each time they are run. This can be useful to reproduce a model exactly, such as during a tutorial or as a final model trained on all available data.

The value used to seed the pseudorandom number generator does not matter. You can use any number you wish.
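For example, a small sketch of seeding the Python and NumPy generators (the seed value of 1 is arbitrary):

```python
# Sketch: seed the pseudorandom number generators so a run can be repeated exactly.
import random
from numpy.random import seed

random.seed(1)  # Python's built-in generator
seed(1)         # NumPy's generator
```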

To learn more about pseudorandom number generators and when it is appropriate to seed them, see the post:

To learn more about how to seed the pseudorandom number generator for deep learning in Keras, see the post:

A hyperparameter is a user specified parameter of a model.

It allows you to tune the model to your specific problem.

By definition, a user-specified parameter cannot be set automatically. There may exist heuristics to set a hyperparameter, but if they were reliable, they would be used instead of requiring you to set the value yourself.

You can learn more about parameters and hyperparameters here:

To set the value for a hyperparameter you must use careful experimentation with a robust test harness in order to discover values that work best for your specific dataset.

Some suggestions to speed up this process include:

  • Try heuristics reported in the literature.
  • Try values reported in the literature on similar problems.
  • Try a random search.
  • Try a grid search.
  • Try a heuristic search.

Standardization refers to scaling a variable that has a Gaussian distribution such that it has a mean of zero and a standard deviation of one.

Normalization refers to scaling a variable that has any distribution so that all values are between zero and one.

It is possible to normalize after standardizing a variable.

Generally, normalization and standardization are data scaling methods, and there are other methods that you may want to use.

Scaling methods are often appropriate with machine learning models that learn or make predictions using the distance between observations (e.g. k-nearest neighbors and support vector machines) and methods that calculate weighted sums of inputs (e.g. linear regression, logistic regression, and neural networks).

If you are still in doubt as to whether you should standardize, normalize, both, or something else, then I would recommend establishing a baseline model performance on your raw data, then experiment with each scaling method and compare the resulting skill of the model.

For more information on how to standardize data and normalize data, see the tutorial:

A Multilayer Perceptron or MLP can approximate a mapping function from inputs to outputs. MLPs are flexible and can be adapted to most problems; nevertheless, they are perhaps best suited to classification and regression problems.

A Convolutional Neural Network or CNN was developed and is best used for image classification. They can also be used generally for working with data that has a spatial structure, such as a sequence of words and can be used for document classification.

A Recurrent Neural Network or RNN (such as the LSTM network) was developed for sequence prediction and is well suited to problems that have a sequence of input observations or a sequence of output observations. RNNs are suitable for text data, audio data, and similar applications.

Most useful network architectures are a hybrid, combining MLP, CNN and/or RNNs in some way.

For more information, see the post:

Some model evaluation metrics such as mean squared error (MSE) are negative when calculated in scikit-learn.

This is confusing, because error scores like MSE cannot actually be negative, with the smallest value being zero or no error.

The scikit-learn library has a unified model scoring system in which it assumes that all model scores are maximized. In order for this system to work with scores that are minimized, like MSE and other measures of error, the scores are inverted by making them negative.

This can also be seen in the specification of the metric, e.g. ‘neg‘ is used in the name of the metric ‘neg_mean_squared_error‘.

When interpreting the negative error scores, you can ignore the sign and use them directly.
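A small sketch of this on a placeholder regression dataset:

```python
# Sketch: the negated MSE returned by scikit-learn's unified scoring system.
from numpy import mean
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, noise=0.1, random_state=1)
scores = cross_val_score(LinearRegression(), X, y, scoring='neg_mean_squared_error', cv=10)
print(mean(scores))       # negative value as reported by scikit-learn
print(abs(mean(scores)))  # ignore the sign to interpret it as MSE
```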

You can learn more here:

A predictive model will always have some error.

The model is an approximation for some unknown underlying perfect mapping function from inputs to the output being predicted.

Many aspects can introduce error into this mapping, for example:

  • The size and quality of the samples of observations, called random sampling error.
  • Noise in the observations, from random variation in the data to mislabeled examples.
  • Bias in the learning algorithm or chosen model.
  • Variance in the learning algorithm or model given the data sample.
  • The method of model evaluation.

We cannot find the best possible model for a given predictive modeling problem, it is intractable.

Instead, we must find the best model we can given the time and resources that we have available for the project.

The idea of good model skill is relative to a baseline on your specific problem, not comparisons of model skill across different problems.

Common baselines include predicting the mean for regression, the mode (most frequent class) for classification, or a persistence forecast in a time series problem.

You may get different model performance or a different prediction when you run code from a tutorial.

There are three main reasons why machine learning results may vary:

  1. Stochastic machine learning algorithm.
  2. Stochastic algorithm evaluation.
  3. Rounding errors given hardware or platform.

For an in-depth tutorial on this topic, see:

Let’s take a closer look at each.

1. Stochastic machine learning algorithm means that you may get different results when you run the same algorithm on the same data on the same machine.

Relax. It is a feature, not a bug.

Many machine learning algorithms are stochastic, meaning they use a little randomness while learning, which means a slightly different model will be learned each time the same algorithm is run. The little bit of randomness helps the algorithm find better models, on average.

For more on this topic see:

Effective evaluation of stochastic machine learning algorithms requires you run the experiment many times and average the result. A single model evaluation run probably won’t cut it.

For more on this topic see:

If you are worried about the final model being different each time your code is run, fit many final models and average their predictions.

For more on this topic see:

DO NOT fix the seed for the pseudorandom number generator used by your machine learning algorithm. It’s a fragile kludge and ignores the nature of the learning system you are using.

2. Stochastic algorithm evaluation means that procedures like train/test split or k-fold cross-validation use some randomness when splitting up the dataset. This means although your algorithm may or may not be stochastic (above), you will get a different evaluation for the model each run.

Generally, it is a good idea to fix the seed for the pseudorandom number generator used by the evaluation procedure so you get the same split of data each time the code is run or each time the same test harness is used to evaluate different algorithms or algorithm configurations.
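For example, a sketch of fixing the seed of the split (not the learning algorithm) so every algorithm sees the same data; the dataset and seed values are placeholders:

```python
# Sketch: fix the random_state of the evaluation procedure for a repeatable train/test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=7)
print(X_train.shape, X_test.shape)
```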

This tutorial may help:

3. Rounding errors given the hardware or platform means that different hardware or different versions of the underlying math libraries may make different rounding errors. These errors can compound, resulting in noticeable differences.

Although less likely, it is still an issue and is a fact of simulating floating point numbers for numerical computation on digital hardware.

 

I like to use different tools depending on the project.

Recently I have been focusing more attention on Python-based tools and libraries.

It seems that Python may be emerging as a dominant platform. Skills in Python for machine learning are in great demand. I am just serving this need.

See the post:

In many of the code examples, I will use the test set as the validation dataset.

This is a shortcut used in tutorials to get an idea of how well a model, like a neural network, is performing on a hold out dataset during training.

I do this sometimes for brevity, to keep the examples in the tutorials simple.

In general, it is not recommended to use the test set as a validation dataset.

A validation dataset is a subset of the training dataset that is not used to fit the model, and is instead used to evaluate model performance during training and perhaps tune the model hyperparameters.

A test set is a separate holdout dataset that is not used at all during training, and is only used after a model has been selected to estimate the expected performance of the chosen model or models.

For more on the difference between test and validation datasets, see the post:

I generally recommend that beginners to machine learning do not use notebooks such as IPython or Jupyter notebooks.

I also recommend that beginners don’t use fancy integrated development environments (IDEs).

Notebooks and IDEs are great tools once you know how things work and you have set up some expectations.

For a beginner, I see a host of the same issues again and again, such as:

  • Problems where code works on the command line but not in the notebook.
  • Problems with the notebook not presenting the output of models correctly.
  • Problems with the environment and missing libraries, even though they are installed.
  • Problems with paths.
  • Problems with hidden or obscured error messages.
  • Problems with new IDE-specific or notebook-specific error messages.

For beginners, I recommend writing code in a simple text editor and running scripts directly from the command line.

Running from the command line ensures:

  • You’re using the environment that you installed.
  • You see error messages directly from the interpreter.
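For example, you might save a small script like the following (the library list is just an example) to a file such as versions.py and run it with: python versions.py. The output confirms which interpreter and library versions your command line environment is actually using.

# versions.py: a sketch of a script to run directly from the command line
# it reports the interpreter and library versions in the active environment
import sys
print('Python: %s' % sys.version)

import numpy
print('numpy: %s' % numpy.__version__)

import sklearn
print('scikit-learn: %s' % sklearn.__version__)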

Incidentally, I learned to program this way in a few different languages (Java, C, C++, Ruby, Python, FORTRAN, etc.) and recommend it to new developers as well.

I write more about my recommended machine learning development environment in this post:

You are working on a time series forecasting problem, you plot your forecasted time series against the actual time series, and it looks like the forecast is one step behind the actual.

This is common.

It means that your model is making a persistence forecast. This is a forecast where the input to the forecast (e.g. the observation at the previous time step) is predicted as the output.
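As a small sketch of what that looks like in code (the series values are placeholders):

# sketch: a persistence forecast predicts the observation from the previous time step
series = [10.0, 20.0, 30.0, 25.0, 35.0]   # placeholder time series

forecasts = series[:-1]   # the forecast for each step is simply the prior observation
actuals = series[1:]

for yhat, y in zip(forecasts, actuals):
    print('Predicted=%.1f, Expected=%.1f' % (yhat, y))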

The persistence forecast is used as a baseline method for comparison on time series forecasting. You can learn more about the method here:

The persistence forecast is the best that we can do on challenging time series forecasting problems, such as series that are a random walk, like short-range movements of stock prices. You can learn more about this here:

If your sophisticated model, such as a neural network, is outputting a persistence forecast, it might mean:

  • That the model requires further tuning.
  • That the chosen model cannot address your specific dataset.
  • That your time series problem is not predictable.

GAN models do not converge.

Instead, the generator and the discriminator models find a stable equilibrium (hopefully). The generator deceives the discriminator at some level (but not all the time) and the discriminator effectively classifies real and generated images (but not all the time).

For more on this relationship between the models, see this tutorial:

If the generator and the discriminator models do not find an equilibrium, then training has failed. This is bad and the models are probably now useless. This is called a GAN failure mode.

For more on this topic, see the tutorial:

So how do you know when to stop training the GAN models?

Good question. You can use the generator to generate images every few epochs, review the images, and decide whether to keep training. Or train for a long time, save the model every few epochs, then at the end of the run use each saved model to generate images and choose the one that generates the best images.
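As a rough sketch only (the generator, discriminator, dataset, and the train_one_epoch() helper below are hypothetical placeholders for your own GAN training code, not a working example), the save-and-review loop might look like this:

# sketch: periodically save the generator and generate sample images for review
# generator, discriminator, dataset and train_one_epoch() are hypothetical placeholders
from numpy.random import randn

n_epochs, save_every, latent_dim = 100, 10, 100
for epoch in range(n_epochs):
    train_one_epoch(generator, discriminator, dataset)       # your own training step goes here
    if (epoch + 1) % save_every == 0:
        generator.save('generator_%03d.h5' % (epoch + 1))    # keep a candidate model
        latent_points = randn(25, latent_dim)                 # sample points in latent space
        images = generator.predict(latent_points)
        # save or plot 'images' here and review them by eye to decide whether to keep training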

There are also metrics you can use to evaluate the generated images and, in turn, the “skill” of the generator model.

For more on this topic, see the tutorial:

Why does a developer need to spend time learning machine learning?

This is an important question!

The short answer:

Because machine learning provides techniques to learn a solution from historical examples for complex problems where it is intractable or infeasible to develop a manual solution.

The longer answer:

See the post:

I call learning the math and theory for machine learning first the “bottom-up” approach to machine learning.

It is the approach taught by universities and used in textbooks.

It requires that you learn the mathematical prerequisites, then the general theories of the field, then the equations and their derivations for each algorithm.

  • It is much slower.
  • It is much harder.
  • It is great for training academics (not practitioners).

A final problem is that this is where the bottom-up approach ends.

I teach an alternative approach, called “top-down” machine learning, that inverts the process.

We start by learning the process of how to work through predictive modeling problems end to end, from defining the problem to making predictions. Then we practice this process and get good at it. We start by learning how to deliver results and add value.

Later we circle back to the math and theory, but only in the context of the process. Meaning, only the theory and math that helps us deliver better results faster is considered.

You can learn more about the contrast between these two approaches here:

You can learn how to get started with this approach here:

But it’s Dangerous!

I have seen this criticism a lot.

It is dangerous for beginners to use algorithms they don’t understand to make predictions that the business depends upon.

I agree.

  • I agree for the same reason that I think a student learning to drive should not drive the school bus.
  • I agree for the same reason that I think a student learning to code should not put their hello world code into production.

But,

  • The student driver can practice and get good enough to drive the school bus eventually.
  • The student coder can practice and get good enough to put code into production.

Trust is earned in machine learning, just like with any other profession or skill.

Does knowing how the math of an algorithm works give you that trust?

Maybe, but probably not.

  • Does knowing how a combustion engine works give you trust enough to drive?
  • Does knowing how a compiler works give you trust enough to push code to production?

I write more about this here:

But Math is Required!

It is, just not first.

Learning how algorithms work and about machine learning theory can make you a better machine learning practitioner.

But, it can come later, and it can come progressively.

You can iteratively dip into textbooks and papers, as needed, with a specific focus of learning a specific thing that will make you better, faster or more productive.

Knowing how an algorithm works is important, but it cannot tell you much about when to use it.

In supervised machine learning, we are using data to build a model to approximate an unknown and noisy mapping function. If we knew enough about this function to correctly choose the right algorithm, we probably would not need machine learning (e.g. we could use statistics and descriptive modeling of already understood relationships).

The badly kept secret in machine learning is that you can use machine learning algorithms like black boxes, at least initially, because the hard part is actually figuring out how best to frame the problem, prepare the data, and determine which of a thousand methods might perform well.
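For example, a minimal sketch of spot-checking a handful of algorithms as black boxes with scikit-learn (the choice of algorithms and dataset here is arbitrary) might look like this:

# sketch: spot-check several algorithms as black boxes and compare their scores
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=1)
models = {
    'logistic': LogisticRegression(max_iter=1000),
    'knn': KNeighborsClassifier(),
    'cart': DecisionTreeClassifier(),
    'random_forest': RandomForestClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=10)
    print('%s: %.3f' % (name, mean(scores)))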

You can learn more about this here:

The math does not have to come first. It can, if you prefer to learn that way, but perhaps this site is not the best place for you to start.

Help Me Questions (43)

Thanks for asking.

Sorry, I’m generally not interested in new business opportunities.

I am focused on writing and helping my readers, and developing new opportunities will take a large amount of time and mental space away from this. I’m sure that you can understand.

Nevertheless, if you still want to pitch your idea, you can contact me directly.

Yes, I’m happy to answer questions about machine learning or my tutorials.

You can contact me directly via this form:

One question per message, please.

I don’t open attachments or links for security reasons.

I also don’t have the capacity to review your code/data/paper. I hope you can understand.

Thanks for asking.

Sorry, I cannot meet up for a coffee and a chat about machine learning and your project.

I’m focused on writing and helping my readers, and meeting up for chats throughout the day would take a lot of time and mental space away from this. I’m sure you can understand.

If you have a question about applied machine learning, you can contact me directly.

Sorry, I don’t provide support via telephone.

I don’t share my phone number (cell number), and I generally don’t answer my phone with calls from unknown numbers.

You can contact me directly via my contact page.

Thanks for asking, I’m flattered.

Sorry, I do not offer an internship program. I just don’t have the capacity.

Perhaps in the future.

Thanks for the generous offer. I really appreciate it.

I am not looking for help at the moment.

The best help that you can give is to help to get the word out about the site. Perhaps post a link to the site on social media? Or get involved in the comments on the tutorials?

Yes, but understand that all code and material on my site and in my books was developed and provided for educational purposes only.

I take no responsibility for the code, what it might do, or how you might use it.

If you use my code or material in your own project, please reference the source, including:

  • The Name of the author, e.g. “Jason Brownlee”.
  • The Title of the tutorial or book.
  • The Name of the website, e.g. “Machine Learning Mastery”.
  • The URL of the tutorial or book.
  • The Date you accessed or copied the code.

For example:

  • Jason Brownlee, Machine Learning Algorithms in Python, Machine Learning Mastery, Available from https://machinelearningmastery.com/machine-learning-with-python/, accessed April 15th, 2018.

Also, if your work is public, contact me, I’d love to see it out of general interest.

Sorry, I really don’t have the capacity for one-on-one coaching.

If you have a specific machine learning question, perhaps I have some ideas.

You can contact me directly with your question.

Thanks for asking.

Sorry, no.

I try to avoid conferences these days and prefer to catch up on the exciting results by skimming the papers or watching the videos online.

Going to conferences takes a lot of time and resources away from reading, writing and helping my readers.

Thanks for asking.

I would love to help, but I just don’t have the capacity.

I do offer a structured and top-down approach to machine learning self-study.

You can learn more about it here:

I am happy to continue to answer any machine learning questions you might have by email (one question at a time please).

Thanks for asking, I’m flattered, especially since we have never met or spoken.

Sorry, I cannot be your Ph.D. advisor or supervisor.

I am not affiliated with a university and I do not advise research students.

I’m eager to help, but I don’t have the capacity to customize the code for your specific needs.

I get a lot of requests like this. I’m sure you can understand my rationale.

I do have some ideas that might help:

  • Perhaps I already have a tutorial with the change you’re asking for? Search the blog.
  • Perhaps you can try to make the change yourself?
  • Perhaps you can add a comment below the post with the change you need and I or another reader can make a suggestion?
  • Perhaps you can hire a contractor or programmer to make the change?
  • Perhaps you can post a description of the code needed on stackoverflow.com?

Thanks for asking, I’m very flattered.

Sorry, I don’t have the capacity to come and give a presentation or a talk (even if all expenses are paid).

Traveling, preparing and giving a talk is a huge time investment that I cannot spare.

I am laser-focused on writing and helping my readers. I believe working on the site is the most efficient way that I can help the most people.

Sorry, I don’t have the capacity to review and comment on your question posted to StackOverflow or the StackExchange network.

Perhaps you can summarise your specific problem in a sentence or two and ask me via the contact page.

Do you have a question about someone else’s:

  • Research paper?
  • Blog post?
  • Diagram?
  • Code?

I’m eager to help, but reading a paper (or someone else’s material) to the level required to then explain it to you requires a large amount of time and effort. I just don’t have the capacity to do this for every request that I get.

My best advice is to contact the author and ask your questions directly.

I believe an honest academic will want their work read and understood.

If you prepare well (e.g. do your homework), if you’re courteous (e.g. humble, polite, and not demanding), and if you are clear with your questions (e.g. specific), I would expect a helpful response.

Thanks for asking, I’m flattered.

I have some research degrees and I loved research then and love it now. But I don’t think I’m good at it.

In fact, I like reading and figuring out what others have learned perhaps more than devising my own research program. Perhaps I’m more engineer and scholar than academic researcher.

As such, I don’t feel qualified to give you advice on your research.

I recommend talking things over with your research advisor. After all, this is exactly their job, you chose them, and they chose you.

Also, very clever people have written up their advice.

I recommend reading:

Also these classics:

Sorry, I try hard not to give specific career advice.

My best advice on the topic of career is to follow a path that most interests and most excites you.

Life is short but the days are long. I think it is important to work on things that fully engage your interests.

Sorry, I cannot help you get a job directly.

But I have some advice.

I recommend that you focus your learning on how to work through predictive modeling problems end-to-end, from problem definition to making predictions. I describe this process here:

I recommend practicing this process with your chosen tools/libraries and developing a portfolio of completed machine learning projects. This portfolio can be used to demonstrate your growing skills and provide a code base that you can leverage on larger and more sophisticated projects.

You can learn more about developing a machine learning portfolio here:

I recommend searching for a job at smaller companies and start-ups that value your ability to deliver results (demonstrated by your portfolio) over old-fashioned ways of hiring (e.g. having a degree in the topic).

I’m eager to help, but I just don’t have the capacity to help you set up or debug your workstation.

Also, I am not an expert in debugging workstations and development environments.

My material is generally intended for those that know their way around their own workstation and know how to install software.

Check these tutorials for setting up your environment:

If you continue to have problems, consider posting your question and issue to StackOverflow.

Sorry, I cannot help you develop a predictive model for the lottery.

The draw of numbers in a lottery is a sequence of random events, by design.

A sequence of random events cannot be predicted.

Even if you have a large dataset with hundreds or thousands of previous lottery draws.

Sorry, I have not seen a Windows machine in nearly two decades.

I don’t know a thing about Windows.

In fact, I’m not an expert at debugging workstations.

I recommend posting your question or issue on stackoverflow.com.

Sorry, I cannot help you with machine learning for predicting the stock market, foreign exchange, or bitcoin prices.

I do not have a background or interest in finance.

I’m really skeptical.

I understand that unless you are operating at the highest level, you will be eaten for lunch by the fees, by other algorithms, or by people who are operating at the highest level.

To get an idea of how brilliant some of these mathematicians are that apply machine learning to the stock market, I recommend reading this book:

I love this quote from a recent Freakonomics podcast, asking about people picking stocks:

It’s a tax on smart people who don’t realize their propensity for doing stupid things.

— Barry Ritholtz, The Stupidest Thing You Can Do With Your Money, 2017.

I also understand that short-range movements of security prices (stocks) are a random walk and that the best that you can do is to use a persistence model.

I love this quote from the book “A Random Walk Down Wall Street“:

A random walk is one in which future steps or directions cannot be predicted on the basis of past history. When the term is applied to the stock market, it means that short-run changes in stock prices are unpredictable.

— Page 26, A Random Walk down Wall Street: The Time-tested Strategy for Successful Investing, 2016.

You can discover more about random walks here:

But we can be rich!?!

I remain really skeptical.

Maybe you know more about forecasting in finance than I do, and I wish you the best of luck.

What about finance data for self-study?

There is a wealth of financial data available.

If you are thinking of using this data to learn machine learning, rather than making money, then this sounds like an excellent idea.

Much of the data in finance is in the form of a time series. I recommend getting started with time series forecasting here:

Generally, I recommend that you complete homework and assignments yourself.

You have chosen a course and (perhaps) have even paid money to take the course. You have chosen to invest in yourself via self-education.

In order to get the most out of this investment, you must do the work.

Also, you (may) have paid the teachers, lecturers, and support staff to teach you. Use that resource and ask them for help and clarification about your homework or assignment. They work for you in some sense, and no one knows more about your homework or assignment and how it will be assessed than they do.

Nevertheless, if you are still struggling, perhaps you can boil your difficulty down to one sentence and contact me.

Thanks for asking.

Sorry, I cannot help you with your project.

I’m eager to help, but I don’t have the capacity to get involved in your project at the level you need or at a level to do a good job.

I’m sure you can understand my position, as I get many requests to help with projects each day.

Nevertheless, I am happy to answer any specific questions you have about machine learning.

That is very generous of you to ask, especially since we have never spoken or met.

No thank you.

I am laser focused on helping practitioners by writing tutorials and books.

Thanks for asking.

I’m eager to help, but I don’t have the capacity to read and review your research paper and give feedback. I’m sure you can understand.

I have some ideas:

  • Perhaps you can find a peer in your research group, school or class to give you feedback on your paper?
  • Perhaps you can get help and feedback on your paper from your teacher, advisor, or professor?
  • Perhaps you can hire a tutor or coach with skills in machine learning to give you feedback on your paper?

I am happy to answer any specific questions you have about machine learning.

Thanks for asking.

I’m eager to help, but I just don’t have the capacity to debug code for you.

I am happy to make some suggestions:

  • Consider aggressively cutting the code back to the minimum required. This will help you isolate the problem and focus on it.
  • Consider cutting the problem back to just one or a few simple examples.
  • Consider finding other similar code examples that do work and slowly modify them to meet your needs. This might expose your misstep.
  • Consider posting your question and code to StackOverflow.

Sorry, no.

I want to stay away from giving recruiting advice. The great people I know have jobs.

From what I have experienced, the great people are always fully engaged. Your job is to find them and offer them something more interesting.

Perhaps start by looking for a contractor on a site like Upwork?

Sorry, I don’t have a forum or slack channel.

You can ask questions on a relevant blog post or contact me.

Thanks for asking, I’m flattered.

Sorry, I no longer take on contracting or consulting projects.

My focus is on writing and helping readers through the website and books.

Consulting is a huge investment of time and resources for me. It takes too much time and mental space away from writing new tutorials and books, which I think is the most effective way I can help the most people.

Thanks for asking, I’m flattered.

Sorry, I do not offer in-person training for individuals or teams.

I am focused on writing and helping my readers as I believe it is the best way that I can help the most people.

Sorry, I do not run any in-person meet-ups.

I am 100% focused on writing for the site and answering the questions of my readers.

I am not an oracle. Just a man.

I have a narrow focus on the most used methods for applied machine learning, specifically predictive modeling with supervised learning methods.

  • I don’t follow the latest papers, most methods will disappear and we will forget their names by next year.
  • In fact, most papers are probably not replicable (e.g. a type of academic fraud).
  • I don’t use unsupervised learning methods, I just don’t find them useful (for now).
  • I don’t read internet news, I’m busy.
  • I am not an expert in every method, nor could anyone be.

I share what I know, I read to know more, and I’m very clear and honest about what I don’t know when you ask.

I don’t know.

My focus is industrial machine learning and helping practitioners.

I no longer have an opinion on schools and courses.

If it looks good to you, go for it.

Remember, you do not need a higher degree to do very well in applied machine learning.

My best advice is to work on problems or with technology that most interests you.

Here are some more specific suggestions:

Consider working through a standard machine learning dataset:

Consider working on some more advanced datasets from competitive machine learning:

Consider working on problems that matter to you:

Consider devising your own projects:

Thanks for asking, I’m flattered.

I’m sorry, I don’t know what research topic that you should work on.

Generally, I would recommend picking a topic that you are really excited about. Motivation is important and research projects can take many years to complete. It is important that you spend that time on a project in which you are deeply interested.

I also think that the best person for you to talk to about research topics is your research advisor. This is their job.

Best of luck with your project.

Great!

  • If the error is on a blog post, please leave a comment. I will read it, make the fix and reply to you.
  • If the error is in a book, please contact me so that I can fix it and update the book ASAP.
  • If the error is with an open source library, contact the user group for that library.

I would recommend asking in places where machine learning and data science professionals hangout.

Some ideas include:

  • Forums (e.g. Kaggle)
  • Groups (e.g. Facebook, LinkedIn, Google+)
  • Q&A Sites (e.g. StackOverflow, CrossValidated, Quora)
  • Freelancing sites (e.g. UpWork)
  • MeetUps

I recommend searching the repositories of standard machine learning datasets.

For example:

Try Google’s dataset search:

Maybe you need to collect your own dataset for your specific problem?

I have some blog posts listing popular datasets, for example:

Beyond that, I recommend searching via Google search.

I would recommend searching on Google Scholar:

Are you looking for sample code for a specific machine learning method?

Perhaps I have a tutorial with sample code on the blog. Search the blog:

If I do not have an example on the blog, I cannot give you an idea of where to get code off-hand.

Nevertheless, I have some ideas here:

Building a portfolio of completed machine learning projects is a great way to leverage past work on new projects and to demonstrate your growing skill.

I talk about this more here:

I recommend building your portfolio in public, some suggestions of where you might do this include:

  • Website
  • Blog
  • GitHub
  • YouTube
  • SlideShare
  • Facebook
  • LinkedIn

Pick a medium and location with which you are most comfortable.

It might be better if you owned the location, such as your own website, so that you have more control over the future of your portfolio.

I will get to it eventually I hope.

Until then, contact me and let me know about the topic you want me to cover.

Email Questions (15)

Yes, you can change the email address used to receive emails from Machine Learning Mastery.

Contact me, and let me know your old and the new email addresses.

Yes.

Use the contact form.

It will go straight to my inbox.

I process emails in a batch, typically once or twice per day. This means I might not respond for 12 to 24 hours.

I often get 100s of emails per day, so please keep it brief. No big essays, code dumps, or long lists of questions. One question per email please.

No.

I do not offer the ability to consolidate emails to once per week, or other intervals, sorry.

No.

I fear that crazy beginners will use the list to subscribe to everything, and then complain that I send too many emails.

Find one topic on the site that you want to learn about the most and then subscribe to the email course.

Sorry, I do not have the capability to resend emails.

Each email course is also available as a blog post; search the blog to find it and access all of the lessons.

Yes.

If you reply to one of my emails I will read it and reply.

I do get 100s of emails per day, so please do not send me essays or lists of questions.

One question per email, please.

Be kind.

I have a weekly email newsletter that lists all of the new blog posts, tutorials, and news about upcoming books.

You can sign-up for it here:

What if you went crazy and subscribed to a bunch of courses?

No problem, you can manage your subscriptions yourself.

  1. Click the “Unsubscribe” link in any email that I send.
  2. This will take you to a webpage that says “Sorry to see you go!“.
  3. Click the link titled “Manage my subscriptions“.
  4. This will take you to a page called “My Campaigns“.
  5. Choose the email courses that you do and do not want to be subscribed to by clicking buttons next to each course you are or have been previously subscribed to.

A big mistake I see in beginners is that they sign up for multiple courses at once. If you do this, you will get an email from each course.

I strongly recommend against this!

Take one email course at a time.

Pick one topic to focus on, and only take one email course on that topic.

I use a platform to manage my email courses and marketing automation called Drip.

You can learn more here:

Perhaps you have signed up for too many courses at once?

I have a number of email courses.

  • My main course sends about 3 emails per week.
  • My short courses send emails once per weekday for usually one or two weeks, depending on the course.
  • My newsletter comes out once per week.

If you are taking one of my email courses, you will also get the newsletter.

I use email courses because, generally, my readers really love them.

I get tons of replies per day that say “thanks” or “done” or “can I have the next lesson early?“.

You have fine-grained control over what courses you subscribe to. You must choose to sign-up to them and you can then unsubscribe from a specific course or from any course at any time.

Sorry, I don’t have the capacity to help via social media channels such as Google+, Facebook, Twitter and LinkedIn.

If you have a question, please contact me directly.

I will not share or sell your email address with any third parties.

I have built my business based on a trust with my readers and my customers and this trust is very important to me.

Your email address is only used to send email course material and newsletters from my site.

Website Questions (17)

Machine Learning Mastery is based in Australia, although we have readers and customers in the EU. Therefore, I have done my best in good faith to make Machine Learning Mastery compliant with the General Data Protection Regulation (GDPR).

You can access the policies for this website here:

If you have questions about your personal information on Machine Learning Mastery or compliance with GDPR please contact me directly.

No.

I do not have ads on my site, except links to my own books and courses.

This is a decision that I made carefully when I started the site.

Thanks for the offer, I’m flattered.

Please do not translate my posts or books into another language.

I have thought carefully about this:

  • I prefer to retain complete control over my content.
  • I do not have the capacity to setup and work through the deal.
  • I update my material frequently.

No.

Sorry, I don’t accept guest posts.

Yes.

Please ask your question in the comments of the blog post.

No.

I don’t do link trading.

I link to content that I think will help my readers learn something about machine learning at the time I write each tutorial or post.

If you think you honestly have a relevant addition to a post, then post it in the comments of that post. It is an excellent way to contribute to the ongoing discussion.

I have an RSS feed for blog posts, you can access it via the following URL:

Thanks for your interest in my tutorials.

You can print a tutorial from the webpage as you would print any other webpage.

I do have a sign-up form, called a toaster form, in the bottom right of the page. When printing, this may obscure the print out.

This can be removed easily. Simply submit the form with your email address. The form will then disappear and not be shown again on any tutorial on my site.

Sample code is presented in tutorials on this website using a special plugin that lets you easily copy and paste the code into your editor.

It is important that when you copy code from a tutorial that white space is preserved. This is because in languages such as Python tabs and new lines are part of the language and must be used exactly as they appear in the tutorial code for the example to work correctly.
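For example, in Python the indentation below a for statement is part of the program; if the leading whitespace is lost when pasting, the same code will not run:

# indentation is part of the Python language
for i in range(3):
    print(i)    # this line must stay indented under the for loop

# if the indentation is lost when pasting, the same code raises an error:
# for i in range(3):
# print(i)    # IndentationError: expected an indented block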

When you hover on the code box you will see a small menu appear with buttons.

For example:

Click on the second button from the far right on this menu.

It looks like two sheets of paper.

For example:

This will highlight or select all of the code in the code box.

Copy the highlighted code. The specifics of how to copy selected code depend on your platform.

  • Right click on the code and click “copy“.
  • Or, if on Windows or Linux: hold down the “control” key and press the “c” key on the keyboard.
  • Or, if on Mac: hold down the “command” key and press the “c” key on the keyboard.

You will now have code copied onto your clipboard.

Open your editor and paste the code from your clipboard into the editor.

This will vary depending on your platform and your editor. Ensure you have a new document open in your editor:

  • Right click on the new document in the editor and click “paste“.
  • Or, if your editor has a menu, click the “Edit” menu and click “Paste“.
  • Or, if on Windows or Linux: click on the new document and hold the “control” key and press the “v” key.
  • Or, if on Mac: click on the new document and hold the “command” key and press the “v” key.

You will now have the code in your editor with all white space preserved.

You can now run the code example. Ensure that any data files that the code example depends upon are in the same directory as the code file.

Thanks for your interest in citing my material.

You might like to include the following fields:

  • The Name of the author, e.g. “Jason Brownlee”.
  • The Title of the tutorial or book, e.g. “Deep Learning with Time Series Forecasting”
  • The Name of the publisher, e.g. “Machine Learning Mastery”.
  • The URL of the tutorial or book.
  • The Date you accessed or copied the code.

For example:

Jason Brownlee, Deep Learning with Time Series Forecasting, Machine Learning Mastery, Available from https://machinelearningmastery.com/machine-learning-with-python/, accessed November 6th, 2018.

Also, if your work is public, contact me, I’d love to see it out of general interest.

Great! I want the posts to be the best they can be.

Please let me know.

Either contact me or leave a comment on the post.

I will fix up the post ASAP.

Thanks for asking.

Perhaps you’re thinking of setting up your own website. If so, I wish you the best of luck!

Here is a summary of the software that I use to run this site:

Each tutorial links to the dataset used.

Each book provides the code and datasets used as a bonus, see the “code/” directory.

I have a copy of each dataset used on GitHub. The reason for this is that sometimes the third party websites that host the datasets go down, or take down datasets (as has happened recently).

You can access them here:

All comments on this blog are moderated, by me.

I read every one, and reply if I can.

I often moderate comments in a batch, once per day. Please be patient.

After it is moderated, your comment will appear on the blog.

Thanks for noticing.

Machine learning can sometimes be a dry topic. I use colorful photos, mostly of nature, to add some life to the tutorials.

The photos are a reminder to me, and to you that there is a wider world out there.

I’m sorry to hear that you are having trouble running a code example.

All code tutorials were tested prior to being posted and are updated frequently.

I have some suggestions to try:

If you think you have found a bug, please let me know.

I’m sorry to hear that Machine Learning Mastery is blocked in your country.

I try to make my tutorials accessible to everyone and I am not political in any way.

Regardless, my website is blocked in some countries.

Any block in access to my website is controlled either by your government or your internet service provider.

An example is Iran, which blocks access to my site:

Perhaps you can check with your government or your internet service provider as to why they have blocked access?

Customer Questions (78)

Thanks for your interest.

Sorry, I do not support third-party resellers for my books (e.g. reselling in other bookstores).

My books are self-published and I think of my website as a small boutique, specialized for developers that are deeply interested in applied machine learning.

As such I prefer to keep control over the sales and marketing for my books.

I’m sorry, I don’t support exchanging books within a bundle.

The collections of books in the offered bundles are fixed.

My e-commerce system is not sophisticated and it does not support ad-hoc bundles. I’m sure you can understand. You can see the full catalog of books and bundles here:

If you have already purchased a bundle and would like to exchange one of the books in the bundle, then I’m very sorry, I don’t support book exchanges or partial refunds.

If you are unhappy, please contact me directly and I can organize a refund.

Thanks for your interest.

I’m sorry,  I cannot create a customized bundle of books for you. It would create a maintenance nightmare for me. I’m sure you can understand.

My e-commerce system is not very sophisticated. It cannot support ad-hoc bundles of books or the a la carte ordering of books.

I do have existing bundles of books that I think go well together.

You can see the full catalog of my books and bundles available here:

Sorry, I don’t sell hard copies of my books.

All of the books and bundles are Ebooks in PDF file format.

This is intentional and I put a lot of thought into the decision:

  • The books are full of tutorials that must be completed on the computer.
  • The books assume that you are working through the tutorials, not reading passively.
  • The books are intended to be read on the computer screen, next to a code editor.
  • The books are playbooks; they are not intended to be used as reference texts that sit on the shelf.
  • The books are updated frequently, to keep pace with changes to the field and APIs.

I hope that explains my rationale.

If you really do want a hard copy, you can purchase the book or bundle and create a printed version for your own personal use. There is no digital rights management (DRM) on the PDF files to prevent you from printing them.

Sorry, I cannot create a purchase order for you or fill out your procurement documentation.

You can complete your purchase using the self-service shopping cart with Credit Card or PayPal for payment.

After you complete the purchase, I can prepare a PDF invoice for you for tax or other purposes.

Sorry, no.

I cannot issue a partial refund. It is not supported by my e-commerce system.

If you are truly unhappy with your purchase, please contact me about getting a full refund.

I stand behind my books, I know the tutorials work and have helped tens of thousands of readers.

I am sorry to hear that you want a refund.

Please contact me directly with your purchase details:

  • Book Name: The name of the book or bundle that you purchased.
  • Your Email: The email address that you used to make the purchase (note, this may be different to the email address you used to pay with via PayPal).
  • Order Number: The order number in your purchase receipt email.

I will then organize a refund for you.

I would love to hear why the book is a bad fit for you.

Anything that you can tell me to help improve my materials will be greatly appreciated.

I have a thick skin, so please be honest.

Sample chapters are provided for each book.

Each book has its own webpage, you can access them from the catalog.

On each book’s page, you can access the sample chapter.

  1. Find the section on the book’s page titled “Download Your Sample Chapter“.
  2. Click the link, provide your email address and submit the form.
  3. Check your email, you will be sent a link to download the sample.

If you have trouble with this process or cannot find the email, contact me and I will send the PDF to you directly.

Yes.

I can provide an invoice that you can use for reimbursement from your company or for tax purposes.

Please contact me directly with your purchase details:

  • The name of the book or bundle that you purchased.
  • The email address that you used to make the purchase.
  • Ideally, the order number in your purchase receipt email.
  • Your full name/company name/company address that you would like to appear on the invoice.

I will create a PDF invoice for you and email it back.

Sorry, I no longer distribute evaluation copies of my books due to some past abuse of the privilege.

If you are a teacher or lecturer, I’m happy to offer you a student discount.

Contact me directly and I can organize a discount for you.

Sorry, I do not offer Kindle (mobi) or ePub versions of the books.

The books are only available in PDF file format.

This is by design and I put a lot of thought into it. My rationale is as follows:

  • I use LaTeX to lay out the text and code to give a professional look, and I am afraid that Ebook readers would mess this up.
  • The increase in supported formats would create a maintenance headache that would take a large amount of time away from updating the books and working on new books.
  • Most critically, reading on an e-reader or iPad is antithetical to the book-open-next-to-code-editor approach the PDF format was chosen to support.

My materials are playbooks intended to be open on the computer, next to a text editor and a command line.

They are not textbooks to be read away from the computer.

Sorry, all of my books are self-published and do not have ISBNs.

Thanks for your interest in my books.

I’m sorry that you cannot afford my books or purchase them in your country.

I don’t give away free copies of my books.

I do give away a lot of free material on applied machine learning already.

You can access the best free material here:

Maybe.

I offer a discount on my books to:

  • Students
  • Teachers
  • Retirees

If you fall into one of these groups and would like a discount, please contact me and ask.

Sorry, the books and bundles are for individual purchase only.

I do not respond to RFIs or similar.

Maybe.

I support payment via PayPal and Credit Card.

You may be able to set up a PayPal account that accesses your debit card. I recommend contacting PayPal or reading their documentation.

Sorry, no.

I do not support WeChat Pay or Alipay at this stage.

I only support payment via PayPal and Credit Card.

Yes, you can print the purchased PDF books for your own personal interest.

There is no digital rights management (DRM) on the PDFs to prevent you from printing them.

Please do not distribute printed copies of your purchased books.

You can review the table of contents for any book.

I provide two copies of the table of contents for each book on the book’s page.

Specifically:

  1. A written summary that lists the tutorials/lessons in the book and their order.
  2. A screenshot of the table of contents taken from the PDF.

If you are having trouble finding the table of contents, search the page for the section titled “Table of Contents”.

No.

I only support payment via PayPal or Credit Card.

Yes.

If you purchase a book or bundle and later decide that you want to upgrade to the super bundle, I can arrange it for you.

Contact me and let me know that you would like to upgrade and what books or bundles you have already purchased and which email address you used to make the purchases.

I will create a special offer code that you can use to get the price of books and bundles purchased so far deducted from the price of the super bundle.

I am happy for you to use parts of my material in the development of your own course material, such as lecture slides for an in person class or homework exercises.

I am not happy if you share my material for free or use it verbatim. This would be copyright infringement.

All code on my site and in my books was developed and provided for educational purposes only. I take no responsibility for the code, what it might do, or how you might use it.

If you use my material to teach, please reference the source, including:

  • The Name of the author, e.g. “Jason Brownlee”.
  • The Title of the tutorial or book.
  • The Name of the website, e.g. “Machine Learning Mastery”.
  • The URL of the tutorial or book.
  • The Date you accessed or copied the code.

For example:

  • Jason Brownlee, Machine Learning Algorithms in Python, Machine Learning Mastery, Available from https://machinelearningmastery.com/machine-learning-with-python/, accessed April 15th, 2018.

Also, if your work is public, contact me, I’d love to see it out of general interest.

Thanks for asking.

Sorry, no.

I prefer to keep complete control over my content for now.

Sorry no.

My books are self-published and are only available from my website.

Generally no.

I don’t have exercises or assignments in my books.

I do have end-to-end projects in some of the books, but they are in a tutorial format where I lead you through each step.

The book chapters are written as self-contained tutorials with a specific learning outcome. You will learn how to do something at the end of the tutorial.

Some books have a section titled “Extensions” with ideas for how to modify the code in the tutorial in some advanced ways. They are like self-study exercises.

Sorry, I do not offer a certificate of completion for my books or my email courses.

Sorry, new books are not included in your super bundle.

I release new books every few months and develop a new super bundle at those times.

All existing customers will get early access to new books at a discounted price.

Note that you do get free updates to all of the books in your super bundle. This includes bug fixes, changes to APIs, and even new chapters sometimes. I send out an email to customers for major book updates, or you can contact me any time and ask for the latest version of a book.

No.

I have books that do not require any skill in programming, for example:

Other books do have code examples in a given programming language.

You must know the basics of the programming language, such as how to install the environment and how to write simple programs. I do not teach programming, I teach machine learning for developers.

You do not need to be a good programmer.

That being said, I do offer tutorials on how to setup your environment efficiently and even crash courses on programming languages for developers that may not be familiar with the given language.

No.

My books do not cover the theory or derivations of machine learning methods.

This is by design.

My books are focused on the practical concern of applied machine learning. Specifically, how algorithms work and how to use them effectively with modern open source tools.

If you are interested in the theory and derivations of equations, I recommend a machine learning textbook. Some good examples of machine learning textbooks that cover theory include:

I generally don’t run sales.

If I do have a special, such as around the launch of a new book, I only offer it to past customers and subscribers on my email list.

I do offer book bundles that offer a discount for a collection of related books.

I do offer a discount to students, teachers, and retirees. Contact me to find out about discounts.

Sorry, I don’t have videos.

I only have tutorial lessons and projects in text format.

This is by design. I used to have video content and I found the completion rate much lower.

I want you to put the material into practice. I have found that text-based tutorials are the best way of achieving this. With text-based tutorials you must read, implement and run the code.

With videos, you are passively watching and not required to take any action. Videos are entertainment or infotainment instead of productive learning and work.

After reading and working through the tutorials you are far more likely to use what you have learned.

Yes, I offer a 90-day no questions asked money-back guarantee.

I stand behind my books. They contain my best knowledge on a specific machine learning topic, and each book has been read, tested and used by tens of thousands of readers.

Nevertheless, if you find that one of my Ebooks is a bad fit for you, I will issue a full refund.

There are no physical books, therefore no shipping is required.

All books are EBooks that you can download immediately after you complete your purchase.

I support purchases from any country via PayPal or Credit Card.

Yes.

I recommend using standalone Keras version 2.4 (or higher) running on top of TensorFlow version 2.2 (or higher).

All tutorials on the blog have been updated to use standalone Keras running on top of TensorFlow 2.

All books have been updated to use this same combination.
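If you want to confirm what you have installed, a quick check like this (assuming both libraries are installed) prints the versions:

# sketch: confirm the installed TensorFlow and standalone Keras versions
import tensorflow
import keras
print('tensorflow: %s' % tensorflow.__version__)
print('keras: %s' % keras.__version__)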

I do not recommend using Keras as part of TensorFlow 2 yet (e.g. tf.keras). It is too new, new things have issues, and I am waiting for the dust to settle. Standalone Keras has been working for years and continues to work extremely well.

There is one case of tutorials that do not support TensorFlow 2 because they make use of third-party libraries that have not yet been updated to support TensorFlow 2: specifically, tutorials that use Mask-RCNN for object recognition. Once the third-party library has been updated, these tutorials will be updated too.

The book “Long Short-Term Memory Networks with Python” is not focused on time series forecasting, instead, it is focused on the LSTM method for a suite of sequence prediction problems.

The book “Deep Learning for Time Series Forecasting” shows you how to develop MLP, CNN and LSTM models for univariate, multivariate and multi-step time series forecasting problems.

Mini-courses are free courses offered on a range of machine learning topics and made available via email, PDF and blog posts.

Mini-courses are:

  • Short, typically 7 days or 14 days in length.
  • Terse, typically giving one tip or code snippet per lesson.
  • Limited, typically narrow in scope to a few related areas.

Ebooks are provided on many of the same topics providing full training courses on the topics.

Ebooks are:

  • Longer, typically 25+ complete tutorial lessons, each taking up to an hour to complete.
  • Complete, providing a gentle introduction into each lesson and includes full working code and further reading.
  • Broad, covering everything required on the topic to get productive quickly and bring the techniques to your own projects.

The mini-courses are designed for you to get a quick result. If you would like more information or fuller code examples on the topic then you can purchase the related Ebook.

The book “Master Machine Learning Algorithms” is for programmers and non-programmers alike. It teaches you how 10 top machine learning algorithms work, with worked examples in arithmetic and spreadsheets, not code. The focus is on an understanding of how each model learns and makes predictions.

The book “Machine Learning Algorithms From Scratch” is for programmers that learn by writing code to understand. It provides step-by-step tutorials on how to implement top algorithms as well as how to load data, evaluate models and more. It has less on how the algorithms work, instead focusing exclusively on how to implement each in code.

The two books can support each other.

The books are a concentrated and more convenient version of what I put on the blog.

I design my books to be a combination of lessons and projects to teach you how to use a specific machine learning tool or library and then apply it to real predictive modeling problems.

The books get updated with bug fixes, updates for API changes and the addition of new chapters, and these updates are totally free.

I do put some of the book chapters on the blog as examples, but they are not tied to the surrounding chapters or the narrative that a book offers and do not offer the standalone code files.

With each book, you also get all of the source code files used in the book that you can use as recipes to jump-start your own predictive modeling problems.

My books are playbooks. Not textbooks.

They have no deep explanations of theory, just working examples that are laser-focused on the information that you need to know to bring machine learning to your project.

There is little math, no theory or derivations.

My readers really appreciate the top-down, rather than bottom-up approach used in my material. It is the one aspect I get the most feedback about.

My books are not for everyone, they are carefully designed for practitioners that need to get results, fast.

A code file is provided for each example presented in the book.

Dataset files used in each chapter are also provided with the book.

The code and dataset files are provided as part of your .zip download in a code/ subdirectory. Code and datasets are organized into subdirectories, one for each chapter that has a code example.

If you have misplaced your .zip download, you can contact me and I can send an updated purchase receipt email with a link to download your package.

Ebooks can be purchased from my website directly.

  1. First, find the book or bundle that you wish to purchase, you can see the full catalog here:
    1. Machine Learning Mastery Books
  2. Click on the book or bundle that you would like to purchase to go to the book’s details page.
  3. Click the “Buy Now” button for the book or bundle to go to the shopping cart page.
  4. Fill in the shopping cart with your details and payment details, and click the “Place Order” button.
  5. After completing the purchase you will be emailed a link to download your book or bundle.

All prices are in US dollars (USD).

Books can be purchased with PayPal or Credit Card.

All prices on Machine Learning Mastery are in US dollars.

Payments can be made by using either PayPal or a Credit Card that supports international payments (e.g. most credit cards).

You do not have to explicitly convert money from your currency to US dollars.

Currency conversion is performed automatically when you make a payment using PayPal or Credit Card.

After filling out and submitting your order form, you will be able to download your purchase immediately.

Your web browser will be redirected to a webpage where you can download your purchase.

You will also receive an email with a link to download your purchase.

If you lose the email or the link in the email expires, contact me and I will resend the purchase receipt email with an updated download link.

After you complete your purchase you will receive an email with a link to download your bundle.

The download will include the book or books and any bonus material.

To use a discount code, also called an offer code, or discount coupon when making a purchase, follow these steps:

1. Enter the discount code text into the field named “Discount Coupon” on the checkout page.

 

Note, if you don’t see a field called “Discount Coupon” on the checkout page, it means that that product does not support discounts.

2. Click the “Apply” button.

3. You will then see a message that the discount was applied successfully to your order.

 

 

Note, if the discount code that you used is no longer valid, you will see a message that the discount was not successfully applied to your order.

 

There are no physical books, therefore no shipping is required.

All books are EBooks that you can download immediately after you complete your purchase.

I recommend reading one chapter per day.

Momentum is important.

Some readers finish a book in a weekend.

Most readers finish a book in a few weeks by working through it during nights and weekends.

You will get your book immediately.

After you complete and submit the payment form, you will be immediately redirected to a webpage with a link to download your purchase.

You will also immediately be sent an email with a link to download your purchase.

What order should you read the books?

That is a great question, my best suggestions are as follows:

  • Consider starting with a book on a topic that you are most excited about.
  • Consider starting with a book on a topic that you can apply on a project immediately.

Also, consider that you don’t need to read all of the books, perhaps a subset of the books will get you the skills you need or want.

Nevertheless, one suggested order for reading the books is as follows:

    1. Probability for Machine Learning
    2. Statistical Methods for Machine Learning
    3. Linear Algebra for Machine Learning
    4. Optimization for Machine Learning
    5. Calculus for Machine Learning
    6. The Beginner’s Guide to Data Science
    7. Master Machine Learning Algorithms
    8. Machine Learning Algorithms From Scratch
    9. Python for Machine Learning
    10. Machine Learning Mastery With Weka
    11. Machine Learning Mastery With Python
    12. Machine Learning Mastery With R
    13. Data Preparation for Machine Learning
    14. Imbalanced Classification With Python
    15. Time Series Forecasting With Python
    16. Ensemble Learning Algorithms With Python
    17. XGBoost With Python
    18. Deep Learning With Python
    19. Deep Learning with PyTorch
    20. Long Short-Term Memory Networks with Python
    21. Deep Learning for Natural Language Processing
    22. Deep Learning for Computer Vision
    23. Machine Learning in Open CV
    24. Deep Learning for Time Series Forecasting
    25. Better Deep Learning
    26. Generative Adversarial Networks with Python
    27. Building Transformer Models with Attention
    28. Productivity with ChatGPT (this book can be read in any order)

I hope that helps.

Sorry, I do not offer a license for libraries to purchase my books or bundles.

The books are for individual use only.

Generally, no.

Multi-seat licenses create a bit of a maintenance nightmare for me, sorry. It takes time away from reading, writing and helping my readers.

If you have a big order, such as for a class of students or a large team, please contact me and we will work something out.

I update the books frequently and you can access the latest version of a book at any time.

In order to get the latest version of a book, contact me directly with your order number or purchase email address and I can resend your purchase receipt email with an updated download link.

I do not maintain a public change log or errata for the changes in the book, sorry.

There are no physical books, therefore no delivery is required.

All books are Ebooks in PDF format that you can download immediately after you complete your purchase.

You will receive an email with a link to download your purchase. You can also contact me any time to get a new download link.

I support purchases from any country via PayPal or Credit Card.

My best advice is to start with a book on a topic that you can use immediately.

Barring that, pick a topic that interests you the most.

If you are unsure, perhaps try working through some of the free tutorials to see which area you gravitate towards.

Generally, I recommend focusing on the process of working through a predictive modeling problem end-to-end.

I have three books that show you how to do this, covering three top open source platforms: Weka, Python, and R.

These are great places to start.

You can always circle back and pick up a book on algorithms later to learn more about how specific methods work in greater detail.
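To make “end-to-end” concrete, here is a minimal sketch of that process using scikit-learn. The dataset, model, and metric below are just placeholders for illustration, not an excerpt from any book:

    # A minimal end-to-end predictive modeling sketch (illustrative choices only).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # 1. Define the problem: load a dataset with inputs X and a target y.
    X, y = load_breast_cancer(return_X_y=True)

    # 2. Prepare the data: hold out a test set and scale features inside a pipeline.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    model = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])

    # 3. Evaluate the model: fit on the training data, then score on the held-out test set.
    model.fit(X_train, y_train)
    print("Test accuracy: %.3f" % accuracy_score(y_test, model.predict(X_test)))

    # 4. Finalize: refit on all available data before making predictions on new data.
    model.fit(X, y)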

Thanks for your interest.

You can see the full catalog of my books and bundles here:

Thanks for asking.

I try not to plan my books too far into the future. I try to write about the topics that I am asked about the most or topics where I see the most misunderstanding.

If you would like me to write more about a topic, I would love to know.

Contact me directly and let me know the topic and even the types of tutorials you would love for me to write.

Contact me and let me know the email address (or email addresses) that you think you used to make purchases.

I can look up what purchases you have made and resend purchase receipts to you so that you can redownload your books and bundles.

All prices are in US Dollars (USD).

All currency conversion is handled by PayPal for PayPal purchases, or by Stripe and your bank for credit card purchases.

It is possible that your link to download your purchase will expire after a few days.

This is a security precaution.

Please contact me and I will resend your purchase receipt with an updated download link.

The book “Deep Learning With Python” could be a prerequisite to “Long Short-Term Memory Networks with Python”. It teaches you how to get started with Keras and how to develop your first MLP, CNN and LSTM.

The book “Long Short-Term Memory Networks with Python” goes deep on LSTMs and teaches you how to prepare data, how to develop a suite of different LSTM architectures, parameter tuning, updating models and more.

Both books focus on deep learning in Python using the Keras library.

The book “Long Short-Term Memory Networks with Python” focuses on how to develop a suite of different LSTM networks for sequence prediction, in general.

The book “Deep Learning for Time Series Forecasting” focuses on how to use a suite of different deep learning models (MLPs, CNNs, LSTMs, and hybrids) to address a suite of different time series forecasting problems (univariate, multivariate, multistep and combinations).

The LSTM book teaches LSTMs only and does not focus on time series. The Deep Learning for Time Series book focuses on time series and teaches how to use many different models including LSTMs.

The book “Long Short-Term Memory Networks With Python” focuses on how to implement different types of LSTM models.

The book “Deep Learning for Natural Language Processing” focuses on how to use a variety of different networks (including LSTMs) for text prediction problems.

The LSTM book can support the NLP book, but it is not a prerequisite.

You may need a business or corporate tax number for “Machine Learning Mastery”, the company, for your own tax purposes. This is common for companies in the EU, for example.

The Machine Learning Mastery company is operated out of Puerto Rico.

As such, the company does not have a VAT identification number for the EU or similar for your country or regional area.

The company does have a Company Number. The details are as follows:

  • Company Name: Zeus LLC
  • Company Number: 421867-1511

Linux, macOS, and Windows.

There are no code examples in “Master Machine Learning Algorithms”, therefore no programming language is used.

Algorithms are described and how they work is summarized using basic arithmetic. The algorithm behavior is also demonstrated in Excel spreadsheets that are available with the book.

It is a great book for learning how algorithms work, without getting side-tracked with theory or programming syntax.

If you are interested in learning about machine learning algorithms by coding them from scratch (using the Python programming language), I would recommend a different book: Machine Learning Algorithms From Scratch.

I write the content for the books (words and code) using a text editor, specifically Sublime Text.

I typeset the books and create a PDF using LaTeX.

All of the books have been tested and work with Python 3 (e.g. 3.5 or 3.6).

Most of the books have also been tested and work with Python 2.7.

Where possible, I recommend using the latest version of Python 3.
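If you are not sure which version of Python is installed on your workstation, a quick generic check like the following (not specific to any book) will tell you:

    # Print the version of the Python interpreter running this script.
    import sys
    print(sys.version)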

After you fill in the order form and submit it, two things will happen:

  1. You will be redirected to a webpage where you can download your purchase.
  2. You will be sent an email (to the email address used in the order form) with a link to download your purchase.

The redirect in the browser and the email will happen immediately after you complete the purchase.

You can download your purchase from either the webpage or the email.

If you cannot find the email, perhaps check other email folders, such as the “spam” folder?

If you have any concerns, contact me and I can resend your purchase receipt email with the download link.

I do test my tutorials and projects on the blog first. Think of it as early access to ideas; many of them never make it into my training material.

Much of the material in the books appeared in some form on my blog first and was later refined, improved, and repackaged into a chapter format. I find this helps greatly with quality and bug fixing.

The books provide a more convenient packaging of the material, including source code, datasets, and PDF format. They also include updates for new APIs, new chapters, bug and typo fixes, and direct access to me for all the support and help I can provide.

I believe my books offer thousands of dollars of education for tens of dollars each.

They are months if not years of experience distilled into a few hundred pages of carefully crafted and well-tested tutorials.

I think they are a bargain for professional developers looking to rapidly build skills in applied machine learning or use machine learning on a project.

Also, what are skills in machine learning worth to you? To your next project? To your current or next employer?

Nevertheless, the price of my books may appear expensive if you are a student or if you are not used to the high salaries for developers in North America, Australia, the UK, and similar parts of the world. For that, I am sorry.

Discounts

I do offer discounts to students, teachers and retirees.

Please contact me to find out more.

Free Material

I offer a ton of free content on my blog, you can get started with my best free material here:

About my Books

My books are playbooks.

They are intended for developers who want to know how to use a specific library to actually solve problems and deliver value at work.

  • My books guide you only through the elements you need to know in order to get results.
  • My books are in PDF format and come with code and datasets, specifically designed for you to read and work-through on your computer.
  • My books give you direct access to me via email (what other books offer that?).
  • My books are a tiny business expense for a professional developer; they can be charged to the company and are tax deductible in most regions.

Very few training materials on machine learning are focused on how to get results.

The vast majority are about repeating the same math and theory and ignore the one thing you really care about: how to use the methods on a project.

Comparison to Other Options

Let me provide some context for you on the pricing of the books:

There are free videos on YouTube and tutorials on blogs.

There are very cheap video courses that teach you one or two tricks with an API.

  • My books teach you how to use a library to work through a project end-to-end and deliver value, not just a few tricks

A textbook on machine learning can cost $50 to $100.

  • All of my books are cheaper than the average machine learning textbook, and I expect you may be more productive, sooner.

A bootcamp or other in-person training can cost $1,000 or more and last for days to weeks.

  • A bundle of all of my books is far cheaper than this, lets you work at your own pace, and covers more content than the average bootcamp.

Sorry, my books are not available on websites like Amazon.com.

I carefully decided to not put my books on Amazon for a number of reasons:

  • Amazon takes 65% of the sale price of self-published books, which would put me out of business.
  • Amazon offers very little control over the sales page and shopping cart experience.
  • Amazon does not allow me to contact my customers via email and offer direct support and updates.
  • Amazon does not allow me to deliver my book to customers as a PDF, the preferred format for my customers to read on the screen.

I hope that helps you understand my rationale.

I am sorry to hear that you’re having difficulty purchasing a book or bundle.

I use Stripe for credit card payments and PayPal for PayPal payments, both of which provide secure and encrypted payment processing on my website.

When a payment fails, some common things to check include:

  • Perhaps you can double check that your details are correct, just in case of a typo?
  • Perhaps you could try a different payment method, such as PayPal or Credit Card?
  • Perhaps you’re able to talk to your bank, just in case they blocked the transaction?

I often see customers trying to purchase with a domestic credit card or debit card that does not allow international purchases. This is easy to overcome by talking to your bank.

If you’re still having difficulty, please contact me and I can help investigate further.

When you purchase a book from my website and later review your bank statement, it is possible that you may see an additional small charge of one or two dollars.

The charge does not come from my website or payment processor.

Instead, the charge was added by your bank, credit card company, or financial institution. It may be because your bank adds an additional charge for online or international transactions.

This is rare, but I have seen it happen once or twice before, often with credit cards used by enterprise or large corporate institutions.

My advice is to contact your bank or financial institution directly and ask them to explain the cause of the additional charge.

If you would like a copy of the payment transaction from my side (e.g. a screenshot from the payment processor), or a PDF tax invoice, please contact me directly.

I give away a lot of content for free. Most of it, in fact.

It is important to me to help students and practitioners that are not well off, hence the enormous amount of free content that I provide.

You can access the free content:

I have thought very hard about this and I sell machine learning Ebooks for a few important reasons:

  • I use the revenue to support the site and all the non-paying customers.
  • I use the revenue to support my family so that I can continue to create content.
  • Practitioners that pay for tutorials are far more likely to work through them and learn something.
  • I target my books towards working professionals that are more likely to afford the materials.

Yes.

All updates to the book or books in your purchase are free.

Books are usually updated once every few months to fix bugs, typos and keep abreast of API changes.

Contact me anytime and check if there have been updates. Let me know what version of the book you have (version is listed on the copyright page).

Yes.

Please contact me anytime with questions about machine learning or the books.

One question at a time, please.

Also, each book has a final chapter on getting more help and further reading and points to resources that you can use to get more help.

Yes, the books can help you get a job, but indirectly.

Getting a job is up to you.

It is a matching problem between an organization looking for someone to fill a role and you with your skills and background.

That being said, there are companies that are more interested in the value that you can provide to the business than the degrees that you have. Often, these are smaller companies and start-ups.

You can focus on providing value with machine learning by learning and getting very good at working through predictive modeling problems end-to-end. You can show this skill by developing a machine learning portfolio of completed projects.

My books are specifically designed to help you toward these ends. They teach you exactly how to use open source tools and libraries to get results in a predictive modeling project.

Need More Help?

I’m here to help you become awesome at applied machine learning.

If you still have questions and need help, you have some options: