
Learning in machine learning is often characterized as searching through and evaluating candidate hypotheses from a hypothesis space.

The discussion of hypotheses in machine learning can be confusing for a beginner, especially when “*hypothesis*” has a distinct, but related meaning in statistics (e.g. statistical hypothesis testing) and more broadly in science (e.g. scientific hypothesis).

In this post, you will discover the difference between a hypothesis in science, in statistics, and in machine learning.

After reading this post, you will know:

- A scientific hypothesis is a provisional explanation for observations that is falsifiable.
- A statistical hypothesis is an explanation about the relationship between data populations that is interpreted probabilistically.
- A machine learning hypothesis is a candidate model that approximates a target function for mapping inputs to outputs.

Let’s get started.

This tutorial is divided into four parts; they are:

- What Is a Hypothesis?
- Hypothesis in Statistics
- Hypothesis in Machine Learning
- Review of Hypothesis

A hypothesis is an explanation for something.

It is a provisional idea, an educated guess that requires some evaluation.

A good hypothesis is testable; it can be either true or false.

In science, a hypothesis must be falsifiable, meaning that there exists a test whose outcome could mean that the hypothesis is not true. The hypothesis must also be framed before the outcome of the test is known.

… not any hypothesis will do. There is one fundamental condition that any hypothesis or system of hypotheses must satisfy if it is to be granted the status of a scientific law or theory. If it is to form part of science, an hypothesis must be falsifiable.

— Pages 61-62, What Is This Thing Called Science?, Third Edition, 1999.

A good hypothesis fits the evidence and can be used to make predictions about new observations or new situations.

The hypothesis that best fits the evidence and can be used to make predictions is called a theory, or is part of a theory.

**Hypothesis in Science**: Provisional explanation that fits the evidence and can be confirmed or disproved.

Much of statistics is concerned with the relationship between observations.

Statistical hypothesis tests are techniques used to calculate a test statistic that summarizes an observed “*effect*.” The statistic can then be interpreted to determine how likely it would be to observe that effect if a relationship did not exist.

If the likelihood is very small, then it suggests that the effect is probably real. If the likelihood is large, then we may have observed a statistical fluctuation, and the effect is probably not real.

For example, we may be interested in evaluating the relationship between the means of two samples, e.g. whether the samples were drawn from the same distribution or not, whether there is a difference between them.

One hypothesis is that there is no difference between the population means, based on the data samples.

This hypothesis of no effect is called the null hypothesis, and we can use a statistical hypothesis test to either reject it or fail to reject (retain) it. We don’t say “accept” because the outcome is probabilistic and could still be wrong, just with a very low probability.

… we develop a hypothesis and establish a criterion that we will use when deciding whether to retain or reject our hypothesis. The primary hypothesis of interest in social science research is the null hypothesis

— Pages 64-65, Statistics In Plain English, Third Edition, 2010.

If the null hypothesis is rejected, then we assume the alternative hypothesis that there exists some difference between the means.

- **Null Hypothesis (H0)**: Suggests no effect.
- **Alternative Hypothesis (H1)**: Suggests some effect.

Statistical hypothesis tests don’t comment on the size of the effect, only the likelihood of the presence or absence of the effect in the population, based on the observed samples of data.
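The reject / fail-to-reject logic above can be sketched with a simple permutation test (a hypothetical example with made-up samples; classical tests such as the Student’s t-test follow the same logic):

```python
import random

random.seed(0)

# Two made-up samples; H0 says they share a population mean (no effect).
a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [5.8, 6.0, 5.7, 6.1, 5.9]
observed = abs(sum(a) / len(a) - sum(b) / len(b))

# Under H0 the group labels are exchangeable, so shuffle and re-measure.
pooled = a + b
trials, extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    left, right = pooled[:len(a)], pooled[len(a):]
    if abs(sum(left) / len(left) - sum(right) / len(right)) >= observed:
        extreme += 1

p_value = extreme / trials
print("reject H0" if p_value < 0.05 else "fail to reject (retain) H0")
```

A small p-value suggests the observed effect is unlikely under the null hypothesis; a large one means we may have observed a statistical fluctuation.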

**Hypothesis in Statistics**: Probabilistic explanation about the presence of a relationship between observations.

Machine learning, specifically supervised learning, can be described as the desire to use available data to learn a function that best maps inputs to outputs.

Technically, this is a problem called function approximation, where we are approximating an unknown target function (that we assume exists) that can best map inputs to outputs on all possible observations from the problem domain.

A candidate model that approximates the target function and performs the mapping of inputs to outputs is called a hypothesis in machine learning.

The choice of algorithm (e.g. neural network) and the configuration of the algorithm (e.g. network topology and hyperparameters) define the space of possible hypotheses that the model may represent.

Learning for a machine learning algorithm involves navigating the chosen space of hypotheses toward a best or good-enough hypothesis that approximates the target function.

Learning is a search through the space of possible hypotheses for one that will perform well, even on new examples beyond the training set.

— Page 695, Artificial Intelligence: A Modern Approach, Third Edition, 2009.

This framing of machine learning is common and helps to understand the choice of algorithm, the problem of learning and generalization, and even the bias-variance trade-off. For example, the training dataset is used to learn a hypothesis and the test dataset is used to evaluate it.

A common notation is used where lowercase-h (*h*) represents a given specific hypothesis and uppercase-h (*H*) represents the hypothesis space that is being searched.

- **h** (*hypothesis*): A single hypothesis, e.g. an instance or specific candidate model that maps inputs to outputs and can be evaluated and used to make predictions.
- **H** (*hypothesis set*): A space of possible hypotheses for mapping inputs to outputs that can be searched, often constrained by the framing of the problem, the choice of model, and the choice of model configuration.

The choice of algorithm and algorithm configuration involves choosing a hypothesis space that is believed to contain a hypothesis that is a good or best approximation for the target function. This is very challenging, and it is often more efficient to spot-check a range of different hypothesis spaces.
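As a toy illustration (not from any referenced text), here is a tiny hypothesis space *H* of candidate slopes and a search for the single hypothesis *h* that best fits some made-up training data:

```python
# Tiny made-up training data sampled from roughly y = 2x.
train = [(1, 2.1), (2, 3.9), (3, 6.2)]

def error(w):
    # Mean squared error of the hypothesis h(x) = w * x on the training data.
    return sum((y - w * x) ** 2 for x, y in train) / len(train)

# The hypothesis space H: candidate slopes 0.0, 0.1, ..., 5.0.
H = [w / 10 for w in range(51)]

# Learning as search: pick the single hypothesis h in H with the lowest error.
h = min(H, key=error)
print(h)
```

Here the space is small enough to enumerate; real algorithms search vastly larger spaces with cleverer strategies, but the framing is the same.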

We say that a learning problem is realizable if the hypothesis space contains the true function. Unfortunately, we cannot always tell whether a given learning problem is realizable, because the true function is not known.

— Page 697, Artificial Intelligence: A Modern Approach, Third Edition, 2009.

It is a hard problem and we choose to constrain the hypothesis space both in terms of size and in terms of the complexity of the hypotheses that are evaluated in order to make the search process tractable.

There is a tradeoff between the expressiveness of a hypothesis space and the complexity of finding a good hypothesis within that space.

— Page 697, Artificial Intelligence: A Modern Approach, Third Edition, 2009.

**Hypothesis in Machine Learning**: Candidate model that approximates a target function for mapping examples of inputs to outputs.

We can summarize the three definitions again as follows:

- **Hypothesis in Science**: Provisional explanation that fits the evidence and can be confirmed or disproved.
- **Hypothesis in Statistics**: Probabilistic explanation about the presence of a relationship between observations.
- **Hypothesis in Machine Learning**: Candidate model that approximates a target function for mapping examples of inputs to outputs.

We can see that a hypothesis in machine learning draws upon the definition of a hypothesis more broadly in science.

Just like a hypothesis in science is an explanation that covers available evidence, is falsifiable and can be used to make predictions about new situations in the future, a hypothesis in machine learning has similar properties.

A hypothesis in machine learning:

- **Covers the available evidence**: the training dataset.
- **Is falsifiable (kind of)**: a test harness is devised beforehand and used to estimate performance and compare it to a baseline model to see if it is skillful or not.
- **Can be used in new situations**: it can make predictions on new data.

Did this post clear up your questions about what a hypothesis is in machine learning?

Let me know in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- What Is This Thing Called Science?, Third Edition, 1999.
- Statistics In Plain English, Third Edition, 2010.
- Artificial Intelligence: A Modern Approach, Third Edition, 2009.
- Machine Learning, 1997.

- A Gentle Introduction to Applied Machine Learning as a Search Problem
- A Gentle Introduction to Statistical Hypothesis Tests
- Critical Values for Statistical Hypothesis Testing and How to Calculate Them in Python
- 15 Statistical Hypothesis Tests in Python (Cheat Sheet)

- What is hypothesis in machine learning?, Quora.
- What exactly is a hypothesis space in the context of Machine Learning?, Cross Validated.
- What is Hypothesis set in Machine Learning?, Cross Validated.

In this post, you discovered the difference between a hypothesis in science, in statistics, and in machine learning.

Specifically, you learned:

- A scientific hypothesis is a provisional explanation for observations that is falsifiable.
- A statistical hypothesis is an explanation about the relationship between data populations that is interpreted probabilistically.
- A machine learning hypothesis is a candidate model that approximates a target function for mapping inputs to outputs.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post What is a Hypothesis in Machine Learning? appeared first on Machine Learning Mastery.


- What data is best for my problem?
- What algorithm is best for my data?
- How do I best configure my algorithm?

Why can’t a machine learning expert just give you a straight answer to your question?

In this post, I want to help you see why no one can ever tell you what algorithm to use or how to configure it for your specific dataset.

I want to help you see that finding good data/algorithm/configuration is in fact the hard part of applied machine learning and the only part you need to focus on solving.

Let’s get started.

In mathematics, some problems can be solved both analytically and numerically.

- An analytical solution involves framing the problem in a well-understood form and calculating the exact solution.
- A numerical solution means making guesses at the solution and testing whether the problem is solved well enough to stop.

An example is the square root, which can be calculated either way.
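For instance, a numerical square root can be sketched with Newton's method, which repeatedly refines a guess until it is good enough, while the standard library computes the value directly (the tolerance and starting guess below are just illustrative choices):

```python
import math

def numerical_sqrt(n, tolerance=1e-10):
    # Numerical route: guess, test whether the guess is good enough, refine.
    guess = n / 2 if n > 1 else n
    while abs(guess * guess - n) > tolerance:
        guess = (guess + n / guess) / 2  # Newton's method update
    return guess

# Analytical route: the library computes the value directly.
print(math.sqrt(25.0))
print(numerical_sqrt(25.0))
```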

We prefer the analytical method in general because it is faster and because the solution is exact. Nevertheless, sometimes we must resort to a numerical method due to limitations of time or hardware capacity.

A good example is in finding the coefficients in a linear regression equation that can be calculated analytically (e.g. using linear algebra), but can be solved numerically when we cannot fit all the data into the memory of a single computer in order to perform the analytical calculation (e.g. via gradient descent).
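A minimal sketch of both routes on made-up data (illustrative only): the closed-form least-squares calculation and gradient descent arrive at the same coefficients.

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]  # roughly y = 2x + 1
n = len(xs)

# Analytical: closed-form least-squares slope and intercept.
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Numerical: gradient descent on the same mean squared error.
w, b = 0.0, 0.0
lr = 0.02
for _ in range(20_000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w, b = w - lr * grad_w, b - lr * grad_b

print(round(slope, 3), round(w, 3))  # the two slopes agree
```

In practice the numerical route pays off when the data no longer fits in memory and must be processed in batches.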

Sometimes, the analytical solution is unknown and all we have to work with is the numerical approach.

Many problems have well-defined solutions that are obvious once the problem has been defined.

A set of logical steps that we can follow to calculate an exact outcome.

For example, you know what operation to use given a specific arithmetic task such as addition or subtraction.

In linear algebra, there is a suite of methods that you can use to factorize a matrix, depending on whether your matrix is square or rectangular, contains real or complex values, and so on.

We can stretch this more broadly to software engineering, where there are problems that turn up again and again that can be solved with a pattern of design that is known to work well, regardless of the specifics of your application, such as the visitor pattern for performing an operation on each item in a list.

Some problems in applied machine learning are well defined and have an analytical solution.

For example, the method for transforming a categorical variable into a one hot encoding is simple, repeatable and (practically) always the same methodology regardless of the number of integer values in the set.
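A small sketch of that repeatable procedure (a hypothetical helper, not any particular library's API):

```python
def one_hot(values):
    # The same fixed methodology regardless of how many categories appear.
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]

print(one_hot(["red", "green", "blue", "green"]))
```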

Unfortunately, most of the problems that we care about solving in machine learning do not have analytical solutions.

There are many problems that we are interested in that do not have exact solutions.

Or at least, no analytical solutions that we have figured out yet.

We have to make guesses at solutions and test them to see how good the solution is. This involves framing the problem and using trial and error across a set of candidate solutions.

In essence, the process of finding a numerical solution can be described as a search.

These types of solutions have some interesting properties:

- We can often easily tell a good solution from a bad solution.
- We often don’t objectively know what a “*good*” solution looks like; we can only compare the goodness between candidate solutions that we have tested.
- We are often satisfied with an approximate or “*good enough*” solution rather than the single best solution.

This last point is key, because often the problems that we are trying to solve with numerical solutions are challenging (as we have no easy way to solve them), where any “*good enough*” solution would be useful. It also highlights that there are many solutions to a given problem and even that many of them may be good enough to be usable.

Most of the problems that we are interested in solving in applied machine learning require a numerical solution.

It’s worse than this.

The numerical solution to each sub-problem along the way influences the space of possible solutions for subsequent sub-problems.

Applied machine learning is a numerical discipline.

The core of a given machine learning model is an optimization problem, which is really a search for a set of terms with unknown values needed to fill an equation. Each algorithm has a different “*equation*” and “*terms*“, using this terminology loosely.

The equation is easy to calculate in order to make a prediction for a given set of terms, but we don’t know the terms to use in order to get a “*good*” or even “*best*” set of predictions on a given set of data.

This is the numerical optimization problem that we always seek to solve.

It’s numerical, because we are trying to solve the optimization problem with noisy, incomplete, and error-prone limited samples of observations from our domain. The model is trying hard to interpret the data and create a map between the inputs and the outputs of these observations.

The numerical optimization problem at the core of a chosen machine learning algorithm is nested in a broader problem.

The specific optimization problem is influenced by many factors, all of which greatly contribute to the “*goodness*” of the ultimate solution, and all of which do not have analytical solutions.

For example:

- What data to use.
- How much data to use.
- How to treat the data prior to modeling.
- What modeling algorithm or algorithms to use.
- How to configure the algorithms.
- How to evaluate machine learning algorithms.

Objectively, these are all part of the open problem that your specific predictive modeling machine learning problem represents.

There is no analytical solution; you must discover what combination of these elements works best for your specific problem.

It is one big search problem where combinations of elements are trialed and evaluated.

Where you only really know what a good score is relative to the scores of other candidate solutions that you have tried.

Where there is no objective path through this maze other than trial and error and perhaps borrowing ideas from other related problems that do have known “*good enough*” solutions.
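The trial-and-error search over combinations can be sketched as follows, where `evaluate` is a stand-in for a real train-and-score step and the grid values are invented for illustration:

```python
import itertools
import random

random.seed(3)

def evaluate(scale, k):
    # Stand-in for a real train-and-score step on your data; here the "skill"
    # score just peaks at scale=1.0, k=5, with a little noise added.
    return -((scale - 1.0) ** 2) - 0.01 * (k - 5) ** 2 + random.gauss(0, 0.001)

# Candidate combinations of a data-preparation setting and a configuration value.
grid = list(itertools.product([0.5, 1.0, 2.0], [1, 5, 9]))

# Trial each combination and keep whichever scores best.
best = max(grid, key=lambda combo: evaluate(*combo))
print(best)
```

Note that the "best" result is only best relative to the other candidates trialed, which is exactly the point.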

This empirical approach to applied machine learning is often referred to as “*machine learning as search*” and is described further in the post “A Gentle Introduction to Applied Machine Learning as a Search Problem,” listed in the Further Reading section below.

We bring this back to the specific question you have.

The question of what data, algorithm, or configuration will work best for your specific predictive modeling problem.

No one can look at your data or a description of your problem and tell you how to solve it best, or even well.

Experience may inform an expert on areas to start looking, and some of those early guesses may pay off, but more often than not, early guesses are too complicated or plain wrong.

A predictive modeling problem must be worked in order to find a good-enough solution and it is your job as the machine learning practitioner to work it.

This is the hard work of applied machine learning and it is the area to practice and get good at to be considered competent in the field.

This section provides more resources on the topic if you are looking to go deeper.

- A Data-Driven Approach to Choosing Machine Learning Algorithms
- A Gentle Introduction to Applied Machine Learning as a Search Problem
- Why Applied Machine Learning Is Hard
- What’s the difference between analytical and numerical approaches to problems?

In this post, you discovered the difference between analytical and numerical solutions and the empirical nature of applied machine learning.

Specifically, you learned:

- Analytical solutions are logical procedures that yield an exact solution.
- Numerical solutions are trial-and-error procedures that are slower and result in approximate solutions.
- Applied Machine learning has a numerical solution at the core with an adjusted mindset in order to choose data, algorithms, and configurations for a specific predictive modeling problem.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Analytical vs Numerical Solutions in Machine Learning appeared first on Machine Learning Mastery.


A few times a week, I get a question such as:

What is your development environment for machine learning?

In this post, you will discover the development environment that I use and recommend for applied machine learning for developers.

After reading this post, you will know:

- The important distinctions between the role of workstation and server hardware in machine learning.
- How to ensure that your machine learning dependencies are installed and updated in a repeatable manner.
- How to develop machine learning code and run it in a safe way that does not introduce new issues.

Let’s get started.

What does your machine learning development environment look like?

Let me know in the comments below.

Whether you are learning machine learning or are developing large models for operations, your workstation hardware does not matter that much.

Here’s why:

I do not recommend that you fit large models on your workstation.

Machine learning development involves lots of small tests to figure out preliminary answers to questions such as:

- What data to use.
- How to prepare data.
- What models to use.
- What configuration to use.

Ultimately, your goal on your workstation is to figure out what experiments to run. I call these preliminary experiments. For your preliminary experiments, use less data: a small sample that will fit within your hardware capabilities.
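Drawing a small preliminary sample might look like this (the sizes are purely illustrative):

```python
import random

random.seed(7)

# Stand-in for your full dataset, which may be too large for quick iteration.
full_dataset = list(range(1_000_000))

# A small sample that fits your workstation's capabilities for quick tests.
sample = random.sample(full_dataset, 1_000)
print(len(sample))
```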

Larger experiments take minutes, hours, or even days to complete. They should be run on large hardware other than your workstation.

This may be a server environment, perhaps with GPU hardware if you are using deep learning methods. This hardware may be provided by your employer or you can rent it cheaply in the cloud, such as AWS.

It is true that the faster your workstation’s CPU and the more RAM it has, the more (and the larger) preliminary experiments you can run, and the more you can get out of your larger experiments. So get the best hardware you can, but in general, work with what you have.

I myself like large Linux boxes with lots of RAM and lots of cores for serious R&D. For everyday work, I like an iMac, again with as many cores and as much RAM as I can get.

In summary:

- **Workstation**. Work with a small sample of your data and figure out what large experiments to run.
- **Server(s)**. Run large experiments that take hours or days and help you figure out what model to use in operations.

You must install the library dependencies you have for machine learning development.

This is mainly the libraries you are using.

In Python, this may be Pandas, scikit-learn, Keras, and more. In R, this is all the packages and perhaps caret.

More than just installing the dependencies, you should have a repeatable process so that you can set up the development environment again in seconds, such as on new workstations and new servers.

I recommend using a package manager and a script, such as a shell script to install everything.

On my iMac, I use MacPorts to manage installed packages. I have two scripts: one to install all the packages I require on a new Mac (such as after an upgrade of workstation or laptop) and another specifically to update the installed packages.

Libraries are always being updated with bug fixes, so this second script to update the specifically installed libraries (and their dependencies) is key.

These are shell scripts that I can run at any time and that I keep updated as I need to install new libraries.
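As a hypothetical sketch (the package list and the use of pip are assumptions; substitute MacPorts or whatever package manager you use), the two scripts might be structured like this:

```shell
#!/usr/bin/env bash
# mldeps.sh -- hypothetical sketch of install/update scripts merged into one
# file. The package list and the use of pip are assumptions; adapt to your
# environment (e.g. MacPorts, apt, conda).

PACKAGES="numpy scipy pandas scikit-learn"

install_packages() {
    # Fresh setup on a new workstation or server.
    for pkg in $PACKAGES; do
        echo "installing $pkg"
        # pip install "$pkg"            # uncomment to actually install
    done
}

update_packages() {
    # Periodic update of key dependencies and their bug fixes.
    for pkg in $PACKAGES; do
        echo "updating $pkg"
        # pip install --upgrade "$pkg"  # uncomment to actually update
    done
}

# Usage:  ./mldeps.sh install_packages   or   ./mldeps.sh update_packages
"$@"
```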

If you need help setting up your environment, one of these tutorials may help:

- How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda
- How to Install a Python 3 Environment on Mac OS X for Machine Learning and Deep Learning
- How to Create a Linux Virtual Machine For Machine Learning Development With Python 3

You may wish to take things to the next level in terms of having a repeatable environment, such as using a container such as Docker or maintaining your own virtualized instance.

In summary:

- **Install Script**. Maintain a script that you can use to reinstall everything needed for your development environment.
- **Update Script**. Maintain a script to update all key dependencies for machine learning development, and run it periodically.

I recommend a very simple editing environment.

The hard work with machine learning development is not writing code; it is instead dealing with the unknowns already mentioned. Unknowns such as:

- What data to use.
- How to prepare the data.
- What algorithm/s to use.
- What configurations to use.

Writing code is the easy part, especially because you are very likely to use an existing algorithm implementation from a modern machine learning library.

For this reason, you do not need a fancy IDE; it will not help you get answers to these questions.

Instead, I recommend using a very simple text editor that offers basic code highlighting.

Personally, I use and recommend Sublime Text, but any similar text editor will work just as well.

Some developers like to use notebooks, such as Jupyter. I do not use or recommend them, as I have found these environments to be challenging for development; they can hide errors and introduce dependency strangeness.

For studying machine learning and for machine learning development, I recommend writing scripts or code that can be run directly from the command line or from a shell script.

For example, R scripts and Python scripts can be run directly using the respective interpreter.
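For example, a self-contained experiment script (with entirely made-up data and a deliberately simple model) that runs directly from the command line:

```python
# experiment.py -- run directly from the command line:  python experiment.py
import random

random.seed(1)

# Made-up dataset: y = 2x plus a little noise.
data = [(x, 2 * x + random.gauss(0, 0.1)) for x in range(20)]
train, test = data[:15], data[15:]

# A deliberately simple "model": estimate the slope through the origin.
slope = sum(y for _, y in train) / sum(x for x, _ in train)

# Report skill on the held-out split.
mae = sum(abs(y - slope * x) for x, y in test) / len(test)
print(f"slope={slope:.3f} mae={mae:.3f}")
```

Because everything is in one file with a fixed seed, the run is repeatable from a shell script, and nothing is hidden in notebook state.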

For more advice on how to run experiments from the command line, see the post “How to Run Deep Learning Experiments on a Linux Server,” listed in the Further Reading section.

Once you have a finalized model (or set of predictions), you can integrate it into your application using your standard development tools for your project.

This section provides more resources on the topic if you are looking to go deeper.

- Computer Hardware for Machine Learning
- How To Develop and Evaluate Large Deep Learning Models with Keras on Amazon Web Services
- How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda
- How to Install a Python 3 Environment on Mac OS X for Machine Learning and Deep Learning
- How to Create a Linux Virtual Machine For Machine Learning Development With Python 3
- How to Run Deep Learning Experiments on a Linux Server

In this post, you discovered the hardware, dependencies, and editor to use for machine learning development.

Specifically, you learned:

- The important distinctions between the role of workstation and server hardware in machine learning.
- How to ensure that your machine learning dependencies are installed and updated in a repeatable manner.
- How to develop machine learning code and run it in a safe way that does not introduce new issues.

What does your machine learning development environment look like?

Let me know in the comments below.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Machine Learning Development Environment appeared first on Machine Learning Mastery.

The post How to Think About Machine Learning appeared first on Machine Learning Mastery.

You can achieve impressive results with machine learning and find solutions to very challenging problems. But this is only a small corner of the broader field of machine learning, a corner often called predictive modeling or predictive analytics.

In this post, you will discover how to change the way you think about machine learning in order to best serve you as a machine learning practitioner.

After reading this post, you will know:

- What machine learning is and how it relates to artificial intelligence and statistics.
- The corner of machine learning that you should focus on.
- How to think about your problem and the machine learning solution to your problem.

Let’s get started.

This post is divided into three parts; they are:

- You’re Confused
- What is Machine Learning?
- Your Machine Learning

You have a machine learning problem to solve, but you’re confused about what exactly machine learning is.

There’s good reason to be confused. It is confusing to beginners.

Machine learning is a large field of study, and not much of it is going to be relevant to you if you’re focused on solving a problem.

In this post, I hope to clear things up for you.

We will start off by describing machine learning in the broadest terms and how it relates to other fields of study like statistics and artificial intelligence.

After that, we will zoom in on the aspects of machine learning that you really need to know about for practical engineering and problem solving.

Machine learning is a field of computer science concerned with programs that learn.

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.

— Machine Learning, 1997.

That is super broad.

There are many types of learning, many types of feedback to learn from, and many things that can be learned.

This could encompass diverse types of learning, such as:

- Developing code to investigate how populations of organisms “learn” to adapt to their environment over evolutionary time.
- Developing code to investigate how one neuron in the brain “learns” in response to stimulus from other neurons.
- Developing code to investigate how ants “learn” the optimal path from their home to their food source.

I give these esoteric examples on purpose to help you really nail down that machine learning is a broad and far-reaching program of research.

Another case that you may be more familiar with is:

- Developing code to investigate how to “learn” patterns in historical data.

This is less glamorous, but is the basis of the small corner of machine learning in which we as practitioners are deeply interested.

This corner is not distinct from the other examples; there can be a lot of overlap in methods for learning, fundamental tasks, ways of evaluating learning, and so on.

Machine learning is often described as a subfield of artificial intelligence, and the two fields overlap heavily.

Artificial intelligence is also an area of computer science, but it is concerned with developing programs that are intelligent, or can do intelligent things.

Intelligence involves learning, e.g. machine learning, but may involve other concerns such as reasoning, planning, memory, and much more.

This could encompass diverse types of intelligent behavior, such as:

- Developing code to investigate how to optimally plan logistics.
- Developing code to investigate how to reason about a paragraph of text.
- Developing code to investigate how to perceive the contents of a photograph.

Artificial intelligence is often framed in the context of an agent in an environment with the intent to address some problem, but this does not have to be the case.

Machine learning could just as easily be named artificial learning to remain consistent with artificial intelligence and help out beginners.

The lines are blurry. Machine learning problems are also artificial intelligence problems.

Statistics, or applied statistics with computers, is a sub-field of mathematics that is concerned with describing and understanding the relationships in data.

This could encompass diverse types of models, such as:

- Developing models to summarize the distribution of a variable.
- Developing models to best characterize the relationship between two variables.
- Developing models to test the similarity between two populations of observations.

It also overlaps with the corner of machine learning interested in learning patterns in data.

Many methods used for understanding data in statistics can be used in machine learning to learn patterns in data. These tasks could be called machine learning or applied statistics.

Machine learning is a large field of study, and it can help you solve specific problems.

But you don’t need to know about all of it.

- You’re not an academic investigating an esoteric type of learning, as in machine learning broadly.
- You’re not trying to make an intelligent agent as in artificial intelligence.
- You’re not interested in learning more about why variables relate to each other in data as in statistics.

In fact, when it comes to learning relationships in data:

- You’re not investigating the capabilities of an algorithm.
- You’re not developing an entirely new theory or algorithm.
- You’re not extending an existing machine learning algorithm to new cases.

These may be activities in the corner of machine learning that we are interested in, but they are activities for academics, not practitioners like you.

**So what parts of machine learning do you need to focus on?**

I think there are two ways to think about machine learning:

- In terms of the problem you are trying to solve.
- In terms of the solution you require.

Your problem can best be described as the following:

Find a model or procedure that makes best use of historical data comprised of inputs and outputs in order to skillfully predict outputs given new and unseen inputs in the future.

This is super specific.

First of all, it discards entire sub-fields of machine learning, such as unsupervised learning, to focus on one type of learning called supervised learning and all the algorithms that fit into that bucket.

That does not mean that you cannot leverage unsupervised methods; it just means that you do not focus your attention there, at least not to begin with.

Second of all, it gives you a clear objective that dominates all others: model skill, at the expense of other concerns such as model complexity, model interpretability, and so on.

Again, this does not mean that these are not important, just that they are considered after or in conjunction with model skill.

Thirdly, the framing of your problem this way fits neatly into another field of study called predictive modeling. That is a field of study that borrows methods from machine learning with the objective of developing models that make skillful predictions.

In some areas of business, this area may also be called predictive analytics, and it encompasses more than just the modeling component, including the related activities of gathering and preparing data and deploying and maintaining the model.

More recently, this activity can also be called data science, although that phrase also has connotations of inventing or discovering the problem in addition to working it through to a solution.

I don’t think it matters what you call this activity. But I do think it is important to deeply understand that your interest in and use of machine learning is highly specific and different from some other uses by academics.

It allows you to filter the material you read and the tools you choose in order to stay focused on the problem you’re trying to solve.

The solution you require is best described as the following:

A model or procedure that automatically creates the most likely approximation of the unknown underlying relationship between inputs and associated outputs in historical data.

Again, this is super specific.

You need an automatic method that produces a program or model that you can use to make predictions.

You cannot sit down and write code to solve your problem. It is entirely data-specific and you have a lot of data.

In fact, problems of this type resist top-down hand-coded solutions. If you could sit down and write some if-statements to solve your problem, you would not need a machine learning solution. It would be a programming problem.

The type of machine learning methods that you need will learn the relationship between the inputs and outputs in your historical data.

This framing allows you to think about what that real underlying yet unknown mapping function might look like and how noise, corruption, and sampling of your historical data may impact approximations of this mapping made by different modeling methods.

Without this framing, you will wonder things like:

- Why there isn’t just one super algorithm or set of parameters.
- Why the experts can’t just tell you what algorithm to use.
- Why you can’t achieve a zero error rate with predictions from your model.

It helps you see the ill-defined nature of the predictive modeling problem you’re trying to solve and sets reasonable expectations.

Now that you know how to think about machine learning, the next step is to change the way you think about the process of solving a problem with a machine learning solution.

For a hint, see the post:

This section provides more resources on the topic if you are looking to go deeper.

- Gentle Introduction to Predictive Modeling
- How Machine Learning Algorithms Work
- What is Machine Learning?
- A Gentle Introduction to Applied Machine Learning as a Search Problem
- Where Does Machine Learning Fit In?

In this post, you discovered how to change the way you think about machine learning in order to best serve you as a machine learning practitioner.

Specifically, you learned:

- What machine learning is and how it relates to artificial intelligence and statistics.
- The corner of machine learning that you should focus on.
- How to think about your problem and the machine learning solution to your problem.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post How to Think About Machine Learning appeared first on Machine Learning Mastery.


This approach involves laying out the topics in an area of study in a logical way with a natural progression in complexity and capability.

The problem is, humans are not robots executing a learning program. We require motivation, excitement, and most importantly, a connection of the topic to tangible results.

Useful skills we use every day like reading, driving, and programming were not learned this way and were in fact learned using an inverted top-down approach. This top-down approach can be used to learn technical subjects directly such as machine learning, which can make you a lot more productive a lot sooner, and be a lot of fun.

In this post, you will discover the concrete difference between the top-down and bottom-up approaches to learning technical material and why this is the approach that practitioners should use to learn machine learning and even related mathematics.

After reading this post, you will know:

- The bottom-up approach used in universities to teach technical subjects and the problems with it.
- How people learn to read, drive, and program in a top-down manner and how the top-down approach works.
- The frame of machine learning and even mathematics using the top-down approach to learning and how to start to make rapid progress as a practitioner.

Let’s get started.

This is an important blog post, because I think it can really help to shake you out of the bottom-up, university-style way of learning machine learning.

This post is divided into seven parts; they are:

- Bottom-Up Learning
- Learning to Read
- Learning to Drive
- Learning to Code
- Top-Down Learning
- Learn Machine Learning
- Learning Mathematics

Take a field of study, such as mathematics.

There is a logical way to lay out the topics in mathematics that build on each other and lead through a natural progression in skills, capability, and understanding.

The problem is, this logical progression might only make sense to those who are already on the other side and can intuit the relationships between the topics.

Most of school is built around this bottom-up natural progression through material. A host of technical and scientific fields of study are taught this way.

Think back to high-school or undergraduate studies and the fundamental fields you may have worked through, such as:

- Mathematics, as mentioned.
- Biology.
- Chemistry.
- Physics.
- Computer Science.

Think about how the material was laid out, week-by-week, semester-by-semester, year-by-year. Bottom-up, logical progression.

The problem is, the logical progression through the material may not be the best way to learn the material in order to be productive.

We are not robots executing a learning program. We are emotional humans that need motivation, interest, attention, encouragement, and results.

You can learn technical subjects from the bottom-up, and a small percentage of people do prefer things this way, but it is not the only way.

Now, if you have completed a technical subject, think back to how you actually learned it. I bet it was not bottom-up.

Think back; how did you learn to read?

My son is starting to read. Without thinking too much, here are the general techniques he’s using (really the school and us as parents):

- Start by being read to in order to generate interest and show benefits.
- Get the alphabet down and make the right sounds.
- Memorize the most frequent words, their sounds, and how to spell them.
- Learn the “spell-out-the-word” heuristic to deal with unknown words.
- Read through books with supervision.
- Read through books without supervision.

It is important that he continually knows why reading is important, connected to very tangible things he wants to do, like:

- Read captions on TV shows.
- Read stories on topics he loves, like Star Wars.
- Read signs and menus when we are out and about.
- So on…

It is also important that he gets results that he can track and in which he can see improvement.

- Larger vocabulary.
- Smoother reading style.
- Books of increasing complexity.

Here’s how he did not learn to read:

- Definitions of word types (verbs, nouns, adverbs, etc.)
- Rules of grammar.
- Rules of punctuation.
- Theory of human languages.

Do you drive?

It’s cool if you don’t, but most adults do out of necessity. Society and city design are built around personal mobility.

How did you learn to drive?

I remember some written tests and maybe a test on a computer. I have no memory of studying for them, though I very likely did. Here’s what I do remember.

I remember hiring a driving instructor and doing driving lessons. Every single lesson was practical, in the car, practicing the skill I was required to master, driving the vehicle in traffic.

Here’s what I did not study or discuss with my driving instructor:

- The history of the automobile.
- The theory of combustion engines.
- The common mechanical faults in cars.
- The electrical system of the car.
- The theory of traffic flows.

To this day, I still manage to drive safely without any knowledge on these topics.

In fact, I never expect to learn these topics. I have zero need or interest and they will not help me realize the thing I want and need, which is safe and easy personal mobility.

If the car breaks, I’ll call an expert.

I started programming without any idea of what coding or software engineering meant.

At home, I messed around with commands in Basic. I messed around with commands in Excel. I modified computer games. And so on. It was fun.

When I started to learn programming and software engineering, it was in university and it was bottom up.

We started with:

- Language theory
- Data types
- Control flow structures
- Data structures
- etc.

When we did get to write code, it was on the command line and plagued with compiler problems, path problems, and a whole host of problems unrelated to actually learning programming.

**I hated programming.**

Flash-forward a few years. Somehow, I eventually started working as a professional software engineer on some complex systems that were valued by their users. I was really good at it and I loved it.

Eventually, I did a course that showed how to create graphical user interfaces. And another that showed how to get computers to talk to each other using socket programming. And another on how to get multiple things to run at the same time using threads.

I connected the boring stuff with the thing I really liked: making software that could solve problems, that others could use. I connected it to something that mattered. It was no longer abstract and esoteric.

At least for me, and many developers like me, they taught it wrong. They really did. And it wasted years of time, effort, and results/outcomes that enthusiastic and time-free students like me could dedicate to something they are truly passionate about.

The bottom-up approach is not just a common way for teaching technical topics; it looks like the only way.

At least until you think about how you actually learn.

The designers of university courses, masters of their subject area, are trying to help. They are laying everything out to give you the logical progression through the material that they think will get you to the skills and capabilities that you require (hopefully).

And as I mentioned, it can work for some people.

It does not work for me, and I expect it does not work for you. In fact, very few programmers I’ve met that are really good at their craft came through computer science programs, or if they did, they learned at home, alone, hacking on side projects.

An alternative is the top-down approach.

**Flip the conventional approach on its head.**

Don’t start with definitions and theory. Instead, start by connecting the subject with the results you want and show how to get results immediately.

Lay out a program that focuses on practicing this process of getting results, going deeper into some areas as needed, but always in the context of the result they require.

It is not the traditional path.

Be careful not to use traditional ways of thinking or comparison if you take this path.

The onus is on you. There is no system to blame. You only fail when you stop.

- **It is iterative**. Topics are revisited many times with deeper understanding.
- **It is imperfect**. Results may be poor in the beginning, but improve with practice.
- **It requires discovery**. The learner must be open to continual learning and discovery.
- **It requires ownership**. The learner is responsible for improvement.
- **It requires curiosity**. The learner must pay attention to what interests them and follow it.

Seriously, I’ve heard “*experts*” say this many times, saying things like:

You have to know the theory first before you can use this technique, otherwise you cannot use it properly.

I agree that results will be imperfect in the beginning, but improvement and even expertise does not only have to come from theory and fundamentals.

If you believe that a beginner programmer should not be pushing changes to production and deploying them, then surely you must believe that a beginner machine learning practitioner would suffer the same constraints.

Skill must be demonstrated.

Trust must be earned.

This is true regardless of how a skill is acquired.

Really!?

This is another “*criticism*” I’ve seen leveled at this approach to learning.

Exactly. We want to be technicians, using the tools in practice to help people, not researchers.

You do not need to cover all of the same ground because you have a different learning objective. Although you can circle back and learn anything you like later once you have a context in which to integrate the abstract knowledge.

Developers in industry are not computer scientists; they are engineers. They are proud technicians of the craft.

The benefits vastly outweigh the challenge of learning this way:

- You go straight to the thing you want and start practicing it.
- You have a context for connecting deeper knowledge and even theory.
- You can efficiently sift and filter topics based on your goals in the subject.

It’s faster.

It’s more fun.

And, I bet it makes you much better.

How could you be better?

Because the subject is connected to you emotionally. You have connected it to an outcome or result that matters to you. You are invested. You have demonstrable competence. We all love things we are good at (even if we are a little color blind to how good we are), which drives motivation, enthusiasm, and passion.

An enthusiastic learner will blow straight past the fundamentalist.

So, how have you approached the subject of machine learning?

Seriously, tell me your approach in the comments below.

- Are you taking a bottom-up university course?
- Are you modeling your learning on such a course?

Or worse:

Are you following a top-down type approach but are riddled with guilt, math envy, and insecurities?

You are not alone; I see this every single day in helping beginners on this website.

To connect the dots for you, I strongly encourage you to study machine learning using the top-down approach.

- Don’t start with precursor math.
- Don’t start with machine learning theory.
- Don’t code every algorithm from scratch.

This can all come later to refine and deepen your understanding once you have connections for this abstract knowledge.

- Start by learning how to work through very simple predictive modeling problems using a fixed framework with free and easy-to-use open source tools.
- Practice on many small projects and slowly increase their complexity.
- Show your work by building a public portfolio.
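To make the first step concrete, here is a toy end-to-end predictive modeling run: fit on historical data, predict on unseen inputs, measure skill. The tiny dataset and the 1-nearest-neighbor "algorithm" are invented stand-ins; in practice a library like scikit-learn provides this out of the box, but plain Python keeps the sketch dependency-free.

```python
# A minimal predictive modeling loop: learn from (input, output) pairs,
# then predict outputs for new inputs and score the predictions.

def predict_1nn(train, x):
    """Predict the label of x as the label of the closest training input."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Historical data: (input, output) pairs (made up for illustration).
train = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

# New, unseen inputs, with the known answers held back for evaluation.
test = [(1.5, "small"), (8.5, "large"), (2.5, "small")]

predictions = [predict_1nn(train, x) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

The point is the framework, not the algorithm: the same fit-predict-score loop applies whatever model you swap in.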

I have written about this approach many times; see the “*Further Reading*” section at the end of the post for some solid posts on how to get started with the top-down approach to machine learning.

“Experts” entrenched in universities will say it’s dangerous. Ignore them.

World-class practitioners will tell you it’s the way they learned and continue to learn. Model them.

Remember:

- You learned to read by practicing reading, not by studying language theory.
- You learned to drive by practicing driving, not by studying combustion engines.
- You learned to code by practicing coding, not by studying computability theory.

You can learn machine learning by practicing predictive modeling, not by studying math and theory.

Not only is this the way I learned and continue to practice machine learning, but it has helped tens of thousands of my students (and the many millions of readers of this blog).

Don’t stop there.

A time may come when you want or need to pull back the curtain on the mathematical pillars of machine learning such as linear algebra, calculus, statistics, probability, and so on.

You can use the exact same top-down approach.

Pick a goal or result that matters to you, and use that as a lens, filter, or sift on the topics to study and learn to the depth you need to get that result.

For example, let’s say you pick linear algebra.

A goal might be to grok SVD or PCA. These are methods used in machine learning for data projection, data reduction, and feature selection type tasks.

A top-down approach might be to:

- Implement the method in a high-level library such as scikit-learn and get a result.
- Implement the method in a lower-level library such as NumPy/SciPy and reproduce the result.
- Implement the method directly using matrices and matrix operations in NumPy or Octave.
- Study and explore the matrix arithmetic operations involved.
- Study and explore the matrix decomposition operations involved.
- Study methods for approximating the eigendecomposition of a matrix.
- And so on…
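As a sketch of the third step above, here is PCA implemented directly with matrix operations in NumPy, via the SVD of the centered data. The small dataset is invented for illustration; steps 1 and 2 would use scikit-learn's PCA and reproduce this result.

```python
import numpy as np

# PCA via SVD, using nothing but matrices and matrix operations.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

Xc = X - X.mean(axis=0)                             # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # SVD of centered data
projected = Xc @ Vt[0]                              # first principal component scores

# Fraction of variance explained by each component.
explained = s ** 2 / np.sum(s ** 2)
print(explained.round(3))
```

From here, curiosity drives the next level down: how the SVD itself is computed.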

The goal provides the context and you can let your curiosity define the depth of study.

Painted this way, studying math is no different to studying any other topic in programming, machine learning, or other technical subjects.

It’s highly productive, and it’s a lot of fun!

This section provides more resources on the topic if you are looking to go deeper.

In this post, you discovered the concrete difference between the top-down and bottom-up approaches to learning technical material and why this is the approach that practitioners should and do use to learn machine learning and even related mathematics.

Specifically, you learned:

- The bottom-up approach used in universities to teach technical subjects and the problems with it.
- How people learn to read, drive, and program in a top-down manner and how the top-down approach works.
- The frame of machine learning and even mathematics using the top-down approach to learning and how to start to make rapid progress as a practitioner.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post You’re Doing it Wrong. Why Machine Learning Does Not Have to Be So Hard appeared first on Machine Learning Mastery.


I recently got the question:

“How can a machine learning model make accurate predictions on data that it has not seen before?”

The answer is generalization, and this is the capability that we seek when we apply machine learning to challenging problems.

In this post, you will discover generalization, the superpower of machine learning.

After reading this post, you will know:

- That machine learning algorithms all seek to learn a mapping from inputs to outputs.
- That simpler skillful machine learning models are easier to understand and more robust.
- That machine learning is only suitable when the problem requires generalization.

Let’s get started.

When we fit a machine learning algorithm, we require a training dataset.

This training dataset includes a set of input patterns and the corresponding output patterns. The goal of the machine learning algorithm is to learn a reasonable approximation of the mapping from input patterns to output patterns.

Here are some examples to make this concrete:

- Mapping from emails to whether they are spam or not for email spam classification.
- Mapping from house details to house sale price for house sale price regression.
- Mapping from photograph to text to describe the photo in photo caption generation.

The list could go on.

We can summarize this mapping that machine learning algorithms learn as a function (f) that predicts the output (y) given the input (X), or restated:

y = f(X)

Our goal in fitting the machine learning algorithms is to get the best possible f() for our purposes.
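A minimal sketch of what "finding the best possible f()" means, with invented data: assume the unknown mapping is roughly f(x) = 3x + 2, observe noisy samples, and recover an approximation with closed-form simple linear regression.

```python
# Fitting a model is searching for a good approximation of f in y = f(X).
xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 4.9, 8.2, 10.9, 14.1, 17.0]   # roughly 3x + 2, plus noise

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
slope = num / den
intercept = mean_y - slope * mean_x

def f(x):
    """The learned approximation of the unknown underlying mapping."""
    return slope * x + intercept

print(round(slope, 2), round(intercept, 2))
```

The learned f() is then used exactly as the equation suggests: call it on new inputs to predict outputs.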

We are training the model to make predictions in the future given inputs for cases where we do not have the outputs. Where the outputs are unknown. This requires that the algorithm learn in general how to take observations from the domain and make a prediction, not just the specifics of the training data.

This is called generalization.

A machine learning algorithm must generalize from training data to the entire domain of all unseen observations in the domain so that it can make accurate predictions when you use the model.

This is really hard.

This approach of generalization requires that the data that we use to train the model (X) is a good and reliable sample of the observations in the mapping we want the algorithm to learn. The higher the quality and the more representative, the easier it will be for the model to learn the unknown and underlying “true” mapping that exists from inputs to outputs.

To generalize means to go from something specific to something broad.

It is the way we humans learn.

- We don’t memorize specific roads when we learn to drive; we learn to drive in general so that we can drive on any road or set of conditions.
- We don’t memorize specific computer programs when learning to code; we learn general ways to solve problems with code for any business case that might come up.
- We don’t memorize the specific word order in natural language; we learn general meanings for words and put them together in new sequences as needed.

The list could go on.

Machine learning algorithms are procedures to automatically generalize from historical observations. And they can generalize on more data than a human could consider, faster than a human could consider it.

It is the speed and scale with which these automated generalization machines operate that is so exciting in the field of machine learning.

The machine learning model is the result of the automated generalization procedure called the machine learning algorithm.

The model could be said to be a generalization of the mapping from training inputs to training outputs.

There may be many ways to map inputs to outputs for a specific problem and we can navigate these ways by testing different algorithms, different algorithm configurations, different training data, and so on.

We cannot know which approach will result in the most skillful model beforehand, therefore we must test a suite of approaches, configurations, and framings of the problem to discover what works and what the limits of learning are on the problem before selecting a final model to use.

The skill of the model at making predictions determines the quality of the generalization and can help as a guide during the model selection process.

Out of the millions of possible mappings, we prefer simpler mappings over complex mappings. Put another way, we prefer the simplest possible hypothesis that explains the data. This is one way to choose models and comes from Occam’s razor.

The simpler model is often (but not always) easier to understand and maintain and is more robust. In practice, you may want to choose the best performing simplest model.

The ability to automatically learn by generalization is powerful, but is not suitable for all problems.

- Some problems require a precise solution, such as arithmetic on a bank account balance.
- Some problems can be solved by generalization, but simpler solutions exist, such as calculating the square root of positive numbers.
- Some problems look like they could be solved by generalization but there exists no structured underlying relationship to generalize from the data, or such a function is too complex, such as predicting security prices.

Key to the effective use of machine learning is learning where it can and cannot (or should not) be used.

Sometimes this is obvious, but often it is not. Again, you must use experience and experimentation to help tease out whether a problem is a good fit for being solved by generalization.

This section provides more resources on the topic if you are looking to go deeper.

- How Machine Learning Algorithms Work
- What is Machine Learning?
- Generalization (learning) on Wikipedia
- Ockham’s razor on Wikipedia

In this post, you discovered generalization, the key capabilities that underlie all supervised machine learning algorithms.

Specifically, you learned:

- That machine learning algorithms all seek to learn a mapping from inputs to outputs.
- That simpler skillful machine learning models are easier to understand and more robust.
- That machine learning is only suitable when the problem requires generalization.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Why Do Machine Learning Algorithms Work on Data That They Have Not Seen Before? appeared first on Machine Learning Mastery.

The post Difference Between Classification and Regression in Machine Learning appeared first on Machine Learning Mastery.

Fundamentally, classification is about predicting a label and regression is about predicting a quantity.

I often see questions such as:

How do I calculate accuracy for my regression problem?

Questions like this are a symptom of not truly understanding the difference between classification and regression and what accuracy is trying to measure.

In this tutorial, you will discover the differences between classification and regression.

After completing this tutorial, you will know:

- That predictive modeling is about the problem of learning a mapping function from inputs to outputs called function approximation.
- That classification is the problem of predicting a discrete class label output for an example.
- That regression is the problem of predicting a continuous quantity output for an example.

Let’s get started.

This tutorial is divided into 5 parts; they are:

- Function Approximation
- Classification
- Regression
- Classification vs Regression
- Converting Between Classification and Regression Problems

Predictive modeling is the problem of developing a model using historical data to make a prediction on new data where we do not have the answer.

For more on predictive modeling, see the post:

Predictive modeling can be described as the mathematical problem of approximating a mapping function (f) from input variables (X) to output variables (y). This is called the problem of function approximation.

The job of the modeling algorithm is to find the best mapping function we can given the time and resources available.

For more on approximating functions in applied machine learning, see the post:

Generally, we can divide all function approximation tasks into classification tasks and regression tasks.

Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).

The output variables are often called labels or categories. The mapping function predicts the class or category for a given observation.

For example, an email of text can be classified as belonging to one of two classes: “*spam*” and “*not spam*”.

- A classification problem requires that examples be classified into one of two or more classes.
- A classification can have real-valued or discrete input variables.
- A problem with two classes is often called a two-class or binary classification problem.
- A problem with more than two classes is often called a multi-class classification problem.
- A problem where an example is assigned multiple classes is called a multi-label classification problem.

It is common for classification models to predict a continuous value as the probability of a given example belonging to each output class. The probabilities can be interpreted as the likelihood or confidence of a given example belonging to each class. A predicted probability can be converted into a class value by selecting the class label that has the highest probability.

For example, a specific email of text may be assigned the probabilities of 0.1 as being “spam” and 0.9 as being “not spam”. We can convert these probabilities to a class label by selecting the “not spam” label as it has the highest predicted likelihood.
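As a minimal illustration of this conversion, using the made-up probabilities from the email example above, select the label with the highest predicted probability:

```python
# Convert predicted class probabilities into a single class label
# by picking the class with the highest probability.
probabilities = {"spam": 0.1, "not spam": 0.9}
label = max(probabilities, key=probabilities.get)
print(label)  # not spam
```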

There are many ways to estimate the skill of a classification predictive model, but perhaps the most common is to calculate the classification accuracy.

The classification accuracy is the percentage of correctly classified examples out of all predictions made.

For example, if a classification predictive model made 5 predictions and 3 of them were correct and 2 of them were incorrect, then the classification accuracy of the model based on just these predictions would be:

accuracy = correct predictions / total predictions * 100
accuracy = 3 / 5 * 100
accuracy = 60%
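The same calculation as a small function, with invented true and predicted labels arranged so that 3 of 5 predictions are correct:

```python
# Classification accuracy: percentage of correct predictions.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) * 100

y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(accuracy(y_true, y_pred))  # 60.0
```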

An algorithm that is capable of learning a classification predictive model is called a classification algorithm.

Regression predictive modeling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y).

A continuous output variable is a real value, such as an integer or floating point value. These are often quantities, such as amounts and sizes.

For example, a house may be predicted to sell for a specific dollar value, perhaps in the range of $100,000 to $200,000.

- A regression problem requires the prediction of a quantity.
- A regression can have real valued or discrete input variables.
- A problem with multiple input variables is often called a multivariate regression problem.
- A regression problem where input variables are ordered by time is called a time series forecasting problem.

Because a regression predictive model predicts a quantity, the skill of the model must be reported as an error in those predictions.

There are many ways to estimate the skill of a regression predictive model, but perhaps the most common is to calculate the root mean squared error, abbreviated by the acronym RMSE.

For example, if a regression predictive model made 2 predictions, one of 1.5 where the expected value is 1.0 and another of 3.3 where the expected value is 3.0, then the RMSE would be:

RMSE = sqrt(average(error^2))
RMSE = sqrt(((1.0 - 1.5)^2 + (3.0 - 3.3)^2) / 2)
RMSE = sqrt((0.25 + 0.09) / 2)
RMSE = sqrt(0.17)
RMSE = 0.412
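The same worked example as a small function:

```python
import math

# Root mean squared error between expected and predicted values.
def rmse(expected, predicted):
    errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return math.sqrt(sum(errors) / len(errors))

print(round(rmse([1.0, 3.0], [1.5, 3.3]), 3))  # 0.412
```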

A benefit of RMSE is that the units of the error score are in the same units as the predicted value.

An algorithm that is capable of learning a regression predictive model is called a regression algorithm.

Some algorithms have the word “regression” in their name, such as linear regression and logistic regression, which can make things confusing because linear regression is a regression algorithm whereas logistic regression is a classification algorithm.

Classification predictive modeling problems are different from regression predictive modeling problems.

- Classification is the task of predicting a discrete class label.
- Regression is the task of predicting a continuous quantity.

There is some overlap between the algorithms for classification and regression; for example:

- A classification algorithm may predict a continuous value, but the continuous value is in the form of a probability for a class label.
- A regression algorithm may predict a discrete value, but the discrete value is in the form of an integer quantity.

Some algorithms can be used for both classification and regression with small modifications, such as decision trees and artificial neural networks. Some algorithms cannot, or cannot easily, be used for both problem types, such as linear regression for regression predictive modeling and logistic regression for classification predictive modeling.

Importantly, the way that we evaluate classification and regression predictions varies and does not overlap, for example:

- Classification predictions can be evaluated using accuracy, whereas regression predictions cannot.
- Regression predictions can be evaluated using root mean squared error, whereas classification predictions cannot.
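As a sketch of why these metrics do not transfer, classification accuracy is just the fraction of labels predicted exactly right, which has no meaning for real-valued predictions (the `accuracy` helper below is illustrative, not from any particular library):

```python
def accuracy(expected, predicted):
    # Fraction of class labels predicted exactly right
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)

print(round(accuracy(["red", "blue", "red"], ["red", "red", "red"]), 2))  # → 0.67
```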

In some cases, it is possible to convert a regression problem to a classification problem by converting the quantity to be predicted into discrete buckets.

For example, amounts in a continuous range between $0 and $100 could be converted into 2 buckets:

- Class 0: $0 to $49
- Class 1: $50 to $100

This is often called discretization, and the resulting output variable is a classification where the labels have an ordered relationship (called ordinal).
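A minimal sketch of this discretization, using the two buckets above (the `discretize` function name is made up for illustration):

```python
def discretize(amount):
    # Map a continuous dollar amount to an ordinal class label:
    # Class 0 for $0 to $49, Class 1 for $50 to $100
    return 0 if amount < 50 else 1

print([discretize(a) for a in [12.5, 49.99, 50.0, 87.3]])  # → [0, 0, 1, 1]
```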

In some cases, a classification problem can be converted to a regression problem. For example, a label can be converted into a continuous range.

Some algorithms do this already by predicting a probability for each class that in turn could be scaled to a specific range:

quantity = min + probability * range
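For example, with the $0 to $100 range above, a predicted class probability could be scaled back to a dollar amount (a minimal sketch; the function name is hypothetical):

```python
def probability_to_quantity(probability, minimum=0.0, maximum=100.0):
    # quantity = min + probability * range
    return minimum + probability * (maximum - minimum)

print(probability_to_quantity(0.25))  # → 25.0
```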

Alternately, class values can be ordered and mapped to a continuous range:

- $0 to $49 for Class 1
- $50 to $100 for Class 2

If the class labels in the classification problem do not have a natural ordinal relationship, the conversion from classification to regression may result in surprising or poor performance as the model may learn a false or non-existent mapping from inputs to the continuous output range.

This section provides more resources on the topic if you are looking to go deeper.

In this tutorial, you discovered the difference between classification and regression problems.

Specifically, you learned:

- That predictive modeling is about the problem of learning a mapping function from inputs to outputs called function approximation.
- That classification is the problem of predicting a discrete class label output for an example.
- That regression is the problem of predicting a continuous quantity output for an example.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Difference Between Classification and Regression in Machine Learning appeared first on Machine Learning Mastery.


In this post, you will hear Álvaro Lemos’s story of his transition from student to machine learning intern, including:

- How an interest in genetic algorithms led to the discovery of neural networks and the broader field of machine learning.
- How tutorial-based blog posts and books helped him pass a test for a machine learning internship on a data science team.

Let’s get started.

**Update Feb/2017**: Corrections made regarding Álvaro’s internship.

I’m from Salvador, Bahia (Brazil), but currently, I live in Belo Horizonte, Minas Gerais (also in Brazil).

I am studying Electrical Engineering at the Federal University of Minas Gerais, and since the beginning of my undergraduate course, I’ve been involved with software development in some way.

In my first week as a freshman, I joined a research group called LabCOM to help a colleague with his master’s degree project. He wanted to build a self-managed traffic engineering system in which network operation and maintenance are performed efficiently and without human intervention. It was built on top of a network simulator, and I was assigned to deliver a module to measure some network parameters.

After that, I kept doing things related to software development, like maintaining a Linux server at my university and taking a bunch of web development courses on sites like Code School, Codecademy, and Coursera. A year ago, I got my first internship at a big software company.

It was an amazing experience because I could work with state-of-the-art technologies and very experienced developers, from whom I learned a lot of good practices and procedures.

When I was about to complete a year there, I received a proposal to work at another company on a Data Science team that was being formed, so I decided to accept.

Good question…

I first heard of it during one meeting of the research group that I mentioned.

To make a long story short, we were using a genetic algorithm to get some results and, although they were reasonably good, they were taking longer to process than we could afford.

To overcome this, a colleague suggested training a neural network on these results, because once we had a trained model, it would output results really fast.

I was one of the people assigned to implement this solution, but I didn’t know anything about it, so I googled it.

When I realized that an algorithm could provide the expected output without being explicitly programmed to do so, and that it did so by mimicking the human brain, I was like, “*wooooow, that’s magic!*”

When I decided that I wanted to learn machine learning, my first goal was to start the Johns Hopkins Data Science specialization on Coursera.

After completing two (out of ten!) courses, I let it go. I didn’t really need to apply that knowledge at that moment; I just wanted to learn machine learning, and I felt that attending ten courses just to get that knowledge was quite overwhelming. I got distracted with other things and forgot about it.

One year after that, I decided to give my “learning machine learning” quest a second shot. I registered for Andrew Ng’s famous machine learning course on Coursera. It was just one course (instead of ten!), so I thought it would be okay. I really liked his lessons; he knows how to explain complex stuff in an easy way.

I was making progress quite fast, but after finishing 60%, my first internship started and I began using my spare time to learn the technologies I was working with there. Then my classes at university started, and yes, I never went back to Coursera to finish that course.

The next semester, I attended an “Artificial Neural Network” class at my university. It was a good experience and reminded me of Andrew Ng’s approach, but I left that class with the same feeling: that I still didn’t know machine learning well enough, or that I wasn’t allowed to say that I knew it.

Nobody told me this, but I started thinking that in order to say you can apply machine learning, you have to do a master’s degree program, because I saw a lot of students doing that.

Oh, another thing that I tried was to learn from articles (research papers). Please, do not do that. **That’s by far the worst approach I have ever tried**.

Maybe I was naive, but well, some teachers encourage you to learn that way. I think papers are good for finding techniques and/or algorithms that do what you want, but after making a short list, leave them and start googling for YouTube videos, blog posts, and books.

It helped me a lot.

I was doing fine on my previous job when I heard of a machine learning internship opportunity. It was with a company that I had already heard good things about, so I decided to give it a shot.

They gave me three machine learning challenges to do within a week, but since I was working and studying, I just had a weekend to do so.

- The first problem asked us to train a logistic regression model to predict a target variable from a dataset with four features. I was supposed to do an exploratory data analysis, rank the most relevant features, estimate the error, and make predictions on a test dataset. For this one, I was able to use the knowledge I already had; I just had to learn the scikit-learn API.
- The second one was quite similar, but the dataset was heavily imbalanced and I had no clue how to deal with that, so I started googling and found your blog. It really helped me, because I discovered that I could use other metrics instead of the default accuracy, do cross-validation, stratified cross-validation, undersample and oversample the dataset, compare algorithms, etc. With all this new information, I created a Python module that would do that automatically for me and rank the models based on their F1 score.
- The third one was the most challenging. I was supposed to find the most relevant features in a classification dataset with 128 features. Your blog posts also helped me with that.

I couldn’t simply send them the results; I also had to write a detailed report, so your blog posts were fundamental, as they helped fill my knowledge gap very fast.

Now, in my new job, your books are helping me a lot; our manager bought the Super Bundle for us.

Thank you!

The company is called Radix and I just joined the Data Science team.

My first project was already finishing when I got there, but it was very interesting. It’s a system called Oil X!pert, which receives oil samples from trucks, loaders, and other equipment as input and outputs both the criticality level of the part and a diagnosis text, as shown in the figure below:

Now we’re using data-driven approaches in other projects to reach better solutions.

Specifically, the project I’m currently working on aims to find the root cause of fouling deposition on heat exchangers.

- GitHub: https://github.com/alvarolemos
- LinkedIn: https://www.linkedin.com/in/alvarolemos

The post How Álvaro Lemos got a Machine Learning Internship on a Data Science Team appeared first on Machine Learning Mastery.



Santhosh Sharma recently reached out to me to share his inspirational story and I want to share it with you.

His story shows how enthusiasm for machine learning, taking the initiative, sharing your results, and a little luck can change your career and throw you deep into applied machine learning.

After reading this interview, you will know:

- How Santhosh demonstrated his growing machine learning skills publicly on Kaggle.
- The technical details of the methodical approach Santhosh took and why it is noteworthy.
- How he used his public recognition to help get hired as a Data Scientist.

Let’s dive in.

**Do you have your own success story?**

Share it in the comments.

I have an M.Tech. in Computer Science and Engineering specializing in Parallel and Distributed Computing from IIT Kanpur, India.

I was working in the loans department of a bank.

The bank had developed software which used machine learning to predict whether it would be a good bet to sanction a loan application or not.

The results of the software in many cases were better than some of the credit officers.

I was impressed by this technology and started developing an interest in machine learning since then.

Machine Learning Mastery helped me master machine learning. Period.

I don’t have a background in mathematics and statistics.

I was under the wrong assumption that I needed one.

I struggled for close to 3 years to get a good hold on ML algorithms. Lots of time got wasted in learning unnecessary things from many books which were theoretical in nature.

The Machine Learning Mastery books helped improve my skills by leaps and bounds in a very short span of time.

Kaggle is a great platform for learning Machine Learning.

The datasets hosted represent real-world observations. Experts all across the world post solutions to these problems. Learning from these solutions helped accelerate my learning.

It made learning machine learning fun and enjoyable.

I worked on the Allstate Claims Severity dataset.

I did a spot-check using the popular regression algorithms such as LR, Ridge, Lasso, Elastic Net, etc.

I used the seaborn library for EDA and the scikit-learn library for modeling.

The approach followed is inspired by the recipes and the approach in ML Mastery Python books.

In the feedback received for this kernel, most of the users say that it is very easy to follow.

I am thankful to Machine Learning Mastery books which taught me how to approach a machine learning problem.

I have followed this in all of my kernels.

The kernel can be accessed directly here.

The steps followed are in accordance with the approach mentioned in the Machine Learning Mastery books; they are listed below.

- Shape of the train and test dataset
- Peek – eyeball the data
- Description – min, max, avg, etc of each column
- Skew – of each numerical column to check if correction is necessary

- Correction of skew – one of the columns needed correction – I used log transform

- Correlation – I filtered out only highly correlated pairs
- Scatter plot – plotting using seaborn

- Box and density plots – violin plot showed spectacular visualization
- Grouping of one hot encoded attributes – to show the count

- One hot encoding of categorical data – many columns are categorical
- Test-train split – for model evaluation

- Linear Regression (Linear algo)
- Ridge Regression (Linear algo)
- LASSO Linear Regression (Linear algo)
- Elastic Net Regression (Linear algo)
- KNN (non-linear algo)
- CART (non-linear algo)
- SVM (Non-linear algo)
- Bagged Decision Trees (Bagging)
- Random Forest (Bagging)
- Extra Trees (Bagging)
- AdaBoost (Boosting)
- Stochastic Gradient Boosting (Boosting)
- MLP (Deep Learning)
- XGBoost

- Using the best model (XGBRegressor)
- Surprising result: simple linear models such as LR, Ridge, Lasso, and Elastic Net performed very well
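The spot-checking workflow described above can be sketched with a small, dependency-free harness (the synthetic dataset and the `cv_rmse` helper are made up for illustration; in practice, scikit-learn’s `cross_val_score` does the heavy lifting):

```python
import random
from math import sqrt

# Synthetic single-feature regression dataset (a stand-in for real data)
random.seed(7)
data = [(x, 3.0 * x + random.gauss(0, 0.5)) for x in [i / 10 for i in range(100)]]

def mean_model(train):
    # Baseline: always predict the mean of the training targets
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def linear_model(train):
    # Ordinary least squares for a single input feature
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    a = my - b * mx
    return lambda x: a + b * x

def cv_rmse(build, data, k=5):
    # k-fold cross-validation, scoring each held-out fold with RMSE
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = build(train)
        scores.append(sqrt(sum((y - predict(x)) ** 2 for x, y in test) / len(test)))
    return sum(scores) / k

for name, build in [("mean baseline", mean_model), ("linear regression", linear_model)]:
    print(f"{name}: RMSE {cv_rmse(build, data):.3f}")
```

Ranking models by a single cross-validated score like this is the essence of a spot-check: it quickly shows which family of algorithms deserves further tuning.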

I showcased my top-voted kernel on Kaggle to the interviewer.

He was very impressed by the systematic approach and the results I got.

I will be working as a Senior Data Scientist with Target Corporation.

I will be joining next week.

I’m looking forward to working with the team and making a small difference in the shopping experience of millions of Target customers.

I’m looking forward to the next book by Machine Learning Mastery on Time Series!

In this post, you discovered how Santhosh went from working in a bank to getting a job as a Senior Data Scientist at Target.

You learned that:

- Santhosh applied the skills he learned to real datasets on a Kaggle problem.
- He shared his results publicly, showing how others can do what he did and, in turn, gaining credibility with a top-ranking Kaggle kernel.
- The top voted Kernel helped Santhosh get a new job as a Data Scientist at Target.

So, what can you do?

- Are you practicing on real datasets?
- Are you sharing everything you’re learning publicly?
- Are you helping others?

**What is your next step going to be?**

Share it in the comments below.

The post How to Go From Working in a Bank To Hired as Senior Data Scientist at Target appeared first on Machine Learning Mastery.


I teach a 5-step process that you can use to get your start in applied machine learning.

It is unconventional.

The traditional way to teach machine learning is bottom-up.

Start with the theory and math, then algorithm implementations, then send you off to figure out how to start solving real-world problems.

The Machine Learning Mastery approach flips this and starts with the outcome that is most valuable.

**It targets the outcome that businesses want to pay for: how to deliver a result.**

A result in the form of a set of predictions, or a model that can reliably make predictions.

**This is a top-down and results-first approach.**

Starting with the goal of achieving the result that is most desirable in the marketplace, what is the shortest path to take you, the practitioner, to that result?

We can summarize this path in 5-steps as follows:

- **Step 1: Adjust Mindset** (*believe!*).
- **Step 2: Pick a Process** (how to get results).
- **Step 3: Pick a Tool** (implementation).
- **Step 4: Practice on Datasets** (*put in the work*).
- **Step 5: Build a Portfolio** (*show your skills*).

That’s it.

This is the philosophy behind all of my Ebook training.

It’s why I created this website. I knew an easier way and just had to share it.

Below is a cartoon to illustrate the process, where step 1 (on mindset) and step 5 (on showing your work) are omitted for brevity.

Let’s take a closer look at each step.

Before we begin, you must know the landmarks of machine learning.

I often just assume this, but you cannot proceed unless you know some true basics.

For example:

- You should know what machine learning is and be able to explain it to a colleague.
- You should know some examples of machine learning problems off the top of your head.
- You should know that machine learning is the only way to solve some complex problems.
- You should know that predictive modeling is the most useful part of applied machine learning.
- You should know where machine learning fits with regard to AI and Data Science.
- You should know the types of machine learning algorithms available.
- You should know some basic machine learning terms.

Machine learning is not just for the professors.

It is not just for the gifted or the academics.

You can learn the topic and apply it to solve problems.

There’s no reason why not.

- You do not need to write code.
- You do not need to know or be good at math.
- You do not need a higher degree.
- You do not need big data.
- You do not need access to a supercomputer.
- You do not need a lot of time.

Really, there is only one thing that can stop you from getting started and getting good at machine learning.

It’s you.

- Maybe you just can’t find the motivation.
- Maybe you think you have to implement everything from scratch.
- Maybe you keep picking advanced problems rather than beginner problems to work on.
- Maybe you don’t have a systematic process to follow in order to deliver a result.
- Maybe you’re not making use of good tools and libraries.

Clear the limiting beliefs stopping you from getting started.

This post might help:

There are a lot of speed bumps you can hit.

Identify them, address them, and keep moving.

Once you know that you can do machine learning, understand why.

- Maybe you’re interested in learning more about machine learning algorithms.
- Maybe you’re interested in creating predictions.
- Maybe you’re interested in solving complex problems.
- Maybe you’re interested in creating smarter software.
- Maybe you’re even interested in becoming a data scientist.

Think hard on this topic and try and figure out your “*why*“.

This post might help:

Once you have your “*why*“, find your tribe.

Which group of machine learning practitioners do you have the most affinity with?

- Maybe you’re a business person with a general interest.
- Maybe you’re a manager delivering a project.
- Maybe you’re a machine learning student.
- Maybe you’re a machine learning researcher.
- Maybe you’re a researcher with a sticky problem.
- Maybe you want to implement algorithms.
- Maybe you need one-off predictions.
- Maybe you need a model you can deploy.
- Maybe you’re a data scientist.
- Maybe you’re a data analyst.

Each tribe has different interests and will approach the field of machine learning from a different direction.

Not all books and materials are right for you; find your tribe, then find the materials that speak to you.

This post might help:

Do you want to reliably get above average results on problem after problem?

You need to follow a systematic process.

- A process allows you to harness and reuse best practices.
- It means you don’t have to rely on memory or intuition.
- It guides you through a project end-to-end.
- It means that you always know what to do next.
- It can be tailored to your specific problem types and tools.

A systematic process is the difference between a roller coaster of good and bad results on the one hand and above average and forever improving results on the other.

I would choose above average and forever improving results every time.

A process template that I recommend is as follows:

- **Step 1**: Define your problem.
- **Step 2**: Prepare your data.
- **Step 3**: Spot-check algorithms.
- **Step 4**: Improve results.
- **Step 5**: Present results.

Below is a nice cartoon to summarize this systematic process:

You can learn more about this process in the post:

You do not have to use this process, but you do need a systematic process for working through predictive modeling problems.

Pick a best-of-breed tool that you can use to deliver machine learning results.

Map your process onto the tool and learn how to use it most effectively.

There are three tools I recommend the most:

- **Weka Machine Learning Workbench** (*perfect for beginners*). Weka offers a GUI interface and no code is required. I use it for quick one-off modeling problems.
- **Python Ecosystem** (*perfect for intermediates*). Specifically, pandas and scikit-learn on top of the SciPy platform. You can use the same code and models in development, and they are reliable enough to run in operations.
- **R Platform** (*perfect for advanced users*). R was designed for statistical computing, and although the language is arcane and some of the packages are poorly documented, it offers the most methods as well as state-of-the-art techniques.

I also have recommendations for specialty areas:

- **Keras for Deep Learning**. It uses Python, meaning you can leverage the whole Python ecosystem, which saves a lot of time. The interface is very clean, while also supporting the power of the Theano and TensorFlow back-ends.
- **XGBoost for Gradient Boosting**. It is the fastest implementation of the technique around. It also supports both R and Python, allowing you to leverage either platform in your project.

These are just my personal recommendations and I have lots of posts as well as more detailed training on each.

Learn how to use your chosen tool well. Study it. Become an expert in it.

The programming language does not matter.

Even the tool you use does not matter.

The skills you learn working through problems will transfer from platform to platform easily.

Nevertheless, here are some survey results on the most popular languages in machine learning:

Once you have a process and a tool, you need to practice.

You need to practice a lot.

Practice on standard machine learning datasets.

- Use real-world datasets, collected from an actual problem domain (rather than contrived).
- Use small datasets that fit into memory or an Excel spreadsheet.
- Use well-understood datasets so you know what kind of results to expect.

Practice on different types of datasets. Practice on problems that make you uncomfortable as you will have to push your skills to get a solution. Seek out different traits in data problems, such as:

- Different types of supervised learning such as classification and regression.
- Different sized datasets from tens, hundreds, thousands and millions of instances.
- Different numbers of attributes from less than ten, tens, hundreds and thousands of attributes.
- Different attribute types from real, integer, categorical, ordinal and mixtures.
- Different domains that force you to quickly understand and characterize a new problem in which you have no previous experience.

These are the most used and best-understood datasets and the best place to start.

Learn more in the post:

These datasets are often larger and require more preparation to model well.

For a list of the most popular datasets that you could practice on, see the post:

Collect data on machine learning problems that matter to you.

You will find the problems and the solutions you devise so much more rewarding.

For more information, see the post:

You will build up a collection of completed projects.

Put them to good use.

As you work through datasets and get better, create semi-formal outputs that summarize your findings.

- Maybe upload your code and summarize it in a readme.
- Maybe you write up your results in a blog post.
- Maybe you make a slide deck.
- Maybe you create a little video on youtube.

Each one of these completed projects represents one piece of your growing portfolio.

Just like a painter, you can build a portfolio of completed work to demonstrate your growing skills in delivering results with machine learning.

You can learn more about this approach in the post:

You can use this portfolio yourself, leveraging code and knowledge in your prior results in larger and more ambitious projects.

Once your portfolio is mature, you may even choose to leverage it into more responsibility at work or into a new machine learning focused role.

For more on this see the post:

Below are some practical tips and tricks you may consider when using this process.

- Start with a simple process (*like above*) and a simple tool (*like Weka*), then advance once you have confidence.
- Begin with the simplest and most used datasets (*iris flowers* and *Pima diabetes*).
- Each time you apply the process, look for ways to improve it and your usage of it.
- If you discover new methods, figure out the best way to integrate them into your process.
- Study algorithms, but only as much and in ways that help you achieve better results with your process.
- Study and learn from experts and see what methods you can steal and add to your process.
- Study your tool like you do predictive modeling problems and get the most out of it.
- Tackle harder and harder problems; leave the easy ones, as you won’t learn much from them.
- Focus on clearly presenting results; the better you do this, the greater the impact of your portfolio.
- Engage in the community on forums and Q&A sites; both ask and answer questions.

In this post, you discovered a simple 5-step process that you can use to get started and make progress in applied machine learning.

Although simple to lay out, the approach takes hard work, but it does pay off.

Many of my students worked through this process and got work as machine learning engineers and data scientists.

If you are interested in a deeper treatment of this process and related ideas, see the post:

**Do you have any questions?**

*Ask in the comments below and I will do my best to answer.*

The post The Machine Learning Mastery Method appeared first on Machine Learning Mastery.
