
# Search results for "Model Risk"

## What Is Probability?

Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, and risk. Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled […]

## Resources for Getting Started With Probability in Machine Learning

Machine Learning is a field of computer science concerned with developing systems that can learn from data. Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty. Many aspects of machine learning are uncertain, including, most critically, observations from the problem […]

## A Gentle Introduction to Dropout for Regularizing Deep Neural Networks

Deep learning neural networks are likely to quickly overfit a training dataset with few examples. Ensembles of neural networks with different model configurations are known to reduce overfitting, but require the additional computational expense of training and maintaining multiple models. A single model can be used to simulate having a large number of different network […]
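The idea of simulating many networks with one model can be illustrated with a minimal sketch of "inverted" dropout applied to a single layer's outputs. The function name `dropout`, the 0.5 rate, and the sample activations are all assumptions made for illustration; they do not come from the post, and deep learning libraries provide their own implementations.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout (illustrative sketch, not a library API).

    Each activation is zeroed with probability `rate`; the survivors are
    scaled by 1 / (1 - rate) so the expected output matches the input.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical layer outputs; a fresh random mask is drawn on every call,
# so each training step effectively sees a different thinned network.
rng = random.Random(1)
layer_output = [0.5, 1.2, -0.7, 2.0, 0.1, -1.5]
dropped = dropout(layer_output, rate=0.5, rng=rng)
print(dropped)
```

At prediction time dropout is normally switched off, which here corresponds to calling the function with `rate=0.0`.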

## How to Develop Baseline Forecasts for Multi-Site Multivariate Air Pollution Time Series Forecasting

Real-world time series forecasting is challenging for a whole host of reasons not limited to problem features such as having multiple input variables, the requirement to predict multiple time steps, and the need to perform the same type of prediction for multiple physical sites. The EMC Data Science Global Hackathon dataset, or the ‘Air Quality […]

## Why Initialize a Neural Network with Random Weights?

The weights of artificial neural networks must be initialized to small random numbers. The stochastic optimization algorithm used to train the model, called stochastic gradient descent, expects this. To understand this approach to problem solving, you must first understand the role of nondeterministic and randomized algorithms as well as […]
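As a rough sketch of what "small random numbers" can look like in practice, the snippet below fills a weight matrix with values drawn uniformly from a narrow range around zero. The function name `init_weights`, the layer sizes, and the 0.05 limit are assumptions chosen for illustration, not values from the post.

```python
import random


def init_weights(n_inputs, n_outputs, limit=0.05, seed=None):
    """Return an n_outputs x n_inputs matrix of small random weights.

    Values are drawn uniformly from [-limit, limit]. The limit of 0.05 is
    an illustrative assumption, not a recommended setting.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-limit, limit) for _ in range(n_inputs)]
        for _ in range(n_outputs)
    ]


# Two runs with different seeds start the search from different points,
# while a fixed seed makes a run repeatable.
weights_a = init_weights(3, 2, seed=1)
weights_b = init_weights(3, 2, seed=2)
print(weights_a)
print(weights_b)
```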

## The Role of Randomization to Address Confounding Variables in Machine Learning

A large part of applied machine learning is about running controlled experiments to discover what algorithm or algorithm configuration to use on a predictive modeling problem. A challenge is that some aspects of the problem and the algorithm, called confounding variables, cannot be controlled (held constant) and must instead be controlled for. An example is […]

## A Gentle Introduction to Statistical Power and Power Analysis in Python

The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect. Power can be calculated and reported for a completed experiment to comment on the confidence one might have in the conclusions drawn from the results of the study. It can also be […]
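The definition above can be made concrete with a small calculation. The sketch below approximates the power of a two-sided, two-sample test using the normal distribution from the Python standard library. The function name `approx_power` and the example values (a standardized effect size of 0.8, 25 observations per group, a significance level of 0.05) are assumptions for illustration; a t-distribution based calculation would give slightly different numbers for small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Uses a normal (z-test) approximation with equal group sizes and
    equal variances; `effect_size` is the standardized mean difference.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2.0)
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power grows with the sample size for a fixed effect size.
for n in (10, 25, 50):
    print(n, round(approx_power(0.8, n), 3))
```

When the effect size is zero, the function returns the significance level, which is the probability of a false positive.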

## Statistical Significance Tests for Comparing Machine Learning Algorithms

Comparing machine learning methods and selecting a final model is a common operation in applied machine learning. Models are commonly evaluated using resampling methods like k-fold cross-validation from which mean skill scores are calculated and compared directly. Although simple, this approach can be misleading as it is hard to know whether the difference between mean […]

## Why Applied Machine Learning Is Hard

How to Handle the Intractability of Applied Machine Learning. Applied machine learning is challenging. You must make many decisions where there is no known “right answer” for your specific problem, such as: What framing of the problem to use? What input and output data to use? What learning algorithm to use? What algorithm configuration to […]