It is common in statistics and machine learning to create a linear transform or mapping of a variable. An example is a linear scaling of a feature variable. We have the natural intuition that the mean of the scaled values is the same as the scaled value of the mean of the raw values. This makes […]
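The intuition in this excerpt can be checked numerically. The snippet below is a minimal sketch, not code from the post: the sample values and the coefficients `a` and `b` are made up for illustration, and it assumes the transform has the form y = a * x + b.

```python
from statistics import mean

# Hypothetical raw feature values, chosen only for illustration.
raw = [2.0, 4.0, 6.0, 8.0]

# Hypothetical linear transform y = a * x + b (a and b are assumptions).
a, b = 3.0, 1.0
scaled = [a * x + b for x in raw]

# Mean computed after transforming every value.
mean_of_scaled = mean(scaled)

# Transform applied once, to the mean of the raw values.
scaled_mean = a * mean(raw) + b

print(mean_of_scaled, scaled_mean)
```

Both quantities should agree for any choice of `a` and `b`, because the mean is a linear operation.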

# Archive | Probability

## A Gentle Introduction to Probability Scoring Methods in Python

How to Score Probability Predictions in Python and Develop an Intuition for Different Metrics. Predicting probabilities instead of class labels for a classification problem can provide additional nuance and uncertainty for the predictions. The added nuance allows more sophisticated metrics to be used to interpret and evaluate the predicted probabilities. In general, methods for the […]

## How and When to Use a Calibrated Classification Model with scikit-learn

Instead of predicting class values directly for a classification problem, it can be convenient to predict the probability of an observation belonging to each possible class. Predicting probabilities allows some flexibility including deciding how to interpret the probabilities, presenting predictions with uncertainty, and providing more nuanced ways to evaluate the skill of the model. Predicted […]

## How to Use ROC Curves and Precision-Recall Curves for Classification in Python

It can be more flexible to predict probabilities of an observation belonging to each class in a classification problem rather than predicting classes directly. This flexibility comes from the way that probabilities may be interpreted using different thresholds that allow the operator of the model to trade off concerns in the errors made by the model, […]

## Do Not Use Random Guessing As Your Baseline Classifier

I recently received the following question via email: Hi Jason, quick question. A case of class imbalance: 90 cases of thumbs up, 10 cases of thumbs down. How would we calculate random guessing accuracy in this case? We can answer this question using some basic probability (I opened Excel and typed in some numbers). Let’s […]
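The arithmetic behind the question can be sketched in a few lines. This is one possible reading, not necessarily the calculation used in the post: it assumes "random guessing" means either a 50/50 guess or a guess drawn in proportion to the class frequencies, and it adds a majority-class baseline for comparison. All variable names are hypothetical.

```python
# Class counts from the question: 90 thumbs up, 10 thumbs down.
n_up, n_down = 90, 10
total = n_up + n_down
p_up, p_down = n_up / total, n_down / total

# Assumption 1: guess each class with probability 0.5.
# Expected accuracy = P(guess up) * P(up) + P(guess down) * P(down).
uniform_acc = 0.5 * p_up + 0.5 * p_down

# Assumption 2: guess each class in proportion to its frequency.
# Expected accuracy = P(up)^2 + P(down)^2.
proportional_acc = p_up ** 2 + p_down ** 2

# Comparison baseline: always predict the majority class.
majority_acc = max(p_up, p_down)

print(f"uniform guess:      {uniform_acc:.2f}")
print(f"proportional guess: {proportional_acc:.2f}")
print(f"majority class:     {majority_acc:.2f}")
```

Under these assumptions, both forms of random guessing score below always predicting the majority class, which is consistent with the post's title.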