Data Management Matters And Why You Need To Take It Seriously

We live in a world drowning in data. Internet tracking, stock market movement, genome sequencing technologies and their ilk all produce enormous amounts of data. Most of this data is someone else’s responsibility, generated by someone else, stored in someone else’s database, which is maintained and made available by… you guessed it… someone else. But. […]

What Is the Naive Classifier for Each Imbalanced Classification Metric?

A common mistake made by beginners is to apply machine learning algorithms to a problem without establishing a performance baseline. A performance baseline provides a minimum score above which a model is considered to have skill on the dataset. It also provides a point of relative improvement for all models evaluated on the dataset. A […]
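As a rough illustration of the baseline idea in this teaser, the sketch below scores a classifier that always predicts the most frequent class. The label counts (90 negatives, 10 positives), the helper names, and the choice of accuracy as the metric are assumptions made for the example; they are not taken from the post.

```python
from collections import Counter


def majority_class_baseline(y_train):
    """Return the most frequent label seen in the training labels."""
    return Counter(y_train).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical imbalanced labels: 90 negatives (0) and 10 positives (1).
y = [0] * 90 + [1] * 10

# The naive classifier predicts the majority label for every example.
majority = majority_class_baseline(y)
baseline_pred = [majority] * len(y)

# Any model evaluated on this data should beat this score to show skill.
baseline_accuracy = accuracy(y, baseline_pred)
print(f"majority label: {majority}, baseline accuracy: {baseline_accuracy:.2f}")
```

On imbalanced data this baseline can look deceptively strong under accuracy, which is one reason the appropriate naive classifier may differ from metric to metric.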

Continuous Probability Distributions for Machine Learning

The probability for a continuous random variable can be summarized with a continuous probability distribution. Continuous probability distributions are encountered in machine learning, most notably in the distribution of numerical input and output variables for models and in the distribution of errors made by models. Knowledge of the normal continuous probability distribution is also required […]
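For readers who want something concrete, here is a minimal sketch of the probability density function of the normal distribution, written with only the standard library. The function name and the default parameters (mean 0, standard deviation 1) are choices made for this illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Density at the mean of the standard normal distribution.
peak = normal_pdf(0.0)

# A crude Riemann sum over [-6, 6] should come out close to 1,
# since the total area under a density is 1.
step = 0.01
area = sum(normal_pdf(-6.0 + i * step) * step for i in range(1200))

print(f"peak density: {peak:.4f}, approximate area: {area:.4f}")
```

Note that the value returned is a density, not a probability; probabilities for a continuous variable come from integrating the density over an interval.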

Moving Average Smoothing for Data Preparation and Time Series Forecasting in Python

Moving average smoothing is a naive and effective technique in time series forecasting. It can be used for data preparation, feature engineering, and even directly for making predictions. In this tutorial, you will discover how to use moving average smoothing for time series forecasting with Python. After completing this tutorial, you will know: How moving […]
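The smoothing idea can be sketched in a few lines of plain Python. The example below computes a trailing moving average, one common variant; the window size of 3, the sample series, and the function name are assumptions for illustration and may differ from what the tutorial uses.

```python
def trailing_moving_average(series, window=3):
    """Average each observation with the window - 1 observations before it.

    The first window - 1 positions have no full window, so the result is
    shorter than the input by window - 1 values.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(window - 1, len(series)):
        window_values = series[i - window + 1 : i + 1]
        smoothed.append(sum(window_values) / window)
    return smoothed


# Hypothetical observations, oldest first.
series = [10.0, 12.0, 14.0, 13.0, 15.0]
smoothed = trailing_moving_average(series, window=3)

# Used as a naive forecast, the latest smoothed value predicts the next step.
next_step_forecast = smoothed[-1]
print(smoothed, next_step_forecast)
```

A trailing window uses only past observations, which is what makes the same calculation usable both as a lag-style feature and as a simple forecast.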

Books for Machine Learning with R

R is a powerful platform for data analysis and machine learning. It is my main workhorse for things like competitions and consulting work. The reason is the large number of powerful algorithms available, all on one platform. In this post I want to point out some resources you can use to get started in […]

