Blog - Page 94 of 187

[Figure: Line Plot of Mean Accuracy for Cross-Validation k-Values With Error Bars (Blue) vs. the Ideal Case (Red)]

How to Configure k-Fold Cross-Validation

By Jason Brownlee on August 26, 2020 in Python Machine Learning

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm on a dataset. A common value for k is 10, but how do we know that this configuration is appropriate for our dataset and our algorithms? One approach is to explore the effect of different k values on […]
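
One way to explore the effect of k is a small sensitivity analysis: evaluate the same model with k-fold cross-validation over a range of k values and compare the mean and spread of the scores. The sketch below assumes scikit-learn, a synthetic dataset from `make_classification`, and a `LogisticRegression` model; these are illustrative choices, not necessarily those used in the post, and `evaluate_k` is a hypothetical helper.

```python
# Sketch: how sensitive is the cross-validation estimate to the choice of k?
# Assumption: synthetic data and LogisticRegression are illustrative choices.
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score


def evaluate_k(X, y, k):
    """Return the mean and standard deviation of accuracy under k-fold CV."""
    cv = KFold(n_splits=k, shuffle=True, random_state=1)
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)
    return mean(scores), std(scores)


# Small synthetic binary classification problem (hypothetical data)
X, y = make_classification(n_samples=100, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)

results = {}
for k in range(2, 11):
    results[k] = evaluate_k(X, y, k)
    print(f"k={k:2d}: mean accuracy={results[k][0]:.3f} (std={results[k][1]:.3f})")
```

If the mean accuracy barely moves across k, k=10 is probably a reasonable default for that dataset and model; large swings suggest the choice of k deserves a closer look.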

Nested Cross-Validation for Machine Learning with Python

By Jason Brownlee on November 20, 2021 in Python Machine Learning

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]
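
Nested cross-validation keeps those two uses apart: an inner loop tunes hyperparameters and an outer loop estimates how well the tuned model generalizes. A minimal sketch with scikit-learn wraps `GridSearchCV` inside `cross_val_score`, assuming a synthetic dataset, a `DecisionTreeClassifier`, and a small hypothetical hyperparameter grid.

```python
# Sketch of nested cross-validation: hyperparameters are tuned in an inner
# loop, and the tuned model is scored in an outer loop on held-out folds.
# Assumption: the dataset, model, and grid below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# Inner loop: GridSearchCV picks hyperparameters using only the outer-training data
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
param_grid = {"max_depth": [2, 4, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid,
                      scoring="accuracy", cv=inner_cv, refit=True)

# Outer loop: each fold refits the whole search, then scores it on unseen data
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(search, X, y, scoring="accuracy", cv=outer_cv)
print(f"Nested CV accuracy: {scores.mean():.3f} (std={scores.std():.3f})")
```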

LOOCV for Evaluating Machine Learning Algorithms

By Jason Brownlee on August 26, 2020 in Python Machine Learning

The Leave-One-Out Cross-Validation, or LOOCV, procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a computationally expensive procedure to perform, although it results in a reliable and unbiased estimate of model performance. Although simple to use […]
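
In LOOCV, k equals the number of rows, so each model is trained on every row but one and tested on the row left out; that is where the computational cost comes from. A minimal sketch, assuming scikit-learn's `LeaveOneOut` splitter, a small synthetic dataset, and a `LogisticRegression` model chosen for illustration:

```python
# Sketch of LOOCV: one fold per sample, each test set holds exactly one row.
# Assumption: small synthetic dataset and LogisticRegression, chosen for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=50, n_features=5, random_state=1)

cv = LeaveOneOut()
n_folds = cv.get_n_splits(X)  # equals the number of samples, hence the cost
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="accuracy", cv=cv)
# Each fold's accuracy is 0 or 1, so the mean is the fraction of rows predicted correctly
print(f"{n_folds} model fits, LOOCV accuracy: {scores.mean():.3f}")
```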

Train-Test Split for Evaluating Machine Learning Algorithms

By Jason Brownlee on August 26, 2020 in Python Machine Learning

The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a fast and easy procedure to perform, the results of which allow you to compare the performance of machine learning algorithms for your predictive […]
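
A minimal sketch of the procedure with scikit-learn's `train_test_split`, assuming a synthetic dataset, a 75/25 split, and `LogisticRegression` as the model (all illustrative choices). The model is fit on the training rows only and scored on the held-out test rows.

```python
# Sketch of a train-test split evaluation.
# Assumption: synthetic data, a 75/25 split, and LogisticRegression are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)

# stratify=y keeps the class balance similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Train rows: {len(X_train)}, test rows: {len(X_test)}, accuracy: {accuracy:.3f}")
```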

[Figure: Histogram of Each Variable in the Diabetes Classification Dataset]

How to Selectively Scale Numerical Input Variables for Machine Learning

By Jason Brownlee on August 17, 2020 in Data Preparation

Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling. It is convenient, and therefore common, to apply the same data transforms, such as standardization and normalization, equally to all input variables. This can achieve good results on many problems. Nevertheless, better results may be achieved by carefully […]
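
One way to scale only some input variables, or to apply different transforms to different columns, is scikit-learn's `ColumnTransformer`. The sketch below is an illustration under assumptions: three hypothetical columns, with standardization applied to the two roughly Gaussian ones and normalization to the skewed one.

```python
# Sketch: scale only selected columns, each with its own transform.
# Assumption: the three hypothetical columns and the choice of scaler per
# column are illustrative; ColumnTransformer is one way to do this.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(loc=50.0, scale=10.0, size=100),  # roughly Gaussian
    rng.normal(loc=5.0, scale=2.0, size=100),    # roughly Gaussian
    rng.exponential(scale=3.0, size=100),        # skewed, non-negative
])

transformer = ColumnTransformer(transformers=[
    ("standardize", StandardScaler(), [0, 1]),  # Gaussian-like columns
    ("normalize", MinMaxScaler(), [2]),         # skewed column to [0, 1]
])
# Output columns follow the order of the transformers list: 0, 1, then 2
X_scaled = transformer.fit_transform(X)
print(X_scaled[:3])
```

In practice the transformer would usually sit inside a `Pipeline` so that the scaling statistics are learned from the training data only.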

Add Binary Flags for Missing Values for Machine Learning

By Jason Brownlee on August 17, 2020 in Data Preparation

Missing values can cause problems when modeling classification and regression prediction problems with machine learning algorithms. A common approach is to replace missing values with a calculated statistic, such as the mean of the column. This allows the dataset to be modeled as per normal but gives no indication to the model that the row […]
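
A binary flag column lets the model see which values were imputed. A minimal sketch, assuming scikit-learn's `SimpleImputer` with `add_indicator=True` and a tiny hypothetical array; the flags could equally be built by hand with `np.isnan`.

```python
# Sketch: mean imputation plus a binary "was missing" flag per column.
# Assumption: tiny hypothetical array; SimpleImputer's add_indicator option is
# one way to add the flags (they could also be built by hand with np.isnan).
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, np.nan],
    [4.0, 5.0],
])

imputer = SimpleImputer(strategy="mean", add_indicator=True)
# Output: imputed columns first, then one 0/1 indicator column for each
# feature that had missing values during fit
X_out = imputer.fit_transform(X)
print(X_out)
```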

How to Create Custom Data Transforms for Scikit-Learn

By Jason Brownlee on July 19, 2020 in Data Preparation

The scikit-learn Python library for machine learning offers a suite of data transforms for changing the scale and distribution of input data, as well as removing input features (columns). There are many simple data cleaning operations, such as removing outliers and removing columns with few observations, that are often performed manually on the data, requiring […]
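
When no built-in transform fits, a custom one can be written as a class that follows scikit-learn's `fit`/`transform` convention, so it can be used inside a `Pipeline`. The sketch below is hypothetical and not taken from the post: `LowVarietyColumnDropper` and its `min_unique` parameter are illustrative names. It learns in `fit` which columns have fewer than `min_unique` distinct values and drops them in `transform`.

```python
# Sketch of a custom scikit-learn transformer.
# Assumption: LowVarietyColumnDropper and min_unique are hypothetical names.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LowVarietyColumnDropper(BaseEstimator, TransformerMixin):
    """Drop columns with fewer than `min_unique` distinct values (learned in fit)."""

    def __init__(self, min_unique=2):
        self.min_unique = min_unique

    def fit(self, X, y=None):
        X = np.asarray(X)
        # Count distinct values per column on the training data only
        counts = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])
        self.keep_ = counts >= self.min_unique
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Apply the column mask learned during fit
        return X[:, self.keep_]


X = np.array([[1, 0, 5], [2, 0, 5], [3, 0, 6], [4, 0, 5]])
dropper = LowVarietyColumnDropper(min_unique=2)
X_reduced = dropper.fit_transform(X)  # the constant middle column is removed
print(X_reduced)
```

Because it inherits from `BaseEstimator` and `TransformerMixin`, the class gets `fit_transform`, `get_params`, and `set_params`, which is what lets it be cloned and grid searched inside a `Pipeline`.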

[Figure: Box and Whisker Plot of Classification Accuracy for Different Data Transforms on the Wine Classification Dataset]

How to Grid Search Data Preparation Techniques

By Jason Brownlee on August 17, 2020 in Data Preparation

Machine learning predictive modeling performance is only as good as your data, and your data is only as good as the way you prepare it for modeling. The most common approach to data preparation is to study a dataset and review the expectations of the machine learning algorithms, then carefully choose the most appropriate data […]
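
Instead of choosing the transform up front, the preparation step can be treated as a hyperparameter and searched like any other. A minimal sketch, assuming a scikit-learn `Pipeline` whose `prep` step is grid searched over a few candidate transforms, with `"passthrough"` as the no-transform baseline; the candidates, model, and synthetic data are illustrative.

```python
# Sketch: treat the data preparation step as a hyperparameter and grid search it.
# Assumption: the candidate transforms, model, and synthetic data are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# "passthrough" leaves the data untouched and acts as the no-transform baseline
pipeline = Pipeline([
    ("prep", "passthrough"),
    ("model", LogisticRegression(max_iter=1000)),
])
param_grid = {"prep": ["passthrough", MinMaxScaler(), StandardScaler(), PowerTransformer()]}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=cv)
search.fit(X, y)
print("Best preparation:", search.best_params_["prep"])
print(f"Mean CV accuracy: {search.best_score_:.3f}")
```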

[Figure: Machine Learning Data Preparation Framework]

Framework for Data Preparation Techniques in Machine Learning

By Jason Brownlee on July 17, 2020 in Data Preparation

There are a vast number of different types of data preparation techniques that could be used on a predictive modeling project. In some cases, the distribution of the data or the requirements of a machine learning model may suggest the data preparation needed, although this is rarely the case given the complexity and high-dimensionality of […]

6 Dimensionality Reduction Algorithms With Python

By Jason Brownlee on August 17, 2020 in Data Preparation

Dimensionality reduction is an unsupervised learning technique. Nevertheless, it can be used as a data transform pre-processing step for machine learning algorithms on classification and regression predictive modeling datasets with supervised learning algorithms. There are many dimensionality reduction algorithms to choose from and no single best algorithm for all cases. Instead, it is a good […]
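
Because the reduction is unsupervised, it can be placed as a pre-processing step in front of a supervised model. A minimal sketch, assuming PCA as the reduction method (one of many options), 10 components, a synthetic dataset, and `LogisticRegression`:

```python
# Sketch: dimensionality reduction (PCA here) as a pre-processing step in a pipeline.
# Assumption: PCA with 10 components, LogisticRegression, and synthetic data are illustrative.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_redundant=10, random_state=7)

# PCA ignores y; within cross_val_score it is refit on each training fold only
pipeline = Pipeline([
    ("pca", PCA(n_components=10)),
    ("model", LogisticRegression(max_iter=1000)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=cv)
print(f"Accuracy with PCA(10): {scores.mean():.3f} (std={scores.std():.3f})")

# The transform alone maps 20 input columns to 10 components
X_reduced = PCA(n_components=10).fit_transform(X)
```

Swapping `PCA` for another reduction class, or grid searching `pca__n_components`, is a natural next experiment, since no single algorithm or component count is best for all cases.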
