
Archive | Python Machine Learning


Repeated k-Fold Cross-Validation for Model Evaluation in Python

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the procedure may give a noisy estimate of model performance, because different splits of the data can produce very different results. Repeated k-fold cross-validation provides a […]
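As a rough illustration of the idea in this excerpt, the sketch below generates the train/test index splits for repeated k-fold cross-validation using only the standard library. The function name `repeated_kfold_indices` and its parameters are hypothetical, chosen for this example; in practice a library implementation such as scikit-learn's `RepeatedKFold` would normally be used instead.

```python
import random

def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=3, seed=1):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the article.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # Reshuffle before each repeat so every repeat uses a different split.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_splits folds of near-equal size.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for i, test_idx in enumerate(folds):
            # Every fold except the held-out one goes into the training set.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield repeat, train_idx, test_idx

splits = list(repeated_kfold_indices(n_samples=20, n_splits=5, n_repeats=3))
print(len(splits))  # 15 train/test splits (5 folds x 3 repeats)
```

Averaging a model's score over all of these splits, rather than over a single run of k folds, is what reduces the noise in the estimate.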


Nested Cross-Validation for Machine Learning with Python

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and […]


10 Clustering Algorithms With Python

Clustering or cluster analysis is an unsupervised learning problem. It is often used as a data analysis technique for discovering interesting patterns in data, such as groups of customers based on their behavior. There are many clustering algorithms to choose from and no single best clustering algorithm for all cases. Instead, it is a good […]


4 Distance Measures for Machine Learning

Distance measures play an important role in machine learning. They provide the foundation for many popular and effective machine learning algorithms, such as k-nearest neighbors for supervised learning and k-means clustering for unsupervised learning. Different distance measures must be chosen depending on the type of data. As such, it is important to know […]
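The excerpt does not say which four measures the article covers, so the sketch below assumes four commonly used ones: Hamming, Euclidean, Manhattan, and Minkowski. The function names and sample rows are made up for this example.

```python
from math import sqrt

def hamming_distance(a, b):
    # Fraction of positions that differ; suited to binary or categorical vectors.
    return sum(x != y for x, y in zip(a, b)) / len(a)

def euclidean_distance(a, b):
    # Straight-line distance between two real-valued vectors.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # Sum of absolute differences (city-block distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski_distance(a, b, p):
    # Generalization: p=1 gives Manhattan, p=2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# Hypothetical sample rows for illustration.
row1 = [10, 20, 15, 10, 5]
row2 = [12, 24, 18, 8, 7]

print(manhattan_distance(row1, row2))  # 13
print(euclidean_distance(row1, row2))
print(minkowski_distance(row1, row2, 2))
print(hamming_distance([0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0]))
```

Hamming distance fits categorical or bit-string data, while the other three assume numeric features on comparable scales.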


Best Results for Standard Machine Learning Datasets

It is important that beginner machine learning practitioners practice on small real-world datasets. So-called standard machine learning datasets contain actual observations, fit into memory, and are well studied and well understood. As such, they can be used by beginner practitioners to quickly test, explore, and practice data preparation and modeling techniques. A practitioner can confirm […]


Tune Hyperparameters for Classification Machine Learning Algorithms

Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset. Hyperparameters are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm. Unlike parameters, hyperparameters are specified by the practitioner when configuring the model. Typically, it is challenging […]
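To make the parameter/hyperparameter distinction concrete, here is a minimal sketch that is not taken from the article: a one-coefficient linear model fit by gradient descent. The slope `w` is a parameter found by the learning algorithm, while `learning_rate` and `n_epochs` are hyperparameters chosen by the practitioner. The function name and data are hypothetical.

```python
def fit_slope(xs, ys, learning_rate, n_epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # parameter: learned from the data
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

# Toy data generated from y = 2x, so the ideal slope is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same algorithm and data, different hyperparameter values.
for lr in (0.001, 0.01, 0.05):
    print(lr, fit_slope(xs, ys, learning_rate=lr, n_epochs=100))
```

With the smallest learning rate the slope has not yet converged after 100 epochs, which shows how a hyperparameter setting can change the fitted model even though the data and algorithm are unchanged.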
