How to Set Axis for Rows and Columns in NumPy

By Jason Brownlee on May 1, 2020 in Linear Algebra

NumPy arrays provide a fast and efficient way to store and manipulate data in Python. They are particularly useful for representing data as vectors and matrices in machine learning. Data in NumPy arrays can be accessed directly via column and row indexes, and this is reasonably straightforward. Nevertheless, sometimes we must perform operations on arrays […]
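As a minimal sketch of the kind of axis-wise operation the excerpt alludes to, the snippet below sums a small matrix by column and by row. The array values and variable names are illustrative assumptions, not taken from the article.

```python
import numpy as np

# A small 2x3 matrix: 2 rows, 3 columns (illustrative values).
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=0 collapses the rows, leaving one value per column.
column_sums = data.sum(axis=0)

# axis=1 collapses the columns, leaving one value per row.
row_sums = data.sum(axis=1)

# With no axis given, the sum runs over every element.
total = data.sum()

print(column_sums, row_sums, total)
```

The easy slip is reading `axis=0` as "by row": it names the axis that is collapsed, so the result has one entry per column.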

Box and Whisker Plot of Classification Accuracy Scores for Two Algorithms

Hypothesis Test for Comparing Machine Learning Algorithms

By Jason Brownlee on September 1, 2020 in Statistics

Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation. The algorithm with the best mean performance is expected to be better than those algorithms with worse mean performance. But what if the difference in the mean performance is caused by a statistical fluke? The solution is to use a […]
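The excerpt is cut off before it names a specific test, so the following is only one possible illustration: a plain paired t statistic computed over per-fold scores of two models. The scores and the `paired_t_statistic` helper are hypothetical. Note also that scores from overlapping cross-validation folds are not independent, so a naive paired t-test can be over-confident.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equal-length score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    # Work on the per-fold differences between the two models.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("differences have zero variance")
    return mean_diff / std_err


# Hypothetical per-fold accuracies for two algorithms.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b = [0.78, 0.80, 0.79, 0.77, 0.80]

t_stat = paired_t_statistic(model_a, model_b)
print(f"t = {t_stat:.3f} with {len(model_a) - 1} degrees of freedom")
```

A large absolute t value relative to the t distribution with n - 1 degrees of freedom suggests the difference in means is unlikely to be a fluke, subject to the independence caveat above.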

How to Calculate the Bias-Variance Trade-off with Python

By Jason Brownlee on August 26, 2020 in Python Machine Learning

The performance of a machine learning model can be characterized in terms of the bias and the variance of the model. A model with high bias makes strong assumptions about the form of the unknown underlying function that maps inputs to outputs in the dataset, such as linear regression. A model with high variance is […]
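For reference, the standard decomposition behind these two terms can be written as follows. The notation is an assumption rather than the article's own: $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, and $\hat{f}$ is the model fit on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The expectation is taken over training sets and noise, so bias measures how far the average fitted model sits from the true function, while variance measures how much the fit moves from one training set to another.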

Why Do I Get Different Results Each Time in Machine Learning?

By Jason Brownlee on August 27, 2020 in Machine Learning Process

Are you getting different results for your machine learning algorithm? Perhaps your results differ from a tutorial and you want to understand why. Perhaps your model is making different predictions each time it is trained, even when it is trained on the same data set each time. This is to be expected and might even […]
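A minimal sketch of why repeated runs can differ: many learning algorithms draw random numbers during training, so the result depends on the random seed. The `noisy_score` function below is a hypothetical stand-in for "train and evaluate a model", not a real training routine.

```python
import random


def noisy_score(seed=None):
    """Pretend to train a model; the score depends on random draws."""
    rng = random.Random(seed)  # seed=None uses a fresh, unpredictable seed
    return 0.80 + rng.uniform(-0.05, 0.05)


# Unseeded runs will usually give different scores.
print(noisy_score(), noisy_score())

# Fixing the seed makes the run repeatable.
print(noisy_score(seed=1), noisy_score(seed=1))
```

Fixing a seed makes a single run reproducible, but it does not remove the underlying variability; averaging over several runs is a common way to summarize it.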

Probability Decision Surface for Logistic Regression on a Binary Classification Task

Plot a Decision Surface for Machine Learning Algorithms in Python

By Jason Brownlee on August 26, 2020 in Python Machine Learning

Classification algorithms learn how to assign class labels to examples, although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface. This is a plot that shows how a fit machine learning algorithm predicts a coarse grid across the input feature space. A decision […]
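The grid idea described above can be sketched with nothing but the standard library. Here `toy_classifier` is a hypothetical stand-in for a fitted model's predict step, and the feature ranges and grid size are assumptions chosen for illustration.

```python
def make_grid(x_min, x_max, y_min, y_max, steps):
    """Return a coarse grid of (x, y) points covering the feature space."""
    x_step = (x_max - x_min) / (steps - 1)
    y_step = (y_max - y_min) / (steps - 1)
    # Points are ordered row by row, from the lowest y value upward.
    return [(x_min + i * x_step, y_min + j * y_step)
            for j in range(steps) for i in range(steps)]


def toy_classifier(point):
    """Hypothetical fitted model: class 1 above the line x + y = 1."""
    x, y = point
    return 1 if x + y > 1.0 else 0


steps = 5
grid = make_grid(0.0, 1.0, 0.0, 1.0, steps)
labels = [toy_classifier(p) for p in grid]

# Print the surface as text, with the largest y value on the top row.
for row in range(steps - 1, -1, -1):
    print("".join(str(label) for label in labels[row * steps:(row + 1) * steps]))
```

In practice the predicted labels, or predicted probabilities, would be drawn as a filled contour plot with the training points overlaid; the text rendering here only keeps the sketch dependency-free.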

A Gentle Introduction to Computational Learning Theory

By Jason Brownlee on September 7, 2020 in Probability

Computational learning theory, or statistical learning theory, refers to mathematical frameworks for quantifying learning tasks and algorithms. These are sub-fields of machine learning that a practitioner does not need to know in great depth to achieve good results on a wide range of problems. Nevertheless, they are sub-fields where having […]

Scatter Plot of Number of Times Pregnant vs. Plasma Glucose Numerical Variables by Class Label

How to use Seaborn Data Visualization for Machine Learning

By Jason Brownlee on August 19, 2020 in Python Machine Learning

Data visualization provides insight into the distribution and relationships between variables in a dataset. This insight can be helpful in selecting data preparation techniques to apply prior to modeling and the types of algorithms that may be most suited to the data. Seaborn is a data visualization library for Python that runs on top of […]

Histogram of Examples in Each Class in the Glass Multi-Class Classification Dataset

Multi-Class Imbalanced Classification

By Jason Brownlee on January 5, 2021 in Imbalanced Classification

Imbalanced classification refers to prediction tasks where the distribution of examples across class labels is not equal. Most imbalanced classification examples focus on binary classification tasks, yet many of the tools and techniques for imbalanced classification also directly support multi-class classification problems. In this tutorial, you will discover how to use the tools of imbalanced […]
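A quick way to see whether a multi-class dataset is imbalanced is to count the examples per class. The labels below are hypothetical and only illustrate the check.

```python
from collections import Counter

# Hypothetical class labels for a three-class problem.
labels = [0] * 6 + [1] * 3 + [2] * 1

# Count examples per class and convert the counts to proportions.
counts = Counter(labels)
proportions = {cls: n / len(labels) for cls, n in counts.items()}

for cls in sorted(counts):
    print(f"class {cls}: {counts[cls]} examples ({proportions[cls]:.0%})")
```

A distribution this skewed is typically the cue to consider resampling or class-weighted models.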

Line Plot of Expected vs. Births Predicted Using XGBoost

How to Use XGBoost for Time Series Forecasting

By Jason Brownlee on March 19, 2021 in XGBoost

XGBoost is an efficient implementation of gradient boosting for classification and regression problems. It is fast and performs well, if not the best, on a wide range of predictive modeling tasks, and it is a favorite among data science competition winners, such as those on Kaggle. XGBoost can also be used for time series […]

Box and Whisker Plots of Classification Accuracy vs Repeats for k-Fold Cross-Validation

Repeated k-Fold Cross-Validation for Model Evaluation in Python

By Jason Brownlee on August 26, 2020 in Python Machine Learning

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance. Different splits of the data may produce very different results. Repeated k-fold cross-validation provides a […]
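The repetition idea can be sketched with the standard library alone: reshuffle the data before each repeat, split it into k folds, and collect every train/test split. The `repeated_kfold_indices` helper is a hypothetical illustration, not the article's code; scikit-learn offers a ready-made `RepeatedKFold` class for real use.

```python
import random


def repeated_kfold_indices(n_samples, n_splits, n_repeats, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Each repeat uses a fresh shuffle, so the folds differ between repeats.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_splits):
            # Every n_splits-th index, starting at `fold`, forms the test fold.
            test = indices[fold::n_splits]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


splits = list(repeated_kfold_indices(n_samples=10, n_splits=5, n_repeats=3))
print(f"{len(splits)} train/test splits in total")
```

Averaging a model's score over all `n_splits * n_repeats` splits gives a less noisy estimate than a single k-fold run, at the cost of fitting proportionally more models.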
