A Gentle Introduction to PyCaret for Machine Learning

By Jason Brownlee on November 15, 2020 in Python Machine Learning 15

PyCaret is a Python open source machine learning library designed to make performing standard tasks in a machine learning project easy.

It is a Python version of the Caret machine learning package in R, popular because it allows models to be evaluated, compared, and tuned on a given dataset with just a few lines of code.

The PyCaret library provides these features, allowing the machine learning practitioner in Python to spot check a suite of standard machine learning algorithms on a classification or regression dataset with a single function call.

In this tutorial, you will discover the PyCaret Python open source library for machine learning.

After completing this tutorial, you will know:

PyCaret is a Python version of the popular and widely used caret machine learning package in R.
How to use PyCaret to easily evaluate and compare standard machine learning models on a dataset.
How to use PyCaret to easily tune the hyperparameters of a well-performing machine learning model.

Let’s get started.

A Gentle Introduction to PyCaret for Machine Learning
Photo by Thomas, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

What Is PyCaret?
Sonar Dataset
Comparing Machine Learning Models
Tuning Machine Learning Models

What Is PyCaret?

PyCaret is an open source Python machine learning library inspired by the caret R package.

The goal of the caret package is to automate the major steps for evaluating and comparing machine learning algorithms for classification and regression. The main benefit of the library is that a lot can be achieved with very few lines of code and little manual configuration. The PyCaret library brings these capabilities to Python.

PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. It is well suited for seasoned data scientists who want to increase the productivity of their ML experiments by using PyCaret in their workflows or for citizen data scientists and those new to data science with little or no background in coding.

— PyCaret Homepage

The PyCaret library automates many steps of a machine learning project, such as:

Defining the data transforms to perform (setup())
Evaluating and comparing standard models (compare_models())
Tuning model hyperparameters (tune_model())

As well as many more features not limited to creating ensembles, saving models, and deploying models.

The PyCaret library has a wealth of documentation for using the API; you can get started here:

PyCaret Homepage

We will not explore all of the features of the library in this tutorial; instead, we will focus on simple machine learning model comparison and hyperparameter tuning.

You can install PyCaret using your Python package manager, such as pip. For example:

pip install pycaret

1	pip install pycaret

Once installed, we can confirm that the library is available in your development environment and is working correctly by printing the installed version.

# check pycaret version
import pycaret
print('PyCaret: %s' % pycaret.__version__)

# check pycaret version

import pycaret

print('PyCaret: %s' % pycaret.__version__)

Running the example will load the PyCaret library and print the installed version number.

Your version number should be the same or higher.

PyCaret: 2.0.0

1	PyCaret: 2.0.0

If you need help installing PyCaret for your system, you can see the installation instructions here:

PyCaret Installation Instructions

Now that we are familiar with what PyCaret is, let’s explore how we might use it on a machine learning project.

Sonar Dataset

We will use the Sonar standard binary classification dataset. You can learn more about it here:

We can download the dataset directly from the URL and load it as a Pandas DataFrame.

...
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
# load the dataset
df = read_csv(url, header=None)
# summarize the shape of the dataset
print(df.shape)

...

# define the location of the dataset

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'

# load the dataset

df = read_csv(url, header=None)

# summarize the shape of the dataset

print(df.shape)

The PyCaret seems to require that a dataset has column names, and our dataset does not have column names, so we can set the column number as the column name directly.

...
# set column names as the column number
n_cols = df.shape[1]
df.columns = [str(i) for i in range(n_cols)]

...

# set column names as the column number

n_cols = df.shape[1]

df.columns = [str(i) for i in range(n_cols)]

Finally, we can summarize the first few rows of data.

...
# summarize the first few rows of data
print(df.head())

...

# summarize the first few rows of data

print(df.head())

Tying this together, the complete example of loading and summarizing the Sonar dataset is listed below.

# load the sonar dataset
from pandas import read_csv
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
# load the dataset
df = read_csv(url, header=None)
# summarize the shape of the dataset
print(df.shape)
# set column names as the column number
n_cols = df.shape[1]
df.columns = [str(i) for i in range(n_cols)]
# summarize the first few rows of data
print(df.head())

# load the sonar dataset

from pandas import read_csv

# define the location of the dataset

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'

# load the dataset

df = read_csv(url, header=None)

# summarize the shape of the dataset

print(df.shape)

# set column names as the column number

n_cols = df.shape[1]

df.columns = [str(i) for i in range(n_cols)]

# summarize the first few rows of data

print(df.head())

Running the example first loads the dataset and reports the shape, showing it has 208 rows and 61 columns.

The first five rows are then printed showing that the input variables are all numeric and the target variable is column “60” and has string labels.

(208, 61)
0 1 2 3 4 ... 56 57 58 59 60
0 0.0200 0.0371 0.0428 0.0207 0.0954 ... 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 ... 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 ... 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 ... 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 ... 0.0072 0.0048 0.0107 0.0094 R

(208, 61)

0 1 2 3 4 ... 56 57 58 59 60

0 0.0200 0.0371 0.0428 0.0207 0.0954 ... 0.0180 0.0084 0.0090 0.0032 R

1 0.0453 0.0523 0.0843 0.0689 0.1183 ... 0.0140 0.0049 0.0052 0.0044 R

2 0.0262 0.0582 0.1099 0.1083 0.0974 ... 0.0316 0.0164 0.0095 0.0078 R

3 0.0100 0.0171 0.0623 0.0205 0.0205 ... 0.0050 0.0044 0.0040 0.0117 R

4 0.0762 0.0666 0.0481 0.0394 0.0590 ... 0.0072 0.0048 0.0107 0.0094 R

Next, we can use PyCaret to evaluate and compare a suite of standard machine learning algorithms to quickly discover what works well on this dataset.

PyCaret for Comparing Machine Learning Models

In this section, we will evaluate and compare the performance of standard machine learning models on the Sonar classification dataset.

First, we must set the dataset with the PyCaret library via the setup() function. This requires that we provide the Pandas DataFrame and specify the name of the column that contains the target variable.

The setup() function also allows you to configure simple data preparation, such as scaling, power transforms, missing data handling, and PCA transforms.

We will specify the data, target variable, and turn off HTML output, verbose output, and requests for user feedback.

...
# setup the dataset
grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)

...

# setup the dataset

grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)

Next, we can compare standard machine learning models by calling the compare_models() function.

By default, it will evaluate models using 10-fold cross-validation, sort results by classification accuracy, and return the single best model.

These are good defaults, and we don’t need to change a thing.

...
# evaluate models and compare models
best = compare_models()

...

# evaluate models and compare models

best = compare_models()

Call the compare_models() function will also report a table of results summarizing all of the models that were evaluated and their performance.

Finally, we can report the best-performing model and its configuration.

Tying this together, the complete example of evaluating a suite of standard models on the Sonar classification dataset is listed below.

# compare machine learning algorithms on the sonar classification dataset
from pandas import read_csv
from pycaret.classification import setup
from pycaret.classification import compare_models
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
# load the dataset
df = read_csv(url, header=None)
# set column names as the column number
n_cols = df.shape[1]
df.columns = [str(i) for i in range(n_cols)]
# setup the dataset
grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)
# evaluate models and compare models
best = compare_models()
# report the best model
print(best)

# compare machine learning algorithms on the sonar classification dataset

from pandas import read_csv

from pycaret.classification import setup

from pycaret.classification import compare_models

# define the location of the dataset

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'

# load the dataset

df = read_csv(url, header=None)

# set column names as the column number

n_cols = df.shape[1]

df.columns = [str(i) for i in range(n_cols)]

# setup the dataset

grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)

# evaluate models and compare models

best = compare_models()

# report the best model

print(best)

Running the example will load the dataset, configure the PyCaret library, evaluate a suite of standard models, and report the best model found for the dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the “Extra Trees Classifier” has the best accuracy on the dataset with a score of about 86.95 percent.

We can then see the configuration of the model that was used, which looks like it used default hyperparameter values.

                              Model  Accuracy     AUC  Recall   Prec.      F1  \
0            Extra Trees Classifier    0.8695  0.9497  0.8571  0.8778  0.8631
1               CatBoost Classifier    0.8695  0.9548  0.8143  0.9177  0.8508
2   Light Gradient Boosting Machine    0.8219  0.9096  0.8000  0.8327  0.8012
3      Gradient Boosting Classifier    0.8010  0.8801  0.7690  0.8110  0.7805
4              Ada Boost Classifier    0.8000  0.8474  0.7952  0.8071  0.7890
5            K Neighbors Classifier    0.7995  0.8613  0.7405  0.8276  0.7773
6         Extreme Gradient Boosting    0.7995  0.8934  0.7833  0.8095  0.7802
7          Random Forest Classifier    0.7662  0.8778  0.6976  0.8024  0.7345
8          Decision Tree Classifier    0.7533  0.7524  0.7119  0.7655  0.7213
9                  Ridge Classifier    0.7448  0.0000  0.6952  0.7574  0.7135
10                      Naive Bayes    0.7214  0.8159  0.8286  0.6700  0.7308
11              SVM - Linear Kernel    0.7181  0.0000  0.6286  0.7146  0.6309
12              Logistic Regression    0.7100  0.8104  0.6357  0.7263  0.6634
13     Linear Discriminant Analysis    0.6924  0.7510  0.6667  0.6762  0.6628
14  Quadratic Discriminant Analysis    0.5800  0.6308  0.1095  0.5000  0.1750

     Kappa     MCC  TT (Sec)
0   0.7383  0.7446    0.1415
1   0.7368  0.7552    1.9930
2   0.6410  0.6581    0.0134
3   0.5989  0.6090    0.1413
4   0.5979  0.6123    0.0726
5   0.5957  0.6038    0.0019
6   0.5970  0.6132    0.0287
7   0.5277  0.5438    0.1107
8   0.5028  0.5192    0.0035
9   0.4870  0.5003    0.0030
10  0.4488  0.4752    0.0019
11  0.4235  0.4609    0.0024
12  0.4143  0.4285    0.0059
13  0.3825  0.3927    0.0034
14  0.1172  0.1792    0.0033
ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=None, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
                     oob_score=False, random_state=2728, verbose=0,
                     warm_start=False)

Model Accuracy AUC Recall Prec. F1 \

0 Extra Trees Classifier 0.8695 0.9497 0.8571 0.8778 0.8631

1 CatBoost Classifier 0.8695 0.9548 0.8143 0.9177 0.8508

2 Light Gradient Boosting Machine 0.8219 0.9096 0.8000 0.8327 0.8012

3 Gradient Boosting Classifier 0.8010 0.8801 0.7690 0.8110 0.7805

4 Ada Boost Classifier 0.8000 0.8474 0.7952 0.8071 0.7890

5 K Neighbors Classifier 0.7995 0.8613 0.7405 0.8276 0.7773

6 Extreme Gradient Boosting 0.7995 0.8934 0.7833 0.8095 0.7802

7 Random Forest Classifier 0.7662 0.8778 0.6976 0.8024 0.7345

8 Decision Tree Classifier 0.7533 0.7524 0.7119 0.7655 0.7213

9 Ridge Classifier 0.7448 0.0000 0.6952 0.7574 0.7135

10 Naive Bayes 0.7214 0.8159 0.8286 0.6700 0.7308

11 SVM - Linear Kernel 0.7181 0.0000 0.6286 0.7146 0.6309

12 Logistic Regression 0.7100 0.8104 0.6357 0.7263 0.6634

13 Linear Discriminant Analysis 0.6924 0.7510 0.6667 0.6762 0.6628

14 Quadratic Discriminant Analysis 0.5800 0.6308 0.1095 0.5000 0.1750

Kappa MCC TT (Sec)

0 0.7383 0.7446 0.1415

1 0.7368 0.7552 1.9930

2 0.6410 0.6581 0.0134

3 0.5989 0.6090 0.1413

4 0.5979 0.6123 0.0726

5 0.5957 0.6038 0.0019

6 0.5970 0.6132 0.0287

7 0.5277 0.5438 0.1107

8 0.5028 0.5192 0.0035

9 0.4870 0.5003 0.0030

10 0.4488 0.4752 0.0019

11 0.4235 0.4609 0.0024

12 0.4143 0.4285 0.0059

13 0.3825 0.3927 0.0034

14 0.1172 0.1792 0.0033

ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,

criterion='gini', max_depth=None, max_features='auto',

max_leaf_nodes=None, max_samples=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=1, min_samples_split=2,

min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,

oob_score=False, random_state=2728, verbose=0,

warm_start=False)

We could use this configuration directly and fit a model on the entire dataset and use it to make predictions on new data.

We can also use the table of results to get an idea of the types of models that perform well on the dataset, in this case, ensembles of decision trees.

Now that we are familiar with how to compare machine learning models using PyCaret, let’s look at how we might use the library to tune model hyperparameters.

Tuning Machine Learning Models

In this section, we will tune the hyperparameters of a machine learning model on the Sonar classification dataset.

We must load and set up the dataset as we did before when comparing models.

...
# setup the dataset
grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)

...

# setup the dataset

grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)

We can tune model hyperparameters using the tune_model() function in the PyCaret library.

The function takes an instance of the model to tune as input and knows what hyperparameters to tune automatically. A random search of model hyperparameters is performed and the total number of evaluations can be controlled via the “n_iter” argument.

By default, the function will optimize the ‘Accuracy‘ and will evaluate the performance of each configuration using 10-fold cross-validation, although this sensible default configuration can be changed.

We can perform a random search of the extra trees classifier as follows:

...
# tune model hyperparameters
best = tune_model(ExtraTreesClassifier(), n_iter=200)

...

# tune model hyperparameters

best = tune_model(ExtraTreesClassifier(), n_iter=200)

The function will return the best-performing model, which can be used directly or printed to determine the hyperparameters that were selected.

It will also print a table of the results for the best configuration across the number of folds in the k-fold cross-validation (e.g. 10 folds).

Tying this together, the complete example of tuning the hyperparameters of the extra trees classifier on the Sonar dataset is listed below.

# tune model hyperparameters on the sonar classification dataset
from pandas import read_csv
from sklearn.ensemble import ExtraTreesClassifier
from pycaret.classification import setup
from pycaret.classification import tune_model
# define the location of the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
# load the dataset
df = read_csv(url, header=None)
# set column names as the column number
n_cols = df.shape[1]
df.columns = [str(i) for i in range(n_cols)]
# setup the dataset
grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)
# tune model hyperparameters
best = tune_model(ExtraTreesClassifier(), n_iter=200, choose_better=True)
# report the best model
print(best)

# tune model hyperparameters on the sonar classification dataset

from pandas import read_csv

from sklearn.ensemble import ExtraTreesClassifier

from pycaret.classification import setup

from pycaret.classification import tune_model

# define the location of the dataset

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'

# load the dataset

df = read_csv(url, header=None)

# set column names as the column number

n_cols = df.shape[1]

df.columns = [str(i) for i in range(n_cols)]

# setup the dataset

grid = setup(data=df, target=df.columns[-1], html=False, silent=True, verbose=False)

# tune model hyperparameters

best = tune_model(ExtraTreesClassifier(), n_iter=200, choose_better=True)

# report the best model

print(best)

Running the example first loads the dataset and configures the PyCaret library.

A grid search is then performed reporting the performance of the best-performing configuration across the 10 folds of cross-validation and the mean accuracy.

In this case, we can see that the random search found a configuration with an accuracy of about 75.29 percent, which is not better than the default configuration from the previous section that achieved a score of about 86.95 percent.

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
0       0.8667  1.0000  1.0000  0.7778  0.8750  0.7368  0.7638
1       0.6667  0.8393  0.4286  0.7500  0.5455  0.3119  0.3425
2       0.6667  0.8036  0.2857  1.0000  0.4444  0.2991  0.4193
3       0.7333  0.7321  0.4286  1.0000  0.6000  0.4444  0.5345
4       0.6667  0.5714  0.2857  1.0000  0.4444  0.2991  0.4193
5       0.8571  0.8750  0.6667  1.0000  0.8000  0.6957  0.7303
6       0.8571  0.9583  0.6667  1.0000  0.8000  0.6957  0.7303
7       0.7857  0.8776  0.5714  1.0000  0.7273  0.5714  0.6325
8       0.6429  0.7959  0.2857  1.0000  0.4444  0.2857  0.4082
9       0.7857  0.8163  0.5714  1.0000  0.7273  0.5714  0.6325
Mean    0.7529  0.8270  0.5190  0.9528  0.6408  0.4911  0.5613
SD      0.0846  0.1132  0.2145  0.0946  0.1571  0.1753  0.1485
ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=1, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=4, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=120,
                     n_jobs=None, oob_score=False, random_state=None, verbose=0,
                     warm_start=False)

Accuracy AUC Recall Prec. F1 Kappa MCC

0 0.8667 1.0000 1.0000 0.7778 0.8750 0.7368 0.7638

1 0.6667 0.8393 0.4286 0.7500 0.5455 0.3119 0.3425

2 0.6667 0.8036 0.2857 1.0000 0.4444 0.2991 0.4193

3 0.7333 0.7321 0.4286 1.0000 0.6000 0.4444 0.5345

4 0.6667 0.5714 0.2857 1.0000 0.4444 0.2991 0.4193

5 0.8571 0.8750 0.6667 1.0000 0.8000 0.6957 0.7303

6 0.8571 0.9583 0.6667 1.0000 0.8000 0.6957 0.7303

7 0.7857 0.8776 0.5714 1.0000 0.7273 0.5714 0.6325

8 0.6429 0.7959 0.2857 1.0000 0.4444 0.2857 0.4082

9 0.7857 0.8163 0.5714 1.0000 0.7273 0.5714 0.6325

Mean 0.7529 0.8270 0.5190 0.9528 0.6408 0.4911 0.5613

SD 0.0846 0.1132 0.2145 0.0946 0.1571 0.1753 0.1485

ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,

criterion='gini', max_depth=1, max_features='auto',

max_leaf_nodes=None, max_samples=None,

min_impurity_decrease=0.0, min_impurity_split=None,

min_samples_leaf=4, min_samples_split=2,

min_weight_fraction_leaf=0.0, n_estimators=120,

n_jobs=None, oob_score=False, random_state=None, verbose=0,

warm_start=False)

We might be able to improve upon the grid search by specifying to the tune_model() function what hyperparameters to search and what ranges to search.

Summary

In this tutorial, you discovered the PyCaret Python open source library for machine learning.

Specifically, you learned:

PyCaret is a Python version of the popular and widely used caret machine learning package in R.
How to use PyCaret to easily evaluate and compare standard machine learning models on a dataset.
How to use PyCaret to easily tune the hyperparameters of a well-performing machine learning model.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

15 Responses to A Gentle Introduction to PyCaret for Machine Learning

Faisal Malik November 20, 2020 at 5:29 am #

PyCaret is really a great boon for the beginners and lazy programmers. Thanks for sharing.

Reply
- Jason Brownlee November 20, 2020 at 6:47 am #
  
  You’re welcome!
  
  Reply
Dan November 20, 2020 at 1:49 pm #

Hi Jason,
Love your work! I’m new to pycaret, so how can we use the above with training and test data? Thank you!

Reply
- rajat November 20, 2020 at 11:31 pm #
  
  #split data into 95% and 5%
  
  data = dataset.sample(frac=0.95, random_state=786).reset_index(drop=True)
  data_unseen = dataset.drop(data.index).reset_index(drop=True)
  
  Reply
- Jason Brownlee November 21, 2020 at 6:37 am #
  
  Thanks!
  
  Good question, perhaps you can review the documentation for the setup() function and see how to specify a custom train/test set:
  https://pycaret.org/classification/
  
  Reply
Bartosz November 20, 2020 at 7:26 pm #

In my case the LGBMClassifier won with the accuracy of 0.8552. I wonder how we can check the value of hyperparamaters before tuning?

Reply
- Jason Brownlee November 21, 2020 at 6:39 am #
  
  You can review the documentation for the default model hyperparameters, or print the model object to see what hyperparameters were used.
  
  Reply
Sandeep S Ramesh November 21, 2020 at 3:04 pm #

Hi Jason, thanks for sharing this nice quickstart guide for pycaret. I would like to know if pycaret will be able to handle large datasets that cannot be held as one single pandas dataframe. (I.e big data )?

Reply
- Jason Brownlee November 22, 2020 at 6:52 am #
  
  You’re welcome.
  
  I don’t believe so, but perhaps check the documentation.
  
  Reply
Thibault November 25, 2020 at 8:58 pm #

Thank you Jason for sharing. I have had consistently worse results with their tuning feature, which is a bit opaque. I struggle to think that it is not a bug.

Also, I still like to do a lot of my own data preparation with sklearn as I still feel that pycaret lacks transparency and features. But do you think we’re heading towards such landscapes where these frameworks automate all we old timer data scientists used to do from the ground up?

Reply
- Jason Brownlee November 26, 2020 at 6:32 am #
  
  I like to do it myself as well.
  
  Yes, I think automl tools and framework start replacing manual work for in-memory datasets.
  
  Reply
Iman December 5, 2020 at 4:42 am #

Hello Jason,
Thanks for providing this useful information. Really nice!!

As I see, the Pycaret provide the final results in term of ACC, AUC and ..for classification or MAE for regression tasks.
Is there any option to have the predicted labels (in classification) and predicted values (in regression)?
Thank you,
Iman

Reply
- Jason Brownlee December 5, 2020 at 8:12 am #
  
  You’re welcome.
  
  I don’t know, sorry.
  
  Reply
Bob Hoyt July 26, 2021 at 8:04 am #

Many machine learning programs generate performance measures for both of the binary classes e.g. heart disease and no heart disease. In the case of PyCaret, you set the target = target column but I don’t see a way to predict only one class. In the field of medicine, you most commonly want to predict the minority class. When you look at a ROC curve in PyCaret class = 0 and class =1 always have the same AUC. What am I missing?

Reply
- Jason Brownlee July 27, 2021 at 5:03 am #
  
  ROC Curves and AUC is always for the positive class. This may help:
  https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/
  
  Reply

Navigation

A Gentle Introduction to PyCaret for Machine Learning

Tutorial Overview

What Is PyCaret?

Sonar Dataset

PyCaret for Comparing Machine Learning Models

Tuning Machine Learning Models

Further Reading

Summary

Discover Fast Machine Learning in Python!

Develop Your Own Models in Minutes

Finally Bring Machine Learning To
Your Own Projects

More On This Topic

15 Responses to A Gentle Introduction to PyCaret for Machine Learning

Leave a Reply Click here to cancel reply.

Navigation

Tutorial Overview

What Is PyCaret?

Sonar Dataset

PyCaret for Comparing Machine Learning Models

Tuning Machine Learning Models

Further Reading

Summary

Discover Fast Machine Learning in Python!

Develop Your Own Models in Minutes

Finally Bring Machine Learning To Your Own Projects

More On This Topic

15 Responses to A Gentle Introduction to PyCaret for Machine Learning

Leave a Reply Click here to cancel reply.

Finally Bring Machine Learning To
Your Own Projects