
Radius Neighbors Classifier Algorithm With Python

Radius Neighbors Classifier is a classification machine learning algorithm.

It is an extension to the k-nearest neighbors algorithm that makes predictions using all examples in the radius of a new example rather than the k-closest neighbors.

As such, the radius-based approach to selecting neighbors is more appropriate for sparse data, preventing examples that are far away in the feature space from contributing to a prediction.

In this tutorial, you will discover the Radius Neighbors Classifier classification machine learning algorithm.

After completing this tutorial, you will know:

  • The Radius Neighbors Classifier is a simple extension of the k-nearest neighbors classification algorithm.
  • How to fit, evaluate, and make predictions with the Radius Neighbors Classifier model with Scikit-Learn.
  • How to tune the hyperparameters of the Radius Neighbors Classifier algorithm on a given dataset.

Let’s get started.

Photo by J. Triepke, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Radius Neighbors Classifier
  2. Radius Neighbors Classifier With Scikit-Learn
  3. Tune Radius Neighbors Classifier Hyperparameters

Radius Neighbors Classifier

Radius Neighbors is a classification machine learning algorithm.

It is based on the k-nearest neighbors algorithm, or kNN. kNN involves storing the entire training dataset. Then, at prediction time, the k closest examples in the training dataset are located for each new example for which a prediction is required, and the mode (most common value) of their class labels is assigned to the new example.

For more on the k-nearest neighbors algorithm, see the separate tutorial dedicated to kNN.

The Radius Neighbors Classifier is similar in that training involves storing the entire training dataset. The way that the training dataset is used during prediction is different.

Instead of locating the k-neighbors, the Radius Neighbors Classifier locates all examples in the training dataset that are within a given radius of the new example. The radius neighbors are then used to make a prediction for the new example.
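
To make the difference concrete, below is a minimal sketch (not from the original tutorial; the toy data values are invented for illustration) that contrasts k-nearest selection with radius-based selection on a one-dimensional training set.

# contrast k-nearest and radius-based neighbor selection (illustrative sketch)
import numpy as np

X_train = np.array([0.1, 0.2, 0.25, 0.8, 0.9])  # stored training inputs
y_train = np.array([0, 0, 0, 1, 1])             # stored class labels
x_new = 0.22                                    # new example to classify

distances = np.abs(X_train - x_new)

# k-nearest neighbors: always the k closest examples, regardless of distance
k = 3
knn_idx = np.argsort(distances)[:k]
print('kNN prediction:', np.bincount(y_train[knn_idx]).argmax())

# radius neighbors: all examples within a fixed radius of the new example
radius = 0.1
radius_idx = np.where(distances <= radius)[0]
print('Radius prediction:', np.bincount(y_train[radius_idx]).argmax())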

The radius is defined in the feature space and generally assumes that the input variables are numeric and scaled to the range 0-1, i.e. normalized.

The radius-based approach to locating neighbors is appropriate for those datasets where it is desirable for the contribution of neighbors to be proportional to the density of examples in the feature space.

Given a fixed radius, dense regions of the feature space will contribute more information and sparse regions will contribute less. It is the behavior in sparse regions that is most desirable, as it prevents examples that are very far from the new example in the feature space from contributing to the prediction.

As such, the Radius Neighbors Classifier may be more appropriate for prediction problems where there are sparse regions of the feature space.

Given that the radius is fixed in all dimensions of the feature space, it will become less effective as the number of input features is increased, which causes examples in the feature space to spread further and further apart. This property is referred to as the curse of dimensionality.

Radius Neighbors Classifier With Scikit-Learn

The Radius Neighbors Classifier is available in the scikit-learn Python machine learning library via the RadiusNeighborsClassifier class.

The class allows you to specify the size of the radius used when making a prediction via the “radius” argument, which defaults to 1.0.

Another important hyperparameter is the “weights” argument, which controls whether neighbors contribute to the prediction in a ‘uniform’ manner or in inverse proportion to their distance (‘distance’) from the new example. Uniform weighting is used by default.
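
For example, a minimal sketch of constructing the classifier with both of these hyperparameters set explicitly (the values shown are simply the defaults):

# define the model with an explicit radius and uniform neighbor weighting
from sklearn.neighbors import RadiusNeighborsClassifier
model = RadiusNeighborsClassifier(radius=1.0, weights='uniform')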

We can demonstrate the Radius Neighbors Classifier with a worked example.

First, let’s define a synthetic classification dataset.

We will use the make_classification() function to create a dataset with 1,000 examples, each with 20 input variables.

The example below creates and summarizes the dataset.
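
A sketch of that listing is shown below; the split between informative and redundant features and the random seed are assumptions chosen for illustration.

# create and summarize the synthetic classification dataset
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)
# summarize the number of rows and columns
print(X.shape, y.shape)  # expect (1000, 20) (1000,)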

Running the example creates the dataset and confirms the number of rows and columns of the dataset.

We can fit and evaluate a Radius Neighbors Classifier model using repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. We will use 10 folds and three repeats in the test harness.

We will use the default configuration.

It is important that the feature space is scaled prior to preparing and using the model.

We can achieve this by using the MinMaxScaler to normalize the input features and use a Pipeline to first apply the scaling, then use the model.

The complete example of evaluating the Radius Neighbors Classifier model for the synthetic binary classification task is listed below.
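
A sketch of the complete listing, reconstructed to match the steps described above (the pipeline step names and random seeds are assumptions):

# evaluate a radius neighbors classifier on the synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import RadiusNeighborsClassifier
# define the dataset (same assumed configuration as above)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)
# normalize the inputs, then apply the model
pipeline = Pipeline(steps=[('norm', MinMaxScaler()),
                           ('model', RadiusNeighborsClassifier())])
# 10-fold cross-validation repeated 3 times
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the mean and standard deviation of accuracy
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))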

Running the example evaluates the Radius Neighbors Classifier algorithm on the synthetic dataset and reports the average accuracy across the three repeats of 10-fold cross-validation.

Your specific results may vary given the stochastic nature of the evaluation procedure (the random splits used by repeated cross-validation). Consider running the example a few times.

In this case, we can see that the model achieved a mean accuracy of about 75.4 percent.

We may decide to use the Radius Neighbors Classifier as our final model and make predictions on new data.

This can be achieved by fitting the model pipeline on all available data and calling the predict() function passing in a new row of data.

We can demonstrate this with a complete example listed below.
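
A sketch of that example follows; the original post predicts on a specific hardcoded row that is not reproduced here, so the first training row is reused as a stand-in for new data.

# fit a final model and make a prediction for a new row of data
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import RadiusNeighborsClassifier
# define the dataset (same assumed configuration as above)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)
pipeline = Pipeline(steps=[('norm', MinMaxScaler()),
                           ('model', RadiusNeighborsClassifier())])
# fit the pipeline on all available data
pipeline.fit(X, y)
# stand-in for a new row of data (the original hardcoded row is not reproduced)
row = X[:1]
yhat = pipeline.predict(row)
print('Predicted Class: %d' % yhat[0])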

Running the example fits the model and makes a class label prediction for a new row of data.

Next, we can look at configuring the model hyperparameters.

Tune Radius Neighbors Classifier Hyperparameters

The hyperparameters for the Radius Neighbors Classifier method must be configured for your specific dataset.

Perhaps the most important hyperparameter is the radius, controlled via the “radius” argument. It is a good idea to test a range of values, perhaps centered around the default of 1.0.

We will explore values between 0.8 and 1.5 in increments of 0.01 on our synthetic dataset.

Note that we are grid searching the “radius” hyperparameter of the RadiusNeighborsClassifier within the Pipeline, where the model step is named “model”; the radius parameter is therefore accessed via the step name followed by a double underscore (__) separator, i.e. “model__radius”.

The example below demonstrates this using the GridSearchCV class with a grid of values we have defined.
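
A sketch of the grid search listing, reconstructed under the same assumptions as the earlier examples:

# grid search the radius hyperparameter of the radius neighbors classifier
from numpy import arange
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import RadiusNeighborsClassifier
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)
pipeline = Pipeline(steps=[('norm', MinMaxScaler()),
                           ('model', RadiusNeighborsClassifier())])
# radius values from 0.8 to 1.5 in increments of 0.01
grid = {'model__radius': arange(0.8, 1.5, 0.01)}
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
search = GridSearchCV(pipeline, grid, scoring='accuracy', cv=cv, n_jobs=-1)
results = search.fit(X, y)
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)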

Running the example will evaluate each combination of configurations using repeated cross-validation.

Your specific results may vary given the stochastic nature of the evaluation procedure (the random splits used by repeated cross-validation). Try running the example a few times.

In this case, we can see that we achieved better results using a radius of 0.8, which gave an accuracy of about 87.2 percent, compared to the radius of 1.0 in the previous example, which gave an accuracy of about 75.4 percent.

Another key hyperparameter is the manner in which examples in the radius contribute to the prediction via the “weights” argument. This can be set to “uniform” (the default), “distance” for inverse distance, or a custom function.

We can test both of these built-in weightings and see which performs better with our radius of 0.8.

The complete example is listed below.
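
A sketch of the listing, with the radius fixed at 0.8 and the two built-in weightings searched (reconstructed under the same assumptions as above):

# grid search the weights hyperparameter with the radius fixed at 0.8
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import RadiusNeighborsClassifier
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=1)
pipeline = Pipeline(steps=[('norm', MinMaxScaler()),
                           ('model', RadiusNeighborsClassifier(radius=0.8))])
# try uniform and inverse-distance weighting of the radius neighbors
grid = {'model__weights': ['uniform', 'distance']}
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
search = GridSearchCV(pipeline, grid, scoring='accuracy', cv=cv, n_jobs=-1)
results = search.fit(X, y)
print('Mean Accuracy: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)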

Running the example fits the model and discovers the hyperparameters that give the best results using cross-validation.

Your specific results may vary given the stochastic nature of the evaluation procedure (the random splits used by repeated cross-validation). Try running the example a few times.

In this case, we can see an additional lift in mean classification accuracy from about 87.2 percent with ‘uniform‘ weights in the previous example to about 89.3 percent with ‘distance‘ weights in this case.

Another hyperparameter that you might wish to explore is the distance metric, set via the ‘metric‘ argument, which defaults to ‘minkowski‘.

It might be interesting to compare results to ‘euclidean‘ distance and perhaps ‘cityblock‘.
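
As a sketch, the metric could be added to the grid search from the previous example; the metric names below are standard scikit-learn options.

# extend the grid to also search the distance metric (illustrative)
grid = {'model__weights': ['uniform', 'distance'],
        'model__metric': ['minkowski', 'euclidean', 'cityblock']}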


Summary

In this tutorial, you discovered the Radius Neighbors Classifier classification machine learning algorithm.

Specifically, you learned:

  • The Radius Neighbors Classifier is a simple extension of the k-nearest neighbors classification algorithm.
  • How to fit, evaluate, and make predictions with the Radius Neighbors Classifier model with Scikit-Learn.
  • How to tune the hyperparameters of the Radius Neighbors Classifier algorithm on a given dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

