How To Load Your Machine Learning Data Into R

You need to be able to load data into R when working on a machine learning problem.

In this short post, you will discover how you can load your data files into R and start your machine learning project.

Let’s get started.

Load Your Machine Learning Data Into R

Photo by Paul Miller, some rights reserved.

Access To Your Data

The most common way to work with data in machine learning is in data files.

Data may originally be stored in all manner of formats and diverse locations. For example:

  • Relational database tables
  • XML files
  • JSON files
  • Fixed-width formatted files
  • Spreadsheet files (e.g. Microsoft Excel)

You need to consolidate your data into a single file with rows and columns before you can work with it on a machine learning project. The most common format for representing a machine learning dataset is a CSV (comma-separated values) file. This is because most machine learning algorithms work with data in tabular form (e.g. a matrix, or input and output vectors).

Datasets in R are often represented as a matrix or data frame structure.

The first step of a machine learning project in R is loading your data into R as a matrix or data frame.
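To make the two structures concrete, here is a small sketch that uses the copy of the iris data that ships with base R (in the built-in `datasets` package) instead of a file. It shows the data frame form, then pulls the four numeric measurement columns out as an input matrix and the species column out as an output vector. The names `x` and `y` are illustrative only.

```r
# The iris data ships with base R in the 'datasets' package, so no file is needed
data(iris)

# A data frame: columns may have different types (numeric inputs, factor labels)
print(class(iris))
str(iris)

# Inputs as a numeric matrix: the first four columns are the measurements
x <- as.matrix(iris[, 1:4])
# Output as a vector: the fifth column holds the species labels
y <- iris[, 5]

print(dim(x))      # rows and columns of the input matrix
print(length(y))   # one label per row
print(class(x))
```

A matrix must hold a single type, which is why only the numeric columns go into `x`; a data frame can mix numeric and categorical columns, which is why it is the usual landing structure when loading a CSV file.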

Load CSV Data Files In R

This section provides recipes that you can copy into your own machine learning projects and adapt to load data into R.

Load Data From CSV File

This example shows how to load the iris dataset from a CSV file. The recipe loads a CSV file that has no header row (i.e. no column names) and is located in the current working directory into R as a data frame.
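A minimal sketch of such a recipe, assuming the file is called `iris.csv` (a hypothetical name) and holds the five iris columns with no header row. So that the sketch runs on its own, it first writes such a file from the built-in iris data, but only if no file with that name already exists in the working directory.

```r
# Hypothetical file name; assumed to be a header-less CSV in the working directory
filename <- "iris.csv"

# Demo only: create the file from the built-in iris data
# (no header row, no row names) if it does not already exist
if (!file.exists(filename)) {
  write.table(iris, filename, sep = ",", row.names = FALSE, col.names = FALSE)
}

# Load the CSV file; header = FALSE because the first row is data, not column names
dataset <- read.csv(filename, header = FALSE)

# Optionally replace the default V1..V5 names (assumes exactly five columns)
names(dataset) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# Preview the first rows and the structure of the data frame
print(head(dataset))
str(dataset)
```

The `header = FALSE` argument matters: `read.csv()` assumes a header by default, so without it the first observation would be consumed as column names and you would be left with one row fewer.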


This recipe is useful if you want to store the data locally with your R scripts, such as in a project managed under revision control.

If the data is not in your current working directory, you can either:

  1. Specify the full path to the dataset on your local file system.
  2. Use the setwd() function to set your current working directory to where the dataset is located.
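Both options can be sketched as follows. The data directory is an assumption: so that the example runs anywhere, it uses R's per-session temporary directory from `tempdir()` and writes a header-less copy of the built-in iris data there first. In practice you would point it at your own data folder. `file.path()` joins path parts with forward slashes, which R also accepts on Windows.

```r
# Option 1: pass the full path to read.csv().
# The directory is hypothetical; replace it with where your file lives.
data_dir <- tempdir()
full_path <- file.path(data_dir, "iris.csv")

# Demo only: write a header-less copy of the built-in iris data to that location
write.table(iris, full_path, sep = ",", row.names = FALSE, col.names = FALSE)

dataset1 <- read.csv(full_path, header = FALSE)

# Option 2: change the working directory, then use the bare file name.
old_wd <- setwd(data_dir)   # setwd() invisibly returns the previous directory
print(getwd())              # confirm the new working directory
dataset2 <- read.csv("iris.csv", header = FALSE)
setwd(old_wd)               # restore the original working directory

# Both approaches load the same data
print(identical(dataset1, dataset2))
```

Saving and restoring the previous working directory is a small habit that avoids surprising later code in the same session that relies on relative paths.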

Load Data From CSV URL

This example shows how to load the iris data from a CSV file hosted on the UCI Machine Learning Repository. The recipe loads a CSV file without a header from a URL into R as a data frame.
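A minimal sketch of this recipe, not necessarily the original code. It assumes the iris data is still published at the UCI address below (the repository has reorganized its site in the past, so check the URL) and that the file has no header row. In current versions of R, `read.csv()` accepts a URL in place of a file name, so no extra package is needed for a plain download. The helper name `load_csv_from_url` is hypothetical, and the `tryCatch()` wrapper is there only so the script prints a message instead of stopping when there is no network connection.

```r
# Hypothetical helper: read a header-less CSV from a URL into a data frame
load_csv_from_url <- function(url, col_names = NULL) {
  # read.csv() opens http://, https:// and file:// URLs directly
  dataset <- read.csv(url, header = FALSE)
  # Optionally replace the default V1, V2, ... column names
  if (!is.null(col_names)) names(dataset) <- col_names
  dataset
}

# Assumed location of the iris data on the UCI Machine Learning Repository
iris_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
iris_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# Download and parse; print a message instead of stopping if the download fails
dataset <- tryCatch(
  load_csv_from_url(iris_url, iris_cols),
  error = function(e) {
    message("Could not load data from URL: ", conditionMessage(e))
    NULL
  }
)

# Preview the first rows when the download succeeded
if (!is.null(dataset)) print(head(dataset))
```

Because the same helper also accepts `file://` URLs, you can test it against a local copy of a file before relying on a remote server.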


This recipe is useful if your dataset is stored on a server, such as on your GitHub account. It is also useful if you want to use datasets from the UCI Machine Learning Repository but do not want to store them locally.

Data In Other Formats

You may have data stored in a format other than CSV.

I would recommend that you use standard tools and libraries to convert it to CSV format before working with the data in R. Once converted, you can then use the recipes above to work with it.
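If the tool at hand happens to be R itself, base R can handle some of these conversions. The sketch below takes the fixed-width format from the list earlier in this post and converts it to CSV with `read.fwf()` and `write.csv()`. The column widths, column names and sample rows are made up for the example; formats such as JSON, XML or spreadsheets generally call for a dedicated package or an external tool instead.

```r
# Hypothetical fixed-width data: four 3-character numbers, then a 10-character label
fwf_file <- tempfile(fileext = ".txt")
writeLines(c(
  "5.13.51.40.2setosa    ",
  "7.03.24.71.4versicolor",
  "6.33.36.02.5virginica "
), fwf_file)

# Parse the fixed-width columns into a data frame
dataset <- read.fwf(fwf_file,
                    widths = c(3, 3, 3, 3, 10),
                    col.names = c("Sepal.Length", "Sepal.Width",
                                  "Petal.Length", "Petal.Width", "Species"),
                    strip.white = TRUE)   # trim the padding around the labels

# Write it back out as CSV, without R's row names
csv_file <- tempfile(fileext = ".csv")
write.csv(dataset, csv_file, row.names = FALSE)

# The CSV file can now be loaded with read.csv() like any other
converted <- read.csv(csv_file)
print(converted)
```

Note that `write.csv()` always writes a header row, so the converted file is read back with the default `header = TRUE`.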

Summary

In this short post, you discovered how you can load your data into R.

You learned two recipes for loading data:

  1. Load data from a local CSV file.
  2. Load data from a CSV file located on a server.

Next Step

Did you try out these recipes?

  1. Start your R interactive environment.
  2. Type or copy-and-paste the recipes above and try them out.
  3. Use the built-in help in R to learn more about the functions used.
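For example, the help page for `read.csv()` opens with `?read.csv` or `help("read.csv")` in an interactive session. The snippet below sticks to functions that also behave well in a script: it prints the signature of `read.csv()` and confirms that the function assumes a header row by default, which matters for header-less files such as the iris data used in this post.

```r
# In an interactive session, open the documentation with:
#   ?read.csv
#   help("read.csv")

# Show the signature of read.csv(), including its default arguments
print(args(read.csv))

# read.csv() treats the first row as column names unless told otherwise
header_default <- formals(read.csv)$header
print(header_default)

# List objects on the search path whose names mention "csv"
print(apropos("csv"))
```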

Do you have a question? Ask it in the comments and I will do my best to answer it.

5 Responses to How To Load Your Machine Learning Data Into R

  2. Andrea February 8, 2017 at 5:45 am

    I’m happy with your skip-the-academics approach (and glad to know I don’t have to break down and get that PhD either!)

    But here’s an even more obvious question that I think you should answer on this page: how do you load the sample data that’s already packaged with R (e.g. the “iris” dataset) into your workspace and look at it? I just spent several embarrassing hours tracking down the answer to that question, because most sites don’t cover such preliminary, nuts-and-bolts stuff.

  3. max March 30, 2017 at 5:16 am

    For the Iris dataset the steps are shown. Are the steps the same for the wine dataset? Please help: how do I then implement kNN on the dataset?
