You need to be able to load data into R when working on a machine learning problem.
In this short post, you will discover how you can load your data files into R and start your machine learning project.
Let’s get started.

Access To Your Data
The most common way to work with data in machine learning is in data files.
Data may originally be stored in all manner of formats and diverse locations. For example:
- Relational database tables
- XML files
- JSON files
- Fixed-width formatted files
- Spreadsheet files (e.g. Microsoft Excel)
You need to consolidate your data into a single file with rows and columns before you can work with it on a machine learning project. The standard format for representing a machine learning dataset is a CSV file. This is because machine learning algorithms, for the most part, work with data in tabular format (e.g. a matrix or input and output vectors).
Datasets in R are often represented as a matrix or data frame structure.
The first step of a machine learning project in R is loading your data into R as a matrix or data frame.
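As a small illustration of these two structures, the sketch below builds a tiny data frame by hand and converts its numeric columns to a matrix. The values and column names are made up for the example and are not taken from a real file.

```r
# a small hand-made data frame: columns may hold different types
# (values and column names are illustrative assumptions)
df <- data.frame(
  sepal_length = c(5.1, 4.9, 4.7),
  sepal_width  = c(3.5, 3.0, 3.2),
  species      = c("setosa", "setosa", "setosa")
)

# inspect the structure: one row per observation, one column per attribute
str(df)

# a matrix holds a single type, so keep only the numeric input columns
X <- as.matrix(df[, c("sepal_length", "sepal_width")])

# the output column stays as a separate vector
y <- df$species

# number of rows and columns in the input matrix
dim(X)
```

A data frame is usually the more convenient starting point because it can mix numeric and text columns, while a matrix is useful when a function expects purely numeric input.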
Load CSV Data Files In R
This section provides recipes that you can copy into your own machine learning projects and adapt to load data into R.
Load Data From CSV File
This example shows how to load the iris dataset from a CSV file. The recipe loads a CSV file without a header (i.e. no column names) located in the current directory into R as a data frame.
```r
# define the filename
filename <- "iris.csv"
# load the CSV file from the local directory
dataset <- read.csv(filename, header=FALSE)
# preview the first 6 rows
head(dataset)
```
Running this recipe, you will see:
```
   V1  V2  V3  V4          V5
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
6 5.4 3.9 1.7 0.4 Iris-setosa
```
This recipe is useful if you want to store the data locally with your R scripts, such as in a project managed under revision control.
If the data is not in your local directory, you can either:
- Specify the full path to the dataset on your local environment.
- Use the setwd() function to set your current working directory to where the dataset is located.
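The two options can be sketched as follows. To keep the example self-contained it first writes a two-row headerless CSV file into a temporary directory; the directory name `data_dir` and the file contents are assumptions made for illustration, so substitute the real location of your dataset.

```r
# create a throwaway directory and a small headerless CSV file to load
# (data_dir and the file contents are made up for this illustration)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("5.1,3.5,1.4,0.2,Iris-setosa",
             "4.9,3.0,1.4,0.2,Iris-setosa"),
           file.path(data_dir, "iris.csv"))

# option 1: pass the full path to read.csv()
full_path <- file.path(data_dir, "iris.csv")
dataset1 <- read.csv(full_path, header = FALSE)

# option 2: change the working directory, then use the bare filename
old_wd <- setwd(data_dir)   # setwd() returns the previous directory
dataset2 <- read.csv("iris.csv", header = FALSE)
setwd(old_wd)               # restore the original working directory

# both approaches load the same data
identical(dataset1, dataset2)
```

Passing a full path leaves the working directory untouched, which tends to be less surprising in longer scripts. If you do call setwd(), restoring the previous directory afterwards avoids side effects on later code.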
Load Data From CSV URL
This example shows the loading of the iris data from a CSV file located on the UCI Machine Learning Repository. This recipe will load a CSV file without a header from a URL into R as a data frame.
```r
# load the library
library(RCurl)
# specify the URL for the Iris data CSV
urlfile <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# download the file
downloaded <- getURL(urlfile, ssl.verifypeer=FALSE)
# treat the text data as a stream so we can read from it
connection <- textConnection(downloaded)
# parse the downloaded data as CSV
dataset <- read.csv(connection, header=FALSE)
# preview the first 6 rows
head(dataset)
```
Running this recipe, you will see:
```
   V1  V2  V3  V4          V5
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
6 5.4 3.9 1.7 0.4 Iris-setosa
```
This recipe is useful if your dataset is stored on a server, such as on your GitHub account. It is also useful if you want to use datasets from the UCI Machine Learning Repository but do not want to store them locally.
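The parsing step of this recipe can be tried without a network connection. The sketch below uses a short string in place of the downloaded text and also assigns column names while reading, since the file has no header. The column names are my own choice for illustration and do not come from the file itself.

```r
# CSV text held in memory, standing in for downloaded content
# (two rows copied from the iris output above)
csv_text <- "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa"

# wrap the string in a connection so read.csv() can read it like a file
connection <- textConnection(csv_text)

# parse the text as CSV and name the columns while reading
# (these names are illustrative; the file itself has no header)
dataset <- read.csv(connection, header = FALSE,
                    col.names = c("sepal_length", "sepal_width",
                                  "petal_length", "petal_width", "species"))

# release the connection once the data has been read
close(connection)

# confirm the column names and types
str(dataset)
```

Naming the columns up front makes later code easier to read than referring to the default V1 to V5 names.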
Data In Other Formats
You may have data stored in a format other than CSV.
I would recommend that you use standard tools and libraries to convert it to CSV format before working with the data in R. Once converted, you can then use the recipes above to work with it.
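As one possible way to do such a conversion inside R itself, the sketch below reads a small fixed-width file with read.fwf() from base R and saves it as a headerless CSV file. The file layout, field widths, and values are made up for illustration; a real file would need its own widths.

```r
# write a small fixed-width file to a temporary location
# (the layout and values are made up for this illustration)
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("5.13.5setosa    ",
             "4.93.0setosa    ",
             "7.03.2versicolor"), fwf_file)

# read it by giving the width of each field: 3, 3 and 10 characters
dataset <- read.fwf(fwf_file, widths = c(3, 3, 10),
                    col.names = c("sepal_length", "sepal_width", "species"),
                    strip.white = TRUE)

# save as CSV without row names or a header, matching the recipes above
csv_file <- tempfile(fileext = ".csv")
write.table(dataset, csv_file, sep = ",",
            row.names = FALSE, col.names = FALSE)

# load the converted file with the CSV recipe
converted <- read.csv(csv_file, header = FALSE)
head(converted)
```

write.table() is used here rather than write.csv() because write.csv() always writes a header row, and the recipes above expect a file without one.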
Summary
In this short post, you discovered how you can load your data into R.
You learned two recipes for loading data:
- Load data from a local CSV file.
- Load data from a CSV file located on a server.
Next Step
Did you try out these recipes?
- Start your R interactive environment.
- Type or copy-and-paste the recipes above and try them out.
- Use the built-in help in R to learn more about the functions used.
Do you have a question? Ask it in the comments and I will do my best to answer it.
I’m happy with your skip-the-academics approach (and glad to know I don’t have to break down and get that PhD either!)
But here’s an even more obvious question that I think you should answer on this page: how do you load the sample data that’s already packaged with R (e.g. the “iris” dataset) into your workspace and look at it? I just spent several embarrassing hours tracking down the answer to that question, because most sites don’t cover such preliminary, nuts-and-bolts stuff.
I cover how to work with built-in datasets in R in this post (including iris):
https://machinelearningmastery.com/machine-learning-datasets-in-r/
I hope that helps.
The steps are shown for the Iris dataset. Are the steps the same for the wine dataset? Please help: how would I then implement kNN on that dataset?