A Gentle Introduction to Lists and Data Frames in R

A vector in R holds elements of a single data type. When you need to mix data types, such as numbers and strings, you can use a list as the container. Lists and data frames are closely related in R, and the data frame is probably the more useful of the two because it mirrors how tabular data is usually collected. In this post, you will learn about both. Specifically, you will learn:

  • What lists and data frames are in R
  • How to manipulate lists and data frames

Let’s get started.


Overview

This post is divided into three parts; they are:

  • Lists in R
  • Data Frames in R
  • Saving and Loading Data Frames

Lists in R

A list in R is very similar to a list in Python in that arbitrary elements can be put in a sequential container. An example is as follows:
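A minimal sketch of such a list is below. The values are illustrative assumptions, not data from this post:

```r
# A list can mix data types freely (illustrative values only)
x <- list("apple", 3.14, TRUE, "banana")

print(x)         # each element is printed under its own [[i]] index
print(length(x)) # number of elements in the list
```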

Here the list has four elements of data types character, numeric, and logical. To access one element of a list, you need to use the double square bracket syntax:
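As a small sketch (again with assumed values), double square brackets return the element itself, while single square brackets return a shorter list:

```r
# Illustrative list; the values are assumptions for demonstration
x <- list("apple", 3.14, TRUE, "banana")

second <- x[[2]]      # the element itself, a numeric value
print(second)
print(class(x[[2]]))  # class of the extracted element
print(class(x[2]))    # single brackets keep the list wrapper
```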

Accessing elements by position is not very readable, though, since you presumably had a reason for grouping these items together. You can instead build a list whose elements have names, as follows:
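One possible named list looks like this. The variable name "person" and its fields are hypothetical, chosen only for illustration:

```r
# Each element gets a name at construction time (hypothetical fields)
person <- list(name = "Alice", age = 30, member = TRUE)

print(names(person))  # the names attached to the elements
print(person)         # elements are now printed under $name, $age, $member
```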

Once you have a list with names, you can access the element with the “$” operator.
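A short sketch of name-based access, using the same hypothetical "person" list:

```r
# Hypothetical named list for illustration
person <- list(name = "Alice", age = 30, member = TRUE)

print(person$name)       # access by name with the $ operator
print(person[["age"]])   # equivalent access with double brackets and a string

person$age <- 31         # the $ operator also works for assignment
print(person$age)
```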

Data Frames in R

Lists in R can hold anything. But if every element of a list is a vector and all the vectors have the same length, you can make it into a data frame as well. An example is as follows:
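A minimal sketch of building a data frame from equal-length vectors is below. The data frame name "x" and the column "days" follow the text of this post, while the "month" column and all values are assumptions:

```r
# Two vectors of equal length become two columns (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  days  = c(31, 28, 31, 30, 31, 30)
)

print(x)        # printed as a table, one row per month
print(dim(x))   # number of rows and columns
str(x)          # compact summary of each column's type
```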

A data frame is like a table of data, so it expects all the columns you specify to have the same number of rows. If you are familiar with pandas in Python, you will recall that pandas also has data frames. With a syntax similar to that of pandas, you can filter a data frame in R:

Note the trailing comma inside the square brackets. It is required to filter on rows: the condition before the comma selects rows, and leaving the part after the comma empty keeps all columns.
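A sketch of row filtering, assuming an illustrative data frame "x" with "month" and "days" columns:

```r
# Illustrative data frame (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  days  = c(31, 28, 31, 30, 31, 30)
)

# Keep only the rows where the condition is TRUE; all columns are kept
long_months <- x[x$days == 31, ]
print(long_months)
```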

An alternative syntax would be to use the subset() function:
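The same filter can be sketched with subset(), again on an assumed data frame:

```r
# Illustrative data frame (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  days  = c(31, 28, 31, 30, 31, 30)
)

# Inside subset(), the column can be named directly, without x$
long_months <- subset(x, days == 31)
print(long_months)
```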

Inside subset(), “days” refers to a column of the data frame “x”, so you can write it without the “x$” prefix. Adding a new column to a data frame is intuitive, much like in pandas:
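For instance, assigning to a column name that does not exist yet creates the column. The "weeks" column here is a hypothetical example:

```r
# Illustrative data frame (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  days  = c(31, 28, 31, 30, 31, 30)
)

# Assigning to a new name adds a column computed from an existing one
x$weeks <- x$days / 7
print(x)
```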

Deleting a column, however, means selecting the remaining columns of the data frame. The subset() function can be used for this as well:

You can pass the select argument either a vector of the column names to keep or the negation of the column names to remove.
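A sketch of dropping a column with the select argument of subset(), shown in both the "keep" and the "remove" form. The "weeks" column is a hypothetical extra column:

```r
# Illustrative data frame with an extra column to remove (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  days  = c(31, 28, 31),
  weeks = c(31, 28, 31) / 7
)

kept    <- subset(x, select = c(month, days))  # name the columns to keep
dropped <- subset(x, select = -weeks)          # or negate the column to remove

print(names(kept))
print(names(dropped))
```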

A data frame is a list of columns. Hence you can change a column name by modifying an element of its names:
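One way to sketch this uses names(), with a hypothetical new name "num_days":

```r
# Illustrative data frame (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  days  = c(31, 28, 31)
)

# names(x) is a character vector; replace the element equal to "days"
names(x)[names(x) == "days"] <- "num_days"
print(names(x))
```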

Saving and Loading Data Frames

Data frames, like other objects in R, can be saved to a file. For example, calling dump() with the variable name “x” and a file name will create a file containing the variable name “x” and its content as R code. You can put multiple variable names into the first argument of the dump() function. To load the file back, you use source(), which will recreate the variables in the current workspace.
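The round trip can be sketched as follows, assuming source() is used to read back what dump() wrote. The file name and the temporary directory are assumptions made for this example:

```r
# Illustrative data frame (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  days  = c(31, 28, 31)
)

# Hypothetical output path inside the session's temporary directory
path <- file.path(tempdir(), "x_dump.R")

dump("x", file = path)  # first argument: character vector of variable names
x.saved <- x            # keep a copy for comparison
rm(x)                   # remove the original variable

source(path)            # runs the dumped R code, recreating "x"
print(x)
```

The dumped file is plain R source text, so you can open it in an editor to see how the object was written out.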

Another way of saving and loading is to consider one object at a time without storing the name of the variable:
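A minimal sketch of this approach, assuming the saveRDS() and readRDS() pair and a hypothetical file name:

```r
# Illustrative data frame (values are assumed)
x <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  days  = c(31, 28, 31)
)

# Hypothetical output path inside the session's temporary directory
path <- file.path(tempdir(), "x.rds")

saveRDS(x, path)             # stores the object only, not its variable name
x.restored <- readRDS(path)  # the caller chooses the new variable name
print(x.restored)
```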

You saved “x” into a file and loaded it back into the variable “x.restored”.

These functions are suitable for generic data objects. Because a data frame has a tabular structure, you may prefer to save it as a CSV file so you can reuse it with other software, such as Excel. Saving a data frame into CSV is as follows:

The variable iris is a data frame built into R that holds the famous iris dataset. The output file irisdata.csv is as follows:
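A sketch using write.csv() on the built-in iris data frame is below. Writing to the temporary directory and passing row.names = FALSE are assumptions made for this example:

```r
# Hypothetical output location; the file name follows the text of this post
path <- file.path(tempdir(), "irisdata.csv")

# row.names = FALSE omits the extra column of row numbers
write.csv(iris, path, row.names = FALSE)

# Peek at the first few lines of the file as plain text
first_lines <- readLines(path, n = 3)
print(first_lines)
```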

To load it back, you use:
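Reading the file back can be sketched with read.csv(). The example rewrites the file first so that it runs on its own; the path is an assumption:

```r
# Write the file first so the example is self-contained (hypothetical path)
path <- file.path(tempdir(), "irisdata.csv")
write.csv(iris, path, row.names = FALSE)

# read.csv() returns a data frame, using the header line for column names
df <- read.csv(path)
print(head(df))
print(dim(df))
```

Note that in R 4.0 and later, read.csv() reads the Species column as character by default, whereas it is a factor in the built-in iris data frame. Pass stringsAsFactors = TRUE if you need a factor.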

Further Readings

As exercises, you can try:

  • Save the CO2 data frame into a CSV file and load it back, where “CO2” is a data frame built into R
  • In a single R command, extract the “conc” column of the CO2 data frame into a vector and find its mean (answer: 435)
  • In a single R command, find the mean of “conc” among the rows of the CO2 data frame where “conc” is below 200 (answer: 135)

Summary

In this post, you learned how to deal with one of the most important data objects in R, the data frame. Specifically, you learned:

  • A list is a container for heterogeneous data types; optionally, you can assign a name to each element.
  • A data frame is a list of column vectors that all have the same number of “rows”.
  • You can manipulate data frames by filtering on rows and operating on columns.
