An Introduction to R

R is a one-of-a-kind programming language. It is a language for statistics, and its ecosystem has libraries for all kinds of statistical tasks. It is targeted at statisticians rather than computer scientists, so you will see some unorthodox patterns in the language. In this post, you will learn about R. Specifically, you will learn:

  • Why R is important
  • How to install R on your computer

Let’s get started.


Overview

This post is divided into three parts; they are:

  • What is R?
  • Getting Started with R
  • Data in R

What is R?

R is a programming language for statistical computing and graphics. Almost as famous as the language itself is RStudio, an IDE that runs on top of R and lets you visualize your computation. R is free and open-source software, available for Windows, macOS, and Linux. It is widely used by statisticians, data scientists, and machine learning engineers because its rich ecosystem supports virtually all kinds of statistical analysis.

The syntax of R is simple, but it may not be intuitive to people who come from other programming languages, because it is close to mathematical notation. One of the most recognizable pieces of syntax is probably the assignment operator, as in X <- 42.

This assigns the value 42 to a variable X. The <- operator reflects how some mathematics literature denotes assignment, intentionally avoiding confusion with “=” for equality comparison. Note, however, that = is also allowed for assignment in later versions of R, but by convention we stay with <-.
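As a minimal sketch of the two assignment forms described above, the following can be typed at the R prompt. The second variable, Y, is introduced here only for comparison and is not part of the original example.

```r
# Assign 42 to X using the conventional arrow operator
X <- 42

# The same assignment written with =, which R also accepts
Y = 42

print(X)
# Both forms store the same value
print(identical(X, Y))
```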

Over the years, R has gained many features that make it usable as a general-purpose programming language. But it has also become the go-to language for many statisticians, so you will find it powerful for statistical analysis because of the packages and libraries available. Examples of analyses include:

  • Linear regression
  • Logistic regression
  • Time series analysis
  • Hypothesis testing
  • Data mining

To learn more, you can refer to the R documentation.

Getting Started with R

To get started with R, you need to install it on your computer. You can download R from the R Project website:

  • https://www.r-project.org/

Installers for macOS and Windows are available there. On Linux, R is usually available as a package in your distribution. For example, on Debian and Ubuntu, you can install the r-base package with the command sudo apt install r-base.

Once you have installed R, you can open it by typing R (uppercase) in a terminal window. To quit R, type q() at the prompt.

In addition, you can download and install RStudio from the RStudio website. It is a GUI that makes it easier to write and debug R code and to visualize R data.

Screenshot of RStudio

Once you are in R, you will be at the R prompt, where you can type R commands. To learn more about R commands, you can type help() at the prompt. A simple example of an R command is the expression 2 + 2.

This command adds 2 and 2 and prints the result, which is 4. Vectors are also native in R, so in the same way you can add two vectors of equal length, and R returns their element-wise sum.
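The two kinds of addition described above can be sketched as follows. The vector values and the variable names (total, a, b, vec_sum) are chosen here for illustration only.

```r
# Scalar addition
total <- 2 + 2
print(total)    # [1] 4

# Element-wise addition of two numeric vectors built with c()
a <- c(1, 2, 3)
b <- c(4, 5, 6)
vec_sum <- a + b
print(vec_sum)  # [1] 5 7 9
```

The [1] at the start of each printed line is R's index of the first element shown on that line.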

Data in R

R can work with data in a variety of structures. The most common data structures in R are:

  • Lists: A list is a collection of data objects.
  • Matrices: A matrix is a rectangular array of data.
  • Data frames: A data frame is a table of data with rows and columns.
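As a rough sketch, each of the three structures listed above can be created with a base R function. All names and values below are made up for illustration.

```r
# A list is a collection of objects, which may have different types
person <- list(name = "Ada", age = 36, scores = c(90, 85))

# A matrix is a rectangular array whose elements share one type
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# A data frame is a table: each column is a vector, each row is a record
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"))

str(person)      # show the structure of the list
print(dim(m))    # number of rows and columns of the matrix
print(nrow(df))  # number of rows in the data frame
```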

For example, if you have a table of data saved as a CSV file, you can use the read.csv() function to read it into a data frame, which you might name df.
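A minimal, self-contained sketch of reading a CSV file is shown below. Since no particular data file is assumed here, the example first writes a small temporary file; the file contents and column names are hypothetical.

```r
# Create a small CSV file so the example is self-contained
# (the file and its columns are made up for illustration)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height,weight", "Alice,165,55", "Bob,180,75"), csv_path)

# Read the CSV file into a data frame named df
df <- read.csv(csv_path)

print(df)
str(df)  # column names and types inferred from the file
```

With your own data, you would pass the path of your file to read.csv() instead of the temporary path.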

If you have used pandas in Python, this syntax will look familiar. Note that in read.csv(), the dot . is just a character in the name of the function. Allowing a dot in an identifier is another way R differs from many other programming languages. As another example, you can create a matrix in R with a call such as matrix(1:9, ncol = 3).

Here, 1:9 is R’s shortcut for a vector of ascending integers, and matrix() reshapes that vector into a matrix of three columns. R also has a built-in function, det(), to compute the determinant of the resulting 3×3 matrix.
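Putting the two steps together, a short sketch might look like this. The variable names A and d are assumptions made for the example.

```r
# Fill a matrix column by column with the integers 1 to 9
A <- matrix(1:9, ncol = 3)
print(A)

# Compute the determinant of the 3x3 matrix
d <- det(A)
print(d)
```

Note that this particular matrix is singular (its third column equals twice the second minus the first), so its determinant is zero mathematically. Because det() works in floating point, the printed value may be a very small number rather than exactly 0.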

You can always learn more about a function in R using help(det), or search for a particular function with help.search("keyword"), where the keyword is given as a quoted string.

Summary

R is a powerful programming language for statistical computing and graphics. It is free and open-source software with a large community of users. R is a great language to learn if you are interested in data science.
