As a developer, you can pick up R super fast.

If you are already a developer, you don’t need to know much about a new language to be able to read and understand code snippets and write your own small scripts and programs.

In this post you will discover the basic syntax, data structures and control structures that you need to know to start reading and writing R scripts.

Let’s get started.

## R Syntax is Different, But The Same

The syntax in R looks confusing, but only to begin with.

It is an older LISP-style language inspired by an even older language (S). The assignment syntax is probably the strangest thing you will see. Assignment uses the arrow (**<-**) rather than a single equals (=).

R has all of your familiar control flow structures like if-then-else, for loops and while loops.

You can create your own functions and libraries of helper functions for your scripts.

If you have done any scripting before, like JavaScript, Python, Ruby, BASH or similar, then you will pick up R very quickly.

## You Can Already Program, Just Learn the R Syntax

As a developer, you already know how to program.

You can take a problem and think up the type of procedure and data structures you need. The language you are using is just a detail. You only need to map your idea of the solution onto the specifics of the language you are using.

This is how you can get started using R very quickly.

To get started, you need to know the absolute basics. Basics such as:

- How do we assign data to variables?
- How do we work with different data types?
- How do we work with the data structures for handling data?
- How do we use the standard flow control structures?
- How do we work with functions and third-party packages?

You learn the answers to these questions by looking at code examples. You can then:

- Map third-party code you’re reading onto those examples to better understand it.
- Pattern the code you write from scratch on the examples.

Let’s take a quick tour of the basic syntax of R.

## R Crash Course For Developers (Start Here)

In this section we will take a quick look at the basic syntax used in R.

After reading (and ideally working through) the examples in this section, you will have enough background as a developer to start reading and understanding other people’s R code.

You will also have the confidence to start writing your own small R scripts.

The examples in this section are split into the following parts:

- Assignment
- Data Structures
- Flow Control
- Functions
- Packages

Start the R interactive environment (type R on the command line) and let’s get started.

### 1. Assignment

Assignment in R uses the arrow operator (<-).

Below are examples of assigning an integer, double, string and a boolean, and printing each out to the console in turn.

```r
> # integer (the L suffix makes an integer; a bare 23 is stored as a double)
> i <- 23L
> i
[1] 23
> # double
> d <- 2.3
> d
[1] 2.3
> # string
> s <- 'hello world'
> s
[1] "hello world"
> # boolean
> b <- TRUE
> b
[1] TRUE
```

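Remember, prefer the arrow (<-) over equals (=) for assignment. A single equals also works for assignment at the top level, but the arrow is the R convention and avoids confusion with naming arguments in function calls. Using equals out of habit is one of the most common slips new R programmers make.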
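If you want to check what type an assignment actually produced, the built-in class() function reports it. The short sketch below is illustrative only; the variable names (x_double, x_integer, result) are made up for this example. It also shows the other role of the equals sign, which is naming an argument inside a function call.

```r
# A bare number literal is stored as a double; the L suffix requests an integer.
x_double  <- 23
x_integer <- 23L
class(x_double)    # "numeric"
class(x_integer)   # "integer"

# Strings and booleans report their own classes.
class('hello world')   # "character"
class(TRUE)            # "logical"

# Inside a function call, = names an argument rather than creating a variable.
result <- mean(x = c(1, 2, 3))
result   # 2
```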

### 2. Data Structures

There are four data structures that you will use the most in R:

- Vectors
- Lists
- Matrices
- Data Frames

#### Lists

Lists provide a group of named items, not unlike a map.

```r
# create a list of named items
a <- list(aa=1, bb=2, cc=3)
a
a$aa
# add a named item to a list
a$dd <- 4
a
```

You can define a new list with the list() function. A list can be initialized with values or empty. Note that the named values in the list can be accessed using the dollar operator ($). Once referenced, they can be read or written. This is also how new items can be added to the list.
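As a small sketch of the points above, the example below starts from an empty list and then writes and reads items with the dollar operator. The list name (settings) and its items are hypothetical, chosen only for illustration.

```r
# Start with an empty list and add named items one at a time.
settings <- list()
settings$alpha <- 0.5
settings$label <- "demo"

# Read an item with $ or with [[ ]] and the name as a string.
settings$alpha        # 0.5
settings[["label"]]   # "demo"

# Overwrite an existing item, then inspect what the list holds.
settings$alpha <- 0.9
names(settings)       # "alpha" "label"
length(settings)      # 2
```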

#### Vectors

Vectors are ordered sequences of values that all share the same type (if you mix types, R coerces them to a common type):

```r
> # create a vector using the c() function
> v <- c(98, 99, 100)
> v
[1]  98  99 100
> v[1:2]
[1] 98 99
> # create a vector from a range of integers
> r <- (1:10)
> r
 [1]  1  2  3  4  5  6  7  8  9 10
> r[5:10]
[1]  5  6  7  8  9 10
> # add a new item to the end of a vector
> v <- c(1, 2, 3)
> v[4] <- 4
> v
[1] 1 2 3 4
```

Notice that vectors are 1-indexed (indexes start at 1, not 0).

You will use the *c()* function a lot to concatenate variables into a vector.
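A brief sketch of what concatenation with c() looks like in practice follows. The variable names (first, second, combined, mixed) are invented for this example. The last part shows what happens when values of different types are combined in one vector.

```r
# c() flattens its arguments into a single vector.
first  <- c(1, 2, 3)
second <- c(4, 5)
combined <- c(first, second, 6)
combined           # 1 2 3 4 5 6
length(combined)   # 6

# Indexing starts at 1; a negative index drops that position.
combined[1]        # 1
combined[-1]       # 2 3 4 5 6

# Mixing types coerces every element to a common type.
mixed <- c(1, "two", TRUE)
class(mixed)       # "character"
mixed              # "1" "two" "TRUE"
```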

#### Matrices

A matrix is a table of data. It has dimensions (rows and columns) and the columns can be named.

```r
> # Create a 2-row, 3-column matrix with named headings
> data <- c(1, 2, 3, 4, 5, 6)
> headings <- list(NULL, c("a","b","c"))
> m <- matrix(data, nrow=2, ncol=3, byrow=TRUE, dimnames=headings)
> m
     a b c
[1,] 1 2 3
[2,] 4 5 6
> m[1,]
a b c 
1 2 3 
> m[,1]
[1] 1 4
```

Many useful plotting functions and machine learning algorithms require the data to be provided as a matrix.

Note the syntax to index into rows [1,] and columns [,1] of a matrix.
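The same bracket syntax extends to single cells and to named columns. The sketch below rebuilds the matrix from the example above so that it runs on its own; the variable name sub is hypothetical.

```r
# Rebuild the 2-row, 3-column matrix with column names a, b, c.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

dim(m)        # 2 3 (rows, columns)
m[2, 3]       # 6, the cell in row 2, column 3
m[, "b"]      # 2 5, a column selected by name

# Selecting several columns returns a smaller matrix.
sub <- m[, c("a", "c")]
dim(sub)      # 2 2
```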

#### Data Frame

Data frames are useful for actually representing tables of your data in R.

```r
# create a new data frame
years <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)
df[,1]
df$years
```

A matrix is a much simpler structure, intended for mathematical operations. A data frame is more suited to representing a table of data and is expected by modern implementations of machine learning algorithms in R.

Note that you can index into rows and columns of a data frame just like you can for a matrix. Also note that you can reference a column using its name (*df$years*).

Another data structure you could go on to learn about is the array.
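To make the indexing concrete, here is a small sketch that rebuilds the data frame from the example above and selects rows, columns and a filtered subset. The names high and passed are hypothetical, introduced only for this illustration.

```r
# Rebuild the example data frame.
years  <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)

df[2, ]           # the second row, returned as a one-row data frame
df[, "scores"]    # 34 44 83, a column selected by name

# Keep only the rows where the score is above 40.
high <- df[df$scores > 40, ]
nrow(high)        # 2

# Assigning to a new name with $ adds a column.
df$passed <- df$scores > 40
ncol(df)          # 3
```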

### 3. Flow Control

R supports all the same flow control structures that you are used to.

- If-Then-Else
- For Loop
- While Loop

As a developer, these are all self-explanatory.

#### If-Then-Else

```r
# if then else
a <- 66
if (a > 55) {
  print("a is more than 55")
} else {
  print("a is less than or equal to 55")
}
# [1] "a is more than 55"
```

#### For Loop

```r
# for loop
mylist <- c(55, 66, 77, 88, 99)
for (value in mylist) {
  print(value)
}
# [1] 55
# [1] 66
# [1] 77
# [1] 88
# [1] 99
```

#### While Loop

```r
# while loop
a <- 100
while (a < 500) {
  a <- a + 100
}
a
# [1] 500
```

### 4. Functions

Functions let you group code and call that code repeatedly with arguments.

The three main concerns with functions are:

- Calling Functions
- Help For Functions
- Writing Custom Functions

#### Call Functions

You have already used one function, the c() function for concatenating objects into a vector.

R has many built-in functions, and additional functions can be provided by installing and loading third-party packages.

Here is an example of using a statistical function to calculate the mean of a vector of numbers:

```r
# call function to calculate the mean of a vector of numbers
numbers <- c(1, 2, 3, 4, 5, 6)
mean(numbers)
# [1] 3.5
```
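Other built-in functions are called the same way. The sketch below shows a few of them on the same vector, plus two calls that pass a named argument. The variable names (clean_mean, rounded) are hypothetical.

```r
numbers <- c(1, 2, 3, 4, 5, 6)

sum(numbers)      # 21
max(numbers)      # 6
median(numbers)   # 3.5
length(numbers)   # 6

# Many functions accept named arguments that change their behavior.
clean_mean <- mean(c(1, NA, 3), na.rm = TRUE)   # ignore the missing value
clean_mean        # 2
rounded <- round(3.14159, digits = 2)
rounded           # 3.14
```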

#### Help for Functions

You can get help with a function in R by using the question mark operator (?) followed by the function name.

```r
# help with the mean() function
?mean
help(mean)
```

Alternatively, you can call the *help()* function and pass the function name you need help with as an argument (e.g. *help(mean)*).

You can get example usage of a function by calling the example() function and passing the name of the function as an argument.

```r
# example usage of the mean function
example(mean)
```

#### Custom Functions

You can define your own functions that may or may not take arguments or return a result.

Below is an example of a custom function to calculate and return the sum of three numbers:

```r
# define custom function
mysum <- function(a, b, c) {
  total <- a + b + c   # named total to avoid shadowing the built-in sum()
  return(total)
}
# call custom function
mysum(1,2,3)
# [1] 6
```
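Functions can also take no arguments at all, or give some arguments default values. The sketch below is illustrative only; greeting and scale_by are hypothetical names, not functions from any package.

```r
# A function with no arguments that returns a fixed value.
greeting <- function() {
  "hello"   # the last evaluated expression is the return value
}

# A function with a default value for its second argument.
scale_by <- function(x, factor = 2) {
  x * factor
}

greeting()                  # "hello"
scale_by(10)                # 20, uses the default factor
scale_by(10, factor = 3)    # 30
scale_by(c(1, 2, 3))        # 2 4 6, arithmetic works element by element
```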

### 5. Packages

Packages are the way that third-party R code is distributed. The Comprehensive R Archive Network (CRAN) provides hosting and listing of third-party R packages that you can download.

#### Install a Package

You can install a package hosted on CRAN by calling the install.packages() function. In an interactive session it may then pop up a dialog asking which mirror you would like to download the package from.

For example, here is how you can install the caret package, which is very useful in machine learning:

```r
# install the caret package
install.packages("caret")
# load the package
library(caret)
```
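If you rerun a script often, you may prefer to install a package only when it is missing. The sketch below is one possible approach, not the only one. The helper names (is_installed, install_if_missing) are hypothetical, and the repos value assumes you are happy to use the cloud.r-project.org CRAN mirror, which also means no mirror dialog is needed.

```r
# Hypothetical helper: TRUE if the named package is already installed.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")   # TRUE, the stats package ships with R

# Hypothetical helper: install a package only when it is missing.
# The default mirror below is an assumption; substitute your preferred one.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!is_installed(pkg)) {
    install.packages(pkg, repos = repos)
  }
  invisible(is_installed(pkg))
}

# Uncomment to install caret when it is not already present:
# install_if_missing("caret")
```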

#### Help For Package

A package can provide a lot of new functions. You can read up on a package on its CRAN page, but you can also get help for the package within R using the library function.

```r
# help for the caret package
library(help="caret")
```

## 5 Things To Remember

Here are five quick tips to remember when getting started in R:

- **Assignment**. By convention, R uses the arrow operator (<-) for assignment rather than a single equals (=).
- **Case Sensitive**. The R language is case sensitive, meaning that C() and c() are two different function calls.
- **Help**. You can get help on any operator or function using the help() function or the ? operator, and search the help system by keyword using the double question mark operator (??).
- **How To Quit**. You can exit the R interactive environment by calling the q() function.
- **Documentation**. R installs with a lot of useful documentation. You can review it in the browser by typing *help.start()*.

## Get a Reference Book

There are many great resources online for learning more about how to use R.

I recommend grabbing a good reference text and keeping it close by. I use and recommend R in a Nutshell.

## Summary

In this post you took a crash course in basic R syntax.

As a developer, you now know enough to read other people’s R scripts.

You also have the tools to start writing your own little scripts in the R interactive environment.

## Next Step

Did you work through all of the examples?

- Start R.
- Work through the tutorial.
- Let me know how you went (leave a comment).

Do you have any questions? Is there something else you would like covered?

Leave a comment and let me know.

Thanks for the quick reference. I would like to know whether we should go with R or Python for ML, as we know that Python is fast and does not have memory issues. Please reply.

Python is great because you can develop the models in the same language that you deploy them in.

R has the most and the most powerful machine learning algorithms on the platform, but is more suited for R&D and one-off projects.

I hope that helps.

Thank you very much, it is very helpful for beginners.

You’re welcome Manjunath. I want you and all visitors to become awesome at machine learning.

Hello Jason,

Can you please help with the following 3 questions:

a) Some people say the parent environment is the enclosing environment but some say the parent environment is the function calling environment, which one is correct ?

b) How can you achieve function overloading since you don’t have types ?

c) How do modelling and graphing functions achieve dynamic scoping and why do they have to be dynamically scoped ?

Thank you,

Tim

Tim, I don’t know. I teach machine learning and not the finer points of R programming.

You might be better served with a book on R programming, such as: The Art of R Programming: A Tour of Statistical Software Design

Good evening Jason,

Thank you for getting back to me, it’s well appreciated.

I have consulted some books but none of them explain it to a level which gives me the ‘Aha’ stimulus. I will however check out the book you highlighted.

Even ‘=’ also works as an assignment operator.

What is the real difference between arrow and = ?

Great question. The “=” can give you surprising behavior sometimes, which is why I strongly suggest you avoid using it. Convention in R is “<-”. This post may clear it up if you want tech details: http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html

Jason

I am doing my PhD in clinical data mining using genetic algorithms. Which language would you suggest: Python, R or MATLAB?

Hi Sreejith,

I think Python is great for programmers building systems.

I think R is great for R&D and going deep into data/stats/models.

I think matlab is great for learning but not for doing.

Thanks Jason

Thanks Jason

I’m glad you found it useful Tounsi.

can i get any small project on machine learning from you ?

Here are some project ideas Amit:

http://machinelearningmastery.com/tour-of-real-world-machine-learning-problems/