Super Fast Crash Course in R (for developers)

As a developer, you can pick up R super fast.

If you are already a developer, you don’t need to know much about a new language to be able to read and understand code snippets and write your own small scripts and programs.

In this post you will discover the basic syntax, data structures and control structures that you need to know to start reading and writing R scripts.

Let’s get started.

R Crash Course For Developers
Photo by hackNY.org, some rights reserved.

R Syntax is Different, But The Same

The syntax in R looks confusing, but only at first.

R is a dialect of the older S language, and its design also borrows ideas from Scheme, a LISP dialect. The assignment syntax is probably the strangest thing you will see: assignment conventionally uses the arrow (<-) rather than a single equals (=).

R has all of your familiar control flow structures, like if-then-else, for loops and while loops.

You can create your own functions and libraries of helper functions for your scripts.

If you have done any scripting before, like JavaScript, Python, Ruby, BASH or similar, then you will pick up R very quickly.

You Can Already Program, Just Learn the R Syntax

As a developer, you already know how to program.

You can take a problem and think up the type of procedure and data structures you need. The language you are using is just a detail. You only need to map your idea of the solution onto the specifics of the language you are using.

This is how you can get started using R very quickly.

To get started, you only need the absolute basics, such as:

  • How do we assign data to variables?
  • How do we work with different data types?
  • How do we work with the data structures for handling data?
  • How do we use the standard flow control structures?
  • How do we work with functions and third-party packages?

You learn the answers to these questions by looking at code examples. You can then:

  • Map third-party code you’re reading onto those examples to better understand it.
  • Model the code you write from scratch on the examples.

Let’s take a quick tour of the basic syntax of R.

R Crash Course For Developers (Start Here)

In this section we will take a quick look at the basic syntax used in R.

After reading (and ideally working through) the examples in this section, you will have enough background as a developer to start reading and understanding other people’s R code.

You will also have the confidence to start writing your own small R scripts.

The examples in this section are split into the following subsections:

  1. Assignment
  2. Data Structures
  3. Flow Control
  4. Functions
  5. Packages

Start the R interactive environment (type R on the command line) and let’s get started.

1. Assignment

The key to assignment in R is the arrow operator (<-).

Below are examples of assigning an integer, double, string and a boolean, and printing each out to the console in turn.
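
A minimal sketch of these four assignments (the variable names and values are illustrative). Note the L suffix: it is what makes a number an integer rather than a double in R.

```r
# integer: the L suffix stores 23 as an integer rather than a double
i <- 23L
print(i)
# double: plain numeric literals are doubles by default
d <- 23.23
print(d)
# string: R calls this type "character"
s <- "hello world"
print(s)
# boolean: R calls this type "logical"; the values are TRUE and FALSE
b <- TRUE
print(b)
# class() reports the type of a value
print(class(i))
print(class(d))
```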

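Remember, the arrow (<-) is the idiomatic way to assign in R. A single equals (=) also assigns at the top level, but inside a function call it names an argument instead, so mixing the two up is a common source of confusion for new R programmers.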

2. Data Structures

There are four data structures that you will use the most in R:

  1. Vectors
  2. Lists
  3. Matrices
  4. Data Frames

Lists

Lists hold a group of items that are usually named, not unlike a map or dictionary. Unlike a vector, a list can hold items of different types.

You can define a new list with the list() function. A list can be initialized with values or empty. Note that the named values in the list can be accessed using the dollar operator ($). Once referenced, they can be read or written. This is also how new items can be added to the list.
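
A minimal sketch of creating, reading, updating and extending a list with list() and the dollar operator; the names and values are illustrative only.

```r
# an empty list
empty <- list()
# a list initialized with named values of different types
person <- list(name = "alice", age = 30L, langs = c("R", "Python"))
# read a named item with the dollar operator
print(person$name)
# overwrite an existing item
person$age <- 31L
# assigning to a name that does not exist yet adds a new item
person$city <- "Sydney"
print(names(person))
print(length(person))
```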

Vectors

Vectors are ordered sequences of values that all share the same type. If you combine values of different types, R coerces them to a single common type.

Notice that vectors are 1-indexed (indexes start at 1, not 0).

You will use the c() function a lot to concatenate variables into a vector.
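
A minimal sketch of building a vector with c(), indexing it from 1, and seeing type coercion when types are mixed; the values are illustrative.

```r
# c() concatenates values into a vector
v <- c(98, 99, 100)
# indexing starts at 1, so this is the first element
print(v[1])
# assigning one position past the end grows the vector
v[4] <- 101
print(length(v))
# a number, a string and a logical are all coerced to character
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))
```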

Matrices

A matrix is a two-dimensional table of values that all share the same type. It has dimensions (rows and columns), and both the rows and the columns can be named.

Many useful plotting functions and machine learning algorithms require the data to be provided as a matrix.

Note the syntax to index into rows [1,] and columns [,1] of a matrix.
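
A minimal sketch of creating a small matrix, naming its columns, and indexing rows and columns; the dimensions and column names are illustrative. By default matrix() fills the values column by column.

```r
# a 2x3 matrix filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2, ncol = 3)
# name the columns
colnames(m) <- c("a", "b", "c")
print(dim(m))
# [1, ] selects the first row, [, 1] selects the first column
print(m[1, ])
print(m[, 1])
# a column can also be selected by its name
print(m[, "b"])
```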

Data Frame

Data frames are useful for actually representing tables of your data in R.

A matrix is a much simpler structure, intended for mathematical operations. A data frame is better suited to representing a table of data and is expected by modern implementations of machine learning algorithms in R.

Note that you can index into the rows and columns of a data frame just as you can for a matrix. You can also reference a column by its name using the dollar operator (for example, df$years).
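
A minimal sketch of a data frame with a years column, matching the df$years reference above; the second column and all values are illustrative. Each vector passed to data.frame() becomes a column named after the variable.

```r
# equal-length vectors, each of which becomes a column
years <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)
print(df)
# index rows and columns as with a matrix
print(df[2, ])
print(df[, 1])
# reference the same column by name
print(df$years)
print(nrow(df))
```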

Another data structure you could go on to learn about is the array, which generalizes a matrix to more than two dimensions.

3. Flow Control

R supports all the same flow control structures that you are used to.

  1. If-Then-Else
  2. For Loop
  3. While Loop

As a developer, you will find these all self-explanatory.

If-Then-Else
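
A minimal sketch of if-then-else; the threshold is illustrative. When writing a script, keep else on the same line as the closing brace of the if block, otherwise R treats the if statement as already finished.

```r
a <- 66
# choose a message based on a single logical condition
if (a > 55) {
  msg <- "a is more than 55"
} else {
  msg <- "a is less than or equal to 55"
}
print(msg)
```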

For Loop
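
A minimal sketch of a for loop; the values are illustrative. In R, for iterates directly over the elements of a vector, much like a for-each loop in other languages.

```r
values <- c(55, 66, 77, 88, 99)
total <- 0
# visit each element of the vector in turn
for (value in values) {
  print(value)
  total <- total + value
}
print(total)
```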

While Loop
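
A minimal sketch of a while loop that keeps adding 100 until the condition fails; the start and limit values are illustrative.

```r
a <- 100
# repeat while the condition holds: 100, 200, 300, 400, then stop at 500
while (a < 500) {
  a <- a + 100
}
print(a)
```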

4. Functions

Functions let you group code and call that code repeatedly with arguments.

The three main concerns with functions are:

  1. Calling Functions
  2. Help For Functions
  3. Writing Custom Functions

Call Functions

You have already used one function, the c() function for concatenating objects into a vector.

R has many built-in functions, and additional functions can be provided by installing and loading third-party packages.

Here is an example of using a statistical function to calculate the mean of a vector of numbers:
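
A minimal sketch using the built-in mean() function on a small, illustrative vector.

```r
# a vector of numbers
numbers <- c(1, 2, 3, 4, 5, 6)
# call the built-in mean() function on it
result <- mean(numbers)
print(result)
```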

Help for Functions

You can get help with a function in R by using the question mark operator (?) followed by the function name (e.g. ?mean).

Alternatively, you can call the help() function and pass the function name you need help with as an argument (e.g. help(mean)).

You can get example usage of a function by calling the example() function and passing the name of the function as an argument.

Custom Functions

You can define your own functions that may or may not take arguments or return a result.

Below is an example of a custom function to calculate and return the sum of three numbers:
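
A minimal sketch of such a function; the name sum3 and its parameter names are hypothetical, chosen for illustration.

```r
# hypothetical helper: add three numbers and return the result
sum3 <- function(x, y, z) {
  total <- x + y + z
  # return() is optional; R returns the value of the last expression anyway
  return(total)
}
result <- sum3(1, 2, 3)
print(result)
```

Parameters named x, y and z are used rather than a, b and c so that a parameter does not shadow the c() function inside the body.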

5. Packages

Packages are the way that third-party R code is distributed. The Comprehensive R Archive Network (CRAN) hosts and lists third-party R packages that you can download.

Install a Package

You can install a package hosted on CRAN by calling the install.packages() function. If you have not already chosen a CRAN mirror, R will ask you which mirror you would like to download the package from.

For example, here is how you can install the caret package, which is very useful in machine learning: install.packages("caret")

Help For Package

A package can provide a lot of new functions. You can read up on a package on its CRAN page, but you can also get help for the package within R using the library() function, for example library(help = "caret").

5 Things To Remember

Here are five quick tips to remember when getting started in R:

  • Assignment. R conventionally uses the arrow operator (<-) for assignment rather than a single equals (=).
  • Case Sensitive. The R language is case sensitive, meaning that C() and c() are two different function calls.
  • Help. You can get help on any operator or function using the help() function or the ? operator, and search all installed help pages, including those of packages, using the double question mark operator (??).
  • How To Quit. You can exit the R interactive environment by calling the q() function.
  • Documentation. R installs with a lot of useful documentation. You can review it in the browser by typing: help.start()

Get a Reference Book

There are many great resources online for learning more about how to use R.

I recommend grabbing a good reference text and keeping it close by. I use and recommend R in a Nutshell.

Summary

In this post you took a crash course in basic R syntax.

As a developer, you now know enough to read other people’s R scripts.

You also have the tools to start writing your own little scripts in the R interactive environment.

Next Step

Did you work through all of the examples?

  1. Start R.
  2. Work through the tutorial.
  3. Let me know how you went (leave a comment)

Do you have any questions? Is there something else you would like covered?

Leave a comment and let me know.

16 Responses to Super Fast Crash Course in R (for developers)

  1. vikas January 25, 2016 at 4:47 pm

    Thanks for quick reference. I would like to know, whether we should go with R or Python for ML, as we know that Python is fast and do not have memory issue. Please reply.

    • Jason Brownlee January 25, 2016 at 4:53 pm

      Python is great because you can develop the models in the same language that you deploy them in.

      R has the most and the most powerful machine learning algorithms on the platform, but is more suited for R&D and one-off projects.

      I hope that helps.

  2. Manjunath TK August 30, 2016 at 4:28 pm

    Thank you very much,it is very helpful for the beginners.

    • Jason Brownlee August 31, 2016 at 8:44 am

      You’re welcome Manjunath. I want you and all visitors to become awesome at machine learning.

  3. Tim Ndlovu September 16, 2016 at 11:09 pm

    Hello Jason,

    Can you please help with the following 3 questions:
    a) Some people say the parent environment is the enclosing environment but some say the parent environment is the function calling environment, which one is correct ?
    b) How can you achieve function overloading since you don’t have types ?
    c) How do modelling and graphing functions achieve dynamic scoping and why do they have to be dynamically scoped ?

    Thank you,
    Tim

    • Jason Brownlee September 17, 2016 at 9:32 am

      Tim, I don’t know. I teach machine learning and not the finer points of R programming.

      You might be better served with a book on R programming, such as: The Art of R Programming: A Tour of Statistical Software Design

      • Tim Ndlovu September 22, 2016 at 6:26 am

        Good evening Jason,
        Thank you for getting back to me, it’s well appreciated.
        I have consulted some books but non of them explain it to a level which gives me the ‘Aha’ stimulus. I will however check out the book you highlighted

  4. Mr y October 5, 2016 at 7:59 pm

    Even ‘=’ also working as an assignment operator
    What is the real difference between arrow and = ?

  5. Sreejith November 3, 2016 at 4:52 pm

    Jason
    I am doing my PhD in Clinical data mining using genetic algorithms.Which language will u suggest python R or matlab

    • Jason Brownlee November 4, 2016 at 9:04 am

      Hi Sreejith,

      I think Python is great for programmers building systems.

      I think R is great for R&D and going deep into data/stats/models.

      I think matlab is great for learning but not for doing.

      • Sreejith November 5, 2016 at 9:29 pm

        Thanks Jason

  6. Tounsi Youssef December 6, 2016 at 12:16 am

    Thanks Jason

  7. Amit January 25, 2017 at 5:30 pm

    can i get any small project on machine learning from you ?
