Logic, Flow Control, and Functions in R

By Adrian Tam on August 28, 2023 in R for Data Science

R is a procedural programming language, so it offers the same flow control constructs found in many other languages. Indeed, its flow control syntax resembles that of Java and C. In this post, you will see some examples of flow control in R.

Let’s get started.

Photo by Cris DiNoto. Some rights reserved.

Overview

This post is in three parts; they are:

Finding Primes
The Sieve of Eratosthenes
Sum of the Most Consecutive Primes

Finding Primes

Let’s start with a simple problem: Find the list of all primes below a certain number N.

The first prime is 2. Any integer larger than 2 is prime if it is not divisible by any prime smaller than it. This simple definition translates into an R program as follows:

# find all primes below a number
pmax <- 1000       # upper limit to find primes

# Initialize a vector to store the primes
primes <- c()

# Loop over all integers
for (i in 2:pmax) {
  # Check if the integer is divisible by any of the primes already found
  isPrime <- TRUE
  for (j in primes) {
    if (i %% j == 0) {
      isPrime <- FALSE
      break
    }
  }

  # If the integer is prime, add it to the primes vector
  if (isPrime) {
    primes <- c(primes, i)
  }
}

# Print the primes
print(primes)

If you can run it successfully, you will see the following output:

  [1]   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61
 [19]  67  71  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151
 [37] 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
 [55] 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
 [73] 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
 [91] 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
[109] 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
[127] 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
[145] 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
[163] 967 971 977 983 991 997

The algorithm in the code above is as follows: you scan from 2 to pmax (inclusive), and for each number i, you use another for-loop to check whether any prime j found so far divides it. If i %% j == 0, then i is not a prime, so you set isPrime to FALSE and stop checking with break.

If i passes the check, it is appended to the vector primes at the end of the iteration. When the program ends, this vector holds all the primes up to the upper limit.
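
Because arithmetic in R works on whole vectors, the inner loop over j can also be written as a single expression. The following is a minimal sketch of that idea, not a replacement for the code above; it uses a smaller limit of 50 just to keep the output short:

```r
# Trial division with a vectorized divisibility check
pmax <- 50
primes <- integer(0)   # empty vector to collect the primes

for (i in 2:pmax) {
    # i %% primes computes the remainder against every known prime at once.
    # any() of an empty vector is FALSE, so 2 is accepted as the first prime.
    if (!any(i %% primes == 0)) {
        primes <- c(primes, i)
    }
}

print(primes)
```

This prints the 15 primes up to 50. Unlike the loop with break, the vectorized check always tests against all known primes, so it trades the early exit for shorter code.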

From the above, you see some basic R language features. Conditional branching in R has the syntax:

if (expression) {
    statement1
} else {
    statement2
}

This syntax is like that of JavaScript, including that the semicolon at the end of each statement is optional.
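
As a small illustration of the syntax (the variable names x and parity are made up for this example), note that an if-else in R can also produce a value, and that at the top level of a script the else keyword has to follow the closing brace on the same line:

```r
x <- 7

# if-else as a statement: exactly one of the two branches runs
if (x %% 2 == 0) {
    cat(x, "is even\n")
} else {
    cat(x, "is odd\n")
}

# if-else as an expression: the value of the chosen branch is assigned
parity <- if (x %% 2 == 0) "even" else "odd"
print(parity)
```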

The condition is expected to be Boolean. Hence you can use a logical variable such as isPrime above, or a comparison such as i %% j == 0. The operator %% computes the remainder of a division (modulo). The common R operators are listed below, from highest to lowest precedence:

:: :::		access variables in a namespace
$ @		component / slot extraction
[ [[		indexing
^		exponentiation (right to left)
- +		unary minus and plus
:		sequence operator
%any% |>	special operators (including %% and %/%)
* /		multiply, divide
+ -		(binary) add, subtract
< > <= >= == !=	ordering and comparison
!		negation
& &&		and
| ||		or
~		as in formulae
-> ->>		rightwards assignment
<- <<-		assignment (right to left)
=		assignment (right to left)
?		help (unary and binary)

You can find this table in R with the help command “?Syntax” (note the uppercase S in “Syntax”).
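
A few one-liners can make the precedence table concrete. This is only a sketch with arbitrary numbers; the variable names are placeholders:

```r
a <- -2^2         # ^ binds tighter than unary minus, so this is -(2^2)
b <- 1:3 * 2      # : binds tighter than *, so this is (1:3) * 2
c1 <- 2 + 1:3     # : binds tighter than +, so this is 2 + (1:3)
d <- (2 + 1):3    # parentheses force the addition to happen first
e <- 7 %% 3 * 2   # %% binds tighter than *, so this is (7 %% 3) * 2

print(a)    # -4
print(b)    # 2 4 6
print(c1)   # 3 4 5
print(d)    # 3
print(e)    # 2
```

The third and fourth lines show a common pitfall: the sequence operator is evaluated before addition, so a range with a computed starting point needs parentheses.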

In C and Java, you may recall the ternary operator “condition ? value_true : value_false”. It is an operator because it only returns a value (either value_true or value_false, depending on the truth value of the condition) rather than executing a large chunk of code. R offers something similar as a function:

ifelse(condition, value.true, value.false)

But you should not confuse it with the if-else statement.
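
One way to see the difference: ifelse() is vectorized, so it evaluates the condition element by element and returns a vector of the same length, whereas the if-else statement expects a single TRUE or FALSE. A minimal sketch with made-up data:

```r
x <- c(3, 8, 11, 14)

# The condition is a logical vector; ifelse() picks a value for each element
labels <- ifelse(x %% 2 == 0, "even", "odd")
print(labels)   # "odd" "even" "odd" "even"
```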

Furthermore, you can chain if statements with else if, in a syntax similar to C or Java:

if (condition) {
    statement1
} else if (condition) {
    statement2
}
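
For instance, a chain of conditions can classify a number by its sign. The following sketch uses made-up names (signs, label) and ends with a plain else as the catch-all branch:

```r
signs <- character(0)
for (x in c(-3, 0, 5)) {
    if (x < 0) {
        label <- "negative"
    } else if (x == 0) {
        label <- "zero"
    } else {
        label <- "positive"
    }
    signs <- c(signs, label)
}
print(signs)   # "negative" "zero" "positive"
```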

However, R has no switch statement. Instead, switch() is a function: it returns the value whose name matches the first argument, or the final unnamed value if nothing matches. The syntax is like the following:

y <- "fruit"
letseat <- switch(y, fruit="apple", veg="broccoli", "nothing")
cat(sprintf("Let's eat %s\n", letseat))
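
To see the default value and two other behaviors of switch() in action, here is a short sketch. The extra names (choices, kind, second) are only for illustration:

```r
# An unmatched string falls back to the final unnamed value
choices <- character(0)
for (y in c("fruit", "veg", "meat")) {
    choices <- c(choices, switch(y, fruit="apple", veg="broccoli", "nothing"))
}
print(choices)   # "apple" "broccoli" "nothing"

# A name with no value falls through to the next value that is given
kind <- switch("lime", lime=, lemon="citrus", apple="pome", "unknown")
print(kind)      # "citrus"

# With a number as the first argument, switch() selects by position
second <- switch(2, "first", "second", "third")
print(second)    # "second"
```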

You also saw in the previous example how a for-loop is created in R: you provide a vector, and the loop visits its elements one by one. The for-loop is not required to iterate over integers; the code above is just an example.

Inside a loop, you can always terminate it early with the break statement, or skip to the next iteration with the next statement. The next section shows another example.
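
As a quick sketch of both points, the loop below iterates over a character vector rather than integers, skips empty strings with next, and stops at a sentinel value with break. The data and the name seen are made up for this example:

```r
words <- c("alpha", "", "beta", "stop", "gamma")
seen <- character(0)

for (w in words) {
    if (w == "") {
        next    # skip empty strings and move on to the next element
    }
    if (w == "stop") {
        break   # leave the loop; "gamma" is never visited
    }
    seen <- c(seen, w)
}

print(seen)   # "alpha" "beta"
```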

The Sieve of Eratosthenes

The previous example of finding primes is slow if you set the limit to a higher value (e.g., one million). A faster algorithm is the Sieve of Eratosthenes, at the expense of slightly more memory. The idea is to find one prime at a time and, once a prime is found, exclude all its multiples from the list of prime candidates.

The implementation of the Sieve of Eratosthenes in R is as follows:

# find primes using the Sieve of Eratosthenes

# Create a vector of all TRUE
pmax <- 1000
primality <- rep(TRUE, pmax)

# run the Sieve
primality[1] <- FALSE
for (i in 1:pmax) {
    if (!primality[i]) {
        next
    }
    if (i*i > pmax) {
        break
    }
    for (j in seq(i*i, pmax, by=i)) {
        primality[j] <- FALSE
    }
}

# find the indices that are TRUE
primes <- which(primality)
print(primes)

This code should produce the same output as the previous one.

In the code above, you see how the next and break statements control the flow inside a for-loop. You can also see how the rep() function creates a vector of identical values (TRUE) and how the seq() function creates a vector of uniformly spaced values from i*i to pmax.

At the end of the code, you used the which() function to find the indices where the vector’s value is TRUE. In R, vector indices start at 1. Hence the first element of primality is set to FALSE (since 1 is not considered prime) before the for-loop starts.
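
These three functions can be tried in isolation on a tiny vector. The sketch below (with a made-up name, flags) also shows that a vector of indices can be used for assignment directly, which is an alternative to the inner loop over j:

```r
flags <- rep(TRUE, 10)            # ten copies of TRUE
multiples <- seq(4, 10, by=2)     # 4 6 8 10

flags[1] <- FALSE                 # indices start at 1
flags[multiples] <- FALSE         # assign to several positions at once

print(multiples)
print(which(flags))               # 2 3 5 7 9
```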

There are a lot of built-in functions in R. The code above shows you a few and you can learn some of the most common functions from the “R Reference Card”.

Sum of the Most Consecutive Primes

Writing a program as above is enough for many projects, but when you run into a larger problem, you may want to structure your program into functional blocks. R not only provides built-in functions but also lets you create your own.

Let’s consider a slightly larger program. This is Problem 50 from Project Euler: you want to find the prime below one million that can be written as the sum of the most consecutive primes. For example, the sum of the first 6 primes is 2+3+5+7+11+13=41, and 41 is a prime. The solution is 997651, which is the sum of 543 consecutive primes.

Since you have a way to generate the primes up to one million, you can scan the vector of primes, accumulate a running sum for as long as it stays below one million, and check whether each sum is itself a prime. At the same time, you need to keep track of the longest sum that fits the criteria.

Here is how you can solve this problem in R:

# Project Euler #50

# return a vector of primes up to a limit
getprimes <- function(pmax) {
    primality <- rep(TRUE, pmax)
    primality[1] <- FALSE
    # run the Sieve of Eratosthenes
    for (i in 1:pmax) {
        if (!primality[i]) {
            next
        }
        if (i*i > pmax) {
            break
        }
        for (j in seq(i*i, pmax, by=i)) {
            primality[j] <- FALSE
        }
    }
    # return the indices that are TRUE
    return(which(primality))
}

# find the longest sum that is a prime
pmax <- 1000000
primes <- getprimes(pmax)
count_max <- 0
ans <- -1
for (i in 1:(length(primes)-1)) {
    total <- primes[i]    # running sum of primes[i], primes[i+1], ...
    count <- 1
    # parentheses are required: i+1:n would mean i+(1:n)
    for (j in (i+1):length(primes)) {
        total <- total + primes[j]
        count <- count + 1
        if (total > pmax) {
            break
        }
        if ((total %in% primes) && (count > count_max)) {
            ans <- primes[i:j]
            count_max <- count
        }
    }
}
print(ans)
print(length(ans))
print(sum(ans))

You can see that a custom function is built to return the vector of all primes. A function is defined with the function() syntax, and return() passes a value back to the caller. When you call the function as in primes <- getprimes(pmax), whatever is passed back by return() is assigned to the variable.
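
Two more features of R functions are worth a quick sketch: arguments can have default values, and if there is no explicit return(), the value of the last evaluated expression is returned. The helper count_below() is hypothetical and not part of the program above:

```r
# Count how many values are below a limit; the limit defaults to 100
count_below <- function(values, limit=100) {
    # no return() here: the last expression is the function's value
    sum(values < limit)
}

some_primes <- c(2, 3, 5, 7, 11, 101, 103)
print(count_below(some_primes))            # 5
print(count_below(some_primes, limit=6))   # 3
```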

The rest of the code above should be familiar to you: it is built from for-loops and if statements. You should also see how the answer is recorded and updated in the loop.
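
If you want to check the logic on a smaller limit, one option is to wrap the search in a function as well. The following is a sketch under some assumptions: longest_prime_sum() and the lookup vector is_prime are names invented here, getprimes() is a compact rewrite of the function above, and the primality check uses a logical lookup table instead of %in%:

```r
# Sieve of Eratosthenes, compacted from the function above
getprimes <- function(pmax) {
    primality <- rep(TRUE, pmax)
    primality[1] <- FALSE
    for (i in 1:pmax) {
        if (!primality[i]) next
        if (i*i > pmax) break
        primality[seq(i*i, pmax, by=i)] <- FALSE
    }
    which(primality)
}

# Hypothetical helper: the longest run of consecutive primes
# whose sum is itself a prime below the limit
longest_prime_sum <- function(limit) {
    primes <- getprimes(limit)
    is_prime <- rep(FALSE, limit)   # lookup table: is_prime[n] is TRUE if n is prime
    is_prime[primes] <- TRUE
    best <- integer(0)
    for (i in seq_along(primes)) {
        total <- 0
        for (j in i:length(primes)) {
            total <- total + primes[j]
            if (total >= limit) break
            # j - i + 1 is the number of primes in the current run
            if (is_prime[total] && (j - i + 1) > length(best)) {
                best <- primes[i:j]
            }
        }
    }
    best
}

best <- longest_prime_sum(100)
print(best)        # 2 3 5 7 11 13
print(sum(best))   # 41
```

With a limit of 100, this reproduces the example from the problem statement: six consecutive primes that add up to 41.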

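One subtle issue deserves attention: the for-loop on i runs only up to length(primes)-1, while the for-loop on j starts at i+1. This keeps the range of j valid, because in R a range such as 5:2 or 5:5 is legal: the former is a descending sequence and the latter a single-element vector. If i were allowed to reach length(primes), the range for j would start beyond the end of the vector and count downward instead of being empty. Note also that the : operator binds tighter than +, so a range starting at i+1 has to be written with parentheses, as in (i+1):length(primes).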
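
The behavior of the sequence operator is easy to check for yourself. In the sketch below, the counters runs_colon and runs_seq are made-up names; seq_len() is a built-in function that returns an empty vector when its argument is 0:

```r
print(5:2)   # 5 4 3 2
print(5:5)   # 5

n <- 0

# 1:0 is the descending sequence c(1, 0), so this loop runs twice
runs_colon <- 0
for (k in 1:n) {
    runs_colon <- runs_colon + 1
}

# seq_len(0) is empty, so this loop does not run at all
runs_seq <- 0
for (k in seq_len(n)) {
    runs_seq <- runs_seq + 1
}

print(runs_colon)   # 2
print(runs_seq)     # 0
```

When the upper bound of a loop may be zero, seq_len() or seq_along() is therefore a safer choice than 1:n.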

If you run the code correctly, you should see the following output:

  [1]    7   11   13   17   19   23   29   31   37   41   43   47   53   59   61
 [16]   67   71   73   79   83   89   97  101  103  107  109  113  127  131  137
 [31]  139  149  151  157  163  167  173  179  181  191  193  197  199  211  223
 [46]  227  229  233  239  241  251  257  263  269  271  277  281  283  293  307
 [61]  311  313  317  331  337  347  349  353  359  367  373  379  383  389  397
 [76]  401  409  419  421  431  433  439  443  449  457  461  463  467  479  487
 [91]  491  499  503  509  521  523  541  547  557  563  569  571  577  587  593
[106]  599  601  607  613  617  619  631  641  643  647  653  659  661  673  677
[121]  683  691  701  709  719  727  733  739  743  751  757  761  769  773  787
[136]  797  809  811  821  823  827  829  839  853  857  859  863  877  881  883
[151]  887  907  911  919  929  937  941  947  953  967  971  977  983  991  997
[166] 1009 1013 1019 1021 1031 1033 1039 1049 1051 1061 1063 1069 1087 1091 1093
[181] 1097 1103 1109 1117 1123 1129 1151 1153 1163 1171 1181 1187 1193 1201 1213
[196] 1217 1223 1229 1231 1237 1249 1259 1277 1279 1283 1289 1291 1297 1301 1303
[211] 1307 1319 1321 1327 1361 1367 1373 1381 1399 1409 1423 1427 1429 1433 1439
[226] 1447 1451 1453 1459 1471 1481 1483 1487 1489 1493 1499 1511 1523 1531 1543
[241] 1549 1553 1559 1567 1571 1579 1583 1597 1601 1607 1609 1613 1619 1621 1627
[256] 1637 1657 1663 1667 1669 1693 1697 1699 1709 1721 1723 1733 1741 1747 1753
[271] 1759 1777 1783 1787 1789 1801 1811 1823 1831 1847 1861 1867 1871 1873 1877
[286] 1879 1889 1901 1907 1913 1931 1933 1949 1951 1973 1979 1987 1993 1997 1999
[301] 2003 2011 2017 2027 2029 2039 2053 2063 2069 2081 2083 2087 2089 2099 2111
[316] 2113 2129 2131 2137 2141 2143 2153 2161 2179 2203 2207 2213 2221 2237 2239
[331] 2243 2251 2267 2269 2273 2281 2287 2293 2297 2309 2311 2333 2339 2341 2347
[346] 2351 2357 2371 2377 2381 2383 2389 2393 2399 2411 2417 2423 2437 2441 2447
[361] 2459 2467 2473 2477 2503 2521 2531 2539 2543 2549 2551 2557 2579 2591 2593
[376] 2609 2617 2621 2633 2647 2657 2659 2663 2671 2677 2683 2687 2689 2693 2699
[391] 2707 2711 2713 2719 2729 2731 2741 2749 2753 2767 2777 2789 2791 2797 2801
[406] 2803 2819 2833 2837 2843 2851 2857 2861 2879 2887 2897 2903 2909 2917 2927
[421] 2939 2953 2957 2963 2969 2971 2999 3001 3011 3019 3023 3037 3041 3049 3061
[436] 3067 3079 3083 3089 3109 3119 3121 3137 3163 3167 3169 3181 3187 3191 3203
[451] 3209 3217 3221 3229 3251 3253 3257 3259 3271 3299 3301 3307 3313 3319 3323
[466] 3329 3331 3343 3347 3359 3361 3371 3373 3389 3391 3407 3413 3433 3449 3457
[481] 3461 3463 3467 3469 3491 3499 3511 3517 3527 3529 3533 3539 3541 3547 3557
[496] 3559 3571 3581 3583 3593 3607 3613 3617 3623 3631 3637 3643 3659 3671 3673
[511] 3677 3691 3697 3701 3709 3719 3727 3733 3739 3761 3767 3769 3779 3793 3797
[526] 3803 3821 3823 3833 3847 3851 3853 3863 3877 3881 3889 3907 3911 3917 3919
[541] 3923 3929 3931
[1] 543
[1] 997651

This tells you that 997651 is the sum of 543 consecutive primes.

Summary

In this post, you learned some R programming syntax and how to define your own R functions through examples. Specifically, you learned:

How to create loops and branches
How to control the flow in loops using next and break
How to create and use a custom function
