Chapter 2 Programming primers

Chapter lead author: Pepa Aran

2.1 Learning objectives

After you’ve gone over the lecture and solved the exercises, you should be able to:

  • Use loops, conditional statements and functions in your code
  • Write clean, well-styled and structured code
  • Look for help

2.2 Tutorial

2.2.1 Programming basics

In this section, we will review the most basic programming elements (conditional statements, loops, functions…) for the R syntax.

2.2.1.1 Conditional statements

In cases where we want certain statements to be executed or not, depending on a criterion, we can use conditional statements if, else if, and else. Conditionals are an essential feature of programming and available in all languages. The R syntax for conditional statements looks like this:

if (temp < 0.0){
  is_frozen <- TRUE
}

The evaluation of the criterion inside the round brackets (here (temp < 0.0)) has to return either TRUE or FALSE. Whenever the statement between brackets is TRUE, the chunk of code between the subsequent curly brackets is executed. You can also write a conditional that covers all possibilities, like this:

temp <- -0.5
if (temp < 0.0){
  is_frozen <- TRUE
} else {
  is_frozen <- FALSE
}

When the temperature is below 0, the first chunk of code is executed. Whenever it is greater than or equal to 0 (i.e., the condition returns FALSE), the second chunk of code is evaluated.

You can also write more than two conditions, covering several cases:

is_frozen <- FALSE
just_cold <- FALSE
if (temp < 0.0){
  is_frozen <- TRUE
} else if (temp < 10){
  just_cold <- TRUE
}
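The branches are checked from top to bottom, and only the first one whose condition is TRUE is executed, so the order of the conditions matters. As a small sketch, reusing the value temp <- -0.5 from above (which satisfies both temp < 0.0 and temp < 10):

```r
temp <- -0.5        # satisfies both conditions below
is_frozen <- FALSE
just_cold <- FALSE
if (temp < 0.0){
  is_frozen <- TRUE   # executed: first condition that is TRUE
} else if (temp < 10){
  just_cold <- TRUE   # skipped, although temp < 10 is also TRUE
}
is_frozen  # TRUE
just_cold  # FALSE
```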

Note: In the code chunks above, indentation was used to highlight which parts belong together, which makes the code easier to read. Indentation is not evaluated by R (unlike in some other programming languages, e.g., Python), but it helps to make the code easier to understand.

2.2.1.2 Loops

Loops are essential for solving many common tasks. for and while loops let us repeatedly execute the same set of commands, while changing an index or counter variable to take a sequence of different values. The following example calculates the sum of elements in the vector vec_temp by iteratively adding them together.

vec_temp <- seq(10)  # equivalent to 1:10
temp_sum <- 0        # initialize sum
for (idx in seq(length(vec_temp))){
  temp_sum <- temp_sum + vec_temp[idx]
}
temp_sum
## [1] 55

Of course, this is equivalent to just using the sum() function.

sum(vec_temp)
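A side note on the loop index: seq(length(vec_temp)) works for the vector above, but it misbehaves for an empty vector, because seq(0) counts down from 1 to 0. The base R function seq_along() is a commonly used alternative that avoids this. A minimal sketch (vec_empty is a hypothetical name used only for illustration):

```r
vec_empty <- numeric(0)       # a vector of length 0

seq(length(vec_empty))        # [1] 1 0     -> a for loop would run twice
seq_along(vec_empty)          # integer(0)  -> a for loop would not run at all

# for non-empty vectors, both give the same indices
vec_temp <- seq(10)
identical(seq(length(vec_temp)), seq_along(vec_temp))   # TRUE
```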

Instead of directly telling R how many iterations it should do we can also define a condition and use a while-loop. As long as the condition is TRUE, R will continue iterating. As soon as it is FALSE, R stops the loop. The following lines of code do the same operation as the for loop above. What is different? What is the same?

idx <- 1          # initialize counter
temp_sum <- 0     # initialize sum
while (idx <= 10){
  temp_sum <- temp_sum + vec_temp[idx]
  idx <- idx + 1
}
temp_sum
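A while loop is most useful when the number of iterations is not known in advance. As an illustrative sketch (the threshold of 20 is an arbitrary assumption), the following counts how many elements of vec_temp have to be added before the sum exceeds 20:

```r
vec_temp <- seq(10)
idx <- 0          # number of elements added so far
temp_sum <- 0     # running sum

# stop as soon as the sum exceeds 20, or when the vector is used up
while (temp_sum <= 20 && idx < length(vec_temp)){
  idx <- idx + 1
  temp_sum <- temp_sum + vec_temp[idx]
}
idx       # 6
temp_sum  # 21
```

The second condition (idx < length(vec_temp)) guards against reading past the end of the vector if the threshold is never reached.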

2.2.1.3 Functions

Often, analyses require many steps and your scripts may get excessively long. An important aspect of good programming is to avoid duplicating code. If the same sequence of multiple statements or functions is to be applied repeatedly, then it is usually advisable to bundle them into a new function and apply this single function to each object. This also has the advantage that if some requirement or variable name changes, it has to be edited in only one place. A further advantage of writing functions is that you can give the function an intuitively understandable name, so that your code reads like a sequence of orders given to a human.

For example, the following code, converting temperature values provided in Fahrenheit to degrees Celsius, could be turned into a function.

# not advisable
temp_soil <- (temp_soil - 32) * 5 / 9
temp_air  <- (temp_air  - 32) * 5 / 9
temp_leaf <- (temp_leaf - 32) * 5 / 9

Functions are a set of instructions encapsulated within curly brackets ({}) that generate a desired outcome. Functions contain four main elements:

  • a name that describes their purpose,
  • arguments, which are the objects passed to the function as input,
  • a body, enclosed by curly brackets as in function(x){ ... }, which holds the code of the function,
  • and lastly, within the body, a return statement indicating the output of the function.

Below, we define our own function f2c():

# advisable
f2c <- function(temp_f){
  temp_c <- (temp_f - 32) * 5 / 9
  return(temp_c)
}

temp_soil <- f2c(temp_soil)
temp_air  <- f2c(temp_air)
temp_leaf <- f2c(temp_leaf)

Functions are essential for efficient programming. Functions have their own environment, which means that variables created inside a function are only defined and usable within that function and are not saved to the global environment. Functions thus restrict the scope in which variables are defined. Although R allows a function to read variables from the global environment, it is good practice to let information flow into the function only through its arguments, and out of the function only through its returned value.
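To illustrate the point about scope, the sketch below redefines f2c() as above and checks that temp_c does not exist outside of the function after calling it. The name temp_boiling is chosen for illustration only, and the check assumes that no object called temp_c was created in your session beforehand.

```r
f2c <- function(temp_f){
  temp_c <- (temp_f - 32) * 5 / 9   # temp_c lives only inside the function
  return(temp_c)
}

temp_boiling <- f2c(212)   # the result leaves the function via return()
temp_boiling               # 100

exists("temp_c")           # FALSE: temp_c was not saved to the global environment
```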

Functions (particularly long ones) can be written to separate source files with the suffix .R and saved in your ./R directory. “Written” here means copying the function text, as in the code chunk above, into a text file with the .R suffix. Preferably, the file has the same name as the function. We can save the previous function in a script ./R/f2c.R and load it later by running source("./R/f2c.R"). It’s good practice to keep one file per function, unless a function calls another function that is called nowhere else. In that case, the “subordinate” function can be placed inside the same .R file.
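As a sketch of this workflow, the following writes the function to a file and loads it with source(). To avoid touching your project, it uses a temporary directory (tempdir()) in place of ./R; the names dir_r and path_f2c are for illustration only.

```r
# create a stand-in for the ./R directory inside R's temporary directory
dir_r <- file.path(tempdir(), "R")
dir.create(dir_r, showWarnings = FALSE)

# write the function definition to a file named after the function
path_f2c <- file.path(dir_r, "f2c.R")
writeLines(
  c("f2c <- function(temp_f){",
    "  temp_c <- (temp_f - 32) * 5 / 9",
    "  return(temp_c)",
    "}"),
  con = path_f2c
)

# load the function from the file (note the .R suffix) and use it
source(path_f2c)
f2c(32)   # 0
```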

2.2.2 Style your code

Nice code is clean, readable, consistent, and extensible (easily modified or adapted). Ugly code works, but is hard to work with. There is no right or wrong about coding style, but certain aspects make it easier to read and use code. Here are a few points to consider.

2.2.2.1 Spaces and breaks

Adding enough white spaces and line breaks in the right locations greatly helps the legibility of code. Cramming variables, operators, and brackets without spaces leaves an unintelligible sequence of characters and it will not be clear what parts go together. Therefore, consider the following points:

  • Use spaces around operators (=, +, -, <-, >, etc.).
  • Use <-, not =, for assigning a value to a variable.
  • Code inside curly brackets should be indented (recommended: two white spaces at the beginning of each line for each indentation level - don’t use tabs).

For example:

if (temp > 5.0){
  growth_temp <- growth_temp + temp  
}

2.2.2.2 Variable naming

It is recommended to use concise and descriptive variable names. Different variable naming styles are in use. In this course, we use lowercase letters, and underscores (_) to separate words within a variable name. Avoid periods (.) in names, as they have a special role for certain types of objects in R (for example, in S3 method names such as print.data.frame). Also, avoid naming your objects after common functions and variables, since your re-definition will mask the already defined objects.

For example, df_daily is a data frame with data at a daily resolution. Or clean_daily is a function that cleans daily data. Note that a verb is used as a name for a function and an underscore (_) is used to separate words.

It is also recommended to avoid variable names consisting of only one character. Single-letter names make it practically impossible to search for that variable.

# Good
day_01

# Bad
DayOne
day.one
first_day_of_the_month
djm1

# Very bad
mean <- function(x) sum(x)/length(x) # mean() itself is already a function
T <- FALSE # T is an abbreviation of TRUE
c <- 10 # c() is used to create a vector (example <- c(1, 2, 3))
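Why re-using T as a name is risky can be seen in a short sketch: unlike TRUE, which is a reserved word, T is an ordinary variable that can be overwritten.

```r
T <- FALSE     # masks the built-in T
isTRUE(T)      # FALSE: code relying on T meaning TRUE now silently misbehaves

rm(T)          # remove our definition; the built-in T is visible again
isTRUE(T)      # TRUE

# TRUE <- FALSE  # by contrast, this raises an error, because TRUE is reserved
```

This is one reason to always spell out TRUE and FALSE in your own code.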

2.2.2.3 Script style

Load libraries at the very beginning of a script, followed by reading data or functions from files. Functions should be defined in separate .R files, unless they are only a few lines long. Then, place the sequence of statements. The name of the script should be short and concise and indicate what the script does.

Use comments to describe in human-readable text what the code does. Comments are all that appears to the right of a # and are code parts that are not interpreted by R and not executed. Adding comments in the code greatly helps you and others to read your code, understand what it does, modify it, and resolve errors (bugs).

To visually separate parts of a script, use commented lines (e.g., #----). The RStudio text editor automatically recognizes ---- added to the right end of a commented line and interprets it as the start of a section, which can be navigated using the document outline (Outline button in the top right corner of the editor panel).
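Put together, a script following these recommendations could look like the sketch below. It is only an illustration: it assumes the airquality example data that ships with R (in the datasets package, with temperature in Fahrenheit in the column Temp), and it defines the short function f2c() inline because it is only a few lines long.

```r
# Load libraries ----
library(datasets)   # ships with R; provides the airquality example data

# Define functions ----
# convert temperature from Fahrenheit to degrees Celsius
f2c <- function(temp_f){
  temp_c <- (temp_f - 32) * 5 / 9
  return(temp_c)
}

# Read data ----
df_daily <- airquality   # daily air quality measurements

# Analysis ----
df_daily$temp_c <- f2c(df_daily$Temp)   # add temperature in degrees Celsius
mean(df_daily$temp_c)                   # mean temperature in degrees Celsius
```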

Avoid reading entire workspace environments (e.g., load("old_environment.RData")), deleting environments (rm(list = ls())), loading hidden dependencies (e.g., .Rprofile), or changing the working directory (setwd("~/other_dir/")) as part of a script.

Note that information about the author of the script, its creation date, and modifications, etc. should not be added as commented text in the file. In Chapter 7, we will learn about the code versioning control system git, which keeps track of all this as meta information associated with files.

A good and comprehensive best practices guide is given by the tidyverse style guide.

2.2.3 Where to find help

The material covered in this course will give you a solid basis for your future projects. Even more so, it provides you with code examples that you can adapt to your own purposes. Naturally, you will face problems we did not cover in the course and you will learn more as you go. Different approaches to getting help can be taken for different types of problems and questions.

2.2.3.1 Within R

“I know the name of a function that might help solve the problem but I do not know how to use it.” Typing a ? in front of the function name will open the documentation of the function, giving information about the function’s purpose and method, arguments, the returned object, and examples. You have learned a few things about plots but you may not know how to make a boxplot:

?graphics::boxplot

Running the above code will open the information on making boxplots in R.

“There must be a function that does task X but I do not know which one.” Typing ?? will call the function help.search(). Maybe you want to save a plot as a JPEG but you do not know how:

??jpeg

Note that it only looks through your installed packages.
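The two shortcuts above have long forms, help() and help.search(), which can be handy inside scripts. The sketch below also shows two further base R helpers that are not covered in the text above and are added here only as a suggestion: args(), which shows the arguments of a function, and apropos(), which lists attached objects whose name contains a given string. The names doc_boxplot and hits_jpeg are for illustration only.

```r
# long forms of the shortcuts above
doc_boxplot <- help("boxplot", package = "graphics")   # same page as ?graphics::boxplot
hits_jpeg   <- help.search("jpeg")                     # same search as ??jpeg

# printing either object displays it, just as the shortcuts do
# print(doc_boxplot)
# print(hits_jpeg)

# quick look at the arguments of a function without opening its help page
args(graphics::boxplot)

# list attached objects whose name contains "boxplot"
apropos("boxplot")
```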

2.2.3.2 Online

To search in the entire library of R go to the website rdocumentation.org or turn to a search engine of your choice. It will send you to the appropriate function documentation or a helpful forum where someone has already asked a similar question. Most of the time you will end up on stackoverflow.com, a forum where most questions have already been answered.

2.2.3.3 Error messages

If you do not understand the error message, start by searching for it on the web. Be aware that this does not always help: many developers rely on the error catching provided by R, so the message may be generic. To be more specific, add the name of the function and package you are using to your search, to get a more detailed answer.
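When searching for an error message, it helps to copy its exact text. One way to capture the text as a character string is tryCatch() together with conditionMessage(), both base R functions. A minimal sketch, in which the failing expression is a made-up example:

```r
msg <- tryCatch(
  "1" + 1,                                  # adding a string and a number fails
  error = function(e) conditionMessage(e)   # return the message text instead of stopping
)
msg   # the error text, ready to be pasted into a search engine
```

Note that the text of the message depends on the language settings of your R session; searching for the English version usually gives more results.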

2.2.3.4 Worked examples

Worked examples are implementations of certain workflows that may serve as a template for your own purpose. It is often simpler to adjust existing code to fulfill your purpose than to write it from scratch. Vignettes are provided for many packages and serve as example workflows that demonstrate the utility of package functions. You can type …

vignette("caret", package = "caret")

… to get information about how to use the {caret} package in an easily digestible format. (You will learn more about caret in Chapters 9 and 10). Several blogs serve similar purposes and are a great entry point to learn about new topics. Examples are the Posit Blog (Posit is the company developing and maintaining RStudio and several R packages), R-bloggers, R-Ladies, etc.
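Vignettes are only available for packages that are installed on your computer. As a hedged sketch, the following first checks whether {caret} is installed and, if so, lists the vignettes it provides; has_caret is a name chosen for illustration.

```r
# TRUE if the package is installed and can be loaded, FALSE otherwise
has_caret <- requireNamespace("caret", quietly = TRUE)

if (has_caret){
  vignette(package = "caret")   # lists the vignettes shipped with caret
} else {
  message("Package 'caret' is not installed; install.packages(\"caret\") would install it.")
}
```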

2.2.3.5 Asking for help

If you cannot find a solution online, start by asking your friends and colleagues. Someone with more experience than you might be able and willing to help you. When asking for help, it is important to think about how you state the problem. The key to receiving help is to make it as easy as possible to understand the issue you are facing. Try to reduce what does not work to a simple example. Reproduce a problem with a simple data frame instead of one with thousands of rows. Generalize it in a way that people who do not do research in your field can understand the problem. If you are asking a question online in a forum, include the output of sessionInfo() (it provides information about the R version, the packages you’re using, etc.) and other information that can be helpful to understand the problem. Stackoverflow has its own guidelines on how to ask a good question, which you should follow. Here’s a great template you should use for R-specific questions. If your question is well crafted and has not been answered before, you can sometimes get an answer within 5 minutes.
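A question reduced to a simple example could look like the sketch below. The data frame df_example and the question itself (“why does mean() return NA?”) are made up for illustration. dput() is a base R function that prints an object as R code, so that others can copy and paste it to recreate your data.

```r
# a small data frame that reproduces the problem
df_example <- data.frame(
  site = c("a", "a", "b", "b"),
  temp = c(-0.5, 3.2, NA, 12.1)
)

# the unexpected behavior: the result is NA because of the missing value
mean(df_example$temp)

# copy-pastable representation of the data to include in the question
dput(df_example)

# information about the R version and loaded packages
sessionInfo()
```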

2.3 Exercises

Gauss variations

Use a for loop to compute the sum of all natural numbers from 1 to 100. Print the result to the screen. Repeat this exercise but use a while loop.

Add up all numbers between 1 and 100 that are at the same time a multiple of 3 and a multiple of 7. Print the result to the screen in the form of: The sum of multiples of 3 and 7 within 1-100 is: {your result}.

Nested loops

Given a matrix mymat and a vector myvec (see below), implement the following algorithm:

  1. Start with the first row in mymat.
  2. Fill all missing values in the current row of mymat with the maximum value in myvec.
  3. Drop the maximum value from myvec.
  4. Proceed to the next row of mymat and repeat steps 2-4.

mymat and myvec are defined as:

mymat <- matrix(c(6, 7, 3, NA, 15, 6, 7, 
              NA, 9, 12, 6, 11, NA, 3, 
              9, 4, 7, 3, 21, NA, 6, 
              rep(NA, 7)),
            nrow = 4, byrow = TRUE)
myvec <- c(8, 4, 12, 9, 15, 6)

Interpolation

Define a vector \(\vec{v}\) of length 100. Define the vector so that \(v_i = 6\), for \(i = 1 : 25\) and \(v_i = -20\), for \(i = 66 : 100\). The remaining elements are to be defined as missing (NA). Linearly interpolate the missing values. Plot the values of \(\vec{v}\) using plot(vec), where vec is the R object holding the vector.