if (temp < 0.0){
<- TRUE
is_frozen }
3 Programming primers
Chapter lead author: Pepa Aran
3.1 Learning objectives
After you’ve gone over the lecture and solved the exercises, you should be able to:
- Use loops, conditional statements and functions in your code
- Write clean, stylized and structured code
- Look for help
3.2 Tutorial
3.2.1 Programming basics
In this section, we will review the most basic programming elements (conditional statements, loops, functions…) for the R syntax.
3.2.1.1 Conditional statements
In cases where we want certain statements to be executed or not, depending on a criterion, we can use conditional statements if
, else if
, and else
. Conditionals are an essential feature of programming and available in all languages. The R syntax for conditional statements looks like this:
The evaluation of the criterion inside the round brackets (here (temp < 0.0)
) has to return either TRUE
or FALSE
. Whenever the statement between brackets is TRUE
, the chunk of code between the subsequent curly brackets is executed. You can also write a conditional that covers all possibilities, like this:
<- -0.5
temp if (temp < 0.0){
<- TRUE
is_frozen else {
} <- FALSE
is_frozen }
When the temperature is below 0, the first chunk of code is executed. Whenever it is greater or equal that 0 (i.e. the condition returns FALSE
) the second chunk of code is evaluated.
You can also write more than two conditions, covering several cases:
<- FALSE
is_frozen <- FALSE
just_cold if (temp < 0.0){
<- TRUE
is_frozen else if (temp < 10){
} <- TRUE
just_cold }
Note: In the code chunks above, an indentation was used to highlight which parts go together, which makes the code easy to understand. Indentations are not evaluated by R per se (unlike in other programming languages, e.g., Matlab, Python), but help to make the code easier to read.
3.2.1.2 Loops
Loops are essential for solving many common tasks. for
and while
loops let us repeatedly execute the same set of commands, while changing an index or counter variable to take a sequence of different values. The following example calculates the sum of elements in the vector vec_temp
by iteratively adding them together.
<- seq(10) # equivalent to 1:10
vec_temp <- 0 # initialize sum
temp_sum for (idx in seq(length(vec_temp))){
<- temp_sum + vec_temp[idx]
temp_sum
} temp_sum
[1] 55
Of course, this is equivalent to just using the sum()
function.
sum(vec_temp)
Instead of directly telling R how many iterations it should do we can also define a condition and use a while
-loop. As long as the condition is TRUE
, R will continue iterating. As soon as it is FALSE
, R stops the loop. The following lines of code do the same operation as the for
loop above. What is different? What is the same?
= 1 # initialize counter
idx <- 0 # initialize sum
temp_sum while (idx <= 10){
<- temp_sum + vec_temp[idx]
temp_sum = idx + 1
idx
} temp_sum
3.2.1.3 Functions
Often, analyses require many steps and your scripts may get excessively long. An important aspect of good programming is to avoid duplicating code. If the same sequence of multiple statements or functions are to be applied repeatedly, then it is usually advisable to bundle them into a new function and apply this single function to each object. This also has the advantage that if some requirement or variable name changes, it has to be edited only in one place. A further advantage of writing functions is that you can give the function an intuitively understandable name, so that your code reads like a sequence of orders given to a human.
For example, the following code, converting temperature values provided in Fahrenheit to degrees Celsius, could be turned into a function.
# not advisable
<- (temp_soil - 32) * 5 / 9
temp_soil <- (temp_air - 32) * 5 / 9
temp_air <- (temp_leaf - 32) * 5 / 9 temp_leaf
Functions are a set of instructions encapsulated within curly brackets ({}
) that generate a desired outcome. Functions contain four main elements:
- They start with a name to describe their purpose,
- then they need arguments, which are a list of the objects being input,
- enclosed by curly brackets
function(x){ ... }
for the code making up the body of the function, - and lastly, within the body, a return statement indicating the output of the function.
Below, we define our own function f2c()
:
# advisable
<- function(temp_f){
f2c <- (temp_f - 32) * 5 / 9
temp_c return(temp_c)
}
<- f2c(temp_soil)
temp_soil <- f2c(temp_air)
temp_air <- f2c(temp_leaf) temp_leaf
Functions are essential for efficient programming. Functions have their own environment, which means that variables inside functions are only defined and usable within that function and are not saved to the global environment. Functions restrict the scope of the domain in which variables are defined. Information flows inside the function only through its arguments, and flows out of the function only through its returned variable.
Functions (particularly long ones) can be written to separate source files with a suffix .R
and saved in your ./R
directory - “written” as in copy and paste the function text as in the code chunk above into a text file with .R
suffix. Preferably, the file has the same name as the function. We can save the previous function in a script ./R/f2c.R
and load it later by running source("./R/f2c")
. It’s good practice to keep one file per function, unless a function calls another function that is called nowhere else. In that case, the “sub-ordinate” function can be placed inside the same .R
file.
3.2.2 Style your code
Nice code is clean, readable, consistent, and extensible (easily modified or adapted). Ugly code works, but is hard to work with. There is no right or wrong about coding style, but certain aspects make it easier to read and use code. Here are a few points to consider.
3.2.2.1 Spaces and breaks
Adding enough white spaces and line breaks in the right locations greatly helps the legibility of code. Cramming variables, operators, and brackets without spaces leaves an unintelligible sequence of characters and it will not be clear what parts go together. Therefore, consider the following points:
- Use spaces around operators (
=
,+
,-
,<-
,>
, etc.). - Use
<-
, not=
, for allocating a value to a variable. - Code inside curly brackets should be indented (recommended: two white spaces at the beginning of each line for each indentation level - don’t use tabs).
For example:
if (temp > 5.0){
<- growth_temp + temp
growth_temp }
3.2.2.2 Variable naming
It is recommended to use concise and descriptive variable names. Different variable naming styles are being used. In this course, we use lowercase letters, and underscores (_
) to separate words within a variable name (_
). Avoid (.
) as they are reserved for certain types of objects in R. Also, avoid naming your objects with names of common functions and variables since your re-definition will mask already defined object names.
For example, df_daily
is a data frame with data at a daily resolution. Or clean_daily
is a function that cleans daily data. Note that a verb is used as a name for a function and an underscore (_
) is used to separate words.
It is also recommended to avoid variable names consisting of only one character. Single-letter names make it practically impossible to search for that variable.
# Good
day_01
# Bad
DayOne
day.one
first_day_of_the_month
djm1
# Very bad
<- function(x) sum(x)/length(x) # mean() itself is already a function
mean <- FALSE # T is an abbreviation of TRUE
T <- 10 # c() is used to create a vector (example <- c(1, 2, 3)) c
3.2.2.3 Script style
Load libraries at the very beginning of a script, followed, by reading data or functions from files. Functions should be defined in separate .R
files, unless they are only a few lines long. Then, place the sequence of statements. The name of the script should be short and concise and indicate what the script does.
Use comments to describe in human-readable text what the code does. Comments are all that appears to the right of a #
and are code parts that are not interpreted by R and not executed. Adding comments in the code greatly helps you and others to read your code, understand what it does, modify it, and resolve errors (bugs).
To visually separate parts of a script, use commented lines (e.g., #----
). The RStudio text editor automatically recognizes ----
added to the right end of a commented line and interprets it as a block of content which can be navigated by using the document (Outline button in the top right corner of the editor panel).
Avoid reading entire workspace environments (e.g., load(old_environment.RData)
), deleting environments rm(list=ls())
, loading hidden dependencies (e.g., .Rprofile
), or changing the working directory (setwd("~/other_dir/"
) as part of a script.
Note that information about the author of the script, its creation date, and modifications, etc. should not be added as commented text in the file. In Chapter 8, we will learn about the code versioning control system git, which keeps track of all this as meta information associated with files.
A good and comprehensive best practices guide is given by the tidyverse style guide.
3.2.3 Where to find help
The material covered in this course will give you a solid basis for your future projects. Even more so, it provides you with code examples that you can adapt to your own purposes. Naturally, you will face problems we did not cover in the course and you will learn more as you go. Different approaches to getting help can be taken for different types of problems and questions.
3.2.3.1 Within R
“I know the name of a function that might help solve the problem but I do not know how to use it.” Typing a ?
in front of the function will open the documentation of the function, giving information about a function’s purpose and method, arguments, the returned object, and examples. You have learned a few things about plots but you may not know how to make a boxplot:
::boxplot ?graphics
Running the above code will open the information on making boxplots in R.
“There must be a function that does task X but I do not know which one.” Typing ??
will call the function help.search()
. Maybe you want to save a plot as a JPEG but you do not know how:
??jpeg
Note that it only looks through your installed packages.
3.2.3.2 Online
To search in the entire library of R go to the website rdocumentation.org or turn to a search engine of your choice. It will send you to the appropriate function documentation or a helpful forum where someone has already asked a similar question. Most of the time you will end up on stackoverflow.com, a forum where most questions have already been answered.
3.2.3.3 Error messages
If you do not understand the error message, start by searching for it on the web. Be aware that this is not always useful as developers rely on the error catching provided by R. To be more specific, add the name of the function and package you are using, to get a more detailed answer.
3.2.3.4 Worked examples
Worked examples are implementations of certain workflows that may serve as a template for your own purpose. It is often simpler to adjust existing code to fulfill your purpose than to write it from scratch. Vignettes are provided for many packages and serve as example workflows that demonstrate the utility of package functions. You can type …
vignette("caret", package = "caret")
… to get information about how to use the {caret} package in an easily digestible format. (You will learn more about caret in Chapter 10 and Chapter 11). Several blogs serve similar purposes and are a great entry point to learn about new topics. Examples are the Posit Blog (Posit is the company developing and maintaining RStudio and several R packages), R-bloggers, R-Ladies, etc.
3.2.3.5 Asking for help
If you cannot find a solution online, start by asking your friends and colleagues. Someone with more experience than you might be able and willing to help you. When asking for help it is important to think about how you state the problem. The key to receiving help is to make it as easy as possible to understand the issue you are facing. Try to reduce what does not work to a simple example. Reproduce a problem with a simple data frame instead of one with thousands of rows. Generalize it in a way that people who do not do research in your field can understand the problem. If you are asking a question online in a forum include the output of sessionInfo()
(it provides information about the R version, packages your using,…) and other information that can be helpful to understand the problem. Stackoverflow has its own guidelines on how to ask a good question, which you should follow. Here’s a great template you should use for R-specific question. If your question is well crafted and has not been answered before you can sometimes get an answer within 5 minutes.
3.3 Exercises
Gauss variations
Use a for
loop to compute the sum of all natural numbers from 1 to 100. Print the result to the screen. Repeat this exercise but use a while
loop.
Add up all numbers between 1 and 100 that are at the same time a multiple of 3 and a multiple of 7. Print the result to the screen in the form of: The sum of multiples of 3 and 7 within 1-100 is: {your result}
.
Nested loops
Given a matrix mymat
and a vector myvec
(see below), implement the following algorithm:
- Start with the first row in
mymat
. - Fill all missing values in the current row of
mymat
with the maximum value inmyvec
. - Drop the maximum value from
myvec
. - Proceed to the next row of
mymat
and repeat steps 2-4.
mymat
and myvec
are defined as:
<- matrix(c(6, 7, 3, NA, 15, 6, 7,
mymat NA, 9, 12, 6, 11, NA, 3,
9, 4, 7, 3, 21, NA, 6,
rep(NA, 7)),
nrow = 4, byrow = TRUE)
<- c(8, 4, 12, 9, 15, 6) myvec
Interpolation
Define a vector \(\vec{v}\) of length 100. Define the vector so that \(v_i = 6\), for \(i = 1 : 25\) and \(v_i = -20\), for \(i = 66 : 100\). Remaining elements are to be defined as ‘missing’. Linearly interpolate missing values that are not defined. Plot the values of \(\vec{v}\) using plot(vec)
.