Chapter 1 R refresher

The objective of this chapter is to review some R syntax, functions and data structures that will be needed for the following chapters.

1.1 Administration

Setting up an RStudio project
Install packages from CRAN and Bioconductor.

BiocManager::install("UCLouvain-CBIO/rWSBIM1322")

Avoid saving and loading your workspace (the .RData file).
UTF-8 character encoding.
Starting a markdown document.

1.2 Basic data structures and operations

Vectors: generating and subsetting vectors.
Missing values.
Factors
Dataframes (and tibbles)
Matrices
Arrays
Lists
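
As a brief reminder of the first items above, the following sketch generates and subsets vectors, handles a missing value and creates a factor. All object names (x, s, r, f) and values are made up for illustration.

```r
## Generating vectors
x <- c(10, 20, NA, 40)            # numeric vector with a missing value
s <- seq(2, 10, by = 2)           # 2 4 6 8 10
r <- rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"

## Subsetting by position, by exclusion and by logical condition
s[c(1, 3)]  # elements 1 and 3
s[-1]       # everything but the first element
s[s > 4]    # elements greater than 4

## Missing values propagate unless they are explicitly removed
mean(x)                # NA
mean(x, na.rm = TRUE)  # mean of 10, 20 and 40
is.na(x)               # FALSE FALSE TRUE FALSE

## Factors store categorical data with a fixed set of levels
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)      # "low" "high"
as.integer(f)  # 1 2 1
```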

Summary

	number of dimensions	number of data types
vector	1 (length)	1
matrix	2	1
array	n	1
dataframe	2	n
list	1 (length)	n
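
The summary table can be checked directly in R. The sketch below builds one small, made-up object of each kind and queries its dimensions and types. The object names are arbitrary.

```r
v  <- 1:6                                            # vector: one type, length only
m  <- matrix(1:6, nrow = 2)                          # matrix: one type, 2 dimensions
a  <- array(1:24, dim = c(2, 3, 4))                  # array: one type, n dimensions
df <- data.frame(id = 1:3, name = c("a", "b", "c"))  # dataframe: one type per column
l  <- list(1, "a", TRUE, m)                          # list: any type per element

length(v)  # 6
dim(v)     # NULL, a vector only has a length
dim(m)     # 2 3
dim(a)     # 2 3 4
dim(df)    # 3 2
length(l)  # 4

## The type of each column of a dataframe
sapply(df, class)

## A single data type is enforced by coercion
class(c(1, "a"))  # "character"
```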

1.3 Tidyverse

The dplyr package
Piping
Wide and long data, and their conversion with the pivot_longer and pivot_wider functions.
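
A minimal sketch of these three points, assuming the dplyr and tidyr packages are installed. The table and its column names (gene, s1, s2, sample, expression) are hypothetical.

```r
library("dplyr")
library("tidyr")

## Hypothetical wide table: one row per gene, one column per sample
wide <- tibble(gene = c("g1", "g2"),
               s1 = c(1, 3),
               s2 = c(2, 4))

## Wide to long: one row per gene/sample pair
long <- wide %>%
    pivot_longer(cols = c(s1, s2),
                 names_to = "sample",
                 values_to = "expression")

## dplyr verbs chained with the pipe
long_summary <- long %>%
    filter(expression > 1) %>%
    group_by(gene) %>%
    summarise(mean_expression = mean(expression))

## Long back to wide
wide2 <- long %>%
    pivot_wider(names_from = sample,
                values_from = expression)
```

Here, wide2 should contain the same values as wide, which is a convenient way to check that the two pivoting functions are each other's inverse.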

1.4 Saving and exporting

Saving and reading binary data with saveRDS() and readRDS().
Exporting and importing data with write.csv and read.csv (or write_csv and read_csv), and likewise for other spreadsheet formats.
Saving figures (ggsave and file devices such as png(), pdf(), …).
Package versions: sessionInfo()
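
The functions above can be tried out on a small, made-up dataframe. This sketch uses base R only and writes to temporary files, so that nothing is left in the working directory. The file and object names are arbitrary.

```r
df <- data.frame(id = 1:3, score = c(12.5, 15, NA))

## Binary serialisation of a single R object
rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)
identical(df, df_rds)

## Text export; row.names = FALSE avoids an extra column of row names
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file)
all.equal(df, df_csv)

## Saving a base graphics figure with a file device:
## open the device, plot, then close the device
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(df$id, df$score)
dev.off()

## Record the R and package versions used
sessionInfo()
```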

1.5 Programming

Writing functions
Conditionals if/else
Iteration: for loops and apply functions
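
The three points above can be illustrated together. In this sketch, sign_label() is a made-up helper function, not part of any package. It is applied to each element of a vector, first with a for loop and then with sapply().

```r
## A function with a conditional: label a single number
sign_label <- function(x) {
    if (x > 0) {
        "positive"
    } else if (x < 0) {
        "negative"
    } else {
        "zero"
    }
}

xs <- c(-2, 0, 3)

## Iteration with a for loop, filling a pre-allocated vector
res_for <- character(length(xs))
for (i in seq_along(xs)) {
    res_for[i] <- sign_label(xs[i])
}

## Same iteration with sapply()
res_apply <- sapply(xs, sign_label)
identical(res_for, res_apply)

## apply() iterates over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)  # row sums: 9 12
apply(m, 2, sum)  # column sums: 3 7 11
```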

1.6 Additional exercises

► Question

Complete the following function. It takes two inputs, x and y. Depending on whether x > y or x <= y, it either generates the permutation sample(x, y) (first case) or draws a sample from rnorm(1000, x, y) (second case). Finally, it returns the sum of all values.

fun <- function(x, y) {
    res <- NA
    if () { ## 1
        res <- sample(,) ## 2
    } else {
        res <- rnorm(, , ) ## 3
    }
    return() ## 4
}

► Question

Read the interro2.rds file from the rWSBIM1207 package (version 0.1.9 or later) into R. The path to the file can be found with the rWSBIM1207::interro2.rds() function.

This dataframe provides the scores for 4 tests for 10 students.

Write a function that calculates the average score for the 3 best tests.
Calculate this average score for the 10 students.

This can be done using the apply function or using dplyr functions. For the latter, see also rowwise().

Note the situation of students who have only taken 3 out of 4 tests (i.e. they have an NA for one test). It is up to you to decide whether you simply take the mean of the 3, or whether you prefer to drop the worst of the 3 and calculate the mean of the 2 best marks. Make sure you are aware of what your implementation returns and, ideally, state it explicitly in your response.

► Question

Create a matrix of dimensions 100 by 100 containing data from a normal distribution of mean 0 and standard deviation 1.
Compute the mean of each row and each column using both the apply() and the rowMeans()/colMeans() functions. Confirm that both approaches provide the same results.
Compute the difference between the column means and the row means. Does the result make sense?

► Question

Using the kem2_se data from the rWSBIM1322 package, compute the delta values for each gene (delta is the difference between the highest and lowest values). To do this, write a function delta() that takes a vector of numerics as input and returns the delta value, and apply it on the object’s assay.
Re-use your delta() function and apply it on each sample.

Page built: 2024-12-09 using R version 4.4.1 (2024-06-14)
