Chapter 1 R refresher

The objective of this chapter is to review the R syntax, functions and data structures that will be needed in the following chapters.

1.1 Administration

BiocManager::install("UCLouvain-CBIO/rWSBIM1322")

1.2 Basic data structures and operations

Summary

             number of dimensions   number of data types
vector       1 (length)             1
matrix       2                      1
array        n                      1
dataframe    2                      n
list         1 (length)             n
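
The rows of the summary table can be checked directly in R. The following is a minimal sketch using made-up values; the object names (v, m, a, df, l) are only illustrative.

```r
## One small object of each kind (all values are made up)
v  <- c(1.5, 2, 3.2)                          # vector: one type, 1 dimension
m  <- matrix(1:6, nrow = 2)                   # matrix: one type, 2 dimensions
a  <- array(1:24, dim = c(2, 3, 4))           # array: one type, n dimensions
df <- data.frame(id = 1:3,                    # dataframe: 2 dimensions,
                 name = c("a", "b", "c"),     #   one type per column
                 stringsAsFactors = FALSE)
l  <- list(1, "a", TRUE)                      # list: any types, 1 dimension

length(v)          # 3
dim(m)             # 2 3
dim(a)             # 2 3 4
dim(df)            # 3 2
sapply(df, class)  # id: "integer", name: "character"
length(l)          # 3
```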

1.3 Tidyverse

  • The dplyr package
  • Piping
  • Wide and long data, and their conversion with the pivot_longer and pivot_wider functions.
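
As a reminder of how these pieces fit together, here is a small sketch that assumes the dplyr and tidyr packages are installed. The table and its column names (gene, s1, s2, sample, expression) are made up for illustration.

```r
library(dplyr)
library(tidyr)

## Made-up wide table: one row per gene, one column per sample
wide <- tibble(gene = c("g1", "g2", "g3"),
               s1 = c(1, 5, 3),
               s2 = c(2, 4, 6))

## wide -> long: one row per gene/sample pair
long <- wide %>%
    pivot_longer(cols = c(s1, s2),
                 names_to = "sample",
                 values_to = "expression")

## A dplyr pipeline: keep values above 2, then average per gene
gene_means <- long %>%
    filter(expression > 2) %>%
    group_by(gene) %>%
    summarise(mean_expression = mean(expression))

## long -> wide: back to the original layout
wide_again <- long %>%
    pivot_wider(names_from = sample, values_from = expression)
```

The pipe passes the result of each step as the first argument of the next one, so the pipeline reads top to bottom in the order the operations are applied.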

1.4 Saving and exporting

  • Saving and reading binary data with saveRDS() and readRDS().
  • Exporting and importing data with write.csv and read.csv (or write_csv and read_csv), and likewise for other types of spreadsheets.
  • Saving figures (ggsave and file devices such as png(), pdf(), …).
  • Package versions: sessionInfo()
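
The functions listed above can be tried out with temporary files. This is a minimal sketch using base R only; the data frame and the file names produced by tempfile() are made up, and pdf() is used as one example of a file device.

```r
## Made-up data frame used throughout
df <- data.frame(id = 1:3, score = c(12.5, 15, 9))

## Binary round trip with saveRDS() and readRDS()
rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)

## Text round trip through a csv file
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file)

## Saving a base R figure with a file device: open, plot, close
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(df$id, df$score)
dev.off()

## Record the R and package versions used
si <- sessionInfo()
si$R.version$version.string
```

Setting row.names = FALSE in write.csv avoids writing an extra column of row names that read.csv would otherwise read back as data.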

1.5 Programming

1.6 Additional exercises

► Question

Complete the following function. It is supposed to take two inputs, x and y. Depending on whether x > y or x <= y, it either generates the random sample sample(x, y) in the first case or draws a sample from rnorm(1000, x, y) in the second case. Finally, it returns the sum of all values.

fun <- function(x, y) {
    res <- NA
    if () { ## 1
        res <- sample(,) ## 2
    } else {
        res <- rnorm(, , ) ## 3
    }
    return() ## 4
}
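
Before filling in the blanks, it may help to recall what the two calls involved do. The sketch below only illustrates sample() and rnorm() on made-up arguments; it is not the completed function, and the seed is arbitrary.

```r
set.seed(1)   # arbitrary seed, only to make the draws reproducible

## sample(x, size) with a single number x >= 1 draws `size` values
## from 1:x without replacement
s <- sample(10, 3)
length(s)              # 3
all(s %in% 1:10)       # TRUE
anyDuplicated(s) == 0  # TRUE

## rnorm(n, mean, sd) draws n values from a normal distribution
r <- rnorm(1000, mean = 2, sd = 5)
length(r)              # 1000
```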

► Question

Read the interro2.rds file from the rWSBIM1207 package (version 0.1.9 or later) into R. The path to the file can be found with the rWSBIM1207::interro2.rds() function.

This dataframe provides the scores for 4 tests for 10 students.

  • Write a function that calculates the average score for the 3 best tests.
  • Calculate this average score for the 10 students.

This can be done using the apply function or using dplyr functions. For the latter, see also rowwise().

Note the situation of students who have taken only 3 out of 4 tests (i.e. they have an NA for one test). It is up to you to decide whether you simply take the mean of the 3, or whether you prefer to drop the worst of the 3 and calculate the mean of the 2 best marks. Make sure you are aware of what your implementation returns and, ideally, state it explicitly in your response.
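
The mechanics of applying a function to each row, and the effect of missing values, can be illustrated on a small made-up table. The data and the helper best_score() below are hypothetical and deliberately compute something other than the requested average.

```r
## Made-up scores: 3 students (rows), 4 tests (columns), one missing value
scores <- data.frame(test1 = c(12, 8, 15),
                     test2 = c(14, NA, 11),
                     test3 = c(9, 13, 17),
                     test4 = c(16, 10, 12))

## Hypothetical helper: best score of a student, ignoring missing tests
best_score <- function(x) max(x, na.rm = TRUE)

## MARGIN = 1 applies the function to each row, i.e. each student
apply(scores, 1, best_score)   # 16 13 17

## Without na.rm = TRUE, a single NA propagates to the result
apply(scores, 1, max)          # 16 NA 17

## sort() drops NA values by default, which matters when ranking marks
sort(c(8, NA, 13, 10), decreasing = TRUE)   # 13 10 8
```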

► Question

  • Create a matrix of dimensions 100 by 100 containing data from a normal distribution of mean 0 and standard deviation 1.

  • Compute the means of each row and each column using the apply() and rowMeans()/colMeans() functions. Confirm that both provide the same results.

  • Compute the difference between the column means and the row means. Does the result make sense?
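
The relation between apply() and the dedicated rowMeans()/colMeans() functions can be sketched on a smaller, non-square matrix. The dimensions and the seed below are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed for reproducibility

## Small made-up matrix: 5 rows, 4 columns of standard normal values
m <- matrix(rnorm(20, mean = 0, sd = 1), nrow = 5, ncol = 4)

## MARGIN = 1 works along rows, MARGIN = 2 along columns
row_means_apply <- apply(m, 1, mean)
col_means_apply <- apply(m, 2, mean)

## all.equal() compares numbers up to a small tolerance, which is safer
## than == for floating point results
all.equal(row_means_apply, rowMeans(m))   # TRUE
all.equal(col_means_apply, colMeans(m))   # TRUE

length(row_means_apply)   # 5, one mean per row
length(col_means_apply)   # 4, one mean per column
```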

► Question

  • Using the kem2_se data from the rWSBIM1322 package, compute the delta values for each gene (delta is the difference between the highest and lowest values). To do this, write a function delta() that takes a numeric vector as input and returns the delta value, and apply it to the object’s assay.

  • Re-use your delta() function and apply it on each sample.
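
One possible shape for such a function is sketched below on a small made-up matrix rather than on kem2_se itself. It assumes genes are stored in rows and samples in columns; if kem2_se is a SummarizedExperiment (an assumption based on the mention of its assay), the corresponding matrix would be extracted with assay(kem2_se).

```r
## One possible delta(): difference between the largest and smallest value
delta <- function(x) {
    stopifnot(is.numeric(x))
    diff(range(x))
}

## Made-up matrix standing in for an assay: genes in rows, samples in columns
x <- matrix(c(1, 4, 2,
              8, 5, 9),
            nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"),
                            c("sample1", "sample2")))

apply(x, 1, delta)   # per gene:   geneA 7, geneB 1, geneC 7
apply(x, 2, delta)   # per sample: sample1 3, sample2 4
```

The same function is reused unchanged in both calls; only the MARGIN argument of apply() decides whether it runs over rows or columns.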

Page built: 2024-12-09 using R version 4.4.1 (2024-06-14)