Preamble

Training aims

Do you want to get started with reproducible data analysis in R, one of the most widely used software environments for analysing high-throughput biology data?

R is free and open-source software. It is among the most widely used tools in biomedical research, likely owing to the many R/Bioconductor packages dedicated to high-throughput data.

The goal of this training is to introduce wet-lab scientists to reproducible data analysis with R and the RStudio integrated development environment, focusing on data manipulation, data visualisation and basic data analysis.

Pre-requisites

This training doesn’t require any previous knowledge of R. There are no programming or technical prerequisites for this course other than basic computer usage: general knowledge of files (binary and text files) and folders, and of how to download files. Familiarity with a spreadsheet editor is helpful for the first chapter.

Content

  • Discovering R, and the RStudio environment
  • Importance of tidy data in general and how it translates into dataframes in R
  • Data manipulation and analysis using R standard commands and the tidyverse packages
  • Data visualisation with ggplot2

Requirements

Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed:

  • Download R from the CRAN page: https://cloud.r-project.org/. At the top of that page, choose the Download R link corresponding to your operating system. If you use Windows, follow the "install R for the first time" link, then click the link to download R. The installation procedure is the same as for any other software, and you can safely use all default options. If you use a Mac (macOS, formerly OS X), download the pkg installer that matches your OS version and install it like any other software. Linux users are advised to use their package manager.

  • Download and install the RStudio Desktop Open Source Edition: https://rstudio.com/products/rstudio/download/#download. Choose the installer for your operating system and version, and install it like any other software.

For technical assistance, see https://moodle.uclouvain.be/course/view.php?id=4862
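Once both installers have finished, a quick way to confirm that R itself works is to run a couple of commands in the R console (for instance the Console pane of RStudio). The lines below are a minimal sketch using base R only; the comparison against 4.1.1 simply reflects the version this material was built with and is not a stated requirement of the course.

```r
# Print the version string of the running R installation.
print(R.version.string)

# Compare the running version with the one used to build this material
# (4.1.1; used here for illustration, not as a formal requirement).
r_is_recent <- getRversion() >= "4.1.1"
print(r_is_recent)
```

If the first command prints a version string, R is installed and reachable; an older version may still work for most of the material.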

Provisional timetable

Time | Day 1 | Day 2
9h-10h30 | Data organisation with spreadsheets; R and RStudio | Data visualization
10h45-12h45 | Introduction to R; Starting with data | Data visualization (cont.) and joining tables
13h45-15h45 | Starting with data; Manipulating and analyzing data with dplyr | Summary exercise
16h-17h | Manipulating and analyzing data with dplyr | Further topics

References and credits

References are provided throughout the course. Several stand out, however, as they cover large parts of the material or provide complementary resources.

The material for the first chapters, covering the Introduction to data science with R, is based on the Data Carpentry Ecology curriculum (Michonneau and Fournier 2019: Michonneau, Francois, and Auriel Fournier, eds. 2019. “Data Carpentry: R for Data Analysis and Visualization of Ecological Data.” https://doi.org/10.5281/zenodo.569338).

General references for this course are R for Data Science (Grolemund and Wickham 2017: Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/) and Bioinformatics Data Skills (Buffalo 2015: Buffalo, Vince. 2015. Bioinformatics Data Skills. O’Reilly Media, Inc.).

The RStudio Cheat Sheets are also a handy resource and readers will be pointed to specific sheets in the respective chapters.

This training is organised by the SMCS in partnership with Laurent Gatto, from the CBIO Lab in the de Duve Institute, and is taught by Axelle Loriot and Manon Martin at UCLouvain, Belgium.

About this course material

This material is written in R Markdown (Allaire et al. 2021: Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown) and compiled as a book using knitr (Xie 2021b: Xie, Yihui. 2021b. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/) and bookdown (Xie 2021a: Xie, Yihui. 2021a. bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown). The source code is publicly available in a GitHub repository (https://github.com/UCLouvain-CBIO/bioinfo-training-01-intro-r) and the compiled material can be read at https://uclouvain-cbio.github.io/bioinfo-training-01-intro-r

License

This material is licensed under the Creative Commons Attribution-ShareAlike 4.0 License.

Setup

For chapter 1 about Data organisation with Spreadsheets, a spreadsheet programme is necessary.

We will be using the R environment for statistical computing as our main data science language. We will also use the RStudio interface to interact with R and to write scripts and reports. Both R and RStudio are easy to install and work on all major operating systems.

Once R and RStudio are installed, a set of packages will need to be installed. See section 9.1 for details.
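As a rough illustration of what installing packages looks like, the sketch below checks which packages from a list are missing and installs only those. The helper name missing_packages and the one-item list are assumptions made for this example: tidyverse is used because it is mentioned in the course content, but the authoritative list is the one given in the section referenced above.

```r
# Hypothetical helper (not part of the course material): return the
# names of the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Illustrative package list; replace it with the list from the setup section.
to_install <- missing_packages(c("tidyverse"))

# Install only what is missing, and only in an interactive session,
# since installation downloads files from CRAN.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running the same lines a second time should install nothing, because missing_packages() then returns an empty vector.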

Page built: 2021-10-01 using R version 4.1.1 (2021-08-10)