Content
- Discovering R, and the RStudio environment
- Importance of tidy data in general and how it translates into dataframes in R
- Data manipulation and analysis using R standard commands and the tidyverse packages
- Data visualisation with ggplot2
Do you want to get started with reproducible data analysis with R, one of the most widely used tools for the analysis of high-throughput biology data?
R is free and open-source software. It is one of the most widely used languages in biomedical research, likely due to the availability of numerous R/Bioconductor packages specifically dedicated to high-throughput data.
The goal of this training is to introduce wet-lab scientists to reproducible data analysis with R and its RStudio integrated environment, focusing on data manipulation, data visualisation and basic data analysis.
This training doesn’t require any previous knowledge of R. There are no programming or technical prerequisites for this course, other than basic computer usage, such as general knowledge about files (binary and text files) and folders, as well as downloading files. Familiarity with a spreadsheet editor is helpful for the first chapter.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed:
Download R from the CRAN page: https://cloud.r-project.org/. At the top of that page, choose the Download R link corresponding to your operating system. If you use Windows, follow the "install R for the first time" link, then click the link to download R. The installation procedure is the same as for any other software, and you can safely keep all default options. If you use a Mac (OS X), download the pkg installer that matches your OS version and install it like any other software. Linux users are advised to use their package manager.
Download and install the RStudio Desktop open source edition: https://rstudio.com/products/rstudio/download/#download. Choose the installer for your operating system and version, and install it like any other software.
For technical assistance: https://moodle.uclouvain.be/course/view.php?id=4862
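Once both are installed, a quick way to confirm that R runs is to type a couple of commands in the RStudio console. The lines below are only a minimal sketch: the comparison against version 4.1.1 simply reflects the R version this material was built with and is an assumption, not a stated requirement, so an older version may well work too.

```r
# Print the version of R that is currently running.
cat(R.version.string, "\n")

# Assumption for illustration: compare against the version used to build
# this material (4.1.1). This is not a documented minimum requirement.
r_is_recent <- getRversion() >= "4.1.1"
r_is_recent
```

If the first command prints a version string, R is installed and reachable from RStudio.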
Time | Day 1 | Day 2 |
---|---|---|
9h-10h30 | Data organisation with spreadsheets; R and RStudio | Data visualization |
10h45-12h45 | Introduction to R; Starting with data | Data visualization (cont.) and joining tables |
13h45-15h45 | Starting with data; Manipulating and analyzing data with dplyr | Summary exercise |
16h-17h | Manipulating and analyzing data with dplyr | Further topics |
References are provided throughout the course. Several stand out, however, as they cover large parts of the material or provide complementary resources.
The material for the first chapters, covering the Introduction to data science with R, is based on the Data Carpentry Ecology curriculum (Michonneau, Francois, and Auriel Fournier, eds. 2019. “Data Carpentry: R for Data Analysis and Visualization of Ecological Data.” https://doi.org/10.5281/zenodo.569338).
General references for this course are R for Data Science (Grolemund, Garrett, and Hadley Wickham. 2017. O’Reilly Media. https://r4ds.had.co.nz/) and Bioinformatics Data Skills (Buffalo, Vince. 2015. O’Reilly Media, Inc.).
The RStudio Cheat Sheets are also a handy resource and readers will be pointed to specific sheets in the respective chapters.
This training is organised by the SMCS in partnership with Laurent Gatto, from the CBIO Lab in the de Duve Institute, and is taught by Axelle Loriot and Manon Martin at UCLouvain, Belgium.
This material is written in R Markdown (Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown) and compiled as a book using knitr (Xie, Yihui. 2021b. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/) and bookdown (Xie, Yihui. 2021a. bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown). The source code is publicly available in a GitHub repository, https://github.com/UCLouvain-CBIO/bioinfo-training-01-intro-r, and the compiled material can be read at https://uclouvain-cbio.github.io/bioinfo-training-01-intro-r.
This material is licensed under the Creative Commons Attribution-ShareAlike 4.0 License.
For chapter 1 about Data organisation with Spreadsheets, a spreadsheet programme is necessary.
We will be using the R environment for statistical computing as our main data science language. We will also use the RStudio interface to interact with R and write scripts and reports. Both R and RStudio are easy to install and work on all major operating systems.
Once R and RStudio are installed, a set of packages will need to be installed. See section 9.1 for details.
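As a rough sketch of what checking for packages can look like, the snippet below reports which packages from a list are not yet installed. The package names are hypothetical placeholders chosen because the course mentions the tidyverse and ggplot2; the actual list is the one given in section 9.1.

```r
# Hypothetical package list, for illustration only; section 9.1 gives the
# actual packages needed for the course.
course_packages <- c("tidyverse", "ggplot2")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
is_installed <- vapply(course_packages, requireNamespace,
                       logical(1), quietly = TRUE)

# Names of the packages that still need to be installed.
missing_packages <- course_packages[!is_installed]
missing_packages

# Uncomment to install whatever is missing (needs an internet connection).
# install.packages(missing_packages)
```

An empty result (`character(0)`) means every package in the list is already available.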
Page built: 2021-10-01 using R version 4.1.1 (2021-08-10)