Preamble

The WSBIM1322 course teaches the basics of statistical data analysis applied to high throughput biology. It is aimed at biology and biomedical students who are already familiar with the R language (see the pre-requisites section below). Students will become familiar with statistical learning concepts such as unsupervised and supervised learning and hypothesis testing, and will deepen their understanding and practice of R data structures, R programming and the Bioconductor project.

The course will be followed by Omics data analysis (WSBIM2122).

Motivation

Today, it is difficult to overestimate the very broad importance and impact of data. Given the abundance of data around us, and the sophisticated tools for their analysis and interpretation that are readily available, data has become a tool of profound social change. Research in general, and biomedical research in particular, is at the centre of this evolution. And while bioinformatics has been playing a central role in biomedical research for many years now, bioinformatics skills aren't well integrated in life science curricula, limiting students in their career prospects and research horizons (Wilson Sayres et al. 2018). It is therefore important for young researchers to acquire the quantitative, computational and data skills needed to address the challenges that lie ahead.

Wilson Sayres, M A, C Hauser, M Sierk, S Robic, A G Rosenwald, T M Smith, E W Triplett, et al. 2018. "Bioinformatics Core Competencies for Undergraduate Life Sciences Education." PLoS One 13 (6): e0196878. https://doi.org/10.1371/journal.pone.0196878.

This course will focus on the application of data analysis methods and algorithms, and on the interpretation of their outputs. We will be using the R language and environment (R Core Team 2019) and the RStudio integrated development environment to acquire these data skills. Other interactive languages, such as Python with interactive Jupyter notebooks, would also have been a good fit. One motivation for this choice is the availability of numerous R/Bioconductor packages (Huber et al. 2015) for the analysis of high throughput biology data.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Huber, W, V J Carey, R Gentleman, S Anders, M Carlson, B S Carvalho, H C Bravo, et al. 2015. "Orchestrating High-Throughput Genomic Analysis with Bioconductor." Nat Methods 12 (2): 115–21. https://doi.org/10.1038/nmeth.3252.

Below, you can find three short videos (in French, with subtitles in multiple languages) by PhD students who assist in teaching this course. They illustrate real-world applications of some of the concepts taught in this course.

  • Julie Devis is pursuing a PhD in the Computational Biology and Bioinformatics Unit with Prof Laurent Gatto. She uses R and Bioconductor to study the methylation of cancer-germline genes.
  • Valentine Robaux is pursuing a PhD in the Cardiovascular Research Unit with Profs Sandrine Horman and Christophe Beauloye. She uses R and Bioconductor to investigate platelet function.
  • Jean Fain is pursuing a PhD in the Epigenetics Unit with Prof Charles De Smet. He uses R and Bioconductor to study DNA methylation and its role in cancer.

References and credits

References are provided throughout the course. Several, however, stand out, as they cover large parts of the material or provide complementary resources.

  • Modern Statistics for Modern Biology, by Susan Holmes and Wolfgang Huber (Holmes and Huber 2019). A free online version of the book is available.
    Holmes, Susan, and Wolfgang Huber. 2019. Modern Statistics for Modern Biology. Cambridge University Press.

  • An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (James et al. 2014). A free PDF of the book is available.
    James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.

This course is taught by Prof Laurent Gatto with invaluable assistance from Dr Axelle Loriot at the Faculty of Pharmacy and Biomedical Sciences (FASB) of UCLouvain, Belgium.

Pre-requisites

Students taking this course should be familiar with data analysis and visualisation in R. The introductory course WSBIM1207 is a formal pre-requisite for this class. The first chapter provides a refresher of the R skills needed for the rest of the course.

Software requirements are documented in the Setup section below.

About this course material

This material is written in R Markdown (Allaire et al. 2023) and compiled as a book using knitr (Xie 2023b) and bookdown (Xie 2023a). The source code is publicly available in the GitHub repository https://github.com/uclouvain-cbio/WSBIM1322 and the compiled material can be read at http://bit.ly/WSBIM1322.

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2023. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.

Xie, Yihui. 2023a. Bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown.

Xie, Yihui. 2023b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.

Contributions to this material are welcome. The best way to contribute or to contact the maintainers is through pull requests and issues. Please familiarise yourself with the code of conduct. By participating in this project you agree to abide by its terms.

Citation

If you use this course, please cite it as

Laurent Gatto. UCLouvain-CBIO/WSBIM1322: Bioinformatics. https://github.com/UCLouvain-CBIO/WSBIM1322.

License

This material is licensed under the Creative Commons Attribution-ShareAlike 4.0 License.

Setup

We will be using the R environment for statistical computing as our main data science language. We will also use the RStudio interface to interact with R and to write scripts and reports. Both R and RStudio are easy to install and work on all major operating systems.

Once R and RStudio are installed, a set of packages will need to be installed. See section 13.1 for details.

The rWSBIM1322 package provides some pre-formatted data used in this course. It can be installed with

BiocManager::install("UCLouvain-CBIO/rWSBIM1322")

and then loaded with

library("rWSBIM1322")
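On a fresh R installation, the install command above assumes that BiocManager is already available. In addition, BiocManager hands package names of the form "user/repo" to remotes::install_github(), so the remotes package must also be installed. The sketch below covers both prerequisites before installing and loading the course package; it assumes an internet connection and write access to the R library.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# BiocManager installs "user/repo" GitHub packages via remotes::install_github(),
# so remotes must be available too
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

# Install the course data package from GitHub, then load it
BiocManager::install("UCLouvain-CBIO/rWSBIM1322")
library("rWSBIM1322")
```

Each prerequisite is installed only when it is missing, so the snippet can be re-run safely.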

Page built: 2023-11-27 using R version 4.3.1 Patched (2023-07-10 r84676)