The WSBIM1322 course teaches the basics of statistical data analysis applied to high throughput biology. It is aimed at biology and biomedical students who are already familiar with the R language (see the pre-requisites section below). The students will familiarise themselves with statistical learning concepts such as unsupervised and supervised learning and hypothesis testing, and extend their understanding and practice of R data structures and programming and of the Bioconductor project.
The course will be followed by Omics data analysis (WSBIM2122).
Today, it is difficult to overestimate the very broad importance and impact of data. Given the abundance of data around us, and the sophistication of the readily available tools for their analysis and interpretation, data has become a tool of profound social change. Research in general, and biomedical research in particular, is at the centre of this evolution. And while bioinformatics has been playing a central role in biomedical research for many years now, bioinformatics skills aren’t well integrated in life science curricula, limiting students in their career prospects and research horizon (Wilson Sayres, M A, C Hauser, M Sierk, S Robic, A G Rosenwald, T M Smith, E W Triplett, et al. 2018. “Bioinformatics Core Competencies for Undergraduate Life Sciences Education.” PLoS One 13 (6): e0196878. https://doi.org/10.1371/journal.pone.0196878). It is important for young researchers to acquire quantitative, computational and data skills to address the challenges that lie ahead.
This course will focus on the application of data analysis methods and algorithms, and the interpretation of their outputs. We will be using the R language and environment (R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/) and the RStudio integrated development environment to acquire these data skills. Other interactive languages such as Python, together with interactive Jupyter notebooks, would also have been a good fit. One motivation for choosing R is the availability of numerous R/Bioconductor packages (Huber, W, V J Carey, R Gentleman, S Anders, M Carlson, B S Carvalho, H C Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nat Methods 12 (2): 115–21. https://doi.org/10.1038/nmeth.3252) for the analysis of high throughput biology data.
References are provided throughout the course. Several stand out, however, as they cover large parts of the material or provide complementary resources.
Modern Statistics for Modern Biology, by Susan Holmes and Wolfgang Huber (Holmes, Susan, and Wolfgang Huber. 2019. Modern Statistics for Modern Biology. Cambridge University Press). A free online version of the book is available here.
An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated). A free pdf of the book is available here.
This course is being taught by Prof Laurent Gatto with invaluable assistance from Dr Axelle Loriot at the Faculty of Pharmacy and Biomedical Sciences (FASB) at UCLouvain, Belgium.
Students taking this course should be familiar with data analysis and visualisation in R. A formal pre-requisite for students taking the class is the introductory course WSBIM1207. The first chapter provides a refresher of the R skills needed for the rest of the course.
Software requirements are documented in the Setup section below.
This material is written in R markdown (Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown) and compiled with knitr (Xie, Yihui. 2021b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/) and bookdown (Xie, Yihui. 2021a. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown). The source code is publicly available in a GitHub repository and the compiled material can be read at http://bit.ly/WSBIM1322.
Contributions to this material are welcome. The best way to contribute or contact the maintainers is by means of pull requests and issues. Please familiarise yourself with the code of conduct. By participating in this project you agree to abide by its terms.
If you use this course, please cite it as
Laurent Gatto. UCLouvain-CBIO/WSBIM1322: Bioinformatics. https://github.com/UCLouvain-CBIO/WSBIM1322.
This material is licensed under the Creative Commons Attribution-ShareAlike 4.0 License.
We will be using the R environment for statistical computing as our main data science language. We will also use the RStudio interface to interact with R and write scripts and reports. Both R and RStudio are easy to install and work on all major operating systems.
Once R and RStudio are installed, a set of packages will need to be installed. See section 13.1 for details.
The rWSBIM1322 package provides some pre-formatted data used in this course. It needs to be installed once and is then loaded at the start of each R session.
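A minimal sketch of the installation and loading steps follows. It assumes that the package is hosted on GitHub under the same organisation as the course repository (`UCLouvain-CBIO/rWSBIM1322`), which is an assumption rather than something stated here, and that the BiocManager and remotes packages are used to install it.

```r
## Install the helper packages if they are missing. BiocManager relies on
## remotes to install packages from GitHub.
for (pkg in c("BiocManager", "remotes")) {
    if (!requireNamespace(pkg, quietly = TRUE))
        install.packages(pkg)
}

## Assumption: the data package lives in the UCLouvain-CBIO GitHub
## organisation. Adjust the repository name if it is hosted elsewhere.
BiocManager::install("UCLouvain-CBIO/rWSBIM1322")

## Load the package (needed once per R session).
library("rWSBIM1322")

## List the datasets that the package provides.
data(package = "rWSBIM1322")
```

Installing through BiocManager rather than plain `install.packages()` should also resolve any Bioconductor dependencies the package may have.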
To build this book, you’ll need bookdown (Xie, Yihui. 2021a. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown) and a fork (https://github.com/lgatto/msmbstyle) of the msmbstyle style (Smith, Mike. 2021. Msmbstyle: MSMB Styles for R Markdown Documents).
The book is then built from an R session started in the course’s work directory.
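A minimal sketch of the build step, assuming the working directory is the root of the course repository and that the book's entry file follows the usual bookdown convention of being named `index.Rmd` (an assumption, not confirmed here):

```r
## Assumption: bookdown and the msmbstyle fork are already installed, and
## the current working directory is the course's work directory.
## index.Rmd is the conventional bookdown entry file; adjust if the
## repository uses a different name.
bookdown::render_book("index.Rmd")
```

The output format and output directory are normally read from the repository's own bookdown configuration files, so no further arguments should be needed.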
Page built: 2021-12-06 using R version 4.1.2 (2021-11-01)