Chapter 11 Conclusions

In this course, we have seen the importance of structured data. Good data structure starts with simple, tidy tabular data, whether they are manually encoded in a spreadsheet or handled in R as dataframes or tibbles. More complex data that don’t fit into a table can be modelled as dedicated objects with specialised behaviour. Structured data allow us to reason about the data without having to look at them. Reasoning about and generalising over data in turn allow us to manipulate and visualise them, i.e. to explore, analyse and understand them. The cherry on top of the data analysis cake is being able to reproduce an analysis, either oneself or by sharing it in a way that allows others to do so.

As mentioned in the preamble, the goal of this course is obviously not for students to qualify as bioinformaticians by the end of it. What matters is to appreciate the importance of data and their analysis, and to become fluent in exploring, discussing and communicating about data. A shared appreciation of data and their complexity will hopefully reduce the distinction between bioinformaticians and experimental scientists. Indeed, at the end of the day, it’s useful to remember that:

We are all biologists, in that we study biology. Some use wet lab experiments, others dry lab techniques.

11.1 Next steps

  • Statistics and machine learning (see your statistics courses and the follow-up course WSBIM1322).
  • Getting better at programming and data analysis: see R for Data Science (Grolemund and Wickham 2017) and Advanced R (Wickham 2014a).
  • Evolving scripts into tools and packages (Wickham 2015).
  • Other tools: the Unix command line and git/GitHub (Perez-Riverol et al. 2016). See also this short tutorial.
  • Omics data analysis (see the upcoming WSBIM2122 course).

References:

  • Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.
  • Perez-Riverol, Yasset, Laurent Gatto, Rui Wang, Timo Sachsenberg, Julian Uszkoreit, Felipe da Veiga Leprevost, Christian Fufezan, et al. 2016. “Ten Simple Rules for Taking Advantage of Git and GitHub.” PLOS Computational Biology 12 (7): 1–11. https://doi.org/10.1371/journal.pcbi.1004947.
  • Wickham, Hadley. 2014a. Advanced R. Chapman & Hall/CRC The R Series. Taylor & Francis.
  • Wickham, Hadley. 2015. R Packages. 1st ed. O’Reilly Media, Inc.

11.2 Additional exercises

To answer the following exercises, you’ll need to draw on what you have learnt in the various chapters.

► Question

Make sure you have rWSBIM1207 version >= 0.1.16, then load the 2022 Belgian road accident statistics and the associated metadata describing the variables. The path to the former, an rds file, is returned by road_accidents_be_2022.rds(); road_accidents_be_meta.csv() returns the path to the metadata csv file.
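A minimal sketch for checking the installed version from within R, using base R’s requireNamespace() and utils::packageVersion(). The has_min_version() helper is hypothetical and not part of rWSBIM1207. It returns FALSE when the package is not installed at all.

```r
## Hypothetical helper (not part of rWSBIM1207):
## TRUE if `pkg` is installed with a version >= `min_version`.
has_min_version <- function(pkg, min_version) {
    ## FALSE (without an error) if the package isn't installed
    if (!requireNamespace(pkg, quietly = TRUE))
        return(FALSE)
    ## package_version objects can be compared to version strings
    utils::packageVersion(pkg) >= min_version
}

has_min_version("rWSBIM1207", "0.1.16")
```

The same call with "0.1.17" covers the requirement of the next question.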

The data provide the number of killed, seriously injured, slightly injured and uninjured victims of road accidents in Belgium in 2022, by age group, type of user, sex and various characteristics of the accident.

  • Using the appropriate functions, load both files into R and familiarise yourself with the data.

  • Visualise the numbers for men and women over the hours of the day, for all age classes. Ignore any unknown information. Do you see a difference between men and women?

  • Visualise the number of victims in the different provinces. Do this comparison for the different types of victims. Ignore any unknown information. Use lines and points for this visualisation.

  • Come up with additional visualisations that you could produce with these data. Use bar plots for these visualisations.

► Question

For this exercise, make sure you have rWSBIM1207 version >= 0.1.17. Use the population_be.csv() function to get the paths to 35 files containing population numbers for multiple regions of Belgium up to 2023.

Tip: given that all files contain equivalent data (i.e. the same variables), you can use read_csv() to load all files at once into a long table.
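A minimal sketch of this approach, assuming readr version 2.0 or later, whose read_csv() accepts a vector of file paths and row-binds the files. The two files and their year and population columns below are made-up toy data written to a temporary directory. They stand in for the real files. The id argument adds a column recording which file each row came from.

```r
library(readr)

## Made-up toy files with identical columns, written to a temporary
## directory (stand-ins for the real population files)
tmp <- tempfile()
dir.create(tmp)
f1 <- file.path(tmp, "region_a.csv")
f2 <- file.path(tmp, "region_b.csv")
write_csv(data.frame(year = c(2022, 2023), population = c(100, 110)), f1)
write_csv(data.frame(year = c(2022, 2023), population = c(200, 190)), f2)

## A vector of paths is read into one long table; `id` stores
## each row's source file path in a column called "file"
pop <- read_csv(c(f1, f2), id = "file", show_col_types = FALSE)
pop
```

With the course data, the equivalent call would presumably be read_csv(population_be.csv(), id = "file"), since population_be.csv() returns the file paths. Check the resulting column names before going any further.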

  • How many regions have been surveyed over the years?

  • What differences are there in terms of regions?

Focusing on Belgium and the Brussels, Walloon and Flemish regions, from 1991 onwards, generate one figure (possibly with multiple facets) that allows you to answer the following questions:

  • Has the population increased since 1991?

  • Are there more women or men living in these regions? Has this changed since 1991?

  • Have the changes in population been driven by men, women or both equally?

  • Which region has the largest (smallest) population?

Page built: 2024-03-15 using R version 4.3.2 Patched (2023-12-27 r85757)