The result of a quantitative analysis is a list of peptide and/or protein abundances in the different samples, or abundance ratios between the samples. In this tutorial, we describe a generic workflow for differential analysis of quantitative datasets with simple experimental designs.
Our first case-study is a subset of the data of the 6th study of the Clinical Proteomic Technology Assessment for Cancer (CPTAC). In this experiment, the authors spiked the Sigma Universal Protein Standard mixture 1 (UPS1) containing 48 different human proteins in a protein background of 60 ng/μL Saccharomyces cerevisiae strain BY4741 (MATa, leu2Δ0, met15Δ0, ura3Δ0, his3Δ1). Two different spike-in concentrations were used: 6A (0.25 fmol UPS1 proteins/μL) and 6B (0.74 fmol UPS1 proteins/μL) [5]. We limited ourselves to the data of LTQ-Orbitrap W at site 56. The data were searched with MaxQuant version 1.5.2.8, and detailed search settings were described in Goeminne et al. (2016) [1]. Three replicates are available for each concentration.
The study is a spike-in study for which we know the ground truth, so we can evaluate the quality of the fold change estimates and of the list of differentially expressed (DE) proteins that a method returns.
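Because the spike-in concentrations are known, the expected fold changes can be written down before any analysis. The sketch below computes the expected log2 fold change for the UPS1 proteins from the two concentrations given above and shows one way to score a result list against the ground truth. The `results` table, its column names, and the convention that UPS1 accessions contain the string "ups" are assumptions made for illustration only; they are not taken from the course script.

```r
# Known spike-in concentrations (fmol per microliter) from the study description
conc <- c(A = 0.25, B = 0.74)

# Expected log2 fold change B vs A: UPS1 proteins change, yeast proteins do not
expected_log2fc_ups   <- unname(log2(conc["B"] / conc["A"]))  # about 1.57
expected_log2fc_yeast <- 0

# Hypothetical result table (made-up accessions, estimates and adjusted p-values)
results <- data.frame(
  protein = c("P00001ups", "P00002ups", "YAL001C", "YAL002W", "YAL003W"),
  logFC   = c(1.40, 1.70, 0.05, -0.10, 0.90),
  adjPval = c(0.001, 0.020, 0.800, 0.600, 0.030)
)

# Assumption: spiked-in proteins are recognisable by "ups" in the accession
results$is_ups <- grepl("ups", results$protein, fixed = TRUE)

# Proteins called DE at the 5% FDR level
called <- results$adjPval < 0.05

# False discovery proportion: fraction of DE calls that are yeast proteins
fdp <- sum(called & !results$is_ups) / max(sum(called), 1)

# Bias of the fold change estimates relative to the known truth
bias_ups   <- mean(results$logFC[results$is_ups]) - expected_log2fc_ups
bias_yeast <- mean(results$logFC[!results$is_ups]) - expected_log2fc_yeast
```

In a real evaluation the same quantities would be computed on the full protein table, and compared between summarization methods.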
We first assess the quality of the fold change estimates for the median summarization.
An rmarkdown notebook for the analysis can be downloaded here: cptacAvsB_lab3_median.Rmd and cptacAvsB_lab3_median.html.
Save the script as cptac_lab3_robust.Rmd and alter it so that it uses robust summarization, i.e. replace the argument method="median" in the combineFeatures function with method="robust":
combineFeatures(pepData, fcol = "Proteins", method = "robust")
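The difference between the two summarization methods is easiest to see on a toy protein with missing peptides. The sketch below uses made-up log2 intensities for four peptides in three samples and compares a per-sample median with a robust fit of an additive sample + peptide model via `MASS::rlm`. It only illustrates the idea behind model-based robust summarization under these assumptions; it is not the code that `combineFeatures` runs internally.

```r
library(MASS)  # provides rlm(); a recommended package shipped with standard R

# Made-up truth: sample effects differ by 1 log2 unit, peptide effects sum to 0
sample_effect  <- c(A = 20, B = 21, C = 22)
peptide_effect <- c(pep1 = -1.5, pep2 = 0, pep3 = 0.5, pep4 = 1)

# Small fixed noise so that the example is reproducible (filled column by column)
noise <- matrix(c( 0.02, -0.01,  0.03, -0.02,
                  -0.03,  0.02,  0.01, -0.01,
                   0.01, -0.02, -0.01,  0.03), nrow = 4)

y <- outer(peptide_effect, sample_effect, "+") + noise
dimnames(y) <- list(names(peptide_effect), names(sample_effect))

# The two least intense peptides are not observed in sample B
y[c("pep1", "pep2"), "B"] <- NA

# Median summarization: per-sample median of the observed peptides
median_summary <- apply(y, 2, median, na.rm = TRUE)

# Model-based summarization: robust fit of log2 intensity ~ sample + peptide
long <- data.frame(
  y       = as.vector(y),                              # column-major order
  peptide = factor(rep(rownames(y), times = ncol(y))),
  sample  = factor(rep(colnames(y), each = nrow(y)))
)
long <- long[!is.na(long$y), ]
fit <- rlm(y ~ -1 + sample + peptide, data = long,
           contrasts = list(peptide = "contr.sum"))
robust_summary <- coef(fit)[paste0("sample", colnames(y))]

# Estimated log2 fold change B vs A; the simulated truth is 1
fc_median <- unname(median_summary["B"] - median_summary["A"])
fc_robust <- unname(robust_summary["sampleB"] - robust_summary["sampleA"])
```

In this toy example the median in sample B is taken over the two most intense peptides only, so the median-based fold change is inflated, whereas the model corrects for the peptide effects.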
Eighteen estrogen receptor positive breast cancer tissues from patients treated with tamoxifen upon recurrence have been assessed in a proteomics study. Nine patients had a good outcome (OR) and the other nine had a poor outcome (PD). The proteomes were assessed using an LTQ-Orbitrap, and the Thermo output .RAW files were searched with MaxQuant (version 1.4.1.2) against the human proteome database (FASTA version 2012-09, human canonical proteome).
Three peptides.txt files are available:
Adjust the Rmarkdown file:
Choose a string that identifies the intensity columns, e.g. Intensity., and use this string as the pattern argument for the str_replace function used in Section 1. Data.
Select the feature variables with the selectFeatureData(pepData) function and alter the fcol argument accordingly in Section 1. Data.
Adjust the substr(1,1) statement at the end of Section 1. Data accordingly.
Change the names of the filter variables according to the names you have selected in selectFeatureData.
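To make the string-handling steps concrete, the sketch below strips the prefix from a few intensity column names and derives the condition from what remains. The column names are hypothetical, and base R `sub` and `substr` stand in for the `str_replace` and `substr` calls of the script; check the real names in your peptides file before adapting the pattern and the character positions.

```r
# Hypothetical intensity column names; inspect the real ones in your own data
cols <- c("Intensity.OR.01", "Intensity.OR.04", "Intensity.PD.02")

# Remove the common prefix so that only the sample label remains
# (fixed = TRUE treats the dot literally instead of as a regex wildcard)
sample_names <- sub("Intensity.", "", cols, fixed = TRUE)

# A substr(1,1) statement keeps a single character; with two-letter
# condition codes such as these, the first two characters are needed
condition <- factor(substr(sample_names, 1, 2))

sample_names
levels(condition)
```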
We have to alter the makeContrast statement, because the levels of the factor condition now have different names.
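The parameter name that a contrast refers to is built from the factor name and the level name, so it changes with the levels. A quick way to see which names the model will use, sketched here in base R with hypothetical level labels `OR` and `PD`, is to inspect the columns of the design matrix:

```r
# Hypothetical condition labels for a 3 vs 3 comparison
condition <- factor(rep(c("OR", "PD"), each = 3))

# With the default treatment coding the first level is the reference and
# the remaining parameter is named <factor name><level name>
param_names <- colnames(model.matrix(~ condition))
param_names
# param_names is c("(Intercept)", "conditionPD")
```

Under these assumptions the contrast in the script would refer to the parameter `conditionPD`; with other level names the parameter name changes accordingly.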
What do you observe when you compare the output of the 3x3 and the 9x9 analyses? Try to explain the differences.
Duguet et al. 2017 compared the proteomes of mouse regulatory T cells (Treg) and conventional T cells (Tconv) in order to discover differentially regulated proteins between these two cell populations. For each biological repeat the proteomes were extracted for both Treg and Tconv cell pools, which were purified by flow cytometry. The data in data/quantification/mouseTcell on the pdaData repository are a subset of the data PXD004436 on PRIDE.
Three subsets of the data are available:
Alter the cancer_3x3 script for the analysis of the Mouse T-cell example.