1 Background

Duguet et al. 2017 compared the proteomes of mouse regulatory T cells (Treg) and conventional T cells (Tconv) in order to discover differentially regulated proteins between these two cell populations. For each biological repeat the proteomes were extracted for both Treg and Tconv cell pools, which were purified by flow cytometry. The data in data/quantification/mouseTcell on the pdaData repository are a subset of the data PXD004436 on PRIDE.

We will use a subset of the data with a randomized complete block (RCB) design, i.e. the dataset consists of four mice for which the proteome of both conventional and regulatory T cells are assessed.

2 Data

We first import the peptides.txt file. This is the file that contains your peptide-level intensities. For a MaxQuant search [6], this peptides.txt file can be found by default in the “path_to_raw_files/combined/txt/” folder from the MaxQuant output, with “path_to_raw_files” the folder where raw files were saved. In this tutorial, we will use a MaxQuant peptides file from MaxQuant that can be found on the pdaData repository. We will use the MSnbase package to import the data.

We generate the object peptideFile with the path to the peptides.txt file. In this file we will fetch the data from the github repository linked to the pda course: https://statomics.github.io/pda. You can also replace the peptideFile with a string that points to the path of a file on your local hard drive.

With the grepEcols function we find the columns that are containing the expression data of the peptides in the peptides.txt file.

library(MSqRob)
library(MSnbase)
library(tidyverse)
library(limma)
peptidesFile <- "https://raw.githubusercontent.com/statOmics/pda/data/quantification/mouseTcell/peptidesRCB.txt"
proteinGroupsFile <-"https://raw.githubusercontent.com/statOmics/pda/data/quantification/mouseTcell/proteinGroups.txt"
ecols<-grepEcols(peptidesFile, "Intensity ", split = "\t")

Next we import the peptide intensities

pepData<-readMSnSet2(peptidesFile,ecol=ecols,fnames="Sequence",sep="\t")
pepData

## MSnSet (storageMode: lockedEnvironment)
## assayData: 55814 features, 8 samples 
##   element names: exprs 
## protocolData: none
## phenoData: none
## featureData
##   featureNames: AAAAAAAAAAGAAGGR AAAAAAAAAAGDSDSWDADTFSMEDPVRK ...
##     YYYDKNIIHK (55814 total)
##   fvarLabels: Sequence N.term.cleavage.window ... MS.MS.Count (74
##     total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
##  MSnbase version: 2.10.1

The pepData object is an MSnSet, a container for the data, features information and experimental annotation. They can be accessed using the accessor functions ‘exprs’ (matrix of intensities, features in rows, samples in columns), ‘fData’ (properties for each feature, peptide or protein, on the rows) and ‘pData’ (properties for the samples on the columns).

We will make use from data wrangling functionalities from the tidyverse package. The %>% operator allows us to pipe the output of one function to the next function.

head(exprs(pepData))

##                               Intensity.Tconv.M12_2 Intensity.Tconv.M12_3
## AAAAAAAAAAGAAGGR                          111470000             101350000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK              13303000              10160000
## AAAAAAAGGAAIAVSTGIETATIQK                         0                     0
## AAAAAAGAASGIPGPVAQGIK                             0               7197400
## AAAAAAGPEMVR                              104440000             101950000
## AAAAAAPGGGGGEPR                            11910000              12683000
##                               Intensity.Tconv.M5_inj1
## AAAAAAAAAAGAAGGR                               965070
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                  235500
## AAAAAAAGGAAIAVSTGIETATIQK                       48225
## AAAAAAGAASGIPGPVAQGIK                          266620
## AAAAAAGPEMVR                                   947110
## AAAAAAPGGGGGEPR                                190550
##                               Intensity.Tconv.M6_inj1 Intensity.Treg.M12_2
## AAAAAAAAAAGAAGGR                                    0             80606000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                 6744500                    0
## AAAAAAAGGAAIAVSTGIETATIQK                      314120                    0
## AAAAAAGAASGIPGPVAQGIK                         2904800             12602000
## AAAAAAGPEMVR                                 12747000             44735000
## AAAAAAPGGGGGEPR                               2166100             12555000
##                               Intensity.Treg.M12_3 Intensity.Treg.M5_inj1
## AAAAAAAAAAGAAGGR                         120130000                1668800
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK             14499000                 437460
## AAAAAAAGGAAIAVSTGIETATIQK                        0                 173070
## AAAAAAGAASGIPGPVAQGIK                     14162000                 520550
## AAAAAAGPEMVR                              70512000                1604600
## AAAAAAPGGGGGEPR                           13873000                 361240
##                               Intensity.Treg.M6_inj1
## AAAAAAAAAAGAAGGR                                   0
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                3881000
## AAAAAAAGGAAIAVSTGIETATIQK                     317830
## AAAAAAGAASGIPGPVAQGIK                        1608500
## AAAAAAGPEMVR                                 4689900
## AAAAAAPGGGGGEPR                              1168900

pepData %>% exprs %>% head

##                               Intensity.Tconv.M12_2 Intensity.Tconv.M12_3
## AAAAAAAAAAGAAGGR                          111470000             101350000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK              13303000              10160000
## AAAAAAAGGAAIAVSTGIETATIQK                         0                     0
## AAAAAAGAASGIPGPVAQGIK                             0               7197400
## AAAAAAGPEMVR                              104440000             101950000
## AAAAAAPGGGGGEPR                            11910000              12683000
##                               Intensity.Tconv.M5_inj1
## AAAAAAAAAAGAAGGR                               965070
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                  235500
## AAAAAAAGGAAIAVSTGIETATIQK                       48225
## AAAAAAGAASGIPGPVAQGIK                          266620
## AAAAAAGPEMVR                                   947110
## AAAAAAPGGGGGEPR                                190550
##                               Intensity.Tconv.M6_inj1 Intensity.Treg.M12_2
## AAAAAAAAAAGAAGGR                                    0             80606000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                 6744500                    0
## AAAAAAAGGAAIAVSTGIETATIQK                      314120                    0
## AAAAAAGAASGIPGPVAQGIK                         2904800             12602000
## AAAAAAGPEMVR                                 12747000             44735000
## AAAAAAPGGGGGEPR                               2166100             12555000
##                               Intensity.Treg.M12_3 Intensity.Treg.M5_inj1
## AAAAAAAAAAGAAGGR                         120130000                1668800
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK             14499000                 437460
## AAAAAAAGGAAIAVSTGIETATIQK                        0                 173070
## AAAAAAGAASGIPGPVAQGIK                     14162000                 520550
## AAAAAAGPEMVR                              70512000                1604600
## AAAAAAPGGGGGEPR                           13873000                 361240
##                               Intensity.Treg.M6_inj1
## AAAAAAAAAAGAAGGR                                   0
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                3881000
## AAAAAAAGGAAIAVSTGIETATIQK                     317830
## AAAAAAGAASGIPGPVAQGIK                        1608500
## AAAAAAGPEMVR                                 4689900
## AAAAAAPGGGGGEPR                              1168900

pepData %>% sampleNames

## [1] "Intensity.Tconv.M12_2"   "Intensity.Tconv.M12_3"  
## [3] "Intensity.Tconv.M5_inj1" "Intensity.Tconv.M6_inj1"
## [5] "Intensity.Treg.M12_2"    "Intensity.Treg.M12_3"   
## [7] "Intensity.Treg.M5_inj1"  "Intensity.Treg.M6_inj1"

The sample names are rather long and contain information on the spike-in concentration and the repeat. We will remove “Intensity.” from the string

sampleNames(pepData) <- pepData %>% sampleNames %>% str_replace(., pattern="Intensity.", replacement="")

Next we will add information on the proteins to the feature data. The feature data of the object pepData contains information for each peptide in the experiment. This info on the proteins can be found in the proteingroups.txt file of maxquant. The peptides.txt file contains many data on each feature (peptide). We will only retain specified columns Protein, Sequence, Reverse and Potential.Contaminant, indicating the protein, peptide sequence, if the peptide is a decoy and if it is a contaminant, respectively.

In older and more recent versions of maxquant the names of the fields might vary. We can therefore explore the names we want to select with the shiny app that can be launched from the selectFeatureData from the MSnbase package, i.e. by running the function without an fcol argument that specifies the columns you would like to select.

The code can be found below, but is not executed by this Rmd script (note, that the flag eval=FALSE has been used as option for the R chunk).

pepData<-selectFeatureData(pepData)

If we know the columns, we can specify them as a vector in the fcol slot.

pepData<-selectFeatureData(pepData,fcol=c("Proteins","Protein.names","Gene.names", "Potential.contaminant","Reverse","Sequence"))

Calculate how many non zero intensities we have per peptide. This will be useful for filtering.

fData(pepData)$nNonZero<- rowSums(exprs(pepData) > 0)

Peptides with zero intensities are missing peptides and should be represent with a NA value instead of 0.

exprs(pepData)[exprs(pepData)==0]<-NA

Next we create the data on the experimental layout. We can do this based on the samplenames. We will add an effect for cell type and also for the animal (mouse ID).

pData(pepData)$cellType <- pepData %>% sampleNames %>% substr(1,4) %>% as.factor
pData(pepData)$mouseID <- factor(paste0("m",rep(1:4,2)))

We will also replace the sample names to make them shorter

sampleNames(pepData)<-pepData %>% sampleNames %>% substr(1,4) %>% paste0(.,"_m",rep(1:4,2))

2.1 Data exploration

We can inspect the missingness in our data with the plotNA() function provided with MSnbase. 38% of all peptide intensities are missing and for some peptides we don’t even measure a signal in any sample. The missingness is similar across samples. Note, that we plot the peptide data, so the label protein in the plot refers to peptides.

plotNA(pepData)

When we look at density plots of the intensities, we see that most intensities are low and that there is a long tail to the right. It is more useful to explore the data upon log transformation.

plotDensities(exprs(pepData))

nrow(pepData)

## [1] 55814

There are 55814 peptides before preprocessing.

3 Preprocessing

We will log transform, normalize, filter and summarize the data.

3.1 Log transform the data

pepData <- log(pepData, base = 2)
plotDensities(exprs(pepData))

When we look at the densities we see small shifts in location and shape of the density distributions. So normalisation will be required. But, we will first perform filtering.

3.2 Filtering

3.2.1 Handling overlapping protein groups

In our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.

pepData<-pepData[fData(pepData)$Proteins %in% smallestUniqueGroups(fData(pepData)$Proteins),]

3.2.2 Remove reverse sequences (decoys) and contaminants

We now remove the contaminants, peptides that map to decoy sequences and proteins, which were only identified by peptides with modifications.

pepData <- pepData[fData(pepData)$Reverse!="+",]
pepData <- pepData[fData(pepData)$Potential.contaminant!="+",]

3.2.3 Remove peptides of proteins that were only identified with modified peptides

Proteins for which all peptides are carrying modifications (PTMs) can be considered as unreliable. We will filter out these proteins. This information is included in the Only.identified.by.site column of proteinGroups.txt file of maxquant. The variable is containing a “+” character if the protein is only identified by modified peptides and an empty string “” if this is not the case. Sometimes an NA value is also present. We will replace these by the empty character string. The information of the peptides.txt file can be linked to the proteinGroups.txt file by using the Proteins column from the peptides.txt file and the Protein.IDs column in the proteinGroups.txt file. If NAs are included in the

proteinGroups<-read.table(proteinGroupsFile, sep = "\t", header = TRUE, quote = "", comment.char = "")
only_site <- proteinGroups$Only.identified.by.site
only_site[is.na(only_site)] <- ""
proteinsOnlyIdentifiedWithPTMs <- proteinGroups$Protein.IDs[only_site=="+"]
pepData<-pepData[!(fData(pepData)$Proteins %in% proteinsOnlyIdentifiedWithPTMs),]

We will now remove the proteinGroups object from the R session to free memory.

rm(proteinGroups,only_site)
gc()

##           used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 4158129 222.1    8603494 459.5         NA  5452427 291.2
## Vcells 8237701  62.9   24434430 186.5      16384 24430597 186.4

3.2.4 Drop peptides that were only identified in one sample

We want to keep peptide that were at least observed twice.

pepData<-pepData[fData(pepData)$nNonZero>=2,]
nrow(pepData)

## [1] 44396

We keep 44396 peptides upon filtering.

3.3 Quantile normalize the data

pepData <- normalise(pepData, "quantiles")

3.4 Explore quantile normalized data

Upon normalisation the density curves for all samples coincide.

plotDensities(exprs(pepData))

We can visualize our data using a Multi Dimensional Scaling plot, eg. as provided by the limma package.

plotMDS(exprs(pepData),col=as.double(pData(pepData)$cellType))

The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples. We notice that the leading differences (log FC) in the peptide data seems to be driven by technical variability. Indeed the samples do not seem to be clearly separated according to the spike in condition. Because there are missing values for the peptide level data we use the na.rm=TRUE argument to summarize the data based on the observed peptides for every protein.

3.5 Summarization to protein level

protData<-combineFeatures(pepData,fcol="Proteins",method="robust",na.rm=TRUE)

## Your data contains missing values. Please read the relevant section
## in the combineFeatures manual page for details the effects of
## missing values on data aggregation.

If you summarize using robust summarization please refer to Sticker et al. (2019) https://www.biorxiv.org/content/10.1101/668863v1

We filter proteins for which we have at least two protein expression values in both conditions:

protData<-protData[
protData %>% exprs %>% is.na %>% `!` %>% rowSums>4,]

plotMDS(exprs(protData),col=as.double(pData(pepData)$cellType))

4 Data Analysis

4.1 Estimation

MSqRob is currently working with a format where we have one dataframe for each protein. This will be changed in the next release. Therefore we first have to reorganise the data.

Next the models are fitted. This is done using the fit.model function. We only have to model the data using the factor condition from the pheno data of the protein level MSnSet. The name of the factor variable is specified in the fixed argument (if multiple predictors have to be incorporated in the model, a vector of variable names has to be provided in this argument.). The argument shrinkage is used to specify if ridge regression has to be adopted. For the sake of speed we do not do this in the tutorial. The shrinkage has to be specified for each variable in the fixed effects. We also have to indicate this for the intercept (which we never shrink). So we specify it at c(0,0) to indicate that the intercept (first 0) and the parameters for the factor condition (second 0) are not penalized. We set the robust_var function equal to FALSE, this functionality will be removed from the package in the next release.

protMSqRob <- MSnSet2protdata(protData, "Proteins",annotations=c("Protein.names","Gene.names"))
models <- fit.model(protdata=protMSqRob, response="quant_value", fixed=c("cellType","mouseID"),shrinkage.fixed=c(0,0),robust_var=FALSE)

4.2 Inference

Many biologists have problems with the reference coding. In MSqRob we have opted to formulate contrasts using all levels of a factor. Internally, the contrasts are than recasted according to the factor level that is the reference class.

L <- makeContrast("cellTypeTreg - cellTypeTcon",  levels=c("cellTypeTreg","cellTypeTcon"))
tests <- test.contrast_adjust(models, L)
nSig <- sum(tests$signif,na.rm=TRUE)
head(tests,nSig)

##                                                                                                                   Protein.names
## P24452                                                                                               Macrophage-capping protein
## P21550                                                                                                             Beta-enolase
## P08207                                                                                                         Protein S100-A10
## O09131                                                                                        Glutathione S-transferase omega-1
## Q7TPR4                                                                                                          Alpha-actinin-1
## Q9D3P8                                                                                                Plasminogen receptor (KT)
## Q8BGW0                                                                                                           Protein THEMIS
## Q80SU7                                                                                   Interferon-induced very large GTPase 1
## P13020                                                                                                                 Gelsolin
## P07356                                                                                                               Annexin A2
## Q8BFP9                                           [Pyruvate dehydrogenase (acetyl-transferring)] kinase isozyme 1, mitochondrial
## Q9JJU8                                                                       SH3 domain-binding glutamic acid-rich-like protein
## Q9QXY6                                                                                           EH domain-containing protein 3
## Q61205                                                              Platelet-activating factor acetylhydrolase IB subunit gamma
## Q9CYL5                                                                    Golgi-associated plant pathogenesis-related protein 1
## Q61503                                                                                                           5-nucleotidase
## P07091                                                                                                          Protein S100-A4
## Q9DC16                                                           Endoplasmic reticulum-Golgi intermediate compartment protein 1
## P07742                                                                       Ribonucleoside-diphosphate reductase large subunit
## Q99N69                                                                                                                 Leupaxin
## P54071                                                                           Isocitrate dehydrogenase [NADP], mitochondrial
## P16045                                                                                                               Galectin-1
## Q60611                                                                                                DNA-binding protein SATB1
## Q9R1Q7                                                                                                    Proteolipid protein 2
## P70677                                                                    Caspase-3;Caspase-3 subunit p17;Caspase-3 subunit p12
## P30285                                                                                                Cyclin-dependent kinase 4
## Q9QYB5                                                                                                            Gamma-adducin
## Q4QQM4                                                                                   Tumor protein p53-inducible protein 11
## Q9WUU8                                                                                            TNFAIP3-interacting protein 1
## Q3UW53                                                                                                            Protein Niban
## P09055                                                                                                          Integrin beta-1
## Q8BGC4                                                           Zinc-binding alcohol dehydrogenase domain-containing protein 2
## Q64521                                                                        Glycerol-3-phosphate dehydrogenase, mitochondrial
## Q60710                                                                  Deoxynucleoside triphosphate triphosphohydrolase SAMHD1
## P57016                                                                                                                Ladinin-1
## Q03267                                                                                               DNA-binding protein Ikaros
## Q8CG47                                                                          Structural maintenance of chromosomes protein 4
## P42230                                                                      Signal transducer and activator of transcription 5A
## P15307                                                                                                     Proto-oncogene c-Rel
## Q9QXG4                                                                                Acetyl-coenzyme A synthetase, cytoplasmic
## Q9JMH9                                                                                             Unconventional myosin-XVIIIa
## P37913                                                                                                             DNA ligase 1
## Q00417                                                                                                   Transcription factor 7
## Q99KE1                                                                                NAD-dependent malic enzyme, mitochondrial
## Q8VCT3                                                                                                         Aminopeptidase B
## Q6PD03                                          Serine/threonine-protein phosphatase 2A 56 kDa regulatory subunit alpha isoform
## P48758                                                                                             Carbonyl reductase [NADPH] 1
## P45377                                                                                       Aldose reductase-related protein 2
## P49718                                                                                    DNA replication licensing factor MCM5
## O70404                                                                                    Vesicle-associated membrane protein 8
## Q9CQ62                                                                                 2,4-dienoyl-CoA reductase, mitochondrial
## O88508                                                                                    DNA (cytosine-5)-methyltransferase 3A
## P29452                                                                    Caspase-1;Caspase-1 subunit p20;Caspase-1 subunit p10
## P70236                                                               Dual specificity mitogen-activated protein kinase kinase 6
## Q9Z0S1                                                                                       3(2),5-bisphosphate nucleotidase 1
## P16546                                                                                 Spectrin alpha chain, non-erythrocytic 1
## Q8VCW8                                                                       Acyl-CoA synthetase family member 2, mitochondrial
## Q9WTK5                                             Nuclear factor NF-kappa-B p100 subunit;Nuclear factor NF-kappa-B p52 subunit
## O55101                                                                                                           Synaptogyrin-2
## P20444                                                                                              Protein kinase C alpha type
## Q8CFB4                                                                                              Guanylate-binding protein 5
## O70400                                                                                             PDZ and LIM domain protein 1
## Q8BP56                                                                                            Acid trehalase-like protein 1
## P19182                                                                             Interferon-related developmental regulator 1
## Q62261                                                                                  Spectrin beta chain, non-erythrocytic 1
## Q99MN9                                                                      Propionyl-CoA carboxylase beta chain, mitochondrial
## Q8C0L6                                                                      Peroxisomal N(1)-acetyl-spermine/spermidine oxidase
## P70302                                                                                           Stromal interaction molecule 1
## Q8C142                                                                       Low density lipoprotein receptor adapter protein 1
## Q8K157                                                                                                       Aldose 1-epimerase
## P00329                                                                                                  Alcohol dehydrogenase 1
## Q9QYC0                                                                                                            Alpha-adducin
## Q8BMD8                                                                    Calcium-binding mitochondrial carrier protein SCaMC-1
## Q80U28                                                                               MAP kinase-activating death domain protein
## Q04447                                                                                                   Creatine kinase B-type
## P47856                                                         Glutamine--fructose-6-phosphate aminotransferase [isomerizing] 1
## O88673                                                                                              Diacylglycerol kinase alpha
## Q5FWK3                                                                                          Rho GTPase-activating protein 1
## P83093                                                                                           Stromal interaction molecule 2
## P29391                                                                                                   Ferritin light chain 1
## Q05186                                                                                                         Reticulocalbin-1
## Q922J3                                                                               CAP-Gly domain-containing linker protein 1
## Q8BIG7                                                                 Catechol O-methyltransferase domain-containing protein 1
## P25799                                             Nuclear factor NF-kappa-B p105 subunit;Nuclear factor NF-kappa-B p50 subunit
## Q8BH59                                                                    Calcium-binding mitochondrial carrier protein Aralar1
## Q8K296                                                                                           Myotubularin-related protein 3
## P97310                                                                                    DNA replication licensing factor MCM2
## Q3TBT3                                                                                   Stimulator of interferon genes protein
## Q62422                                                                                          Osteoclast-stimulating factor 1
## P50431                                                                               Serine hydroxymethyltransferase, cytosolic
## Q8R4N0                                                                   Citrate lyase subunit beta-like protein, mitochondrial
## P14094                                                                      Sodium/potassium-transporting ATPase subunit beta-1
## Q8CFK6                                                                                        DENN domain-containing protein 1C
## P50096                                                                                  Inosine-5-monophosphate dehydrogenase 1
## Q91VV4                                                                                        DENN domain-containing protein 2D
## P61028                                                                                               Ras-related protein Rab-8B
## O08807                                                                                                          Peroxiredoxin-4
## O70293                                                                                      G protein-coupled receptor kinase 6
## Q8BUV3                                           Gephyrin;Molybdopterin adenylyltransferase;Molybdopterin molybdenumtransferase
## P54310                                                                                                 Hormone-sensitive lipase
## Q9QUG9                                                                                           RAS guanyl-releasing protein 2
## Q9Z2L7                                                                                          Cytokine receptor-like factor 3
## Q80XN0                                                                      D-beta-hydroxybutyrate dehydrogenase, mitochondrial
## Q61107                                                                                              Guanylate-binding protein 4
## O70172                                                                   Phosphatidylinositol 5-phosphate 4-kinase type-2 alpha
## Q9JIY5                                                                                     Serine protease HTRA2, mitochondrial
## O70370                                                                                                              Cathepsin S
## Q3UDE2                                                                                 Tubulin--tyrosine ligase-like protein 12
## P56395                                                                                                            Cytochrome b5
## Q6Q899                                                                                Probable ATP-dependent RNA helicase DDX58
## Q9WU84                                                                                Copper chaperone for superoxide dismutase
## P18654                                                                                      Ribosomal protein S6 kinase alpha-3
## Q3UN02                                                                                        Lysocardiolipin acyltransferase 1
## Q8QZT1                                                                              Acetyl-CoA acetyltransferase, mitochondrial
## Q9DBX6                                                                                                      Cytochrome P450 2S1
## Q7TNG5                                                                         Echinoderm microtubule-associated protein-like 2
## Q61462                                                                                             Cytochrome b-245 light chain
## P54227                                                                                                                 Stathmin
## Q8R001                                                                     Microtubule-associated protein RP/EB family member 2
## Q6P9R4                                                                                Rho guanine nucleotide exchange factor 18
## Q8R502                                                                            Volume-regulated anion channel subunit LRRC8C
## C3VPR6                                                                                                            Protein NLRC5
## P28867 Protein kinase C delta type;Protein kinase C delta type regulatory subunit;Protein kinase C delta type catalytic subunit
## P49717                                                                                    DNA replication licensing factor MCM4
## Q68FL4                                                                                        Putative adenosylhomocysteinase 3
## P51855                                                                                                   Glutathione synthetase
## Q9CXJ1                                                                           Probable glutamate--tRNA ligase, mitochondrial
## Q99ME2                                                                                           WD repeat-containing protein 6
## P97311                                                                                    DNA replication licensing factor MCM6
## P10630                                                                                       Eukaryotic initiation factor 4A-II
## Q921I1                                                                                                          Serotransferrin
## Q9D0K2                                                          Succinyl-CoA:3-ketoacid coenzyme A transferase 1, mitochondrial
##        Gene.names   estimate        se       df       Tval         pval
## P24452       Capg  2.8383340 0.1076143 5.549307  26.375060 4.790894e-07
## P21550       Eno3  2.6243616 0.1328974 5.549307  19.747280 2.351907e-06
## P08207    S100a10  2.6113767 0.1642877 5.549307  15.895148 7.699407e-06
## O09131      Gsto1  3.1603951 0.1446574 4.549307  21.847456 8.802266e-06
## Q7TPR4      Actn1 -1.5843624 0.1132674 5.549307 -13.987806 1.541693e-05
## Q9D3P8     Plgrkt  1.5430021 0.1143314 5.549307  13.495873 1.871294e-05
## Q8BGW0     Themis -2.0237546 0.1537470 5.549307 -13.162886 2.141744e-05
## Q80SU7      Gvin1  1.4842622 0.1150625 5.549307  12.899614 2.388478e-05
## P13020        Gsn -1.6201620 0.1268165 5.549307 -12.775641 2.516151e-05
## P07356      Anxa2  2.2241620 0.1826479 5.549307  12.177319 3.257437e-05
## Q8BFP9       Pdk1 -1.5355360 0.1304300 5.549307 -11.772878 3.905206e-05
## Q9JJU8    Sh3bgrl  1.5178392 0.1289359 5.549307  11.772047 3.906684e-05
## Q9QXY6       Ehd3 -1.2763696 0.1104727 5.549307 -11.553709 4.319080e-05
## Q61205   Pafah1b3  1.2694876 0.1117895 5.549307  11.356059 4.737080e-05
## Q9CYL5     Glipr2  1.1747199 0.1060810 5.549307  11.073802 5.419292e-05
## Q61503       Nt5e  2.6069756 0.2385794 5.549307  10.927079 5.819204e-05
## P07091     S100a4  4.8972495 0.3441184 4.549307  14.231293 6.036953e-05
## Q9DC16     Ergic1  2.2588641 0.2183354 5.549307  10.345842 7.784621e-05
## P07742       Rrm1  1.1504309 0.1155599 5.549307   9.955273 9.546869e-05
## Q99N69       Lpxn  1.3028427 0.1309800 5.549307   9.946883 9.589571e-05
## P54071       Idh2 -1.3555077 0.1391533 5.549307  -9.741108 1.071046e-04
## P16045     Lgals1  1.5402209 0.1611947 5.549307   9.555032 1.185817e-04
## Q60611      Satb1 -1.7864732 0.1922832 5.549307  -9.290844 1.374473e-04
## Q9R1Q7       Plp2  1.7130527 0.1852884 5.549307   9.245332 1.410415e-04
## P70677      Casp3  1.5191831 0.1650921 5.549307   9.202034 1.445633e-04
## P30285       Cdk4  1.0762549 0.1204506 5.549307   8.935239 1.686818e-04
## Q9QYB5       Add3 -0.9784364 0.1107801 5.549307  -8.832240 1.792319e-04
## Q4QQM4   Trp53i11  1.2595495 0.1427441 5.549307   8.823827 1.801272e-04
## Q9WUU8      Tnip1  0.9415172 0.1083451 5.549307   8.689981 1.950992e-04
## Q3UW53    Fam129a  3.0426788 0.3520071 5.549307   8.643799 2.005993e-04
## P09055      Itgb1  1.5348990 0.1448128 4.549307  10.599192 2.230232e-04
## Q8BGC4      Zadh2 -0.9068988 0.1075316 5.549307  -8.433785 2.280144e-04
## Q64521       Gpd2 -0.9872796 0.1189136 5.549307  -8.302494 2.473780e-04
## Q60710     Samhd1  0.9902074 0.1217576 5.549307   8.132616 2.753559e-04
## P57016       Lad1  2.6804451 0.2653725 4.549307  10.100690 2.755727e-04
## Q03267      Ikzf1 -1.0968334 0.1355547 5.549307  -8.091443 2.826828e-04
## Q8CG47       Smc4  1.2206429 0.1510405 5.549307   8.081560 2.844754e-04
## P42230     Stat5a  0.9271934 0.1157509 5.549307   8.010251 2.978091e-04
## P15307        Rel  1.1098621 0.1390703 5.549307   7.980583 3.035704e-04
## Q9QXG4      Acss2 -1.2122020 0.1530582 5.549307  -7.919878 3.157694e-04
## Q9JMH9     Myo18a  0.8297900 0.1053944 5.549307   7.873185 3.255427e-04
## P37913       Lig1  1.5605331 0.2015234 5.549307   7.743680 3.545519e-04
## Q00417       Tcf7 -1.5769633 0.2043234 5.549307  -7.717978 3.606619e-04
## Q99KE1        Me2 -0.8654281 0.1127037 5.549307  -7.678791 3.702155e-04
## Q8VCT3      Rnpep  0.8396310 0.1098586 5.549307   7.642835 3.792419e-04
## Q6PD03    Ppp2r5a -1.0306822 0.1367435 5.549307  -7.537342 4.072427e-04
## P48758       Cbr1 -0.9314375 0.1245758 5.549307  -7.476871 4.243779e-04
## P45377     Akr1b8  1.2163102 0.1633175 5.549307   7.447517 4.329979e-04
## P49718       Mcm5  0.9299717 0.1248775 5.549307   7.447074 4.331296e-04
## O70404      Vamp8  0.9619197 0.1295017 5.549307   7.427853 4.388865e-04
## Q9CQ62      Decr1  1.4516462 0.1960095 5.549307   7.405997 4.455412e-04
## O88508     Dnmt3a  1.1899880 0.1612455 5.549307   7.379978 4.536174e-04
## P29452      Casp1  1.8363470 0.2066048 4.549307   8.888212 4.814175e-04
## P70236     Map2k6 -0.7930610 0.1088488 5.549307  -7.285898 4.842772e-04
## Q9Z0S1      Bpnt1 -1.0172732 0.1411493 5.549307  -7.207070 5.118361e-04
## P16546     Sptan1 -0.7807077 0.1086930 5.549307  -7.182685 5.207278e-04
## Q8VCW8      Acsf2 -0.9994857 0.1399699 5.549307  -7.140719 5.364552e-04
## Q9WTK5      Nfkb2  0.9884250 0.1388409 5.549307   7.119120 5.447650e-04
## O55101     Syngr2  1.1233790 0.1586745 5.549307   7.079772 5.602916e-04
## P20444      Prkca  1.1677163 0.1653710 5.549307   7.061194 5.678008e-04
## Q8CFB4       Gbp5  1.5198757 0.2170343 5.549307   7.002928 5.921227e-04
## O70400     Pdlim1 -1.2582275 0.1805901 5.549307  -6.967313 6.075865e-04
## Q8BP56      Athl1  0.7428528 0.1066601 5.549307   6.964673 6.087509e-04
## P19182      Ifrd1 -0.9693650 0.1397040 5.549307  -6.938707 6.203473e-04
## Q62261     Sptbn1 -0.8185595 0.1185810 5.549307  -6.902954 6.367371e-04
## Q99MN9       Pccb -0.7485517 0.1088581 5.549307  -6.876397 6.492377e-04
## Q8C0L6       Paox -1.0745498 0.1296667 4.549307  -8.287013 6.517121e-04
## P70302      Stim1 -0.9557104 0.1395146 5.549307  -6.850253 6.618234e-04
## Q8C142    Ldlrap1 -1.2298925 0.1820952 5.549307  -6.754118 7.106062e-04
## Q8K157       Galm  1.3399325 0.1985987 5.549307   6.746936 7.144150e-04
## P00329       Adh1 -1.4507621 0.2150497 5.549307  -6.746170 7.148228e-04
## Q9QYC0       Add1 -0.8166762 0.1212541 5.549307  -6.735247 7.206655e-04
## Q8BMD8   Slc25a24  0.9945078 0.1478739 5.549307   6.725378 7.259923e-04
## Q80U28       Madd -0.9075153 0.1358187 5.549307  -6.681815 7.500589e-04
## Q04447        Ckb -1.6509200 0.2479997 5.549307  -6.656944 7.642142e-04
## P47856      Gfpt1  0.6698891 0.1012165 5.549307   6.618379 7.867809e-04
## O88673       Dgka -0.8466005 0.1279824 5.549307  -6.614973 7.888111e-04
## Q5FWK3    Arhgap1 -0.7072659 0.1069889 5.549307  -6.610646 7.913990e-04
## P83093      Stim2  1.0711944 0.1627140 5.549307   6.583295 8.079859e-04
## P29391       Ftl1  0.8231906 0.1251454 5.549307   6.577874 8.113210e-04
## Q05186       Rcn1  1.6357822 0.2077568 4.549307   7.873545 8.120381e-04
## Q922J3      Clip1 -0.6976948 0.1065395 5.549307  -6.548695 8.295511e-04
## Q8BIG7     Comtd1  0.9264415 0.1416112 5.549307   6.542150 8.337049e-04
## P25799      Nfkb1  0.6929377 0.1067542 5.549307   6.490963 8.670386e-04
## Q8BH59   Slc25a12 -0.7292302 0.1123708 5.549307  -6.489500 8.680135e-04
## Q8K296      Mtmr3  0.9333997 0.1446205 5.549307   6.454130 8.919827e-04
## P97310       Mcm2  0.8293663 0.1291465 5.549307   6.421901 9.144945e-04
## Q3TBT3    Tmem173  0.7944146 0.1249954 5.549307   6.355552 9.629460e-04
## Q62422      Ostf1 -0.8096174 0.1281406 5.549307  -6.318196 9.915334e-04
## P50431      Shmt1  0.7224515 0.1150009 5.549307   6.282138 1.020064e-03
## Q8R4N0      Clybl -1.5088235 0.2023098 4.549307  -7.457984 1.023671e-03
## P14094     Atp1b1 -0.9954441 0.1587919 5.549307  -6.268860 1.030809e-03
## Q8CFK6    Dennd1c  0.9327610 0.1490587 5.549307   6.257678 1.039960e-03
## P50096     Impdh1 -0.6924848 0.1107307 5.549307  -6.253776 1.043175e-03
## Q91VV4    Dennd2d -1.0439585 0.1674659 5.549307  -6.233858 1.059769e-03
## P61028      Rab8b  0.9333800 0.1504512 5.549307   6.203871 1.085331e-03
## O08807      Prdx4  1.1311052 0.1824665 5.549307   6.198975 1.089573e-03
## O70293       Grk6 -0.8775149 0.1421745 5.549307  -6.172098 1.113199e-03
## Q8BUV3       Gphn  1.3092097 0.1796337 4.549307   7.288220 1.128931e-03
## P54310       Lipe  1.1598976 0.1594427 4.549307   7.274701 1.137861e-03
## Q9QUG9    Rasgrp2 -0.9495821 0.1545883 5.549307  -6.142650 1.139771e-03
## Q9Z2L7      Crlf3 -0.7126385 0.1163220 5.549307  -6.126430 1.154720e-03
## Q80XN0       Bdh1 -0.7489501 0.1223013 5.549307  -6.123811 1.157154e-03
## Q61107       Gbp4  1.9964914 0.3263826 5.549307   6.117028 1.163489e-03
## O70172    Pip4k2a -0.6864011 0.1122274 5.549307  -6.116166 1.164297e-03
## Q9JIY5      Htra2  0.8426850 0.1382621 5.549307   6.094836 1.184495e-03
## O70370       Ctss  1.0286773 0.1700837 5.549307   6.048065 1.230220e-03
## Q3UDE2     Ttll12 -0.8441482 0.1396677 5.549307  -6.043976 1.234314e-03
## P56395      Cyb5a  0.7798652 0.1293522 5.549307   6.029005 1.249438e-03
## Q6Q899      Ddx58 -0.7756916 0.1294978 5.549307  -5.989998 1.289862e-03
## Q9WU84        Ccs  0.9903560 0.1671171 5.549307   5.926119 1.359376e-03
## P18654    Rps6ka3 -0.7744090 0.1306868 5.549307  -5.925688 1.359859e-03
## Q3UN02     Lclat1  2.1035903 0.3020149 4.549307   6.965187 1.367649e-03
## Q8QZT1      Acat1 -0.6469405 0.1098099 5.549307  -5.891457 1.398905e-03
## Q9DBX6     Cyp2s1 -1.9296569 0.2125997 3.549307  -9.076481 1.400696e-03
## Q7TNG5       Eml2  0.6824168 0.1164865 5.549307   5.858333 1.437927e-03
## Q61462       Cyba  1.5971867 0.1776290 3.549307   8.991699 1.446487e-03
## P54227      Stmn1  1.2559476 0.2155718 5.549307   5.826122 1.477085e-03
## Q8R001     Mapre2  0.7542737 0.1296315 5.549307   5.818597 1.486409e-03
## Q6P9R4   Arhgef18 -0.6849842 0.1188530 5.549307  -5.763290 1.557071e-03
## Q8R502     Lrrc8c  0.7556299 0.1311365 5.549307   5.762165 1.558549e-03
## C3VPR6      Nlrc5  1.4753272 0.2573410 5.549307   5.732965 1.597459e-03
## P28867      Prkcd  0.6725316 0.1173186 5.549307   5.732524 1.598056e-03
## P49717       Mcm4  0.7159698 0.1253413 5.549307   5.712163 1.625865e-03
## Q68FL4     Ahcyl2  0.9211146 0.1613346 5.549307   5.709344 1.629758e-03
## P51855        Gss -0.7912540 0.1386555 5.549307  -5.706617 1.633535e-03
## Q9CXJ1      Ears2 -1.4370819 0.2524433 5.549307  -5.692693 1.652981e-03
## Q99ME2       Wdr6  0.7189244 0.1265785 5.549307   5.679673 1.671404e-03
## P97311       Mcm6  0.7420898 0.1307145 5.549307   5.677180 1.674960e-03
## P10630     Eif4a2 -0.9235078 0.1634430 5.549307  -5.650335 1.713797e-03
## Q921I1         Tf -1.7166178 0.2007176 3.549307  -8.552402 1.716823e-03
## Q9D0K2      Oxct1 -0.7469502 0.1323772 5.549307  -5.642590 1.725194e-03
##               qval signif
## P24452 0.001828205   TRUE
## P21550 0.004487439   TRUE
## P08207 0.008397362   TRUE
## O09131 0.008397362   TRUE
## Q7TPR4 0.010668481   TRUE
## Q9D3P8 0.010668481   TRUE
## Q8BGW0 0.010668481   TRUE
## Q80SU7 0.010668481   TRUE
## P13020 0.010668481   TRUE
## P07356 0.012423255   TRUE
## Q8BFP9 0.012423255   TRUE
## Q9JJU8 0.012423255   TRUE
## Q9QXY6 0.012678160   TRUE
## Q61205 0.012911928   TRUE
## Q9CYL5 0.013551183   TRUE
## Q61503 0.013551183   TRUE
## P07091 0.013551183   TRUE
## Q9DC16 0.016503397   TRUE
## P07742 0.018296902   TRUE
## Q99N69 0.018296902   TRUE
## P54071 0.019462434   TRUE
## P16045 0.020568528   TRUE
## Q60611 0.022066135   TRUE
## Q9R1Q7 0.022066135   TRUE
## P70677 0.022066135   TRUE
## P30285 0.024548759   TRUE
## Q9QYB5 0.024548759   TRUE
## Q4QQM4 0.024548759   TRUE
## Q9WUU8 0.025516228   TRUE
## Q3UW53 0.025516228   TRUE
## P09055 0.027190721   TRUE
## Q8BGC4 0.027190721   TRUE
## Q64521 0.028605891   TRUE
## Q60710 0.029339404   TRUE
## P57016 0.029339404   TRUE
## Q03267 0.029339404   TRUE
## Q8CG47 0.029339404   TRUE
## P42230 0.029703195   TRUE
## P15307 0.029703195   TRUE
## Q9QXG4 0.030124402   TRUE
## Q9JMH9 0.030299293   TRUE
## P37913 0.032006650   TRUE
## Q00417 0.032006650   TRUE
## Q99KE1 0.032107785   TRUE
## Q8VCT3 0.032159712   TRUE
## Q6PD03 0.033288539   TRUE
## P48758 0.033288539   TRUE
## P45377 0.033288539   TRUE
## P49718 0.033288539   TRUE
## O70404 0.033288539   TRUE
## Q9CQ62 0.033288539   TRUE
## O88508 0.033288539   TRUE
## P29452 0.034222252   TRUE
## P70236 0.034222252   TRUE
## Q9Z0S1 0.035483880   TRUE
## P16546 0.035483880   TRUE
## Q8VCW8 0.035841781   TRUE
## Q9WTK5 0.035841781   TRUE
## O55101 0.036112133   TRUE
## P20444 0.036112133   TRUE
## Q8CFB4 0.036872909   TRUE
## O70400 0.036872909   TRUE
## Q8BP56 0.036872909   TRUE
## P19182 0.036988210   TRUE
## Q62261 0.037118407   TRUE
## Q99MN9 0.037118407   TRUE
## Q8C0L6 0.037118407   TRUE
## P70302 0.037139973   TRUE
## Q8C142 0.037950502   TRUE
## Q8K157 0.037950502   TRUE
## P00329 0.037950502   TRUE
## Q9QYC0 0.037950502   TRUE
## Q8BMD8 0.037950502   TRUE
## Q80U28 0.038256016   TRUE
## Q04447 0.038256016   TRUE
## P47856 0.038256016   TRUE
## O88673 0.038256016   TRUE
## Q5FWK3 0.038256016   TRUE
## P83093 0.038256016   TRUE
## P29391 0.038256016   TRUE
## Q05186 0.038256016   TRUE
## Q922J3 0.038330335   TRUE
## Q8BIG7 0.038330335   TRUE
## P25799 0.038968701   TRUE
## Q8BH59 0.038968701   TRUE
## Q8K296 0.039579138   TRUE
## P97310 0.040111620   TRUE
## Q3TBT3 0.041756840   TRUE
## Q62422 0.042313879   TRUE
## P50431 0.042313879   TRUE
## Q8R4N0 0.042313879   TRUE
## P14094 0.042313879   TRUE
## Q8CFK6 0.042313879   TRUE
## P50096 0.042313879   TRUE
## Q91VV4 0.042313879   TRUE
## P61028 0.042313879   TRUE
## O08807 0.042313879   TRUE
## O70293 0.042313879   TRUE
## Q8BUV3 0.042313879   TRUE
## P54310 0.042313879   TRUE
## Q9QUG9 0.042313879   TRUE
## Q9Z2L7 0.042313879   TRUE
## Q80XN0 0.042313879   TRUE
## Q61107 0.042313879   TRUE
## O70172 0.042313879   TRUE
## Q9JIY5 0.042641826   TRUE
## O70370 0.043612434   TRUE
## Q3UDE2 0.043612434   TRUE
## P56395 0.043741794   TRUE
## Q6Q899 0.044746499   TRUE
## Q9WU84 0.046185377   TRUE
## P18654 0.046185377   TRUE
## Q3UN02 0.046185377   TRUE
## Q8QZT1 0.046478734   TRUE
## Q9DBX6 0.046478734   TRUE
## Q7TNG5 0.047177737   TRUE
## Q61462 0.047177737   TRUE
## P54227 0.047665023   TRUE
## Q8R001 0.047665023   TRUE
## Q6P9R4 0.049152247   TRUE
## Q8R502 0.049152247   TRUE
## C3VPR6 0.049472783   TRUE
## P28867 0.049472783   TRUE
## P49717 0.049472783   TRUE
## Q68FL4 0.049472783   TRUE
## P51855 0.049472783   TRUE
## Q9CXJ1 0.049547668   TRUE
## Q99ME2 0.049547668   TRUE
## P97311 0.049547668   TRUE
## P10630 0.049873794   TRUE
## Q921I1 0.049873794   TRUE
## Q9D0K2 0.049873794   TRUE

There are 132 proteins with a significant effect at the 5% FDR level.

4.3 Plots

4.3.1 Volcano-plot

volcano<- ggplot(tests,aes(x=estimate,y=-log10(pval),color=signif)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano

An interactive volcano plot can be obtained by using the plotly library. We alter the object tests to add the protein name to the data frame in ggplot, this will allow us to see the protein name in the interactive plot.

library(plotly)
volcano<- ggplot(tests %>% rownames_to_column("protein"), aes(x=estimate,y=-log10(pval),color=signif,protein=protein,annotation=Protein.names,gene=Gene.names)) + geom_point() + scale_color_manual(values=c("black","red"))
ggplotly(volcano)

4.3.2 Heatmap

There are no significant proteins so we will plot a heatmap of the top 10 proteins, instead. Note, that these results are not statistically significant.

sigNames<- tests %>% rownames_to_column("protein") %>% filter(signif) %>% pull(protein)
heatmap(exprs(protData)[sigNames,])

4.3.3 Detail plot for the top 5 proteins.

We first extract the normalized protein expression values for a particular protein and we create a dataframe that contains the protein expression values and the relevant variables of the experimental design.

for (protName in rownames(tests)[1:5])
{
plotDat<-pepData[fData(pepData)$Proteins==protName] %>% exprs
plotDataStack<-data.frame(quant_value=c(plotDat),sequence=rep(rownames(plotDat),rep=ncol(plotDat)),sample=rep(colnames(plotDat),each=nrow(plotDat)),condition=rep(pData(pepData)$cellType,each=nrow(plotDat)))
plot1 <- ggplot(plotDataStack, aes(x=sample, y=quant_value,fill=condition))
print(plot1 +geom_boxplot(outlier.shape=NA) + geom_point(position=position_jitter(width=.1),aes(shape=sequence)) + scale_shape_manual(values=1:nrow(plotDat)) +labs(title = protName, x="sample", y="Peptide intensity (log2)"))
}

Proteomics data analysis: mouse Tcell example with RCB designs

5 July 2019

Contents