1 Background

Eighteen Estrogen Receptor Positive Breast cancer tissues from from patients treated with tamoxifen upon recurrence have been assessed in a proteomics study. Nine patients had a good outcome (or) and the other nine had a poor outcome (pd). The proteomes have been assessed using an LTQ-Orbitrap and the thermo output .RAW files were searched with MaxQuant (version 1.4.1.2) against the human proteome database (FASTA version 2012-09, human canonical proteome).

2 Data

We first import the peptides.txt file. This is the file that contains your peptide-level intensities. For a MaxQuant search [6], this peptides.txt file can be found by default in the “path_to_raw_files/combined/txt/” folder from the MaxQuant output, with “path_to_raw_files” the folder where raw files were saved. In this tutorial, we will use a MaxQuant peptides file from MaxQuant that can be found on the pdaData repository. We will use the MSnbase package to import the data.

We generate the object peptideFile with the path to the peptides.txt file. In this file we will fetch the data from the github repository linked to the pda course: https://statomics.github.io/pda. You can also replace the peptideFile with a string that points to the path of a file on your local hard drive.

With the grepEcols function we find the columns that are containing the expression data of the peptides in the peptides.txt file.

library(MSqRob)
library(MSnbase)
library(tidyverse)
library(limma)
peptidesFile <- "https://raw.githubusercontent.com/statOmics/pda/data/quantification/cancer/peptides9vs9.txt"
proteinGroupsFile <-"https://raw.githubusercontent.com/statOmics/pda/data/quantification/cancer/proteinGroups.txt"
ecols<-grepEcols(peptidesFile, "Intensity ", split = "\t")

Next we import the peptide intensities

pepData<-readMSnSet2(peptidesFile,ecol=ecols,fnames="Sequence",sep="\t")
pepData

## MSnSet (storageMode: lockedEnvironment)
## assayData: 34205 features, 18 samples 
##   element names: exprs 
## protocolData: none
## phenoData: none
## featureData
##   featureNames: AAAAAAAAAAAAAAAGAGAGAK AAAAAAAAAAGAAGGR ...
##     YYYDGKDYIEFNK (34205 total)
##   fvarLabels: Sequence Proteins ... Oxidation..M..site.IDs (44
##     total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
##  MSnbase version: 2.10.1

The pepData object is an MSnSet, a container for the data, features information and experimental annotation. They can be accessed using the accessor functions ‘exprs’ (matrix of intensities, features in rows, samples in columns), ‘fData’ (properties for each feature, peptide or protein, on the rows) and ‘pData’ (properties for the samples on the columns).

We will make use from data wrangling functionalities from the tidyverse package. The %>% operator allows us to pipe the output of one function to the next function.

head(exprs(pepData))

##                            Intensity.OR.01 Intensity.OR.04 Intensity.OR.07
## AAAAAAAAAAAAAAAGAGAGAK                   0        17923000         6237900
## AAAAAAAAAAGAAGGR                   8439600        15859000         5783100
## AAAAAAAGDSDSWDADAFSVEDPVRK         3352800         6592900         5295700
## AAAAAAALQAK                              0        22380000        14302000
## AAAAAAGAASGLPGPVAQGLK              3867400        13661000         3764300
## AAAAAAGAGPEMVR                     4113700         4431500         4944400
##                            Intensity.OR.09 Intensity.OR.10 Intensity.OR.13
## AAAAAAAAAAAAAAAGAGAGAK                   0               0         2977200
## AAAAAAAAAAGAAGGR                  13629000         8121000         2910900
## AAAAAAAGDSDSWDADAFSVEDPVRK        16849000         7795800         3653500
## AAAAAAALQAK                       31097000        23003000               0
## AAAAAAGAASGLPGPVAQGLK              6857300         3517000         3149100
## AAAAAAGAGPEMVR                     8187500         2625000         4473100
##                            Intensity.OR.20 Intensity.OR.23 Intensity.OR.25
## AAAAAAAAAAAAAAAGAGAGAK                   0               0         5067300
## AAAAAAAAAAGAAGGR                   7792100         5757700         2342000
## AAAAAAAGDSDSWDADAFSVEDPVRK        10771000        12880000         6918700
## AAAAAAALQAK                       21494000        31017000        13190000
## AAAAAAGAASGLPGPVAQGLK             14372000         3131300         8151000
## AAAAAAGAGPEMVR                     5150300         3830300         5399800
##                            Intensity.PD.02 Intensity.PD.03 Intensity.PD.04
## AAAAAAAAAAAAAAAGAGAGAK            11510000         3003800               0
## AAAAAAAAAAGAAGGR                  11056000         4491600         5673700
## AAAAAAAGDSDSWDADAFSVEDPVRK         4721100         6766500         5119600
## AAAAAAALQAK                       31208000        15994000        18562000
## AAAAAAGAASGLPGPVAQGLK              5036100         9125600         2497600
## AAAAAAGAGPEMVR                           0         3424900         2745700
##                            Intensity.PD.06 Intensity.PD.07 Intensity.PD.08
## AAAAAAAAAAAAAAAGAGAGAK             5056200         6605000               0
## AAAAAAAAAAGAAGGR                   4104500               0        12253000
## AAAAAAAGDSDSWDADAFSVEDPVRK         5098200         1946300         3137200
## AAAAAAALQAK                       15672000               0        19137000
## AAAAAAGAASGLPGPVAQGLK              6499900         2249100         8652700
## AAAAAAGAGPEMVR                     4143800         1792700         4349800
##                            Intensity.PD.09 Intensity.PD.10 Intensity.PD.11
## AAAAAAAAAAAAAAAGAGAGAK                   0               0         2947100
## AAAAAAAAAAGAAGGR                   7015300               0         3825800
## AAAAAAAGDSDSWDADAFSVEDPVRK         2254200         7580700         5205500
## AAAAAAALQAK                        8536800        21467000         9194100
## AAAAAAGAASGLPGPVAQGLK              2519600         3397200         8505100
## AAAAAAGAGPEMVR                           0         2373000         1479100

pepData %>% exprs %>% head

##                            Intensity.OR.01 Intensity.OR.04 Intensity.OR.07
## AAAAAAAAAAAAAAAGAGAGAK                   0        17923000         6237900
## AAAAAAAAAAGAAGGR                   8439600        15859000         5783100
## AAAAAAAGDSDSWDADAFSVEDPVRK         3352800         6592900         5295700
## AAAAAAALQAK                              0        22380000        14302000
## AAAAAAGAASGLPGPVAQGLK              3867400        13661000         3764300
## AAAAAAGAGPEMVR                     4113700         4431500         4944400
##                            Intensity.OR.09 Intensity.OR.10 Intensity.OR.13
## AAAAAAAAAAAAAAAGAGAGAK                   0               0         2977200
## AAAAAAAAAAGAAGGR                  13629000         8121000         2910900
## AAAAAAAGDSDSWDADAFSVEDPVRK        16849000         7795800         3653500
## AAAAAAALQAK                       31097000        23003000               0
## AAAAAAGAASGLPGPVAQGLK              6857300         3517000         3149100
## AAAAAAGAGPEMVR                     8187500         2625000         4473100
##                            Intensity.OR.20 Intensity.OR.23 Intensity.OR.25
## AAAAAAAAAAAAAAAGAGAGAK                   0               0         5067300
## AAAAAAAAAAGAAGGR                   7792100         5757700         2342000
## AAAAAAAGDSDSWDADAFSVEDPVRK        10771000        12880000         6918700
## AAAAAAALQAK                       21494000        31017000        13190000
## AAAAAAGAASGLPGPVAQGLK             14372000         3131300         8151000
## AAAAAAGAGPEMVR                     5150300         3830300         5399800
##                            Intensity.PD.02 Intensity.PD.03 Intensity.PD.04
## AAAAAAAAAAAAAAAGAGAGAK            11510000         3003800               0
## AAAAAAAAAAGAAGGR                  11056000         4491600         5673700
## AAAAAAAGDSDSWDADAFSVEDPVRK         4721100         6766500         5119600
## AAAAAAALQAK                       31208000        15994000        18562000
## AAAAAAGAASGLPGPVAQGLK              5036100         9125600         2497600
## AAAAAAGAGPEMVR                           0         3424900         2745700
##                            Intensity.PD.06 Intensity.PD.07 Intensity.PD.08
## AAAAAAAAAAAAAAAGAGAGAK             5056200         6605000               0
## AAAAAAAAAAGAAGGR                   4104500               0        12253000
## AAAAAAAGDSDSWDADAFSVEDPVRK         5098200         1946300         3137200
## AAAAAAALQAK                       15672000               0        19137000
## AAAAAAGAASGLPGPVAQGLK              6499900         2249100         8652700
## AAAAAAGAGPEMVR                     4143800         1792700         4349800
##                            Intensity.PD.09 Intensity.PD.10 Intensity.PD.11
## AAAAAAAAAAAAAAAGAGAGAK                   0               0         2947100
## AAAAAAAAAAGAAGGR                   7015300               0         3825800
## AAAAAAAGDSDSWDADAFSVEDPVRK         2254200         7580700         5205500
## AAAAAAALQAK                        8536800        21467000         9194100
## AAAAAAGAASGLPGPVAQGLK              2519600         3397200         8505100
## AAAAAAGAGPEMVR                           0         2373000         1479100

pepData %>% sampleNames

##  [1] "Intensity.OR.01" "Intensity.OR.04" "Intensity.OR.07" "Intensity.OR.09"
##  [5] "Intensity.OR.10" "Intensity.OR.13" "Intensity.OR.20" "Intensity.OR.23"
##  [9] "Intensity.OR.25" "Intensity.PD.02" "Intensity.PD.03" "Intensity.PD.04"
## [13] "Intensity.PD.06" "Intensity.PD.07" "Intensity.PD.08" "Intensity.PD.09"
## [17] "Intensity.PD.10" "Intensity.PD.11"

The sample names are rather long and contain information on the spike-in concentration and the repeat. We will remove “Intensity.” from the string

sampleNames(pepData) <- pepData %>% sampleNames %>% str_replace(., pattern="Intensity.", replacement="")

Next we will add information on the proteins to the feature data. The feature data of the object pepData contains information for each peptide in the experiment. This info on the proteins can be found in the proteingroups.txt file of maxquant. The peptides.txt file contains many data on each feature (peptide). We will only retain specified columns Protein, Sequence, Reverse and Potential.Contaminant, indicating the protein, peptide sequence, if the peptide is a decoy and if it is a contaminant, respectively.

In older and more recent versions of maxquant the names of the fields might vary. We can therefore explore the names we want to select with the shiny app that can be launched from the selectFeatureData from the MSnbase package, i.e. by running the function without an fcol argument that specifies the columns you would like to select.

The code can be found below, but is not executed by this Rmd script (note, that the flag eval=FALSE has been used as option for the R chunk).

pepData<-selectFeatureData(pepData)

If we know the columns, we can specify them as a vector in the fcol slot.

pepData<-selectFeatureData(pepData,fcol=c("Proteins","Protein.names","Gene.names", "Contaminant","Reverse","Sequence"))

Calculate how many non zero intensities we have per peptide. This will be useful for filtering.

fData(pepData)$nNonZero<- rowSums(exprs(pepData) > 0)

Peptides with zero intensities are missing peptides and should be represent with a NA value instead of 0.

exprs(pepData)[exprs(pepData)==0]<-NA

Next we create the data on the experimental layout. We can do this based on the samplenames. The information on the spike in condition is available as the first letter of the sample name. We extract this information and recast the character vector in a factor.

pData(pepData)$condition <- pepData %>% sampleNames %>% substr(1,2) %>% as.factor

2.1 Data exploration

We can inspect the missingness in our data with the plotNA() function provided with MSnbase. 47% of all peptide intensities are missing and for some peptides we don’t even measure a signal in any sample. The missingness is similar across samples. Note, that we plot the peptide data, so the label protein in the plot refers to peptides.

plotNA(pepData)

When we look at density plots of the intensities, we see that most intensities are low and that there is a long tail to the right. It is more useful to explore the data upon log transformation.

plotDensities(exprs(pepData))

nrow(pepData)

## [1] 34205

There are 34205 peptides before preprocessing.

3 Preprocessing

We will log transform, normalize, filter and summarize the data.

3.1 Log transform the data

pepData <- log(pepData, base = 2)
plotDensities(exprs(pepData))

When we look at the densities we see small shifts in location and shape of the density distributions. So normalisation will be required. But, we will first perform filtering.

3.2 Filtering

3.2.1 Handling overlapping protein groups

In our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.

pepData<-pepData[fData(pepData)$Proteins %in% smallestUniqueGroups(fData(pepData)$Proteins),]

3.2.2 Remove reverse sequences (decoys) and contaminants

We now remove the contaminants, peptides that map to decoy sequences and proteins, which were only identified by peptides with modifications.

pepData <- pepData[fData(pepData)$Reverse!="+",]
pepData <- pepData[fData(pepData)$Contaminant!="+",]

3.2.3 Remove peptides of proteins that were only identified with modified peptides

Proteins for which all peptides are carrying modifications (PTMs) can be considered as unreliable. We will filter out these proteins. This information is included in the Only.identified.by.site column of proteinGroups.txt file of maxquant. The variable is containing a “+” character if the protein is only identified by modified peptides and an empty string “” if this is not the case. Sometimes an NA value is also present. We will replace these by the empty character string. The information of the peptides.txt file can be linked to the proteinGroups.txt file by using the Proteins column from the peptides.txt file and the Protein.IDs column in the proteinGroups.txt file. If NAs are included in the

proteinGroups<-read.table(proteinGroupsFile, sep = "\t", header = TRUE, quote = "", comment.char = "")
only_site <- proteinGroups$Only.identified.by.site
only_site[is.na(only_site)] <- ""
proteinsOnlyIdentifiedWithPTMs <- proteinGroups$Protein.IDs[only_site=="+"]
pepData<-pepData[!(fData(pepData)$Proteins %in% proteinsOnlyIdentifiedWithPTMs),]

We will now remove the proteinGroups object from the R session to free memory.

rm(proteinGroups,only_site)
gc()

##           used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 4134795 220.9    8535498 455.9         NA  5361388 286.4
## Vcells 7927960  60.5   20266106 154.7      16384 20263743 154.7

3.2.4 Drop peptides that were only identified in one sample

We want to keep peptide that were at least observed twice.

pepData<-pepData[fData(pepData)$nNonZero>=2,]
nrow(pepData)

## [1] 26689

We keep 26689 peptides upon filtering.

3.3 Quantile normalize the data

pepData <- normalise(pepData, "quantiles")

3.4 Explore quantile normalized data

Upon normalisation the density curves for all samples coincide.

plotDensities(exprs(pepData))

We can visualize our data using a Multi Dimensional Scaling plot, eg. as provided by the limma package.

plotMDS(exprs(pepData),col=as.double(pData(pepData)$condition))

The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples. We notice that the leading differences (log FC) in the peptide data seems to be driven by technical variability. Indeed the samples do not seem to be clearly separated according to the spike in condition. Because there are missing values for the peptide level data we use the na.rm=TRUE argument to summarize the data based on the observed peptides for every protein.

3.5 Summarization to protein level

protData<-combineFeatures(pepData,fcol="Proteins",method="robust",na.rm=TRUE)

## Your data contains missing values. Please read the relevant section
## in the combineFeatures manual page for details the effects of
## missing values on data aggregation.

If you summarize using robust summarization please refer to Sticker et al. (2019) https://www.biorxiv.org/content/10.1101/668863v1

We filter proteins for which we have at least two protein expression values in both conditions:

protData<-protData[
protData %>% exprs %>% is.na %>% `!` %>% rowSums>4,]

plotMDS(exprs(protData),col=as.double(pData(pepData)$condition))

4 Data Analysis

4.1 Estimation

MSqRob is currently working with a format where we have one dataframe for each protein. This will be changed in the next release. Therefore we first have to reorganise the data.

Next the models are fitted. This is done using the fit.model function. We only have to model the data using the factor condition from the pheno data of the protein level MSnSet. The name of the factor variable is specified in the fixed argument (if multiple predictors have to be incorporated in the model, a vector of variable names has to be provided in this argument.). The argument shrinkage is used to specify if ridge regression has to be adopted. For the sake of speed we do not do this in the tutorial. The shrinkage has to be specified for each variable in the fixed effects. We also have to indicate this for the intercept (which we never shrink). So we specify it at c(0,0) to indicate that the intercept (first 0) and the parameters for the factor condition (second 0) are not penalized. We set the robust_var function equal to FALSE, this functionality will be removed from the package in the next release.

protMSqRob <- MSnSet2protdata(protData, "Proteins",annotations=c("Protein.names","Gene.names"))
models <- fit.model(protdata=protMSqRob, response="quant_value", fixed="condition",shrinkage.fixed=c(0,0),robust_var=FALSE)

4.2 Inference

Many biologists have problems with the reference coding. In MSqRob we have opted to formulate contrasts using all levels of a factor. Internally, the contrasts are than recasted according to the factor level that is the reference class.

L <- makeContrast("conditionPD - conditionOR",  levels=c("conditionPD","conditionOR"))
tests <- test.contrast_adjust(models, L)
nSig <- sum(tests$signif,na.rm=TRUE)
head(tests,nSig)

##                                                                                             Protein.names
## P61604                                                           10 kDa heat shock protein, mitochondrial
## P06454                                                                 Prothymosin alpha;Thymosin alpha-1
## Q13765                                               Nascent polypeptide-associated complex subunit alpha
## P49006                                                                             MARCKS-related protein
## Q96K17                                                                Transcription factor BTF3 homolog 4
## P07108                                                                           Acyl-CoA-binding protein
## O43396                                                                         Thioredoxin-like protein 1
## Q9GZT3                                       SRA stem-loop-interacting RNA-binding protein, mitochondrial
## P09496                                                                             Clathrin light chain A
## Q9H299                                               SH3 domain-binding glutamic acid-rich-like protein 3
## P54727                                                         UV excision repair protein RAD23 homolog B
## P63313                                                                                   Thymosin beta-10
## Q8NC51                                              Plasminogen activator inhibitor 1 RNA-binding protein
## P30084                                                                 Enoyl-CoA hydratase, mitochondrial
## P27797                                                                                       Calreticulin
## O43765                             Small glutamine-rich tetratricopeptide repeat-containing protein alpha
## O60635                                                                                      Tetraspanin-1
## Q9NXG2                                                                  THUMP domain-containing protein 1
## P09012                                                               U1 small nuclear ribonucleoprotein A
## P55327                                                                                  Tumor protein D52
## P51858                                                                     Hepatoma-derived growth factor
## Q9UBE0                                                                   SUMO-activating enzyme subunit 1
## P20042                                               Eukaryotic translation initiation factor 2 subunit 2
## P43490                                                             Nicotinamide phosphoribosyltransferase
## Q15758                                                                Neutral amino acid transporter B(0)
## Q9NRX4                                                                14 kDa phosphohistidine phosphatase
## P46108                                                                               Adapter molecule crk
## P19338                                                                                          Nucleolin
## Q16543                                                                           Hsp90 co-chaperone Cdc37
## Q15370                                                    Transcription elongation factor B polypeptide 2
## O75223                                                                     Gamma-glutamylcyclotransferase
## Q14980                                                                Nuclear mitotic apparatus protein 1
## P30085                                                                                     UMP-CMP kinase
## O75368                                                 SH3 domain-binding glutamic acid-rich-like protein
## O95571                                                                       Protein ETHE1, mitochondrial
## O60888                                                                                       Protein CutA
## P52434                                         DNA-directed RNA polymerases I, II, and III subunit RPABC3
## Q6IA17                                                                    Single Ig IL-1-related receptor
## P42345                                                               Serine/threonine-protein kinase mTOR
## P07858                                        Cathepsin B;Cathepsin B light chain;Cathepsin B heavy chain
## P22307                                                                Non-specific lipid-transfer protein
## P02787                                                                                    Serotransferrin
## Q9Y266                                                                     Nuclear migration protein nudC
## Q9UMX5                                                                                           Neudesin
## Q99497                                                                                       Protein DJ-1
## P52565                                                                   Rho GDP-dissociation inhibitor 1
## Q9BT22                                              Chitobiosyldiphosphodolichol beta-mannosyltransferase
## P62318                                                              Small nuclear ribonucleoprotein Sm D3
## P20810                                                                                        Calpastatin
## P20962                                                                                       Parathymosin
## Q06830                                                                                    Peroxiredoxin-1
## P22061                                            Protein-L-isoaspartate(D-aspartate) O-methyltransferase
## Q9NP79                                           Vacuolar protein sorting-associated protein VTA1 homolog
## P49321                                                                Nuclear autoantigenic sperm protein
## P07305                                                                                       Histone H1.0
## Q92734                                                                                        Protein TFG
## P61086                                                                  Ubiquitin-conjugating enzyme E2 K
## P05387                                                                    60S acidic ribosomal protein P2
## P06730                                                        Eukaryotic translation initiation factor 4E
## P10412                                                                                       Histone H1.4
## Q15293                                                                                   Reticulocalbin-1
## Q9Y2Q5                                                                  Ragulator complex protein LAMTOR2
## Q02978                                                Mitochondrial 2-oxoglutarate/malate carrier protein
## P14314                                                                         Glucosidase 2 subunit beta
## Q92522                                                                                        Histone H1x
## P04080                                                                                         Cystatin-B
## Q8TE68                                 Epidermal growth factor receptor kinase substrate 8-like protein 1
## Q16513                                                                 Serine/threonine-protein kinase N2
## P21333                                                                                          Filamin-A
## P12268                                                            Inosine-5-monophosphate dehydrogenase 2
## Q9NR45                                                                               Sialic acid synthase
## Q9UHV9                                                                                Prefoldin subunit 2
## Q15181                                                                          Inorganic pyrophosphatase
## P29966                                                      Myristoylated alanine-rich C-kinase substrate
## P12004                                                                 Proliferating cell nuclear antigen
## Q13451                                                          Peptidyl-prolyl cis-trans isomerase FKBP5
## P62310                                                           U6 snRNA-associated Sm-like protein LSm3
## Q05682                                                                                          Caldesmon
## Q9UHQ9                                                                     NADH-cytochrome b5 reductase 1
## P19971                                                                            Thymidine phosphorylase
## Q9H936                                                                  Mitochondrial glutamate carrier 1
## O15327                                                    Type II inositol 3,4-bisphosphate 4-phosphatase
## P04040                                                                                           Catalase
## P61960                                                                          Ubiquitin-fold modifier 1
## Q96KP4                                                                 Cytosolic non-specific dipeptidase
## Q9NZJ7                                                                    Mitochondrial carrier homolog 1
## P30086        Phosphatidylethanolamine-binding protein 1;Hippocampal cholinergic neurostimulating peptide
## P67936                                                                          Tropomyosin alpha-4 chain
## O43237                                                    Cytoplasmic dynein 1 light intermediate chain 2
## Q5T4S7                                                                   E3 ubiquitin-protein ligase UBR4
## A6NL28                                                    Putative tropomyosin alpha-3 chain-like protein
## O95471                                                                                          Claudin-7
## Q15185                                                                         Prostaglandin E synthase 3
## Q9UGI8                                                                                             Testin
## Q9HB71                                                                          Calcyclin-binding protein
## Q02818                                                                                     Nucleobindin-1
## P02790                                                                                          Hemopexin
## P06331;P01825                           Ig heavy chain V-II region ARH-77;Ig heavy chain V-II region NEWM
## P16949                                                                                           Stathmin
## P51553                                        Isocitrate dehydrogenase [NAD] subunit gamma, mitochondrial
## O00233                                                     26S proteasome non-ATPase regulatory subunit 9
## Q99426                                                                         Tubulin-folding cofactor B
## Q9HAB8                                                               Phosphopantothenate--cysteine ligase
## P30044                                                                     Peroxiredoxin-5, mitochondrial
## P31937                                                  3-hydroxyisobutyrate dehydrogenase, mitochondrial
## Q6I9Y2                                                                      THO complex subunit 7 homolog
## P07741                                                                  Adenine phosphoribosyltransferase
## O95881                                                           Thioredoxin domain-containing protein 12
## P15170                                  Eukaryotic peptide chain release factor GTP-binding subunit ERF3A
##               Gene.names   estimate        se       df      Tval
## P61604             HSPE1  2.5592734 0.3123860 21.11742  8.192664
## P06454              PTMA  3.5347330 0.4492307 21.11742  7.868414
## Q13765              NACA  1.4505256 0.1957771 21.11742  7.409067
## P49006          MARCKSL1  2.1881063 0.2968877 19.11742  7.370149
## Q96K17            BTF3L4  1.6512774 0.2156977 17.11742  7.655518
## P07108               DBI  4.3219261 0.4884625 12.11742  8.848021
## O43396             TXNL1  1.8716784 0.2805144 21.11742  6.672308
## Q9GZT3             SLIRP  2.0996831 0.3149371 21.11742  6.666991
## P09496              CLTA  1.4139193 0.2184747 21.11742  6.471774
## Q9H299          SH3BGRL3  1.9496508 0.3031351 21.11742  6.431623
## P54727            RAD23B  1.5583760 0.2444316 21.11742  6.375509
## P63313            TMSB10  3.3690447 0.4668663 15.11742  7.216295
## Q8NC51            SERBP1  1.6681795 0.2670389 21.11742  6.246952
## P30084             ECHS1  1.9146274 0.3183384 21.11742  6.014441
## P27797              CALR  1.4148822 0.2363851 21.11742  5.985498
## O43765              SGTA  1.1014428 0.1865749 21.11742  5.903489
## O60635            TSPAN1 -1.8691707 0.2624083 13.11742 -7.123140
## Q9NXG2           THUMPD1  1.5396595 0.2634810 21.11742  5.843532
## P09012             SNRPA  1.8345287 0.3056167 18.11742  6.002710
## P55327             TPD52  1.7182370 0.3078550 21.11742  5.581318
## P51858              HDGF  1.9604606 0.3558898 20.11742  5.508617
## Q9UBE0              SAE1  1.1223951 0.2087599 21.11742  5.376489
## P20042            EIF2S2  0.9934941 0.1933373 21.11742  5.138657
## P43490             NAMPT  1.3535312 0.2638740 21.11742  5.129461
## Q15758            SLC1A5 -1.4336978 0.2735523 18.11742 -5.241036
## Q9NRX4             PHPT1  1.8402235 0.3512179 18.11742  5.239549
## P46108               CRK  1.5708413 0.2981835 17.11742  5.268036
## P19338               NCL  1.1259343 0.2261039 21.11742  4.979720
## Q16543             CDC37  1.0484856 0.2110960 21.11742  4.966867
## Q15370             TCEB2  1.3142814 0.2615779 20.11742  5.024436
## O75223              GGCT  1.7772065 0.3637474 21.11742  4.885826
## Q14980             NUMA1 -1.1291421 0.2312966 21.11742 -4.881792
## P30085             CMPK1  1.1509752 0.2358411 21.11742  4.880300
## O75368           SH3BGRL  2.2060894 0.4565541 21.11742  4.832043
## O95571             ETHE1  1.5460854 0.3061063 16.11742  5.050813
## O60888              CUTA  1.1259257 0.2365392 19.11742  4.759997
## P52434            POLR2H  1.0396618 0.2276945 21.11742  4.566039
## Q6IA17            SIGIRR -1.0253851 0.2181514 17.11742 -4.700337
## P42345              MTOR -2.2740172 0.4033124 10.11742 -5.638352
## P07858              CTSB  1.3433419 0.2941021 19.11742  4.567604
## P22307              SCP2  1.2424767 0.2782961 21.11742  4.464586
## P02787                TF  1.4893302 0.3374040 21.11742  4.414085
## Q9Y266              NUDC  0.9351525 0.2123652 21.11742  4.403511
## Q9UMX5              NENF  1.3761685 0.2940443 16.11742  4.680140
## Q99497             PARK7  1.2953983 0.2943649 21.11742  4.400654
## P52565           ARHGDIA  1.6594679 0.3783601 21.11742  4.385948
## Q9BT22              ALG1 -1.0379660 0.2322491 19.11742 -4.469193
## P62318            SNRPD3  0.8169473 0.1886125 21.11742  4.331353
## P20810              CAST  1.3723052 0.3148262 20.11742  4.358930
## P20962              PTMS  2.5183460 0.5503382 16.11742  4.575997
## Q06830             PRDX1  0.8253918 0.1918505 21.11742  4.302265
## P22061             PCMT1  1.1947252 0.2778026 21.11742  4.300627
## Q9NP79              VTA1  0.8770644 0.2054590 21.11742  4.268805
## P49321              NASP  1.1415431 0.2678475 21.11742  4.261915
## P07305              H1F0  1.3082684 0.2898209 16.11742  4.514058
## Q92734               TFG  1.0270438 0.2429410 21.11742  4.227545
## P61086             UBE2K  0.9569681 0.2270417 21.11742  4.214945
## P05387             RPLP2  1.1108529 0.2640546 21.11742  4.206907
## P06730             EIF4E  1.7050960 0.3694762 14.11742  4.614901
## P10412          HIST1H1E  1.4081105 0.3199354 16.11742  4.401233
## Q15293              RCN1  1.9214754 0.4634895 20.11742  4.145672
## Q9Y2Q5           LAMTOR2  1.0583258 0.2529308 19.11742  4.184250
## Q02978          SLC25A11 -1.0482598 0.2567983 21.11742 -4.082035
## P14314            PRKCSH  0.8195419 0.2015128 21.11742  4.066947
## Q92522              H1FX  1.5712942 0.3539711 14.11742  4.439047
## P04080              CSTB  1.3639992 0.3362218 21.11742  4.056843
## Q8TE68            EPS8L1  1.3286881 0.3012425 14.11742  4.410692
## Q16513              PKN2  1.9551677 0.4146200 11.11742  4.715565
## P21333              FLNA  1.3398967 0.3335814 21.11742  4.016701
## P12268            IMPDH2  0.8712732 0.2176929 21.11742  4.002303
## Q9NR45              NANS  1.2928056 0.3233026 21.11742  3.998748
## Q9UHV9             PFDN2  0.8523008 0.2123892 20.11742  4.012919
## Q15181              PPA1  1.5567884 0.3886169 20.11742  4.005972
## P29966            MARCKS  1.5499687 0.3834901 19.11742  4.041744
## P12004              PCNA  1.1667833 0.2945770 21.11742  3.960877
## Q13451             FKBP5  1.5069044 0.3732050 18.11742  4.037739
## P62310              LSM3  1.3331794 0.3223234 16.11742  4.136155
## Q05682             CALD1  1.2063519 0.3047726 20.11742  3.958204
## Q9UHQ9            CYB5R1 -0.9987964 0.2555546 21.11742 -3.908348
## P19971              TYMP  1.8459088 0.4728170 21.11742  3.904066
## Q9H936          SLC25A22 -0.9448232 0.2434157 21.11742 -3.881521
## O15327            INPP4B  1.1928744 0.3040420 19.11742  3.923388
## P04040               CAT  1.0076194 0.2619447 21.11742  3.846687
## P61960              UFM1  1.1882509 0.3073106 20.11742  3.866612
## Q96KP4             CNDP2  1.0827400 0.2835639 21.11742  3.818328
## Q9NZJ7             MTCH1 -0.9997059 0.2421078 14.11742 -4.129176
## P30086             PEBP1 -1.4050950 0.3720807 21.11742 -3.776318
## P67936              TPM4  1.5197110 0.4025275 21.11742  3.775421
## O43237          DYNC1LI2  0.8001891 0.2126480 21.11742  3.762975
## Q5T4S7              UBR4 -0.7998037 0.2126559 21.11742 -3.761023
## A6NL28                    0.8836174 0.2360921 21.11742  3.742681
## O95471             CLDN7 -1.7505708 0.4404022 15.11742 -3.974937
## Q15185            PTGES3  0.9750688 0.2613472 21.11742  3.730932
## Q9UGI8               TES  1.4872202 0.3768812 15.11742  3.946124
## Q9HB71            CACYBP  0.9409158 0.2533127 20.11742  3.714444
## Q02818             NUCB1  1.0305185 0.2805836 21.11742  3.672769
## P02790               HPX  1.2526114 0.3388362 20.11742  3.696805
## P06331;P01825             1.8447009 0.4743805 15.11742  3.888653
## P16949             STMN1  1.6465449 0.4351696 17.11742  3.783685
## P51553             IDH3G -0.9963271 0.2690001 19.11742 -3.703817
## O00233             PSMD9  1.1337745 0.2970251 16.11742  3.817099
## Q99426              TBCB  1.1515715 0.3110882 19.11742  3.701753
## Q9HAB8              PPCS  0.7949434 0.2180909 21.11742  3.645010
## P30044             PRDX5  0.8034359 0.2204515 21.11742  3.644503
## P31937            HIBADH  0.7244339 0.1982152 20.11742  3.654785
## Q6I9Y2             THOC7  2.4902814 0.6144630 12.11742  4.052777
## P07741              APRT  1.2005078 0.3288858 20.11742  3.650227
## O95881           TXNDC12  1.0496663 0.2740157 15.11742  3.830679
## P15170             GSPT1  1.0795850 0.2966670 20.11742  3.639047
##                       pval         qval signif
## P61604        5.383583e-08 0.0001704142   TRUE
## P06454        1.032501e-07 0.0001704142   TRUE
## Q13765        2.660805e-07 0.0002927772   TRUE
## P49006        5.335100e-07 0.0004174311   TRUE
## Q96K17        6.322797e-07 0.0004174311   TRUE
## P07108        1.230123e-06 0.0005377809   TRUE
## O43396        1.288233e-06 0.0005377809   TRUE
## Q9GZT3        1.303316e-06 0.0005377809   TRUE
## P09496        2.003454e-06 0.0007229229   TRUE
## Q9H299        2.190012e-06 0.0007229229   TRUE
## P54727        2.481036e-06 0.0007445363   TRUE
## P63313        2.860298e-06 0.0007868202   TRUE
## Q8NC51        3.306851e-06 0.0008396859   TRUE
## P30084        5.588687e-06 0.0013134908   TRUE
## P27797        5.968604e-06 0.0013134908   TRUE
## O43765        7.195002e-06 0.0014372405   TRUE
## O60635        7.401723e-06 0.0014372405   TRUE
## Q9NXG2        8.252307e-06 0.0015133815   TRUE
## P09012        1.090482e-05 0.0018945681   TRUE
## P55327        1.510017e-05 0.0024922833   TRUE
## P51858        2.116281e-05 0.0033265917   TRUE
## Q9UBE0        2.432651e-05 0.0036500828   TRUE
## P20042        4.252360e-05 0.0059770063   TRUE
## P43490        4.345597e-05 0.0059770063   TRUE
## Q15758        5.413168e-05 0.0068946333   TRUE
## Q9NRX4        5.430490e-05 0.0068946333   TRUE
## P46108        6.139711e-05 0.0070313993   TRUE
## P19338        6.191795e-05 0.0070313993   TRUE
## Q16543        6.383342e-05 0.0070313993   TRUE
## Q15370        6.390245e-05 0.0070313993   TRUE
## O75223        7.737166e-05 0.0078418078   TRUE
## Q14980        7.811674e-05 0.0078418078   TRUE
## P30085        7.839432e-05 0.0078418078   TRUE
## O75368        8.792588e-05 0.0085365688   TRUE
## O95571        1.155712e-04 0.0109000178   TRUE
## O60888        1.340708e-04 0.0122935488   TRUE
## P52434        1.658632e-04 0.0147976865   TRUE
## Q6IA17        2.024819e-04 0.0170240093   TRUE
## P42345        2.067556e-04 0.0170240093   TRUE
## P07858        2.073595e-04 0.0170240093   TRUE
## P22307        2.114463e-04 0.0170240093   TRUE
## P02787        2.386357e-04 0.0180777697   TRUE
## Q9Y266        2.447590e-04 0.0180777697   TRUE
## Q9UMX5        2.462204e-04 0.0180777697   TRUE
## Q99497        2.464404e-04 0.0180777697   TRUE
## P52565        2.552800e-04 0.0182189425   TRUE
## Q9BT22        2.594033e-04 0.0182189425   TRUE
## P62318        2.909704e-04 0.0198830351   TRUE
## P20810        3.004008e-04 0.0198830351   TRUE
## P20962        3.052852e-04 0.0198830351   TRUE
## Q06830        3.119855e-04 0.0198830351   TRUE
## P22061        3.132135e-04 0.0198830351   TRUE
## Q9NP79        3.380467e-04 0.0208321142   TRUE
## P49321        3.436786e-04 0.0208321142   TRUE
## P07305        3.470967e-04 0.0208321142   TRUE
## Q92734        3.732032e-04 0.0219620229   TRUE
## P61086        3.846515e-04 0.0219620229   TRUE
## P05387        3.921373e-04 0.0219620229   TRUE
## P06730        3.925354e-04 0.0219620229   TRUE
## P10412        4.388976e-04 0.0241466818   TRUE
## Q15293        4.951692e-04 0.0264766621   TRUE
## Q9Y2Q5        4.972896e-04 0.0264766621   TRUE
## Q02978        5.290018e-04 0.0277180137   TRUE
## P14314        5.484806e-04 0.0279436777   TRUE
## Q92522        5.502390e-04 0.0279436777   TRUE
## P04080        5.619230e-04 0.0281046654   TRUE
## Q8TE68        5.812129e-04 0.0286355771   TRUE
## Q16513        6.159407e-04 0.0295965815   TRUE
## P21333        6.186501e-04 0.0295965815   TRUE
## P12268        6.403571e-04 0.0300266956   TRUE
## Q9NR45        6.458332e-04 0.0300266956   TRUE
## Q9UHV9        6.759937e-04 0.0307421088   TRUE
## Q15181        6.870937e-04 0.0307421088   TRUE
## P29966        6.891597e-04 0.0307421088   TRUE
## P12004        7.071323e-04 0.0311232496   TRUE
## Q13451        7.633879e-04 0.0325233350   TRUE
## P62310        7.646409e-04 0.0325233350   TRUE
## Q05682        7.685005e-04 0.0325233350   TRUE
## Q9UHQ9        8.018417e-04 0.0334265797   TRUE
## P19971        8.100958e-04 0.0334265797   TRUE
## Q9H936        8.549678e-04 0.0348425762   TRUE
## O15327        9.037833e-04 0.0363827892   TRUE
## P04040        9.291932e-04 0.0369550211   TRUE
## P61960        9.524189e-04 0.0374277948   TRUE
## Q96KP4        9.943118e-04 0.0385899106   TRUE
## Q9NZJ7        1.005372e-03 0.0385899106   TRUE
## P30086        1.099172e-03 0.0413197051   TRUE
## P67936        1.101525e-03 0.0413197051   TRUE
## O43237        1.134712e-03 0.0418129681   TRUE
## Q5T4S7        1.140008e-03 0.0418129681   TRUE
## A6NL28        1.190960e-03 0.0431659823   TRUE
## O95471        1.203051e-03 0.0431659823   TRUE
## Q15185        1.224775e-03 0.0434729344   TRUE
## Q9UGI8        1.276049e-03 0.0448110374   TRUE
## Q9HB71        1.359483e-03 0.0472384632   TRUE
## Q02818        1.406665e-03 0.0477516245   TRUE
## P02790        1.416637e-03 0.0477516245   TRUE
## P06331;P01825 1.435346e-03 0.0477516245   TRUE
## P16949        1.466969e-03 0.0477516245   TRUE
## P51553        1.493832e-03 0.0477516245   TRUE
## O00233        1.499453e-03 0.0477516245   TRUE
## Q99426        1.500896e-03 0.0477516245   TRUE
## Q9HAB8        1.502631e-03 0.0477516245   TRUE
## P30044        1.504444e-03 0.0477516245   TRUE
## P31937        1.562550e-03 0.0487204296   TRUE
## Q6I9Y2        1.571747e-03 0.0487204296   TRUE
## P07741        1.579244e-03 0.0487204296   TRUE
## O95881        1.616458e-03 0.0490893501   TRUE
## P15170        1.620945e-03 0.0490893501   TRUE

There are 109 proteins with a significant effect at the 5% FDR level.

4.3 Plots

4.3.1 Volcano-plot

volcano<- ggplot(tests,aes(x=estimate,y=-log10(pval),color=signif)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano

An interactive volcano plot can be obtained by using the plotly library. We alter the object tests to add the protein name to the data frame in ggplot, this will allow us to see the protein name in the interactive plot.

library(plotly)
volcano<- ggplot(tests %>% rownames_to_column("protein"), aes(x=estimate,y=-log10(pval),color=signif,protein=protein,annotation=Protein.names,gene=Gene.names)) + geom_point() + scale_color_manual(values=c("black","red"))
ggplotly(volcano)

4.3.2 Heatmap

We first select the names of the significant proteins.

sigNames<- tests %>% rownames_to_column("protein") %>% filter(signif) %>% pull(protein)
heatmap(exprs(protData)[sigNames,])

4.3.3 Detail plot for the top 5 proteins.

We first extract the normalized protein expression values for a particular protein and we create a dataframe that contains the protein expression values and the relevant variables of the experimental design.

for (protName in rownames(tests)[1:5])
{
plotDat<-pepData[fData(pepData)$Proteins==protName] %>% exprs
plotDataStack<-data.frame(quant_value=c(plotDat),sequence=rep(rownames(plotDat),rep=ncol(plotDat)),sample=rep(colnames(plotDat),each=nrow(plotDat)),condition=rep(pData(pepData)$condition,each=nrow(plotDat)))
plot1 <- ggplot(plotDataStack, aes(x=sample, y=quant_value,fill=condition))
print(plot1 +geom_boxplot(outlier.shape=NA) + geom_point(position=position_jitter(width=.1),aes(shape=sequence)) + scale_shape_manual(values=1:nrow(plotDat)) +labs(title = protName, x="sample", y="Peptide intensity (log2)"))
}

Proteomics data analysis: cancer example 9x9

5 July 2019

Contents