Eighteen Estrogen Receptor Positive Breast cancer tissues from from patients treated with tamoxifen upon recurrence have been assessed in a proteomics study. Nine patients had a good outcome (or) and the other nine had a poor outcome (pd). The proteomes have been assessed using an LTQ-Orbitrap and the thermo output .RAW files were searched with MaxQuant (version 1.4.1.2) against the human proteome database (FASTA version 2012-09, human canonical proteome).
We first import the peptides.txt file. This is the file that contains your peptide-level intensities. For a MaxQuant search [6], this peptides.txt file can be found by default in the “path_to_raw_files/combined/txt/” folder from the MaxQuant output, with “path_to_raw_files” the folder where raw files were saved. In this tutorial, we will use a MaxQuant peptides file from MaxQuant that can be found on the pdaData repository. We will use the MSnbase package to import the data.
We generate the object peptideFile with the path to the peptides.txt file. In this file we will fetch the data from the github repository linked to the pda course: https://statomics.github.io/pda. You can also replace the peptideFile with a string that points to the path of a file on your local hard drive.
With the grepEcols function we find the columns that are containing the expression data of the peptides in the peptides.txt file.
library(MSqRob)
library(MSnbase)
library(tidyverse)
library(limma)
peptidesFile <- "https://raw.githubusercontent.com/statOmics/pda/data/quantification/cancer/peptides9vs9.txt"
proteinGroupsFile <-"https://raw.githubusercontent.com/statOmics/pda/data/quantification/cancer/proteinGroups.txt"
ecols<-grepEcols(peptidesFile, "Intensity ", split = "\t")
Next we import the peptide intensities
pepData<-readMSnSet2(peptidesFile,ecol=ecols,fnames="Sequence",sep="\t")
pepData
## MSnSet (storageMode: lockedEnvironment)
## assayData: 34205 features, 18 samples
## element names: exprs
## protocolData: none
## phenoData: none
## featureData
## featureNames: AAAAAAAAAAAAAAAGAGAGAK AAAAAAAAAAGAAGGR ...
## YYYDGKDYIEFNK (34205 total)
## fvarLabels: Sequence Proteins ... Oxidation..M..site.IDs (44
## total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
## - - - Processing information - - -
## MSnbase version: 2.10.1
The pepData object is an MSnSet, a container for the data, features information and experimental annotation. They can be accessed using the accessor functions ‘exprs’ (matrix of intensities, features in rows, samples in columns), ‘fData’ (properties for each feature, peptide or protein, on the rows) and ‘pData’ (properties for the samples on the columns).
We will make use from data wrangling functionalities from the tidyverse package. The %>% operator allows us to pipe the output of one function to the next function.
head(exprs(pepData))
## Intensity.OR.01 Intensity.OR.04 Intensity.OR.07
## AAAAAAAAAAAAAAAGAGAGAK 0 17923000 6237900
## AAAAAAAAAAGAAGGR 8439600 15859000 5783100
## AAAAAAAGDSDSWDADAFSVEDPVRK 3352800 6592900 5295700
## AAAAAAALQAK 0 22380000 14302000
## AAAAAAGAASGLPGPVAQGLK 3867400 13661000 3764300
## AAAAAAGAGPEMVR 4113700 4431500 4944400
## Intensity.OR.09 Intensity.OR.10 Intensity.OR.13
## AAAAAAAAAAAAAAAGAGAGAK 0 0 2977200
## AAAAAAAAAAGAAGGR 13629000 8121000 2910900
## AAAAAAAGDSDSWDADAFSVEDPVRK 16849000 7795800 3653500
## AAAAAAALQAK 31097000 23003000 0
## AAAAAAGAASGLPGPVAQGLK 6857300 3517000 3149100
## AAAAAAGAGPEMVR 8187500 2625000 4473100
## Intensity.OR.20 Intensity.OR.23 Intensity.OR.25
## AAAAAAAAAAAAAAAGAGAGAK 0 0 5067300
## AAAAAAAAAAGAAGGR 7792100 5757700 2342000
## AAAAAAAGDSDSWDADAFSVEDPVRK 10771000 12880000 6918700
## AAAAAAALQAK 21494000 31017000 13190000
## AAAAAAGAASGLPGPVAQGLK 14372000 3131300 8151000
## AAAAAAGAGPEMVR 5150300 3830300 5399800
## Intensity.PD.02 Intensity.PD.03 Intensity.PD.04
## AAAAAAAAAAAAAAAGAGAGAK 11510000 3003800 0
## AAAAAAAAAAGAAGGR 11056000 4491600 5673700
## AAAAAAAGDSDSWDADAFSVEDPVRK 4721100 6766500 5119600
## AAAAAAALQAK 31208000 15994000 18562000
## AAAAAAGAASGLPGPVAQGLK 5036100 9125600 2497600
## AAAAAAGAGPEMVR 0 3424900 2745700
## Intensity.PD.06 Intensity.PD.07 Intensity.PD.08
## AAAAAAAAAAAAAAAGAGAGAK 5056200 6605000 0
## AAAAAAAAAAGAAGGR 4104500 0 12253000
## AAAAAAAGDSDSWDADAFSVEDPVRK 5098200 1946300 3137200
## AAAAAAALQAK 15672000 0 19137000
## AAAAAAGAASGLPGPVAQGLK 6499900 2249100 8652700
## AAAAAAGAGPEMVR 4143800 1792700 4349800
## Intensity.PD.09 Intensity.PD.10 Intensity.PD.11
## AAAAAAAAAAAAAAAGAGAGAK 0 0 2947100
## AAAAAAAAAAGAAGGR 7015300 0 3825800
## AAAAAAAGDSDSWDADAFSVEDPVRK 2254200 7580700 5205500
## AAAAAAALQAK 8536800 21467000 9194100
## AAAAAAGAASGLPGPVAQGLK 2519600 3397200 8505100
## AAAAAAGAGPEMVR 0 2373000 1479100
pepData %>% exprs %>% head
## Intensity.OR.01 Intensity.OR.04 Intensity.OR.07
## AAAAAAAAAAAAAAAGAGAGAK 0 17923000 6237900
## AAAAAAAAAAGAAGGR 8439600 15859000 5783100
## AAAAAAAGDSDSWDADAFSVEDPVRK 3352800 6592900 5295700
## AAAAAAALQAK 0 22380000 14302000
## AAAAAAGAASGLPGPVAQGLK 3867400 13661000 3764300
## AAAAAAGAGPEMVR 4113700 4431500 4944400
## Intensity.OR.09 Intensity.OR.10 Intensity.OR.13
## AAAAAAAAAAAAAAAGAGAGAK 0 0 2977200
## AAAAAAAAAAGAAGGR 13629000 8121000 2910900
## AAAAAAAGDSDSWDADAFSVEDPVRK 16849000 7795800 3653500
## AAAAAAALQAK 31097000 23003000 0
## AAAAAAGAASGLPGPVAQGLK 6857300 3517000 3149100
## AAAAAAGAGPEMVR 8187500 2625000 4473100
## Intensity.OR.20 Intensity.OR.23 Intensity.OR.25
## AAAAAAAAAAAAAAAGAGAGAK 0 0 5067300
## AAAAAAAAAAGAAGGR 7792100 5757700 2342000
## AAAAAAAGDSDSWDADAFSVEDPVRK 10771000 12880000 6918700
## AAAAAAALQAK 21494000 31017000 13190000
## AAAAAAGAASGLPGPVAQGLK 14372000 3131300 8151000
## AAAAAAGAGPEMVR 5150300 3830300 5399800
## Intensity.PD.02 Intensity.PD.03 Intensity.PD.04
## AAAAAAAAAAAAAAAGAGAGAK 11510000 3003800 0
## AAAAAAAAAAGAAGGR 11056000 4491600 5673700
## AAAAAAAGDSDSWDADAFSVEDPVRK 4721100 6766500 5119600
## AAAAAAALQAK 31208000 15994000 18562000
## AAAAAAGAASGLPGPVAQGLK 5036100 9125600 2497600
## AAAAAAGAGPEMVR 0 3424900 2745700
## Intensity.PD.06 Intensity.PD.07 Intensity.PD.08
## AAAAAAAAAAAAAAAGAGAGAK 5056200 6605000 0
## AAAAAAAAAAGAAGGR 4104500 0 12253000
## AAAAAAAGDSDSWDADAFSVEDPVRK 5098200 1946300 3137200
## AAAAAAALQAK 15672000 0 19137000
## AAAAAAGAASGLPGPVAQGLK 6499900 2249100 8652700
## AAAAAAGAGPEMVR 4143800 1792700 4349800
## Intensity.PD.09 Intensity.PD.10 Intensity.PD.11
## AAAAAAAAAAAAAAAGAGAGAK 0 0 2947100
## AAAAAAAAAAGAAGGR 7015300 0 3825800
## AAAAAAAGDSDSWDADAFSVEDPVRK 2254200 7580700 5205500
## AAAAAAALQAK 8536800 21467000 9194100
## AAAAAAGAASGLPGPVAQGLK 2519600 3397200 8505100
## AAAAAAGAGPEMVR 0 2373000 1479100
pepData %>% sampleNames
## [1] "Intensity.OR.01" "Intensity.OR.04" "Intensity.OR.07" "Intensity.OR.09"
## [5] "Intensity.OR.10" "Intensity.OR.13" "Intensity.OR.20" "Intensity.OR.23"
## [9] "Intensity.OR.25" "Intensity.PD.02" "Intensity.PD.03" "Intensity.PD.04"
## [13] "Intensity.PD.06" "Intensity.PD.07" "Intensity.PD.08" "Intensity.PD.09"
## [17] "Intensity.PD.10" "Intensity.PD.11"
The sample names are rather long and contain information on the spike-in concentration and the repeat. We will remove “Intensity.” from the string
sampleNames(pepData) <- pepData %>% sampleNames %>% str_replace(., pattern="Intensity.", replacement="")
Next we will add information on the proteins to the feature data. The feature data of the object pepData contains information for each peptide in the experiment. This info on the proteins can be found in the proteingroups.txt file of maxquant. The peptides.txt file contains many data on each feature (peptide). We will only retain specified columns Protein, Sequence, Reverse and Potential.Contaminant, indicating the protein, peptide sequence, if the peptide is a decoy and if it is a contaminant, respectively.
In older and more recent versions of maxquant the names of the fields might vary. We can therefore explore the names we want to select with the shiny app that can be launched from the selectFeatureData from the MSnbase package, i.e. by running the function without an fcol argument that specifies the columns you would like to select.
The code can be found below, but is not executed by this Rmd script (note, that the flag eval=FALSE has been used as option for the R chunk).
pepData<-selectFeatureData(pepData)
If we know the columns, we can specify them as a vector in the fcol slot.
pepData<-selectFeatureData(pepData,fcol=c("Proteins","Protein.names","Gene.names", "Contaminant","Reverse","Sequence"))
Calculate how many non zero intensities we have per peptide. This will be useful for filtering.
fData(pepData)$nNonZero<- rowSums(exprs(pepData) > 0)
Peptides with zero intensities are missing peptides and should be represent with a NA
value instead of 0
.
exprs(pepData)[exprs(pepData)==0]<-NA
Next we create the data on the experimental layout. We can do this based on the samplenames. The information on the spike in condition is available as the first letter of the sample name. We extract this information and recast the character vector in a factor.
pData(pepData)$condition <- pepData %>% sampleNames %>% substr(1,2) %>% as.factor
We can inspect the missingness in our data with the plotNA()
function provided with MSnbase
. 47% of all peptide intensities are missing and for some peptides we don’t even measure a signal in any sample. The missingness is similar across samples. Note, that we plot the peptide data, so the label protein in the plot refers to peptides.
plotNA(pepData)
When we look at density plots of the intensities, we see that most intensities are low and that there is a long tail to the right. It is more useful to explore the data upon log transformation.
plotDensities(exprs(pepData))
nrow(pepData)
## [1] 34205
There are 34205 peptides before preprocessing.
We will log transform, normalize, filter and summarize the data.
pepData <- log(pepData, base = 2)
plotDensities(exprs(pepData))
When we look at the densities we see small shifts in location and shape of the density distributions. So normalisation will be required. But, we will first perform filtering.
In our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.
pepData<-pepData[fData(pepData)$Proteins %in% smallestUniqueGroups(fData(pepData)$Proteins),]
We now remove the contaminants, peptides that map to decoy sequences and proteins, which were only identified by peptides with modifications.
pepData <- pepData[fData(pepData)$Reverse!="+",]
pepData <- pepData[fData(pepData)$Contaminant!="+",]
Proteins for which all peptides are carrying modifications (PTMs) can be considered as unreliable. We will filter out these proteins. This information is included in the Only.identified.by.site column of proteinGroups.txt file of maxquant. The variable is containing a “+” character if the protein is only identified by modified peptides and an empty string “” if this is not the case. Sometimes an NA value is also present. We will replace these by the empty character string. The information of the peptides.txt file can be linked to the proteinGroups.txt file by using the Proteins column from the peptides.txt file and the Protein.IDs column in the proteinGroups.txt file. If NAs are included in the
proteinGroups<-read.table(proteinGroupsFile, sep = "\t", header = TRUE, quote = "", comment.char = "")
only_site <- proteinGroups$Only.identified.by.site
only_site[is.na(only_site)] <- ""
proteinsOnlyIdentifiedWithPTMs <- proteinGroups$Protein.IDs[only_site=="+"]
pepData<-pepData[!(fData(pepData)$Proteins %in% proteinsOnlyIdentifiedWithPTMs),]
We will now remove the proteinGroups object from the R session to free memory.
rm(proteinGroups,only_site)
gc()
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 4134795 220.9 8535498 455.9 NA 5361388 286.4
## Vcells 7927960 60.5 20266106 154.7 16384 20263743 154.7
We want to keep peptide that were at least observed twice.
pepData<-pepData[fData(pepData)$nNonZero>=2,]
nrow(pepData)
## [1] 26689
We keep 26689 peptides upon filtering.
pepData <- normalise(pepData, "quantiles")
Upon normalisation the density curves for all samples coincide.
plotDensities(exprs(pepData))
We can visualize our data using a Multi Dimensional Scaling plot, eg. as provided by the limma package.
plotMDS(exprs(pepData),col=as.double(pData(pepData)$condition))
The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples. We notice that the leading differences (log FC) in the peptide data seems to be driven by technical variability. Indeed the samples do not seem to be clearly separated according to the spike in condition. Because there are missing values for the peptide level data we use the na.rm=TRUE
argument to summarize the data based on the observed peptides for every protein.
protData<-combineFeatures(pepData,fcol="Proteins",method="robust",na.rm=TRUE)
## Your data contains missing values. Please read the relevant section
## in the combineFeatures manual page for details the effects of
## missing values on data aggregation.
If you summarize using robust summarization please refer to Sticker et al. (2019) https://www.biorxiv.org/content/10.1101/668863v1
We filter proteins for which we have at least two protein expression values in both conditions:
protData<-protData[
protData %>% exprs %>% is.na %>% `!` %>% rowSums>4,]
plotMDS(exprs(protData),col=as.double(pData(pepData)$condition))
MSqRob is currently working with a format where we have one dataframe for each protein. This will be changed in the next release. Therefore we first have to reorganise the data.
Next the models are fitted. This is done using the fit.model function. We only have to model the data using the factor condition from the pheno data of the protein level MSnSet. The name of the factor variable is specified in the fixed argument (if multiple predictors have to be incorporated in the model, a vector of variable names has to be provided in this argument.). The argument shrinkage is used to specify if ridge regression has to be adopted. For the sake of speed we do not do this in the tutorial. The shrinkage has to be specified for each variable in the fixed effects. We also have to indicate this for the intercept (which we never shrink). So we specify it at c(0,0) to indicate that the intercept (first 0) and the parameters for the factor condition (second 0) are not penalized. We set the robust_var function equal to FALSE, this functionality will be removed from the package in the next release.
protMSqRob <- MSnSet2protdata(protData, "Proteins",annotations=c("Protein.names","Gene.names"))
models <- fit.model(protdata=protMSqRob, response="quant_value", fixed="condition",shrinkage.fixed=c(0,0),robust_var=FALSE)
Many biologists have problems with the reference coding. In MSqRob we have opted to formulate contrasts using all levels of a factor. Internally, the contrasts are than recasted according to the factor level that is the reference class.
L <- makeContrast("conditionPD - conditionOR", levels=c("conditionPD","conditionOR"))
tests <- test.contrast_adjust(models, L)
nSig <- sum(tests$signif,na.rm=TRUE)
head(tests,nSig)
## Protein.names
## P61604 10 kDa heat shock protein, mitochondrial
## P06454 Prothymosin alpha;Thymosin alpha-1
## Q13765 Nascent polypeptide-associated complex subunit alpha
## P49006 MARCKS-related protein
## Q96K17 Transcription factor BTF3 homolog 4
## P07108 Acyl-CoA-binding protein
## O43396 Thioredoxin-like protein 1
## Q9GZT3 SRA stem-loop-interacting RNA-binding protein, mitochondrial
## P09496 Clathrin light chain A
## Q9H299 SH3 domain-binding glutamic acid-rich-like protein 3
## P54727 UV excision repair protein RAD23 homolog B
## P63313 Thymosin beta-10
## Q8NC51 Plasminogen activator inhibitor 1 RNA-binding protein
## P30084 Enoyl-CoA hydratase, mitochondrial
## P27797 Calreticulin
## O43765 Small glutamine-rich tetratricopeptide repeat-containing protein alpha
## O60635 Tetraspanin-1
## Q9NXG2 THUMP domain-containing protein 1
## P09012 U1 small nuclear ribonucleoprotein A
## P55327 Tumor protein D52
## P51858 Hepatoma-derived growth factor
## Q9UBE0 SUMO-activating enzyme subunit 1
## P20042 Eukaryotic translation initiation factor 2 subunit 2
## P43490 Nicotinamide phosphoribosyltransferase
## Q15758 Neutral amino acid transporter B(0)
## Q9NRX4 14 kDa phosphohistidine phosphatase
## P46108 Adapter molecule crk
## P19338 Nucleolin
## Q16543 Hsp90 co-chaperone Cdc37
## Q15370 Transcription elongation factor B polypeptide 2
## O75223 Gamma-glutamylcyclotransferase
## Q14980 Nuclear mitotic apparatus protein 1
## P30085 UMP-CMP kinase
## O75368 SH3 domain-binding glutamic acid-rich-like protein
## O95571 Protein ETHE1, mitochondrial
## O60888 Protein CutA
## P52434 DNA-directed RNA polymerases I, II, and III subunit RPABC3
## Q6IA17 Single Ig IL-1-related receptor
## P42345 Serine/threonine-protein kinase mTOR
## P07858 Cathepsin B;Cathepsin B light chain;Cathepsin B heavy chain
## P22307 Non-specific lipid-transfer protein
## P02787 Serotransferrin
## Q9Y266 Nuclear migration protein nudC
## Q9UMX5 Neudesin
## Q99497 Protein DJ-1
## P52565 Rho GDP-dissociation inhibitor 1
## Q9BT22 Chitobiosyldiphosphodolichol beta-mannosyltransferase
## P62318 Small nuclear ribonucleoprotein Sm D3
## P20810 Calpastatin
## P20962 Parathymosin
## Q06830 Peroxiredoxin-1
## P22061 Protein-L-isoaspartate(D-aspartate) O-methyltransferase
## Q9NP79 Vacuolar protein sorting-associated protein VTA1 homolog
## P49321 Nuclear autoantigenic sperm protein
## P07305 Histone H1.0
## Q92734 Protein TFG
## P61086 Ubiquitin-conjugating enzyme E2 K
## P05387 60S acidic ribosomal protein P2
## P06730 Eukaryotic translation initiation factor 4E
## P10412 Histone H1.4
## Q15293 Reticulocalbin-1
## Q9Y2Q5 Ragulator complex protein LAMTOR2
## Q02978 Mitochondrial 2-oxoglutarate/malate carrier protein
## P14314 Glucosidase 2 subunit beta
## Q92522 Histone H1x
## P04080 Cystatin-B
## Q8TE68 Epidermal growth factor receptor kinase substrate 8-like protein 1
## Q16513 Serine/threonine-protein kinase N2
## P21333 Filamin-A
## P12268 Inosine-5-monophosphate dehydrogenase 2
## Q9NR45 Sialic acid synthase
## Q9UHV9 Prefoldin subunit 2
## Q15181 Inorganic pyrophosphatase
## P29966 Myristoylated alanine-rich C-kinase substrate
## P12004 Proliferating cell nuclear antigen
## Q13451 Peptidyl-prolyl cis-trans isomerase FKBP5
## P62310 U6 snRNA-associated Sm-like protein LSm3
## Q05682 Caldesmon
## Q9UHQ9 NADH-cytochrome b5 reductase 1
## P19971 Thymidine phosphorylase
## Q9H936 Mitochondrial glutamate carrier 1
## O15327 Type II inositol 3,4-bisphosphate 4-phosphatase
## P04040 Catalase
## P61960 Ubiquitin-fold modifier 1
## Q96KP4 Cytosolic non-specific dipeptidase
## Q9NZJ7 Mitochondrial carrier homolog 1
## P30086 Phosphatidylethanolamine-binding protein 1;Hippocampal cholinergic neurostimulating peptide
## P67936 Tropomyosin alpha-4 chain
## O43237 Cytoplasmic dynein 1 light intermediate chain 2
## Q5T4S7 E3 ubiquitin-protein ligase UBR4
## A6NL28 Putative tropomyosin alpha-3 chain-like protein
## O95471 Claudin-7
## Q15185 Prostaglandin E synthase 3
## Q9UGI8 Testin
## Q9HB71 Calcyclin-binding protein
## Q02818 Nucleobindin-1
## P02790 Hemopexin
## P06331;P01825 Ig heavy chain V-II region ARH-77;Ig heavy chain V-II region NEWM
## P16949 Stathmin
## P51553 Isocitrate dehydrogenase [NAD] subunit gamma, mitochondrial
## O00233 26S proteasome non-ATPase regulatory subunit 9
## Q99426 Tubulin-folding cofactor B
## Q9HAB8 Phosphopantothenate--cysteine ligase
## P30044 Peroxiredoxin-5, mitochondrial
## P31937 3-hydroxyisobutyrate dehydrogenase, mitochondrial
## Q6I9Y2 THO complex subunit 7 homolog
## P07741 Adenine phosphoribosyltransferase
## O95881 Thioredoxin domain-containing protein 12
## P15170 Eukaryotic peptide chain release factor GTP-binding subunit ERF3A
## Gene.names estimate se df Tval
## P61604 HSPE1 2.5592734 0.3123860 21.11742 8.192664
## P06454 PTMA 3.5347330 0.4492307 21.11742 7.868414
## Q13765 NACA 1.4505256 0.1957771 21.11742 7.409067
## P49006 MARCKSL1 2.1881063 0.2968877 19.11742 7.370149
## Q96K17 BTF3L4 1.6512774 0.2156977 17.11742 7.655518
## P07108 DBI 4.3219261 0.4884625 12.11742 8.848021
## O43396 TXNL1 1.8716784 0.2805144 21.11742 6.672308
## Q9GZT3 SLIRP 2.0996831 0.3149371 21.11742 6.666991
## P09496 CLTA 1.4139193 0.2184747 21.11742 6.471774
## Q9H299 SH3BGRL3 1.9496508 0.3031351 21.11742 6.431623
## P54727 RAD23B 1.5583760 0.2444316 21.11742 6.375509
## P63313 TMSB10 3.3690447 0.4668663 15.11742 7.216295
## Q8NC51 SERBP1 1.6681795 0.2670389 21.11742 6.246952
## P30084 ECHS1 1.9146274 0.3183384 21.11742 6.014441
## P27797 CALR 1.4148822 0.2363851 21.11742 5.985498
## O43765 SGTA 1.1014428 0.1865749 21.11742 5.903489
## O60635 TSPAN1 -1.8691707 0.2624083 13.11742 -7.123140
## Q9NXG2 THUMPD1 1.5396595 0.2634810 21.11742 5.843532
## P09012 SNRPA 1.8345287 0.3056167 18.11742 6.002710
## P55327 TPD52 1.7182370 0.3078550 21.11742 5.581318
## P51858 HDGF 1.9604606 0.3558898 20.11742 5.508617
## Q9UBE0 SAE1 1.1223951 0.2087599 21.11742 5.376489
## P20042 EIF2S2 0.9934941 0.1933373 21.11742 5.138657
## P43490 NAMPT 1.3535312 0.2638740 21.11742 5.129461
## Q15758 SLC1A5 -1.4336978 0.2735523 18.11742 -5.241036
## Q9NRX4 PHPT1 1.8402235 0.3512179 18.11742 5.239549
## P46108 CRK 1.5708413 0.2981835 17.11742 5.268036
## P19338 NCL 1.1259343 0.2261039 21.11742 4.979720
## Q16543 CDC37 1.0484856 0.2110960 21.11742 4.966867
## Q15370 TCEB2 1.3142814 0.2615779 20.11742 5.024436
## O75223 GGCT 1.7772065 0.3637474 21.11742 4.885826
## Q14980 NUMA1 -1.1291421 0.2312966 21.11742 -4.881792
## P30085 CMPK1 1.1509752 0.2358411 21.11742 4.880300
## O75368 SH3BGRL 2.2060894 0.4565541 21.11742 4.832043
## O95571 ETHE1 1.5460854 0.3061063 16.11742 5.050813
## O60888 CUTA 1.1259257 0.2365392 19.11742 4.759997
## P52434 POLR2H 1.0396618 0.2276945 21.11742 4.566039
## Q6IA17 SIGIRR -1.0253851 0.2181514 17.11742 -4.700337
## P42345 MTOR -2.2740172 0.4033124 10.11742 -5.638352
## P07858 CTSB 1.3433419 0.2941021 19.11742 4.567604
## P22307 SCP2 1.2424767 0.2782961 21.11742 4.464586
## P02787 TF 1.4893302 0.3374040 21.11742 4.414085
## Q9Y266 NUDC 0.9351525 0.2123652 21.11742 4.403511
## Q9UMX5 NENF 1.3761685 0.2940443 16.11742 4.680140
## Q99497 PARK7 1.2953983 0.2943649 21.11742 4.400654
## P52565 ARHGDIA 1.6594679 0.3783601 21.11742 4.385948
## Q9BT22 ALG1 -1.0379660 0.2322491 19.11742 -4.469193
## P62318 SNRPD3 0.8169473 0.1886125 21.11742 4.331353
## P20810 CAST 1.3723052 0.3148262 20.11742 4.358930
## P20962 PTMS 2.5183460 0.5503382 16.11742 4.575997
## Q06830 PRDX1 0.8253918 0.1918505 21.11742 4.302265
## P22061 PCMT1 1.1947252 0.2778026 21.11742 4.300627
## Q9NP79 VTA1 0.8770644 0.2054590 21.11742 4.268805
## P49321 NASP 1.1415431 0.2678475 21.11742 4.261915
## P07305 H1F0 1.3082684 0.2898209 16.11742 4.514058
## Q92734 TFG 1.0270438 0.2429410 21.11742 4.227545
## P61086 UBE2K 0.9569681 0.2270417 21.11742 4.214945
## P05387 RPLP2 1.1108529 0.2640546 21.11742 4.206907
## P06730 EIF4E 1.7050960 0.3694762 14.11742 4.614901
## P10412 HIST1H1E 1.4081105 0.3199354 16.11742 4.401233
## Q15293 RCN1 1.9214754 0.4634895 20.11742 4.145672
## Q9Y2Q5 LAMTOR2 1.0583258 0.2529308 19.11742 4.184250
## Q02978 SLC25A11 -1.0482598 0.2567983 21.11742 -4.082035
## P14314 PRKCSH 0.8195419 0.2015128 21.11742 4.066947
## Q92522 H1FX 1.5712942 0.3539711 14.11742 4.439047
## P04080 CSTB 1.3639992 0.3362218 21.11742 4.056843
## Q8TE68 EPS8L1 1.3286881 0.3012425 14.11742 4.410692
## Q16513 PKN2 1.9551677 0.4146200 11.11742 4.715565
## P21333 FLNA 1.3398967 0.3335814 21.11742 4.016701
## P12268 IMPDH2 0.8712732 0.2176929 21.11742 4.002303
## Q9NR45 NANS 1.2928056 0.3233026 21.11742 3.998748
## Q9UHV9 PFDN2 0.8523008 0.2123892 20.11742 4.012919
## Q15181 PPA1 1.5567884 0.3886169 20.11742 4.005972
## P29966 MARCKS 1.5499687 0.3834901 19.11742 4.041744
## P12004 PCNA 1.1667833 0.2945770 21.11742 3.960877
## Q13451 FKBP5 1.5069044 0.3732050 18.11742 4.037739
## P62310 LSM3 1.3331794 0.3223234 16.11742 4.136155
## Q05682 CALD1 1.2063519 0.3047726 20.11742 3.958204
## Q9UHQ9 CYB5R1 -0.9987964 0.2555546 21.11742 -3.908348
## P19971 TYMP 1.8459088 0.4728170 21.11742 3.904066
## Q9H936 SLC25A22 -0.9448232 0.2434157 21.11742 -3.881521
## O15327 INPP4B 1.1928744 0.3040420 19.11742 3.923388
## P04040 CAT 1.0076194 0.2619447 21.11742 3.846687
## P61960 UFM1 1.1882509 0.3073106 20.11742 3.866612
## Q96KP4 CNDP2 1.0827400 0.2835639 21.11742 3.818328
## Q9NZJ7 MTCH1 -0.9997059 0.2421078 14.11742 -4.129176
## P30086 PEBP1 -1.4050950 0.3720807 21.11742 -3.776318
## P67936 TPM4 1.5197110 0.4025275 21.11742 3.775421
## O43237 DYNC1LI2 0.8001891 0.2126480 21.11742 3.762975
## Q5T4S7 UBR4 -0.7998037 0.2126559 21.11742 -3.761023
## A6NL28 0.8836174 0.2360921 21.11742 3.742681
## O95471 CLDN7 -1.7505708 0.4404022 15.11742 -3.974937
## Q15185 PTGES3 0.9750688 0.2613472 21.11742 3.730932
## Q9UGI8 TES 1.4872202 0.3768812 15.11742 3.946124
## Q9HB71 CACYBP 0.9409158 0.2533127 20.11742 3.714444
## Q02818 NUCB1 1.0305185 0.2805836 21.11742 3.672769
## P02790 HPX 1.2526114 0.3388362 20.11742 3.696805
## P06331;P01825 1.8447009 0.4743805 15.11742 3.888653
## P16949 STMN1 1.6465449 0.4351696 17.11742 3.783685
## P51553 IDH3G -0.9963271 0.2690001 19.11742 -3.703817
## O00233 PSMD9 1.1337745 0.2970251 16.11742 3.817099
## Q99426 TBCB 1.1515715 0.3110882 19.11742 3.701753
## Q9HAB8 PPCS 0.7949434 0.2180909 21.11742 3.645010
## P30044 PRDX5 0.8034359 0.2204515 21.11742 3.644503
## P31937 HIBADH 0.7244339 0.1982152 20.11742 3.654785
## Q6I9Y2 THOC7 2.4902814 0.6144630 12.11742 4.052777
## P07741 APRT 1.2005078 0.3288858 20.11742 3.650227
## O95881 TXNDC12 1.0496663 0.2740157 15.11742 3.830679
## P15170 GSPT1 1.0795850 0.2966670 20.11742 3.639047
## pval qval signif
## P61604 5.383583e-08 0.0001704142 TRUE
## P06454 1.032501e-07 0.0001704142 TRUE
## Q13765 2.660805e-07 0.0002927772 TRUE
## P49006 5.335100e-07 0.0004174311 TRUE
## Q96K17 6.322797e-07 0.0004174311 TRUE
## P07108 1.230123e-06 0.0005377809 TRUE
## O43396 1.288233e-06 0.0005377809 TRUE
## Q9GZT3 1.303316e-06 0.0005377809 TRUE
## P09496 2.003454e-06 0.0007229229 TRUE
## Q9H299 2.190012e-06 0.0007229229 TRUE
## P54727 2.481036e-06 0.0007445363 TRUE
## P63313 2.860298e-06 0.0007868202 TRUE
## Q8NC51 3.306851e-06 0.0008396859 TRUE
## P30084 5.588687e-06 0.0013134908 TRUE
## P27797 5.968604e-06 0.0013134908 TRUE
## O43765 7.195002e-06 0.0014372405 TRUE
## O60635 7.401723e-06 0.0014372405 TRUE
## Q9NXG2 8.252307e-06 0.0015133815 TRUE
## P09012 1.090482e-05 0.0018945681 TRUE
## P55327 1.510017e-05 0.0024922833 TRUE
## P51858 2.116281e-05 0.0033265917 TRUE
## Q9UBE0 2.432651e-05 0.0036500828 TRUE
## P20042 4.252360e-05 0.0059770063 TRUE
## P43490 4.345597e-05 0.0059770063 TRUE
## Q15758 5.413168e-05 0.0068946333 TRUE
## Q9NRX4 5.430490e-05 0.0068946333 TRUE
## P46108 6.139711e-05 0.0070313993 TRUE
## P19338 6.191795e-05 0.0070313993 TRUE
## Q16543 6.383342e-05 0.0070313993 TRUE
## Q15370 6.390245e-05 0.0070313993 TRUE
## O75223 7.737166e-05 0.0078418078 TRUE
## Q14980 7.811674e-05 0.0078418078 TRUE
## P30085 7.839432e-05 0.0078418078 TRUE
## O75368 8.792588e-05 0.0085365688 TRUE
## O95571 1.155712e-04 0.0109000178 TRUE
## O60888 1.340708e-04 0.0122935488 TRUE
## P52434 1.658632e-04 0.0147976865 TRUE
## Q6IA17 2.024819e-04 0.0170240093 TRUE
## P42345 2.067556e-04 0.0170240093 TRUE
## P07858 2.073595e-04 0.0170240093 TRUE
## P22307 2.114463e-04 0.0170240093 TRUE
## P02787 2.386357e-04 0.0180777697 TRUE
## Q9Y266 2.447590e-04 0.0180777697 TRUE
## Q9UMX5 2.462204e-04 0.0180777697 TRUE
## Q99497 2.464404e-04 0.0180777697 TRUE
## P52565 2.552800e-04 0.0182189425 TRUE
## Q9BT22 2.594033e-04 0.0182189425 TRUE
## P62318 2.909704e-04 0.0198830351 TRUE
## P20810 3.004008e-04 0.0198830351 TRUE
## P20962 3.052852e-04 0.0198830351 TRUE
## Q06830 3.119855e-04 0.0198830351 TRUE
## P22061 3.132135e-04 0.0198830351 TRUE
## Q9NP79 3.380467e-04 0.0208321142 TRUE
## P49321 3.436786e-04 0.0208321142 TRUE
## P07305 3.470967e-04 0.0208321142 TRUE
## Q92734 3.732032e-04 0.0219620229 TRUE
## P61086 3.846515e-04 0.0219620229 TRUE
## P05387 3.921373e-04 0.0219620229 TRUE
## P06730 3.925354e-04 0.0219620229 TRUE
## P10412 4.388976e-04 0.0241466818 TRUE
## Q15293 4.951692e-04 0.0264766621 TRUE
## Q9Y2Q5 4.972896e-04 0.0264766621 TRUE
## Q02978 5.290018e-04 0.0277180137 TRUE
## P14314 5.484806e-04 0.0279436777 TRUE
## Q92522 5.502390e-04 0.0279436777 TRUE
## P04080 5.619230e-04 0.0281046654 TRUE
## Q8TE68 5.812129e-04 0.0286355771 TRUE
## Q16513 6.159407e-04 0.0295965815 TRUE
## P21333 6.186501e-04 0.0295965815 TRUE
## P12268 6.403571e-04 0.0300266956 TRUE
## Q9NR45 6.458332e-04 0.0300266956 TRUE
## Q9UHV9 6.759937e-04 0.0307421088 TRUE
## Q15181 6.870937e-04 0.0307421088 TRUE
## P29966 6.891597e-04 0.0307421088 TRUE
## P12004 7.071323e-04 0.0311232496 TRUE
## Q13451 7.633879e-04 0.0325233350 TRUE
## P62310 7.646409e-04 0.0325233350 TRUE
## Q05682 7.685005e-04 0.0325233350 TRUE
## Q9UHQ9 8.018417e-04 0.0334265797 TRUE
## P19971 8.100958e-04 0.0334265797 TRUE
## Q9H936 8.549678e-04 0.0348425762 TRUE
## O15327 9.037833e-04 0.0363827892 TRUE
## P04040 9.291932e-04 0.0369550211 TRUE
## P61960 9.524189e-04 0.0374277948 TRUE
## Q96KP4 9.943118e-04 0.0385899106 TRUE
## Q9NZJ7 1.005372e-03 0.0385899106 TRUE
## P30086 1.099172e-03 0.0413197051 TRUE
## P67936 1.101525e-03 0.0413197051 TRUE
## O43237 1.134712e-03 0.0418129681 TRUE
## Q5T4S7 1.140008e-03 0.0418129681 TRUE
## A6NL28 1.190960e-03 0.0431659823 TRUE
## O95471 1.203051e-03 0.0431659823 TRUE
## Q15185 1.224775e-03 0.0434729344 TRUE
## Q9UGI8 1.276049e-03 0.0448110374 TRUE
## Q9HB71 1.359483e-03 0.0472384632 TRUE
## Q02818 1.406665e-03 0.0477516245 TRUE
## P02790 1.416637e-03 0.0477516245 TRUE
## P06331;P01825 1.435346e-03 0.0477516245 TRUE
## P16949 1.466969e-03 0.0477516245 TRUE
## P51553 1.493832e-03 0.0477516245 TRUE
## O00233 1.499453e-03 0.0477516245 TRUE
## Q99426 1.500896e-03 0.0477516245 TRUE
## Q9HAB8 1.502631e-03 0.0477516245 TRUE
## P30044 1.504444e-03 0.0477516245 TRUE
## P31937 1.562550e-03 0.0487204296 TRUE
## Q6I9Y2 1.571747e-03 0.0487204296 TRUE
## P07741 1.579244e-03 0.0487204296 TRUE
## O95881 1.616458e-03 0.0490893501 TRUE
## P15170 1.620945e-03 0.0490893501 TRUE
There are 109 proteins with a significant effect at the 5% FDR level.
volcano<- ggplot(tests,aes(x=estimate,y=-log10(pval),color=signif)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano
An interactive volcano plot can be obtained by using the plotly library. We alter the object tests to add the protein name to the data frame in ggplot, this will allow us to see the protein name in the interactive plot.
library(plotly)
volcano<- ggplot(tests %>% rownames_to_column("protein"), aes(x=estimate,y=-log10(pval),color=signif,protein=protein,annotation=Protein.names,gene=Gene.names)) + geom_point() + scale_color_manual(values=c("black","red"))
ggplotly(volcano)
We first select the names of the significant proteins.
sigNames<- tests %>% rownames_to_column("protein") %>% filter(signif) %>% pull(protein)
heatmap(exprs(protData)[sigNames,])
We first extract the normalized protein expression values for a particular protein and we create a dataframe that contains the protein expression values and the relevant variables of the experimental design.
for (protName in rownames(tests)[1:5])
{
plotDat<-pepData[fData(pepData)$Proteins==protName] %>% exprs
plotDataStack<-data.frame(quant_value=c(plotDat),sequence=rep(rownames(plotDat),rep=ncol(plotDat)),sample=rep(colnames(plotDat),each=nrow(plotDat)),condition=rep(pData(pepData)$condition,each=nrow(plotDat)))
plot1 <- ggplot(plotDataStack, aes(x=sample, y=quant_value,fill=condition))
print(plot1 +geom_boxplot(outlier.shape=NA) + geom_point(position=position_jitter(width=.1),aes(shape=sequence)) + scale_shape_manual(values=1:nrow(plotDat)) +labs(title = protName, x="sample", y="Peptide intensity (log2)"))
}