Duguet et al. 2017 compared the proteomes of mouse regulatory T cells (Treg) and conventional T cells (Tconv) in order to discover differentially regulated proteins between these two cell populations. For each biological repeat the proteomes were extracted for both Treg and Tconv cell pools, which were purified by flow cytometry. The data in data/quantification/mouseTcell on the pdaData repository are a subset of the data PXD004436 on PRIDE.
We will use a subset of the data with a randomized complete block (RCB) design, i.e. the dataset consists of four mice for which the proteome of both conventional and regulatory T cells are assessed.
We first import the peptides.txt file. This is the file that contains your peptide-level intensities. For a MaxQuant search [6], this peptides.txt file can be found by default in the “path_to_raw_files/combined/txt/” folder from the MaxQuant output, with “path_to_raw_files” the folder where raw files were saved. In this tutorial, we will use a MaxQuant peptides file from MaxQuant that can be found on the pdaData repository. We will use the MSnbase package to import the data.
We generate the object peptideFile with the path to the peptides.txt file. In this file we will fetch the data from the github repository linked to the pda course: https://statomics.github.io/pda. You can also replace the peptideFile with a string that points to the path of a file on your local hard drive.
With the grepEcols function we find the columns that are containing the expression data of the peptides in the peptides.txt file.
library(MSqRob)
library(MSnbase)
library(tidyverse)
library(limma)
peptidesFile <- "https://raw.githubusercontent.com/statOmics/pda/data/quantification/mouseTcell/peptidesRCB.txt"
proteinGroupsFile <-"https://raw.githubusercontent.com/statOmics/pda/data/quantification/mouseTcell/proteinGroups.txt"
ecols<-grepEcols(peptidesFile, "Intensity ", split = "\t")
Next we import the peptide intensities
pepData<-readMSnSet2(peptidesFile,ecol=ecols,fnames="Sequence",sep="\t")
pepData
## MSnSet (storageMode: lockedEnvironment)
## assayData: 55814 features, 8 samples
## element names: exprs
## protocolData: none
## phenoData: none
## featureData
## featureNames: AAAAAAAAAAGAAGGR AAAAAAAAAAGDSDSWDADTFSMEDPVRK ...
## YYYDKNIIHK (55814 total)
## fvarLabels: Sequence N.term.cleavage.window ... MS.MS.Count (74
## total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
## - - - Processing information - - -
## MSnbase version: 2.10.1
The pepData object is an MSnSet, a container for the data, features information and experimental annotation. They can be accessed using the accessor functions ‘exprs’ (matrix of intensities, features in rows, samples in columns), ‘fData’ (properties for each feature, peptide or protein, on the rows) and ‘pData’ (properties for the samples on the columns).
We will make use from data wrangling functionalities from the tidyverse package. The %>% operator allows us to pipe the output of one function to the next function.
head(exprs(pepData))
## Intensity.Tconv.M12_2 Intensity.Tconv.M12_3
## AAAAAAAAAAGAAGGR 111470000 101350000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 13303000 10160000
## AAAAAAAGGAAIAVSTGIETATIQK 0 0
## AAAAAAGAASGIPGPVAQGIK 0 7197400
## AAAAAAGPEMVR 104440000 101950000
## AAAAAAPGGGGGEPR 11910000 12683000
## Intensity.Tconv.M5_inj1
## AAAAAAAAAAGAAGGR 965070
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 235500
## AAAAAAAGGAAIAVSTGIETATIQK 48225
## AAAAAAGAASGIPGPVAQGIK 266620
## AAAAAAGPEMVR 947110
## AAAAAAPGGGGGEPR 190550
## Intensity.Tconv.M6_inj1 Intensity.Treg.M12_2
## AAAAAAAAAAGAAGGR 0 80606000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 6744500 0
## AAAAAAAGGAAIAVSTGIETATIQK 314120 0
## AAAAAAGAASGIPGPVAQGIK 2904800 12602000
## AAAAAAGPEMVR 12747000 44735000
## AAAAAAPGGGGGEPR 2166100 12555000
## Intensity.Treg.M12_3 Intensity.Treg.M5_inj1
## AAAAAAAAAAGAAGGR 120130000 1668800
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 14499000 437460
## AAAAAAAGGAAIAVSTGIETATIQK 0 173070
## AAAAAAGAASGIPGPVAQGIK 14162000 520550
## AAAAAAGPEMVR 70512000 1604600
## AAAAAAPGGGGGEPR 13873000 361240
## Intensity.Treg.M6_inj1
## AAAAAAAAAAGAAGGR 0
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 3881000
## AAAAAAAGGAAIAVSTGIETATIQK 317830
## AAAAAAGAASGIPGPVAQGIK 1608500
## AAAAAAGPEMVR 4689900
## AAAAAAPGGGGGEPR 1168900
pepData %>% exprs %>% head
## Intensity.Tconv.M12_2 Intensity.Tconv.M12_3
## AAAAAAAAAAGAAGGR 111470000 101350000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 13303000 10160000
## AAAAAAAGGAAIAVSTGIETATIQK 0 0
## AAAAAAGAASGIPGPVAQGIK 0 7197400
## AAAAAAGPEMVR 104440000 101950000
## AAAAAAPGGGGGEPR 11910000 12683000
## Intensity.Tconv.M5_inj1
## AAAAAAAAAAGAAGGR 965070
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 235500
## AAAAAAAGGAAIAVSTGIETATIQK 48225
## AAAAAAGAASGIPGPVAQGIK 266620
## AAAAAAGPEMVR 947110
## AAAAAAPGGGGGEPR 190550
## Intensity.Tconv.M6_inj1 Intensity.Treg.M12_2
## AAAAAAAAAAGAAGGR 0 80606000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 6744500 0
## AAAAAAAGGAAIAVSTGIETATIQK 314120 0
## AAAAAAGAASGIPGPVAQGIK 2904800 12602000
## AAAAAAGPEMVR 12747000 44735000
## AAAAAAPGGGGGEPR 2166100 12555000
## Intensity.Treg.M12_3 Intensity.Treg.M5_inj1
## AAAAAAAAAAGAAGGR 120130000 1668800
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 14499000 437460
## AAAAAAAGGAAIAVSTGIETATIQK 0 173070
## AAAAAAGAASGIPGPVAQGIK 14162000 520550
## AAAAAAGPEMVR 70512000 1604600
## AAAAAAPGGGGGEPR 13873000 361240
## Intensity.Treg.M6_inj1
## AAAAAAAAAAGAAGGR 0
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK 3881000
## AAAAAAAGGAAIAVSTGIETATIQK 317830
## AAAAAAGAASGIPGPVAQGIK 1608500
## AAAAAAGPEMVR 4689900
## AAAAAAPGGGGGEPR 1168900
pepData %>% sampleNames
## [1] "Intensity.Tconv.M12_2" "Intensity.Tconv.M12_3"
## [3] "Intensity.Tconv.M5_inj1" "Intensity.Tconv.M6_inj1"
## [5] "Intensity.Treg.M12_2" "Intensity.Treg.M12_3"
## [7] "Intensity.Treg.M5_inj1" "Intensity.Treg.M6_inj1"
The sample names are rather long and contain information on the spike-in concentration and the repeat. We will remove “Intensity.” from the string
sampleNames(pepData) <- pepData %>% sampleNames %>% str_replace(., pattern="Intensity.", replacement="")
Next we will add information on the proteins to the feature data. The feature data of the object pepData contains information for each peptide in the experiment. This info on the proteins can be found in the proteingroups.txt file of maxquant. The peptides.txt file contains many data on each feature (peptide). We will only retain specified columns Protein, Sequence, Reverse and Potential.Contaminant, indicating the protein, peptide sequence, if the peptide is a decoy and if it is a contaminant, respectively.
In older and more recent versions of maxquant the names of the fields might vary. We can therefore explore the names we want to select with the shiny app that can be launched from the selectFeatureData from the MSnbase package, i.e. by running the function without an fcol argument that specifies the columns you would like to select.
The code can be found below, but is not executed by this Rmd script (note, that the flag eval=FALSE has been used as option for the R chunk).
pepData<-selectFeatureData(pepData)
If we know the columns, we can specify them as a vector in the fcol slot.
pepData<-selectFeatureData(pepData,fcol=c("Proteins","Protein.names","Gene.names", "Potential.contaminant","Reverse","Sequence"))
Calculate how many non zero intensities we have per peptide. This will be useful for filtering.
fData(pepData)$nNonZero<- rowSums(exprs(pepData) > 0)
Peptides with zero intensities are missing peptides and should be represent with a NA
value instead of 0
.
exprs(pepData)[exprs(pepData)==0]<-NA
Next we create the data on the experimental layout. We can do this based on the samplenames. We will add an effect for cell type and also for the animal (mouse ID).
pData(pepData)$cellType <- pepData %>% sampleNames %>% substr(1,4) %>% as.factor
pData(pepData)$mouseID <- factor(paste0("m",rep(1:4,2)))
We will also replace the sample names to make them shorter
sampleNames(pepData)<-pepData %>% sampleNames %>% substr(1,4) %>% paste0(.,"_m",rep(1:4,2))
We can inspect the missingness in our data with the plotNA()
function provided with MSnbase
. 38% of all peptide intensities are missing and for some peptides we don’t even measure a signal in any sample. The missingness is similar across samples. Note, that we plot the peptide data, so the label protein in the plot refers to peptides.
plotNA(pepData)
When we look at density plots of the intensities, we see that most intensities are low and that there is a long tail to the right. It is more useful to explore the data upon log transformation.
plotDensities(exprs(pepData))
nrow(pepData)
## [1] 55814
There are 55814 peptides before preprocessing.
We will log transform, normalize, filter and summarize the data.
pepData <- log(pepData, base = 2)
plotDensities(exprs(pepData))
When we look at the densities we see small shifts in location and shape of the density distributions. So normalisation will be required. But, we will first perform filtering.
In our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.
pepData<-pepData[fData(pepData)$Proteins %in% smallestUniqueGroups(fData(pepData)$Proteins),]
We now remove the contaminants, peptides that map to decoy sequences and proteins, which were only identified by peptides with modifications.
pepData <- pepData[fData(pepData)$Reverse!="+",]
pepData <- pepData[fData(pepData)$Potential.contaminant!="+",]
Proteins for which all peptides are carrying modifications (PTMs) can be considered as unreliable. We will filter out these proteins. This information is included in the Only.identified.by.site column of proteinGroups.txt file of maxquant. The variable is containing a “+” character if the protein is only identified by modified peptides and an empty string “” if this is not the case. Sometimes an NA value is also present. We will replace these by the empty character string. The information of the peptides.txt file can be linked to the proteinGroups.txt file by using the Proteins column from the peptides.txt file and the Protein.IDs column in the proteinGroups.txt file. If NAs are included in the
proteinGroups<-read.table(proteinGroupsFile, sep = "\t", header = TRUE, quote = "", comment.char = "")
only_site <- proteinGroups$Only.identified.by.site
only_site[is.na(only_site)] <- ""
proteinsOnlyIdentifiedWithPTMs <- proteinGroups$Protein.IDs[only_site=="+"]
pepData<-pepData[!(fData(pepData)$Proteins %in% proteinsOnlyIdentifiedWithPTMs),]
We will now remove the proteinGroups object from the R session to free memory.
rm(proteinGroups,only_site)
gc()
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 4158129 222.1 8603494 459.5 NA 5452427 291.2
## Vcells 8237701 62.9 24434430 186.5 16384 24430597 186.4
We want to keep peptide that were at least observed twice.
pepData<-pepData[fData(pepData)$nNonZero>=2,]
nrow(pepData)
## [1] 44396
We keep 44396 peptides upon filtering.
pepData <- normalise(pepData, "quantiles")
Upon normalisation the density curves for all samples coincide.
plotDensities(exprs(pepData))
We can visualize our data using a Multi Dimensional Scaling plot, eg. as provided by the limma package.
plotMDS(exprs(pepData),col=as.double(pData(pepData)$cellType))
The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples. We notice that the leading differences (log FC) in the peptide data seems to be driven by technical variability. Indeed the samples do not seem to be clearly separated according to the spike in condition. Because there are missing values for the peptide level data we use the na.rm=TRUE
argument to summarize the data based on the observed peptides for every protein.
protData<-combineFeatures(pepData,fcol="Proteins",method="robust",na.rm=TRUE)
## Your data contains missing values. Please read the relevant section
## in the combineFeatures manual page for details the effects of
## missing values on data aggregation.
If you summarize using robust summarization please refer to Sticker et al. (2019) https://www.biorxiv.org/content/10.1101/668863v1
We filter proteins for which we have at least two protein expression values in both conditions:
protData<-protData[
protData %>% exprs %>% is.na %>% `!` %>% rowSums>4,]
plotMDS(exprs(protData),col=as.double(pData(pepData)$cellType))
MSqRob is currently working with a format where we have one dataframe for each protein. This will be changed in the next release. Therefore we first have to reorganise the data.
Next the models are fitted. This is done using the fit.model function. We only have to model the data using the factor condition from the pheno data of the protein level MSnSet. The name of the factor variable is specified in the fixed argument (if multiple predictors have to be incorporated in the model, a vector of variable names has to be provided in this argument.). The argument shrinkage is used to specify if ridge regression has to be adopted. For the sake of speed we do not do this in the tutorial. The shrinkage has to be specified for each variable in the fixed effects. We also have to indicate this for the intercept (which we never shrink). So we specify it at c(0,0) to indicate that the intercept (first 0) and the parameters for the factor condition (second 0) are not penalized. We set the robust_var function equal to FALSE, this functionality will be removed from the package in the next release.
protMSqRob <- MSnSet2protdata(protData, "Proteins",annotations=c("Protein.names","Gene.names"))
models <- fit.model(protdata=protMSqRob, response="quant_value", fixed=c("cellType","mouseID"),shrinkage.fixed=c(0,0),robust_var=FALSE)
Many biologists have problems with the reference coding. In MSqRob we have opted to formulate contrasts using all levels of a factor. Internally, the contrasts are than recasted according to the factor level that is the reference class.
L <- makeContrast("cellTypeTreg - cellTypeTcon", levels=c("cellTypeTreg","cellTypeTcon"))
tests <- test.contrast_adjust(models, L)
nSig <- sum(tests$signif,na.rm=TRUE)
head(tests,nSig)
## Protein.names
## P24452 Macrophage-capping protein
## P21550 Beta-enolase
## P08207 Protein S100-A10
## O09131 Glutathione S-transferase omega-1
## Q7TPR4 Alpha-actinin-1
## Q9D3P8 Plasminogen receptor (KT)
## Q8BGW0 Protein THEMIS
## Q80SU7 Interferon-induced very large GTPase 1
## P13020 Gelsolin
## P07356 Annexin A2
## Q8BFP9 [Pyruvate dehydrogenase (acetyl-transferring)] kinase isozyme 1, mitochondrial
## Q9JJU8 SH3 domain-binding glutamic acid-rich-like protein
## Q9QXY6 EH domain-containing protein 3
## Q61205 Platelet-activating factor acetylhydrolase IB subunit gamma
## Q9CYL5 Golgi-associated plant pathogenesis-related protein 1
## Q61503 5-nucleotidase
## P07091 Protein S100-A4
## Q9DC16 Endoplasmic reticulum-Golgi intermediate compartment protein 1
## P07742 Ribonucleoside-diphosphate reductase large subunit
## Q99N69 Leupaxin
## P54071 Isocitrate dehydrogenase [NADP], mitochondrial
## P16045 Galectin-1
## Q60611 DNA-binding protein SATB1
## Q9R1Q7 Proteolipid protein 2
## P70677 Caspase-3;Caspase-3 subunit p17;Caspase-3 subunit p12
## P30285 Cyclin-dependent kinase 4
## Q9QYB5 Gamma-adducin
## Q4QQM4 Tumor protein p53-inducible protein 11
## Q9WUU8 TNFAIP3-interacting protein 1
## Q3UW53 Protein Niban
## P09055 Integrin beta-1
## Q8BGC4 Zinc-binding alcohol dehydrogenase domain-containing protein 2
## Q64521 Glycerol-3-phosphate dehydrogenase, mitochondrial
## Q60710 Deoxynucleoside triphosphate triphosphohydrolase SAMHD1
## P57016 Ladinin-1
## Q03267 DNA-binding protein Ikaros
## Q8CG47 Structural maintenance of chromosomes protein 4
## P42230 Signal transducer and activator of transcription 5A
## P15307 Proto-oncogene c-Rel
## Q9QXG4 Acetyl-coenzyme A synthetase, cytoplasmic
## Q9JMH9 Unconventional myosin-XVIIIa
## P37913 DNA ligase 1
## Q00417 Transcription factor 7
## Q99KE1 NAD-dependent malic enzyme, mitochondrial
## Q8VCT3 Aminopeptidase B
## Q6PD03 Serine/threonine-protein phosphatase 2A 56 kDa regulatory subunit alpha isoform
## P48758 Carbonyl reductase [NADPH] 1
## P45377 Aldose reductase-related protein 2
## P49718 DNA replication licensing factor MCM5
## O70404 Vesicle-associated membrane protein 8
## Q9CQ62 2,4-dienoyl-CoA reductase, mitochondrial
## O88508 DNA (cytosine-5)-methyltransferase 3A
## P29452 Caspase-1;Caspase-1 subunit p20;Caspase-1 subunit p10
## P70236 Dual specificity mitogen-activated protein kinase kinase 6
## Q9Z0S1 3(2),5-bisphosphate nucleotidase 1
## P16546 Spectrin alpha chain, non-erythrocytic 1
## Q8VCW8 Acyl-CoA synthetase family member 2, mitochondrial
## Q9WTK5 Nuclear factor NF-kappa-B p100 subunit;Nuclear factor NF-kappa-B p52 subunit
## O55101 Synaptogyrin-2
## P20444 Protein kinase C alpha type
## Q8CFB4 Guanylate-binding protein 5
## O70400 PDZ and LIM domain protein 1
## Q8BP56 Acid trehalase-like protein 1
## P19182 Interferon-related developmental regulator 1
## Q62261 Spectrin beta chain, non-erythrocytic 1
## Q99MN9 Propionyl-CoA carboxylase beta chain, mitochondrial
## Q8C0L6 Peroxisomal N(1)-acetyl-spermine/spermidine oxidase
## P70302 Stromal interaction molecule 1
## Q8C142 Low density lipoprotein receptor adapter protein 1
## Q8K157 Aldose 1-epimerase
## P00329 Alcohol dehydrogenase 1
## Q9QYC0 Alpha-adducin
## Q8BMD8 Calcium-binding mitochondrial carrier protein SCaMC-1
## Q80U28 MAP kinase-activating death domain protein
## Q04447 Creatine kinase B-type
## P47856 Glutamine--fructose-6-phosphate aminotransferase [isomerizing] 1
## O88673 Diacylglycerol kinase alpha
## Q5FWK3 Rho GTPase-activating protein 1
## P83093 Stromal interaction molecule 2
## P29391 Ferritin light chain 1
## Q05186 Reticulocalbin-1
## Q922J3 CAP-Gly domain-containing linker protein 1
## Q8BIG7 Catechol O-methyltransferase domain-containing protein 1
## P25799 Nuclear factor NF-kappa-B p105 subunit;Nuclear factor NF-kappa-B p50 subunit
## Q8BH59 Calcium-binding mitochondrial carrier protein Aralar1
## Q8K296 Myotubularin-related protein 3
## P97310 DNA replication licensing factor MCM2
## Q3TBT3 Stimulator of interferon genes protein
## Q62422 Osteoclast-stimulating factor 1
## P50431 Serine hydroxymethyltransferase, cytosolic
## Q8R4N0 Citrate lyase subunit beta-like protein, mitochondrial
## P14094 Sodium/potassium-transporting ATPase subunit beta-1
## Q8CFK6 DENN domain-containing protein 1C
## P50096 Inosine-5-monophosphate dehydrogenase 1
## Q91VV4 DENN domain-containing protein 2D
## P61028 Ras-related protein Rab-8B
## O08807 Peroxiredoxin-4
## O70293 G protein-coupled receptor kinase 6
## Q8BUV3 Gephyrin;Molybdopterin adenylyltransferase;Molybdopterin molybdenumtransferase
## P54310 Hormone-sensitive lipase
## Q9QUG9 RAS guanyl-releasing protein 2
## Q9Z2L7 Cytokine receptor-like factor 3
## Q80XN0 D-beta-hydroxybutyrate dehydrogenase, mitochondrial
## Q61107 Guanylate-binding protein 4
## O70172 Phosphatidylinositol 5-phosphate 4-kinase type-2 alpha
## Q9JIY5 Serine protease HTRA2, mitochondrial
## O70370 Cathepsin S
## Q3UDE2 Tubulin--tyrosine ligase-like protein 12
## P56395 Cytochrome b5
## Q6Q899 Probable ATP-dependent RNA helicase DDX58
## Q9WU84 Copper chaperone for superoxide dismutase
## P18654 Ribosomal protein S6 kinase alpha-3
## Q3UN02 Lysocardiolipin acyltransferase 1
## Q8QZT1 Acetyl-CoA acetyltransferase, mitochondrial
## Q9DBX6 Cytochrome P450 2S1
## Q7TNG5 Echinoderm microtubule-associated protein-like 2
## Q61462 Cytochrome b-245 light chain
## P54227 Stathmin
## Q8R001 Microtubule-associated protein RP/EB family member 2
## Q6P9R4 Rho guanine nucleotide exchange factor 18
## Q8R502 Volume-regulated anion channel subunit LRRC8C
## C3VPR6 Protein NLRC5
## P28867 Protein kinase C delta type;Protein kinase C delta type regulatory subunit;Protein kinase C delta type catalytic subunit
## P49717 DNA replication licensing factor MCM4
## Q68FL4 Putative adenosylhomocysteinase 3
## P51855 Glutathione synthetase
## Q9CXJ1 Probable glutamate--tRNA ligase, mitochondrial
## Q99ME2 WD repeat-containing protein 6
## P97311 DNA replication licensing factor MCM6
## P10630 Eukaryotic initiation factor 4A-II
## Q921I1 Serotransferrin
## Q9D0K2 Succinyl-CoA:3-ketoacid coenzyme A transferase 1, mitochondrial
## Gene.names estimate se df Tval pval
## P24452 Capg 2.8383340 0.1076143 5.549307 26.375060 4.790894e-07
## P21550 Eno3 2.6243616 0.1328974 5.549307 19.747280 2.351907e-06
## P08207 S100a10 2.6113767 0.1642877 5.549307 15.895148 7.699407e-06
## O09131 Gsto1 3.1603951 0.1446574 4.549307 21.847456 8.802266e-06
## Q7TPR4 Actn1 -1.5843624 0.1132674 5.549307 -13.987806 1.541693e-05
## Q9D3P8 Plgrkt 1.5430021 0.1143314 5.549307 13.495873 1.871294e-05
## Q8BGW0 Themis -2.0237546 0.1537470 5.549307 -13.162886 2.141744e-05
## Q80SU7 Gvin1 1.4842622 0.1150625 5.549307 12.899614 2.388478e-05
## P13020 Gsn -1.6201620 0.1268165 5.549307 -12.775641 2.516151e-05
## P07356 Anxa2 2.2241620 0.1826479 5.549307 12.177319 3.257437e-05
## Q8BFP9 Pdk1 -1.5355360 0.1304300 5.549307 -11.772878 3.905206e-05
## Q9JJU8 Sh3bgrl 1.5178392 0.1289359 5.549307 11.772047 3.906684e-05
## Q9QXY6 Ehd3 -1.2763696 0.1104727 5.549307 -11.553709 4.319080e-05
## Q61205 Pafah1b3 1.2694876 0.1117895 5.549307 11.356059 4.737080e-05
## Q9CYL5 Glipr2 1.1747199 0.1060810 5.549307 11.073802 5.419292e-05
## Q61503 Nt5e 2.6069756 0.2385794 5.549307 10.927079 5.819204e-05
## P07091 S100a4 4.8972495 0.3441184 4.549307 14.231293 6.036953e-05
## Q9DC16 Ergic1 2.2588641 0.2183354 5.549307 10.345842 7.784621e-05
## P07742 Rrm1 1.1504309 0.1155599 5.549307 9.955273 9.546869e-05
## Q99N69 Lpxn 1.3028427 0.1309800 5.549307 9.946883 9.589571e-05
## P54071 Idh2 -1.3555077 0.1391533 5.549307 -9.741108 1.071046e-04
## P16045 Lgals1 1.5402209 0.1611947 5.549307 9.555032 1.185817e-04
## Q60611 Satb1 -1.7864732 0.1922832 5.549307 -9.290844 1.374473e-04
## Q9R1Q7 Plp2 1.7130527 0.1852884 5.549307 9.245332 1.410415e-04
## P70677 Casp3 1.5191831 0.1650921 5.549307 9.202034 1.445633e-04
## P30285 Cdk4 1.0762549 0.1204506 5.549307 8.935239 1.686818e-04
## Q9QYB5 Add3 -0.9784364 0.1107801 5.549307 -8.832240 1.792319e-04
## Q4QQM4 Trp53i11 1.2595495 0.1427441 5.549307 8.823827 1.801272e-04
## Q9WUU8 Tnip1 0.9415172 0.1083451 5.549307 8.689981 1.950992e-04
## Q3UW53 Fam129a 3.0426788 0.3520071 5.549307 8.643799 2.005993e-04
## P09055 Itgb1 1.5348990 0.1448128 4.549307 10.599192 2.230232e-04
## Q8BGC4 Zadh2 -0.9068988 0.1075316 5.549307 -8.433785 2.280144e-04
## Q64521 Gpd2 -0.9872796 0.1189136 5.549307 -8.302494 2.473780e-04
## Q60710 Samhd1 0.9902074 0.1217576 5.549307 8.132616 2.753559e-04
## P57016 Lad1 2.6804451 0.2653725 4.549307 10.100690 2.755727e-04
## Q03267 Ikzf1 -1.0968334 0.1355547 5.549307 -8.091443 2.826828e-04
## Q8CG47 Smc4 1.2206429 0.1510405 5.549307 8.081560 2.844754e-04
## P42230 Stat5a 0.9271934 0.1157509 5.549307 8.010251 2.978091e-04
## P15307 Rel 1.1098621 0.1390703 5.549307 7.980583 3.035704e-04
## Q9QXG4 Acss2 -1.2122020 0.1530582 5.549307 -7.919878 3.157694e-04
## Q9JMH9 Myo18a 0.8297900 0.1053944 5.549307 7.873185 3.255427e-04
## P37913 Lig1 1.5605331 0.2015234 5.549307 7.743680 3.545519e-04
## Q00417 Tcf7 -1.5769633 0.2043234 5.549307 -7.717978 3.606619e-04
## Q99KE1 Me2 -0.8654281 0.1127037 5.549307 -7.678791 3.702155e-04
## Q8VCT3 Rnpep 0.8396310 0.1098586 5.549307 7.642835 3.792419e-04
## Q6PD03 Ppp2r5a -1.0306822 0.1367435 5.549307 -7.537342 4.072427e-04
## P48758 Cbr1 -0.9314375 0.1245758 5.549307 -7.476871 4.243779e-04
## P45377 Akr1b8 1.2163102 0.1633175 5.549307 7.447517 4.329979e-04
## P49718 Mcm5 0.9299717 0.1248775 5.549307 7.447074 4.331296e-04
## O70404 Vamp8 0.9619197 0.1295017 5.549307 7.427853 4.388865e-04
## Q9CQ62 Decr1 1.4516462 0.1960095 5.549307 7.405997 4.455412e-04
## O88508 Dnmt3a 1.1899880 0.1612455 5.549307 7.379978 4.536174e-04
## P29452 Casp1 1.8363470 0.2066048 4.549307 8.888212 4.814175e-04
## P70236 Map2k6 -0.7930610 0.1088488 5.549307 -7.285898 4.842772e-04
## Q9Z0S1 Bpnt1 -1.0172732 0.1411493 5.549307 -7.207070 5.118361e-04
## P16546 Sptan1 -0.7807077 0.1086930 5.549307 -7.182685 5.207278e-04
## Q8VCW8 Acsf2 -0.9994857 0.1399699 5.549307 -7.140719 5.364552e-04
## Q9WTK5 Nfkb2 0.9884250 0.1388409 5.549307 7.119120 5.447650e-04
## O55101 Syngr2 1.1233790 0.1586745 5.549307 7.079772 5.602916e-04
## P20444 Prkca 1.1677163 0.1653710 5.549307 7.061194 5.678008e-04
## Q8CFB4 Gbp5 1.5198757 0.2170343 5.549307 7.002928 5.921227e-04
## O70400 Pdlim1 -1.2582275 0.1805901 5.549307 -6.967313 6.075865e-04
## Q8BP56 Athl1 0.7428528 0.1066601 5.549307 6.964673 6.087509e-04
## P19182 Ifrd1 -0.9693650 0.1397040 5.549307 -6.938707 6.203473e-04
## Q62261 Sptbn1 -0.8185595 0.1185810 5.549307 -6.902954 6.367371e-04
## Q99MN9 Pccb -0.7485517 0.1088581 5.549307 -6.876397 6.492377e-04
## Q8C0L6 Paox -1.0745498 0.1296667 4.549307 -8.287013 6.517121e-04
## P70302 Stim1 -0.9557104 0.1395146 5.549307 -6.850253 6.618234e-04
## Q8C142 Ldlrap1 -1.2298925 0.1820952 5.549307 -6.754118 7.106062e-04
## Q8K157 Galm 1.3399325 0.1985987 5.549307 6.746936 7.144150e-04
## P00329 Adh1 -1.4507621 0.2150497 5.549307 -6.746170 7.148228e-04
## Q9QYC0 Add1 -0.8166762 0.1212541 5.549307 -6.735247 7.206655e-04
## Q8BMD8 Slc25a24 0.9945078 0.1478739 5.549307 6.725378 7.259923e-04
## Q80U28 Madd -0.9075153 0.1358187 5.549307 -6.681815 7.500589e-04
## Q04447 Ckb -1.6509200 0.2479997 5.549307 -6.656944 7.642142e-04
## P47856 Gfpt1 0.6698891 0.1012165 5.549307 6.618379 7.867809e-04
## O88673 Dgka -0.8466005 0.1279824 5.549307 -6.614973 7.888111e-04
## Q5FWK3 Arhgap1 -0.7072659 0.1069889 5.549307 -6.610646 7.913990e-04
## P83093 Stim2 1.0711944 0.1627140 5.549307 6.583295 8.079859e-04
## P29391 Ftl1 0.8231906 0.1251454 5.549307 6.577874 8.113210e-04
## Q05186 Rcn1 1.6357822 0.2077568 4.549307 7.873545 8.120381e-04
## Q922J3 Clip1 -0.6976948 0.1065395 5.549307 -6.548695 8.295511e-04
## Q8BIG7 Comtd1 0.9264415 0.1416112 5.549307 6.542150 8.337049e-04
## P25799 Nfkb1 0.6929377 0.1067542 5.549307 6.490963 8.670386e-04
## Q8BH59 Slc25a12 -0.7292302 0.1123708 5.549307 -6.489500 8.680135e-04
## Q8K296 Mtmr3 0.9333997 0.1446205 5.549307 6.454130 8.919827e-04
## P97310 Mcm2 0.8293663 0.1291465 5.549307 6.421901 9.144945e-04
## Q3TBT3 Tmem173 0.7944146 0.1249954 5.549307 6.355552 9.629460e-04
## Q62422 Ostf1 -0.8096174 0.1281406 5.549307 -6.318196 9.915334e-04
## P50431 Shmt1 0.7224515 0.1150009 5.549307 6.282138 1.020064e-03
## Q8R4N0 Clybl -1.5088235 0.2023098 4.549307 -7.457984 1.023671e-03
## P14094 Atp1b1 -0.9954441 0.1587919 5.549307 -6.268860 1.030809e-03
## Q8CFK6 Dennd1c 0.9327610 0.1490587 5.549307 6.257678 1.039960e-03
## P50096 Impdh1 -0.6924848 0.1107307 5.549307 -6.253776 1.043175e-03
## Q91VV4 Dennd2d -1.0439585 0.1674659 5.549307 -6.233858 1.059769e-03
## P61028 Rab8b 0.9333800 0.1504512 5.549307 6.203871 1.085331e-03
## O08807 Prdx4 1.1311052 0.1824665 5.549307 6.198975 1.089573e-03
## O70293 Grk6 -0.8775149 0.1421745 5.549307 -6.172098 1.113199e-03
## Q8BUV3 Gphn 1.3092097 0.1796337 4.549307 7.288220 1.128931e-03
## P54310 Lipe 1.1598976 0.1594427 4.549307 7.274701 1.137861e-03
## Q9QUG9 Rasgrp2 -0.9495821 0.1545883 5.549307 -6.142650 1.139771e-03
## Q9Z2L7 Crlf3 -0.7126385 0.1163220 5.549307 -6.126430 1.154720e-03
## Q80XN0 Bdh1 -0.7489501 0.1223013 5.549307 -6.123811 1.157154e-03
## Q61107 Gbp4 1.9964914 0.3263826 5.549307 6.117028 1.163489e-03
## O70172 Pip4k2a -0.6864011 0.1122274 5.549307 -6.116166 1.164297e-03
## Q9JIY5 Htra2 0.8426850 0.1382621 5.549307 6.094836 1.184495e-03
## O70370 Ctss 1.0286773 0.1700837 5.549307 6.048065 1.230220e-03
## Q3UDE2 Ttll12 -0.8441482 0.1396677 5.549307 -6.043976 1.234314e-03
## P56395 Cyb5a 0.7798652 0.1293522 5.549307 6.029005 1.249438e-03
## Q6Q899 Ddx58 -0.7756916 0.1294978 5.549307 -5.989998 1.289862e-03
## Q9WU84 Ccs 0.9903560 0.1671171 5.549307 5.926119 1.359376e-03
## P18654 Rps6ka3 -0.7744090 0.1306868 5.549307 -5.925688 1.359859e-03
## Q3UN02 Lclat1 2.1035903 0.3020149 4.549307 6.965187 1.367649e-03
## Q8QZT1 Acat1 -0.6469405 0.1098099 5.549307 -5.891457 1.398905e-03
## Q9DBX6 Cyp2s1 -1.9296569 0.2125997 3.549307 -9.076481 1.400696e-03
## Q7TNG5 Eml2 0.6824168 0.1164865 5.549307 5.858333 1.437927e-03
## Q61462 Cyba 1.5971867 0.1776290 3.549307 8.991699 1.446487e-03
## P54227 Stmn1 1.2559476 0.2155718 5.549307 5.826122 1.477085e-03
## Q8R001 Mapre2 0.7542737 0.1296315 5.549307 5.818597 1.486409e-03
## Q6P9R4 Arhgef18 -0.6849842 0.1188530 5.549307 -5.763290 1.557071e-03
## Q8R502 Lrrc8c 0.7556299 0.1311365 5.549307 5.762165 1.558549e-03
## C3VPR6 Nlrc5 1.4753272 0.2573410 5.549307 5.732965 1.597459e-03
## P28867 Prkcd 0.6725316 0.1173186 5.549307 5.732524 1.598056e-03
## P49717 Mcm4 0.7159698 0.1253413 5.549307 5.712163 1.625865e-03
## Q68FL4 Ahcyl2 0.9211146 0.1613346 5.549307 5.709344 1.629758e-03
## P51855 Gss -0.7912540 0.1386555 5.549307 -5.706617 1.633535e-03
## Q9CXJ1 Ears2 -1.4370819 0.2524433 5.549307 -5.692693 1.652981e-03
## Q99ME2 Wdr6 0.7189244 0.1265785 5.549307 5.679673 1.671404e-03
## P97311 Mcm6 0.7420898 0.1307145 5.549307 5.677180 1.674960e-03
## P10630 Eif4a2 -0.9235078 0.1634430 5.549307 -5.650335 1.713797e-03
## Q921I1 Tf -1.7166178 0.2007176 3.549307 -8.552402 1.716823e-03
## Q9D0K2 Oxct1 -0.7469502 0.1323772 5.549307 -5.642590 1.725194e-03
## qval signif
## P24452 0.001828205 TRUE
## P21550 0.004487439 TRUE
## P08207 0.008397362 TRUE
## O09131 0.008397362 TRUE
## Q7TPR4 0.010668481 TRUE
## Q9D3P8 0.010668481 TRUE
## Q8BGW0 0.010668481 TRUE
## Q80SU7 0.010668481 TRUE
## P13020 0.010668481 TRUE
## P07356 0.012423255 TRUE
## Q8BFP9 0.012423255 TRUE
## Q9JJU8 0.012423255 TRUE
## Q9QXY6 0.012678160 TRUE
## Q61205 0.012911928 TRUE
## Q9CYL5 0.013551183 TRUE
## Q61503 0.013551183 TRUE
## P07091 0.013551183 TRUE
## Q9DC16 0.016503397 TRUE
## P07742 0.018296902 TRUE
## Q99N69 0.018296902 TRUE
## P54071 0.019462434 TRUE
## P16045 0.020568528 TRUE
## Q60611 0.022066135 TRUE
## Q9R1Q7 0.022066135 TRUE
## P70677 0.022066135 TRUE
## P30285 0.024548759 TRUE
## Q9QYB5 0.024548759 TRUE
## Q4QQM4 0.024548759 TRUE
## Q9WUU8 0.025516228 TRUE
## Q3UW53 0.025516228 TRUE
## P09055 0.027190721 TRUE
## Q8BGC4 0.027190721 TRUE
## Q64521 0.028605891 TRUE
## Q60710 0.029339404 TRUE
## P57016 0.029339404 TRUE
## Q03267 0.029339404 TRUE
## Q8CG47 0.029339404 TRUE
## P42230 0.029703195 TRUE
## P15307 0.029703195 TRUE
## Q9QXG4 0.030124402 TRUE
## Q9JMH9 0.030299293 TRUE
## P37913 0.032006650 TRUE
## Q00417 0.032006650 TRUE
## Q99KE1 0.032107785 TRUE
## Q8VCT3 0.032159712 TRUE
## Q6PD03 0.033288539 TRUE
## P48758 0.033288539 TRUE
## P45377 0.033288539 TRUE
## P49718 0.033288539 TRUE
## O70404 0.033288539 TRUE
## Q9CQ62 0.033288539 TRUE
## O88508 0.033288539 TRUE
## P29452 0.034222252 TRUE
## P70236 0.034222252 TRUE
## Q9Z0S1 0.035483880 TRUE
## P16546 0.035483880 TRUE
## Q8VCW8 0.035841781 TRUE
## Q9WTK5 0.035841781 TRUE
## O55101 0.036112133 TRUE
## P20444 0.036112133 TRUE
## Q8CFB4 0.036872909 TRUE
## O70400 0.036872909 TRUE
## Q8BP56 0.036872909 TRUE
## P19182 0.036988210 TRUE
## Q62261 0.037118407 TRUE
## Q99MN9 0.037118407 TRUE
## Q8C0L6 0.037118407 TRUE
## P70302 0.037139973 TRUE
## Q8C142 0.037950502 TRUE
## Q8K157 0.037950502 TRUE
## P00329 0.037950502 TRUE
## Q9QYC0 0.037950502 TRUE
## Q8BMD8 0.037950502 TRUE
## Q80U28 0.038256016 TRUE
## Q04447 0.038256016 TRUE
## P47856 0.038256016 TRUE
## O88673 0.038256016 TRUE
## Q5FWK3 0.038256016 TRUE
## P83093 0.038256016 TRUE
## P29391 0.038256016 TRUE
## Q05186 0.038256016 TRUE
## Q922J3 0.038330335 TRUE
## Q8BIG7 0.038330335 TRUE
## P25799 0.038968701 TRUE
## Q8BH59 0.038968701 TRUE
## Q8K296 0.039579138 TRUE
## P97310 0.040111620 TRUE
## Q3TBT3 0.041756840 TRUE
## Q62422 0.042313879 TRUE
## P50431 0.042313879 TRUE
## Q8R4N0 0.042313879 TRUE
## P14094 0.042313879 TRUE
## Q8CFK6 0.042313879 TRUE
## P50096 0.042313879 TRUE
## Q91VV4 0.042313879 TRUE
## P61028 0.042313879 TRUE
## O08807 0.042313879 TRUE
## O70293 0.042313879 TRUE
## Q8BUV3 0.042313879 TRUE
## P54310 0.042313879 TRUE
## Q9QUG9 0.042313879 TRUE
## Q9Z2L7 0.042313879 TRUE
## Q80XN0 0.042313879 TRUE
## Q61107 0.042313879 TRUE
## O70172 0.042313879 TRUE
## Q9JIY5 0.042641826 TRUE
## O70370 0.043612434 TRUE
## Q3UDE2 0.043612434 TRUE
## P56395 0.043741794 TRUE
## Q6Q899 0.044746499 TRUE
## Q9WU84 0.046185377 TRUE
## P18654 0.046185377 TRUE
## Q3UN02 0.046185377 TRUE
## Q8QZT1 0.046478734 TRUE
## Q9DBX6 0.046478734 TRUE
## Q7TNG5 0.047177737 TRUE
## Q61462 0.047177737 TRUE
## P54227 0.047665023 TRUE
## Q8R001 0.047665023 TRUE
## Q6P9R4 0.049152247 TRUE
## Q8R502 0.049152247 TRUE
## C3VPR6 0.049472783 TRUE
## P28867 0.049472783 TRUE
## P49717 0.049472783 TRUE
## Q68FL4 0.049472783 TRUE
## P51855 0.049472783 TRUE
## Q9CXJ1 0.049547668 TRUE
## Q99ME2 0.049547668 TRUE
## P97311 0.049547668 TRUE
## P10630 0.049873794 TRUE
## Q921I1 0.049873794 TRUE
## Q9D0K2 0.049873794 TRUE
There are 132 proteins with a significant effect at the 5% FDR level.
volcano<- ggplot(tests,aes(x=estimate,y=-log10(pval),color=signif)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano
An interactive volcano plot can be obtained by using the plotly library. We alter the object tests to add the protein name to the data frame in ggplot, this will allow us to see the protein name in the interactive plot.
library(plotly)
volcano<- ggplot(tests %>% rownames_to_column("protein"), aes(x=estimate,y=-log10(pval),color=signif,protein=protein,annotation=Protein.names,gene=Gene.names)) + geom_point() + scale_color_manual(values=c("black","red"))
ggplotly(volcano)
There are no significant proteins so we will plot a heatmap of the top 10 proteins, instead. Note, that these results are not statistically significant.
sigNames<- tests %>% rownames_to_column("protein") %>% filter(signif) %>% pull(protein)
heatmap(exprs(protData)[sigNames,])
We first extract the normalized protein expression values for a particular protein and we create a dataframe that contains the protein expression values and the relevant variables of the experimental design.
for (protName in rownames(tests)[1:5])
{
plotDat<-pepData[fData(pepData)$Proteins==protName] %>% exprs
plotDataStack<-data.frame(quant_value=c(plotDat),sequence=rep(rownames(plotDat),rep=ncol(plotDat)),sample=rep(colnames(plotDat),each=nrow(plotDat)),condition=rep(pData(pepData)$cellType,each=nrow(plotDat)))
plot1 <- ggplot(plotDataStack, aes(x=sample, y=quant_value,fill=condition))
print(plot1 +geom_boxplot(outlier.shape=NA) + geom_point(position=position_jitter(width=.1),aes(shape=sequence)) + scale_shape_manual(values=1:nrow(plotDat)) +labs(title = protName, x="sample", y="Peptide intensity (log2)"))
}