Contents

1 Background

Duguet et al. 2017 compared the proteomes of mouse regulatory T cells (Treg) and conventional T cells (Tconv) in order to discover differentially regulated proteins between these two cell populations. For each biological repeat the proteomes were extracted for both Treg and Tconv cell pools, which were purified by flow cytometry. The data in data/quantification/mouseTcell on the pdaData repository are a subset of the data PXD004436 on PRIDE.

We first analyse a subset of the data that have a completely randomized design, i.e. 4 mice per group for the first group only regulatory T cells are assessed for the second group only conventional T cells are assessed.

2 Data

We first import the peptides.txt file. This is the file that contains your peptide-level intensities. For a MaxQuant search [6], this peptides.txt file can be found by default in the “path_to_raw_files/combined/txt/” folder from the MaxQuant output, with “path_to_raw_files” the folder where raw files were saved. In this tutorial, we will use a MaxQuant peptides file from MaxQuant that can be found on the pdaData repository. We will use the MSnbase package to import the data.

We generate the object peptideFile with the path to the peptides.txt file. In this file we will fetch the data from the github repository linked to the pda course: https://statomics.github.io/pda. You can also replace the peptideFile with a string that points to the path of a file on your local hard drive.

With the grepEcols function we find the columns that are containing the expression data of the peptides in the peptides.txt file.

library(MSqRob)
library(MSnbase)
library(tidyverse)
library(limma)
peptidesFile <- "https://raw.githubusercontent.com/statOmics/pda/data/quantification/mouseTcell/peptidesCRD.txt"
proteinGroupsFile <-"https://raw.githubusercontent.com/statOmics/pda/data/quantification/mouseTcell/proteinGroups.txt"
ecols<-grepEcols(peptidesFile, "Intensity ", split = "\t")

Next we import the peptide intensities

pepData<-readMSnSet2(peptidesFile,ecol=ecols,fnames="Sequence",sep="\t")
pepData
## MSnSet (storageMode: lockedEnvironment)
## assayData: 55814 features, 8 samples 
##   element names: exprs 
## protocolData: none
## phenoData: none
## featureData
##   featureNames: AAAAAAAAAAGAAGGR AAAAAAAAAAGDSDSWDADTFSMEDPVRK ...
##     YYYDKNIIHK (55814 total)
##   fvarLabels: Sequence N.term.cleavage.window ... MS.MS.Count (74
##     total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:  
## - - - Processing information - - -
##  MSnbase version: 2.10.1

The pepData object is an MSnSet, a container for the data, features information and experimental annotation. They can be accessed using the accessor functions ‘exprs’ (matrix of intensities, features in rows, samples in columns), ‘fData’ (properties for each feature, peptide or protein, on the rows) and ‘pData’ (properties for the samples on the columns).

We will make use from data wrangling functionalities from the tidyverse package. The %>% operator allows us to pipe the output of one function to the next function.

head(exprs(pepData))
##                               Intensity.Tconv.M11_R2_inj1
## AAAAAAAAAAGAAGGR                                 47982000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                           0
## AAAAAAAGGAAIAVSTGIETATIQK                               0
## AAAAAAGAASGIPGPVAQGIK                                   0
## AAAAAAGPEMVR                                     63831000
## AAAAAAPGGGGGEPR                                         0
##                               Intensity.Tconv.M12_1 Intensity.Tconv.M12_2
## AAAAAAAAAAGAAGGR                          171220000             111470000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK              22907000              13303000
## AAAAAAAGGAAIAVSTGIETATIQK                         0                     0
## AAAAAAGAASGIPGPVAQGIK                      18284000                     0
## AAAAAAGPEMVR                                      0             104440000
## AAAAAAPGGGGGEPR                            25372000              11910000
##                               Intensity.Tconv.M6_inj1
## AAAAAAAAAAGAAGGR                                    0
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                 6744500
## AAAAAAAGGAAIAVSTGIETATIQK                      314120
## AAAAAAGAASGIPGPVAQGIK                         2904800
## AAAAAAGPEMVR                                 12747000
## AAAAAAPGGGGGEPR                               2166100
##                               Intensity.Treg.M11_R1_inj1
## AAAAAAAAAAGAAGGR                                52952000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                          0
## AAAAAAAGGAAIAVSTGIETATIQK                              0
## AAAAAAGAASGIPGPVAQGIK                           10387000
## AAAAAAGPEMVR                                    21421000
## AAAAAAPGGGGGEPR                                        0
##                               Intensity.Treg.M12_3 Intensity.Treg.M5_inj1
## AAAAAAAAAAGAAGGR                         120130000                1668800
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK             14499000                 437460
## AAAAAAAGGAAIAVSTGIETATIQK                        0                 173070
## AAAAAAGAASGIPGPVAQGIK                     14162000                 520550
## AAAAAAGPEMVR                              70512000                1604600
## AAAAAAPGGGGGEPR                           13873000                 361240
##                               Intensity.Treg.M5_inj2
## AAAAAAAAAAGAAGGR                             4730400
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                      0
## AAAAAAAGGAAIAVSTGIETATIQK                     382430
## AAAAAAGAASGIPGPVAQGIK                        1090900
## AAAAAAGPEMVR                                 3327900
## AAAAAAPGGGGGEPR                               665860
pepData %>% exprs %>% head
##                               Intensity.Tconv.M11_R2_inj1
## AAAAAAAAAAGAAGGR                                 47982000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                           0
## AAAAAAAGGAAIAVSTGIETATIQK                               0
## AAAAAAGAASGIPGPVAQGIK                                   0
## AAAAAAGPEMVR                                     63831000
## AAAAAAPGGGGGEPR                                         0
##                               Intensity.Tconv.M12_1 Intensity.Tconv.M12_2
## AAAAAAAAAAGAAGGR                          171220000             111470000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK              22907000              13303000
## AAAAAAAGGAAIAVSTGIETATIQK                         0                     0
## AAAAAAGAASGIPGPVAQGIK                      18284000                     0
## AAAAAAGPEMVR                                      0             104440000
## AAAAAAPGGGGGEPR                            25372000              11910000
##                               Intensity.Tconv.M6_inj1
## AAAAAAAAAAGAAGGR                                    0
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                 6744500
## AAAAAAAGGAAIAVSTGIETATIQK                      314120
## AAAAAAGAASGIPGPVAQGIK                         2904800
## AAAAAAGPEMVR                                 12747000
## AAAAAAPGGGGGEPR                               2166100
##                               Intensity.Treg.M11_R1_inj1
## AAAAAAAAAAGAAGGR                                52952000
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                          0
## AAAAAAAGGAAIAVSTGIETATIQK                              0
## AAAAAAGAASGIPGPVAQGIK                           10387000
## AAAAAAGPEMVR                                    21421000
## AAAAAAPGGGGGEPR                                        0
##                               Intensity.Treg.M12_3 Intensity.Treg.M5_inj1
## AAAAAAAAAAGAAGGR                         120130000                1668800
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK             14499000                 437460
## AAAAAAAGGAAIAVSTGIETATIQK                        0                 173070
## AAAAAAGAASGIPGPVAQGIK                     14162000                 520550
## AAAAAAGPEMVR                              70512000                1604600
## AAAAAAPGGGGGEPR                           13873000                 361240
##                               Intensity.Treg.M5_inj2
## AAAAAAAAAAGAAGGR                             4730400
## AAAAAAAAAAGDSDSWDADTFSMEDPVRK                      0
## AAAAAAAGGAAIAVSTGIETATIQK                     382430
## AAAAAAGAASGIPGPVAQGIK                        1090900
## AAAAAAGPEMVR                                 3327900
## AAAAAAPGGGGGEPR                               665860
pepData %>% sampleNames
## [1] "Intensity.Tconv.M11_R2_inj1" "Intensity.Tconv.M12_1"      
## [3] "Intensity.Tconv.M12_2"       "Intensity.Tconv.M6_inj1"    
## [5] "Intensity.Treg.M11_R1_inj1"  "Intensity.Treg.M12_3"       
## [7] "Intensity.Treg.M5_inj1"      "Intensity.Treg.M5_inj2"

The sample names are rather long and contain information on the spike-in concentration and the repeat. We will remove “Intensity.” from the string

sampleNames(pepData) <- pepData %>% sampleNames %>% str_replace(., pattern="Intensity.", replacement="")

Next we will add information on the proteins to the feature data. The feature data of the object pepData contains information for each peptide in the experiment. This info on the proteins can be found in the proteingroups.txt file of maxquant. The peptides.txt file contains many data on each feature (peptide). We will only retain specified columns Protein, Sequence, Reverse and Potential.Contaminant, indicating the protein, peptide sequence, if the peptide is a decoy and if it is a contaminant, respectively.

In older and more recent versions of maxquant the names of the fields might vary. We can therefore explore the names we want to select with the shiny app that can be launched from the selectFeatureData from the MSnbase package, i.e. by running the function without an fcol argument that specifies the columns you would like to select.

The code can be found below, but is not executed by this Rmd script (note, that the flag eval=FALSE has been used as option for the R chunk).

pepData<-selectFeatureData(pepData)

If we know the columns, we can specify them as a vector in the fcol slot.

pepData<-selectFeatureData(pepData,fcol=c("Proteins","Protein.names","Gene.names", "Potential.contaminant","Reverse","Sequence"))

Calculate how many non zero intensities we have per peptide. This will be useful for filtering.

fData(pepData)$nNonZero<- rowSums(exprs(pepData) > 0)

Peptides with zero intensities are missing peptides and should be represent with a NA value instead of 0.

exprs(pepData)[exprs(pepData)==0]<-NA

Next we create the data on the experimental layout. We can do this based on the samplenames.

pData(pepData)$cellType <- pepData %>% sampleNames %>% substr(1,4) %>% as.factor

We will also replace the sample names to make them shorter

sampleNames(pepData)<-pepData %>% sampleNames %>% substr(1,4) %>% paste0(.,"_m",1:8)

2.1 Data exploration

We can inspect the missingness in our data with the plotNA() function provided with MSnbase. 41% of all peptide intensities are missing and for some peptides we don’t even measure a signal in any sample. The missingness is similar across samples. Note, that we plot the peptide data, so the label protein in the plot refers to peptides.

plotNA(pepData)

When we look at density plots of the intensities, we see that most intensities are low and that there is a long tail to the right. It is more useful to explore the data upon log transformation.

plotDensities(exprs(pepData))

nrow(pepData)
## [1] 55814

There are 55814 peptides before preprocessing.

3 Preprocessing

We will log transform, normalize, filter and summarize the data.

3.1 Log transform the data

pepData <- log(pepData, base = 2)
plotDensities(exprs(pepData))

When we look at the densities we see small shifts in location and shape of the density distributions. So normalisation will be required. But, we will first perform filtering.

3.2 Filtering

3.2.1 Handling overlapping protein groups

In our approach a peptide can map to multiple proteins, as long as there is none of these proteins present in a smaller subgroup.

pepData<-pepData[fData(pepData)$Proteins %in% smallestUniqueGroups(fData(pepData)$Proteins),]

3.2.2 Remove reverse sequences (decoys) and contaminants

We now remove the contaminants, peptides that map to decoy sequences and proteins, which were only identified by peptides with modifications.

pepData <- pepData[fData(pepData)$Reverse!="+",]
pepData <- pepData[fData(pepData)$Potential.contaminant!="+",]

3.2.3 Remove peptides of proteins that were only identified with modified peptides

Proteins for which all peptides are carrying modifications (PTMs) can be considered as unreliable. We will filter out these proteins. This information is included in the Only.identified.by.site column of proteinGroups.txt file of maxquant. The variable is containing a “+” character if the protein is only identified by modified peptides and an empty string “” if this is not the case. Sometimes an NA value is also present. We will replace these by the empty character string. The information of the peptides.txt file can be linked to the proteinGroups.txt file by using the Proteins column from the peptides.txt file and the Protein.IDs column in the proteinGroups.txt file. If NAs are included in the

proteinGroups<-read.table(proteinGroupsFile, sep = "\t", header = TRUE, quote = "", comment.char = "")
only_site <- proteinGroups$Only.identified.by.site
only_site[is.na(only_site)] <- ""
proteinsOnlyIdentifiedWithPTMs <- proteinGroups$Protein.IDs[only_site=="+"]
pepData<-pepData[!(fData(pepData)$Proteins %in% proteinsOnlyIdentifiedWithPTMs),]

We will now remove the proteinGroups object from the R session to free memory.

rm(proteinGroups,only_site)
gc()
##           used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 4158115 222.1    8594064 459.0         NA  5446005 290.9
## Vcells 8237739  62.9   24432497 186.5      16384 24428835 186.4

3.2.4 Drop peptides that were only identified in one sample

We want to keep peptide that were at least observed twice.

pepData<-pepData[fData(pepData)$nNonZero>=2,]
nrow(pepData)
## [1] 43346

We keep 43346 peptides upon filtering.

3.3 Quantile normalize the data

pepData <- normalise(pepData, "quantiles")

3.4 Explore quantile normalized data

Upon normalisation the density curves for all samples coincide.

plotDensities(exprs(pepData))

We can visualize our data using a Multi Dimensional Scaling plot, eg. as provided by the limma package.

plotMDS(exprs(pepData),col=as.double(pData(pepData)$cellType))

The first axis in the plot is showing the leading log fold changes (differences on the log scale) between the samples. We notice that the leading differences (log FC) in the peptide data seems to be driven by technical variability. Indeed the samples do not seem to be clearly separated according to the spike in condition. Because there are missing values for the peptide level data we use the na.rm=TRUE argument to summarize the data based on the observed peptides for every protein.

3.5 Summarization to protein level

protData<-combineFeatures(pepData,fcol="Proteins",method="robust",na.rm=TRUE)
## Your data contains missing values. Please read the relevant section
## in the combineFeatures manual page for details the effects of
## missing values on data aggregation.

If you summarize using robust summarization please refer to Sticker et al. (2019) https://www.biorxiv.org/content/10.1101/668863v1

We filter proteins for which we have at least two protein expression values in both conditions:

protData<-protData[
protData %>% exprs %>% is.na %>% `!` %>% rowSums>4,]
plotMDS(exprs(protData),col=as.double(pData(pepData)$cellType))

4 Data Analysis

4.1 Estimation

MSqRob is currently working with a format where we have one dataframe for each protein. This will be changed in the next release. Therefore we first have to reorganise the data.

Next the models are fitted. This is done using the fit.model function. We only have to model the data using the factor condition from the pheno data of the protein level MSnSet. The name of the factor variable is specified in the fixed argument (if multiple predictors have to be incorporated in the model, a vector of variable names has to be provided in this argument.). The argument shrinkage is used to specify if ridge regression has to be adopted. For the sake of speed we do not do this in the tutorial. The shrinkage has to be specified for each variable in the fixed effects. We also have to indicate this for the intercept (which we never shrink). So we specify it at c(0,0) to indicate that the intercept (first 0) and the parameters for the factor condition (second 0) are not penalized. We set the robust_var function equal to FALSE, this functionality will be removed from the package in the next release.

protMSqRob <- MSnSet2protdata(protData, "Proteins",annotations=c("Protein.names","Gene.names"))
models <- fit.model(protdata=protMSqRob, response="quant_value", fixed="cellType",shrinkage.fixed=c(0,0),robust_var=FALSE)

4.2 Inference

Many biologists have problems with the reference coding. In MSqRob we have opted to formulate contrasts using all levels of a factor. Internally, the contrasts are than recasted according to the factor level that is the reference class.

L <- makeContrast("cellTypeTreg - cellTypeTcon",  levels=c("cellTypeTreg","cellTypeTcon"))
tests <- test.contrast_adjust(models, L)
nSig <- sum(tests$signif,na.rm=TRUE)
head(tests,nSig)
##                                                                         Protein.names
## P24452                                                     Macrophage-capping protein
## P09055                                                                Integrin beta-1
## P51807                                                Dynein light chain Tctex-type 1
## Q7TSJ2                                               Microtubule-associated protein 6
## Q7TMB8                                         Cytoplasmic FMR1-interacting protein 1
## P07356                                                                     Annexin A2
## P13020                                                                       Gelsolin
## Q9DC16                 Endoplasmic reticulum-Golgi intermediate compartment protein 1
## P08207                                                               Protein S100-A10
## P21550                                                                   Beta-enolase
## Q9CQ62                                       2,4-dienoyl-CoA reductase, mitochondrial
## Q7TPR4                                                                Alpha-actinin-1
## Q00417                                                         Transcription factor 7
## Q05186                                                               Reticulocalbin-1
## P24472                                                   Glutathione S-transferase A4
## Q8BGW0                                                                 Protein THEMIS
## O09131                                              Glutathione S-transferase omega-1
## Q61107                                                    Guanylate-binding protein 4
## Q80SU7                                         Interferon-induced very large GTPase 1
## P54071                                 Isocitrate dehydrogenase [NADP], mitochondrial
## Q8K157                                                             Aldose 1-epimerase
## Q60611                                                      DNA-binding protein SATB1
## Q61205                    Platelet-activating factor acetylhydrolase IB subunit gamma
## Q04447                                                         Creatine kinase B-type
## Q8BFP9 [Pyruvate dehydrogenase (acetyl-transferring)] kinase isozyme 1, mitochondrial
## P27782                                             Lymphoid enhancer-binding factor 1
## Q8C0P5                                                                     Coronin-2A
## Q80TM9                                                                      Nischarin
## Q99JT9                           1,2-dihydroxy-3-keto-5-methylthiopentene dioxygenase
## O08709                                                                Peroxiredoxin-6
## Q8BMS9                                    Ras association domain-containing protein 2
## Q3UW53                                                                  Protein Niban
##        Gene.names   estimate        se       df       Tval         pval
## P24452       Capg  2.5733130 0.1805818 8.773187  14.250122 2.288090e-07
## P09055      Itgb1  1.4938717 0.1451408 8.773187  10.292572 3.424618e-06
## P51807     Dynlt1 -1.8839705 0.2041501 8.773187  -9.228358 8.288363e-06
## Q7TSJ2       Map6 -4.8486543 0.4105680 6.773187 -11.809627 9.116831e-06
## Q7TMB8     Cyfip1  3.8124960 0.2716250 5.773187  14.035882 1.110763e-05
## P07356      Anxa2  2.1200806 0.2383387 8.773187   8.895242 1.112390e-05
## P13020        Gsn -1.7559807 0.2043310 8.773187  -8.593806 1.463261e-05
## Q9DC16     Ergic1  2.2349065 0.2601791 8.773187   8.589877 1.468574e-05
## P08207    S100a10  2.1513269 0.2294401 7.773187   9.376422 1.658871e-05
## P21550       Eno3  2.1652362 0.2675637 8.773187   8.092415 2.349849e-05
## Q9CQ62      Decr1  1.5363410 0.1911350 8.773187   8.037989 2.477275e-05
## Q7TPR4      Actn1 -1.6026653 0.2098550 8.773187  -7.637013 3.687910e-05
## Q00417       Tcf7 -1.4581094 0.1926169 8.773187  -7.569996 3.947666e-05
## Q05186       Rcn1  1.8598060 0.2483611 8.773187   7.488315 4.291799e-05
## P24472      Gsta4  3.7435527 0.3490961 5.773187  10.723560 4.994952e-05
## Q8BGW0     Themis -1.9323078 0.2676994 8.773187  -7.218200 5.686465e-05
## O09131      Gsto1  2.8857200 0.4160874 8.773187   6.935369 7.699875e-05
## Q61107       Gbp4  1.6022703 0.2354065 8.773187   6.806397 8.867683e-05
## Q80SU7      Gvin1  1.2547038 0.1868043 8.773187   6.716677 9.794098e-05
## P54071       Idh2 -1.5283193 0.2279181 8.773187  -6.705563 9.916050e-05
## Q8K157       Galm  1.4390806 0.2160349 8.773187   6.661334 1.041808e-04
## Q60611      Satb1 -1.6974812 0.2726644 8.773187  -6.225532 1.716383e-04
## Q61205   Pafah1b3  1.4460925 0.2366737 8.773187   6.110067 1.966883e-04
## Q04447        Ckb -1.5665234 0.2594216 8.773187  -6.038522 2.141953e-04
## Q8BFP9       Pdk1 -1.6705728 0.2822215 8.773187  -5.919367 2.472431e-04
## P27782       Lef1 -2.7050547 0.4569871 8.773187  -5.919324 2.472559e-04
## Q8C0P5     Coro2a  3.3088892 0.4760683 6.773187   6.950450 2.569178e-04
## Q80TM9      Nisch -1.2700088 0.2165037 8.773187  -5.865992 2.638150e-04
## Q99JT9       Adi1 -0.9134530 0.1561034 8.773187  -5.851589 2.684915e-04
## O08709      Prdx6 -0.8809735 0.1532448 8.773187  -5.748797 3.046074e-04
## Q8BMS9     Rassf2 -1.1524409 0.2023824 8.773187  -5.694374 3.258429e-04
## Q3UW53    Fam129a  2.5595788 0.4551092 8.773187   5.624098 3.556796e-04
##                qval signif
## P24452 0.0008674147   TRUE
## P09055 0.0064913640   TRUE
## P51807 0.0069592040   TRUE
## Q7TSJ2 0.0069592040   TRUE
## Q7TMB8 0.0069592040   TRUE
## P07356 0.0069592040   TRUE
## P13020 0.0069592040   TRUE
## Q9DC16 0.0069592040   TRUE
## P08207 0.0069875349   TRUE
## P21550 0.0085375921   TRUE
## Q9CQ62 0.0085375921   TRUE
## Q7TPR4 0.0115120025   TRUE
## Q00417 0.0115120025   TRUE
## Q05186 0.0116215794   TRUE
## P24472 0.0126239096   TRUE
## Q8BGW0 0.0134733669   TRUE
## O09131 0.0171707204   TRUE
## Q61107 0.0186763263   TRUE
## Q80SU7 0.0187958721   TRUE
## P54071 0.0187958721   TRUE
## Q8K157 0.0188071218   TRUE
## Q60611 0.0295763957   TRUE
## Q61205 0.0324193646   TRUE
## Q04447 0.0338339309   TRUE
## Q8BFP9 0.0350983219   TRUE
## P27782 0.0350983219   TRUE
## Q8C0P5 0.0350983219   TRUE
## Q80TM9 0.0350983219   TRUE
## Q99JT9 0.0350983219   TRUE
## O08709 0.0384922168   TRUE
## Q8BMS9 0.0398474371   TRUE
## Q3UW53 0.0421369205   TRUE

There are 32 proteins with a significant effect at the 5% FDR level.

4.3 Plots

4.3.1 Volcano-plot

volcano<- ggplot(tests,aes(x=estimate,y=-log10(pval),color=signif)) + geom_point() + scale_color_manual(values=c("black","red"))
volcano

An interactive volcano plot can be obtained by using the plotly library. We alter the object tests to add the protein name to the data frame in ggplot, this will allow us to see the protein name in the interactive plot.

library(plotly)
volcano<- ggplot(tests %>% rownames_to_column("protein"), aes(x=estimate,y=-log10(pval),color=signif,protein=protein,annotation=Protein.names,gene=Gene.names)) + geom_point() + scale_color_manual(values=c("black","red"))
ggplotly(volcano)

4.3.2 Heatmap

There are no significant proteins so we will plot a heatmap of the top 10 proteins, instead. Note, that these results are not statistically significant.

sigNames<- tests %>% rownames_to_column("protein") %>% filter(signif) %>% pull(protein)
heatmap(exprs(protData)[sigNames,])

4.3.3 Detail plot for the top 5 proteins.

We first extract the normalized protein expression values for a particular protein and we create a dataframe that contains the protein expression values and the relevant variables of the experimental design.

for (protName in rownames(tests)[1:5])
{
plotDat<-pepData[fData(pepData)$Proteins==protName] %>% exprs
plotDataStack<-data.frame(quant_value=c(plotDat),sequence=rep(rownames(plotDat),rep=ncol(plotDat)),sample=rep(colnames(plotDat),each=nrow(plotDat)),condition=rep(pData(pepData)$cellType,each=nrow(plotDat)))
plot1 <- ggplot(plotDataStack, aes(x=sample, y=quant_value,fill=condition))
print(plot1 +geom_boxplot(outlier.shape=NA) + geom_point(position=position_jitter(width=.1),aes(shape=sequence)) + scale_shape_manual(values=1:nrow(plotDat)) +labs(title = protName, x="sample", y="Peptide intensity (log2)"))
}