sager
package to import
and analyse sage
results in Rvignettes/sager.Rmd
sager.Rmd
sager
package
Sage is a new
cross-platform, extremely performant, open source proteomics search
engine Michael Lazear. It uses
mzML files as input, is parameterised by a json file
and is executed from the command line. In addition to peptide
identification, sage also includes a variety of advanced features such
as retention time prediction and quantification (both isobaric &
LFQ). It produces two files: results.sage.tsv
with the
identification results and, when configured to quantify features,
quant.tsv
.
The sager
package uses these two files as input. It
facilitates their import into established Bioconductor classes:
Identification results are parsed and imported as PSMatch::PSM()
objects with sagePSM()
.
Quantitation (and identification) results are parsed, merged and
imported as QFeatures::QFeatures()
objects with sageQFeatures()
.
Sage results can also be integrated with a Spectra::Spectra() instance (holding the raw data) into a MsExperiment::MsExperiment() object, as documented in this document.
The goal of the sager
package can thus be summarised as
a way to leverage the existing R for Mass Spectrometry
and Bioconductor infrastructure
to analyse sage results.
To install sager
and its dependencies, you’ll need the
BiocManager
package, that can be installed from CRAN
(unless you already have it):
if (!requireNamespace("BiocManager"))
install.packages("BiocManager")
Note that sager
is under constant development, and thus
will depend on some under-development dependencies, that haven’t made it
into the stable releases yet. Here are the instructions to install
these:
To install the devel version of PSMatch
:
BiocManager::install("RforMassSpectrometry/PSMatch")
To install sager
and its dependencies, you can run the
following command:
BiocManager::install("UCLouvain-CBIO/sager")
sage
results
library(sager)
The sager
package provides example identification and
quantitation (procuced by sage
) and the raw data
(mzML
files) that were used the generate the former.
sagerAvailableData()
#> quant id mzml
#> TRUE TRUE TRUE
Data can be added with the sagerAddData()
function, that
avoid downloading data files if they are already available in the
package’s local cache directory:
sagerAddData()
#> 'quant' already available.
#> 'id' already available.
#> 'mzml' already available.
For details, including updating local data and their provenance,
please see the ?sagerData
manual page.
As noted above, identification data is handled by the PSMatch package. This section only provides a very short example of how to use it. For more details, see the package’s website and vignettes.
library(PSMatch)
psm <- sagePSM(sagerIdData())
psm
#> PSM with 977 rows and 34 columns.
#> names(34): peptide proteins ... protein_fdr ms1_intensity
Here we briefly visualise the hyperscore and spectrum FDR densities for the forward and decoy/reverse hits:
library(ggplot2)
data.frame(psm) |>
ggplot(aes(x = hyperscore,
colour = label)) +
geom_density()
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.logical(col, optional = optional): Direct call of
#> 'as.data.frame.logical()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
data.frame(psm) |>
ggplot(aes(x = spectrum_fdr,
colour = label)) +
geom_density()
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.logical(col, optional = optional): Direct call of
#> 'as.data.frame.logical()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
Below, we filter the PSMs removing decoy hits, PSMs of rank > 1 (none in the small testing data), shared peptides (none in the small testing data) and those that have a spectrum FDR below 0.05:
psm <- filterPSMs(psm) |>
filterPsmFdr()
#> Starting with 977 PSMs:
#> Removed 298 decoy hits.
#> Removed 0 PSMs with rank > 1.
#> Removed 0 shared peptides.
#> 679 PSMs left.
This leaves us with 679 PSMs matched against 481 scans.
psm
#> PSM with 679 rows and 34 columns.
#> names(34): peptide proteins ... protein_fdr ms1_intensity
As noted above, quantitation data is handled by the QFeatures package. This section only provides a very short example of how to use it. For more details, see the package’s website and vignettes.
We will also make use of some function from the scp package.
The quantitation data has been produced from running
sage
on three raw data files. These three files can
correspond to two different experimental designs:
The three acquisitions correspond to three fractions of the same set of 11 multiplexed samples. In this case, that we’ll call the Multiple fractions for the same set of samples, we end up with a single quantitation matrix with 11 columns and row containing the features from all files.
The three acquisitions correspond to single acquisitions from 3 different sets of 11 samples. In this case, that we’ll call the Different sets of samples per acquisition, we end up with three quantitation matrices, each with 11 columns and rows containing the features from the respective files.
Both sitations are illustrated below using the
sageQFeatures()
function , and a differenciated by the
value of the splitBy
argument. Note that below, the
byQuant
argument is set to accomodate the old sage default
header; this byQuant
argument wouldn’t need to be set
explicitly for newer output files.
Let’s read the data in, specifying that the tables should not be split into different quantitative assays.
qf1 <- sageQFeatures(sagerQuantData(),
sagerIdData(),
byQuant = c("file", "scannr"),
splitBy = NULL)
qf1
#> An instance of class QFeatures containing 1 assays:
#> [1] sage: SummarizedExperiment with 977 rows and 11 columns
The piped commands below implement the following worflow:
NA
, to make missing values explicit.
qf1 |>
filterFeatures(~ label > 0) |> ## 1
filterFeatures(~ rank == 1) |> ## 2
filterFeatures(~ spectrum_fdr < 0.05) |> ## 3
zeroIsNA(1) |> ## 4
logTransform(i = 1, name = "log_sage") |> ## 5
normalize(i = "log_sage", ## 6
name = "norm_sage",
method = "center.median") |>
aggregateFeatures(i = "norm_sage", ## 7
name = "proteins",
fcol = "proteins",
fun = colMedians,
na.rm = TRUE) -> qf1
#> 'label' found in 1 out of 1 assay(s)
#> 'rank' found in 1 out of 1 assay(s)
#> 'spectrum_fdr' found in 1 out of 1 assay(s)
#> Your quantitative data contain missing values. Please read the relevant
#> section(s) in the aggregateFeatures manual page regarding the effects
#> of missing values on data aggregation.
The new QFeatures
object contains 4 assays:
qf1
#> An instance of class QFeatures containing 4 assays:
#> [1] sage: SummarizedExperiment with 309 rows and 11 columns
#> [2] log_sage: SummarizedExperiment with 309 rows and 11 columns
#> [3] norm_sage: SummarizedExperiment with 309 rows and 11 columns
#> [4] proteins: SummarizedExperiment with 297 rows and 11 columns
The pipeline can then be visualised as shown below:
plot(qf1)
In this second example, the quantitative data are automatically split based on the mzML files.
qf <- sageQFeatures(sagerQuantData(), sagerIdData(),
byQuant = c("file", "scannr"))
qf
#> An instance of class QFeatures containing 3 assays:
#> [1] subset_dq_00084_11cell_90min_hrMS2_A5.mzML: SummarizedExperiment with 288 rows and 11 columns
#> [2] subset_dq_00086_11cell_90min_hrMS2_A9.mzML: SummarizedExperiment with 353 rows and 11 columns
#> [3] subset_dq_00087_11cell_90min_hrMS2_A11.mzML: SummarizedExperiment with 336 rows and 11 columns
Before processing the data, let’s annotate it. The
renameqf()
helper function will automate come of the file
renaming. The annotation will be composed of the file name of origin
(filename
), the TMT tag (tmt_tag
), and and
acquisition identifier (acquisition
).
renameqf <- function(x) sub("\\.mzML", "", sub("^.+hrMS2_", "", x))
qf$filename <- rep(names(qf), each = 11)
qf$tmt_tag <- rep(paste0("tmt_", 1:11), length(qf))
qf$acquisition <- renameqf(qf$filename)
names(qf) <- paste0("psm", renameqf(names(qf)))
Below, we update the assay and sample names.
qf <- renamePrimary(qf, paste(qf$acquisition, qf$tmt_tag, sep = "."))
colnames(qf) <- CharacterList(lapply(colnames(qf), renameqf))
colData(qf)
#> DataFrame with 33 rows and 3 columns
#> filename tmt_tag acquisition
#> <character> <character> <character>
#> A5.tmt_1 subset_dq_... tmt_1 A5
#> A5.tmt_2 subset_dq_... tmt_2 A5
#> A5.tmt_3 subset_dq_... tmt_3 A5
#> A5.tmt_4 subset_dq_... tmt_4 A5
#> A5.tmt_5 subset_dq_... tmt_5 A5
#> ... ... ... ...
#> A11.tmt_7 subset_dq_... tmt_7 A11
#> A11.tmt_8 subset_dq_... tmt_8 A11
#> A11.tmt_9 subset_dq_... tmt_9 A11
#> A11.tmt_10 subset_dq_... tmt_10 A11
#> A11.tmt_11 subset_dq_... tmt_11 A11
qf
#> An instance of class QFeatures containing 3 assays:
#> [1] psmA5: SummarizedExperiment with 288 rows and 11 columns
#> [2] psmA9: SummarizedExperiment with 353 rows and 11 columns
#> [3] psmA11: SummarizedExperiment with 336 rows and 11 columns
The piped commands below implement the following worflow:
NA
in all assays, to make missing values
explicit.
qf |>
filterFeatures(~ label > 0) |> ## 1
filterFeatures(~ rank == 1) |> ## 2
filterFeatures(~ spectrum_fdr < 0.05) |> ## 3
zeroIsNA(1:3) |> ## 4
logTransform(i = 1:3, ## 5
name = paste0("log_", names(qf))) |>
aggregateFeaturesOverAssays(i = 4:6, ## 6
fcol = "peptide",
name = sub("psm", "peptide", names(qf)),
fun = colMedians,
na.rm = TRUE) |>
joinAssays(i = 7:9, ## 7
name = "peptides") |>
normalize(i = 10, ## 8
name = "norm_peptides",
method = "center.median") |>
aggregateFeatures(i = "norm_peptides", ## 9
name = "proteins",
fcol = "proteins",
fun = colMedians,
na.rm = TRUE) -> qf
#> 'label' found in 3 out of 3 assay(s)
#> 'rank' found in 3 out of 3 assay(s)
#> 'spectrum_fdr' found in 3 out of 3 assay(s)
#> Aggregated: 1/3
#> Aggregated: 2/3
#> Aggregated: 3/3
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.numeric(col, optional = optional): Direct call of
#> 'as.data.frame.numeric()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Warning in as.data.frame.integer(col, optional = optional): Direct call of
#> 'as.data.frame.integer()' is deprecated. Use 'as.data.frame.vector()' or
#> 'as.data.frame()' instead
#> Your quantitative data contain missing values. Please read the relevant
#> section(s) in the aggregateFeatures manual page regarding the effects
#> of missing values on data aggregation.
The final QFeatures object now contains 12 assays:
qf
#> An instance of class QFeatures containing 12 assays:
#> [1] psmA5: SummarizedExperiment with 91 rows and 11 columns
#> [2] psmA9: SummarizedExperiment with 109 rows and 11 columns
#> [3] psmA11: SummarizedExperiment with 109 rows and 11 columns
#> ...
#> [10] peptides: SummarizedExperiment with 308 rows and 33 columns
#> [11] norm_peptides: SummarizedExperiment with 308 rows and 33 columns
#> [12] proteins: SummarizedExperiment with 297 rows and 33 columns
The pipeline can then be visualised as shown below:
plot(qf)
We have seen above how to import and process identification and quantitation data produced by sage using standard R/Bioconductor tools. Our main goal is to integrate these with the raw MS data that was used to generate them.
Let’s start by importing these raw data into R as a
Spectra
object. The three mzML files can be retrieved with
the sagerMzMLData()
function.
library(Spectra)
sp <- Spectra(sagerMzMLData())
sp
#> MSn data (Spectra) with 598 spectra in a MsBackendMzR backend:
#> msLevel rtime scanIndex
#> <integer> <numeric> <integer>
#> 1 2 279.184 1
#> 2 2 731.098 2
#> 3 2 736.768 3
#> 4 2 863.610 4
#> 5 2 866.034 5
#> ... ... ... ...
#> 594 2 5155.97 203
#> 595 2 5163.48 204
#> 596 2 5178.80 205
#> 597 2 5179.03 206
#> 598 2 5249.68 207
#> ... 33 more variables/columns.
#>
#> file(s):
#> subset_dq_00084_11cell_90min_hrMS2_A5.mzML
#> subset_dq_00086_11cell_90min_hrMS2_A9.mzML
#> subset_dq_00087_11cell_90min_hrMS2_A11.mzML
We can now create dedicates keys to identify features in the
different data produced above. Let’s start with the spectra within the
sp
object. Later, we will create a similar key in the
QFeatures
and PSM
objects generated above.
The goal of these keys will be to identify matching features across
data types, such as for example to match de MS spectra to the PSMs and
peptides or proteins. These different data types can be stored together
in an MsExperiment
object, and matched through one or
multiple keys.
We can add a key (by default, names .KEY
that will
identify a feature/scan by concatenating the scan number and the file in
which that scan was acquired.
sp$filename <- basename(dataOrigin(sp))
sp <- addKey(sp, vars = c("filename", "spectrumId"))
head(sp$.KEY)
#> [1] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=1004"
#> [2] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=4181"
#> [3] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=4224"
#> [4] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=5124"
#> [5] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=5144"
#> [6] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=5517"
Let’s do the same of the PSM object created above:
psm <- addKey(psm, vars = c("filename", "scannr"))
head(psm$.KEY)
#> [1] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=31852"
#> [2] "subset_dq_00087_11cell_90min_hrMS2_A11.mzML.controllerType=0 controllerNumber=1 scan=31961"
#> [3] "subset_dq_00087_11cell_90min_hrMS2_A11.mzML.controllerType=0 controllerNumber=1 scan=16197"
#> [4] "subset_dq_00087_11cell_90min_hrMS2_A11.mzML.controllerType=0 controllerNumber=1 scan=32666"
#> [5] "subset_dq_00086_11cell_90min_hrMS2_A9.mzML.controllerType=0 controllerNumber=1 scan=22318"
#> [6] "subset_dq_00086_11cell_90min_hrMS2_A9.mzML.controllerType=0 controllerNumber=1 scan=11726"
And finally, with the QFeatures object.
Running addKey
on a QFeatures
object will
only add a key to assays that do have the relevant variables:
sapply(rowData(qf), function(x) ".KEY" %in% names(x))
#> psmA5 psmA9 psmA11 log_psmA5 log_psmA9
#> TRUE TRUE TRUE TRUE TRUE
#> log_psmA11 peptideA5 peptideA9 peptideA11 peptides
#> TRUE TRUE FALSE TRUE FALSE
#> norm_peptides proteins
#> FALSE FALSE
head(unname(rowData(qf[["psmA5"]])$.KEY))
#> [1] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=10460"
#> [2] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=10530"
#> [3] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=10648"
#> [4] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=10803"
#> [5] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=11681"
#> [6] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=11785"
Let’s now bundle these three types of data together into and
MsExperiment
object:
library(MsExperiment)
mse <- MsExperiment()
spectra(mse) <- sp
qdata(mse) <- qf
otherData(mse)$PSM <- psm
mse
#> Object of class MsExperiment
#> Spectra: MS2 (598)
#> QFeatures: 12 assay(s)
#> Other data: PSM
Each data element mse
is referenced by the same key. We
can now query for such keys direcly on the MsExperiment
object, that will percolate the request to its components.
k <- c("subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=31852",
"subset_dq_00087_11cell_90min_hrMS2_A11.mzML.controllerType=0 controllerNumber=1 scan=32666")
filterKey(mse, k)
#> harmonizing input:
#> removing 132 sampleMap rows not in names(experiments)
#> removing 11 colData rownames not in sampleMap 'primary'
#> Object of class MsExperiment
#> Spectra: MS2 (2)
#> QFeatures: 6 assay(s)
#> Other data: PSM
It is however not convenient to search an arbitrary key. The more useful usecase would be to look for a protein of interest, for example, and then identify the features across other data types using the relevant key(s).
Let’s imagine that protein DNLI3_HUMAN is of interest to us.
It is referenced as "sp|P49916|DNLI3_HUMAN"
in our data.
The dropEmptyAssay()
simply drops assays that didn’t
reference the protein of interest - it is used to simplify the output
here.
qdata(mse)["sp|P49916|DNLI3_HUMAN", , ] |>
dropEmptyAssays()
#> Warning: 'experiments' dropped; see 'drops()'
#> harmonizing input:
#> removing 66 sampleMap rows not in names(experiments)
#> An instance of class QFeatures containing 6 assays:
#> [1] psmA5: SummarizedExperiment with 1 rows and 11 columns
#> [2] log_psmA5: SummarizedExperiment with 1 rows and 11 columns
#> [3] peptideA5: SummarizedExperiment with 1 rows and 11 columns
#> [4] peptides: SummarizedExperiment with 1 rows and 33 columns
#> [5] norm_peptides: SummarizedExperiment with 1 rows and 33 columns
#> [6] proteins: SummarizedExperiment with 1 rows and 33 columns
These are features that we might be interested in tracking across all
data using their key… which we can easily retrieve with the
getKey()
function:
qdata(mse)["sp|P49916|DNLI3_HUMAN", , ] |>
dropEmptyAssays() |>
getKey()
#> Warning: 'experiments' dropped; see 'drops()'
#> List of length 3
#> names(3): psmA5 log_psmA5 peptideA5
Given that getKey()
is run on a QFeatures
object, it returns the keys of all assays. Let’s extract the unique keys
as a character:
my_key <- qdata(mse)["sp|P49916|DNLI3_HUMAN", , ] |>
dropEmptyAssays() |>
getKey() |>
unlist() |>
unique()
#> Warning: 'experiments' dropped; see 'drops()'
my_key
#> [1] "subset_dq_00084_11cell_90min_hrMS2_A5.mzML.controllerType=0 controllerNumber=1 scan=17928"
And now use mykey
to subset the different component of
the MsExperiment
object:
my_mse <- filterKey(mse, my_key)
#> harmonizing input:
#> removing 165 sampleMap rows not in names(experiments)
#> removing 22 colData rownames not in sampleMap 'primary'
my_mse
#> Object of class MsExperiment
#> Spectra: MS2 (1)
#> QFeatures: 3 assay(s)
#> Other data: PSM
From the above, we can see that DNLI3_HUMAN was identified and quantified by a single peptides, that has the following quantitation profile acorss the 11 samples…
assay(qdata(my_mse)[[3]])
#> A5.tmt_1 A5.tmt_2 A5.tmt_3 A5.tmt_4 A5.tmt_5
#> [+229.1629]-SATNLPQLK[+229.1629] 18.48759 19.64292 NaN NaN 18.4555
#> A5.tmt_6 A5.tmt_7 A5.tmt_8 A5.tmt_9 A5.tmt_10
#> [+229.1629]-SATNLPQLK[+229.1629] NaN 20.28052 18.19301 NaN 18.82892
#> A5.tmt_11
#> [+229.1629]-SATNLPQLK[+229.1629] NaN
… and that that peptides was matched by a single spectrum, shown below:
plotSpectra(spectra(my_mse))
Please open an issue on the package’s Github repository.
#> R Under development (unstable) (2024-03-06 r86056)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] MsExperiment_1.5.4 Spectra_1.13.5
#> [3] ProtGenerics_1.35.3 BiocParallel_1.37.1
#> [5] scp_1.13.1 QFeatures_1.13.2
#> [7] MultiAssayExperiment_1.29.1 SummarizedExperiment_1.33.3
#> [9] Biobase_2.63.0 GenomicRanges_1.55.3
#> [11] GenomeInfoDb_1.39.8 IRanges_2.37.1
#> [13] MatrixGenerics_1.15.0 matrixStats_1.2.0
#> [15] ggplot2_3.5.0 PSMatch_1.7.1
#> [17] S4Vectors_0.41.4 BiocGenerics_0.49.1
#> [19] sager_0.3.2 BiocStyle_2.31.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.2 bitops_1.0-7
#> [3] rlang_1.1.3 magrittr_2.0.3
#> [5] clue_0.3-65 compiler_4.4.0
#> [7] RSQLite_2.3.5 systemfonts_1.0.6
#> [9] vctrs_0.6.5 pkgconfig_2.0.3
#> [11] MetaboCoreUtils_1.11.2 crayon_1.5.2
#> [13] fastmap_1.1.1 dbplyr_2.4.0
#> [15] XVector_0.43.1 labeling_0.4.3
#> [17] utf8_1.2.4 rmarkdown_2.26
#> [19] ragg_1.2.7 purrr_1.0.2
#> [21] bit_4.0.5 xfun_0.42
#> [23] zlibbioc_1.49.0 cachem_1.0.8
#> [25] jsonlite_1.8.8 blob_1.2.4
#> [27] highr_0.10 DelayedArray_0.29.9
#> [29] parallel_4.4.0 cluster_2.1.6
#> [31] R6_2.5.1 bslib_0.6.1
#> [33] jquerylib_0.1.4 Rcpp_1.0.12
#> [35] bookdown_0.38 knitr_1.45
#> [37] BiocBaseUtils_1.5.1 Matrix_1.6-5
#> [39] igraph_2.0.2 tidyselect_1.2.1
#> [41] abind_1.4-5 yaml_2.3.8
#> [43] codetools_0.2-19 curl_5.2.1
#> [45] lattice_0.22-5 tibble_3.2.1
#> [47] withr_3.0.0 evaluate_0.23
#> [49] desc_1.4.3 BiocFileCache_2.11.1
#> [51] pillar_1.9.0 BiocManager_1.30.22
#> [53] filelock_1.0.3 ncdf4_1.22
#> [55] generics_0.1.3 RCurl_1.98-1.14
#> [57] munsell_0.5.0 scales_1.3.0
#> [59] glue_1.7.0 lazyeval_0.2.2
#> [61] tools_4.4.0 mzR_2.37.2
#> [63] fs_1.6.3 grid_4.4.0
#> [65] MsCoreUtils_1.15.4 colorspace_2.1-0
#> [67] SingleCellExperiment_1.25.0 GenomeInfoDbData_1.2.11
#> [69] cli_3.6.2 textshaping_0.3.7
#> [71] fansi_1.0.6 S4Arrays_1.3.6
#> [73] dplyr_1.1.4 AnnotationFilter_1.27.0
#> [75] gtable_0.3.4 sass_0.4.8
#> [77] digest_0.6.35 SparseArray_1.3.4
#> [79] htmlwidgets_1.6.4 farver_2.1.1
#> [81] memoise_2.0.1 htmltools_0.5.7
#> [83] pkgdown_2.0.7.9000 lifecycle_1.0.4
#> [85] httr_1.4.7 bit64_4.0.5
#> [87] MASS_7.3-60.2