Single-cell proteomics data acquired by the Slavov Lab. This is the dataset associated with the third version of the preprint. It contains quantitative information for melanoma cells and monocytes at the PSM, peptide and protein levels. This version of the data was acquired using the pSCoPE MS acquisition approach.

leduc2022_pSCoPE

Format

A QFeatures object with 138 assays, each assay being a SingleCellExperiment object:

  • Assays 1-134: PSM data acquired with a TMT-18plex protocol, hence those assays contain 18 columns. Columns hold quantitative information from single-cell channels, carrier channels, reference channels, empty (negative control) channels and unused channels.

  • peptides: peptide data containing quantitative data for 20,804 peptides and 1,556 single cells. These data have been filtered to keep high-quality PSMs, all batches have been normalized to the reference channel, PSMs were aggregated to peptides, and single cells with low median coefficient of variation were kept.

  • peptides_log: peptide data containing quantitative data for 12,284 peptides and 1,543 single cells. The peptides data were further normalized, highly missing peptides were removed and the quantifications were log-transformed.

  • proteins_norm2: protein data containing quantitative data for 2,844 proteins and 1,543 single cells. The peptides from peptides_log were aggregated to proteins and normalized.

  • proteins_processed: protein data containing quantitative data for 2,844 proteins and 1,543 single cells. The proteins_norm2 data were imputed, batch corrected and normalized.

The colData(leduc2022_pSCoPE()) contains the cell type annotation, the LC batch information, the TMT label and the MS run ID. We also added the sample preparation annotations provided by the cellenONE dispensing device (only for single cells): the time stamp of cell isolation by the device, the diameter and elongation of the cell, the ID of the sample glass slide (4 slides in total), the field within the slide (each slide is divided into 4 fields), the pooled well ID (each field contains 9 pools), and the x and y coordinates of each cell dropped in a field and of each cell pool upon pickup. Finally, we also retrieved the melanoma subpopulations identified by the authors during data analysis. The main population is encoded as A while the small population is encoded as B. The description of the rowData fields for the PSM data can be found in the MaxQuant documentation.
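
The structure described above can be explored with the standard QFeatures accessors. The following is a minimal sketch, assuming the scpdata and QFeatures packages are installed and the dataset can be downloaded or read from the local cache; the variable names leduc and prots are only illustrative. Annotation column names are listed rather than assumed, since they are not spelled out in this page.

```r
library(scpdata)
library(QFeatures)

# Retrieve the dataset (downloaded on first use, then loaded from cache)
leduc <- leduc2022_pSCoPE()

# The last four assays are the processed peptide and protein tables
tail(names(leduc), 4)

# Extract a single assay as a SingleCellExperiment object
prots <- leduc[["proteins_processed"]]
dim(prots)

# List the available sample annotations (cell type, LC batch, TMT label, ...)
colnames(colData(leduc))

# List the feature annotations of the first PSM assay (MaxQuant fields)
colnames(rowData(leduc[[1]]))
```

Listing the column names first is a convenient way to find the exact annotation fields before subsetting cells or features.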

Source

The data were downloaded from the Slavov Lab website. The raw data and the quantification data can also be found in the MassIVE repository MSV000089159: ftp://massive.ucsd.edu/MSV000089159.

Acquisition protocol

The data were acquired using the following setup. More information can be found in the source article (see References).

  • Cell isolation: CellenONE cell sorting.

  • Sample preparation: performed with the improved SCoPE2 protocol on the CellenONE liquid handling system. nPOP cell lysis (DMSO) + trypsin digestion + TMT-18plex labeling and pooling. A target library was also generated to perform prioritized DDA (Huffman et al. 2022) using MaxQuant.Live (2.0.3).

  • Separation: online nLC (Dionex UltiMate 3000 UHPLC with a 25 cm x 75 µm IonOpticks Aurora Series UHPLC column; 200 nL/min).

  • Ionization: ESI (1,800 V).

  • Mass spectrometry: Thermo Scientific Q-Exactive (MS1 resolution = 70,000; MS2 accumulation time = 300 ms; MS2 resolution = 70,000). Prioritized data acquisition was performed using the pSCoPE protocol (Huffman et al. 2022).

  • Data analysis: MaxQuant (1.6.17.0) + DART-ID

Data collection

The PSM data were collected from a shared Google Drive folder that is accessible from the Slavov Lab website (see Source section). The folder contains the following files of interest:

  • ev_updated.txt: the MaxQuant/DART-ID output file

  • annotation.csv: sample annotation

  • batch.csv: batch annotation

  • t0.csv: the processed data table containing the peptides data

  • t3.csv: the processed data table containing the peptides_log data

  • t4b.csv: the processed data table containing the proteins_norm2 data

  • t6.csv: the processed data table containing the proteins_processed data

We combined the sample annotation and the batch annotation into a single table. We also formatted the quantification table so that its columns match those of the annotations. The annotation and quantification tables were then combined into a single QFeatures object using the scp::readSCP() function.

The 4 CSV files were loaded and formatted as SingleCellExperiment objects, and the sample metadata were matched to the column names (the mapping was retrieved after running the authors' original R script) and stored in the colData. Each object was then added to the QFeatures object (containing the PSM assays), and the rows of the peptide data were linked to the rows of the PSM data based on the peptide sequence information through an AssayLink object.
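
The last step, adding a peptide assay to a QFeatures object and linking it to a PSM assay through the peptide sequence, can be illustrated on toy data. This is only a sketch of the general technique using QFeatures::addAssay() and QFeatures::addAssayLink(), not the script used to build this dataset; the assay name run1, the rowData variable Sequence and all values are made up for the illustration.

```r
library(QFeatures)
library(SingleCellExperiment)

# Toy PSM assay: 4 PSMs quantified in 3 cells (hypothetical values)
psm_quant <- matrix(as.numeric(1:12), nrow = 4,
                    dimnames = list(paste0("PSM", 1:4), paste0("cell", 1:3)))
psm <- SingleCellExperiment(
    assays = list(psm_quant),
    rowData = DataFrame(Sequence = c("AAK", "AAK", "LLR", "VVK"))
)

# Toy peptide assay: 3 peptides quantified in the same 3 cells
pep_quant <- matrix(as.numeric(1:9), nrow = 3,
                    dimnames = list(c("AAK", "LLR", "VVK"), paste0("cell", 1:3)))
pep <- SingleCellExperiment(
    assays = list(pep_quant),
    rowData = DataFrame(Sequence = c("AAK", "LLR", "VVK"))
)

# Start from a QFeatures object holding the PSM assay
toy <- QFeatures(list(run1 = psm))

# Add the peptide assay, then link its rows to the PSM rows
# by matching the peptide sequence stored in the rowData
toy <- addAssay(toy, pep, name = "peptides")
toy <- addAssayLink(toy, from = "run1", to = "peptides",
                    varFrom = "Sequence", varTo = "Sequence")

# Inspect the resulting AssayLink
assayLink(toy, "peptides")
```

With many PSM assays, as in this dataset, from and varFrom would be vectors with one entry per PSM assay.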

References

Andrew Leduc, Gray Huffman, and Nikolai Slavov. 2022. “Droplet Sample Preparation for Single-Cell Proteomics Applied to the Cell Cycle.” bioRxiv.

Gray Huffman, Andrew Leduc, Christoph Wichmann, Marco di Gioia, Francesco Borriello, Harrison Specht, Jason Derks, et al. 2022. “Prioritized Single-Cell Proteomics Reveals Molecular and Functional Polarization across Primary Macrophages.” bioRxiv.

Examples

# \donttest{
leduc2022_pSCoPE()
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
#> An instance of class QFeatures containing 138 assays:
#>  [1] eAL00219: SingleCellExperiment with 6269 rows and 18 columns 
#>  [2] eAL00220: SingleCellExperiment with 6603 rows and 18 columns 
#>  [3] eAL00221: SingleCellExperiment with 6511 rows and 18 columns 
#>  ...
#>  [136] peptides_log: SingleCellExperiment with 12284 rows and 1543 columns 
#>  [137] proteins_norm2: SingleCellExperiment with 2844 rows and 1543 columns 
#>  [138] proteins_processed: SingleCellExperiment with 2844 rows and 1543 columns 
# }