The `proteomic` dataset contains the 20Q2 quantitative profiling of proteins by mass spectrometry from the Gygi lab. It covers 12399 proteins measured in 375 cell lines, spanning 24 primary diseases and 27 lineages. The main columns are: `depmap_id`, a foreign key identifying the cancer cell line; `cell_line`, the common CCLE name of the cancer cell line; `gene_name`, the HUGO gene name; `entrez_id`, the Entrez ID; and `protein_expression`, the normalized protein expression for each cancer cell line. The dataset can be loaded into the R environment with the `depmap_proteomic` function.
Format
A data frame with 24963776 rows (one protein measurement per cell line sample) and 12 variables:
- depmap_id
Cell line foreign key (e.g. "ACH-000956")
- cell_line
Name of cancer cell line (e.g. "22RV1_PROSTATE")
- gene_name
HUGO symbol (e.g. "TSPAN6")
- entrez_id
Entrez gene ID
- protein_expression
Normalized protein expression
- protein
Protein name with TenPx (e.g. MDAMB468_BREAST_TenPx01)
- protein_id
Protein ID (e.g. sp|P55011|S12A2_HUMAN)
- desc
Description (e.g. S12A2_HUMAN Solute carrier family 12 member 2)
- group_id
Group ID
- uniprot
Uniprot ID (e.g. S12A2_HUMAN)
- uniprot_acc
Uniprot accession ID (e.g. P55011)
- TenPx
TenPx number (e.g. TenPx01)
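As a rough illustration of how a table with these columns might be queried, the sketch below builds a tiny stand-in data frame and runs two typical operations in base R. The data frame `proteomic_toy`, the identifier `"ACH-TOY0002"`, and every expression value are invented for this example; only the column names and the example identifiers from the list above come from the documentation.

```r
# Toy stand-in for the real dataset: column names follow the Format section,
# but all expression values (and the "ACH-TOY0002" key) are made up.
proteomic_toy <- data.frame(
  depmap_id          = c("ACH-000956", "ACH-000956", "ACH-TOY0002", "ACH-TOY0002"),
  cell_line          = c("22RV1_PROSTATE", "22RV1_PROSTATE",
                         "MDAMB468_BREAST", "MDAMB468_BREAST"),
  gene_name          = c("TSPAN6", "SLC12A2", "TSPAN6", "SLC12A2"),
  protein_expression = c(0.12, -0.40, -1.05, 0.88),
  stringsAsFactors   = FALSE
)

# Keep the rows for a single gene across all cell lines.
tspan6 <- subset(proteomic_toy, gene_name == "TSPAN6")

# Average the normalized expression per cell line.
mean_by_line <- aggregate(protein_expression ~ cell_line,
                          data = proteomic_toy, FUN = mean)
```

The same calls should apply unchanged to the full dataset, since they rely only on the `gene_name`, `cell_line`, and `protein_expression` columns.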
Details
This data originates from the `protein_quant_current_normalized.csv` file taken from the 20Q2 [Broad Institute](https://depmap.org/portal/download/) cancer dependency study. The derived dataset found in the `depmap` package adds a foreign key, `depmap_id`, as the first column; it was taken from the `metadata` dataset. The dataset has been converted to a long format tibble. Variable names from the original dataset were converted to lower case, put in snake case, and abbreviated where feasible.
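The transformation described above (wide to long, renamed columns, and a `depmap_id` key joined from metadata) can be sketched on toy data as follows. This is a base R approximation, not the code used to build the package dataset: the wide column name `Gene_Symbol`, the sample column labels, the key `"ACH-TOY0002"`, and all expression values are assumptions made for illustration.

```r
# Toy wide table: one row per protein, one column per cell line sample.
# The layout and all values are assumed, not taken from the original CSV.
wide <- data.frame(
  Gene_Symbol               = c("TSPAN6", "SLC12A2"),
  `22RV1_PROSTATE_TenPx01`  = c(0.12, -0.40),
  `MDAMB468_BREAST_TenPx01` = c(-1.05, 0.88),
  check.names      = FALSE,
  stringsAsFactors = FALSE
)

# Toy metadata providing the foreign key for each cell line.
metadata <- data.frame(
  depmap_id = c("ACH-000956", "ACH-TOY0002"),
  cell_line = c("22RV1_PROSTATE", "MDAMB468_BREAST"),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (protein, sample) pair, with snake case names.
sample_cols <- setdiff(names(wide), "Gene_Symbol")
long <- data.frame(
  gene_name          = rep(wide$Gene_Symbol, times = length(sample_cols)),
  protein            = rep(sample_cols, each = nrow(wide)),
  protein_expression = unlist(wide[sample_cols], use.names = FALSE),
  stringsAsFactors   = FALSE
)

# Split the sample label into the cell line name and the TenPx number.
long$cell_line <- sub("_TenPx[0-9]+$", "", long$protein)
long$TenPx     <- sub("^.*_(TenPx[0-9]+)$", "\\1", long$protein)

# Join the key from metadata and move depmap_id to the first column.
long <- merge(metadata, long, by = "cell_line")
long <- long[, c("depmap_id", "cell_line", "gene_name",
                 "protein_expression", "protein", "TenPx")]
```

Base R is used here only to keep the sketch free of dependencies; a tidyverse pipeline would be a natural alternative for producing a tibble.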
Change log
- 20Q2: Initial dataset consisted of a data frame with 24963776 rows (one protein measurement per cell line sample) and 12 variables
- 20Q3: no change, no further releases are scheduled at this time.
- 20Q4: no change, no further releases are scheduled at this time.
- 21Q1: no change, no further releases are scheduled at this time.
- 21Q2: no change, no further releases are scheduled at this time.
- 21Q3: no change, no further releases are scheduled at this time.
- 21Q4: no change, no further releases are scheduled at this time.
- 22Q1: no change, no further releases are scheduled at this time.
- 22Q2: no change, no further releases are scheduled at this time.
References
David P. Nusinow, John Szpyt, Mahmoud Ghandi, Christopher M. Rose, E. Robert McDonald III, Marian Kalocsay, Judit Jané-Valbuena, Ellen Gelfand, Devin K. Schweppe, Mark Jedrychowski, Javad Golji, Dale A. Porter, Tomas Rejtar, Y. Karen Wang, Gregory V. Kryukov, Frank Stegmeier, Brian K. Erickson, Levi A. Garraway, William R. Sellers, Steven P. Gygi (2020). Quantitative Proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 2.