The `metadata` dataset contains metadata about the cell lines in the 22Q2 Broad Institute DepMap release, including the mapping between `depmap_id` and `cell_line` name for cancer cell lines. It contains no Achilles screen or dependency data; instead it holds the metadata for the other datasets in the 22Q2 DepMap release, covering 1840 cell lines, 0 genes, 33 primary diseases and 30 lineages. The columns of `metadata` include: `depmap_id`, `stripped_cell_line_name`, `cell_line`, `aliases`, `cosmic_id`, `sanger_id`, `WTSI_master_cell_ID`, `primary_disease`, `subtype_disease`, `sub_subtype_disease`, `sex`, `source`. This dataset can be loaded into the R environment with the `depmap_metadata` function.
Format
A data frame with 1829 rows (cell lines) and 22 variables:
- depmap_id
Cancer cell line primary key (e.g. "ACH-000001")
- stripped_cell_line_name
Name of stripped cell line
- cell_line
CCLE name of cancer cell line (e.g. "184A1_BREAST")
- cell_line_name
Abbreviated name of cancer cell line (e.g. "NIH:OVCAR-3")
- aliases
Aliases of cancer cell line
- cosmic_id
Catalogue Of Somatic Mutations In Cancer ID number (e.g. 905933)
- sex
Sex of tissue sample
- source
Source of tissue sample
- culture_type
Culture type of tissue sample
- RRID
Resource Identification Portal ID
- sample_collection_site
Site of sample collection
- primary_or_metastasis
Primary cancer cell line or metastatic
- primary_disease
Primary Disease (e.g. cancer type)
- subtype_disease
Subtype Disease (e.g. Acute Myelogenous Leukemia)
- age
Age of the individual from whom the cell line was derived
- sanger_id
Sanger ID (e.g. 2201)
- WTSI_master_cell_ID
Wellcome Trust Sanger Institute ID (e.g. 1369)
- additional_info
Additional information about samples
- lineage
Lineage of cancer cell line
- lineage_subtype
Subtype of lineage of cancer cell line
- lineage_sub_subtype
Sub-subtype of lineage of cancer cell line
- lineage_molecular_subtype
Molecular subtype of lineage of cancer cell line
- model_manipulation
Culture model manipulation
- model_manipulation_details
Culture model manipulation details
- patient_id
Patient ID
- parent_depmap_id
DepMap ID of the parent cell line
- Cellosaurus_NCIt_disease
NCI Thesaurus (NCIt) disease term from Cellosaurus
- Cellosaurus_NCIt_id
NCI Thesaurus (NCIt) identifier from Cellosaurus
- Cellosaurus_issues
Issues noted in Cellosaurus
Details
This data represents the `sample_info.csv` file taken from the 22Q2 [Broad Institute](https://depmap.org/portal/download/) cancer dependency study. The dataset features a primary key, `depmap_id`, which is a unique ID given to each cell line and is found in the first column. The `depmap_id` attribute is used as a foreign key in all other datasets in the package. This dataset has been converted to a long format tibble. It does not contain any expression or dependency data; rather, it contains the metadata for all cancer cell lines used in the depmap project. Variable names were converted to lower case, put in snake case, and abbreviated where feasible (e.g. "Sanger ID" was changed to "sanger_id").
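The primary key / foreign key relationship described above can be illustrated with a small join. The following is a minimal sketch in base R using toy tables: the rows, the `dep_scores` table and its `gene_name` and `score` columns are hypothetical stand-ins, not real DepMap records. With the package installed, the real metadata table would instead come from `depmap_metadata()`.

```r
# Toy stand-in for the metadata table (illustrative values, not real records).
metadata <- data.frame(
  depmap_id = c("ACH-000001", "ACH-000002", "ACH-000003"),
  cell_line = c("LINE_A_OVARY", "LINE_B_BLOOD", "LINE_C_BREAST"),
  lineage   = c("ovary", "blood", "breast"),
  stringsAsFactors = FALSE
)

# Hypothetical long-format table from another dataset, keyed by depmap_id.
dep_scores <- data.frame(
  depmap_id = c("ACH-000001", "ACH-000001", "ACH-000003"),
  gene_name = c("GENE1", "GENE2", "GENE1"),
  score     = c(-0.8, 0.1, -1.2),
  stringsAsFactors = FALSE
)

# depmap_id can only serve as a primary key if it is unique in metadata.
stopifnot(!anyDuplicated(metadata$depmap_id))

# Left join: keep every score row and attach the matching cell line annotations.
annotated <- merge(dep_scores, metadata, by = "depmap_id", all.x = TRUE)
print(annotated)
```

A left join (`all.x = TRUE`) is used so that rows from the other dataset are kept even when a cell line has no metadata entry; such rows would then show `NA` in the annotation columns.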
Change log
- 19Q1: Initial dataset consisted of data frame with 1677 rows (cell lines) and 9 variables, representing 0 genes, 1677 cell lines, 38 primary diseases and 33 lineages
- 19Q2: adds 37 new cell lines, 1 primary disease and 1 lineage. This version of the metadata dataset contains 6 variables not found in previous versions, relating to the Achilles metadata: `Achilles_n_replicates`, `cell_line_NNMD`, `culture_type`, `culture_medium`, and `cas9_activity`.
- 19Q3: adds 30 cell lines, 2 primary diseases and 2 lineages
- 19Q4: adds 42 cell lines, 0 primary diseases and 3 lineages
- 20Q1: adds 19 cell lines. `gender` was changed to `sex`; `age`, `primary_or_metastasis` and `sample_collection_site` were added
- 20Q2: adds 30 cell lines and 1 lineage
- 20Q3: adds new column `WTSI_master_cell_ID`
- 20Q4: adds 6 cell lines and 1 lineage. Adds column `cell_line_name`
- 21Q1: removes 1 cell line
- 21Q2: adds 3 cell lines
- 21Q3: adds 1130 cell lines, 8 primary diseases and 8 lineages
- 21Q4: removes 1119 cell lines, 8 primary diseases and 8 lineages
- 22Q1: adds 4 cell lines. The features relating to Achilles metadata have been removed and put into their own dataset: `Achilles_n_replicates`, `cell_line_NNMD`, `culture_type`, `culture_medium`, and `cas9_activity`.
- 22Q2: adds 11 cell lines and removes 2 primary diseases and 30 lineages. The feature `culture_type` has been removed and columns `model_manipulation`, `model_manipulation_details`, `patient_id`, `parent_depmap_id`, `Cellosaurus_NCIt_disease`, `Cellosaurus_NCIt_id` and `Cellosaurus_issues` have been added.
References
Tsherniak, A., Vazquez, F., Montgomery, P. G., Weir, B. A., Kryukov, G., Cowley, G. S., ... & Meyers, R. M. (2017). Defining a cancer dependency map. Cell, 170(3), 564-576.
DepMap, Broad (2019): DepMap Achilles 19Q1 Public. https://figshare.com/articles/DepMap_Achilles_19Q1_Public/7655150
Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784.
Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, ... Todd R. Golub, Levi A. Garraway & William R. Sellers. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).