The `mutationCalls` dataset contains the merged 22Q2 mutation calls (coding regions, germline filtered) and includes data from 18784 genes, 1771 cell lines, 33 primary diseases and 30 lineages. It can be considered the metadata dataset for mutations and does not contain any dependency data. It can be loaded into the R environment with the `depmap_mutationCalls` function.
Format
A data frame with 1235466 rows and 32 variables:
- depmap_id
DepMap cell line identifier; a foreign key added from the `metadata` dataset
- gene_name
HUGO symbol, a unique and meaningful name for each gene (e.g. SAP25)
- entrez_id
Gene ID in the NCBI Entrez Gene database (e.g. 100316904)
- ncbi_build
NCBI Build (i.e. reference genome)
- chromosome
Chromosome
- start_pos
Start position of the variant
- end_pos
End position of the variant
- strand
Strand location of gene
- var_class
Variant Classification
- var_type
Variant Type
- ref_allele
Reference Allele
- alt_allele
Alternate allele (this variable was named `tumor_seq_allele1` before 22Q1)
- dbSNP_RS
Single Nucleotide Polymorphism Database (dbSNP) reference cluster
- dbSNP_val_status
dbSNP Val Status
- genome_change
Genome Change
- annotation_transcript
Annotation Transcript
- cDNA_change
cDNA change
- codon_change
Codon change
- protein_change
Protein change
- is_deleterious
Whether the variant is annotated as deleterious
- is_tcga_hotspot
isTCGAhotspot
- tcga_hsCnt
TCGAhsCnt
- is_cosmic_hotspot
isCOSMIChotspot
- cosmic_hsCnt
COSMIChsCnt
- ExAC_AF
ExAC_AF
- CGA_WES_AC
CGA_WES_AC
- sanger_WES_AC
SangerWES_AC
- RNAseq_AC
RNAseq_AC
- HC_AC
HC_AC
- RD_AC
RD_AC
- WGS_AC
WGS_AC
- var_annotation
Variant_annotation
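Because each row is a single mutation call, typical queries are row filters and per-cell-line counts. The sketch below uses a small invented data frame that borrows a few of the column names listed above; every value in it is hypothetical, and treating `is_deleterious` as a logical column is an assumption. It is written in base R so that it runs without any packages.

```r
# Toy stand-in for the real dataset; all values are invented for illustration.
# With the depmap package installed, the real data would instead come from:
#   mutationCalls <- depmap::depmap_mutationCalls()
mutationCalls <- data.frame(
  depmap_id      = c("ACH-000001", "ACH-000001", "ACH-000002", "ACH-000003"),
  gene_name      = c("TP53", "KRAS", "TP53", "KRAS"),
  var_class      = c("Missense_Mutation", "Missense_Mutation",
                     "Nonsense_Mutation", "Silent"),
  var_type       = c("SNP", "SNP", "SNP", "SNP"),
  is_deleterious = c(FALSE, FALSE, TRUE, FALSE),  # assumed to be logical
  stringsAsFactors = FALSE
)

# Select all mutation calls for one gene.
tp53_calls <- mutationCalls[mutationCalls$gene_name == "TP53", ]

# Count mutation calls per cell line.
calls_per_line <- table(mutationCalls$depmap_id)

# Cell lines carrying at least one call flagged as deleterious.
deleterious_lines <- unique(mutationCalls$depmap_id[mutationCalls$is_deleterious])
```

The same filters can be written with `dplyr::filter()` and `dplyr::count()` when working with the real tibble.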
Details
This data represents the `CCLE_mutations.csv` file taken from the 22Q2 [Broad Institute](https://depmap.org/portal/download/) cancer dependency study. The derived dataset in the `depmap` package adds a foreign key, `depmap_id`, as the first column; it was added from the `metadata` dataset. The dataset has been converted to a long-format tibble. Variable names from the original dataset were converted to lower case, put in snake case, and abbreviated where feasible.
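The `depmap_id` key is what allows mutation calls to be linked to cell line annotations. A minimal sketch of such a join follows, using two invented data frames; the `lineage` column and all values are assumptions made for illustration, not an excerpt of the real `metadata` dataset.

```r
# Hypothetical mutation calls (values invented).
mutations <- data.frame(
  depmap_id = c("ACH-000001", "ACH-000002", "ACH-000002"),
  gene_name = c("KRAS", "TP53", "KRAS"),
  stringsAsFactors = FALSE
)

# Hypothetical cell line annotations keyed by the same identifier.
metadata <- data.frame(
  depmap_id = c("ACH-000001", "ACH-000002"),
  lineage   = c("lung", "colorectal"),
  stringsAsFactors = FALSE
)

# Left join: keep every mutation call and attach its cell line annotation.
annotated <- merge(mutations, metadata, by = "depmap_id", all.x = TRUE)

# Number of mutation calls per lineage.
calls_per_lineage <- table(annotated$lineage)
```

Using `all.x = TRUE` keeps mutation calls whose cell line has no annotation, leaving `NA` in the joined columns instead of silently dropping rows.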
Change log
- 19Q1: Initial dataset for the package, consisting of a data frame with 1243145 rows and 35 variables representing 18755 genes, 1601 cell lines, 37 primary diseases and 33 lineages.
- 19Q2: adds 30 cell lines, 1 primary disease and 1 lineage. This version has different columns than the previous version: the variable "VA_WES_AC" is no longer present in this dataset. Some minor alterations to the original file were made. The first column of the original dataset (ID, the sample number) was removed, as it was only the row number and did not serve any unique identifying purpose.
- 19Q3: adds 1 gene, 25 cell lines and removes 1 primary disease.
- 19Q4: adds 1 gene, 10 cell lines, 0 primary diseases and 2 lineages.
- 20Q1: adds 4 genes, 31 cell lines, 1 lineage.
- 20Q2: adds 44 cell lines, 1 lineage.
- 20Q3: no change.
- 20Q4: removes 13 genes, adds 8 cell lines and 1 lineage. Columns `tumor_sample_barcode` and `sanger_recalib_WES_AC` were removed.
- 21Q1: removes 11 genes and 2 cell lines.
- 21Q2: removes 1 gene and adds 3 cell lines.
- 21Q3: removes 3 genes, 4 cell lines and 1 lineage.
- 21Q4: adds 9 cell lines.
- 22Q1: adds 4 cell lines and 1 lineage. The variable `tumor_seq_allele1` was renamed `alt_allele`.
- 22Q2: adds 12 cell lines and removes 2 primary diseases and 8 lineages.
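As a consistency check, the cell line changes listed above can be accumulated release by release. The sketch below starts from the 19Q1 count, applies each release's change once (21Q3 is counted a single time), and arrives at the 1771 cell lines stated for 22Q2. The vector name is an illustrative choice; the numbers are taken from the change log.

```r
# Cell line count at 19Q1, followed by the change reported for each release.
cell_line_delta <- c(
  "19Q1" = 1601, "19Q2" = 30, "19Q3" = 25, "19Q4" = 10,
  "20Q1" = 31,   "20Q2" = 44, "20Q3" = 0,  "20Q4" = 8,
  "21Q1" = -2,   "21Q2" = 3,  "21Q3" = -4, "21Q4" = 9,
  "22Q1" = 4,    "22Q2" = 12
)

# Running total of cell lines per release.
cell_lines_by_release <- cumsum(cell_line_delta)

cell_lines_by_release[["22Q2"]]  # 1771
```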
References
Tsherniak, A., Vazquez, F., Montgomery, P. G., Weir, B. A., Kryukov, G., Cowley, G. S., ... & Meyers, R. M. (2017). Defining a cancer dependency map. Cell, 170(3), 564-576.
DepMap, Broad (2019): DepMap Achilles 19Q1 Public. https://figshare.com/articles/DepMap_Achilles_19Q1_Public/7655150
Robin M. Meyers, Jordan G. Bryan, James M. McFarland, Barbara A. Weir, ... David E. Root, William C. Hahn, Aviad Tsherniak. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nature Genetics 2017 October 49:1779–1784.
Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, ... Todd R. Golub, Levi A. Garraway & William R. Sellers. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).