Two tibbles are provided that give access to DepMap data, as shared by the Broad Institute's DepMap project on Figshare (https://figshare.com/authors/Broad_DepMap/5514062).
- The [dmsets()] function returns a tibble with DepMap datasets. Each dataset is described by its title, its unique identifier, the number of files it contains, the Figshare URL, and a `DepMapDataset` object that contains further details of the dataset.
- The [dmfiles()] function returns a tibble with DepMap files. Each file is described by its dataset identifier, its own unique identifier, its name, size (in bytes), a download URL, md5 hash and mime type.
- DepMap data files can be downloaded with the [dmget()] function, which takes as input a tibble or data.frame of DepMap files such as the one returned by [dmfiles()]. Files are downloaded and automatically stored in the package's central cache. See [dmCache()].
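Some DepMap files are several gigabytes in size, so it can be useful to subset the files table before downloading. The following is a minimal sketch in base R. It uses a small hand-built table that mimics some columns of [dmfiles()]: the file names and identifiers come from the examples on this page, the sizes are approximate, the download URLs are placeholders rather than real Figshare links, and the 100 MB threshold is an arbitrary choice for illustration.

```r
# Toy table mimicking some columns of dmfiles(). Sizes are approximate and
# the download URLs are placeholders (assumption, not real Figshare links).
toy_files <- data.frame(
  dataset_id = c(24667905L, 24667905L, 24667905L),
  id = c(43347678L, 43346391L, 43346409L),
  name = c("README.txt", "AvanaGuideMap.csv", "AvanaLogfoldChange.csv"),
  size = c(2.91e4, 1.60e7, 3.17e9),
  download_url = paste0("https://example.org/file", 1:3),
  stringsAsFactors = FALSE
)

# Keep only files smaller than 100 MB (arbitrary threshold).
small_files <- toy_files[toy_files$size < 1e8, ]
small_files$name

# With the real table, the same subsetting could be applied before the
# download (requires the depmap package and network access):
# fls <- depmap::dmfiles()
# depmap::dmget(fls[fls$dataset_id == 24667905 & fls$size < 1e8, ])
```

Subsetting first keeps the cache small and avoids unintentionally fetching the largest files.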
Usage
DepMapDataset(id)
DepMapFiles(x)
dmFileNames(object)
dmTitle(object)
dmNumFiles(object)
dmget(dmtab, cache = dmCache())
dmfiles()
dmsets()
Arguments
- id
`numeric()` with one or multiple DepMap dataset identifier(s). Note that `id` is converted to an integer. Missing values are not permitted.
- x
either a `numeric()` that will be passed to `DepMapDataset`, or an instance (or list of instances) of `DepMapDataset`.
- object
an instance of class `DepMapDataset`.
- dmtab
A `tibble` or `data.frame` containing the file(s) to be downloaded, such as the one returned by [dmfiles()] or created by [DepMapFiles()]. It is expected to contain the `"name"`, `"id"` and `"download_url"` variables.
- cache
Object of class [BiocFileCache()]. Default is to use the central `depmap` cache returned by [dmCache()], but users can use their own cache.
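When building a files table by hand, it may help to check that the expected variables are present before calling [dmget()]. The helper below, `check_dmtab()`, is hypothetical and not part of the package. It is only a sketch of such a check, assuming the three variables named `name`, `id` and `download_url`, as in the files tibble shown in the examples.

```r
# Hypothetical helper (not part of depmap): checks that a table has the
# variables a files table is expected to contain.
check_dmtab <- function(dmtab) {
  required <- c("name", "id", "download_url")
  missing_cols <- setdiff(required, names(dmtab))
  if (length(missing_cols) > 0) {
    stop("dmtab is missing variable(s): ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy table; the download URL is a placeholder, not a real Figshare link.
toy <- data.frame(
  name = "README.txt",
  id = 43347678L,
  download_url = "https://example.org/README.txt",
  stringsAsFactors = FALSE
)
check_dmtab(toy)
```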
Details
The `DepMapDataset` class stores the information describing a depmap dataset, as stored on Figshare (where datasets are called articles). The [DepMapDataset()] constructor requires one or multiple dataset identifiers and returns one instance or a list of instances.
The following accessors are available:
- [dmFileNames()] returns the dataset's filenames.
- [dmTitle()] returns the dataset's title.
- [dmNumFiles()] returns the number of files in the dataset.
(These are used to construct the main depmap dataset tibble.)
A tibble describing the files in a depmap dataset can be created with the [DepMapFiles()] function. It takes either one or multiple dataset identifiers, or one or a list of `DepMapDataset` instances.
The [DepMapDataset()] and [DepMapFiles()] functions are mostly used internally, to create the `dmsets` and `dmfiles` tibbles. If a more recent dataset is available on Figshare and not (yet) in the `depmap` package, a user might create the depmap files table to download the files, and/or open a [GitHub issue](https://github.com/UCLouvain-CBIO/depmap) for the new data to be added by the maintainer(s).
All the information is retrieved from Figshare using their API, as described at https://docs.figshare.com.
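The exact requests made by the package are not shown here. As a rough sketch, version 2 of the public Figshare API exposes each article at an `articles/<id>` endpoint, so a dataset identifier maps to a URL as follows. The `figshare_article_url()` helper is hypothetical and only illustrates that mapping; it is not how the package is known to be implemented.

```r
# Hypothetical helper (not part of depmap): builds the Figshare API v2
# endpoint URL for one or more article (dataset) identifiers.
figshare_article_url <- function(id) {
  id <- as.integer(id)      # identifiers are handled as integers
  stopifnot(!anyNA(id))     # missing values are not permitted
  paste0("https://api.figshare.com/v2/articles/", id)
}

urls <- figshare_article_url(c(24667905, 22765112))
urls

# Fetching the metadata needs network access and the jsonlite package:
# article <- jsonlite::fromJSON(urls[1])
# article$title
```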
Adding new datasets
Adding new datasets is simple. Once a new dataset (or article, as it is called on Figshare) has been identified on the Broad Institute's [DepMap project on Figshare](https://figshare.com/authors/Broad_DepMap/5514062), one needs to add the dataset's URL to the `depmapURLs` vector in [`inst/scripts/make-dmfiles.R`](https://github.com/UCLouvain-CBIO/depmap/blob/master/inst/scripts/make-dmfiles.R), and re-run the script to update the `dmsets.rds` and `dmfiles.rds` files in `inst/extdata`.
Feel free to send a GitHub pull request or open a [GitHub issue](https://github.com/UCLouvain-CBIO/depmap) for the new data to be added by the maintainer(s).
Examples
## The depmap datasets
dmsets()
## The depmap files
dmfiles()
############################################################
## Mostly for internal use, or to update/generate the depmap
## dataset and files tables.
## One dataset identifier: 24667905
my_dmset <- DepMapDataset(24667905)
my_dmset
#> Title: DepMap 23Q4 Public
#> Id: 24667905
#> License: CC BY 4.0
#> Use `DepMapFiles()` to list 56 files
## Multiple dataset identifiers
my_dmsets <- DepMapDataset(c(24667905, 22765112))
my_dmsets
#> [[1]]
#> Title: DepMap 23Q4 Public
#> Id: 24667905
#> License: CC BY 4.0
#> Use `DepMapFiles()` to list 56 files
#>
#> [[2]]
#> Title: DepMap 23Q2 Public
#> Id: 22765112
#> License: CC BY 4.0
#> Use `DepMapFiles()` to list 52 files
#>
## Create the files table from one or multiple dataset
## identifiers
DepMapFiles(24667905)
#> # A tibble: 56 × 7
#> dataset_id id name size download_url md5 mimetype
#> <int> <int> <chr> <dbl> <chr> <chr> <chr>
#> 1 24667905 43347678 README.txt 2.91e4 https://ndo… 4d2d… text/pl…
#> 2 24667905 43346361 AchillesCommonEssenti… 1.70e4 https://ndo… 1cbf… text/pl…
#> 3 24667905 43346367 AchillesHighVarianceG… 7.07e3 https://ndo… 3ac0… text/pl…
#> 4 24667905 43346370 AchillesNonessentialC… 1.15e4 https://ndo… 9b21… text/pl…
#> 5 24667905 43346379 AchillesScreenQCRepor… 3.16e5 https://ndo… c5dd… text/pl…
#> 6 24667905 43346382 AchillesSequenceQCRep… 4.37e5 https://ndo… 6ade… text/pl…
#> 7 24667905 43346391 AvanaGuideMap.csv 1.60e7 https://ndo… b694… text/pl…
#> 8 24667905 43346409 AvanaLogfoldChange.csv 3.17e9 https://ndo… 58b1… text/pl…
#> 9 24667905 43346505 AvanaRawReadcounts.csv 9.70e8 https://ndo… a7b5… text/pl…
#> 10 24667905 43346574 CRISPRGeneDependency.… 3.94e8 https://ndo… b581… text/pl…
#> # ℹ 46 more rows
DepMapFiles(my_dmset)
#> # A tibble: 56 × 7
#> dataset_id id name size download_url md5 mimetype
#> <int> <int> <chr> <dbl> <chr> <chr> <chr>
#> 1 24667905 43347678 README.txt 2.91e4 https://ndo… 4d2d… text/pl…
#> 2 24667905 43346361 AchillesCommonEssenti… 1.70e4 https://ndo… 1cbf… text/pl…
#> 3 24667905 43346367 AchillesHighVarianceG… 7.07e3 https://ndo… 3ac0… text/pl…
#> 4 24667905 43346370 AchillesNonessentialC… 1.15e4 https://ndo… 9b21… text/pl…
#> 5 24667905 43346379 AchillesScreenQCRepor… 3.16e5 https://ndo… c5dd… text/pl…
#> 6 24667905 43346382 AchillesSequenceQCRep… 4.37e5 https://ndo… 6ade… text/pl…
#> 7 24667905 43346391 AvanaGuideMap.csv 1.60e7 https://ndo… b694… text/pl…
#> 8 24667905 43346409 AvanaLogfoldChange.csv 3.17e9 https://ndo… 58b1… text/pl…
#> 9 24667905 43346505 AvanaRawReadcounts.csv 9.70e8 https://ndo… a7b5… text/pl…
#> 10 24667905 43346574 CRISPRGeneDependency.… 3.94e8 https://ndo… b581… text/pl…
#> # ℹ 46 more rows
DepMapFiles(c(24667905, 22765112))
#> # A tibble: 108 × 7
#> dataset_id id name size download_url md5 mimetype
#> <int> <int> <chr> <dbl> <chr> <chr> <chr>
#> 1 24667905 43347678 README.txt 2.91e4 https://ndo… 4d2d… text/pl…
#> 2 24667905 43346361 AchillesCommonEssenti… 1.70e4 https://ndo… 1cbf… text/pl…
#> 3 24667905 43346367 AchillesHighVarianceG… 7.07e3 https://ndo… 3ac0… text/pl…
#> 4 24667905 43346370 AchillesNonessentialC… 1.15e4 https://ndo… 9b21… text/pl…
#> 5 24667905 43346379 AchillesScreenQCRepor… 3.16e5 https://ndo… c5dd… text/pl…
#> 6 24667905 43346382 AchillesSequenceQCRep… 4.37e5 https://ndo… 6ade… text/pl…
#> 7 24667905 43346391 AvanaGuideMap.csv 1.60e7 https://ndo… b694… text/pl…
#> 8 24667905 43346409 AvanaLogfoldChange.csv 3.17e9 https://ndo… 58b1… text/pl…
#> 9 24667905 43346505 AvanaRawReadcounts.csv 9.70e8 https://ndo… a7b5… text/pl…
#> 10 24667905 43346574 CRISPRGeneDependency.… 3.94e8 https://ndo… b581… text/pl…
#> # ℹ 98 more rows
DepMapFiles(my_dmsets)
#> # A tibble: 108 × 7
#> dataset_id id name size download_url md5 mimetype
#> <int> <int> <chr> <dbl> <chr> <chr> <chr>
#> 1 24667905 43347678 README.txt 2.91e4 https://ndo… 4d2d… text/pl…
#> 2 24667905 43346361 AchillesCommonEssenti… 1.70e4 https://ndo… 1cbf… text/pl…
#> 3 24667905 43346367 AchillesHighVarianceG… 7.07e3 https://ndo… 3ac0… text/pl…
#> 4 24667905 43346370 AchillesNonessentialC… 1.15e4 https://ndo… 9b21… text/pl…
#> 5 24667905 43346379 AchillesScreenQCRepor… 3.16e5 https://ndo… c5dd… text/pl…
#> 6 24667905 43346382 AchillesSequenceQCRep… 4.37e5 https://ndo… 6ade… text/pl…
#> 7 24667905 43346391 AvanaGuideMap.csv 1.60e7 https://ndo… b694… text/pl…
#> 8 24667905 43346409 AvanaLogfoldChange.csv 3.17e9 https://ndo… 58b1… text/pl…
#> 9 24667905 43346505 AvanaRawReadcounts.csv 9.70e8 https://ndo… a7b5… text/pl…
#> 10 24667905 43346574 CRISPRGeneDependency.… 3.94e8 https://ndo… b581… text/pl…
#> # ℹ 98 more rows