TCGA data
tcga.Rd
The Cancer Genome Atlas (TCGA) is a collaboration between the National
Cancer Institute (NCI) and the National Human Genome Research Institute
(NHGRI) that has generated multi-omics analyses (genomic, transcriptomic,
proteomic and epigenetic) in 33 types of cancer.
The RNAseq and clinical data analysed here come from LUAD (lung
adenocarcinoma) tumors and the corresponding patients.
TCGA clinical and RNAseq expression data were extracted from the
curatedTCGAData package. See inst/scripts/tcga.R for details.
Format
expression: RNA expression data frame with 570 observations on
the following 8 variables.
  sampleID: a factor
  patient: a character vector
  type: a character vector
  A1BG: a numeric vector
  A1CF: a numeric vector
  A2BP1: a numeric vector
  A2LD1: a numeric vector
  A2ML1: a numeric vector
clinical1: clinical data for 516 observations on the following
15 variables.
  patientID: a character vector
  tumor_tissue_site: a character vector
  gender: a character vector
  age_at_diagnosis: a numeric vector
  vital_status: a numeric vector
  days_to_death: a numeric vector
  days_to_last_followup: a numeric vector
  pathologic_stage: a character vector
  pathology_T_stage: a character vector
  pathology_N_stage: a character vector
  pathology_M_stage: a character vector
  smoking_history: a character vector
  number_pack_years_smoked: a numeric vector
  year_of_tobacco_smoking_onset: a numeric vector
  stopped_smoking_year: a numeric vector
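The vital_status, days_to_death and days_to_last_followup columns are the kind of fields commonly combined into a single follow-up time. The sketch below illustrates one such derivation on a small invented data frame. It assumes that vital_status is coded 1 for deceased and 0 for alive, which this page does not state, and os_days is a hypothetical column name introduced only for illustration.

```r
# Invented toy data mimicking three of the documented clinical1 columns.
surv_toy <- data.frame(
  patientID = c("P1", "P2", "P3"),
  vital_status = c(1, 0, 0),              # ASSUMPTION: 1 = deceased, 0 = alive
  days_to_death = c(420, NA, NA),
  days_to_last_followup = c(NA, 900, 150),
  stringsAsFactors = FALSE
)

# Use days_to_death for deceased patients, days_to_last_followup otherwise.
surv_toy$os_days <- ifelse(
  surv_toy$vital_status == 1,
  surv_toy$days_to_death,
  surv_toy$days_to_last_followup
)

print(surv_toy[, c("patientID", "vital_status", "os_days")])
```

The coding of vital_status should be checked against the real data before reusing this pattern.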
clinical2: small clinical data with 516 observations on the
following 3 variables.
  patientID: a character vector
  gender: a character vector
  years_at_diagnosis: a numeric vector
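The expression table identifies patients in its patient column, while the clinical tables use patientID, so the tables can presumably be combined on those keys. The following is a minimal sketch using small invented data frames that mimic the documented columns. The values, and the assumption that patient and patientID hold the same identifiers, are illustrative and not taken from the package.

```r
# Toy stand-ins for the expression and clinical2 tables; all values are invented.
expression_toy <- data.frame(
  sampleID = factor(c("S1", "S2", "S3")),
  patient = c("P1", "P2", "P3"),
  type = c("tumor", "tumor", "normal"),
  A1BG = c(10.5, 8.2, 9.1),
  stringsAsFactors = FALSE
)
clinical2_toy <- data.frame(
  patientID = c("P1", "P2", "P4"),
  gender = c("female", "male", "male"),
  years_at_diagnosis = c(61, 70, 55),
  stringsAsFactors = FALSE
)

# Inner join: keeps only patients present in both tables.
# ASSUMPTION: 'patient' and 'patientID' use the same identifiers.
joined <- merge(
  expression_toy, clinical2_toy,
  by.x = "patient", by.y = "patientID"
)

print(joined)
```

Passing all.x = TRUE to merge() would instead keep every expression row and fill the clinical columns with NA where no patient matches.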
A clinical summary data frame with 2 observations on the following 3 variables.
  gender: a character vector
  - current smoker: a numeric vector
  - lifelong non-smoker: a numeric vector
In addition, the clinical1.csv and expression.csv functions return the
paths to the respective comma-separated values (CSV) files. The
expressions.csv function returns the path to the expression data split
by gene.
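A typical use of a path-returning function is to pass its result straight to a reader such as read.csv(). Because the package functions are not available outside the package, the sketch below defines a hypothetical stand-in, toy_clinical_csv(), that writes an invented two-row file to a temporary location and returns its path. Only the calling pattern is meant to carry over.

```r
# Hypothetical stand-in for a path-returning function such as clinical1.csv():
# it writes a tiny invented CSV to a temporary file and returns the file path.
toy_clinical_csv <- function() {
  path <- tempfile(fileext = ".csv")
  toy <- data.frame(
    patientID = c("P1", "P2"),
    gender = c("female", "male"),
    age_at_diagnosis = c(61, 70),
    stringsAsFactors = FALSE
  )
  write.csv(toy, path, row.names = FALSE)
  path
}

# Get the path first, then read the file it points to.
path <- toy_clinical_csv()
clinical_toy <- read.csv(path, stringsAsFactors = FALSE)
str(clinical_toy)
```

With the package installed and loaded, the same pattern would presumably read read.csv(clinical1.csv()).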