09/22/2014
Customer asks AccuraScience to investigate what publicly available datasets might be assembled to study differential occurrences of germline mutations between cancer patients and healthy individuals, for a few dozen genes. He is particularly interested in breast, bladder and lung cancer. Whereas data from the 1000 Genomes Project and a number of publicly available whole-exome sequencing datasets from healthy donors can be used as controls, the difficulty is to identify genotype or raw resequencing data for an adequate number of cancer patients. Although TCGA likely contains controlled-access data that would serve his purposes, he would like us to explore a quicker solution that avoids the application/approval process for TCGA access.
Thu, 09/25/2014 at 5:28 PM
AccuraScience LB: Here is the result of our exploration:
(1) The best-known cancer sequencing resources, including ICGC, TCGA and dbGaP, all place their data under controlled access and require the application/approval process that we discussed over the phone.
(2) COSMIC documents mostly somatic mutation information. However, the Cancer Gene Census project within COSMIC (http://cancer.sanger.ac.uk/cancergenome/projects/census/) provides some gene-level germline mutation and cancer type information. It does not list the individual germline mutations within each gene; that information has to be obtained by reading the related original articles.
(3) Locus-specific databases in cancer and similar resources document germline versus somatic mutations within genes. These might be worth further exploration.
(4) SRA contains some exome, targeted resequencing and even whole-genome sequencing data for cancer patients, including some from non-tumor samples of cancer patients (which provide germline mutation information, though not necessarily from blood samples). Summarized as follows:
(i) Breast cancer: 5 exome projects, about 50 samples; 1 targeted resequencing project, 77 samples,
(ii) Bladder cancer: 2 exome projects, about 60 samples,
(iii) Lung cancer: 1 whole-genome sequencing project, 2 samples.
(5) EBI contains some similar sequencing data, summarized as follows:
(i) Breast cancer: 2 projects (1 of which overlaps with SRA), 20 potentially usable samples,
(ii) Bladder cancer: none,
(iii) Lung cancer: 2 exome projects, about 30 usable samples.
(6) Personal Genome Project:
(i) Breast cancer: 4 individuals' genomic information,
(ii) Bladder cancer: none,
(iii) Lung cancer: 1 individual's genomic information.
In summary, it might be possible to obtain data for a total of 230 samples, give or take, across all 3 cancer types from the public domain.
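As a rough cross-check of that figure, the per-source counts listed in (4) to (6) can be tallied as in the minimal sketch below. It rests on two assumptions: each "about N" figure is treated as exact, and the raw sum is only an upper bound, because one EBI breast cancer project overlaps with SRA and the size of that overlap is not stated here. The dictionary layout and the function name are illustrative only.

```python
# Approximate sample counts quoted in items (4)-(6), keyed by
# (source, cancer type). "about N" figures are treated as exact (assumption).
counts = {
    ("SRA", "breast"): 50 + 77,  # ~50 exome samples + 77 targeted resequencing
    ("SRA", "bladder"): 60,
    ("SRA", "lung"): 2,
    ("EBI", "breast"): 20,       # one of the 2 projects overlaps with SRA
    ("EBI", "bladder"): 0,
    ("EBI", "lung"): 30,
    ("PGP", "breast"): 4,
    ("PGP", "bladder"): 0,
    ("PGP", "lung"): 1,
}


def totals_by_cancer(counts):
    """Sum sample counts over all sources for each cancer type."""
    totals = {}
    for (_source, cancer), n in counts.items():
        totals[cancer] = totals.get(cancer, 0) + n
    return totals


per_cancer = totals_by_cancer(counts)
raw_total = sum(per_cancer.values())  # upper bound, before removing overlap

print(per_cancer)  # {'breast': 151, 'bladder': 60, 'lung': 33}
print(raw_total)   # 244
```

The raw sum of 244 sits somewhat above the quoted estimate, which is what one would expect once overlapping or unusable samples are discounted.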
Note: LB stands for Lead Bioinformatician. An AccuraScience LB is a senior bioinformatics expert and leader of an AccuraScience data analysis team.
Disclaimer: This text was selected and edited based on genuine communications that took place between a customer and an AccuraScience data analysis team at the specified dates and times. The editing was made to protect the customer’s privacy and for brevity. The edited text may or may not have been reviewed and approved by the customer. AccuraScience is solely responsible for the accuracy of the information reflected in this text.