18.04.04 Population genotyping - studies and relevance Flashcards
Why is it important to study genetic variation in populations?
Although any two unrelated people share ~99.9% of their DNA sequence, the remaining 0.1% contains genetic variants that influence how people differ in their risk of disease or response to drugs. Discovering DNA sequence variants that contribute to disease risk offers an opportunity to understand the complex causes of common diseases.
Population genotyping studies also provide invaluable data for geneticists in a clinical diagnostic setting; when looking for the causative/pathogenic variant in a patient with a given phenotype, these studies make it possible to rule out variants (SNPs or CNVs) that are common in relevant populations.
Also, population genotyping studies provide resources to genome-wide association studies, facilitating identification of new disease genes and associated syndromes/phenotypes.
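The "rule out common variants" step can be sketched as a simple frequency filter. This is only an illustration: the variant names and frequencies are hypothetical, and the 1% cut-off is an assumed threshold (in practice it depends on disease prevalence, inheritance pattern and the population being compared).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    name: str
    # Allele frequency in a relevant reference population;
    # None means the variant was not observed in the reference data.
    population_af: Optional[float]


def keep_rare(variants: List[Variant], max_af: float = 0.01) -> List[Variant]:
    """Keep variants that are absent from, or rarer than max_af in, the reference population."""
    return [v for v in variants if v.population_af is None or v.population_af < max_af]


# Hypothetical candidate variants from a patient (made-up values)
candidates = [
    Variant("variant_A", 0.12),    # common: unlikely to cause a rare disorder
    Variant("variant_B", 0.0002),  # rare: stays on the candidate list
    Variant("variant_C", None),    # not seen in the reference data: stays
]

remaining = [v.name for v in keep_rare(candidates)]
print(remaining)  # ['variant_B', 'variant_C']
```

A variant passing this filter is not thereby pathogenic; the filter only shortens the list that needs further assessment.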
Describe the aim of the 1000 genomes project.
Ran 2008-2015.
Aim: to find at least 95% of genetic variants present at a frequency of 1% or more in the populations studied
Generate a resource to support GWAS and other medical research studies
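To see why a 1% frequency threshold is tied to sample size, the chance that a variant is carried by at least one sampled person can be worked out directly. This is a rough sketch that assumes chromosomes are sampled independently and ignores sequencing errors and missed calls; the function names are hypothetical.

```python
import math


def prob_sampled(freq: float, n_individuals: int) -> float:
    """Probability that at least one copy of an allele with the given
    frequency is present among n diploid individuals (2n chromosomes)."""
    n_chrom = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chrom


def min_individuals(freq: float, target: float) -> int:
    """Smallest number of diploid individuals for which prob_sampled >= target."""
    n_chrom = math.log(1.0 - target) / math.log(1.0 - freq)
    return math.ceil(n_chrom / 2)


p_100 = prob_sampled(0.01, 100)    # roughly 0.87
p_500 = prob_sampled(0.01, 500)    # above 0.999
n_needed = min_individuals(0.01, 0.95)
print(round(p_100, 3), round(p_500, 5), n_needed)
```

Being present in the sample is necessary but not sufficient: the variant still has to be detected from the sequencing reads.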
Which populations were involved in the 1000 genomes project?
The final data set includes populations of East Asian, South Asian, African, European and American ancestry.
What level of sequencing was performed in the 1000 genomes project?
Whole genomes were sequenced at low coverage (~4x). Because only a limited number of haplotypes exist at any given locus, data from many samples can be combined to detect variants in a region efficiently.
Overall coverage figures:
Whole genome data - low (~4x)
Full exome data >20x
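The difference between ~4x and >20x can be illustrated with a simple read-count model. This sketch assumes reads covering a site follow a Poisson distribution, that half the reads at a heterozygous site carry the alternate allele, and that at least two supporting reads are needed to call it. All three are simplifying assumptions for illustration, not the project's actual variant-calling method.

```python
import math


def prob_alt_reads_at_least(depth: float, min_alt_reads: int = 2,
                            alt_fraction: float = 0.5) -> float:
    """Probability of seeing at least min_alt_reads reads carrying the
    alternate allele at a heterozygous site, under a Poisson model."""
    lam = depth * alt_fraction  # expected number of alternate-allele reads
    p_fewer = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_alt_reads))
    return 1.0 - p_fewer


p_low = prob_alt_reads_at_least(4)    # about 0.59 in a single sample
p_high = prob_alt_reads_at_least(20)  # above 0.999
print(round(p_low, 3), round(p_high, 4))
```

Under these assumptions a single 4x genome misses a sizeable share of heterozygous sites, which is why combining evidence across many samples that share a haplotype matters at low coverage.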
The 1000 genomes project largely replaced the information gained through the HapMap project. Provide a brief overview of the HapMap project.
1) Most chromosome regions have only a few common haplotypes, which account for most of the variation from person to person in a population.
2) HapMap is a catalogue of common genetic variants. It describes the nature of each variant and how it is distributed among people, both within populations and between populations in different parts of the world.
3) HapMap does not make any links between variants and disease. It was designed to provide information that other researchers can use to link genetic variants to the risk for specific illnesses.
4) HapMap data identified >150 risk loci in studies of over 60 diseases.
5) HapMap data are included in dbSNP.
What was the aim of the NHLBI Exome sequencing project?
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders
Which populations were included in the NHLBI ESP project?
Women’s Health Initiative
atherosclerosis and cardiovascular cohorts
Framingham heart study
Jackson heart study
Multiple lung cohorts, a cystic fibrosis cohort, etc.
6,503 samples in total: 2,203 African American (AA) and 4,300 European American (EA)
What is ExAC?
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.
Includes data from 60,706 unrelated individuals, including participants from the 1000 genomes project and ESP.
What is gnomAD?
Genome Aggregation Database. Updated version of ExAC. Includes >5,000 Ashkenazi Jewish (ASJ) individuals.
123,136 exomes
15,496 genomes from unrelated individuals
Individuals affected with childhood-onset disease removed.
Because it combines datasets generated with different chemistries, coverage varies between sites.
What is DGV?
The Database of Genomic Variants (DGV) is an online resource that provides a publicly available, comprehensive catalogue of structural variation (SV) larger than 50 bp found in the genomes of healthy controls.
Data from 67 studies (array and NGS): >2.5 million entries across >22,300 genomes
What is COSMIC?
Catalogue Of Somatic Mutations In Cancer
All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities, mutations, many of which ultimately confer a growth advantage upon the cells in which they have occurred. COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers
Facilitates finding novel drivers in cancer.
Has expert curated data and systematic genome-wide screen data for unbiased molecular profiling.
What is The Cancer Genome Atlas?
The Cancer Genome Atlas (TCGA) collects and analyses high-quality tumour samples, and provides clinical information, sample metadata (e.g. weight of a sample portion), histopathology slide images, and molecular information derived from the samples (e.g. mRNA/miRNA expression, protein expression, copy number).
TCGA is also attempting to include high quality non-tumour samples in some assays, with the goal of analysing every patient’s germline DNA to establish which abnormalities detected in a tumour sample are peculiar to the oncogenic process.
What are the ‘CHIP’ studies?
Clonal Haematopoiesis of Indeterminate Potential (CHIP) studies
These studies investigate the incidence of somatic mutations ordinarily associated with haematological cancer in people without disease (incidence appears to increase with age). Ongoing studies aim to determine the risk conferred by these mutations, and when and under which influences they trigger disease.