19.04.04 Population genotyping Flashcards
1
Q
Examples of population studies
A
1) 100K project
2) HapMap project
3) Exome sequencing project
4) ExAC
5) GnomAD
6) Database of Genomic Variants (DGV)
7) COSMIC
8) Cancer Genome Atlas
2
Q
Why study genetic variation?
A
- Common diseases are caused by many genes and environmental factors
- Genetic variation determines a person's genetic risk of certain conditions
- Finding associations between genetic variation and disease is very important
- Population genotyping provides info (e.g. how common a variant is) that helps to rule pathogenicity in or out
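As a rough illustration of the last point: a variant that is common in a population dataset is unlikely to cause a rare, highly penetrant disorder. The sketch below is a minimal Python example under stated assumptions. The 1% cut-off (`max_af`) and the function names are hypothetical, and a real threshold would depend on disease prevalence, penetrance and inheritance pattern.

```python
def allele_frequency(allele_count, allele_number):
    """Fraction of successfully genotyped chromosomes carrying the variant allele."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def too_common_for_rare_disease(allele_count, allele_number, max_af=0.01):
    """True if the variant is more frequent than the (assumed) cut-off.

    max_af=0.01 is an illustrative default, not a recommended threshold.
    """
    return allele_frequency(allele_count, allele_number) > max_af
```

For example, a variant seen on 300 of 10,000 chromosomes (3%) would be flagged as too common, whereas one seen on 2 of 10,000 (0.02%) would not.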
3
Q
1000 Genomes project
A
- Goal was to find most (>95%) genetic variants with frequencies of at least 1% in the population
- Also to define haplotype structure in the human genome and to develop sequence analysis methods and tools
- Plan was to sequence genome wide at low coverage (x4) and exome at deep coverage (x20)
- Three sub-phases
1) Low coverage and exome data analysis on 1092 individuals (across 14 populations)
2) Expanded set, sequencing of 1722 individuals (across 19 populations)
3) 2535 individuals, across 26 populations (around 60-100 people per population)
- Identified over 79 million variants across the whole spectrum of SNVs and CNVs
- Results are available in DGV (for CNVs) and Ensembl (for SNVs)
- Data is also being used to understand MIT variants further (inheritance and mosaicism)
- Cell lines also established to allow researchers to study cellular phenotypes
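To get a feel for why a sample of around a thousand people is enough to capture most variants at 1% frequency, the chance that an allele appears at least once in the sample can be estimated as 1 - (1 - f)^(2N), where f is the allele frequency and N the number of diploid individuals. This is a simplified sketch: it assumes chromosomes are sampled independently from a single population and ignores whether low-coverage sequencing would actually detect the allele. The function name is hypothetical.

```python
def prob_allele_sampled(frequency, n_individuals):
    """Probability that an allele of the given population frequency is carried
    by at least one of 2 * n_individuals independently sampled chromosomes."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - frequency) ** n_chromosomes


# Sub-phase 1 sample size from the card above, with a 1% allele
p = prob_allele_sampled(0.01, 1092)
```

Under these assumptions a 1% allele is almost certain to be present among 1092 individuals, so the limiting factor is detection at low coverage rather than sampling.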
4
Q
Haplotype Mapping (HapMap) Project
A
- Alleles of SNPs that are close together tend to be inherited together
- Set of associated SNP alleles in chromosomal region is called a haplotype
- Most chromosome regions only have a few common haplotypes (and this accounts for the majority of variation from person to person)
- HapMap is a catalogue of common genetic variants, although it does not make any links between variants and disease (just provides info that other researchers can use to make a connection)
- It identified SNPs in multiple individuals, recorded the haplotypes and then identified the ‘tag’ SNPs in each haplotype
- The SNP tags can then be tested in each person to identify their collection of haplotypes
- Studies using HapMap data have identified over 150 risk loci for 60 common diseases
- HapMap data is included in dbSNP
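The idea of 'tag' SNPs can be sketched in Python: given the common haplotypes in a region, look for the smallest set of SNP positions whose alleles tell every haplotype apart. This is a brute-force toy example, not the method used by the HapMap project. The haplotype strings and function name are made up for illustration.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions (0-based) whose alleles
    distinguish every distinct haplotype from the others (brute force)."""
    unique = sorted(set(haplotypes))
    n_snps = len(unique[0])
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            signatures = {tuple(h[p] for p in positions) for h in unique}
            if len(signatures) == len(unique):
                return positions
    return tuple(range(n_snps))


# Hypothetical common haplotypes over six SNPs (toy data, not from HapMap)
common_haplotypes = ["ACGTAC", "ACGTGT", "GTATAC", "GTGTGT"]
tags = find_tag_snps(common_haplotypes)  # (0, 4)
```

Here typing just the first and fifth SNPs is enough to tell which of the four haplotypes a chromosome carries, which is why testing tag SNPs is cheaper than testing every SNP in the region.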
5
Q
NHLBI Exome Sequencing Project (ESP) Exome Variant Server
A
- Goal is to discover novel genes and mechanisms contributing to heart, lung and blood disorders using NGS
- Final dataset includes 2203 African-Americans and 4300 European-Americans, and is made up of control samples, samples from the extremes of specific traits (e.g. blood pressure) and samples with specific diseases (e.g. early onset stroke, lung diseases)
6
Q
ExAC
A
- Exome Aggregation Consortium
- Goal was to combine large-scale exome sequencing data to make it available to the wider community
- Dataset includes over 60K people (includes 1000 Genomes and ESP data)
- Now also includes CNV calls
- Can search by gene, transcript, variant or region
7
Q
GnomAD
A
- Updated version of ExAC: includes genome data as well as allele frequencies from 5000 Ashkenazi Jewish individuals
- Brings together exome (123,000 people) and genome (15,000 people) data
- People with severe paediatric disease (and their first degree relatives) have been removed, but it does contain late onset and reduced penetrance variants
- All data (gathered from multiple projects) has been analysed in the same bioinformatic manner to increase consistency; however, the sequencing platforms do differ, so coverage across samples is different
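Because coverage differs between samples, the number of chromosomes successfully genotyped at a site (the allele number, AN) varies between the exome and genome subsets. One way to get a combined frequency is to sum the allele counts (AC) and allele numbers rather than averaging the two frequencies. This is a minimal Python sketch. The function name and the counts are hypothetical and are not real GnomAD values.

```python
def pooled_allele_frequency(datasets):
    """Combine (allele_count, allele_number) pairs from several datasets.

    Summing counts weights each dataset by how many chromosomes were
    actually genotyped at the site. Returns None if nothing was genotyped.
    """
    total_ac = sum(ac for ac, _ in datasets)
    total_an = sum(an for _, an in datasets)
    if total_an == 0:
        return None
    return total_ac / total_an


# Hypothetical counts at one site: (AC, AN) for exomes, then genomes
combined = pooled_allele_frequency([(10, 200000), (5, 30000)])
```

Averaging the two frequencies directly would give the smaller genome subset the same weight as the much larger exome subset, which is why the counts are pooled instead.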
8
Q
Database of Genomic Variants (DGV)
A
- Online resource of freely available structural variation data (variants larger than 50 bp) found in healthy controls
- Currently it holds data from 67 studies (mostly array and NGS based studies), giving a total of over 2.5 million entries from over 22k genomes
9
Q
COSMIC
A
- Catalogue of somatic mutations in cancer
- Designed to store and display somatic mutation info (not germline mutations)
- Contains expert manual curation data (to give accurate frequency data), and systematic genome-wide screen data (providing unbiased molecular profiling of diseases to find driver genes in cancer)
10
Q
The Cancer Genome Atlas
A
- Contains clinical info, genomic characterization, and sequence analysis of tumour genomes
- Covers a broad range of phenotypes, but groups data by cancer type, histopathology slides, and molecular info