genome variation Flashcards
1
Q
changes at the DNA level
A
- substitutions
- deletions
- duplications
- insertions
- inversions
- translocations
- complex changes
2
Q
changes at the RNA level
A
- RNA processing/initiation
- promoter/initiation site
- affects amount of RNA
- splicing
- abolition of splice site
- activation of intronic splice site producing a pseudoexon
- large deletions
- e.g. part of promoter
3
Q
changes at protein level
A
- silent/synonymous changes
- same amino acid
- substitution, deletion, insertion
- missense (different amino acid)
- nonsense (stop codon)
- duplication, insertion, translocation
- complex rearrangements
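A minimal sketch of how a single-codon substitution maps onto the protein-level categories above. It assumes the standard genetic code and DNA-style codons (T rather than U); the function name `classify_substitution` is made up for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (silent)
    if alt_aa == "*":
        return "nonsense"     # new stop codon
    if ref_aa == "*":
        return "stop-lost"    # original stop codon now encodes an amino acid
    return "missense"         # different amino acid


print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```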
4
Q
importance of genomic variation
A
- cause of disease
- risk of developing disease
- especially with multiple variants
- response to drugs
5
Q
1000 genomes project
A
- DNA of 1000 healthy adults sequenced
- 14 populations
- ancestry from all continents
- low coverage whole genome, and targeted deep exome sequence data
- 4x coverage
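A rough illustration of what a coverage figure means: mean coverage is total sequenced bases divided by genome size. The read length and genome size below are illustrative assumptions, not values from the project.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed values: 100 bp reads and a 3 Gb genome.
GENOME_SIZE = 3_000_000_000
READ_LENGTH = 100

print(mean_coverage(120_000_000, READ_LENGTH, GENOME_SIZE))
print(reads_needed(4, READ_LENGTH, GENOME_SIZE))
```

At low coverage some positions are sequenced only once or not at all in a given individual, which is why the project paired it with deep sequencing of targeted exome regions.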
6
Q
discovery aims of 1000 Genomes Project
A
- single nucleotide variants with frequency >1%
- single nucleotide variants with frequency 0.1-0.5% in gene regions
- structural variants
- CNVs, insertions/deletions, inversions
- estimated frequencies of variant alleles
7
Q
genetic variant databases
A
- gene specific
- p53 gene
- disease specific
- cystic fibrosis
- large scale
- dbSNP
8
Q
dbSNP
A
- database of short sequence variations
- all organisms
- single nucleotide substitutions, insertions/deletions
- any variations >50 nt in separate dbVar database
9
Q
dbSNP information
A
- submissions (public and private)
- validation
- evidence used
- refSNP (rs) ID number grouping the submissions when there is more than one for the same variant
- ancestral allele
- by comparison between human and chimp
- links to other relevant resources
- molecular and functional consequences
- computed or submitted
- global MAF
- association studies from GWAS
10
Q
MAF
A
- minor allele frequency
- default global population frequency
- based on 2500 individuals worldwide from phase 3 of 1000 GP
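A small sketch of how a minor allele frequency could be computed from diploid genotypes. The genotype strings are made-up example data, and the function name is hypothetical. The minor allele is taken as the second most common allele, which for a biallelic site is simply the rarer one.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the frequency of the second most common allele.

    genotypes: iterable of diploid genotype strings such as "AG".
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    total = sum(counts.values())
    ordered = sorted(counts.values(), reverse=True)
    return ordered[1] / total


# Made-up sample of four individuals: 5 A alleles and 3 G alleles.
sample = ["AA", "AG", "GG", "AA"]
print(minor_allele_frequency(sample))
```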
11
Q
whole exome sequencing
A
- all exons of protein coding genes sequenced (1% of genome)
- useful for identifying novel rare disease variants
- results analysed using large databases of known variants
- ExAC
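One way the database comparison might look in practice: keep only candidate variants that are rare or absent in a reference population. This is a sketch with made-up variants and frequencies; the 0.1% cut-off and the names `filter_rare` and `population_af` are illustrative assumptions, not part of any real tool.

```python
def filter_rare(variants, population_af, max_af=0.001):
    """Keep variants whose population allele frequency is at most max_af.

    variants: list of (chromosome, position, ref, alt) tuples.
    population_af: dict mapping the same tuples to allele frequencies.
    """
    rare = []
    for variant in variants:
        # A variant missing from the reference database is treated as frequency 0.
        allele_frequency = population_af.get(variant, 0.0)
        if allele_frequency <= max_af:
            rare.append(variant)
    return rare


# Made-up example data.
candidates = [("1", 1000, "A", "G"), ("2", 5000, "C", "T"), ("7", 42, "G", "A")]
known = {("1", 1000, "A", "G"): 0.12, ("2", 5000, "C", "T"): 0.0004}

print(filter_rare(candidates, known))
```

Common variants are unlikely to cause a rare disease, so filtering on frequency shrinks the candidate list before any further analysis.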
12
Q
ExAC
A
- exome aggregation consortium database
- exome sequences of >60,000 unrelated individuals
- from many disease-specific and population genetics studies
- adult-onset diseases
- no homozygous variants causing childhood-onset mendelian diseases
- gives allele frequency
- 2014
13
Q
ClinVar
A
- aggregates information about genomic variation and its relationship to human health
- germline and somatic variants
- clinical significance from submitters
- from all submitted records for same variant
- indicates consensus or conflict
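A simplified sketch of how several submitted interpretations of one variant might be summarised as consensus or conflict. The grouping below is an assumption made for illustration and is not ClinVar's actual aggregation logic.

```python
# Assumed grouping of clinical significance terms into broad classes.
SIGNIFICANCE_GROUPS = {
    "pathogenic": "pathogenic",
    "likely pathogenic": "pathogenic",
    "benign": "benign",
    "likely benign": "benign",
    "uncertain significance": "uncertain",
}


def summarise_submissions(submissions):
    """Report consensus or conflict across submitter interpretations."""
    groups = {SIGNIFICANCE_GROUPS[term.lower()] for term in submissions}
    if not groups:
        return "no assertion"
    if len(groups) == 1:
        return "consensus: " + groups.pop()
    return "conflict"


print(summarise_submissions(["Pathogenic", "Likely pathogenic"]))
print(summarise_submissions(["Pathogenic", "Benign"]))
```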
- interpretation for >200,000 variants
14
Q
OMIM
A
- online mendelian inheritance in man
- catalogue of human genes and mendelian genetic disorders
- 6000 phenotypes with known molecular basis
- 4000 genes with phenotype-causing mutations
15
Q
humsavar
A
- human polymorphisms and disease mutations
- single amino acid variants only (missense)
- 75,000 in total
- disease variants
- polymorphisms
- unclassified variants
- links to other databases
- uniprot, dbSNP, OMIM
16
Q
COSMIC
A
- catalogue of somatic mutations in cancer
- >4 million coding mutations reported
- combined genome-wide results from >28,000 tumours
- displays distribution of types of variants for each gene
- tissue specific variant information
17
Q
in silico prediction
A
- distinguish between disease-causing and neutral variants
- expensive to do wet lab work
- interpret large datasets of rare variants
- prioritisation and characterisation
- requires knowledge of amino acid sequence and effects of variants
- mutability
18
Q
amino acid mutability
A
- from 1000 GP
- arginine most mutable
- 4/6 codons encoding arg have CpG
- leucine low mutability
- no codons with CpG dinucleotides
- CpG mutates at higher rate than other dinucleotides
- more complex amino acids have lowest mutabilities
- trp, phe
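The codon counts for arginine and leucine can be checked directly. The sketch below assumes the standard genetic code and counts, for each amino acid, how many of its codons contain a CG dinucleotide. It only looks inside codons; a CpG can also span the boundary between two neighbouring codons, which this does not capture.

```python
# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cpg_codon_counts():
    """Map each amino acid to (number of codons, number of codons containing CG)."""
    counts = {}
    for codon, amino_acid in CODON_TABLE.items():
        total, with_cpg = counts.get(amino_acid, (0, 0))
        counts[amino_acid] = (total + 1, with_cpg + ("CG" in codon))
    return counts


counts = cpg_codon_counts()
print("Arg:", counts["R"])
print("Leu:", counts["L"])
```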
19
Q
in silico prediction programs
A
- >30 available for amino acid substitutions
- for missense:
- sequence based methods
- e.g. SIFT
- sequence and structure based combinations
- e.g. PolyPhen-2
20
Q
SIFT
A
- sorting intolerant from tolerant
- align homologous sequences to query
- identify conserved residues unlikely to tolerate substitutions
- PSI-BLAST to create MSA of similar sequences
- calculate probability of any of the 20 amino acids at each position
- calculate output score
- probability for each of the 19 possible substitutions at each position in the aligned target protein
- 0.05 or less indicates deleterious
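A toy sketch of the idea behind the steps above, not the real SIFT algorithm. It assumes an alignment is already available (no PSI-BLAST step), uses plain residue frequencies with a small pseudocount, and scales them so the most common residue at a position scores 1. The pseudocount and the scaling are assumptions for illustration; only the 0.05 cut-off comes from the card.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_scores(column, pseudocount=0.01):
    """Score each of the 20 amino acids at one alignment column."""
    counts = Counter(column)
    raw = {aa: counts.get(aa, 0) + pseudocount for aa in AMINO_ACIDS}
    top = max(raw.values())
    # Scale so that the most frequently observed residue scores 1.0.
    return {aa: raw[aa] / top for aa in AMINO_ACIDS}


def predict(alignment, position, substitution, threshold=0.05):
    """Return (score, label) for substituting a residue at a 0-based position."""
    column = [sequence[position] for sequence in alignment]
    score = position_scores(column)[substitution]
    label = "deleterious" if score <= threshold else "tolerated"
    return score, label


# Made-up alignment of four homologous sequences.
alignment = ["MKLV", "MKIV", "MRLV", "MKLV"]

print(predict(alignment, 0, "L"))  # position 0 is fully conserved (M)
print(predict(alignment, 2, "I"))  # position 2 already varies between L and I
```

A substitution at a fully conserved position scores close to zero and is called deleterious, while a residue already seen in homologs at a variable position scores higher and is called tolerated.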
21
Q
PolyPhen-2
A
- polymorphism phenotyping
- structures and annotations of residues improve prediction accuracy
- also qualitative assessment
- benign, possibly damaging, probably damaging
- compare WT allele properties to mutant
- evaluate sequence conservation by aligning homologous sequences
- predicts effect of single substitution or large number (batch mode)
- also database search of already computed predictions
22
Q
HumDiv
A
- default PolyPhen-2 classifier
- 3000 damaging alleles causing human mendelian disease (uniprot)
- 6000 differences between human proteins and close mammalian homologs assumed as non-damaging
23
Q
HumVar
A
- another classifier for PolyPhen-2
- 13000 human disease-causing mutations (uniprot)
- 9000 human non-synonymous amino acid substitutions with no annotated disease involvement (non-damaging)
- more appropriate for mendelian disease associated substitutions
- drastic effect
24
Q
SAAPs
A
- single amino acid polymorphism data analysis
- SAAPdap (data analysis pipeline)
- SAAPpred (predictor)
- analyses likely structural effect of amino acid substitution
- considers residue conservation
- requires experimental structure
25
Q
in silico prediction
focus on amino acid substitutions
A
- many changes are very subtle
- difficult to identify in wet lab work
- easier to study protein than non-coding regions of DNA
- improve understanding of how mutation affects protein