genome variation Flashcards

1
Q

changes at the DNA level

A
  • substitutions
  • deletions
  • duplications
  • insertions
  • inversions
  • translocations
  • complex changes
2
Q

changes at the RNA level

A
  • RNA processing/initiation
    • promoter/initiation site
    • affects amount of RNA
  • splicing
    • abolition of splice site
    • activation of a cryptic intronic splice site, producing a pseudoexon
  • large deletions
    • e.g. part of promoter
3
Q

changes at protein level

A
  • silent/synonymous changes
    • same amino acid
  • substitution, deletion, insertion
  • missense (different amino acid)
  • nonsense (stop codon)
  • duplication, insertion, translocation
  • complex rearrangements
4
Q

importance of genomic variation

A
  • cause of disease
  • risk of developing disease
    • especially with multiple variants
  • response to drugs
5
Q

1000 Genomes Project

A
  • DNA of 1000 healthy adults sequenced
  • 14 populations
  • ancestry from all continents
  • low-coverage whole-genome and targeted deep exome sequence data
  • 4x coverage
6
Q

discovery aims of the 1000 Genomes Project

A
  • single nucleotide variants with frequency >1%
  • single nucleotide variants with frequency 0.1-0.5% in gene regions
  • structural variants
    • CNVs, insertions/deletions, inversions
  • estimated frequencies of variant alleles
7
Q

genetic variant databases

A
  • gene specific
    • p53 gene
  • disease specific
    • cystic fibrosis
  • large scale
    • dbSNP
8
Q

dbSNP

A
  • database of short sequence variations
  • all organisms
  • single nucleotide substitutions, insertions/deletions
  • variants >50 nt are held in the separate dbVar database
9
Q

dbSNP information

A
  • submissions (public and private)
  • validation
    • evidence used
    • one refSNP (rs) ID clusters all submissions for the same variant
  • ancestral allele
    • by comparison between human and chimp
  • links to other relevant resources
  • molecular and functional consequences
    • computed or submitted
  • global MAF
  • association studies from GWAS
10
Q

MAF

A
  • minor allele frequency
  • default global population frequency
  • based on 2500 individuals worldwide from phase 3 of 1000 GP
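The MAF itself is simple arithmetic on genotype counts. A minimal sketch, assuming a biallelic site and diploid individuals; the genotype counts below are hypothetical, not 1000 Genomes data, and the function name is illustrative only:

```python
def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """MAF of a biallelic site from diploid genotype counts (hypothetical helper)."""
    n_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)  # two alleles per person
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Heterozygotes carry one alternate allele, alt homozygotes carry two
    alt_freq = (n_het + 2 * n_alt_hom) / n_alleles
    # The minor allele is whichever of the two alleles is rarer
    return min(alt_freq, 1.0 - alt_freq)


# Hypothetical counts for 2500 individuals: 600 alt alleles out of 5000
print(minor_allele_frequency(2000, 400, 100))
```

Taking the minimum means the same MAF is reported whichever allele happens to be labelled "alternate".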
11
Q

whole exome sequencing

A
  • all exons of protein-coding genes sequenced (~1% of genome)
  • useful for identifying novel rare disease variants
  • results analysed using large databases of known variants
    • ExAC
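One common use of such a database is to discard variants that are too common in the general population to cause a rare disease. A minimal sketch, assuming a plain dictionary stands in for a real ExAC lookup; the variant keys, frequencies and the 0.1% cutoff are all hypothetical:

```python
# Hypothetical stand-in for a population database such as ExAC:
# variant key ("chrom-pos-ref-alt") -> allele frequency. Values are invented.
POPULATION_AF = {
    "1-12345-A-G": 0.23,
    "2-67890-C-T": 0.0004,
}


def rare_candidates(variants, af_lookup, max_af=0.001):
    """Keep variants that are absent from af_lookup or rarer than max_af."""
    # Variants missing from the database are treated as frequency 0 (novel)
    return [v for v in variants if af_lookup.get(v, 0.0) < max_af]


patient_variants = ["1-12345-A-G", "2-67890-C-T", "3-11111-G-A"]
print(rare_candidates(patient_variants, POPULATION_AF))
```

Here the common variant is dropped, while the rare one and the novel one (absent from the database) stay as candidates.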
12
Q

ExAC

A
  • Exome Aggregation Consortium database
  • exome sequences of >60,000 unrelated individuals
    • from many disease-specific and population genetics studies
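  • cohorts mainly from studies of adult-onset diseases
  • individuals with severe paediatric disease excluded, so it should contain no homozygous variants causing childhood-onset mendelian diseases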
  • gives allele frequency
  • 2014
13
Q

ClinVar

A
  • aggregates information about genomic variation and its relationship to human health
  • germline and somatic variants
  • clinical significance from submitters
    • from all submitted records for same variant
    • indicates consensus or conflict
  • interpretation for >200,000 variants
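The consensus/conflict idea can be sketched as a check over all submitted classifications for one variant. This is a deliberately simplified rule, not ClinVar's actual review logic (which, for instance, has its own handling of closely related terms such as "Pathogenic" and "Likely pathogenic"); the "no interpretation" label is hypothetical:

```python
def aggregate_significance(submissions: list[str]) -> str:
    """Summarise submitters' classifications of one variant (simplified rule)."""
    distinct = set(submissions)
    if not distinct:
        return "no interpretation"
    if len(distinct) == 1:
        # Every submitter agrees: report the shared classification
        return next(iter(distinct))
    # Submitters disagree: flag the conflict rather than picking a winner
    return "conflicting interpretations"


print(aggregate_significance(["Pathogenic", "Pathogenic"]))
print(aggregate_significance(["Pathogenic", "Benign"]))
```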
14
Q

OMIM

A
  • Online Mendelian Inheritance in Man
  • catalogue of human genes and mendelian genetic disorders
  • 6000 phenotypes with known molecular basis
  • 4000 genes with phenotype-causing mutations
15
Q

humsavar

A
  • human polymorphisms and disease mutations
  • single amino acid variants only (missense)
  • 75,000 in total
    • disease variants
    • polymorphisms
    • unclassified variants
  • links to other databases
    • UniProt, dbSNP, OMIM
16
Q

COSMIC

A
  • Catalogue Of Somatic Mutations In Cancer
  • >4 million coding mutations reported
  • combined genome-wide results from >28,000 tumours
  • displays distribution of types of variants for each gene
  • tissue specific variant information
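The per-gene distribution of variant types amounts to a grouped count. A minimal sketch, assuming mutations arrive as hypothetical (gene, mutation_type) pairs; the records are invented, not real COSMIC data, and this is not how COSMIC itself stores or computes anything:

```python
from collections import Counter, defaultdict


def variant_type_distribution(records):
    """Count mutation types per gene from (gene, mutation_type) pairs."""
    by_gene = defaultdict(Counter)
    for gene, mutation_type in records:
        by_gene[gene][mutation_type] += 1
    return by_gene


# Invented example records, not real COSMIC data
records = [
    ("TP53", "missense"),
    ("TP53", "missense"),
    ("TP53", "nonsense"),
    ("KRAS", "missense"),
]
for gene, counts in variant_type_distribution(records).items():
    total = sum(counts.values())
    # Print each type as a fraction of that gene's mutations
    print(gene, {t: n / total for t, n in counts.items()})
```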
17
Q

in silico prediction

A
  • distinguish between disease-causing and neutral variants
    • wet-lab testing of every variant is expensive
  • interpret large datasets of rare variants
    • prioritisation and characterisation
  • requires knowledge of amino acid sequence and effects of variants
    • mutability
18
Q

amino acid mutability

A
  • from 1000 GP
  • arginine most mutable
    • 4/6 codons encoding arg have CpG
  • leucine low mutability
    • no codons with CpG dinucleotides
  • CpG mutates at higher rate than other dinucleotides
  • more complex amino acids have lowest mutabilities
    • trp, phe
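The codon-level explanation can be checked directly against the standard genetic code. A minimal sketch that counts, for each amino acid, how many of its codons contain a CpG; it only looks within a codon and ignores CpGs spanning a codon boundary, and it does not reproduce the empirical 1000 GP mutability estimates:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons listed in TCAG order ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def cpg_codon_counts():
    """For each amino acid, return (codons containing CG, total codons)."""
    counts = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        with_cpg, total = counts.get(aa, (0, 0))
        counts[aa] = (with_cpg + ("CG" in codon), total + 1)
    return counts


counts = cpg_codon_counts()
for aa in "RLWF":
    print(aa, counts[aa])
# R (4, 6)   L (0, 6)   W (0, 1)   F (0, 2)
```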
19
Q

in silico prediction programs

A
  • >30 available for amino acid substitutions
  • for missense:
    • sequence based methods
      • e.g. SIFT
    • sequence and structure based combinations
      • e.g. PolyPhen-2
20
Q

SIFT

A
  • Sorting Intolerant From Tolerant
  • align homologous sequences to query
  • identify conserved residues unlikely to tolerate substitutions
  • PSI-BLAST to create MSA of similar sequences
  • calculate probability of any of the 20 amino acids at each position
  • calculate output score
    • probability for each of the 19 possible substitutions at each position in the aligned target protein
    • 0.05 or less indicates deleterious
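The scoring idea can be sketched for a single alignment column: estimate how often each amino acid appears at that position, scale so the most frequent residue scores 1, and call anything at or below 0.05 deleterious. This is a crude illustration, not SIFT itself; the real tool builds the alignment with PSI-BLAST and also weights sequences and adds pseudocounts so unseen residues are not scored exactly zero. The column below is hypothetical:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sift_like_scores(column: str) -> dict[str, float]:
    """Crude SIFT-style scores for one MSA column (no pseudocounts or weights)."""
    residues = [aa for aa in column if aa in AMINO_ACIDS]  # drop gaps
    if not residues:
        raise ValueError("column contains no residues")
    counts = Counter(residues)
    probs = {aa: counts[aa] / len(residues) for aa in AMINO_ACIDS}
    top = max(probs.values())
    # Normalise so the most frequent residue at this position scores 1.0
    return {aa: p / top for aa, p in probs.items()}


def is_deleterious(score: float, cutoff: float = 0.05) -> bool:
    return score <= cutoff


column = "L" * 19 + "I"  # hypothetical aligned position: 19 Leu, 1 Ile
scores = sift_like_scores(column)
print(round(scores["I"], 3), is_deleterious(scores["I"]))
print(scores["W"], is_deleterious(scores["W"]))
```

Ile scores just above the cutoff (1/19) and is tolerated, while Trp, never observed here, scores 0 and is predicted deleterious.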
21
Q

PolyPhen-2

A
  • Polymorphism Phenotyping v2
  • structures and annotations of residues improve prediction accuracy
  • also qualitative assessment
    • benign, possibly damaging, probably damaging
  • compare WT allele properties to mutant
  • evaluate sequence conservation by aligning homologous sequences
  • predicts effect of single substitution or large number (batch mode)
  • also database search of already computed predictions
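The qualitative labels come from applying two cutoffs to the predicted probability of being damaging. A minimal sketch; the default cutoffs (0.453 and 0.957) are the HumDiv values as commonly reported, for example by dbNSFP, and should be treated as assumptions to check against the PolyPhen-2 documentation, since each classifier has its own thresholds:

```python
def polyphen_class(score: float, possibly: float = 0.453, probably: float = 0.957) -> str:
    """Map a PolyPhen-2-style probability (0-1) to a qualitative label.

    Default cutoffs are assumed HumDiv values; pass others for HumVar.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


for s in (0.12, 0.60, 0.99):
    print(s, polyphen_class(s))
```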
22
Q

HumDiv

A
  • default PolyPhen-2 classifier
  • 3000 damaging alleles causing human mendelian disease (uniprot)
  • 6000 differences between human proteins and close mammalian homologs assumed as non-damaging
23
Q

HumVar

A
  • another PolyPhen-2 classifier
  • 13000 human disease-causing mutations (UniProt)
  • 9000 human non-synonymous amino acid substitutions with no annotated disease involvement (non-damaging)
  • more appropriate for mendelian disease associated substitutions
    • drastic effect
24
Q

SAAPs

A
  • single amino acid polymorphism data analysis
  • SAAPdap (data analysis pipeline)
  • SAAPpred (predictor)
  • analyses likely structural effect of amino acid substitution
  • considers residue conservation
  • requires experimental structure
25
Q

in silico prediction

focus on amino acid substitutions

A
  • many changes are very subtle
    • difficult to identify in wet lab work
  • easier to study protein than non-coding regions of DNA
  • improve understanding of how mutation affects protein