Lecture 9 - Coding Vs Non Coding Variation Flashcards

1
Q

What is the most common human genetic variation?

A

Human polymorphism

Single nucleotide polymorphisms (SNPs), i.e. single nucleotide substitutions, are the most common type

  • Human genome has millions of SNPs
2
Q

Most common types of SNP? Their effect?

Challenge?

A

Most SNPs are NEUTRAL = no phenotype

Most SNPs are in NON-CODING REGIONS

—> Non-coding SNPs are likely more IMPORTANT IN COMPLEX DISEASES AND PHENOTYPES

The CHALLENGE: Which of the variants alter phenotype?

3
Q

Types of Functional Variants : 2

A
  1. CODING variation
  2. Non-coding / regulatory variation
4
Q

Types of Functional Variation: CODING -2

A

• Amino acid variation

• Splice/reading frame variation

5
Q

Types of functional variation: Non-coding variation - 4

A

• Transcriptional

• Post-transcriptional (mRNA processing)

• Non-coding RNA (miRNA, lncRNA)

• Epigenetic

7
Q

Every locus is unique …

A

Every locus is unique, so each requires a different approach

8
Q

Overview of Functional annotation

A

A.
- GWAS
- GENOME
- EXOME
- Linkage and sequence analysis

B. Sequencing on Chromosome

C. Variants of various functional classes

D. Comparative genomics

E. Biochemistry/ Structure

F. Experimental function

9
Q

Overview of Functional Enrichment

Where are the Regulatory elements?

A

Most functional variation is NON CODING AND REGULATORY

Genome
- coding 0.5%
- UTR 0.8%
- promoter 2.2%
- DHS 15.7%
- enhancer 3.2%
- intergenic DNA 52.0%
- Introns 28.8%

10
Q

Workflow for Variant Functional Identification - 5

A
  1. FINE MAPPING
  2. In silico annotation
  3. SNP function
  4. Target gene(s) identification
  5. Target gene function
11
Q

Fine mapping involves:

A

• Locus genotyping (sequencing)

• Statistical tests for independent associations

• LD mapping
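
LD mapping rests on a pairwise measure of linkage disequilibrium between SNPs. The sketch below computes the standard D and r² statistics from phased haplotypes. The function name `ld_r2` and the two-character haplotype encoding are assumptions made for illustration, not something taken from the lecture.

```python
from collections import Counter


def ld_r2(haplotypes):
    """Return (D, r2) for two biallelic SNPs.

    `haplotypes` is a list of 2-character strings such as "AG":
    the allele at SNP 1 followed by the allele at SNP 2 (assumed encoding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Use the alleles of the first haplotype as the reference pair.
    a, b = haplotypes[0][0], haplotypes[0][1]

    p_a = sum(c for h, c in counts.items() if h[0] == a) / n
    p_b = sum(c for h, c in counts.items() if h[1] == b) / n
    p_ab = counts[a + b] / n

    # D: how far the observed haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, d * d / denom


# Two SNPs that always travel together are in perfect LD.
d, r2 = ld_r2(["AG"] * 5 + ["CT"] * 5)
print(d, r2)
```

An r² near 1 means the two SNPs tag each other, so association alone cannot tell them apart; this is why fine mapping is needed before a lead SNP is treated as the functional one.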

12
Q

In silico annotation:

A
• Integrative tools (ENCODE)

• Exome-translation

• Intron-intergenic

13
Q

SNP function —> target gene/s identification

A
  • Coding: GERP, PolyPhen
  • Non-coding: GERP, RegulomeDB (ENCODE)
    ➢ Transcription factor element: EMSA, reporter gene, ChIP-seq
    ➢ miRNA: TargetScan, miRanda
    ➢ lncRNAs: REMSA
    ➢ Epigenetic variants: meQTL
14
Q

Target gene/s function

A

• Cell culture models, human tissues

• Isogenic models (CRISPR/Cas)

• Animal models: mouse KOs, zebrafish, Drosophila

15
Q

Identification of Coding-Region Variants: 3 types

A
  1. Frame-shift variation (indels etc.) -> protein truncations, fusions
  2. Base-substitution: 2 types
  3. Splice-site variation
    - (may change AA sequence, may have non-coding effects e.g. regulation of RNA processing)
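
As a rough illustration of the frame-shift case: an indel shifts the reading frame when the net length change is not a multiple of 3. The helper `is_frameshift` and the VCF-style `ref`/`alt` allele strings are assumptions for this sketch, and it only applies to variants lying entirely within coding sequence.

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """True when replacing `ref` with `alt` changes the coding length
    by a number of bases that is not a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0


print(is_frameshift("A", "AT"))    # 1-base insertion
print(is_frameshift("ATTT", "A"))  # 3-base deletion keeps the frame
```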
16
Q

2 types of Base-substitution: 5 points

A
  1. synonymous – base change with no amino acid change
  2. non-synonymous – base change with AA change
    3. Conservative – change to similar AA
    4. Semi-conservative – e.g. -ve to +ve charged AA
    5. Radical – AA with very different properties
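
The synonymous / non-synonymous split can be sketched by translating the reference and alternate codons with the standard genetic code. The function name `classify_substitution` is hypothetical. The sketch does not attempt the conservative / semi-conservative / radical grading, which depends on which amino acid property scale is chosen.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon: str, alt_codon: str):
    """Return (label, ref_aa, alt_aa) for a single-codon base substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, ref_aa, alt_aa


print(classify_substitution("GAA", "GAG"))  # third-base change, Glu stays Glu
print(classify_substitution("GAG", "GTG"))  # Glu to Val
```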

17
Q

Approaches to predicting Coding Variant Functions:

Tools for nucleotide-sequence-based prediction of deleteriousness

A

GERP : single site scoring - evolutionary

18
Q

Approaches to predicting Coding Variant Functions

Tools for protein-sequence-based prediction of deleteriousness

A

PolyPhen: trained classifier using evolutionary, biochemical and structural features

19
Q

Bioinformatics approaches to predicting coding variant function:

PolyPhen vs GERP

A

‘Genomic Evolutionary Rate Profiling (GERP)’
- ORTHOLOGOUS nucleotide sequences COMPARED to DETERMINE EVOLUTIONARY CONSTRAINTS TO CHANGE IN SEQUENCE

‘PolyPhen’
- PREDICTS impact of AA SUBSTITUTION on STRUCTURE & FUNCTION of a PROTEIN
- using PHYSICAL and COMPARATIVE CONSIDERATIONS

20
Q

Genomic Evolutionary Rate Profiling (GERP) in detail 5.

A
  1. Leverages comparative NUCLEOTIDE sequence information by looking for REGIONS THAT EXHIBIT EVIDENCE OF SELECTIVE CONSTRAINT
  2. IDENTIFIES CONSTRAINED ELEMENTS (strings of nucleotides) by QUANTIFYING SUBSTITUTION DEFICITS
    i.e. deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur due to selective pressure
  3. Remember: Conservation equals function
  4. GERP ESTIMATES CONSTRAINT FOR EACH ALIGNMENT COLUMN COMPARED TO NO CONSTRAINT
  5. R = sum over alignment columns of (expected rate – observed rate)
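
A minimal sketch of the sum in point 5, assuming the per-column expected (neutral) and observed substitution counts have already been estimated from a multiple alignment. The function name and the example numbers are hypothetical and are not output from GERP itself.

```python
def rejected_substitutions(expected, observed):
    """Sum the per-column substitution deficit (expected minus observed)
    across the alignment columns of one candidate element."""
    if len(expected) != len(observed):
        raise ValueError("need one expected and one observed value per column")
    return sum(e - o for e, o in zip(expected, observed))


# Three alignment columns; the neutral expectation is 2.0 substitutions each.
expected = [2.0, 2.0, 2.0]
observed = [0.5, 0.0, 1.5]
print(rejected_substitutions(expected, observed))
```

A large positive total means far fewer substitutions were seen than neutral evolution predicts, which is read as evidence of constraint; a total near zero looks like neutral DNA.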
21
Q

PolyPhen: Polymorphism Phenotyping

A
  1. Predicts impact of AA SUBSTITUTION on the STABILITY and FUNCTION OF A PROTEIN.
  2. Uses PHYSICAL (3-D structure) & COMPARATIVE EVOLUTIONARY COMPARISONS
  3. Estimates PROBABILITY of VARIANT BEING DAMAGING TO PROTEIN FUNCTION OR STRUCTURE.
  4. ‘Prediction outcome’ - PROBABLY DAMAGING, POSSIBLY DAMAGING, or BENIGN
  5. PolyPhen-2 is available as a web server.
22
Q

What is ENCODE? what does it do?

A
  1. A project to identify ALL FUNCTIONAL ELEMENTS IN THE HUMAN GENOME SEQUENCE.
  2. Transcription factor binding sites (ChIP-seq)
  3. DNase I Hypersensitive sites (DNase-seq)
  4. regRNA binding sites
  5. SNP catalogue (1000 genomes, GWAS)
23
Q

Prioritisation scores - RegulomeDB

A

Lower scores indicate increasing evidence for a variant to be located in a functional region.

Category 1 variants have equivalents in other categories, with the additional requirement of eQTL information.
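
Because lower scores mean stronger evidence, prioritising candidates amounts to sorting in ascending score order. The sketch below assumes scores are written as a category number optionally followed by a letter (for example 1a, 2b, 4); the variant IDs are placeholders rather than real rsIDs, and the function names are hypothetical.

```python
import re


def score_key(score: str):
    """Split a score such as "1b" into (1, "b") so that it sorts correctly."""
    m = re.fullmatch(r"(\d+)([a-z]?)", score)
    if m is None:
        raise ValueError(f"unrecognised score: {score!r}")
    return int(m.group(1)), m.group(2)


def prioritise(variants):
    """Order (variant_id, score) pairs from strongest to weakest evidence."""
    return sorted(variants, key=lambda v: score_key(v[1]))


candidates = [("snp_A", "4"), ("snp_B", "1b"), ("snp_C", "2a"), ("snp_D", "1a")]
for variant_id, score in prioritise(candidates):
    print(variant_id, score)
```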

23
Q

Bioinformatic Identification of Regulatory Variants:
‘Regulome DB’

A
  1. INTEGRATES FUNCTIONAL DATA contained in ENCODE with GENETIC VARIATION DATABASES
    (dbSNP, ClinVar etc)
  2. Predicts WHETHER VARIANTS are FUNCTIONAL - PRIORITISATION SCORE.
24
Q

Example: SNV rs9261424 overlaps many regulatory elements.

A

NFKB track for 3 individuals:
  1. homozygous for the reference allele (G),
  2. heterozygous, and
  3. homozygous for the alternate allele (C)
24
Q

Summary – Lead SNP to Functional SNP

A
  1. Genome
  2. Predicted motifs
  3. DNAseI Hypersensitivity Peaks
  4. ChIP-seq Peaks for TF1
  5. Linkage Disequilibrium
25
Q

What’s Next?

A

“A major goal will be to develop a unified, quantitative, predictive framework to estimate the prior probabilities for any given mutation to be both functionally relevant and disease-relevant, accounting for both computational and experimental sources of information.

A number of challenges must be met for such a framework to succeed”

  • need large collections of true positive (functional) and true negative (neutral) variants