Lecture 9 - Coding Vs Non Coding Variation Flashcards

Question 1

Q

What is the most common human genetic variation?

Answer

A

Human polymorphism: single nucleotide polymorphisms (SNPs)

SNPs are single-nucleotide substitutions and are the most common type of human genetic variation

The human genome has millions of SNPs

Question 2

Q

What are most SNPs like? What is their effect?

Challenge?

Answer

A

Most SNPs are NEUTRAL = no phenotype

Most SNPs are in NON-CODING REGIONS

—> Non-coding SNPs are likely more IMPORTANT IN COMPLEX DISEASES AND PHENOTYPES

The CHALLENGE: Which of the variants alter phenotype?

Question 3

Q

Types of Functional Variants : 2

Answer

A

CODING variation
Non-coding / regulatory variation

Question 4

Q

Types of Functional Variation: CODING (2 types)

Answer

A

• Amino acid variation

• Splice/reading frame variation

Question 5

Q

Types of functional variation: Non-coding variation (4 types)

Answer

A

• Transcriptional

• Post-transcriptional (mRNA processing)

• Non-coding RNA (miRNA, lncRNA)

• Epigenetic

Question 7

Q

Every locus is unique …

Answer

A

Every locus is unique, so each one requires a different approach

Question 8

Q

Overview of Functional annotation

Answer

A

A.
- GWAS
- GENOME
- EXOME
- Linkage and sequence analysis

B. Sequencing on Chromosome

C. Variants of various functional classes

D. Comparative genomics

E. Biochemistry/ Structure

F. Experimental function

Question 9

Q

Overview of Functional Enrichment

Where are the Regulatory elements?

Answer

A

Most functional variation is NON CODING AND REGULATORY

Genome
- coding 0.5%
- UTR 0.8%
- promoter 2.2%
- DHS 15.7%
- enhancer 3.2%
- intergenic DNA 52.0%
- Introns 28.8%

Question 10

Q

Workflow for Variant Functional Identification - 5

Answer

A

1. FINE MAPPING
2. In silico annotation
3. SNP function
4. Target gene(s) identification
5. Target gene function

Question 11

Q

Fine mapping involves:

Answer

A

• Locus genotyping (sequencing)

• Statistical tests for independent associations

• LD mapping
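
As a rough illustration of the LD mapping step, the sketch below computes the standard pairwise r² statistic between two biallelic SNPs from phased haplotype counts. The function name, the haplotype labels ('AB', 'Ab', 'aB', 'ab') and the counts are assumptions made for this example, not material from the lecture.

```python
def ld_r_squared(hap_counts):
    """Return r^2 between two biallelic SNPs.

    hap_counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab'
    (allele at SNP 1 followed by allele at SNP 2) to observed counts.
    """
    total = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / total                        # frequency of haplotype AB
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total    # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total    # frequency of allele B

    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Two SNPs that always travel together (perfect LD) versus two independent SNPs.
perfect = ld_r_squared({"AB": 50, "Ab": 0, "aB": 0, "ab": 50})
independent = ld_r_squared({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})
print(perfect, independent)
```

SNPs in high r² with the lead SNP cannot be separated by association statistics alone, which is why fine mapping is followed by annotation and functional testing.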

Question 12

Q

In silico annotation:

Answer

A

Integrative tools (ENCODE)

• Exome-translation

• Intron-intergenic

Question 13

Q

SNP function —> target gene/s identification

Answer

A

• Coding: GERP, PolyPhen
• Non-coding: GERP, RegulomeDB (ENCODE)
➢ Transcription factor element: EMSA, reporter gene, ChIP-seq
➢ miRNA: TargetScan, miRanda
➢ lncRNAs: REMSA
➢ Epigenetic variants: meQTL

Question 14

Q

Target gene/s function

Answer

A

• Cell culture models, human tissues

• Isogenic models (CRISPR/Cas)

• Animal models: mouse KOs, zebrafish, Drosophila

Question 15

Q

Identification of Coding-Region Variants: 3 types

Answer

A

1. Frame-shift variation (indels) etc -> protein truncations, fusions
2. Base-substitution: 2 types
3. Splice-site variation
   - (may change AA sequence, may have non-coding effects e.g. regulation of RNA processing)

Question 16

Q

2 types of Base-substitution: 5 points

Answer

A

1. Synonymous: base change with no amino acid change
2. Non-synonymous: base change with AA change
3. Conservative: change to similar AA
4. Semi-conservative: e.g. -ve to +ve charged AA
5. Radical: AA with very different properties
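
The synonymous / non-synonymous distinction can be checked mechanically with the standard genetic code. The following is a minimal sketch: the function name `classify_substitution` is made up for this example, and it only handles a single-base change within one codon (it does not grade non-synonymous changes as conservative, semi-conservative or radical).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a single-base change between two DNA codons.

    Returns (kind, ref_aa, alt_aa), where kind is 'synonymous'
    or 'non-synonymous'.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
```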

Question 17

Q

Approaches to predicting Coding Variant Functions:

Tools for nucleotide-sequence-based prediction of deleteriousness

Answer

A

GERP: single-site scoring based on evolutionary conservation

Question 18

Q

Approaches to predicting Coding Variant Functions

Tools for protein-sequence-based prediction of deleteriousness

Answer

A

PolyPhen: trained classifier using evolutionary, biochemical and structural information

Question 19

Q

Bioinformatics approaches to predicting coding variant function:

Polyphen vs GERP

Answer

A

‘Genomic Evolutionary Rate Profiling (GERP)’
- ORTHOLOGOUS nucleotide sequences COMPARED to DETERMINE EVOLUTIONARY CONSTRAINTS TO CHANGE IN SEQUENCE

‘PolyPhen’
- PREDICTS impact of AA SUBSTITUTION on STRUCTURE & FUNCTION of a PROTEIN
- using PHYSICAL and COMPARATIVE CONSIDERATIONS

Question 20

Q

Genomic Evolutionary Rate Profiling (GERP) in detail 5.

Answer

A

1. Leverages comparative NUCLEOTIDE sequence information by looking for REGIONS THAT EXHIBIT EVIDENCE OF SELECTIVE CONSTRAINT
2. IDENTIFIES CONSTRAINED ELEMENTS (strings of nucleotides) by QUANTIFYING SUBSTITUTION DEFICITS
   i.e. deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because of selective pressure
3. Remember: conservation equals function
4. GERP ESTIMATES CONSTRAINT FOR EACH ALIGNMENT COLUMN COMPARED TO NO CONSTRAINT
5. R = sum (expected rate - observed rate)
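
The R formula can be sketched in a few lines. This is an illustration of the arithmetic only, not GERP's actual implementation: the function name and the per-column numbers are invented, and it assumes the expected (neutral) and observed substitution values for each alignment column have already been estimated.

```python
def rejected_substitutions(expected, observed):
    """Return (per-column deficits, element score R).

    expected: substitutions expected per alignment column under neutrality
    observed: substitutions actually estimated for the same columns
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must cover the same columns")
    per_column = [e - o for e, o in zip(expected, observed)]
    return per_column, sum(per_column)


# Hypothetical three-column element: two constrained columns, one neutral.
per_column, r_score = rejected_substitutions(
    expected=[2.0, 2.0, 2.0],
    observed=[0.5, 0.0, 2.0],
)
print(per_column, r_score)
```

A column that changes as often as neutral DNA contributes nothing to R, while a column with fewer substitutions than expected contributes a positive deficit.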

Question 21

Q

PolyPhen: Polymorphism Phenotyping

Answer

A

Predicts the impact of an AA SUBSTITUTION on the STABILITY and FUNCTION OF A PROTEIN.
Uses PHYSICAL (3-D structure) & COMPARATIVE (evolutionary) CONSIDERATIONS.
Estimates the PROBABILITY of a VARIANT BEING DAMAGING TO PROTEIN FUNCTION OR STRUCTURE.
‘Prediction outcome’: PROBABLY damaging, POSSIBLY damaging, or BENIGN.
PolyPhen-2 is available as a web tool.

Question 22

Q

What is ENCODE? what does it do?

Answer

A

ENCODE (Encyclopedia of DNA Elements) is a project to identify ALL FUNCTIONAL ELEMENTS IN THE HUMAN GENOME SEQUENCE, including:
- Transcription factor binding sites (ChIP-seq)
- DNase I hypersensitive sites (DNase-seq)
- Regulatory RNA binding sites
- SNP catalogue (1000 Genomes, GWAS)

Question 23

Q

Prioritisation scores - RegulomeDB

Answer

A

Lower scores indicate increasing evidence that a variant is located in a functional region.

Category 1 variants have equivalents in other categories, with the additional requirement of eQTL information.
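
Because lower scores mean stronger evidence, prioritising a list of candidate variants comes down to sorting by score. The sketch below assumes scores are strings made of a category digit and an optional letter sub-rank (such as '1a', '2b' or '4'), and that letters order evidence within a category. The variant IDs are placeholders, not real rsIDs.

```python
def regulome_sort_key(score):
    """Turn a score such as '1a' or '4' into a sortable (category, sub-rank) pair."""
    return int(score[0]), score[1:]


def prioritise(variants):
    """Sort (variant_id, score) pairs from strongest to weakest evidence."""
    return sorted(variants, key=lambda variant: regulome_sort_key(variant[1]))


# Hypothetical candidates from one associated locus.
candidates = [("rsA", "4"), ("rsB", "1f"), ("rsC", "2b"), ("rsD", "1a")]
for variant_id, score in prioritise(candidates):
    print(variant_id, score)
```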

Question 24

Q

Bioinformatic Identification of Regulatory Variants:
‘Regulome DB’

Answer

A

INTEGRATES FUNCTIONAL DATA contained in ENCODE with GENETIC VARIATION DATABASES (dbSNP, ClinVar etc).
Predicts WHETHER VARIANTS are FUNCTIONAL: PRIORITISATION SCORE.

Question 25

Q

Example: SNV rs9261424 overlapping many regulatory elements

Answer

A

NFKB track for 3 individuals:
1. homozygous for the reference allele (G),
2. heterozygous, and
3. homozygous for the alternate allele (C)

Question 26

Q

Summary – Lead SNP to Functional SNP

Answer

A

Genome
Predicted motifs
DNAseI Hypersensitivity Peaks
ChIP-seq Peaks for TF1
Linkage Disequilibrium

Question 27

Q

What’s Next?

Answer

A

“A major goal will be to develop a unified, quantitative, predictive framework to estimate the prior probabilities for any given mutation to be both functionally relevant and disease-relevant, accounting for both computational and experimental sources of information. A number of challenges must be met for such a framework to succeed”

Need large collections of true positive (functional) and true negative (neutral) variants