Lecture 9 - Coding vs Non-Coding Variation Flashcards
What is the most common human genetic variation?
Human polymorphism
SNPs (single nucleotide polymorphisms), i.e. single nucleotide substitutions, are the most common
- Human genome has millions of SNPs
What are most SNPs like? What is their effect?
Challenge?
Most SNPs are NEUTRAL = no phenotype
Most SNPs are in NON-CODING REGIONS
—> Non-coding SNPs are likely more IMPORTANT IN COMPLEX DISEASES AND PHENOTYPES
The CHALLENGE: Which of the variants alter phenotype?
Types of Functional Variants: 2
- CODING variation
- Non-coding/regulatory variation
Types of Functional Variation: CODING - 2
• Amino acid variation
• Splice/reading frame variation
Types of functional variation: Non-Coding Variation - 4
• Transcriptional
• Post-transcriptional (mRNA processing)
• Non-coding RNA (miRNA, lncRNA)
• Epigenetic
Every locus is unique …
Every locus is unique, so each one requires a different approach
Overview of Functional annotation
A.
- GWAS
- GENOME
- EXOME
- Linkage and sequence analysis
B. Sequencing on Chromosome
C. Variants of various functional classes
D. Comparative genomics
E. Biochemistry/structure
F. Experimental function
Overview of Functional Enrichment
Where are the Regulatory elements?
Most functional variation is NON CODING AND REGULATORY
Genome
- coding 0.5%
- UTR 0.8%
- promoter 2.2%
- DHS (DNase I hypersensitive sites) 15.7%
- enhancer 3.2%
- intergenic DNA 52.0%
- Introns 28.8%
Workflow for Variant Functional Identification - 5
1. FINE MAPPING
2. In silico annotation
3. SNP function
4. Target gene(s) identification
5. Target gene function
Fine mapping involves:
• Locus genotyping (sequencing)
• Statistical tests for independent associations
• LD mapping
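LD mapping rests on measuring how strongly alleles at two SNPs co-occur on the same haplotypes. A minimal sketch, assuming two biallelic SNPs with known haplotype and allele frequencies (the function name `ld_stats` and the example frequencies are hypothetical), computes the standard pairwise measures D, |D'| and r²:

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> dict[str, float]:
    """Pairwise linkage disequilibrium between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    # D: observed haplotype frequency minus the frequency expected under independence
    d = p_ab - p_a * p_b
    # |D'|: D scaled by the largest value it could take given these allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2: squared correlation between the allele indicators at the two SNPs
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# SNPs that always travel together (perfect LD) vs. SNPs that are independent
print(ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3))
print(ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5))
```

In fine mapping, SNPs with high r² to the lead SNP (r² ≥ 0.8 is a common, study-specific cut-off) are usually carried forward as candidate causal variants.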
In silico annotation:
- Integrative tools (ENCODE)
• Exome-translation
• Intron-intergenic
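A basic in silico annotation step asks whether a variant falls in an exon, an intron or intergenic DNA. A minimal sketch, assuming a hypothetical gene model with 1-based inclusive coordinates (the `Gene` class, `GENE_X` and its coordinates are made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int                     # 1-based, inclusive
    end: int                       # 1-based, inclusive
    exons: list[tuple[int, int]]   # (start, end) pairs, 1-based, inclusive


def annotate_position(pos: int, genes: list[Gene]) -> str:
    """Label a position as exonic, intronic or intergenic relative to the gene models."""
    for gene in genes:
        if gene.start <= pos <= gene.end:
            if any(ex_start <= pos <= ex_end for ex_start, ex_end in gene.exons):
                # Note: this sketch does not separate UTR exons from coding exons
                return f"exonic ({gene.name})"
            return f"intronic ({gene.name})"
    return "intergenic"


GENES = [Gene("GENE_X", 1000, 5000, [(1000, 1200), (3000, 3300), (4800, 5000)])]
for pos in (1100, 2000, 8000):
    print(pos, annotate_position(pos, GENES))
```

Exonic hits would then go on to translation-level (coding) analysis, while intronic and intergenic hits go to regulatory annotation.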
SNP function —> target gene/s identification
- Coding – GERP, PolyPhen
- Non-coding – GERP, RegulomeDB (ENCODE)
➢ Transcription factor element – EMSA, reporter gene, CHA-seq
➢ miRNA – TargetScan, miRanda
➢ lncRNAs – REMSA
➢ Epigenetic variants – MeQTL
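Seed matching is the core idea behind miRNA target prediction tools such as TargetScan: a 3' UTR site complementary to miRNA nucleotides 2-8 can be bound by that miRNA. A minimal sketch, assuming a made-up miRNA and UTR sequence and counting only exact 7-nt seed matches, checks whether a SNP destroys or creates such a site. It is a simplification, not TargetScan's or miRanda's actual scoring, and the function names are hypothetical:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def seed_site(mirna: str) -> str:
    """DNA sequence of a target site complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8].replace("U", "T")  # nt 2-8 (0-based slice 1:8), RNA -> DNA
    return revcomp(seed)


def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) exact occurrences of site in seq."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


def snp_effect_on_seed_sites(utr: str, pos: int, ref: str, alt: str, mirna: str):
    """Return (sites with ref allele, sites with alt allele); pos is 0-based."""
    if utr[pos] != ref:
        raise ValueError("reference base does not match the UTR sequence")
    alt_utr = utr[:pos] + alt + utr[pos + 1:]
    site = seed_site(mirna)
    return count_sites(utr, site), count_sites(alt_utr, site)


MIRNA = "UCAGGACUAGCAUGCAAGUCGA"   # toy sequence, not a real miRNA
UTR = "TTTAGTCCTGAAA"              # toy 3' UTR fragment
print(seed_site(MIRNA))
print(snp_effect_on_seed_sites(UTR, 5, "T", "C", MIRNA))
```

Here the T>C change removes the only seed match, the kind of loss-of-site variant these tools are used to flag.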
Target gene/s function
• Cell culture models, human tissues
• Isogenic models (CRISPR/Cas)
• Animal models – mouse KOs, zebrafish, Drosophila
Identification of Coding-Region Variants: 3 types
- Frame-shift variation (indels) etc —> protein truncations, fusions
- Base-substitution: 2 types
- Splice-site variation (may change AA sequence; may have non-coding effects, e.g. regulation of RNA processing)
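Whether a coding indel causes a frame-shift depends only on its length relative to the 3-base codon. A minimal sketch (the function name `indel_effect` is hypothetical, and it ignores indels that span splice sites or the stop codon):

```python
def indel_effect(ref: str, alt: str) -> str:
    """Classify a coding-region indel by whether it shifts the reading frame.

    ref/alt are the reference and alternate alleles (e.g. as written in a VCF).
    """
    length_change = len(alt) - len(ref)
    if length_change == 0:
        raise ValueError("ref and alt have the same length: not an indel")
    # Codons are 3 bases long, so only multiples of 3 keep the downstream frame
    if length_change % 3 != 0:
        return "frameshift"
    return "in-frame"


for ref, alt in [("A", "AT"), ("ATTT", "A"), ("AT", "A")]:
    print(ref, "->", alt, ":", indel_effect(ref, alt))
```

A frameshift changes every downstream codon and often runs into a premature stop codon, which is one route to protein truncation.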
2 types of Base-substitution: 5 points
1. Synonymous – base change with no amino acid (AA) change
2. Non-synonymous – base change with an AA change, which may be:
   3. Conservative – change to a similar AA
   4. Semi-conservative – e.g. -ve to +ve charged AA
   5. Radical – change to an AA with very different properties
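The five-way classification above can be sketched by translating the reference and alternate codons with the standard genetic code and comparing amino-acid properties. The codon table is standard; the property groups and the rule "charge reversal = semi-conservative" are hypothetical simplifications chosen to match the example in these notes, and the function name is made up:

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical, simplified amino-acid property groups ("*" = stop codon)
GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
    "stop": "*",
}
PROPERTY_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be 3 bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    ref_group, alt_group = PROPERTY_GROUP[ref_aa], PROPERTY_GROUP[alt_aa]
    if ref_group == alt_group:
        return "non-synonymous, conservative"
    if {ref_group, alt_group} == {"positive", "negative"}:
        # Charge reversal (the -ve to +ve charged AA example in the notes)
        return "non-synonymous, semi-conservative"
    # Everything else, including gain or loss of a stop codon in this sketch
    return "non-synonymous, radical"


for ref, alt in [("GAT", "GAC"), ("CTT", "ATT"), ("GAA", "AAA"), ("GGT", "GAT")]:
    print(ref, "->", alt, ":", classify_substitution(ref, alt))
```

The four example pairs are Asp>Asp, Leu>Ile, Glu>Lys and Gly>Asp, one for each class.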
Approaches to predicting Coding Variant Functions:
Tools for nucleotide-sequence-based prediction of deleteriousness
GERP : single site scoring - evolutionary
Approaches to predicting Coding Variant Functions
Tools for protein-sequence-based prediction of deleteriousness
Polyphen: trained classifier : evolutionary, biochemistry and structural
Bioinformatics approaches to predicting coding variant function:
Polyphen vs GERP
‘Genomic Evolutionary Rate Profiling (GERP)’
- ORTHOLOGOUS nucleotide sequences COMPARED to DETERMINE EVOLUTIONARY CONSTRAINTS TO CHANGE IN SEQUENCE
‘Polyphen’
- PREDICTS impact of AA SUBSTITUTION on STRUCTURE & FUNCTION of a PROTEIN
- using PHYSICAL and COMPARATIVE CONSIDERATIONS
Genomic Evolutionary Rate Profiling (GERP) in detail 5.
- leverage comparative NUCLEOTIDE sequence information by looking for REGIONS THAT EXHIBIT EVIDENCE OF SELECTIVE CONSTRAINT
- IDENTIFIES CONSTRAINED ELEMENTS (strings of nucleotides) by QUANTIFYING SUBSTITUTION DEFICITS
‘i.e. deficits represent substitutions that would have occurred if the element was neutral DNA, but didn’t occur due to selective pressure’
- Remember: conservation equals function
- GERP ESTIMATES CONSTRAINT FOR EACH ALIGNMENT COLUMN, COMPARED WITH NO CONSTRAINT (the neutral expectation)
- R = Σ (expected − observed substitution rate), summed over the columns of an element; a positive R indicates constraint
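The R score can be computed column by column once expected (neutral) and observed substitution rates are known. A minimal sketch, assuming those per-column rates have already been estimated (in GERP itself they come from a phylogenetic model fitted to the multiple alignment, which is not reproduced here); the function names and numbers are hypothetical:

```python
def rejected_substitutions(expected: list[float], observed: list[float]) -> list[float]:
    """Per-column score: substitutions expected under neutrality minus those observed."""
    if len(expected) != len(observed):
        raise ValueError("need one expected and one observed value per alignment column")
    return [e - o for e, o in zip(expected, observed)]


def element_score(expected: list[float], observed: list[float]) -> float:
    """R for a candidate element: sum of the per-column deficits."""
    return sum(rejected_substitutions(expected, observed))


# Hypothetical 5-column element: mostly fewer substitutions than expected if neutral
expected = [2.0, 2.0, 2.0, 2.0, 2.0]
observed = [0.5, 0.0, 1.5, 0.25, 2.5]
print(rejected_substitutions(expected, observed))
print(element_score(expected, observed))
```

Columns with a positive deficit point to constraint; the last column, with more substitutions than expected, scores negative and lowers the element total.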
PolyPhen: Polymorphism Phenotyping
- Predicts impact of AA SUBSTITUTION on the STABILITY and FUNCTION OF A PROTEIN.
- Uses PHYSICAL (3-D structure) & COMPARATIVE EVOLUTIONARY COMPARISONS
- Estimates PROBABILITY of VARIANT BEING DAMAGING TO PROTEIN FUNCTION OR STRUCTURE.
- ‘Prediction outcome’ - PROBABLY DAMAGING, POSSIBLY DAMAGING, or BENIGN
- PolyPhen-2 is available as a web server.
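The final PolyPhen step turns a probability of being damaging into one of the three outcome labels. A minimal sketch of that last step only; the default cutoffs are placeholders chosen for illustration, not PolyPhen-2's published thresholds, and the function name is hypothetical:

```python
def polyphen_style_label(probability: float,
                         possibly_cutoff: float = 0.15,
                         probably_cutoff: float = 0.85) -> str:
    """Map a 'probability damaging' score onto the three outcome labels.

    The default cutoffs are placeholders, NOT PolyPhen-2's published thresholds.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if not possibly_cutoff < probably_cutoff:
        raise ValueError("possibly_cutoff must be below probably_cutoff")
    if probability >= probably_cutoff:
        return "probably damaging"
    if probability >= possibly_cutoff:
        return "possibly damaging"
    return "benign"


for p in (0.95, 0.5, 0.05):
    print(p, polyphen_style_label(p))
```

The hard part of PolyPhen is estimating the probability from structural and evolutionary features; this sketch only shows how a probability becomes a label.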
What is ENCODE? What does it do?
- A project to identify ALL FUNCTIONAL ELEMENTS IN THE HUMAN GENOME SEQUENCE
- Transcription factor binding sites (ChIP-seq)
- DNase I hypersensitive sites (DNase-seq)
- regRNA binding sites
- SNP catalogue (1000 Genomes, GWAS)
Prioritisation scores - RegulomeDB
Lower scores indicate stronger evidence that a variant is located in a functional region.
Category 1 variants carry the same kinds of evidence as variants in the other categories, with the additional requirement of eQTL information.
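The ranking logic (lower category = more evidence; category 1 = the evidence of another category plus eQTL data) can be sketched as a simple rule cascade. This is a simplified, hypothetical scheme for illustration only: the evidence labels and the tiers for categories 2 to 7 are assumptions and do not reproduce RegulomeDB's published subcategories:

```python
def simplified_regulome_category(evidence: set[str]) -> int:
    """Lower category = more evidence. Hypothetical tiers loosely inspired by RegulomeDB."""
    ev = set(evidence)
    if {"tf_binding", "motif", "dnase_footprint", "dnase_peak"} <= ev:
        base = 2
    elif {"tf_binding", "motif", "dnase_peak"} <= ev:
        base = 3
    elif {"tf_binding", "dnase_peak"} <= ev:
        base = 4
    elif ev & {"tf_binding", "dnase_peak"}:
        base = 5   # a single line of binding/accessibility evidence
    elif "motif" in ev:
        base = 6   # sequence motif only
    else:
        base = 7   # no functional evidence
    # Category 1: functional evidence of another category plus eQTL information
    if "eqtl" in ev and base <= 5:
        return 1
    return base


print(simplified_regulome_category({"eqtl", "tf_binding", "dnase_peak"}))
print(simplified_regulome_category({"tf_binding", "motif", "dnase_footprint", "dnase_peak"}))
print(simplified_regulome_category({"motif"}))
```

Ordering the checks from most to least evidence guarantees each variant lands in the lowest category it qualifies for.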
Bioinformatic Identification of Regulatory Variants:
‘RegulomeDB’
- INTEGRATES FUNCTIONAL DATA contained in ENCODE with GENETIC VARIATION DATABASES (dbSNP, ClinVar, etc.)
- Predicts WHETHER VARIANTS are FUNCTIONAL, summarised as a PRIORITISATION SCORE
Example: SNV rs9261424, which overlaps many regulatory elements.
NFKB track for 3 individuals:
1. homozygous for the reference allele (G)
2. heterozygous
3. homozygous for the alternate allele (C)
Summary – Lead SNP to Functional SNP
- Genome
- Predicted motifs
- DNase I Hypersensitivity Peaks
- ChIP-seq Peaks for TF1
- Linkage Disequilibrium
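The tracks listed above are typically combined by filtering: keep SNPs in strong LD with the lead SNP, then favour those overlapping regulatory annotation. A minimal sketch, assuming all annotations are simple intervals on one chromosome; the SNP IDs, coordinates, the equal weighting of tracks and the r² ≥ 0.8 cut-off are all hypothetical choices:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    snp_id: str
    pos: int             # position on the same chromosome as the annotation intervals
    r2_with_lead: float  # LD (r^2) with the lead GWAS SNP


def overlaps(pos: int, intervals: list[tuple[int, int]]) -> bool:
    """True if pos lies inside any (start, end) interval (inclusive)."""
    return any(start <= pos <= end for start, end in intervals)


def prioritise(candidates, dhs_peaks, chip_peaks, motif_hits, r2_min=0.8):
    """Keep SNPs in LD with the lead SNP and rank them by overlapping annotation tracks."""
    ranked = []
    for c in candidates:
        if c.r2_with_lead < r2_min:
            continue  # not a plausible proxy for the lead SNP
        n_tracks = sum([
            overlaps(c.pos, dhs_peaks),   # DNase I hypersensitivity peak
            overlaps(c.pos, chip_peaks),  # ChIP-seq peak for TF1
            overlaps(c.pos, motif_hits),  # predicted motif
        ])
        ranked.append((n_tracks, c.snp_id))
    # Most supporting tracks first; ties broken alphabetically by ID
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


candidates = [
    Candidate("snp_A", 150, 1.0),   # the lead SNP itself
    Candidate("snp_B", 420, 0.9),
    Candidate("snp_C", 900, 0.3),
]
print(prioritise(candidates,
                 dhs_peaks=[(100, 200), (400, 450)],
                 chip_peaks=[(410, 430)],
                 motif_hits=[(415, 425)]))
```

In this toy example snp_B, not the lead SNP, overlaps all three tracks, which mirrors the move from lead SNP to candidate functional SNP.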
What’s Next?
“A major goal will be to develop a unified, quantitative, predictive framework to estimate the prior probabilities for any given mutation to be both functionally relevant and disease-relevant, accounting for both computational and experimental sources of information. A number of challenges must be met for such a framework to succeed”
- Need large collections of true positive (functional) and true negative (neutral) variants