MARCO - genome variation Flashcards
Genome Variation
SNP + indels
is due to mutations, which mostly occur during DNA replication.
* Germ-line mutations: If the mutation occurs in the germ line (sperm/egg cells), it will be passed down to future generations
* Somatic mutations: Accumulation of mutations in specific somatic cell types can cause the cells to differentiate/become aggressive & develop into diseases such as cancer that affect only the individual (not passed down)
SNP – single nucleotide polymorphism
- DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered.
- For a variation to be considered a SNP, it must occur in at least 1% of the population
- SNPs make up about 90% of all human genetic variation (the rest = indels – insertions & deletions)
- Most are heterozygous – present on only one of the 2 copies of a chromosome
- There are roughly 4 - 5 million SNPs in an individual human genome (occurring approximately every 1000 bases) – so about 99.9% of the genome is identical between individuals
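The 1% rule above can be sketched in code. This is a minimal illustration that assumes "occurs in at least 1% of the population" is read as a minor allele frequency of at least 0.01; the function names and genotype counts are hypothetical.

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Minor allele frequency from diploid genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each heterozygote carries one alternate allele, each alt homozygote two.
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    return min(alt_freq, 1 - alt_freq)


def is_snp(maf, threshold=0.01):
    """Apply the 1% rule (threshold interpretation is an assumption)."""
    return maf >= threshold


# Hypothetical cohort of 1000 people: 940 ref/ref, 58 ref/alt, 2 alt/alt.
maf = minor_allele_frequency(n_ref_hom=940, n_het=58, n_alt_hom=2)
print(round(maf, 3), is_snp(maf))
```

With these made-up counts, 62 of 2000 alleles are the alternate allele, so the variant passes the 1% rule.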
Importance of SNPs
SNPs can affect how:
* humans develop diseases
* an individual responds to pathogens, chemicals, drugs, etc
Potentially their greatest importance in biomedical research is for comparing regions of the genome between cohorts, ex. which SNPs are associated with diseased vs. healthy individuals
Location of SNPs
(random & can occur anywhere)
A: Distal intergenic region – within transcription enhancers/other regulatory regions
B: Proximal intergenic region – within promoter or other TF binding regions
C: Within exon – most dangerous because it could affect protein coding by changing the amino acid sequence or introducing an early STOP codon, truncating the protein
D: Within intron – can affect splicing
Coding SNPs
– SNPs in coding regions (exons); 2% of SNPs
2 types:
1. Synonymous (silent):
* affected codon codes for the same amino acid, so the mutation is silent (possible because the genetic code is redundant, mostly at the third nucleotide of the codon)
* However, synonymous SNPs may still affect Exonic Splicing Enhancer (ESE) or Exonic Splicing Silencer (ESS) sites, altering the mRNA transcript, so they can’t always be ignored
2. Non-synonymous:
* affected codon codes for a different amino acid – could be detrimental
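The two types above can be told apart by translating the codon before and after the change. The sketch below assumes the standard genetic code and DNA (not RNA) codons; the function name and example codons are chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = STOP).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Return (label, ref_aa, alt_aa) for a single-nucleotide codon change."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be 3 nucleotides long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, ref_aa, alt_aa


print(classify_coding_snp("GAA", "GAG"))  # both codons encode glutamic acid (E)
print(classify_coding_snp("GAG", "GTG"))  # glutamic acid (E) to valine (V)
```

This sketch only compares amino acids; it does not model the splicing effects that can make a synonymous SNP matter.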
Coding SNPs - 2 possible types of nucleotide substitution:
- Transition (Ti): - most common substitution
o Replacement of a purine by another purine (A→G) or a pyrimidine by another pyrimidine (T→C)
- Transversion (Tv):
o Replacement of a purine by a pyrimidine or vice versa (A→C,T or C→A,G)
o a transversion in the third base of a codon is more likely than a transition there to change the encoded amino acid, because the redundancy of the genetic code at the third position mostly tolerates transitions
Ti/Tv ratio – varies within genome and is used to assess GWAS data quality
* Across entire genome, the ratio averages around 2
* In protein coding regions typically higher, often above 3
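A minimal sketch of how substitutions can be classified and a Ti/Tv ratio computed. The list of (reference, alternate) base pairs is made up for illustration; real pipelines read these from variant call files.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Return 'Ti' for a transition or 'Tv' for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases are purines or both are pyrimidines.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "Ti" if same_class else "Tv"


def ti_tv_ratio(snps):
    """Ti/Tv ratio for an iterable of (ref, alt) base pairs."""
    counts = {"Ti": 0, "Tv": 0}
    for ref, alt in snps:
        counts[substitution_type(ref, alt)] += 1
    if counts["Tv"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["Ti"] / counts["Tv"]


# Hypothetical calls: 4 transitions and 2 transversions.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(example_snps)
print(ratio)
```

A call set whose ratio falls far below the expected value for its region is a common warning sign of false-positive variant calls.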
Non-coding SNPs
98% of SNPs
Occur in regulatory regions which include:
* Enhancers
* Silencers
* Splice sites
* Locus control regions
* Promoters
* Regions coding for long non-coding RNA responsible for maintaining higher order structure of 3D genome
Disease associated SNPs fall into 2 categories:
- Monogenic:
* One nucleotide change is enough to lead to diseases
* Relatively easy to detect and analyze
* Affect simple traits (traits regulated by 1 gene)
- Polygenic:
* Multiple nucleotide changes contribute together to the development of a disease
* Hard to detect and analyze
* Affect complex traits (traits regulated by many different genes) ex. Alzheimer’s
eg. Sickle Cell Anemia
- Inherited blood disorder due to a SNP that changes the codon for amino acid position 6 of the β-globin (HBB) gene from GAG to GTG. This results in glutamic acid being substituted by valine.
- Produces fragile, sickle-shaped cells which deliver less oxygen to the body’s tissues. The sickle cells also get stuck more easily in small blood vessels & can break into pieces that can interrupt healthy blood flow
- Autosomal recessive mutation – only individuals homozygous for the SNP will have sickle cell anemia
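The recessive pattern can be illustrated by enumerating the genotypes expected from two parents. This is a sketch under simple Mendelian assumptions; the allele labels "A" (normal) and "S" (sickle) and the carrier-by-carrier cross are illustrative choices, not part of the notes above.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Expected genotype proportions when each parent passes on one allele."""
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def has_sickle_cell_anemia(genotype):
    """Recessive: only the homozygous 'SS' genotype is affected."""
    return genotype == "SS"


# Two heterozygous (carrier) parents.
proportions = offspring_genotypes("AS", "AS")
for genotype, fraction in sorted(proportions.items()):
    print(genotype, fraction, has_sickle_cell_anemia(genotype))
```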
Symptoms include:
- Shortness of breath
- Infections (bone, gall bladder)
- Joint pain
Found primarily in African & related populations – because the mutation can hinder Plasmodium infection, protecting individuals who carry it from malaria
eg. Alzheimer’s disease
2 main types of Alzheimer’s disease:
Early-Onset or Familial Alzheimer’s
* normally runs in families; affected individuals develop it relatively early in life (around 40 years old)
* Associated with mutations in the genes for amyloid precursor protein (APP) or Presenilin-1 and 2
* Rare – only 5% of Alzheimer’s cases
Sporadic late onset Alzheimer’s
* Develops later in life (around 65-70 or later)
* Associated with many possible genes which are still elusive. One identified gene is Apolipoprotein E (ApoE)
ApoE contains two SNPs that result in three possible alleles: E2, E3, and E4. The protein products of E2 and E4 each differ from that of E3 by one amino acid
* E3 is the most common allele. Inheritance of the E3 allele does not appear to affect the chance of developing Alzheimer’s
* Inheritance of at least one E4 allele gives a greater chance of developing Alzheimer’s disease
* Inheritance of the E2 allele indicates that a person is less likely to develop Alzheimer’s
However, it’s not always definite. Someone who has inherited two E4 alleles may never develop Alzheimer’s disease, while another who has inherited two E2 alleles may. This is because, as with most common chronic disorders such as heart disease, diabetes, or cancer, Alzheimer’s is a disease that can be caused by variations in several genes.
Effects of each allele are known from …
GWAS. These indicate that SNPs can have many kinds of effects – neutral, detrimental, or protective (not just disease-causing). This is because changes in protein structure can make an interaction either less or more efficient.
Disease-causing non-coding SNPs can be due to:
- Disruption of TF binding motifs, which must be accurate for recognition by the TF
o Modification of the motif can leave the TF unable to bind, inactivating the enhancer/promoter & repressing the downstream gene.
- Disruption of splice sites (which provide the signal for proteins to splice RNA so that only exons are included in the mature mRNA transcript)
o This can result in aberrant splicing, so the protein translated from the aberrant mRNA transcript will be different from the wild type
- Disruption of auxiliary regions that stimulate splicing, ex: exonic splicing enhancers (ESE) or intronic splicing enhancers (ISE), can also affect the splicing mechanism, resulting in a variant mRNA transcript & protein
Ex. of Intron SNP: OAS1 Gene – associated with type I diabetes
A SNP in intron 6 of the OAS1 gene changes AG to AA, which shifts the 3’ splice site by 1 nucleotide. This changes the reading frame of exon 7, resulting in a longer protein.
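A toy sketch of how losing an AG can move a 3’ splice site by one nucleotide and change the reading frame. The sequence is invented (it is not the real OAS1 sequence), and the rule "use the first AG found" is a deliberate simplification, since real splice-site choice depends on more than the AG dinucleotide.

```python
def exon_start_after_acceptor(sequence, search_from=0):
    """Index of the first base after the first 'AG' (toy acceptor rule)."""
    ag_index = sequence.find("AG", search_from)
    if ag_index == -1:
        raise ValueError("no AG dinucleotide found")
    return ag_index + 2


def split_codons(exon):
    """Split a sequence into complete 3-base codons, dropping leftovers."""
    return [exon[i:i + 3] for i in range(0, len(exon) - 2, 3)]


# Hypothetical intron tail (ends in AG) followed by a hypothetical exon.
wild_type = "TTTCCTTCAG" + "GCTGAAGATTGA"
# SNP: the G of the acceptor AG (index 9) becomes A, so AG reads AA.
mutant = wild_type[:9] + "A" + wild_type[10:]

wt_start = exon_start_after_acceptor(wild_type)
mut_start = exon_start_after_acceptor(mutant)
wt_codons = split_codons(wild_type[wt_start:])
mut_codons = split_codons(mutant[mut_start:])

print(wt_start, wt_codons)
print(mut_start, mut_codons)
```

In this toy, the new A pairs with the first exon base G to form an AG one position downstream, so the exon begins 1 nucleotide later and every codon after it is read in a different frame. The in-frame TGA of the wild-type toy exon is no longer read as a codon in the mutant, which is one way a frame change can lengthen a protein.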
synonymous mutation in coding SNP
Disruption of splicing can also be due to a synonymous coding SNP, which does not alter the amino acid sequence but can create a cryptic splice site within an exon, resulting in a different protein
* Ex. a SNP in exon 11 of the LMNA gene creates an internal 5’ splice site
Indels – Insertion or Deletions of nucleotides
- Indels can occur due to imprecise repair of DNA
- An insertion or deletion can disrupt a start codon or splice site; within a coding sequence, an indel whose length is not a multiple of 3 causes a frameshift mutation, which can introduce a STOP codon (common in coding sequences read out of frame)
- These disruptions can result in truncated/elongated proteins
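The effect of an indel on the reading frame can be sketched with a toy coding sequence. The sequence and helper names below are hypothetical; the point is only to contrast a 1-base deletion (frameshift) with a 3-base deletion (frame preserved).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_until_stop(cds):
    """Read codons from the start, stopping at the first STOP (inclusive)."""
    codons = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


def delete_bases(cds, position, length):
    """Return the sequence with `length` bases removed at `position`."""
    return cds[:position] + cds[position + length:]


# Hypothetical coding sequence: ATG GCT GAA TTG ACC GGT TAA
wild_type = "ATGGCTGAATTGACCGGTTAA"

frameshift = delete_bases(wild_type, position=3, length=1)  # 1 base lost
in_frame = delete_bases(wild_type, position=3, length=3)    # 1 codon lost

print(codons_until_stop(wild_type))
print(codons_until_stop(frameshift))
print(codons_until_stop(in_frame))
```

Here the 1-base deletion shifts the frame and exposes an early TGA, truncating the toy protein, while the 3-base deletion removes a single codon and leaves the rest of the frame intact.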
Genome Wide Association Studies (GWAS)
Allow us to study the association of certain SNPs with certain diseases
Profile the genomes of populations with and without the disease to observe the differences between them & find the SNPs that are common in the disease population.
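For a single SNP, the case/control comparison can be sketched as a 2x2 allele-count test. This is a simplified illustration, assuming a plain chi-square test on allele counts with no correction for multiple testing or population structure; the counts are invented and the function name is hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio, chi-square statistic and p-value for a 2x2 allele table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    if min(min(row) for row in table) <= 0:
        raise ValueError("all four allele counts must be positive")
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return odds_ratio, chi2, p_value


# Hypothetical allele counts: 1000 case alleles and 1000 control alleles.
odds_ratio, chi2, p_value = allelic_association(
    case_alt=300, case_ref=700, control_alt=200, control_ref=800
)
print(odds_ratio, chi2, p_value)
```

A real GWAS repeats a test like this across a very large number of SNPs, so the significance threshold applied to each individual p-value has to be far stricter than in a single test.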
Use of GWAS has identified genetic variations that contribute to risk of:
Type 2 diabetes, Parkinson’s disease, Heart disorders, Obesity, Crohn’s disease, Prostate cancer