MARCO - genome variation Flashcards

1
Q

Genome Variation

A

SNP + indels

are due to mutations, which mostly occur during DNA replication.
* Germ-line mutations: if the mutation occurs in the germ line (sperm/egg cells), it will be passed down to future generations
* Somatic mutations: mutations that accumulate in specific somatic cell types can cause those cells to change their behaviour/become aggressive and develop into diseases such as cancer that affect only the individual (not passed down)

2
Q

SNP – single nucleotide polymorphism

A
  • DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered.
  • For a variation to be considered a SNP, it must occur in at least 1% of the population
  • SNPs make up about 90% of all human genetic variation (the rest = indels – insertions & deletions)
  • In an individual, most SNPs are heterozygous – the variant is present on only one of the 2 copies of the chromosome
  • There are roughly 4 - 5 million SNPs in an individual human genome (approximately one every 1000 bases) – so the vast majority of the genome (~99.9%) is identical between individuals
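As a rough check on these numbers, the sketch below divides an assumed haploid genome length of ~3.1 billion bases (an approximate figure, not stated on the card) by the SNP count. The helper names are hypothetical. The result, one SNP every ~620 to 775 bases, is the same order of magnitude as the "one every 1000 bases" rule of thumb.

```python
# Rough arithmetic check of SNP density in one human genome.
GENOME_LENGTH_BP = 3.1e9  # assumption: approximate haploid human genome size


def snp_spacing(n_snps: float, genome_bp: float = GENOME_LENGTH_BP) -> float:
    """Average number of bases between SNPs."""
    return genome_bp / n_snps


def percent_identical(n_snps: float, genome_bp: float = GENOME_LENGTH_BP) -> float:
    """Percentage of positions that do NOT carry a SNP."""
    return 100 * (1 - n_snps / genome_bp)


for n in (4e6, 5e6):
    print(f"{n:.0e} SNPs: one every {snp_spacing(n):.0f} bp, "
          f"{percent_identical(n):.2f}% of bases unchanged")
```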
3
Q

Importance of SNPs

A

SNPs can affect how:
* humans develop diseases
* an individual responds to pathogens, chemicals, drugs, etc.
Potentially their greatest importance in biomedical research is for comparing regions of the genome between cohorts, e.g. which SNPs associate with disease vs. healthy individuals

4
Q

Location of SNPs

A

(random & can occur anywhere)
A: Distal intergenic region – within transcription enhancers/other regulatory regions
B: Proximal intergenic region – within promoters or other TF binding regions
C: Within exon – most dangerous because it could affect protein coding by changing the amino acid sequence or introducing an early STOP codon, truncating the protein
D: Within intron – can affect splicing

5
Q

Coding SNPs

A

– SNPs in coding regions (exons): ~2% of SNPs

2 types:
1. Synonymous (silent):
* the affected codon still codes for the same amino acid, so the mutation is silent (possible because of the redundancy of the genetic code, mostly at the third nucleotide of codons)
* However, synonymous SNPs may still affect Exonic Splicing Enhancer (ESE) or Exonic Splicing Silencer (ESS) sites, altering splicing of the mRNA transcript, so they can’t always be ignored
2. Non-synonymous:
* the affected codon codes for a different amino acid – could be detrimental
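A minimal sketch of the synonymous vs. non-synonymous distinction, assuming the standard genetic code and codons written on the DNA coding strand. The function name `classify_codon_snp` is hypothetical. The usage lines apply it to the sickle cell codon change (GAG to GTG) from a later card and to a silent third-position change.

```python
# Classify a single-nucleotide change inside a codon as synonymous or
# non-synonymous using the standard genetic code ("*" = STOP codon).
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_snp(codon: str, position: int, alt_base: str) -> tuple[str, str, str]:
    """Return (ref amino acid, alt amino acid, category) for a SNP at
    0-based `position` (0, 1 or 2) of `codon`."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    category = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, category


# Sickle cell mutation: HBB codon 6, GAG -> GTG (middle base A -> T)
print(classify_codon_snp("GAG", 1, "T"))  # ('E', 'V', 'non-synonymous')
# Third-position change GAG -> GAA still codes for glutamic acid
print(classify_codon_snp("GAG", 2, "A"))  # ('E', 'E', 'synonymous')
```

A non-synonymous change that produces "*" (e.g. TGG to TGA) is the early-STOP case mentioned under exonic SNPs.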

6
Q

Coding SNPs - 2 possible types of nucleotide substitution:

A
  • Transition (Ti): most common substitution
    o Replacement of a purine by another purine (A↔G) or a pyrimidine by another pyrimidine (C↔T)
  • Transversion (Tv):
    o Replacement of a purine by a pyrimidine or vice versa (A or G ↔ C or T)
    o A transversion at the third base of a codon is more likely than a transition to change the encoded amino acid: the redundancy of the genetic code lies mainly in the third position, and many codon pairs tolerate only a transition there (e.g. GAA/GAG both code for Glu, but GAC/GAT code for Asp)

Ti/Tv ratio – varies across the genome and is used as a quality check for SNP-calling data (e.g. before GWAS)
* Across entire genome, the ratio averages around 2
* In protein coding regions typically higher, often above 3
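A minimal sketch of how a Ti/Tv ratio can be computed from a list of (reference, alternative) base pairs. The function names and the SNP calls are hypothetical, made up for illustration.

```python
# Classify substitutions as transitions (Ti) or transversions (Tv) and compute Ti/Tv.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True if ref -> alt stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snps: list[tuple[str, str]]) -> float:
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    ti = sum(is_transition(r, a) for r, a in snps)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


# Hypothetical SNP calls: 4 transitions, 2 transversions
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(calls))  # 2.0
```

Why this works as a quality check: of the three possible alternative bases at any position, one is a transition and two are transversions, so random sequencing errors push the ratio toward ~0.5. A genome-wide call set far below ~2 therefore suggests many false-positive calls.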

7
Q

Non-coding SNPs

A

98% of SNPs
Can occur in regulatory regions, which include:
* Enhancers
* Silencers
* Splice sites
* Locus control regions
* Promoters
* Regions coding for long non-coding RNAs, some of which help maintain the higher-order 3D structure of the genome

8
Q

Disease associated SNPs fall into 2 categories:

A
  1. Monogenic:
    * One nucleotide change is enough to cause the disease
    * Relatively easy to detect and analyze
    * Affect simple traits (traits regulated by 1 gene)
  2. Polygenic:
    * Multiple nucleotide changes contribute together to the development of a disease
    * Hard to detect and analyze
    * Affect complex traits (traits regulated by many different genes) ex. Alzheimer’s
9
Q

e.g. Sickle Cell Anemia

A
  • Inherited blood disorder caused by a SNP that changes codon 6 of the β-globin (HBB) gene from GAG to GTG, so glutamic acid is replaced by valine.
  • Produces fragile, sickle-shaped red blood cells, which deliver less oxygen to the body’s tissues. The sickle cells also get stuck more easily in small blood vessels & can break into pieces that interrupt healthy blood flow
  • Autosomal recessive disorder – only individuals homozygous for the SNP have sickle cell anemia
  • Symptoms include:
    o Shortness of breath
    o Infections (bone, gall bladder)
    o Joint pain

Found primarily in African & related populations – carrying a single copy of the allele (sickle cell trait) protects against severe malaria caused by Plasmodium, so the allele has been maintained where malaria is common

10
Q

e.g. Alzheimer’s disease

A

2 main types of Alzheimer’s disease:
Early-Onset or Familial Alzheimer’s
* normally runs in families, and affected individuals develop it relatively early in life (around 40 years old)
* Associated with mutations in the genes for amyloid precursor protein (APP), presenilin-1 or presenilin-2
* Rare – only ~5% of Alzheimer’s cases
Sporadic late-onset Alzheimer’s
* Develops later in life (around 65-70 or later)
* Associated with many possible genes, many of which are still elusive. One identified gene is apolipoprotein E (ApoE)
ApoE has two SNPs that result in three common alleles: E2, E3 and E4. The proteins differ at amino acid positions 112 and 158 (E3 differs from both E2 and E4 by one amino acid)
* E3 is the most common allele; inheriting E3 does not change the likelihood of developing Alzheimer’s (baseline risk)
* Inheriting at least one E4 allele increases the chance of developing Alzheimer’s disease (the SNP producing the E4 allele makes affected individuals more prone to Alzheimer’s)
* Inheriting the E2 allele indicates that a person is less likely to develop Alzheimer’s
However, it’s not always definite. Someone who has inherited two E4 alleles may never develop Alzheimer’s disease, while another who has inherited two E2 alleles may. This is because, as with most common chronic disorders such as heart disease, diabetes or cancer, Alzheimer’s can be caused by variations in several genes.
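A minimal sketch of how the two ApoE SNPs combine into alleles. The SNP identifiers rs429358 and rs7412 and the base/amino-acid pairings are added here (not from the card); the helper names are hypothetical. It assumes phased haplotypes: with unphased genotype data, a person heterozygous at both SNPs is ambiguous between E2/E4 and the rare E1/E3.

```python
# Hypothetical helper: infer APOE alleles from the two SNPs that define them.
# Key = (base at rs429358, base at rs7412) on one chromosome copy (haplotype).
APOE_HAPLOTYPES = {
    ("T", "T"): "E2",  # Cys112, Cys158
    ("T", "C"): "E3",  # Cys112, Arg158 (most common allele)
    ("C", "C"): "E4",  # Arg112, Arg158
    ("C", "T"): "E1",  # Arg112, Cys158 (very rare)
}


def apoe_genotype(hap1: tuple[str, str], hap2: tuple[str, str]) -> str:
    """Return a genotype such as 'E3/E4' from two phased haplotypes."""
    alleles = sorted([APOE_HAPLOTYPES[hap1], APOE_HAPLOTYPES[hap2]])
    return "/".join(alleles)


def e4_copies(genotype: str) -> int:
    """Number of E4 alleles (0, 1 or 2), the main ApoE risk factor on the card."""
    return genotype.split("/").count("E4")


genotype = apoe_genotype(("T", "C"), ("C", "C"))
print(genotype, e4_copies(genotype))  # E3/E4 1
```

As the card stresses, the E4 count is a risk indicator, not a diagnosis.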

11
Q

Effects of each allele are known from …

A

GWAS, which indicate that SNPs can have many kinds of effects – neutral, detrimental or protective (not just disease-causing). This is because changes in protein structure can make an interaction either less or more efficient.
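One common way to express whether an allele looks detrimental, protective or neutral is the allelic odds ratio between cases and controls. A minimal sketch, with hypothetical function names and made-up allele counts; the ±0.1 band for "neutral" is an arbitrary illustrative threshold, and a real GWAS would also test whether the difference is statistically significant.

```python
# Allelic odds ratio from case/control allele counts, labelled by effect direction.
def allelic_odds_ratio(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds of carrying the alt allele in cases divided by the odds in controls."""
    return (case_alt / case_ref) / (ctrl_alt / ctrl_ref)


def effect_label(odds_ratio: float, tolerance: float = 0.1) -> str:
    """Crude label; `tolerance` is an arbitrary illustrative band, not a standard."""
    if odds_ratio > 1 + tolerance:
        return "risk (detrimental)"
    if odds_ratio < 1 - tolerance:
        return "protective"
    return "roughly neutral"


# Hypothetical allele counts (2 alleles per person), invented for illustration
print(effect_label(allelic_odds_ratio(300, 700, 200, 800)))  # risk (detrimental)
print(effect_label(allelic_odds_ratio(100, 900, 200, 800)))  # protective
```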

12
Q

Disease-causing non-coding SNPs can be due to:

A
  • Disruption of TF binding motifs, which must be accurate for recognition by the TF
    o Modification of the motif can prevent the TF from binding, inactivating the enhancer/promoter & reducing expression of the downstream gene.
  • Disruption of splice sites (which signal where the RNA is spliced, so that only exons are included in the mature mRNA transcript)
    o This can result in aberrant splicing, so the protein translated from the aberrant mRNA transcript will differ from the wild type
  • Disruption of auxiliary regions that stimulate splicing, e.g. exonic splicing enhancers (ESE) or intronic splicing enhancers (ISE), can also affect the splicing mechanism, resulting in a variant mRNA transcript & protein

Ex. of Intron SNP: OAS1 Gene – associated with type I diabetes
SNP at Intron 6 of OAS1 gene where AG is changed to AA shifts the 3’ splice site by 1 nucleotide. This changes the reading frame of exon 7, resulting in a longer protein.
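How strongly a sequence matches a TF binding motif is often quantified with a position weight matrix (PWM). The minimal sketch below scores a motif site before and after a SNP. The 4-bp count matrix, uniform background and pseudocount are all hypothetical, chosen only for illustration. A large drop in score suggests the variant weakens TF binding.

```python
# Score a TF binding site before and after a SNP with a log-odds PWM.
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position)
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25  # assumption: uniform base composition
PSEUDOCOUNT = 0.5  # avoids log(0) for bases never seen at a position


def log_odds_pwm(counts: list[dict[str, int]]) -> list[dict[str, float]]:
    """Convert counts to log2(observed frequency / background) per position."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * PSEUDOCOUNT
        pwm.append({b: math.log2((n + PSEUDOCOUNT) / total / BACKGROUND)
                    for b, n in column.items()})
    return pwm


PWM = log_odds_pwm(COUNTS)


def motif_score(site: str) -> float:
    """Sum of per-position log-odds scores; higher = better match."""
    return sum(PWM[i][base] for i, base in enumerate(site.upper()))


ref_site, alt_site = "ACGT", "ACAT"  # SNP changes G -> A at motif position 3
print(f"ref {motif_score(ref_site):.2f}, alt {motif_score(alt_site):.2f}")
```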

13
Q

synonymous mutation in coding SNP

A

Splice sites can also be disrupted by synonymous coding SNPs: these do not alter the amino acid sequence but can create cryptic splice sites within exons, resulting in a different protein
* Ex. a synonymous SNP in exon 11 of the LMNA gene (G608G, the mutation behind Hutchinson-Gilford progeria) creates an internal 5’ splice site

14
Q

Indels – Insertions or Deletions of nucleotides

A
  • Indels can occur due to imprecise repair of DNA
  • An insertion or deletion whose length is not a multiple of 3 causes a frameshift mutation, which can disrupt the start codon or splice sites and can introduce a premature STOP codon (STOP codons are common in coding sequences read out of frame)
  • These disruptions can result in truncated/elongated proteins
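A minimal sketch of the frameshift effect, assuming the standard genetic code. The coding sequence and helper names are hypothetical, made up so that a 1-bp insertion reveals an out-of-frame STOP codon while a 3-bp deletion keeps the reading frame.

```python
# Show how a 1-bp insertion shifts the reading frame and can create an early STOP,
# while a 3-bp deletion removes one amino acid but keeps the frame.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon, stopping after the first STOP codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def insert_base(seq: str, pos: int, base: str) -> str:
    return seq[:pos] + base + seq[pos:]


def delete_bases(seq: str, pos: int, n: int = 1) -> str:
    return seq[:pos] + seq[pos + n:]


# Hypothetical coding sequence: ATG GCT AAA CTG GGA TAA
wild_type = "ATGGCTAAACTGGGATAA"
print(translate(wild_type))                       # MAKLG*
print(translate(insert_base(wild_type, 3, "C")))  # MR*   (frameshift -> early STOP)
print(translate(delete_bases(wild_type, 6, 3)))   # MALG* (in-frame deletion)
```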
15
Q

Genome Wide Association Studies (GWAS)

A

Allow us to study the association of certain SNPs with certain diseases

Profile the genomes of people with/without the disease to observe the differences between their genomes & find the SNPs that are more common in the disease population.

Use of GWAS has identified genetic variations that contribute to risk of:
Type 2 diabetes, Parkinson’s disease, Heart disorders, Obesity, Crohn’s disease, Prostate cancer

16
Q

100,000 Genomes Project (an example of GWAS)

A
  • The Genomics England project, in collaboration with the NHS
  • Sequenced the genomes of approximately 70,000 people – participants are NHS patients with a rare disease or with cancer, and their families
  • Aim: to identify variants associated with different conditions, so that faster diagnosis can be offered and new, more effective treatments researched
17
Q

Identifying Variants

A
  • Genomes are sequenced from diseased & healthy populations and then aligned to a reference genome to identify variants/SNPs.
  • If a SNP is common in the disease population but not in the healthy population, the SNP can be inferred to be associated with that disease.
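"Common among the disease population but not the normal population" is usually tested per SNP with a 2×2 allelic chi-square test on case vs. control allele counts. A minimal sketch with hypothetical counts and function name; the p-value uses the fact that, for 1 degree of freedom, P(χ² ≥ x) = erfc(√(x/2)).

```python
# Allelic association test for one SNP: 2x2 chi-square on allele counts.
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) and its p-value."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 1000 alleles in cases, 1000 in controls
chi2, p = allelic_chi_square(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this made-up example the difference is strongly significant by ordinary standards (p far below 0.05) but does not reach 5 × 10⁻⁸. GWAS use that stricter threshold because roughly a million SNPs are tested (0.05 / 10⁶).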
18
Q

Challenges/ limitations to GWAS:

A
  1. Tissue-specificity
    * The effects of non-coding SNPs are less clear since they disrupt regulatory elements, not the coding sequences of genes.
    * Certain disease-associated SNPs can also be present in the healthy population (but at a lower frequency), e.g. 80% vs 20%.
    * This is because gene regulation is tissue-specific
    * A non-coding SNP might have an effect in one cell type but not in another
    * However, the genotype is the same in every cell type (samples are usually taken from an accessible source such as blood or saliva), so GWAS cannot tell in which tissue the SNP acts.
    * Therefore, GWAS can identify candidate SNPs, but additional functional validation is required.
  2. Sampling limitation
    * Genomes must be collected from all over the world to be representative of the whole population, not just certain groups of people/nationalities (since many SNPs are population-specific). Results must be taken with a grain of salt until a more diverse pool of genomes has been collected.
  3. Main limitation: Linkage disequilibrium
    * refers to the non-random association of alleles at two or more loci within a population
    * results in haplotypes (groups of alleles) occurring at unexpected frequencies
    * SNPs near each other tend to be in linkage disequilibrium – meaning they are usually inherited together.
    * Therefore, of the many SNPs enriched in a disease population, only some are actually disease-associated (have a real effect), while others are simply in linkage with those disease-associated SNPs.
    * Linkage disequilibrium cannot be resolved by GWAS alone, because the tight association between SNPs prevents GWAS from identifying which SNPs are actually causing the disease.
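Linkage disequilibrium between two SNPs is usually quantified as D = p(AB) − p(A)·p(B), its normalized form D′, and r². A minimal sketch from haplotype frequencies; the function name and frequencies are hypothetical. When r² = 1, one SNP perfectly predicts the other, so a harmless SNP linked to a causal one gives the same association signal.

```python
# Linkage disequilibrium between two biallelic SNPs from haplotype frequencies.
def ld_stats(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) given the four haplotype frequencies.
    Assumes the frequencies sum to 1 and both SNPs are polymorphic."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at SNP 1
    p_B = p_AB + p_aB  # frequency of allele B at SNP 2
    D = p_AB - p_A * p_B  # 0 when alleles associate at random
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


# Hypothetical haplotype frequencies
print(ld_stats(0.3, 0.0, 0.0, 0.7))      # complete LD: only AB and ab haplotypes exist
print(ld_stats(0.25, 0.25, 0.25, 0.25))  # independent SNPs: D = 0
```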
19
Q

Expression Quantitative Trait Loci (eQTL)

A

eQTL = a (usually non-coding) SNP that has a quantitative effect on the expression of a gene.

  • eQTL analysis measures gene expression (e.g. by RNA sequencing) in individuals carrying the SNP variant and in controls, allowing differences in gene expression levels to be quantified.
  • Any non-coding SNP associated with an abnormal RNA expression level is classified as an expression quantitative trait locus
  • eQTL mapping allows identification of the gene that the non-coding SNP affects
  • eQTL can be:
    o cis: the eQTL maps close to the gene (the SNP influences a nearby gene)
    o trans: the eQTL maps far from the gene (the SNP influences a gene further away from the mutation site, possibly on another chromosome)
  • A disease-associated SNP that is also an eQTL may confer risk of the disease through its effect on gene expression.
  • Combining GWAS & eQTL data for the identified SNPs can lead to the identification of disease-causing genes
20
Q

SNP Genotyping

A

Used to identify the presence of predefined SNPs (e.g. those discovered by GWAS)
1. Use microarray chips with probes complementary to the SNP variants of interest
2. Different spots on the array contain probes for the alternative alleles of each SNP.
3. The hybridization signals of the allele-specific probes are compared to determine which allele(s) are present in the sample
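A toy version of step 3: calling a genotype from the signals of the two allele-specific probes. The function name and the 0.2/0.8 cut-offs are hypothetical illustrative values, not any vendor's defaults; real genotype-calling software clusters the signals of many samples rather than using fixed thresholds.

```python
# Toy genotype calling from allele-specific probe intensities on a SNP array.
def call_genotype(intensity_a: float, intensity_b: float,
                  low: float = 0.2, high: float = 0.8) -> str:
    """Call AA/AB/BB from the fraction of total signal coming from the B-allele probe.
    The default cut-offs are arbitrary illustrative values."""
    total = intensity_a + intensity_b
    if total <= 0:
        return "no call"
    b_fraction = intensity_b / total
    if b_fraction < low:
        return "AA"  # signal almost entirely from the A-allele probe
    if b_fraction > high:
        return "BB"  # signal almost entirely from the B-allele probe
    return "AB"      # both probes hybridize: heterozygote


print(call_genotype(950, 40))   # AA
print(call_genotype(500, 480))  # AB
print(call_genotype(30, 900))   # BB
```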

21
Q

Conclusion:

A
  • 90% of genetic variation is due to SNPs
  • SNPs can be:
    o Coding – in exons, therefore potentially changing the protein's amino acid sequence
    o Non-coding – in regulatory regions (e.g. promoters, enhancers, TFBS, splice sites)
  • Most SNPs do not have any consequences because:
    o many SNPs are synonymous (do not change the aa sequence) or do not affect the function of regulatory elements
    o most SNPs are heterozygous and have a recessive effect
  • Association of SNPs with diseases can be studied using:
    o GWAS (look for SNPs that are more common in the disease population)
    o eQTL (identify SNPs associated with the expression of certain genes, which can be related to a disease)