Eukaryotic genomes and their evolution Flashcards

1
Q

How big is the human genome?

A

Genome=Set of genetic material (DNA) present in a cell or organism
Also includes non-coding sections
About 3 billion base pairs per haploid genome; the DNA in one diploid cell is about 2 metres long

2
Q

Explain variability in genomes between humans and bacteria

A

A lot of variability in composition and size of genomes between species
The human genome is mostly non-coding DNA
Bacterial genomes are mostly coding DNA

3
Q

What is the C-value paradox?

A

Genome size (C-value) doesn’t correlate with organismal complexity
Example: an amoeba has about 600 billion base pairs but a human has only about 3 billion base pairs
The amoeba is less complex but has a much larger genome than humans
Gene number is also highly variable and does not track complexity
A nematode worm has roughly the same number of genes as a human but is much simpler

4
Q

What is the composition of the human genome?

A

About 1% of the human genome consists of exons (coding DNA that makes proteins)
About 24% is introns
Exons make up only about 4-5% of each gene, so genes (exons + introns) make up about 25% of the genome
The human genome has about 20,000 protein-coding genes
Repetitive DNA (mostly transposable elements) makes up just under 50%
Regulatory elements (found in introns and other intergenic DNA): switches that activate/deactivate genes
The non-coding genome contains about 1 million enhancers

5
Q

What is gene duplication and what does it lead to?

A

Gene duplication is how new genes evolve
38% of human genes are derived from gene duplication
Gene duplication leads to gene families (paralogous genes)
Are sister genes that share a common ancestor
Very similar sequence
Found in the same genome
Can be found on different or the same chromosome
May be clustered together or dispersed through the genome with diverse function
Copies may degenerate into pseudogenes: they come from the same ancestor but have lost function

6
Q

Paralogous vs Orthologous genes

A

Paralogous: sister genes (or gene clusters) within the same genome. They arise from gene duplication, are structurally similar, and come from a common ancestor but have diverged since
Orthologous: the same gene found in 2 different species, inherited from their common ancestor and usually keeping the same function. Example: humans and chimpanzees both have a specific gene (usually with the same name)

7
Q

Why is gene duplication rewarded by evolution?

A

More protein production
If one copy stops working, the sister gene can still produce a similar protein

8
Q

What is synteny?

A

Pieces of genome/chromosomal regions of different species where homologous genes occur in the same order
Come from the same ancestor
Example: mouse and human genomes, where most functional genes lie in syntenic regions

9
Q

Explain the two different approaches to the human genome project

A

Public consortium (Watson/Collins): planned to sequence the genome in 15 years at a cost of 3 billion dollars
Celera Genomics: aimed to sequence it in 3 years for about 300 million dollars, using whole-genome shotgun sequencing
Draft sequences were published in 2001 and the HGP was declared complete in 2003
But about 8% of the genome, mostly heterochromatin, was still unsequenced at that point (these gaps were only closed in 2022)
Next generation sequencing techniques (e.g. Illumina) now sequence quickly and cheaply

10
Q

How are genomic elements conserved among species? How can we use bioinformatics?

A

Conservation between species varies depending on what we are looking at: coding genes, enhancers/promoters, transcription factor binding sites
Bioinformatics: Uses sequence alignment tools to study conservation of the genome

11
Q

How are coding genes conserved between species?

A

Sequence conservation predicts conservation in function
Orthologues are most likely to retain the common ancestral function
80% of human genes are found in mice
So a human disease gene can be expressed in mice to study its effect: mice are used as model organisms

12
Q

How are regulatory elements conserved between species (transcription factors)?

A

The rule that sequence conservation predicts conserved function does not apply to cis-regulatory elements
Transcription factor binding preferences (motifs) are conserved
But only a small amount of actual transcription factor binding is conserved among species

13
Q

How are enhancers conserved across species?

A

Enhancers with conserved sequences across species are NOT equally functional
Most enhancers are not functional across species
80% of human genes are conserved in mice
But humans and mice have different enhancers that regulate which genes are expressed
So the organisms function differently even though the genes are the same
This also applies to primates
Humans and chimpanzees are 98% genetically similar but have different enhancers

14
Q

Compare genome similarity of Humans vs Chimps

A

About 1% sequence divergence between shared genes (roughly 98-99% identical)
6% of genes are not shared between humans and chimps
Large amount of loss and gain of genes since evolutionary split
Human chromosome 2 is a result of the fusion of the chimp chromosomes 2A and 2B
Humans have lost many olfactory genes (humans don’t need to smell as much)

15
Q

What are molecular clocks?

A

LUCA (the last universal common ancestor of all living organisms) lived about 3.8 billion years ago
We know this from molecular clocks
Molecular clocks use fossils (for calibration) and the rate of mutation to deduce when species diverged
Nucleotide or amino acid sequences are compared among species to date when they last shared a common ancestor
The rate of mutation is assumed to be constant
But the rate may differ from gene to gene
Genes that are responsible for basic functions change more slowly
Mitochondria arose by endosymbiosis: a bacterium was incorporated into the cell and kept because of its essential function
The mitochondrial genome is used to measure mutation rate as it has a fairly constant mutation rate
Most of its mutations are assumed to be neutral
It is circular DNA with only a few genes
Mitochondrial DNA is inherited from the mother without recombination
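
A minimal sketch of the molecular clock arithmetic, assuming a constant substitution rate as stated above. The function name and the example numbers are hypothetical and chosen only for illustration. Both lineages accumulate changes after the split, so time = distance / (2 x rate).

```python
def divergence_time(differences, sites, rate_per_site_per_year):
    """Estimate years since two sequences last shared a common ancestor.

    Assumes a constant substitution rate (the molecular clock assumption).
    Both lineages accumulate changes after the split, hence the factor of 2.
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    distance = differences / sites  # proportion of sites that differ
    return distance / (2 * rate_per_site_per_year)


# Hypothetical numbers, not taken from the notes.
years = divergence_time(differences=20, sites=1000, rate_per_site_per_year=1e-8)
print(f"{years:,.0f} years")
```

Because the rate differs from gene to gene, a real estimate would calibrate the rate per gene against a fossil-dated split.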

16
Q

What are mitochondrial haplogroups?

A

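Haplogroup: a group defined by specific mutations in mitochondrial DNA inherited from a common maternal ancestor
The most recent common maternal ancestor of all living humans ("mitochondrial Eve") lived about 200,000 years ago in Africa
Supports the out of Africa hypothesis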

17
Q

How does a genome acquire new genes?

A

Horizontal gene transfer
Exon shuffling
Duplication and divergence: this is very rare (about a 1% chance per gene per million years)

18
Q

What are the 3 different outcomes of gene duplication?

A

Duplication of one gene leads to 2 similar genes, followed by one of 3 outcomes:
Selective pressure on both genes: both copies stay similar (more gene copies = more protein)
Selective pressure on just one of the genes: the other copy degrades (accumulates mutations and becomes a pseudogene)
Selective pressure on just one of the genes: the other copy diverges (neo-functionalization: it acquires a new function; sub-functionalization: the copies become slightly different and specialize)

19
Q

How does gene duplication occur during DNA replication/meiosis?

A

Gene duplication can occur during chromosomal recombination (crossing over)
Crossing over occurs during meiosis and leads to new combinations of alleles
Misaligned chromatid pairing (unequal crossing over) leads to duplication of regions

It can also occur during DNA replication through DNA polymerase slippage
DNA replication is carried out by DNA polymerase
Example: 15 CA repeats originally
Polymerase pauses in the CA repeat region
The newly formed strand melts and reanneals out of register (slipping)
The mismatch is repaired incorrectly = duplication of repeat units
Example: now 17 CA repeats
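
The slippage example can be sketched as a toy model. This is a simplification under stated assumptions: it only counts repeat units and ignores base pairing (a real new strand would be the complement of the template). The function name is hypothetical.

```python
def slip_replication(template_repeats, unit="CA", slipped_units=2):
    """Return (template, new_strand) after the new strand slips backwards.

    When the new strand reanneals out of register, the polymerase copies
    some repeat units a second time, so the new strand gains repeats.
    """
    if template_repeats < 0 or slipped_units < 0:
        raise ValueError("repeat counts must be non-negative")
    template = unit * template_repeats
    new_strand = unit * (template_repeats + slipped_units)
    return template, new_strand


template, new_strand = slip_replication(15)
# Repeat counts before and after the slip
print(template.count("CA"), "->", new_strand.count("CA"))
```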

20
Q

What is neo-/sub-functionalization? Give an example

A

After gene duplication, two genes with identical function are unlikely to be maintained in the genome
Each daughter gene adopts a part of the function of the parental gene
Changes occur in the expression pattern of the two genes
The copies gain mutations
This leads to genes having similar but not identical functions (specialization)
The genes are expressed at different times and in different cell types
Example: trypsin vs chymotrypsin
Duplicated about 1500 million years ago
Both are proteases
Trypsin: cuts after arginine and lysine
Chymotrypsin: cuts after phenylalanine, tryptophan and tyrosine
Example: transcription factor families (SOX genes)
Many paralogues of SOX with similar functions

21
Q

What are pseudogenes?

A

Pseudogene: a gene duplicates and one copy degrades completely
Happens within the first few million years after duplication if the copy is not under selection
Gene duplication generates functional redundancy
It is not advantageous to keep identical copies of the same gene
Mutations disrupting the structure and function of the spare copy are not deleterious
They accumulate until the gene becomes a non-functional pseudogene
Time frame = about 4 million years
Pseudogenes can still be transcribed to mRNA but will not produce a functional protein

22
Q

What are the non-processed pseudogenes?

A

Tandem duplication of genomic region (from a normal duplication event)
1 copy faces lack of selection
Inactivating mutations or incomplete duplication
Missing regulatory regions

23
Q

What are processed pseudogenes?

A

Created by reverse transcriptase activity (from LINEs, retroviruses, retrotransposons): parasitic elements with a copy-paste mechanism
The gene is transcribed to RNA
The RNA is reverse transcribed to cDNA and re-integrated into the genome
Lacks regulatory regions and introns (because the source is mRNA) = non-functional
Contains a polyA tail and flanking repeats
Can integrate into the same or a different chromosome

24
Q

What are ribosomal protein pseudogenes in humans and how are they conserved across primates?

A

20,000 human pseudogenes in genome
Many are ribosomal protein pseudogenes
Large family (2000 copies)
Processed pseudogenes
Formed by L1 retrotransposon activity
Highly transcribed / high expression rate
Highly conserved across primates
2/3 of human RP pseudogenes are also found in the chimpanzee genome
<12 are shared with rodents
Implies recent origin

25
Q

What are multigene families?

A

Multigene family: a group of similar genes formed when duplications are beneficial and retained
Genes in the family can have slightly different functions so become specialized
Example: rRNA genes (Mycoplasma genitalium: 2 copies, Xenopus laevis: > 500 copies)
Tandem gene family: members of the multigene family are clustered on the same chromosome
Dispersed gene family: members of the multigene family are on different chromosomes

26
Q

What are HOX genes and what is their function?

A

HOX genes are a multigene family
They encode homeotic proteins
These are transcription factors that bind DNA and regulate activation or inactivation of genes during embryonic development
Important for development and patterning of limbs / appendages
Control pattern of body formation during early embryonic development
Control compartmentalization / regionalization of body parts in animals along head to tail (anterior-posterior) axis

27
Q

What is the homeodomain in HOX genes?

A

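Homeobox: the conserved 180 bp DNA sequence in HOX genes; homeodomain: the protein domain it encodes
Domain: functional unit of a protein
The homeodomain is 60 amino acids long, forms a helix-turn-helix and is highly conserved in animals
It is the DNA binding domain of HOX proteins
The zinc finger is another domain that is important for DNA binding in other transcription factors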

28
Q

How was the role of Antennapedia discovered?

A

The normal Antennapedia gene is expressed in the second segment of the fly’s thorax and helps in the development of the second pair of legs
A mutation changes where the gene is expressed and causes legs to grow from the fly’s head in place of the antennae
What matters is not how much the genes are expressed but where they are expressed
If HOX TFs are expressed in the wrong location, appendages / limbs grow in the wrong place
Homeotic = something has changed to resemble something else

29
Q

Explain the composition and function of HOX genes in insects

A

Insects have one cluster of HOX genes consisting of 8 genes
Each of the 8 genes is expressed in a specific region of the body
In Drosophila the cluster is split into 2 complexes
Antennapedia complex: 5 genes responsible for the head and the first and second thoracic segments
Bithorax complex: 3 genes responsible for the third thoracic segment and the 8 abdominal segments
Homeotic transformations in insects: mutations in insect HOX genes result in one body segment taking on the identity of another

30
Q

Explain human HOX genes

A

Humans have 4 clusters of HOX genes
Each cluster has up to 13 gene positions (paralogue groups 1-13); some genes have been lost, leaving 39 HOX genes in total
Each cluster is on a different chromosome (4 in total)
HOXA, HOXB, HOXC, HOXD (the number gives the position in the cluster, e.g. HOXA1, HOXA2)
Gene duplication and neo-functionalization led to 39 HOX TFs in humans = specialization = more complex structure / function in humans than in insects

31
Q

How are HOX genes conserved between species?

A

HOX genes are conserved between Drosophila and humans
Many human HOX genes have counterparts in Drosophila (ancestral versions)
But duplication and neo-functionalization allow humans to be more complex than a fly (8 vs 39 genes)

32
Q

Give some examples of mutation of HOX genes and their impact

A

HOXD13: patterning of fingers is impaired
HOXA2: impacts ear development
HOXB1: eye and face development

33
Q

What is the HOX vertebrate common ancestor?

A

Branchiostoma lanceolatum (amphioxus): an invertebrate chordate with 1 cluster of HOX genes, used as a living model of the ancestor of vertebrates (including humans)
Marine, fish-like chordate (not a true vertebrate)
Displays features of the last common ancestor
1 cluster (15 genes) of HOX genes: simple body plan with no paired appendages and barely a mouth

34
Q

What HOX genes does a Sea lamprey display?

A

Sea lamprey: oldest vertebrate lineage that has 4 clusters of HOX genes like humans
The cluster duplications came before the increase in body plan complexity
More HOX genes = more complexity

35
Q

What is genome duplication? Why is it more tolerated than single chromosome duplication?

A

Duplications larger than genes and segments are possible
Genome duplication: duplicating the entire genome (incl. transposons, regulatory elements)
Duplication of a single chromosome is not tolerated well
Examples: Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13)
Leads to gene product imbalance and reduced life expectancy
Whole genome duplications (WGD) could be a source of speciation
Duplicating the entire genome is better tolerated because the balance between gene products is kept
Diploid eukaryotes contain 2 haploid chromosome sets
Polyploidy: having multiple complete sets of chromosomes (the entire genome is duplicated, not only 1 chromosome)

36
Q

Where is polyploidy common and what are the 2 types?

A

Polyploidy is widespread in plants
80% of flowering plant species originated via polyploidy
Ex. oats, cotton, potatoes, banana, coffee
Polyploidy is common in invertebrates, fish and amphibians but rare in mammals
2 main types of polyploidy
Autopolyploidy: happens within the same species. A mistake during meiosis makes diploid gametes instead of haploid gametes (offspring have 4 copies of each chromosome instead of 2)
Allopolyploidy: occurs between different species through hybrid reproduction

37
Q

What is autopolyploidy and what issues does it produce?

A

Multiplication of identical chromosome sets within a single (sub)species
Fertilisation by unreduced gametes
Error in meiosis accidentally produces diploid gametes
1-40% frequency of formation
Very common in plants
Can reproduce successfully but can’t breed with the parent species (2n gamete + n gamete = 3n offspring)
Allows speciation
Allopolyploids are often more fertile than autopolyploids (especially in plants) because each chromosome has exactly one homologous partner and can form a bivalent in meiosis; in autopolyploids each chromosome has several possible partners
Issues = can induce disease symptoms
Genomic shock: widespread changes in transposon activation, gene expression and recombination
Things get activated / repressed that are not meant to be

38
Q

What is allopolyploidy?

A

Hybridization between 2 reproductively compatible species that are very similar, e.g. only recently split in evolution
One step model (most common route): one or both parents have unreduced (diploid) gametes due to an error in meiosis = polyploid offspring (diploid + diploid = tetraploid)
Two step model: hybridization between haploid gametes followed by somatic doubling (a duplication event after mating)

39
Q

What are the benefits of whole genome duplication?

A

Raw material for evolutionary diversification
Functional gene divergence
Defence against mutation (if one gene loses its function, another copy can replace it)
Buffer against environment (and extinction)
Colonise new environments
Fitness consequences (Increases cell size, Organ size, Faster growth, Dosage regulated gene expression)

40
Q

Locus

A

Each gene has a locus, which is a specific position on a chromosome (the same position on both homologous chromosomes)

41
Q

Allele

A

Alternative form of a gene. Each parent donates one allele for every gene

42
Q

Homozygous

A

The two alleles are identical: the same genetic variant on both chromosomes at a gene locus

43
Q

Heterozygous

A

The two alleles are different: different genetic variants on the two chromosomes at a gene locus

44
Q

Genotype

A

Combination of the two alleles (maternal and paternal) for each gene

45
Q

Dominant alleles

A

Written in upper case by convention. The allele that is expressed when the two alleles are different

46
Q

Recessive alleles

A

Written in lower case by convention. Masked when the two alleles are different

47
Q

Phenotype

A

Physical manifestation of genotype

48
Q

What is an SNP and how often do they occur?

A

SNPs: DNA sequence variations that occur when a single nucleotide (A, T, C, G) in the genome sequence is altered
Example: AATCGAC –> AAGCGAC
For a variation to be considered an SNP, it must occur in at least 1% of the population
SNPs make up 90% of all human genetic variation
SNPs occur approximately every 1000 bases
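
A short sketch of how the single-base difference in the example could be located, assuming the two sequences are already aligned and have the same length. The function name is hypothetical, and the 1% population frequency criterion is not modelled here.

```python
def find_snps(reference, sample):
    """List (position, ref_base, alt_base) for single-base differences.

    Positions are 1-based. The sequences must already be aligned to the
    same length (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


# The example pair from the card above
print(find_snps("AATCGAC", "AAGCGAC"))  # [(3, 'T', 'G')]
```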

49
Q

Why are SNPs important?

A

Can affect how humans develop diseases
Can affect how an individual responds to pathogens
Can affect how an individual responds to drugs, etc
Used in biomedical research to compare regions of the genome between cohorts

50
Q

Where do SNPs occur in the genome?

A

Intergenic region: in a transcription factor binding site or enhancer/regulatory sequence
In a promoter or transcription factor binding region
In an exon: affects the amino acid sequence = affects the protein (for example, a premature stop codon truncates the protein)
In an intron: can hit a regulatory region; for example, a mutation in a splice site affects splicing

51
Q

What are the 2 categories of disease associated SNPs?

A

SNPs may be a direct cause of a disease or signal an increased chance of disease
Disease associated SNPs fall into 2 categories:
Monogenic: One SNP in one gene. One nucleotide change leads to disease. Easy to detect / analyse
Simple traits
Polygenic: SNPs in multiple genes
Multiple nucleotide changes affect chance of disease
Hard to detect
Complex traits
Example: familial vs sporadic Alzheimer’s
Familial: 5-10% of cases, SNP in a single gene (APP)
Sporadic: late onset, due to polygenic SNPs

52
Q

What are the two types of coding SNPs?

A

Coding SNPs: can be disease causing as they can affect the amino acid sequence and the protein
2 Types of coding SNPs
Synonymous (silent): Affected codon codes for the same amino acid so mutation is silent
Non-synonymous: affected codon codes for a different amino acid - can be detrimental and change protein
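
The two types can be told apart by translating the codon before and after the change. A minimal sketch, assuming DNA codons and using only a small excerpt of the standard genetic code; the function name and the missense/nonsense labels are additions for illustration.

```python
# Small excerpt of the standard genetic code (DNA codons).
# A complete table has 64 entries; these are enough for the examples.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",
    "GAT": "Asp", "GAC": "Asp",
    "TAT": "Tyr", "TAC": "Tyr",
    "TAA": "Stop", "TAG": "Stop",
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon change as synonymous or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "Stop":
        return "non-synonymous (nonsense)"
    return "non-synonymous (missense)"


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAA", "GAT"))  # Glu -> Asp
print(classify_coding_snp("TAC", "TAA"))  # Tyr -> Stop
```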

53
Q

Explain transition vs transversion with SNPs

A

Transition: the most common substitution. Replacement of a purine by another purine (example A -> G) or a pyrimidine by another pyrimidine (example T -> C)
Transversion: less common. Replacement of a purine by a pyrimidine or vice versa (A -> C, A -> T, G -> C, G -> T). Changes the biochemistry and structure of DNA more
The transition : transversion ratio varies within the genome and is used to assess GWAS data quality
Across the entire genome it averages around 2 (in stable genomes)
In protein coding regions it is usually higher, around 3, because transversions in the third base of a codon are more likely to change the encoded amino acid
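
A sketch of how the transition : transversion ratio could be computed from a list of SNPs, assuming each SNP is given as a (reference base, alternative base) pair. The function names and the example SNP list are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snps if substitution_type(ref, alt) == "transition")
    transversions = len(snps) - transitions
    return transitions / transversions if transversions else float("inf")


# Hypothetical SNP list: 4 transitions and 2 transversions
snps = [("A", "G"), ("T", "C"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(snps))  # 2.0
```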

54
Q

Explain how SNPs in Apolipoprotein E can lead to Alzheimer’s disease

A

SNPs are not always absolute indicators of disease
Apolipoprotein E (ApoE) contains 2 SNPs that result in 3 possible alleles for the gene: E2, E3, E4
The protein products of the alleles differ from each other by one or two amino acids
A person who inherits at least one E4 allele is more likely to develop Alzheimer’s
A person who inherits at least one E2 allele is less likely to develop Alzheimer’s
E3 is neutral / has no effect on disease risk
Someone who has inherited two E4 alleles may never develop Alzheimer’s
Someone who has inherited two E2 alleles may still develop Alzheimer’s
Alzheimer’s (like heart disease, diabetes, cancer) is caused by variations in several genes
Sporadic Alzheimer’s: the ApoE gene alone is not responsible for Alzheimer’s. Multiple genes are needed to develop the disease

55
Q

Where are disease associated SNPs in the non-coding genome located? give an example of a disease.

A

Disease associated SNPs are usually in regulatory DNA sequences: enhancers, promoters, TAD boundaries, long non-coding RNAs
There are more than one million enhancers in the human genome
Example: 98% of type 2 diabetes associated SNPs are non-coding
Most of the genome is non-coding: 80% of SNPs are found in non-coding regions

56
Q

How can SNPs disrupt splice sites? Give an example

A

Introns are spliced out by recognition of splice sites (the 3’ splice site is the 2-3 nucleotides in front of the exon)
SNPs can affect splice sites and splicing -> affects exons -> affects protein produced

Example: the OAS1 gene is associated with type 1 diabetes
An AG -> AA variant in intron 6 shifts the 3’ splice site by 1 nucleotide
This changes the reading frame of exon 7, resulting in a longer protein

At least 10% of all mutations causing human inherited disease disrupt splice site consensus sequences
This can cause total loss of the associated exon
An SNP can also introduce a cryptic splice site
Auxiliary sequences: stimulate splicing and are found in exons (exonic splicing enhancers, ESE) and introns (intronic splicing enhancers, ISE)
SNPs in auxiliary sequences impair splicing

57
Q

What are insertions and deletions (indels) and what do they cause?

A

Indels: more likely to change the function of a protein than an SNP
90% of variation in the genome is due to SNPs, the rest is indels
Can cause: disrupted start codon, disrupted stop codon, disrupted splice site, frameshift
Frameshift mutation (bases inserted or deleted, not in a multiple of 3): changes the reading frame so that all downstream codons are out of frame
The protein can end up too long or too short, depending on where the next stop codon falls
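
A small sketch of a frameshift, using a made-up 12-base sequence (not from the notes). Inserting one base shifts every downstream codon, and in this example the original stop codon TGA is no longer read in frame.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAATGA"                           # hypothetical coding sequence
with_insertion = original[:4] + "T" + original[4:]  # insert one base after position 4

print(codons(original))        # ['ATG', 'GCC', 'AAA', 'TGA']
print(codons(with_insertion))  # ['ATG', 'GTC', 'CAA', 'ATG']
```

Here the stop codon is lost, so translation would continue and give a longer protein; a shift can just as easily create an early stop codon and give a shorter one.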

58
Q

What is a GWAS and what is it used for?

A

Genome Wide Association Study
Requires genotyping or sequencing thousands of genomes
Example: 500 people that have the disease (cases) and 500 people that don’t (controls)
Scan for SNPs that are at higher frequency in people with the disease and at lower frequency / not present in people without it
GWAS has identified genetic variations that contribute to risk of: type 2 diabetes, Parkinson’s disease, heart disorders, obesity, Crohn’s disease, prostate cancer
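
The case vs control comparison for a single SNP can be sketched as a chi-square test on allele counts. This is one simple way to test association, assumed here for illustration; real GWAS pipelines use more elaborate models. The function name and the counts are hypothetical.

```python
import math


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi_square, p_value). For 1 degree of freedom the p-value
    is erfc(sqrt(chi_square / 2)).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    chi_square = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# 500 cases and 500 controls = 1000 alleles per group (hypothetical counts)
chi_square, p_value = allele_association(300, 700, 200, 800)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.2e}")
```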

59
Q

What are the challenges of GWAS?

A

Accurately identifying the causal SNPs
Coding mutations alter the amino acid sequence of a protein = effect is clear
Usually SNPs found in GWAS are in non-coding regions (since the genome is 98% non-coding) so their effect is less clear
Not all genes are active in every tissue
50% of human genes show tissue specific expression
GWAS can identify candidate SNPs but confirmation requires additional work
GWAS is a starting point

60
Q

How are GWAS results visualised?

A

Use a Manhattan plot to visualise GWAS findings
Plots -log10(P-value) of each SNP’s association with the disease (y-axis) against its position along each chromosome (x-axis)
The smaller the P-value (the higher the point), the stronger the evidence that the SNP is associated with the disease
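
A sketch of how GWAS results could be turned into Manhattan plot coordinates. The SNP names and P-values are made up, and the 5e-8 cut-off is the conventional genome-wide significance threshold, which is an assumption not stated in the notes.

```python
import math

# Conventional genome-wide significance cut-off (assumption, not from the notes)
GENOME_WIDE_P = 5e-8


def manhattan_points(results):
    """Convert (snp, chromosome, position, p_value) rows to plot coordinates.

    Returns rows of (chromosome, position, -log10(p), is_significant, snp),
    sorted by chromosome and position.
    """
    points = []
    for snp, chromosome, position, p_value in results:
        if not 0 < p_value <= 1:
            raise ValueError(f"invalid p-value for {snp}: {p_value}")
        points.append(
            (chromosome, position, -math.log10(p_value), p_value < GENOME_WIDE_P, snp)
        )
    return sorted(points)


# Hypothetical results
results = [("snp_b", 2, 5000, 1e-9), ("snp_a", 1, 1200, 0.03)]
for point in manhattan_points(results):
    print(point)
```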

61
Q

What was the 100,000 genomes project?

A

Genomics England project in collaboration with NHS
Aimed to sequence 100,000 genomes from around 70,000 people
Participants have a rare disease or cancer
Genomes of families are also sequenced
Identified variants associated with the disease
Patients can be offered a diagnosis when this wasn’t possible before
Problem with project: diversity missing, majority white Caucasian

62
Q

How does linkage disequilibrium affect GWAS?

A

Linkage disequilibrium (LD): non-random association of alleles at two or more loci within a population
During crossing over, some parts of the chromosome tend to travel together (bias)
SNPs close to each other on the same chromosome tend to be in LD
Haplotypes don’t occur at the frequencies expected by chance: the combinations are not random
If 5 SNPs are associated more frequently than expected and one is the cause of the disease, GWAS alone can’t tell which one is causing the disease
Would need to test every single SNP
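
LD between two SNPs is commonly measured with D (observed haplotype frequency minus the frequency expected if the alleles were independent) and r squared. A minimal sketch with hypothetical haplotype counts; the function name and the labels AB, Ab, aB, ab are assumptions for illustration.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r_squared) for two biallelic loci.

    haplotype_counts maps 'AB', 'Ab', 'aB', 'ab' to observed counts.
    D = p(AB) - p(A) * p(B); D = 0 means the alleles associate at random.
    """
    total = sum(haplotype_counts.values())
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_ab = haplotype_counts["AB"] / total
    p_a = (haplotype_counts["AB"] + haplotype_counts["Ab"]) / total
    p_b = (haplotype_counts["AB"] + haplotype_counts["aB"]) / total
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator else float("nan")
    return d, r_squared


# Hypothetical counts: AB and ab occur together more often than expected
print(linkage_disequilibrium({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))
# Random association: every haplotype equally frequent
print(linkage_disequilibrium({"AB": 25, "Ab": 25, "aB": 25, "ab": 25}))
```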

63
Q

What are expression quantitative trait loci (eQTL)?

A

Most SNPs are non-coding (98% of DNA is non-coding) so are found in regulatory elements
eQTL mapping combines GWAS (identifies SNPs in non-coding regions) and RNA-seq (next generation sequencing that measures the amount of mRNA produced by each gene, i.e. gene expression)
Identifies SNPs in non-coding regions that are responsible for changes in gene expression
These SNPs will most likely lie in enhancers or promoters that regulate gene expression
A disease associated SNP which is an eQTL can be responsible for the disease
eQTL mapping allows the regulated genes to be identified, as they are not necessarily the genes closest to a disease associated SNP
Cis eQTL: the gene affected by the SNP is found on the same chromosome
Trans eQTL: the gene affected by the SNP is on a different chromosome

64
Q

What is SNP genotyping?

A

SNP genotyping: uses microarrays (chips) to identify which SNP alleles are present in an individual
The chip contains probes for all the possible alleles of each SNP being tested
Specific SNP set = specific study
Example for an Affy SNP array: one probe contains ATTCATG
On the array there will be another probe for the alternative allele: ATTTATG
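
A toy sketch of genotype calling with the two probes above. It assumes plain string matching stands in for hybridization (a real array detects binding of complementary strands by fluorescence), and the sample fragments and function name are hypothetical.

```python
def call_genotype(sample_fragments, probe_ref="ATTCATG", probe_alt="ATTTATG"):
    """Call a genotype from which probe sequences occur in the sample."""
    has_ref = any(probe_ref in fragment for fragment in sample_fragments)
    has_alt = any(probe_alt in fragment for fragment in sample_fragments)
    if has_ref and has_alt:
        return "heterozygous"
    if has_ref:
        return "homozygous reference"
    if has_alt:
        return "homozygous alternative"
    return "no call"


# One fragment per chromosome copy (hypothetical sequences)
print(call_genotype(["GGATTCATGCC", "GGATTTATGCC"]))  # heterozygous
print(call_genotype(["GGATTCATGCC", "GGATTCATGCC"]))  # homozygous reference
```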

65
Q

What factors/elements are needed for gene regulation?

A

Non-coding regions: enhancers and promoters. They act in cis (on the same chromosome as the gene)
Enhancer region: distal to the gene, 10-100 kb away (1 kb = 1000 nucleotides), bound by activators/repressors, acts as a switch
Promoter: proximal to the gene
The genome forms a loop: mediated by DNA-bending proteins and transcription factors
DNA binding proteins (transcription factors): bind enhancer and promoter DNA and activate them (enhancer = switch, TF = flips the switch)
Mediator proteins: recruit RNA polymerase II and start transcription

66
Q

Why do eukaryotic organisms have multiple different cell types?

A

Eukaryotic organisms have multiple different cell types
Different shapes, different functions, populate different areas of the body
All cells have the same genome but not all genes are expressed in every cell due to gene regulation

67
Q

What are transcription factors? Give an example. When specifically are they important?

A

Example: proteins encoded by HOX genes
DNA binding proteins with a DNA binding domain (zinc finger, homeodomain). Bind specific DNA sequences (motifs) with high sequence specificity
Non-coding SNPs can be detrimental because one change in an enhancer site can affect TF binding
Can activate or repress gene expression by modulating promoter and enhancer activity
Some TFs are only repressors or only activators but some can do both
Determine if switches (enhancers/promoters) are ON or OFF
Very important during development: during cell differentiation, genes need to be activated or repressed at a specific time

68
Q

What is the nucleosome?

A

DNA is condensed in chromatin
Nucleosome: 146 bp of DNA wrapped around 8 histone proteins
Histone octamer: 2 copies of each core histone (H2A, H2B, H3, H4)
Interactions between DNA and histones are sequence independent: hydrogen bonding + ionic interactions with the sugar phosphate backbone
Chromatin is made of repeating units of nucleosomes
Nucleosomes are disassembled during replication

69
Q

What is chromatin?

A

Chromatin is made of repeating units of nucleosomes
There are 2 types of chromatin
Euchromatin: active form, uncondensed, TF can bind
Heterochromatin: silent form, condensed/compact, TF can’t bind

70
Q

What are pioneer transcription factors?

A

Not all TFs bind DNA in the same way
Pioneer factor mechanism
Pioneer factors can bind condensed heterochromatin by recognizing a motif (usually TFs can’t bind heterochromatin) and recruit other TFs
ATP hydrolysis by chromatin remodellers (BAF, NuRF, ISWI) makes the DNA accessible so other TFs can now bind
Pioneer factors have a dual role
Passive role: permanently bound which speeds inductive responses
Active role: pioneer factor only binds when needed

71
Q

What are topologically associated domains (TADs)?

A

Enhancers are cell type specific and can’t interact with all promoters
TADs determine which enhancers can interact with which promoters
They are fundamental units of three-dimensional (3D) nuclear organisation
Regions bordering TADs are called TAD boundaries
Enhancers and promoters can only interact within the same TAD
Look at image in notes
Mutation disrupting TAD boundary = enhancers can activate genes (PAX3) that are not meant to be activated

72
Q

How do cohesin and CTCF define TADs?

A

Loop extrusion model
The transcription factor CTCF binds CTCF binding sites
The cohesin ring pulls DNA through the ring into a growing loop until it meets 2 CTCF molecules, which closes the TAD boundary
Loop extrusion determines the formation of the TAD
TADs can be very long (880 kb in mice) and have similar sizes in non-mammalian species
TADs enclose genes together with their respective enhancers and promoters: this brings them closer together and optimizes usage of space
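
A one-dimensional toy sketch of the loop extrusion model: cohesin loads at one position and the loop grows in both directions until it reaches the nearest CTCF-bound site on each side. This simplification ignores CTCF site orientation and cohesin unloading; all positions (in kb) and the function name are hypothetical.

```python
def extrude_loop(load_site, ctcf_sites, chromosome_length):
    """Return the (left, right) anchors of the loop formed by cohesin.

    The loop extends from load_site until the nearest CTCF site on each
    side; without a CTCF site it runs to the end of the chromosome.
    """
    left = max((site for site in ctcf_sites if site < load_site), default=0)
    right = min((site for site in ctcf_sites if site > load_site), default=chromosome_length)
    return left, right


# Hypothetical CTCF sites at 100, 980 and 2000 kb on a 3000 kb region
left, right = extrude_loop(load_site=500, ctcf_sites=[100, 980, 2000], chromosome_length=3000)
print(left, right, right - left)  # 100 980 880
```

An enhancer at 300 kb and a promoter at 900 kb would fall inside this loop and could interact; a promoter at 1500 kb would lie in the neighbouring domain.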

73
Q

Explain phosphorylation of the CTD in RNA polymerase II

A

RNA polymerase II: transcription to synthesize mRNA
Pre initiation complex recruits RNA pol II –> paused RNA pol II –> elongation –> termination
C-terminal domain of RNA pol II is important in transcription
Long domain with many repeats and highly conserved
Phosphorylation of repeats in CTD is important in transcription
Stage dependent phosphorylation: different phosphorylation patterns label different stages of transcription
Phosphorylation is species specific but the repeats are conserved

74
Q

Explain promoter proximal pausing of RNA pol II

A

Widespread genome wide (not only early response genes)
Regulated by many proteins ex. NELF and DSIF that block RNA pol II
Important to quickly transition to productive elongation
Keep genes poised/prepared to be activated when needed
Will only need a few phosphorylations to activate genes quickly at a specific time
Important in development and heat shock genes
Transcription is stopped after 50bp
Ser5 of CTD is phosphorylated

75
Q

What is epigenetics?

A

Epigenetics: modifications of DNA and histones that regulate gene expression without changing the DNA sequence
Example DNA and histone methylation
Epigenetic modifications can be inherited
Are reversible and self-perpetuating

76
Q

What is the function of histone H1?

A

Histone H1: additional histone that keeps the nucleosome together

77
Q

How are histones covalently modified?

A

Methyl / acetyl group can be added to a histone
Histone methylation is the most common epigenetic modification
This is NOT the same as DNA methylation
HISTONE methylation: can either repress or activate gene expression depending on which groups are methylated

78
Q

How are heterochromatin and euchromatin distinguished in terms of their histone modifications?

A

Histones present different modifications on heterochromatin and euchromatin
Act as a flag to label the different states of chromatin
Euchromatin has lysine acetylation and arginine methylation
Heterochromatin has lysine methylation, lysine ubiquitination

79
Q

What is constitutive heterochromatin?

A

Methylation of lysine 9 of histone 3 (H3K9me2/me3)
3 methyl groups on lysine 9
DNA is always kept as heterochromatin (inactive)
Found in regions that you never want to activate
Example transposable elements (parasitic elements)

80
Q

What is facultative heterochromatin?

A

Methylation of lysine 27 of histone 3 (H3K27me2/me3)
3 methyl groups (trimethylation)
Temporary heterochromatin
Genes that don’t need to be expressed in the moment but need to be activated later on

81
Q

What is the H3K27 methylation and what protein deposits it?

A

H3K27 methylation is deposited by PRC2
PRC2 is a 3 protein complex and found in regions that are methylated temporarily (to inactivate gene expression)
EZH2, EED and SUZ12 proteins are always present
EZH2 is the most important catalytic subunit (enzyme)
Other subunits may also be involved in the complex

82
Q

Explain how histone modifications affect enhancers?

A

Different types of enhancers (active/repressed/poised) have different types of histone modifications
Active enhancers have different modifications than active promotors
ChIP-seq (ChIP sequencing) = used to map active/repressed/poised enhancers and promotors
H3K27 acetylation is common in active enhancers
If this is removed, enhancers still function and genes are still activated
Don’t know why: maybe because necessary to keep enhancers active
Histone modifications are conserved between species

83
Q

How are epigenetics related to cancer?

A

Epigenetics plays a role in the development of cancers
Epigenetic change that silences a tumour suppressor gene (gene that controls growth of cell) can lead to uncontrolled cell growth
Change that turns off genes that repair damaged DNA = increase in DNA damage = increases cancer risk

84
Q

What is X-inactivation?

A

Females have 2 X chromosomes so either the maternal or paternal X chromosome is randomly inactivated/silenced
Occurs in embryonic development around gastrulation in mammals
Occurs after the initial cell divisions so a different X can be inactivated in individual cells/tissues
Once it has occurred in a cell, all its descendants will maintain the same inactivation
X chromosome (Barr body) is silenced by histone modifications –> DNA condenses as heterochromatin –> transcriptionally inactive
Inactivation is carried out by the Xist gene - a long non-coding RNA that recruits PRC2, which deposits H3K27me3 to form facultative heterochromatin

85
Q

Explain how X chromosome inactivation leads to tortoiseshell cats

A

Tortoiseshell cats have a unique pattern of coat colour due to XCI
Almost all tortoiseshell cats are female (rare XXY males are the exception)
Black and orange alleles of fur colour gene are on the X chromosome
If the cat is heterozygous, fur colour in each patch depends on which X chromosome is inactivated (random)
If you clone a tortoiseshell cat you won’t get the same coat pattern
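The reasoning on this card (random choice of X in early cells, then clonal inheritance) can be sketched as a small simulation. Everything here is a toy assumption: the number of founder cells, the number of divisions and the function name are made up for illustration.

```python
import random

def develop_coat(n_founders, divisions, rng):
    """Each founder cell silences one X at random; all descendants keep that choice.

    Returns one list of fur colours per founder clone (one clone = one coat patch).
    """
    patches = []
    for _ in range(n_founders):
        # Allele carried by the X chromosome that stays active in this founder.
        active_allele = rng.choice(["black", "orange"])
        # Clonal expansion: every daughter cell inherits the same inactivation.
        patches.append([active_allele] * (2 ** divisions))
    return patches

# Same genotype, two independent developments (the original cat and a clone).
original = develop_coat(n_founders=8, divisions=3, rng=random.Random(1))
clone = develop_coat(n_founders=8, divisions=3, rng=random.Random(2))

print([patch[0] for patch in original])
print([patch[0] for patch in clone])
```

Within a patch every cell has the same colour, but the choice per founder is drawn independently in each development, so a clone is not expected to reproduce the original pattern.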

86
Q

What are agouti mice?

A

Agouti gene is associated with bodyweight and fur colour
Two genetically identical mice: the mother is light + fat and the offspring is dark + skinny
The difference is the mother was provided with a methyl-rich diet 2 weeks before mating
When mouse agouti gene is unmethylated: yellow coat, obese, prone to diabetes
When agouti gene is methylated: brown coat colour and low disease risk
Methylation leads to repression of the agouti promotor = gene not expressed

87
Q

Explain how epigenetic changes affect twins

A

Genes in identical twins are identical so differences are due to epigenetic changes
Environment and diet can affect epigenetics
The older the twins, the more epigenetic changes, the more different
Can label histone modifications with fluorescent probes
Chromosome pairs in each set of twins are superimposed
One twin’s epigenetic tags are dyed red, the other’s green
When red and green overlap, the region shows up as yellow (same epigenetic changes)

88
Q

Explain how epigenetic changes were inherited during the Dutch famine

A

Epigenetic changes can be inherited
During the Dutch famine, diet was poor in methyl groups
People who were conceived then had fewer methyl groups on insulin-like growth factor II (IGF2)
Long term effect: children suffered from obesity and cardiovascular diseases
F2 generation: offspring of exposed F1 fathers had higher weights and BMI than offspring of unexposed F1

89
Q

Explain the erasing of methylation during fertilisation

A

DNA of sperm is highly methylated and eggs also methylated but less
Once egg is fertilized, most of methylation is erased, especially from paternal genome
Methylations are converted to hydroxymethylation which is diluted out as cells divide
As embryo develops, methylation is lost further from maternal genome up to blastocyst stage
Not all methylation is erased so inheritance is possible
After this stage, cells differentiate and DNA becomes methylated again to produce specialized cells

90
Q

How is methylation inherited?

A

Methylation patterns are usually erased in primordial germ cells
Methylations are converted to hydroxymethylation which is diluted out as cells divide
Some residual DNA and Histone methylations persist in the fertilized egg that signal how to remethylate once cell division starts

91
Q

What is imprinting?

A

Inherit 2 working copies of a gene, one maternal and one paternal
Imprinted genes: only inherit one working copy that is either maternal or paternal
The other copy is silenced by epigenetics (permanently methylated)
Epigenetics tags on imprinted genes stay for the life of the organism
80 imprinted genes in humans and mice (out of 22,000)
Imprinted gene is at an increased risk of disease because only one working copy is present

92
Q

What is the genetic conflict hypothesis?

A

Only a hypothesis
Males can father multiple offspring with multiple partners at the same time with low cost of personal resources
Females can only produce one set of offspring at a time, which is very resource costly
Costs are greater for the mother than the father (mother has to carry the baby)
Father wants offspring to be big to ensure survival
Mother needs to balance big offspring with costs to herself so wants to limit size
Imprinted genes are involved in growth and metabolism

93
Q

How does epigenetics regulate processes in plants?

A

Flowering and colour is regulated by epigenetics in plants
Flowering is controlled by genes affected by environmental conditions (temperature, humidity, light) that change epigenetics
Ensures production of flowers even when plants are growing under adverse conditions

94
Q

What is DNA methylation and where does it occur?

A

Histone and DNA methylation are different
DNA methylation is an epigenetic modification
It is reversible
Methylation of position 5 of cytosine by methyltransferases
In mammals: occurs at CpG sites, which cluster in CpG islands (GC rich regions)
Methylation of CpG islands represses gene expression: forms compacted chromatin which prevents binding of TF’s and represses transcription
learn to draw image in notes
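A rough sketch of how a CpG island could be recognised computationally. The two measures (GC content and the observed/expected CpG ratio) and the thresholds of 0.5 and 0.6 are commonly used criteria, but they are an assumption here and not part of these notes; real tools also require a minimum length (often 200 bp), which is skipped for these toy sequences.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed / expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq, min_gc=0.5, min_ratio=0.6):
    """Assumed thresholds; both must be exceeded."""
    return gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_ratio

# Toy sequences (made up): one CpG rich, one AT rich.
print(looks_like_cpg_island("CGCGGCGCGACGCG"))
print(looks_like_cpg_island("ATTATTCAGG"))
```

The same scan, run in a sliding window along a genome, is one way to flag candidate promotor regions.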

95
Q

How can DNA methylation be used to identify promotor regions?

A

Promotors are rich in GC’s so CpG islands can be used to identify promotor regions

96
Q

What are the 3 types of methyltransferases in DNA methylation?

A

3 types of methyltransferases in mammals: DNMT3a and DNMT3b establish new (de novo) methylation in normal conditions
DNMT1 is the maintenance enzyme and works during DNA replication - hemi-methylated DNA is created, where the newly copied strand is unmethylated, and DNMT1 reproduces the methylation of the template strand on it
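The copying of methylation onto the new strand can be sketched as set operations. This is a simplified model: CpG positions are plain integers, and the function names and the dictionary layout are hypothetical.

```python
def replicate(template_methylation):
    """Semi-conservative replication: the newly made strand starts with no methyl groups."""
    return {"template": set(template_methylation), "new": set()}

def dnmt1_maintenance(duplex):
    """Copy methyl marks from the template strand onto the new strand at hemi-methylated CpGs."""
    hemimethylated = duplex["template"] - duplex["new"]
    return {
        "template": set(duplex["template"]),
        "new": duplex["new"] | hemimethylated,
    }

# Hypothetical methylated CpG positions on the parental strand.
after_replication = replicate({3, 10, 42})
after_dnmt1 = dnmt1_maintenance(after_replication)

print(after_replication["new"])  # empty: the duplex is hemi-methylated
print(sorted(after_dnmt1["new"]))  # pattern restored on the new strand
```

Because the pattern is restored after every round of copying, methylation can be passed on to daughter cells.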

97
Q

How is loss of DNA methylation related to disease?

A

DNA methylation patterns in disease tissues are different to those in normal tissues
Permanent loss of methylation in: cancer, Alzheimer’s, neurodegenerative diseases
Loss of methylation = activating promotors = activating genes that are meant to be repressed ex. oncogenes
Abnormal methylation silences tumour suppressor genes

98
Q

In which genomic regions does DNA methylation occur?

A

Promotor regions - loss of methylation leads to disease
Transposable elements are methylated as they need to be repressed via histone and DNA methylation
Intergenic regions - usually methylated ex. enhancers
Repetitive elements - usually methylated
Gene upstream regions - usually unmethylated

99
Q

What is the effect of DNA methylation at splice sites

A

Methylation in a splice site deactivates it (silencing)
Genes have different isoforms

100
Q

What is the effect of DNA methylation at a promotor region

A

Methylation of promotor deactivates it (silencing) and prevents transcription
Genes have more than one promotor to be able to produce proteins of different lengths with different functions
Main promotor is unmethylated and other promotors are methylated
Aberrant promotor = created by the insertion of repetitive elements and want to repress it
If both promotors bind RNA pol II they could collide

101
Q

What is the effect of DNA methylation at repetitive elements (transposons)

A

Transposable elements are highly mutagenic so methylation protects genome from them
Methylation prevents recombination and translocation
Methylated C mutates to T over time (by deamination), which mutates the transposon sequence and so prevents transposition

102
Q

How is DNA methylation linked to cancer? Transposable elements

A

Genomic instability: unmethylated transposable elements can move in the genome and disrupt existing genes
Disruption/deactivation of a tumour suppressor gene (ex. P53) leads to uncontrolled cell growth/cancer
Can also lead to epithelial mesenchymal transition: immobile cell converts to mobile cell which causes metastasis
Transposable elements can land inside a tumour suppressor gene OR in its promotor or enhancer
Hypomethylation at intergenic regions, repeats, transposable elements causes genomic instability and is found in all cancers

103
Q

How can DNA methylation be identified using MeDIP-Seq?

A

Methylated DNA immunoprecipitation
DNA of interest (disease and normal) is fragmented by sonication (sound waves) and denatured
Separate non-methylated from methylated DNA by using an antibody that binds methylated cytosine (5-methylcytosine)
Antibody in solution is bound to magnetic beads so a magnet can be used to isolate methylated DNA
Separate by immunoprecipitation
Isolated fraction (methylated) are sequenced by next generation sequencing
Sequences mapped back onto reference genome by alignment software
Example: identifying breast cancer
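The logic of the MeDIP-Seq steps (pull down methylated fragments, then count where they map) can be sketched as below. The fragment records, region names and counts are invented toy data for illustration only, and the helper functions are hypothetical; a real analysis works on aligned sequencing reads.

```python
def immunoprecipitate(fragments):
    """Antibody pull-down: keep only fragments carrying at least one 5-methylcytosine."""
    return [f for f in fragments if f["methylated_cpgs"] > 0]

def reads_per_region(fragments):
    """Count how many recovered fragments map to each region of the reference genome."""
    counts = {}
    for f in fragments:
        counts[f["region"]] = counts.get(f["region"], 0) + 1
    return counts

# Invented toy samples: each fragment has a mapped region and a methylated CpG count.
normal = [
    {"region": "region_X", "methylated_cpgs": 0},
    {"region": "region_X", "methylated_cpgs": 0},
    {"region": "repeat_Y", "methylated_cpgs": 4},
]
disease = [
    {"region": "region_X", "methylated_cpgs": 3},
    {"region": "region_X", "methylated_cpgs": 2},
    {"region": "repeat_Y", "methylated_cpgs": 0},
]

print(reads_per_region(immunoprecipitate(normal)))
print(reads_per_region(immunoprecipitate(disease)))
```

Regions that gain or lose recovered fragments between the two samples are the candidates for differential methylation.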

104
Q

How can DNA methylation be identified using Bisulphite sequencing?

A

Sample is treated with bisulphite
Converts cytosine to uracil but 5-methylcytosine is unaffected (methylated can’t be converted)
After PCR amplification and sequencing, unmethylated cytosine appears as thymine (uracil is read as T) and methylated cytosine still appears as cytosine
Treated samples are sequenced and compared/mapped to genome to determine methylation in cancer vs normal cells
Higher cost, greater resolution
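A minimal sketch of bisulphite conversion and methylation calling on a single strand. The sequence, the methylated position and the function names are made up; the sketch assumes a read that aligns to the reference without gaps, and writes converted cytosines as T because uracil is read as T after amplification.

```python
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C is read as T after conversion; 5-methylcytosine stays C."""
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)

def call_methylation(reference, converted_read):
    """For every reference C: still C in the read = methylated, T = unmethylated."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base == "C":
            calls[i] = (read_base == "C")
    return calls

reference = "ACGTCGAC"                        # toy reference sequence
read = bisulfite_convert(reference, {1})      # only the C at index 1 is methylated

print(read)
print(call_methylation(reference, read))
```

Because every cytosine is called individually, the method gives single-base resolution, which is the "greater resolution" noted on the card.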

105
Q

What are long non coding RNA sequences and what is their function?

A

Non-coding RNA is a sequence that does NOT have a start codon, a reading frame or a stop codon. Are polyadenylated and transcribed, but don’t code for a protein
ncRNAs longer than 200 nucleotides are lncRNAs
Can help regulate gene expression, target different aspects of gene transcription mechanism
Co-regulators to modify transcription factor activity
Can help stabilize TADs

106
Q

What is Xist?

A

Xist is a long non-coding RNA, 17Kb long (17,000 nucleotides)
Xist is randomly expressed on one of the two X chromosomes
Xist works in Cis (functions on same chromosome on which it is expressed)
Xist RNA coats the inactive X chromosome
Expression of Xist is the first detectable event in X inactivation

Xist contains 6 repeats (RepA, RepB, …)
RepA is required for the silencing function of Xist
RepA binds Xist to the histone methyltransferase complex Polycomb Repressive Complex 2 (PRC2)
Protein complex with many subunits
Deposits histone methylation (H3K27me3) along the chromosome

107
Q

What is HOTAIR?

A

HOTAIR: Regulator of HOX transcription factors
Works in Trans: expressed from HOXC locus on chromosome 12 but represses HOXD locus on chromosome 2
Binds to PRC2 and LSD1 (H3K4 demethylase) so acts as a scaffold
PRC2 adds repressive H3K27me3
LSD1 removes active H3K4 methylation
Usually methylation is repressive except on lysine 4, which is an active methylation
Combined function produces repressive chromatin structure to repress HOX genes when they are not needed
HOX genes need to be expressed at specific time points in specific regions in the embryo
In cancer: HOTAIR acts on regions other than HOXD (represses regions that are not meant to be repressed)

108
Q

How are long non coding RNAs correlated with disease?

A

Abnormal activity of lnc-RNAs is often associated with disease mechanisms
Aberrantly active lnc-RNA can mis-regulate genomic loci in-trans and in-cis
Example: abnormal neuronal death in Alzheimer’s
Aberrant activity of lnc-RNA MEG3 triggers necroptosis which induces neuron death
MALAT-1 and NEAT-1 in ALS and Huntington disease. In ALS they affect neuronal structure and the function of RNA binding proteins
MALAT-1 and HOTAIR in Parkinson’s
FMR4, 5, 6 in fragile X syndrome

109
Q

What are transposable elements and how much of the genome do they compose?

A

Transposable elements and other repeats account for 50% of human genome
Parasitic elements that want to propagate
Move from one location in the genome to another

110
Q

What are the sources of transposons?

A

Retroviruses inserted back into the host genome
Old sequences that integrate back into the genome
Degenerated ribosomal RNA

111
Q

What are class 1 retrotransposons?

A

Copy and paste mechanism to propagate
Transcribe DNA sequence using RNA polymerase from host
Use reverse transcriptase to convert RNA back to DNA. Reverse transcriptase is produced by the DNA of the TE
Use an integrase or endonuclease (also encoded by the TE) to cut the genome at a different location and insert the new DNA copy of the transposable element

112
Q

What are the 4 types of retrotransposons?

A

Autonomous retrotransposons:
LINES (long interspersed nuclear elements) and ERVS (endogenous retroviruses)
Autonomous because they have many ORF’s to encode for the enzymes they need

Non-autonomous retrotransposons:
SINES (short interspersed nuclear elements, such as Alu elements) and SVA’s don’t have the enzymes
Only function when there is an active LINE producing the enzymes they need

112
Q

What are class 2 DNA transposons?

A

Cut and paste mechanism
Excision of DNA element out of genome
Use transposase enzyme that cuts DNA out of region and inserts it in a different region
Integration into target DNA
Don’t multiply but can still move in the genome
Autonomous
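The difference between the class 1 "copy and paste" and class 2 "cut and paste" mechanisms can be sketched with a genome modelled as a list. The gene names, the insertion targets and both function names are hypothetical; the point is only the change in copy number.

```python
def copy_and_paste(genome, te, target):
    """Class 1 (retrotransposon): the original stays, a new copy is inserted at index target."""
    return genome[:target] + [te] + genome[target:]

def cut_and_paste(genome, te, target):
    """Class 2 (DNA transposon): the element is excised, then inserted elsewhere.

    The target index refers to the genome after excision.
    """
    source = genome.index(te)
    remaining = genome[:source] + genome[source + 1:]
    return remaining[:target] + [te] + remaining[target:]

genome = ["geneA", "TE", "geneB", "geneC"]

after_copy = copy_and_paste(genome, "TE", 3)
after_cut = cut_and_paste(genome, "TE", 2)

print(after_copy, after_copy.count("TE"))  # copy number goes up
print(after_cut, after_cut.count("TE"))    # copy number unchanged, position moved
```

This is why class 2 elements move without multiplying, while class 1 elements accumulate in the genome.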

113
Q

Explain the activity of transposons in the human genome over time and how many transposons are still active

A

Genome represses TE’s but they sometimes escape methylation = dangerous
So most TE’s lose their ability to transpose over time
By accumulation of mutations in the ORFs that encode for the proteins that regulate the transposition
Currently only a few LINES, Alu’s and SVA’s are still active in the human genome
Not a single ERV or DNA transposon is still active
20% of genome is LINES

114
Q

How are transposable elements silenced?

A

TE’s are primarily parasites and must be silenced
By DNA methylation or Histone methylation (primarily H3K9me3)
Different types of small/short non-coding RNAs (especially piRNAs) target transposable elements for degradation
KRAB-ZINC-FINGER proteins - TF’s that bind transposons and recruit methyltransferases

115
Q

How are timing and context of transposition important?

A

Transposon in somatic cell ex. skin cell –> somatic polymorphism –> not inherited
Can lead to cancer ex. if it lands in a tumour suppressor gene
Transposon in germline cell ex. sperm –> germline polymorphism –> inherited
More detrimental as it is inherited by every cell

116
Q

How is mutualism present in transposable elements and what are the 4 mechanisms?

A

Some TE’s present interesting features which make them functionally useful for the host genomes so are not repressed / methylated
Transposons are domesticated / recruited / co-opted
Mutualistic relationship
4 mechanisms of co-option to keep transposons unrepressed
1. TE-derived promotors and enhancers
2. TEs act as TAD boundaries
3. TE-derived lncRNA
4. Transposase transcription factor fusions

117
Q

Explain TE-derived promotors and enhancers by co-option/mutualism

A

Transposons can be co-opted as functional promotors / enhancers
Transposon moves into enhancer or promotor by chance and contains a sequence recognized by TF which boosts expression of the gene
Many examples of TEs co-opted as functional promotors and enhancers
example SVA’s

118
Q

Explain SVAs as an example of co-option of TE as promotors/enhancers

A

SVA’s (transposons) are important in the development of human brain
Most SVAs are repressed except in the hippocampus
Hippocampus is the part of the brain responsible for memory, communication, learning, navigation AND neurogenesis (makes new neurons)
Human hippocampus is bigger than expected
Differentiation: Stem cell –> neuronal progenitors (INPs) –> neuron
Human hippocampus is bigger because INPs proliferate more than in other species
INPs express the TBR2 transcription factor
Human specific SVAs have TBR2 binding sites so co-opted as TBR2-modulated enhancers in human INPs

119
Q

What are SVAs?

A

SVA’s make up 0.3% of the human genome
Youngest group of transposons so can still move in genome because have not accumulated enough mutations
Hominid specific (found in ancestors of the great apes)
Made of 2 SINE elements with a variable number of tandem repeats (VNTR) between them
Divided into subgroups (A, B, C, D, E, F)
3000 SVAs in humans: E and F subgroups are human specific

120
Q

How can transposable elements act as TAD boundaries?

A

Most CTCF binding sites come from SINEs and ERVs

121
Q

How are lncRNAs derived from transposable elements?

A

Example LINE produces lncRNA important for cortical development
Developmentally timed expression of TE-derived lnc-RNAs

122
Q

How do transposable elements lead to transcription factor fusions

A

Transcription factors need a DNA binding domain
Transposases have DNA binding domains (as they need to bind, cut, insert DNA in new location)
Transposase domain of transposon lands next to ancestral gene = fusion to get a 2-domain gene with DNA binding abilities
Example: PAX family of TF’s evolved from fusion. PAX6 produces eyes

123
Q

Explain the tail loss evolution in humans and apes

A

Tail-loss evolution in humans and apes
Evolutionary advantage of not having a tail in humans
All primates have Alu element in TBXT gene
Only apes (no tail) have another Alu element (random). During splicing, two Alu elements dimerize so exon 6 gets trapped in a loop so mature RNA does not contain exon 6
When exon 6 is removed in a mouse, the mouse will have no tail
Completely random insertion/coincidence of Alu element caused humans to have no tail

124
Q

What are some examples of TE derived traits?

A

Amylase in saliva in hominoids - insertion of ERV
Prolactin in endometrium (MER39)
Corticotropin releasing hormone and platencin in the placenta (THE1B)