L2: Comparative Genomics Flashcards

1
Q

Is it correct to assert that more genes leads to more diversity/complexity?

A

Gene number is not correlated with organism complexity: humans and C. elegans have around the same number of genes (~20k). Sea urchins have ~23k, while rice has ~50k. Some species have also undergone whole-genome duplications!

2
Q

Describe an example of three or four genes with high sequence and functional similarity in mammalian genomes

A

Drosophila has one Notch receptor (dNotch), which is bound by two transmembrane DSL ligands (Delta and Serrate).

Mammals possess four Notch receptors (Notch1–4) and five ligands (Jagged1 and 2, which are homologous to Serrate, and Delta-like (Dll) 1, 3 and 4, which are homologous to Delta).

3
Q

What is Notch important for and how conserved are the genes encoding it?

A

Notch is important for stemness and differentiation. Its domains are conserved in the distant homolog in Drosophila, and the genes have been highly conserved over millions of years of evolution in both lineages. There are nevertheless clear structural changes in both the protein-binding domains and the ligand-binding domains in terms of length.

4
Q

How may these changes in Notch have arisen?

A

These changes can arise from segmental duplication followed by diversification. Such duplications are quite dramatic events in which the genome did not separate correctly during mitosis. They are hypothesised to have occurred during periods of great diversification.

5
Q

What do the Notch, Serrate, Jagged and Delta genes encode? (More detail)

A

Notch receptors are expressed on the cell surface as heterodimeric proteins. Their extracellular portion contains 29–36 epidermal growth factor (EGF)-like repeats that are associated with ligand binding, followed by three cysteine-rich LIN repeats that prevent ligand-independent signalling, and a heterodimerization domain.

The intracellular portion of the receptor harbours two protein interaction domains, the RAM domain (R) and six ankyrin repeats (ANK), as well as two nuclear localization signals (NLS), a transactivation domain (TAD, which has not yet been defined for Notch3 and 4) and a PEST (P) sequence.

Notch ligands are also expressed as membrane-bound proteins. They all contain an amino-terminal DSL domain (Delta, Serrate and Lag2) followed by EGF-like repeats. Ligands of the Serrate family also harbour a cysteine-rich (CR) domain downstream of the EGF-like repeats.

6
Q

What significant event happened twice to the vertebrate genome in the last 500 million years?

A

The vertebrate genome duplicated twice in the last 500 million years:

It has been proposed that more than 450 million years ago, two successive whole genome duplications took place in a marine chordate lineage before leading to the common ancestor of vertebrates. A pre-vertebrate genome composed of 17 chromosomes duplicated to 34 chromosomes and was subject to seven chromosome fusions before duplicating again into 54 chromosomes.
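
The chromosome counts in this model can be checked with a few lines of arithmetic. A minimal Python sketch using only the figures stated above; the variable names are illustrative and not taken from the source:

```python
# Chromosome counts in the proposed two-round duplication model.
# All figures come from the card above; names are illustrative only.
ancestral = 17                          # pre-vertebrate chromosomes
after_first_wgd = ancestral * 2         # first whole genome duplication
after_fusions = after_first_wgd - 7     # seven chromosome fusions
after_second_wgd = after_fusions * 2    # second whole genome duplication

print(after_first_wgd, after_fusions, after_second_wgd)
```

The intermediate value after the fusions is what makes the final count 54 rather than 68.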

7
Q

What remnants of these duplication events exist in our own genome? What is the relevance of this for evolution?

A
  • Most genes have multiple paralogs
  • Gene Paralogs could have specialised functions or specialised expression
    patterns
  • The events of genome duplication are at the onset of explosions of diversity
8
Q

What is the special case of Xenopus?

A

Species such as Xenopus laevis (the clawed frog) have undergone additional genome duplications.

9
Q

How could this special case of Xenopus have arisen?

A

This could have arisen from an ancestral species S and an ancestral species L, each of which had diploid homologous chromosomes that had no homology with those of the other species. This means that the two species could not produce fertile offspring with each other.

Mating between the two species would then result in egg cells with hybrid S/L haploid chromosome sets, which cannot produce fertile offspring. This could happen often when both species lived in the same pond. However, a rare event may have caused the haploid chromosomes to replicate to produce homeologous chromosomes (duplicated genes or chromosomes that are derived from different parental species and are related by ancestry). This allotetraploid offspring would then be fertile and able to produce sperm, eggs and offspring. Research has found that chromosomes can sometimes duplicate sporadically, after which the hybrids are no longer sterile.

10
Q

How is this chromosome replication relevant to us?

A

This can sometimes happen in cells of the body without much disruption. Duplicating everything may be better than duplicating only part of the genome: we know what happens with an extra chromosome 21, and the balance may be restored with a complete new set.

11
Q

What is often inferred from conserved regions of the genome?

A

Exons are particularly well conserved, but there are also extremely well conserved non-coding sequences throughout vertebrate evolution. The idea is that if they are so well conserved, they must be important: conservation is a predictor of function.

12
Q

What is meant by ultraconserved elements?

A

Ultraconserved elements (UCEs) are sequences of >200 bp that have not changed in >100 million years.

These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing, or in introns of or near genes involved in the regulation of transcription and development. They are more highly conserved between these species than are proteins.

13
Q

What are many UCEs part of according to Jacobs? What is a surprising finding regarding UCEs?

A
  • Many UCEs are part of (neuronal) enhancers. Enhancers are modulators of the main switch, the promoter.
  • Many UCEs are not essential for normal development
14
Q

What is meant by human accelerated regions?

A

In some cases these highly conserved regions have undergone changes in humans. These are known as human accelerated regions (HARs) and are often non-coding elements: elements that stayed conserved for a long time (implying function) but have changed rapidly between chimp and human. Hundreds of HARs have been identified.

15
Q

What functions do many HARs play?

A

Many HARs are part of enhancers / gene regulatory elements. Some HARs show differential regulation between the versions of the element found in different species.

16
Q

Give an example of a human accelerated region (2)

A

HAR1 forms a brain-expressed non-coding RNA molecule with a human-specific secondary structure, yet with no known function.

HAR2 is a developmental enhancer near GBX2, which is expressed in the muscles of the hand, particularly the thumb. Researchers found 16 human-specific mutations in HAR2, which regulates the expression of this thumb gene.

17
Q

What have studies on HAR2 in chimps and monkeys shown?

A

Chimp and monkey versions of HAR2 do not regulate the thumb gene.

18
Q

Give examples of what large-scale genomic variation between the human and chimpanzee genomes can look like.

A

There are deletions and insertions, as well as double-breakpoint inversions, pericentric inversions and single-breakpoint inversions, across the genome.

19
Q

What is meant by inversions?

A

Inversions are generated when two double-stranded breaks are introduced into a chromosome and the ends are rejoined such that the order of the sequence between the breakpoints is reversed.
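
The reversal of order between two breakpoints can be illustrated with a toy model. This is a minimal Python sketch that treats a chromosome as a list of gene labels; the function name `invert_segment` and the example genes are hypothetical, and the model only tracks gene order (a real inversion also flips the strand of the inverted DNA):

```python
def invert_segment(genes, start, end):
    """Return a copy of `genes` with the order of genes[start:end] reversed.

    `start` and `end` stand in for the two breakpoints (hypothetical indices).
    """
    return genes[:start] + genes[start:end][::-1] + genes[end:]


chromosome = ["A", "B", "C", "D", "E", "F"]
inverted = invert_segment(chromosome, 1, 4)
print(inverted)  # ['A', 'D', 'C', 'B', 'E', 'F']
```

Genes outside the breakpoints keep their positions; only their neighbours at the two junctions change.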

20
Q

What are considered to be the two types of inversions?

A

There are two types of inversion, paracentric and pericentric, with the difference being whether the centromere is involved in the rearrangement. A pericentric inversion includes the centromere in the inverted segment, while a paracentric inversion does not.
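
The distinction comes down to whether the centromere lies between the two breakpoints. A minimal Python sketch, assuming positions are given as simple coordinates along one chromosome; the function name and the example coordinates are hypothetical:

```python
def classify_inversion(break1, break2, centromere):
    """Classify an inversion by whether the centromere lies between the breakpoints."""
    left, right = sorted((break1, break2))
    return "pericentric" if left < centromere < right else "paracentric"


print(classify_inversion(10, 50, centromere=30))  # pericentric
print(classify_inversion(10, 20, centromere=30))  # paracentric
```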

21
Q

What two processes can generate chromosomal inversions?

A

Ectopic recombination and staggered breaks.

Ectopic recombination generates inversions via a recombination event between two homologous sequences (often transposable elements) oriented head-to-head along a chromosome. The homologous sequences recombine and are reintegrated into the genome so that the strand is positioned in the opposite direction.

Staggered breaks can generate inversions via the complete detachment of a DNA segment from a chromosome and its subsequent reattachment in the opposite orientation at the same position in the chromosome.

22
Q

What is meant by the term ‘staggered’ breaks?

A

Such breaks are usually staggered, meaning that they will result in fragments with stretches of single stranded DNA at their extremities. The DNA repair mechanism that synthesises the reverse complement of these single stranded stretches then joins the DNA sequences back together in a non-homologous way, sometimes reinserting the DNA segment in the opposite orientation, creating an inversion.

23
Q

What happens when breakpoints are heavily staggered?

A

When the breaks are heavily staggered (i.e., with long single strand stretches), this results in inversions with duplicated sequences at both breakpoints in the derived sequence. In contrast, when the breaks are blunt or slightly staggered, cut-and-paste type breakpoints are created in the derived sequence, with no or small duplications, respectively.

24
Q

What are the functional implications of these inversions?

A

When a segment is flipped, this might not matter much for genes in the middle of the segment, but it might matter for those at the ends, where they are brought into closer contact with other genes.

25
Q

What is missing from these large scale variation maps?

A

Single nucleotide polymorphisms (SNPs) are not on the map.

26
Q

To what extent are SNPs present in human vs chimp genome?

A

In the ~6 million years since the split from the last common ancestor of human and chimp, about 1% of bases have been substituted: roughly 35 million bp differences between human and chimp.

27
Q

How could these SNPs occur?

A

DNA damage repair can lead to nucleotide substitutions.

28
Q

Name 5 factors that could lead to DNA damage

A

Cellular metabolism
UV light exposure
Ionising radiation
Chemical exposure
Replication errors

29
Q

Name four things that result from DNA damage

A

Cell cycle checkpoint activation
Transcriptional program activation
DNA repair
Apoptosis

30
Q

Name five DNA repair pathways that can act on damaged DNA

A
  • Direct reversal
  • Base excision repair
  • Nucleotide excision repair
  • Mismatch repair
  • Double strand break repair
    • Homologous recombination
    • Non-homologous end joining
31
Q

What is an R loop?

A

An R-loop is a three-stranded nucleic acid structure that consists of a DNA:RNA hybrid and a displaced strand of DNA.

32
Q

What is the functional significance of R loops?

A

R-loops occur frequently in genomes and have significant physiological importance. They play vital roles in regulating gene expression, DNA replication, and DNA and histone modifications.

Paradoxically, although they do play essential positive functions required for important biological processes, they can also contribute to DNA damage and genome instability. Recent evidence suggests that R-loops are involved in a number of human diseases, including neurological disorders, cancer, and autoimmune diseases.

33
Q

Where/when do R loops tend to form?

A

Genome-wide mapping studies for R-loop formation have shown that R-loops tend to form near transcriptionally active genes. In particular, they tend to form in promoter regions. R-loops commonly form during transcription when a nascent transcript reanneals to the template strand of DNA behind the extending RNAPII. This is the most accepted model of R-loop formation and is known as the “thread back model.”

34
Q

How else can R loops form and what is a consequence for the DNA?

A

Head-on collision of the transcription and replication machinery can also lead to R-loop formation. In Bacillus subtilis, R-loop accumulation leads to replisome stalling and mutagenesis. Similar results have also been shown in human cells.

35
Q

Can R-loops form in cis or trans?

A

R-loops have also been shown to form in trans, using RNA molecules transcribed at a distant genomic locus. This was first demonstrated in a study using Saccharomyces cerevisiae whereby transcripts produced from the yeast genome were shown to form R-loops on a yeast artificial chromosome.

36
Q

What protein do R loops depend on?

A

Formation of R-loops in trans depends on Rad51, a protein involved in double-strand break repair via strand exchange. However, the precise mechanism is not fully understood.

37
Q

What features in DNA strands have high propensity for R loops?

A

R-loop formation is more efficient when a G-rich transcript is produced from a C-rich template; G clusters—short stretches of G nucleotides—on the RNA transcript favour R-loop formation.

Once R-loop formation is initiated, G clustering is less important for extension of the R-loop structure as long as the density of G nucleotides in the transcript is high.
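
Both features, G clusters and overall G density, are easy to measure on a transcript sequence. A minimal Python sketch; the function name is hypothetical, and the threshold of three consecutive Gs for a "cluster" is an assumption made for illustration, not a value given in the source:

```python
import re


def g_cluster_stats(transcript, min_run=3):
    """Return (number of G runs of at least `min_run` bases, overall G fraction).

    `min_run` is an assumed threshold for what counts as a G cluster.
    """
    seq = transcript.upper()
    clusters = re.findall("G{%d,}" % min_run, seq)
    g_fraction = seq.count("G") / len(seq) if seq else 0.0
    return len(clusters), g_fraction


n_clusters, g_fraction = g_cluster_stats("GGGACGGGGUUGG")
print(n_clusters, round(g_fraction, 2))
```

A transcript with several clusters would be a candidate for R-loop initiation, while a high G fraction alone would matter more for extension.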

38
Q

What is meant by negative and positive supercoiling?

A

Negative supercoiling is the right-handed coiling of DNA, so winding occurs in the counterclockwise direction. It is also known as the “underwinding” of DNA. Positive supercoiling is the left-handed coiling of DNA, so winding occurs in the clockwise direction. It is also known as the “overwinding” of DNA.

39
Q

How is supercoiling related to R loops?

A

R-loops can absorb negative supercoiling, thereby relieving topological strain on the overall DNA molecule and stabilising the R-loop structure. Negative supercoiling can also promote R-loop formation.

40
Q

Describe how one more factor can increase risk of R loop formation

A

The presence of nicks downstream of the promoter in the non-template strand favours nucleation of an R-loop structure across a given gene. During transcription, RNA polymerase transiently unwinds duplex DNA, which reanneals efficiently behind the elongating RNA polymerase. A nick in the non-template strand makes this reannealing less efficient, thus increasing the likelihood that the template strand hybridises to the nascent RNA.

41
Q

Therefore, name six things which can induce R-loop formation/stabilisation

A

Negative supercoiling
Nicks in non-template DNA strand
Defects in RNase H
G-rich transcripts
Defects in RNA-DNA helicases
Secondary structures in displaced DNA strands

42
Q

OK, back to what's actually relevant: when is DNA more susceptible to damage, and what can arise from this?

A

Single-stranded DNA is more susceptible to damage because it is more exposed, e.g. to ions. These vulnerable spots arise during transcription, even in somatic cells that are no longer replicating. This can lead to nucleotide substitutions and repeat expansions.

43
Q

How frequent are these malfunctioning DNA damage repair events?

A

A single cell can apparently sustain up to 70,000 lesions per day. A lesion could be the replacement of one nucleotide with another, or with one that cannot hybridise with its partner.

44
Q

Give six types of DNA damage and how frequent they are in a cell per day

A

Cytosine deamination: adenine and/or cytosine nucleotides lose amine groups. Spontaneous deamination converts cytosine to uracil (which is excised from DNA by the enzyme uracil-DNA glycosylase, leading to error-free repair). 192 per cell per day

Depurination: a chemical reaction of purine deoxyribonucleosides (deoxyadenosine and deoxyguanosine) and ribonucleosides (adenosine or guanosine) in which the β-N-glycosidic bond is hydrolytically cleaved, releasing a nucleic base (adenine or guanine, respectively). 12k per cell per day

Depyrimidination: depyrimidination of a damaged nucleotide in DNA is mediated by a pyrimidine-specific DNA glycosylase. The glycosylase cleaves the N-C1’ glycosidic bond between the damaged DNA base and the deoxyribose sugar, generating a free base and an abasic, i.e. apurinic/apyrimidinic (AP), site. 600 per cell per day

8-oxoG: an oxidative modification of guanine that can result in mismatched pairing with adenine, giving G to T and C to A substitutions in the genome. 2.8k per cell per day

Single stranded break (SSB): 55k per cell per day

Double stranded break (DSB): 25 per cell per day
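
Adding up the per-type frequencies on this card gives a total in line with the figure of up to 70,000 lesions per cell per day quoted on the earlier card. A minimal Python sketch of that sum; the dictionary keys are just labels for the card's entries:

```python
# Lesion frequencies per cell per day, as listed on this card.
lesions_per_cell_per_day = {
    "cytosine deamination": 192,
    "depurination": 12_000,
    "depyrimidination": 600,
    "8-oxoG": 2_800,
    "single-strand break": 55_000,
    "double-strand break": 25,
}

total = sum(lesions_per_cell_per_day.values())
print(total)  # 70617
```

Single-strand breaks and depurination together account for most of the total.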

45
Q

What is the difference between a single nucleotide polymorphism and a single nucleotide substitution?

A

Single nucleotide polymorphisms (SNPs) are variants within a species, where a few different possibilities coexist in the population; single-nucleotide differences between species are single nucleotide substitutions (SNSs).

46
Q

How can you check whether a given SNP is pathogenic?

A

SNP databases can show whether a given SNP is pathogenic.

47
Q

Which part of the genome has SNPs?

A

SNPs are found all across the genome; no part of the genome is devoid of them.

48
Q

What are the consequences of most of these small deletions and insertions?

A

Most small insertions and deletions are harmless because they lie in the wider neighbourhood of a gene rather than in coding regions.

Those in non-coding regions (like non-coding SNPs) can, however, affect a promoter or other regulatory region and so alter gene regulation (e.g. by modifying a TF binding site).

49
Q

What is meant by synonymous and non-synonymous mutations?

A

Synonymous mutations change the DNA but not the translated protein. Non-synonymous mutations give rise to a different amino acid; they may not change the overall function of the protein, but are more likely to. (Both are coding SNPs.)
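
The distinction can be checked mechanically by translating the codon before and after the change. A minimal Python sketch using the standard genetic code; the function name and the example codons are illustrative, and a change to or from a stop codon is simply reported as non-synonymous here:

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Compare the amino acids encoded by the reference and variant codons."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


print(classify_coding_snp("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAG", "GTG"))  # non-synonymous (Glu -> Val)
```

Many third-position changes are synonymous because several codons encode the same amino acid.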

50
Q

How can you infer the ancestral type of a polymorphic nucleotide in the population?

A

If all related species share the same nucleotide at a position, it is logical to infer that their common ancestor also had it. If the nucleotide varies between species, this inference does not hold and you may have to guess what the ancestral nucleotide was.

51
Q

How could you determine whether a given sequence that is present in 50% of the population is an insertion or a deletion?

A

To determine whether a sequence present in 50% of the population is an insertion or a deletion, look at related species to see what they have: if they carry the sequence, its absence is a deletion; if they lack it, its presence is an insertion.

52
Q

If, for example, you had a situation where the following species have the following nucleotides in the same position, what could you infer?

Human: A
Chimp: A (6 mya)
Orangutan: C (14 mya)
Rhesus: C (25 mya)
Marmoset: G (35 mya)

A

The ancestral nucleotide when humans and chimps diverged was probably A.

The ancestral nucleotide when rhesus, and when orangutans, diverged was probably C.

To know what the nucleotide was when marmosets diverged, you need an outgroup species to infer the ancestral allele.
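
This reasoning can be reproduced with a simplified parsimony pass in the style of the Fitch algorithm. The sketch below is an illustration under stated assumptions, not a method named in the source: it assumes a ladder-shaped tree in which each species branches off in the order listed, and the function name and labels are hypothetical.

```python
def infer_ancestral_states(leaves):
    """leaves: list of (species, base), ordered from the two most closely
    related species outward. Returns candidate bases for each ancestral node."""
    # Bottom-up pass: intersect the candidate sets if possible, else take the union.
    bottom_up = []
    current = {leaves[0][1]}
    for _, base in leaves[1:]:
        current = (current & {base}) or (current | {base})
        bottom_up.append(current)

    # Top-down pass: where possible, keep only states shared with the parent node.
    resolved = [None] * len(bottom_up)
    parent = None
    for i in range(len(bottom_up) - 1, -1, -1):
        candidates = bottom_up[i]
        if parent is not None and candidates & parent:
            candidates = candidates & parent
        resolved[i] = candidates
        parent = candidates

    labels = [f"{leaves[0][0]}..{name}" for name, _ in leaves[1:]]
    return dict(zip(labels, resolved))


primates = [("Human", "A"), ("Chimp", "A"), ("Orangutan", "C"),
            ("Rhesus", "C"), ("Marmoset", "G")]
result = infer_ancestral_states(primates)
for node, bases in result.items():
    print(node, sorted(bases))
```

The deepest node comes out as either C or G, which is exactly why an outgroup is needed to resolve it.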

53
Q

Aside from SNPs, what else is not on the map of large scale genomic variation between humans and chimps?

A

Small insertions and deletions (indels)

54
Q

Name three ways indels can disrupt genomic function

A
  • Indels in coding DNA can disrupt protein-coding potential by causing frame shifts
  • Indels in regulatory elements can cause dissociation of TFs
  • Indels can create/delete exon splice sites
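
The frame-shift point in the first bullet follows from codons being read three bases at a time: an indel shifts the reading frame unless its length is a multiple of three. A minimal Python sketch; the function name is hypothetical:

```python
def causes_frameshift(indel_length):
    """True if an indel of this length (in bp) shifts the reading frame."""
    return indel_length % 3 != 0


for length in (1, 2, 3, 4, 6):
    print(length, causes_frameshift(length))
```

An in-frame indel (3, 6, ... bp) still adds or removes amino acids, but it leaves the downstream codons intact.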
55
Q

Name four sources of INDELs

A

1: Repair of DNA damage
2: DNA Polymerase slippage
3: Transposable element insertions
4: Pseudogene insertions

56
Q

Describe how repair of DNA damage can incur INDELs

A

Non-homologous end joining (NHEJ) is an emergency type of repair for double-strand breaks that works without a template strand. The repaired site can contain insertions or deletions compared to the original configuration.

57
Q

Describe how DNA polymerase slippage can incur INDELs

A

During replication, polymerase slippage and subsequent reattachment may cause a bubble (looping out from the complementary strand) to form in the new strand. Then, DNA repair mechanisms realign the template strand with the new strand and the bubble is straightened out. The resulting double helix is then expanded.

58
Q

What sort of mutation is often attributed to these DNA polymerase slippages?

A

Slippage is thought to occur in sections of DNA with repeated base patterns (such as CAG). These stretches of repaired DNA are like scars: you know something happened there. During replication, a repeat expansion may occur.
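
A repeat expansion shows up as a longer run of back-to-back copies of the repeat unit. A minimal Python sketch that measures the longest such run; the function name and the example sequences are made up for illustration:

```python
import re


def longest_repeat_run(sequence, unit="CAG"):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


before = "TTCAGCAGCAGGACAGT"       # longest run: 3 copies of CAG
after = "TT" + "CAG" * 5 + "GA"    # the same tract after a hypothetical expansion
print(longest_repeat_run(before), longest_repeat_run(after))
```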

59
Q

What feature of our DNA often shows these expansion repeats? What negative aspect is there also to them?

A

Many of our promoters contain these repeats; this may be favoured by evolution to encourage transcription. However, these sites are quite vulnerable and can lead to pathology (Huntington’s, FTD).

60
Q

How can transposable element insertions be a source of INDELs?

A

Transposable elements can copy and paste themselves into new positions in the genome.

61
Q

How may pseudogene insertions come about?

A

There are two types of gene duplication: direct duplication of genomic DNA and retropositional events. Processed pseudogenes are reverse-transcribed intronless cDNA copies of mRNA that have been reinserted into the genome following splicing.

62
Q

Describe meiosis

A

Meiosis consists of two rounds of chromosome segregation following a single replication. At the onset of replication, the sister chromatids are held together by cohesion. Homologous chromosomes pair during prophase I and engage in recombination: at least one crossover (CO) per pair of homologues is always observed. COs and sister chromatid cohesion ensure that recombined chromosomes are physically linked to form a bivalent (homologous chromosomes associated in pairs) at metaphase I.

At anaphase I, this cohesion is released except at the centromeres: homologous chromosomes segregate to opposite poles, whilst sister chromatids remain together. At metaphase II, destruction of the centromeric cohesion allows the segregation of sister chromatids to opposite poles starting at anaphase II.

63
Q

What is a source of healthy genomic variation in Meiosis?

A

This homologous crossover leads to recombination of chromosomes and shuffling of alleles. This shuffling is an important and necessary source of genetic variation in offspring. Multiple crossover events occur per cell during meiotic division.

64
Q

What is a source for diversifying genomic variation also associated with meiosis?

A

Non-homologous crossover: unequal crossover during meiosis can lead to duplications and deletions. The tetrad is mispaired during meiotic synapsis, which leaves unequal chromosomes: one with a duplication and one with a deletion (e.g. one is missing a C region and the other has two).

65
Q

How can non-homologous crossover come about?

A

This can occur due to atypical shuffling of alleles: islands of similarity such as retrotransposons can lead to joining at the wrong site (repetitive DNA is often involved). Crossover is an active process involving a double-stranded break at this site, which can lead to non-homologous pairing.

66
Q

What can arise from non-homologous crossover?

A

It can be a source of new genetic material: new genes! It can also result in the loss of an essential piece of DNA, leading to a disorder.

67
Q

What is a difficulty in mapping these hotspots for non-homologous crossover?

A

We do not all have the same hotspots: they vary within the population and are also highly variable between species.

68
Q

What can be an issue with this variability in hotspots in the population?

A

With interbreeding, individuals whose hotspots do not match may be unable to produce viable germ cells. This can also affect fertility between partners: two partners from different parts of the world might differ in their hotspots, an issue that may not arise with other partners.