How genes and genomes evolve Flashcards

1
Q

what are several different mechanisms that can alter genes and genomes?

A

Small mutations (mutation types)
Duplication (chromosomal rearrangements)
Exon shuffling
Rearrangements (chromosomal rearrangements)
Transposition of mobile genetic (transposable) elements
Horizontal transfer

2
Q

How are most mutations categorized?

A

Most mutations are categorized into 3 classes
1. Point mutations: small-scale mutations
2. (chromosomal) rearrangements: large-scale mutations
3. Mobile genetic element (transposable element) - induced mutations

3
Q

copy number variation CNV

A

large chunks of DNA around 10,000-5,000,000 bases long are inserted, repeated, or lost

4
Q

how is genetic variation generated?

A
  1. In sexually reproducing organisms, only changes to the germ line are passed on to progeny
  2. Point mutations are caused by failures of the normal mechanisms for copying and repairing DNA
  3. Mutations can also change the regulation of a gene
  4. DNA duplications give rise to families of related genes
  5. Duplication and divergence produced the globin gene family
  6. Whole-genome duplications have shaped the evolutionary history of many species
  7. Novel genes can be created by exon shuffling
  8. The evolution of genomes has been profoundly influenced by mobile genetic elements
  9. Genes can be exchanged between organisms by horizontal gene transfer
5
Q

what are zygotes and how do they form?

A

Gametes (eggs and sperm) contain only half as many chromosomes as the other cells in the body
When two gametes come together during fertilization, they form a fertilized egg (aka zygote)
–> the zygote gives rise to BOTH germ-line cells and to somatic cells

6
Q

In sexually reproducing organisms, only changes to the ___ line are passed on to PROGENY (offspring)

A

germ (eggs and sperm)

A mutation that arises in a somatic cell affects only the progeny of that particular cell and will NOT be passed on to the organism’s offspring
- Somatic mutations are responsible for most human cancers

7
Q

how are carbohydrates classified?

A

Monosaccharides
Disaccharides (2 monosaccharides)
Polysaccharides: compounds of many monosaccharides

8
Q

what are 3 examples of monosaccharides?

A

Glucose
Fructose
Galactose

9
Q

what are 3 examples of disaccharides and what are they composed of?

A

Maltose = glucose + glucose
Lactose = glucose + galactose
Sucrose = glucose + fructose

10
Q

what are 3 examples of polysaccharides?

A

Starch
Glycogen
Fiber

11
Q

How do point mutations in regulatory DNA sequences of lactase gene affect our ability to digest lactose?

A
  1. Our earliest ancestors were lactose intolerant:
    - Lactase is made only during infancy
    - Adults (no longer exposed to breast milk) do NOT need lactase
    –> After around 5 years of age, most people (around 75% of the world population) stop producing the lactase enzyme
    - Lactase gene (LCT) encodes lactase enzyme
  2. Around 10,000 years ago, humans began to get milk from cattle
    –> Point mutations in the regulatory DNA sequence of the lactase gene keep lactase expressed into adulthood → adults can digest milk
    (aka people who HAVE one of these point mutations in the regulatory DNA of the lactase gene CAN digest milk as adults)
12
Q

what are 2 point mutations in the regulatory DNA of the lactase gene LCT that allow adults to digest milk?

A

C → T point mutation: the 1st identified variant associated with lactase persistence
G → C point mutation: northern Europe and central Africa

13
Q

what are regulatory DNA sequences and give 2 examples of them

A

Regulatory DNA sequences: regions of the genome that control the expression of genes (aka they determine when, where, and how much of a gene product, typically a protein, is produced)
- don’t encode proteins themselves but contain instructions for turning genes on or off or modulating the level of gene expression
- are crucial for ensuring that each gene is expressed at the right time, in the right cell type, and in the right amounts

Examples: promoters and enhancers

14
Q

promoters

A

Promoter: a DNA sequence (the binding site) near the transcription start site of a gene at which RNA polymerase binds to start transcription
Function: to initiate the process of transcription

TATA box: a type of promoter sequence that indicates to other molecules where transcription begins
- Non-coding DNA sequence
- Is named for its conserved, T- and A-rich DNA sequence: TATAAT in the bacterial -10 element, and most commonly TATAAA in eukaryotes

In E. coli: the recurring sequence TATAAT is centered on position -10
- In TATA-containing genes, the TATA box positions the transcription machinery, so transcription starts a short distance downstream of it
In eukaryotes: the TATA box is the most commonly recognized cis-acting element for genes transcribed by RNA polymerase II, on the basis of its consensus sequence
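
A minimal sketch of how a conserved promoter element might be located in a DNA sequence, assuming a plain exact-match search for the bacterial -10 consensus TATAAT mentioned on this card. The function name `find_motif` and the example sequence are made up for illustration; real promoters vary around the consensus, so exact matching is a simplification.

```python
def find_motif(sequence: str, motif: str = "TATAAT") -> list[int]:
    """Return the 0-based start positions of every exact occurrence of motif."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping matches are also reported.
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical sequence containing one TATAAT element (illustration only).
example = "GGCTTGACATTTATGCTTCCGGCTCGTATAATGTGTGGA"
print(find_motif(example))
```

A more realistic search would score each window against a position weight matrix instead of requiring a perfect match.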

15
Q

what is the most common form of gene control during development and in different cell types?

A

regulation of transcription (ie. promoters and enhancers)

16
Q

enhancers

A

enhancers: cis-acting elements that have no promoter activity but can stimulate the effectiveness of promoters even when located thousands of nucleotides from the start site of transcription
- do not need to be close to the gene
- can be located upstream or downstream or even in the middle of a transcribed gene it regulates

  • When bound by transcription factors, they enhance the transcription of an associated gene (stimulate transcription above basal levels)
  • Enhancers operate in conjunction with specific enhancer-binding proteins
17
Q

horizontal gene transfer

A

the process by which genes are transferred BETWEEN organisms, often across DIFFERENT species, rather than being passed down from parent to offspring (aka vertical gene transfer)
- So far we have considered genetic changes that take place WITHIN the genome of an individual organism
–> however, genes and other portions of genomes can be exchanged BETWEEN individuals of DIFFERENT species

18
Q

horizontal gene transfer difference between eukaryotes and bacteria?

A

horizontal gene transfer is rare among eukaryotes but common among bacteria

Why?
- The cellular complexity of eukaryotes, with their nuclear membrane and tightly regulated gene expression, makes horizontal gene transfer (HGT) less likely
- Eukaryotic cells have robust immune and repair mechanisms to detect and remove foreign DNA

19
Q

gene family

A

if several genes are STRUCTURALLY or FUNCTIONALLY ANALOGOUS, they collectively form a gene family

Gene family members should be designated by Arabic numbers placed immediately AFTER the gene stem symbol without any space

20
Q

what is the level of organization for family, subfamily, and superfamily?

A

Superfamily: a broader grouping of genes
Subfamily: a narrower grouping of genes
Superfamily > family > subfamily

21
Q

pseudogenes and their characteristics

A

pseudogenes: duplicated DNA sequences in the alpha-globin and beta-globin gene clusters that are NOT functional genes and do NOT produce a functional protein
- Generally untranscribed and untranslated
- Have a high level of homology to a functioning gene: their DNA sequences are similar to the functional globin genes
- This kind of gene duplication and divergence occurs in many other gene families in the human genome

aka a DNA sequence that closely resembles that of a functional gene but contains numerous mutations that prevent its proper expression –> most pseudogenes arise from the duplication of a functional gene, followed by the accumulation of damaging mutations in one copy

  • Named with the suffix “P” (or “PS” in specific cases)
    e.g., OR2W5P ⇒ olfactory receptor family 2 subfamily W member 5 pseudogene
22
Q

exon shuffling

A

Exon shuffling: a process where exons from one gene are added to another gene
–> Leads to a new exon-intron structure → drives the evolution of new genes

Novel genes can be created by exon shuffling

23
Q

How are duplications made from crossovers?

A

when a crossover occurs unequally, one chromosome may end up with an extra copy of a gene while the other chromosome has a corresponding deletion

23
Q

It has been proposed that nearly all the proteins encoded by the human genome (around 19,000) arose from the ______ and _____ of a few thousand distinct exons

A

duplication & exon shuffling

This generates diversity of protein structures

23
Q

what are crossovers and how do they occur?

A

crossovers occur when corresponding regions of homologous chromosomes align and swap DNA segments
- each crossover involves double-strand breaks in the DNA, which are then repaired by joining corresponding pieces from each chromosome
- for crossovers to occur, the DNA sequences involved must be highly similar or nearly identical

–> the result is a pair of hybrid chromosomes that each contain segments from the other homolog
- the chromosomes still retain the SAME ORDER of GENES they had initially

23
Q

what are unequal crossovers and what do they result in?

A

unequal crossovers occur when a crossover takes place between a pair of identical or very similar short DNA sequences that fall on either side of a gene, BUT the short sequences are not aligned properly during recombination

results in…
- 1 long chromosome that has an EXTRA copy of the gene (aka gene duplication)
- 1 shorter chromosome with NO copy of the gene –> chr will eventually be lost
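
The outcome described on this card can be sketched with a toy model, assuming each chromosome is just a list of labelled segments and a crossover swaps everything to the right of a break point. The function `crossover`, the segment labels, and the break positions are illustrative assumptions, not a model of the real recombination machinery.

```python
def crossover(chrom_a, chrom_b, break_a, break_b):
    """Swap the segments to the right of each break point.

    break_a and break_b are list indices; equal values model a properly
    aligned crossover, different values model a misaligned (unequal) one.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Toy homolog: one gene flanked by two nearly identical short repeats.
homolog = ["left", "repeat", "GENE", "repeat", "right"]

# Aligned crossover: both products keep the original gene order.
equal = crossover(homolog, homolog, 2, 2)

# Misaligned crossover: the second repeat of one homolog pairs with the
# first repeat of the other.
unequal = crossover(homolog, homolog, 4, 2)

print(equal)
print(unequal)
```

In the misaligned case one product carries the gene twice and the other carries no copy, which matches the long and short chromosomes listed above.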

24
Q

gene duplications via crossovers between homologous chromosomes characteristics?

A
  • Many gene duplications can be generated by homologous recombination
  • Homologous recombination can catalyze CROSSOVERS in which 2 chromosomes are broken and joined up to produce hybrid chromosomes
  • Crossovers take place only between regions of chromosomes that have NEARLY IDENTICAL DNA sequences (usually occur between homologous chromosomes) and generate hybrid chromosomes in which the ORDER OF GENES is EXACTLY the same as on the original chromosomes
25
Q

Give a real-life example of a disorder caused by unequal crossovers and describe it

A

red-green color blindness (aka Daltonism): example of chromosomal duplication

The OPN1LW (red) and OPN1MW (yellow, green) genes are located on the X chromosome
- Both genes have very similar DNA sequences and are closely located
→ Because both genes are very similar in their sequences, UNEQUAL CROSSING over between the two genes can result in different combinations of the genes (ie. duplication) or even hybrid genes
- OPN1LW is thought to have undergone a DUPLICATION event that leads to an extra copy of the gene, which then evolved independently to become OPN1MW

26
Q

globin gene family

A

globin gene family: a group of related genes that encode globin proteins

27
Q

globin protein function and examples

A

globin proteins: specialized proteins for binding and transporting oxygen in the blood and other tissues
- essential for cellular respiration, as they allow oxygen to be carried efficiently from the lungs to tissues and cells
- e.g., hemoglobin and myoglobin

28
Q

globin gene superfamily in vertebrates

A

a superfamily of heme-containing globular proteins

29
Q

what is the simplest globin protein? (amino acid length and found in what organisms)

A

around 150 amino acids and found in marine worms, insects, and primitive fish

30
Q

globin proteins in vertebrates structure?

A

4 globin chains of 2 types (alpha-globin and beta-globin): α2β2, the hemoglobin structure, in which 4 globin chains are arranged as a tetramer (alpha2beta2)

alpha-globin and beta-globin are the result of a gene DUPLICATION

31
Q

how did the globin gene family originate and describe it

A

Duplication and divergence produced the globin gene family –> exemplifies how gene duplication and divergence can drive evolution

  1. The unmistakable similarities in amino acid sequence and structure among present-day globin proteins indicate that ALL the globin genes must derive from a SINGLE ancestral gene
  2. multiple rounds of gene duplication occurred and thus each duplicated globin gene diverged in function –> giving rise to a variety of globin proteins (ie. alpha globin and beta globin)
  3. this divergence enabled the specialization of globin proteins
32
Q

how does exon shuffling affect proteins?

A

Exon shuffling during evolution can generate proteins with new combinations of protein domains
- These different domains were joined together by EXON SHUFFLING during evolution to create modern-day human proteins

33
Q

describe the evolution of hemoglobin in vertebrates

A
  1. Single-Chain Globin: Early globin proteins were simple, single-chain proteins bound to a heme group (similar in structure to myoglobin)
    - Single-chain globins are common in simpler organisms: ie. marine worms and some primitive fish
  2. Gene Duplication and Mutation: Over time, gene DUPLICATION occurred, creating multiple copies of the globin gene
    –> Mutations in these duplicated genes allowed each copy to DIVERGE in structure and function
    - Result: 2 different types of globin chains (alpha-globin and beta-globin) eventually co-evolved to form a cooperative structure
  3. Formation of Tetrameric Hemoglobin: The combination of two alpha (α) and two beta (β) globin chains enabled the formation of the tetrameric structure of hemoglobin
    - This tetrameric configuration (α2β2) introduced cooperative binding, a feature that allows hemoglobin to efficiently load and unload oxygen depending on the partial pressure of oxygen (pO₂) in different tissues

Mammalian hemoglobin molecule is a complex of 2 alpha-globin and 2 beta-globin chains –> alpha2beta2
Each chain contains a tightly bound heme group that contains a central iron ion and the heme group is responsible for binding oxygen

34
Q

describe “reconstructing life’s family tree” (aka how mutations and evolution work)

A
  1. Beneficial: on rare occasions, the mutation might cause a change for the better
    - These mutations will tend to be perpetuated since the organism that inherits these mutations will have an increased likelihood of reproducing itself
    - beneficial mutation –> gives selective advantage –> preserved by natural selection and is passed down
  2. Neutral: Mutations that are selectively neutral may or may not persist depending on factors: population size, whether the individual carrying the neutral mutation harbors a favorable mutation located nearby (aka hitchhiking)
  3. Harmful: Deleterious alterations in a gene that codes for an essential protein or RNA (e.g., DNA and RNA polymerases) CANNOT be accommodated so easily
    deleterious alterations are harmful –> typically eliminated from the population through natural selection
  4. A segment of DNA that does NOT code for protein or RNA and has no significant regulatory role is free to change at a rate limited only by the random mutation frequency
35
Q

difference in mutation rates in non-coding DNA vs highly conserved genes

A

Non-coding DNA: Segments of DNA that neither code for proteins nor have regulatory functions are free to accumulate mutations with few consequences
These regions evolve faster since mutations are not strongly selected for or against, allowing changes to accumulate at the natural rate of random mutation (aka random mutation frequency)

Highly Conserved Genes: For essential genes, mutations that disrupt function are usually deleterious and are removed by selection, so these regions accumulate changes more slowly over time. Since any significant alterations are typically eliminated, the sequences of essential genes remain highly similar (conserved) across diverse species, indicating their crucial roles.

36
Q

what are essential genes and their conservation?

A

Genes that code for an essential protein or RNA (e.g., DNA and RNA polymerases)
These essential genes are highly conserved: the products they encode (RNA or protein) are very similar from organism to organism

deleterious alterations to essential genes cannot be accommodated so easily –> the faulty organisms will almost always be eliminated or fail to reproduce so that these harmful mutations are lost

37
Q

mobile genetic elements

A

DNA sequences that can move locations from one chromosome to another within a genome
- Classified according to the mechanisms by which they move (transpose)
- Each mobile genetic element typically encodes a transposase enzyme that mediates its movement
- they encode the components they need for movement

38
Q

what are transposons and viruses and what is the difference between them?

A

transposons: DNA sequences that can move from one chr to another but stay WITHIN an organism’s genome
viruses: can move BETWEEN cells and organisms, carrying genetic material and sometimes altering host DNA.

39
Q

the human genome contains how many major families of transposable sequences?

A

2
LINEs and SINEs (e.g., L1 elements and Alu sequences)

40
Q

retroviruses

A

a unique group of viruses characterized by their ability to reverse the normal flow of genetic information to integrate their genetic material into the host cell’s genome
- retroviruses contain RNA as their genetic material and when they infect a host cell, they convert this RNA into DNA (mediated by reverse transcriptase enzyme)

41
Q

what are other names for mobile genetic elements?

A

transposons
transposable elements
jumping genes

42
Q

how has the evolution of genomes been influenced by mobile genetic elements?

A

The insertion of a mobile genetic element (MGE) into the coding sequence of a gene or into its regulatory DNA sequence can cause spontaneous mutations

MGE can severely disrupt a gene’s activity if they land directly within its coding sequence
–> Such insertion mutation destroys the gene’s capacity to encode a useful protein

The insertion of an MGE into a regulatory DNA sequence will often affect where or when genes are expressed

43
Q

give 2 examples of mobile genetic elements causing insertion mutations

A
  1. Hemophilia in humans: example of a mutation directly within the gene’s coding sequence
    - hemophilia is mainly caused by chromosomal inversions but can also be caused by mobile genetic element-induced mutations
  2. A fruit fly grows legs in place of its antennae: example of a mutation in a regulatory DNA sequence –> affected where the genes were expressed in the fly
44
Q

what is the exception to chromosomal inversions being relatively rare events and it being unlikely that multiple patients with the same inversion are found?

A

exception: if the inversion breakpoint falls within or near a gene that has PREVIOUSLY been associated with the disorder through other types of mutations
ie. hemophilia A: example of recurrent inversion mutations in the coagulation factor VIII gene located on the X chr –> causes this disease

45
Q

in bacteria, what are the most common mobile genetic elements and by what 2 types of mechanisms do they move?

A

In bacteria: the most common mobile genetic elements are the DNA-only transposons (move as DNA only)
→ move by 2 types of mechanisms (both are facilitated by transposase)
1. Cut and paste transposition
2. Replicative transposition

46
Q

in humans, what are the most common mobile genetic elements?

A

most mobile genetic elements move NOT as DNA but via an RNA INTERMEDIATE –> retrotransposons

47
Q

there are ___ kinds of tRNA for ___ codons, coding for ___ amino acids

A

31, 61, 20
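
Two of the three numbers on this card (61 and 20) can be checked by counting over the standard genetic code. The sketch below assumes the widely used compact encoding of the standard code, with bases ordered T, C, A, G and `*` marking a stop codon; the variable names are illustrative. The tRNA count of 31 is not derived here, since it depends on wobble pairing and varies between organisms.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 64 codons with its amino acid (or stop signal).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE), len(sense_codons), len(amino_acids))
```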

48
Q

retrotransposons

A

mobile genetic elements are called retrotransposons because at one stage in their transposition their genetic information flows BACKWARDS (RNA → DNA)
Retrotransposons are UNIQUE to eukaryotes (prokaryotes do not have retrotransposons)

49
Q

what are the two major families of retrotransposons in the human genome and how much of the genome do they constitute?

A

L1 element: also referred to as LINE-1; a long interspersed nuclear element that can move on its own (autonomous) since it encodes its own reverse transcriptase and endonuclease
Constitutes about 15% of the human genome

Alu sequence: a type of SINE (short interspersed nuclear element) that does NOT encode its own reverse transcriptase; instead, it depends on enzymes already present in the cell to move
Constitutes about 10% of the human genome

LINEs: long interspersed nuclear element
SINEs: short interspersed nuclear element

50
Q

describe bacterial cut and paste transposition

A

the transposon physically leaves its original location and inserts itself into a new position within the genome

  1. the transposase enzyme recognizes short, specific sequences at each end of the transposon and excises it out from its original site
  2. transposase creates a break at the target site where the transposon will be inserted
  3. the transposon is ligated/attached into the new DNA site
51
Q

describe bacterial replicative transposition

A

the transposon is copied, and the new copy is inserted into a different site in the genome
–> the original copy remains in place, resulting in two copies of the transposon after the transposition event

  1. Transposase binds to the transposon and facilitates the creation of a replication fork, where the DNA at the transposon is replicated
  2. the original transposon remains in place while a second copy is synthesized and integrated into the target site
  3. The cell’s machinery then separates the two DNA molecules, leaving one copy of the transposon in the original site and another at the new target site
52
Q

difference between cut and paste transposition and replicative transposition?

A

Cut-and-Paste Transposition:
- SINGLE copy of the transposon MOVES from one site to another.
- The number of transposon copies in the genome remains CONSTANT

Replicative Transposition:
- A NEW copy of the transposon is generated –> INCREASING the total number of transposon copies within the genome
–> can lead to a rapid increase in the number of transposons within a bacterial population, potentially spreading traits such as antibiotic resistance
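
The copy-number difference can be illustrated with a toy model, assuming the genome is a list of labelled segments and `"Tn"` stands for the transposon. The function names and index arguments are hypothetical, and the sketch ignores the actual DNA chemistry carried out by transposase.

```python
def cut_and_paste(genome, source, target):
    """Remove the element at index source and reinsert it at index target
    (target is interpreted after the element has been removed)."""
    genome = list(genome)          # work on a copy
    element = genome.pop(source)
    genome.insert(target, element)
    return genome


def replicative(genome, source, target):
    """Leave the element at index source in place and insert a copy at target."""
    genome = list(genome)          # work on a copy
    genome.insert(target, genome[source])
    return genome


genome = ["a", "Tn", "b", "c", "d"]
moved = cut_and_paste(genome, source=1, target=3)
copied = replicative(genome, source=1, target=4)

print(moved, moved.count("Tn"))
print(copied, copied.count("Tn"))
```

The cut-and-paste result still holds one copy of `"Tn"`, while the replicative result holds two.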

53
Q

describe the mechanism of retrotransposition

A
  1. Transposable elements are first transcribed into an RNA intermediate (DNA –> RNA)
  2. Reverse transcriptase enzyme converts RNA back into dsDNA copy
  3. This newly synthesized DNA copy of the retrotransposon is inserted into the target DNA
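
The three steps above can be sketched as string operations, under strong simplifying assumptions: sequences are written as coding-strand text, transcription only swaps T for U, and reverse transcription swaps it back. The function names and the example sequences are made up for illustration.

```python
def transcribe(dna: str) -> str:
    """Step 1: DNA to RNA intermediate (coding-strand view, T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Step 2: RNA back to a DNA copy (U becomes T)."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, element: str, target: int) -> str:
    """Step 3: insert a new DNA copy of element at index target.

    The original element stays where it was, so the copy number goes up.
    """
    if element not in genome:
        raise ValueError("element is not present in the genome")
    dna_copy = reverse_transcribe(transcribe(element))
    return genome[:target] + dna_copy + genome[target:]


# Hypothetical genome with one copy of a toy element.
element = "GATTACA"
genome = "CCCC" + element + "GGGG" + "AAAA"
new_genome = retrotranspose(genome, element, target=15)
print(new_genome, new_genome.count(element))
```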
54
Q

difference between LINEs and SINEs

A

LINEs: Long Interspersed Nuclear Elements
- AUTONOMOUS retrotransposons: long retrotransposons that contain the machinery needed to move INDEPENDENTLY within the genome –> can copy and insert themselves elsewhere in the genome without assistance from other elements
- contain 2 open reading frames that encode the enzymes reverse transcriptase and endonuclease
- typically 4,000-7,000 base pairs long
ie. L1 element

SINEs: Short Interspersed Nuclear Elements
- NONAUTONOMOUS retrotransposons: short retrotransposons that RELY on other elements (such as LINEs) to move within the genome –> do not encode any proteins and lack the machinery needed for their own movement
- typically 100-300 base pairs long
ie. Alu sequences

55
Q

most viruses that cause human disease have genomes made of either __ DNA or __ RNA

A

double-stranded DNA
single-stranded RNA

56
Q

what are 2 types of bacterial DNA-only transposons and what are their characteristics?

A
  1. IS3: a type of Insertion Sequence element
    - IS: the simplest forms of DNA-only bacterial transposons (consists primarily of the transposase gene flanked by short inverted repeat sequences)
    - IS3: an example of an IS element that moves via cut-and-paste mechanism
  2. Tn3: an example of a more complex transposon
    - complex transposons include additional genes besides the transposase
    –> often contain genes that confer ANTIBIOTIC RESISTANCE, which makes them especially important in the spread of drug-resistant bacteria
    - Tn3 contains both a transposase gene AND an ampicillin resistance (AmpR) gene
    - strictly, Tn3 is usually classed as a non-composite (unit) transposon; composite transposons (e.g., Tn5, Tn10) are resistance genes flanked by two IS elements
    - Tn3 can move via replicative transposition
57
Q

transposase enzyme function

A

recognizes specific DNA sequences, usually found at the ends of the transposon, and catalyzes the cutting and rejoining of DNA
–> allows the transposon to insert itself into a new location

58
Q

How do transposons affect antibiotics?

A

Some transposons carry genes that encode enzymes that INACTIVATE ANTIBIOTICS (ie. ampicillin and AmpR)

aka some transposons (ie. Tn3) can carry antibiotic resistance genes that enable bacteria to survive exposure to specific antibiotics
–> when these transposons insert into plasmids, they can spread the antibiotic resistance genes rapidly through the bacterial population via HORIZONTAL GENE TRANSFER
–> this spread of resistance complicates treatment strategies

59
Q

About how many nucleotide pairs are there in the human genome? what is the number of protein-coding genes and number of non-protein-coding genes?

A

about 3.2 × 10^9 (roughly 3 billion) nucleotide pairs
protein-coding genes: approx 19,000
non-protein-coding genes: approx 5,000

60
Q

facts about examining the human genome

A
  1. The nucleotide sequences of human genomes show how our genes are arranged
  2. Differences in gene regulation may help explain how animals with similar genomes can be so different
  3. The genome of the extinct Neanderthals reveals a lot about what makes us human
  4. Genome variation contributes to our individuality
61
Q

If each nucleotide pair is drawn to span 1 mm, the human genome would extend ___ km. At this scale, there would be on average a protein-coding gene every __ meters and an average gene would be ___ meters long. However, the exons in the gene would add up to only just over __ meter.

A

If each nucleotide pair is drawn to span 1 mm, the human genome would extend 3200 km (approximately 2000 miles), which is far enough to stretch across central Africa, where humans first arose

At this scale…
- On average, there would be a protein-coding gene every 150 m
- An average gene would extend for about 30 m, but the coding sequences (exons) in this gene would add up to only just over a meter (the rest would be introns)
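
The scale arithmetic can be checked in a few lines, assuming the deck's own figures of about 3.2 × 10^9 nucleotide pairs and about 19,000 protein-coding genes. The variable names are illustrative. With those inputs the average spacing comes out nearer 170 m than 150 m; the 150 m figure would correspond to a gene count of roughly 21,000, so the two numbers appear to rest on slightly different gene estimates.

```python
GENOME_BP = 3.2e9              # nucleotide pairs, from the deck
PROTEIN_CODING_GENES = 19_000  # approximate gene count, from the deck
MM_PER_BP = 1.0                # drawing scale: 1 mm per nucleotide pair

genome_mm = GENOME_BP * MM_PER_BP
genome_km = genome_mm / 1e6    # 1 km = 1,000,000 mm
genome_m = genome_mm / 1e3     # 1 m = 1,000 mm

# Average distance between the starts of neighbouring protein-coding genes.
spacing_m = genome_m / PROTEIN_CODING_GENES

print(f"genome length: {genome_km:.0f} km")
print(f"one protein-coding gene every {spacing_m:.0f} m on average")
```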

62
Q

how is the human genome categorized and what are the elements within each one?

A
  1. repeated sequences:
    mobile genetic elements
    - LINEs
    - SINEs
    - retrotransposons
    - DNA-only transposons
    simple repeats
    segment duplications
  2. unique sequences:
    introns
    protein-coding exons
    nonrepetitive DNA that is neither introns nor exons

The bulk of the human genome is made of repetitive nucleotide sequences and other noncoding DNA
–> about 1/2 of our genome consists of repeated sequences

63
Q

what are simple repeats?

A

short nucleotide sequences (less than 14 nucleotide pairs) that are repeated again and again for long stretches

64
Q

what are segment duplications

A

large blocks of the genome (1000–200,000 nucleotide pairs) that are present at 2 or more locations in the genome

65
Q

what are non-repetitive DNA sequences that are neither introns nor exons and what do they encompass?

A

often referred to as intergenic regions (they neither code for proteins nor form part of introns, but still serve various roles in gene regulation and genome organization)

Regulatory DNA sequences
Sequences that code for functional RNA
Sequences whose functions are unknown

66
Q

noncoding RNAs

A

A functional RNA molecule that is transcribed from DNA but NOT translated into proteins
–> instead they play various regulatory, structural, and catalytic roles in the cell and are crucial for cellular function and gene expression control, influencing processes from protein synthesis to gene silencing

67
Q

what are abundant and functionally important types of non-coding RNAs?

A

tRNAs
rRNAs
small RNA (ie. microRNAs, siRNAs, etc…)

68
Q

Question: Looking at a given piece of raw DNA sequence, how can we tell which parts represent protein-coding segments?
(aka how to distinguish the rare coding seqs from the more abundant noncoding seqs in a genome?)

A

Answer: In bacteria and simple eukaryotes (ie. yeasts), look for open reading frames (ORFs) using a computer program

69
Q

open reading frame

A

ORF: the part of the reading frame that when translated into amino acids contains NO STOP CODONS
(the transcription termination site is located after the ORF)
- “Open”: open to keep reading → the ribosome will be able to keep reading the RNA code and add another amino acid one after another

  • aka an ORF is the length of DNA or RNA which is transcribed into RNA through which the ribosome can travel, adding 1 amino acid after another before it runs into a codon that doesn’t code for any amino acid
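
A minimal sketch of the kind of ORF scan a computer program might perform. It rests on assumptions beyond the card: only the forward strand is scanned, an ORF must begin with ATG (a common convention, stricter than the "no stop codons" definition above), and the name `find_orfs` and the `min_codons` threshold are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) pairs for ATG...stop ORFs on the forward strand.

    start is the index of the A in ATG; end is exclusive and includes the
    stop codon. min_codons counts codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk this reading frame one complete codon at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence with a single short ORF (illustration only).
example = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

A real gene finder would also scan the reverse complement and use a much larger length threshold.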
70
Q

what are some reasons as to why identifying true open reading frames is difficult in animals and plants?

A
  1. Only ~1.5% of the human genome sequence consists of protein-coding exons
  2. In animals and plants, identifying true ORFs is difficult due to large introns
  • An exon can contain as few as 50 codons, which is too short to generate a statistically significant ORF signal since it is not that unusual for 50 random codons to lack a stop signal
  • Introns are so long that they are likely to contain, by chance, a bit of “ORF noise” (sequences lacking stop signals)
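
The claim about 50 random codons can be made concrete with a rough estimate, assuming every one of the 64 codons is equally likely and codons are independent, which real genomes only approximate. The function name `prob_no_stop` is illustrative.

```python
def prob_no_stop(n_codons: int, stop_codons: int = 3, total_codons: int = 64) -> float:
    """Probability that n_codons random codons contain no stop codon,
    assuming all codons are equally likely and independent."""
    sense_fraction = (total_codons - stop_codons) / total_codons
    return sense_fraction ** n_codons


for n in (50, 100, 300):
    print(n, prob_no_stop(n))
```

Under these assumptions a stop-free run of 50 codons turns up by chance about 9% of the time, whereas a run of 300 codons is very unlikely, which is why long ORFs are informative and exon-sized ones are not.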
71
Q

describe the steps of RNA sequencing and what it is used for

A

RNA sequencing can be used to characterize protein-coding genes

Steps:
  1. Isolate RNAs from the cells or tissue
  2. Determine RNA nucleotide sequences using RNA sequencing
    - the isolated RNA is converted to complementary DNA (cDNA) using reverse transcriptase to allow the RNA sequences to be sequenced as DNA
    - the cDNA is then fragmented and sequenced –> the short cDNA fragments that are read (aka sequencing reads) provide a snapshot of the transcriptome
  3. Map the RNA sequences back to the genome to locate their genes
    - The sequencing reads are mapped back to the reference genome. The positions where the reads align to the genome indicate the location and structure of expressed genes
    - exon segments are more highly represented among the sequenced transcripts (aka show higher read coverage) compared to introns which tend to be spliced out and destroyed

–> By quantifying the number of reads that map to each gene, researchers can estimate the expression levels of protein-coding genes. Highly expressed genes will have a greater number of reads, reflecting their abundance in the sample.
RNA-seq also allows for the identification of alternative splicing events by showing which exons are included or skipped in different transcripts.
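
The quantification step can be sketched in a few lines of Python. This is a toy illustration, not a real RNA-seq pipeline: the gene names, coordinates, and function names are hypothetical, each read is reduced to a single mapped position, and the per-kilobase normalization is an added assumption so that long genes are not favored.

```python
from collections import Counter

def count_reads_per_gene(read_positions, genes):
    """Count reads whose mapped position falls inside each gene interval.

    read_positions: iterable of genome coordinates where reads aligned
    genes: dict of gene name -> (start, end), with end exclusive
    """
    counts = Counter({name: 0 for name in genes})
    for pos in read_positions:
        for name, (start, end) in genes.items():
            if start <= pos < end:
                counts[name] += 1
    return counts

def reads_per_kilobase(counts, genes):
    """Divide each count by gene length in kilobases."""
    return {name: counts[name] / ((end - start) / 1000)
            for name, (start, end) in genes.items()}

# Hypothetical genes and read positions; the read at 5000 maps to neither gene
genes = {"geneA": (100, 1100), "geneB": (2000, 4000)}
reads = [150, 300, 900, 1050, 2500, 5000]
counts = count_reads_per_gene(reads, genes)
print(dict(counts))
print(reads_per_kilobase(counts, genes))
```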

72
Q

how do mobile genetic elements cause exon shuffling?

A

In a typical transposition event, the transposase enzyme recognizes specific DNA sequences at the ends of a SINGLE mobile genetic element
However, in some cases, the transposase recognizes the ends of TWO DIFFERENT mobile elements located on either side of an exon, rather than just the ends of one element
–> this can lead to the movement of an exon

73
Q

describe the steps of mobile genetic elements causing exon shuffling

A
  1. Recognition of Separate Ends:
    The transposase mistakenly identifies the two ends of different mobile genetic elements as a complete unit
    –> These two elements may be situated on either side of an exon in the gene’s DNA sequence
  2. Excising a Gene Segment:
    When the transposase cuts between the two recognized ends, it removes the DNA segment in between, which can include an exon or multiple exons from the host gene
    –> The excised segment (exon plus mobile element sequences) is now free to move within the genome
  3. Insertion into a New Location:
    The mobile genetic elements, along with the attached exon, can insert into a new location within the genome, potentially near or within another gene
    –> This process introduces the exon from the original gene into a different gene, and if integrated properly, the new gene might adopt this exon as part of its sequence, allowing it to take on new functions or altered expression
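
The three steps above can be pictured with a toy model in which the genome is just a Python list of labeled segments. Everything here (segment labels, the function name `shuffle_exon`, the `target_index` parameter) is hypothetical and only illustrates the cut-and-reinsert logic, not the biochemistry.

```python
def shuffle_exon(genome, left_te, right_te, target_index):
    """Toy model: cut out everything from left_te through right_te and
    reinsert it at target_index within what remains of the genome."""
    i, j = genome.index(left_te), genome.index(right_te)
    if i > j:
        raise ValueError("left_te must come before right_te")
    excised = genome[i:j + 1]                # mobile element + exon(s) + mobile element
    remaining = genome[:i] + genome[j + 1:]  # donor gene loses the excised segment
    return remaining[:target_index] + excised + remaining[target_index:]

genome = ["geneA_exon1", "TE1", "geneA_exon2", "TE2", "geneA_exon3",
          "geneB_exon1", "geneB_exon2"]
# Move geneA_exon2 (flanked by TE1 and TE2) in between the two exons of geneB
print(shuffle_exon(genome, "TE1", "TE2", 3))
```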
74
Q

what are some effects of exon shuffling?

A

new protein functions
modular proteins (combined functions from different proteins)
evolution of gene families (ie. globin gene superfamily)

75
Q

what is DNA sequencing depth?

A

The total number of reads generated in a sequencing run (in this context, an RNA sequencing run)
High sequencing depth increases the accuracy and sensitivity of detecting low-abundance transcripts and improves coverage across genes –> revealing more details about exon usage and alternative splicing

76
Q

how do you interpret the sequencing reads generated?

A

The HEIGHT of each trace is proportional to how OFTEN each sequence appears among the reads

77
Q

how are exon and intron sequences shown in the sequencing read and at what levels?

A

Exon sequences are present at high levels (reflecting their presence in mature β-actin mRNAs)

Intron sequences are present at low levels (most likely reflecting their presence in pre-mRNA molecules that have not yet been spliced or spliced introns that have not yet been degraded)

78
Q

single nucleotide variant SNV

A

a DNA sequence variation that occurs when a single nucleotide (A, C, T, or G) in the genome sequence is altered
- single nucleotide variant = single base substitution = point mutation (most genome variations are due to SBS)
May be rare or common in a population

79
Q

Single nucleotide polymorphisms SNP and where in the genome do they usually occur?

A

single-nucleotide variants that are present in at least 1% of the population
aka points in the genome that differ by a single nucleotide pair between one portion of the population and another

  • Most but not all SNPs in the human genome occur in regions where they do NOT affect the function of a gene
80
Q

When comparing any two humans, on average, about what is the rate of SNP you will find?

A

When comparing any two humans, on average, about 1 SNP per every 1000 nucleotide pairs
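
As a rough illustration of what "1 per 1000 nucleotide pairs" means, the sketch below lists single-nucleotide differences between two sequences and reports a rate per 1000 positions. It assumes the two sequences are already aligned and of equal length (no insertions or deletions); the function names and the toy sequences are made up.

```python
def single_nucleotide_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between two
    pre-aligned sequences of equal length (gaps are not handled)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def differences_per_1000(seq_a, seq_b):
    """Number of single-nucleotide differences per 1000 compared positions."""
    return 1000 * len(single_nucleotide_differences(seq_a, seq_b)) / len(seq_a)

person_1 = "ACGTACGTAC"
person_2 = "ACGTTCGTAC"
print(single_nucleotide_differences(person_1, person_2))
print(differences_per_1000(person_1, person_2))
```

The toy sequences are far shorter and far more different than real human genomes; a rate near 1 per 1000 only shows up over long stretches of sequence.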

81
Q

How does RNA sequencing provide a more accurate estimate of the number of genes in a genome than DNA sequencing?

A
  1. RNA seq focuses specifically on the transcribed portion of the genome –> allows for the direct identification of expressed genes
  2. provides insights into exon-intron structure by aligning RNA reads to the genome –> reveals alternative splicing (which parts of the genome were spliced together to form mature mRNAs)
  3. can differentiate functional genes from pseudogenes (nonfunctional copies of genes that resemble functional genes at the DNA level but are typically NOT transcribed) by identifying only transcribed sequences –> more accurate count of actively used genes
  4. quantifies gene expression levels –> helps distinguish between genes that are actively used by cell and those that are rarely used
82
Q

When the same region of the genome from two different humans is compared, the nucleotide sequence typically differ by around what percentage?

A

0.1%

With the possible exception of some identical twins, no two people have exactly the same genome sequence

83
Q

1000 Genomes Project: goal, populations

A

goal: find genetic variants with frequencies of at least 1% in the populations studied

6 population labels: 5 continental ancestry groups plus ALL for the combined set (26 subpopulations within the 5 groups)
1. ALL: combined
2. AFR: African
3. AMR: admixed American
4. EAS: East Asian
5. EUR: European
6. SAS: South Asian

84
Q

What were the data, data collection methods, and analysis in the 1000 Genomes Project paper “A global reference for human genetic variation” (Nature 2015)?

A

Data: 2504 individuals from 26 populations, about 500 samples from each of 5 continental ancestry groups
Data Collection: Low-coverage whole-genome sequencing, targeted exome sequencing, dense microarray genotyping
Includes >99% of SNPs with frequency >1% for a diverse set of populations

Analysis: Rare variants are usually more GEOGRAPHICALLY restricted, more likely to be shared WITHIN people of the SAME populations or BETWEEN populations with known recent admixture

85
Q

Exome Sequencing Project (ESP) goal

A

Goal: Discover novel genes/mechanisms that contribute to heart, lung, blood disorders by pioneering the application of NGS (aka next-gen sequencing) of protein coding regions (exons) of the human genome across diverse, richly-phenotyped populations
and share these datasets and findings to extend and enrich the diagnosis, treatment, etc of these disorders

86
Q

ESP 6500 (6503 exomes): goals and populations

A

Publication: Actionable exomic incidental findings in 6503 participants: challenges of variant classification. Genome Res. 2015
- called ESP 6500, but the actual total number of samples is 6,503 exomes

Goal of ESP dataset: release the frequency counts of specific variants without regard to phenotype
Population: samples were selected from US populations only
In general ESP samples were selected to contain:
1. Controls
2. The extremes of specific traits: LDL and blood pressure
3. Lung diseases and specific diseases (ie. early onset myocardial infarction, early onset stroke)

Goal: Investigate the pathogenicity of specific variants
- Estimate frequency by classifying potentially actionable SNVs
- considered 112 gene-disease pairs as related to genetic disorders that may be undiagnosed in adults

87
Q

ExAC: full name, goal, data type, analysis

A

ExAC = Exome Aggregation Consortium; its data set was the 1st release of what became the Genome Aggregation Database (gnomAD) project

Goal: create a public database of “normal” variation and their frequencies that can help us identify less frequent and potentially harmful variations
Data: 60,000+ people
- ONLY EXOME DATA!!
- Performed PCA to distinguish populations and ancestries
Used exome sequencing: has been applied to tens of thousands of patients with rare, severe diseases

Of the high-quality variants:
- 99% are <1% frequency
- 54% are singletons (variants seen only once in the data set)
- 72% are absent from 1000g and ESP

publications: Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016
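
The frequency classes listed above (singleton, <1%) can be sketched as a small classifier. This is an illustrative helper, not ExAC's actual pipeline: the function name, the labels, and the assumption of 2 chromosomes per person at every site are all simplifications, and the allele counts in the example are invented.

```python
def classify_variant(allele_count, n_individuals, ploidy=2):
    """Label a variant by how often its allele was observed.

    allele_count: number of chromosomes carrying the variant allele
    n_individuals: number of people sequenced (ploidy chromosomes each)
    """
    if allele_count < 1:
        raise ValueError("variant must be observed at least once")
    frequency = allele_count / (ploidy * n_individuals)
    if allele_count == 1:
        label = "singleton"      # seen only once in the data set (also <1%)
    elif frequency < 0.01:
        label = "rare (<1%)"
    else:
        label = "common (>=1%)"
    return frequency, label

# Invented allele counts in a data set the size of ExAC (60,706 people)
for count in (1, 500, 5000):
    print(count, classify_variant(count, 60706))
```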

88
Q

compare 1000g, ESP 6500, and ExAC in terms of type of data, populations, publications, and goals of each

A

Goals:
1000g: to find most genetic variants with frequencies of at least 1% in relatively large, major populations studied
ESP (EVS): to discover novel genes and mechanisms contributing to heart, lung, and blood disorders –> to investigate pathogenicity of specific variants
ExAC: to create a database of “normal” variation that tells us which of those changes are seen in healthy people and how common they are (could be a shared goal among the 3 though)

DATA:
1000g: whole genome (2504 human genomes), targeted exome sequencing
ESP6500 (EVS): whole exome (6503 exomes)
ExAC: high-quality exome (60,706 exomes)

Population:
1000g: 6 big populations, 26 subpopulations, 5 continental ancestry groups (diverse geographical ancestries)
ESP (EVS): US populations ONLY! (African Americans, European Americans, richly phenotyped populations)
ExAC: diverse geographical ancestries

Publications:
1000g: Nature 2015 (stopped collecting data for the 3rd phase after May 2, 2013)
ESP (EVS): Genome Res 2015
ExAC: Nature 2016; Nucleic Acids Res 2017

89
Q

anonymous gene families

A

related by sequence similarity, but with NO known homology or function assigned
- Get a “FAM#” symbol (Ex: FAM9A, FAM9B, etc) temporarily until more is known
–> compare with the suffix “P”, which marks pseudogenes
Ie. OR2W5P

90
Q

Gene definition

A

DNA segment that contributes to phenotype and function

91
Q

genomic regions definition

A

chromosome regions that are associated with a certain syndrome or phenotype

92
Q

Locus

A

A point in the genome that can be mapped
- Doesn’t necessarily correspond to a gene

93
Q

Gene symbol rules (nomenclature)

A

gene symbol must be less than or equal to 6 characters
1st letter symbol = 1st letter name
Latin letters and Arabic numbers
No punctuation, no ending with G
No reference to species

Ex: COX8 (Cytochrome c oxidase subunit VIII)
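
The rules on this card are mechanical enough to check in code. The sketch below tests only the rules as listed here (it is not an official nomenclature validator), and it skips the "no reference to species" rule, which cannot be checked from the symbol alone. The function name `symbol_problems` is hypothetical.

```python
import re

def symbol_problems(symbol, name):
    """List which of the card's rules a proposed symbol breaks (empty list = none)."""
    problems = []
    if len(symbol) > 6:
        problems.append("longer than 6 characters")
    if not re.fullmatch(r"[A-Za-z0-9]+", symbol):
        problems.append("contains characters other than Latin letters and Arabic digits")
    if symbol[:1].upper() != name[:1].upper():
        problems.append("first letter does not match the gene name")
    if symbol.upper().endswith("G"):
        problems.append("ends with G")
    return problems

print(symbol_problems("COX8", "Cytochrome c oxidase subunit VIII"))
print(symbol_problems("TP-53G", "tumor protein p53"))
```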

94
Q

gene name rules (nomenclature)

A

Start with lowercase (unless person’s name)
Common abbreviations (ex: ATP) can be used

Ex: TP53 (tumor protein p53)

95
Q

gene family symbol nomenclature

A

Ex: CYP1A1,

96
Q
A