Topic 7: Genomics Flashcards

1
Q

Transposable Elements (TEs) (i.e., transposons, jumping genes)

A

Transposons are mobile genetic elements that can move or transpose themselves within the genome of an organism. McClintock observed unusual patterns of inheritance in maize that she could not explain by traditional Mendelian genetics. She noticed that certain genetic elements seemed to move from one position to another within the genome, disrupting the normal functioning of genes and causing mutations.
- Part of the moderately repetitive sequence class
- Thought to be of viral origin
- Make up roughly 45% of the human genome
- Discovered by Barbara McClintock in Zea mays (maize, i.e., corn), where the mutations they caused produced different kernel colours
- Jump around, and encode the genes that they need to do so (i.e., transposase genes)

2
Q

Transposase (Genes)

A

Transposons typically encode the genes that allow them to move or transpose themselves within a genome. These genes are called transposase genes, and they code for enzymes that catalyze the transposition process.

The transposase enzyme recognizes specific sequences on the ends of the transposon and uses these sequences to cut and paste the transposon to a new location within the genome. Some transposons also contain other genes, such as antibiotic resistance genes, which can spread through a population by transposition.

3
Q

Transposition

A

Transposition is the movement of a transposon. The transposase recognizes the transposable element and then either cuts it out of its original location and pastes it directly into a new location within the genome (cut-and-paste), or a new copy of the element is made and the copy is inserted while the original stays in place (copy-and-paste).
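
The two outcomes of transposition (moving the element versus inserting a new copy of it) can be sketched in Python. This is only a toy model: it assumes the genome is a list of labelled segments, and the function names and segment labels are hypothetical.

```python
def cut_and_paste(genome, te, new_index):
    """Remove the element from its original site and insert it elsewhere."""
    remaining = list(genome)
    remaining.remove(te)              # excision from the original site
    remaining.insert(new_index, te)   # insertion at the new site
    return remaining


def copy_and_paste(genome, te, new_index):
    """Leave the original element in place and insert a new copy elsewhere."""
    result = list(genome)
    result.insert(new_index, te)      # the original copy is untouched
    return result


genome = ["geneA", "TE", "geneB", "geneC"]
moved = cut_and_paste(genome, "TE", 2)
copied = copy_and_paste(genome, "TE", 3)
print(moved)   # still one copy of the element
print(copied)  # now two copies of the element
```

Cut-and-paste keeps the copy number constant, while copy-and-paste increases it by one with every transposition event.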

4
Q

Moderately Repetitive Sequence Class

A

Moderately repetitive sequences are DNA sequences that occur in multiple copies throughout a genome, but not to the extent of highly repetitive sequences, which can occur in thousands or even millions of copies.

The moderately repetitive sequence class includes sequences such as transposable elements, which can occur in hundreds or thousands of copies throughout a genome. Other examples of moderately repetitive sequences include ribosomal RNA genes, which are necessary for protein synthesis and occur in multiple copies in the genome. Satellite DNA, which consists of short repetitive sequences that are tandemly repeated at specific locations in the genome, is usually placed in the highly repetitive class instead.

These moderately repetitive sequences can have functional roles in the genome, such as regulating gene expression or contributing to chromosome structure.

5
Q

Transposable Elements (Direct Repeats)

A

The direct repeats are not part of the transposable element but are generated when the transposon inserts.

They are short DNA sequences that flank the transposable element on both sides and are oriented in the same direction.

During transposition, the transposase recognizes and binds to the ends of the element itself (its terminal inverted repeats), not the direct repeats. When the transposase inserts the element into a new location, it makes a staggered cut in the target DNA; filling in the resulting gaps duplicates a short stretch of target sequence, which becomes the direct repeats on either side of the element.

6
Q

Transposable Elements (TEs): what are the Inverted Repeats that are associated with TEs?

A

Inverted repeats are part of the transposable element, and they direct the transposase. They are short DNA sequences that are repeated at both ends of a transposable element but oriented in opposite directions, such that the sequence at one end is the reverse complement of the sequence at the other end.

During transposition, the transposase enzyme recognizes and binds to the inverted repeats, and uses them as a recognition site to excise the transposable element from its original location in the genome. The transposase then inserts the transposable element into a new location in the genome, creating a short duplication of the target-site sequence (the direct repeats) on either side of the element.

Inverted repeats are important for the transposition process, as they provide the necessary recognition sites for the transposase enzyme to bind and catalyze the excision and insertion of the transposable element. The inverted repeats may also contribute to the stability of the transposable element by helping to protect its ends from degradation or other genetic modifications.

7
Q

Transposition (Direct Repeat Steps - hint: how direct repeats are created)

A
  1. The transposase enzyme recognizes and binds to the inverted repeats at the ends of the transposable element.
  2. The transposase makes staggered breaks in the target DNA where the transposable element is to be inserted, leaving short single-stranded overhangs.
  3. The transposase joins the ends of the transposable element to the overhanging ends of the target DNA, leaving a single-stranded gap on each side of the element.
  4. DNA is replicated at the single-stranded gaps. The gaps are filled in by DNA repair enzymes, using the single-stranded target sequence as the template.
  5. The transposable element is now inserted into the target DNA site, flanked by short direct repeats: two copies of the target sequence that lay between the staggered breaks.
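
The way a staggered cut ends up as a pair of direct repeats can be sketched in Python. This is a simplified, single-strand string model rather than real enzymology; the function name, the sequences, and the 5 bp stagger are all made up for illustration.

```python
def insert_with_target_site_duplication(target, element, cut_pos, stagger):
    """Insert `element` into `target` after a staggered cut.

    The `stagger` bases starting at `cut_pos` lie between the two
    staggered breaks. After gap filling they appear on both sides of
    the element, so they are written out twice here.
    """
    left = target[:cut_pos]
    duplicated = target[cut_pos:cut_pos + stagger]
    right = target[cut_pos + stagger:]
    return left + duplicated + element + duplicated + right


target = "GGGATCGACCC"          # hypothetical target site
element = "CAGGTTAAAAAACCTG"    # hypothetical transposon
inserted = insert_with_target_site_duplication(target, element, 3, 5)
print(inserted)
```

In the result the element is flanked by two copies of `ATCGA` in the same orientation, which is what makes them direct rather than inverted repeats.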
8
Q

Terminal Inverted Repeats (TIRs)

A

Terminal inverted repeats are the sequences that signal to the transposase: “this is a transposon and this is the sequence that you’re going to be doing transposition on.”
- Recognized by the transposase, and direct transposition.

TIRs are sequences at the two ends of the element that are reverse complements of each other, i.e., the same sequence oriented in opposite directions, so a single strand carrying both ends could in principle fold back into a hairpin-like structure. These inverted repeats serve as recognition sites for the transposase enzyme, which binds to the TIRs and catalyzes the movement of the transposable element from one location in the genome to another.
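
What “inverted” means for TIRs can be checked with a short Python sketch: one end of the element should equal the reverse complement of the other end. The helper names and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element, k):
    """True if the first k bases are the reverse complement of the last k."""
    return element[:k] == reverse_complement(element[-k:])


with_tirs = "CAGGTT" + "AAAAAA" + "AACCTG"      # ends are inverted repeats
with_direct = "CAGGTT" + "AAAAAA" + "CAGGTT"    # ends are direct repeats
print(has_terminal_inverted_repeats(with_tirs, 6))
print(has_terminal_inverted_repeats(with_direct, 6))
```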

9
Q

There are two main classes of TEs: ______ and _____

A
  • Class I: Retrotransposons
  • Class II: DNA Transposons
10
Q

Class I: Retrotransposons

A
  • The class of TEs most commonly found in the human genome.
  • Work through an RNA intermediate.
  • Copy/paste mechanism (replicative): i.e., every time transposition happens, a new copy of the transposon is made within the genome.
11
Q

Class I - Retrotransposons Examples:

A
  • SINEs and LINEs: short and long interspersed nuclear elements
  • Alu element (a SINE): ~300 bases long. Found between 300,000 and one million times in the human genome (copied and pasted over and over again during evolution).
12
Q

Class II - DNA Transposons

A
  • Exist only as DNA (they move without an RNA intermediate)
  • Transposition mechanism: cut/paste (i.e., not replicating itself every time, not making copies of itself)
  • Some can do copy/paste, but most don’t
  • Not found often in humans
13
Q

Class I: Retrotransposons (HOW DOES IT WORK)

A
  • Retrotransposons encode a reverse transcriptase that can create and integrate cDNAs into the genome
  • This copy-and-paste mechanism is hard to select against and causes bloating of the genome

The mechanism of transposition by retrotransposons involves the transcription of the retrotransposon DNA into an RNA intermediate by the host cell’s RNA polymerase. This RNA intermediate is then reverse-transcribed back into DNA by the retrotransposon’s own reverse transcriptase. The resulting DNA copy is integrated into the genome at a new location, while the original retrotransposon stays where it was.

14
Q

Describe and define “cDNAs in the genome”

A

cDNAs, or complementary DNAs, are DNA copies of messenger RNAs (mRNAs) that are reverse-transcribed from the RNA molecule using an enzyme called reverse transcriptase. The resulting cDNA is complementary to the mRNA template and lacks introns, which are non-coding regions that are removed from the primary RNA transcript during the process of RNA splicing.

15
Q

“Bloating of the genome” DESCRIBE

A

The term “bloating of the genome” refers to an increase in the size and complexity of a genome beyond what is necessary or advantageous for the organism. This can occur due to the accumulation of repetitive DNA sequences, including transposable elements such as retrotransposons.

Retrotransposons, as Class I transposable elements, have the ability to amplify themselves and move within the genome through an RNA intermediate. When they insert into new locations, they can create additional copies of themselves, leading to a proliferation of retrotransposon sequences within the genome. Over time, this can lead to a significant increase in the amount of repetitive DNA in the genome, which can contribute to genome bloating.

The impact of genome bloating on an organism can be complex and depend on the specific genetic and environmental factors at play. In some cases, the accumulation of repetitive DNA may have little effect on the organism’s fitness or phenotype. However, in other cases, genome bloating can lead to reduced fertility, developmental abnormalities, or other negative consequences.

16
Q

Describe and define “Reverse Transcriptase”

A
  • Found in retroviruses (and also encoded by retrotransposons); this is the enzyme that allows a retrovirus, once its RNA is inside the host cell, to reverse transcribe that RNA into DNA so that it can get into our genomes
  • When we go from RNA to DNA, we call the product “cDNA”, where “c” means complementary (sometimes read as “copy”)
  • We use this process in the lab

Reverse transcriptase is an enzyme that catalyzes the reverse transcription of RNA into DNA. It is a key tool in molecular biology, as it allows researchers to generate complementary DNA (cDNA) copies of RNA molecules, which can be useful for a variety of applications.

Reverse transcriptase is a type of RNA-dependent DNA polymerase, meaning that it uses RNA as a template to synthesize a complementary DNA strand. It was first discovered in retroviruses, which are RNA viruses that replicate by reverse transcription of their RNA genome into DNA. Reverse transcriptase is a key component of the retroviral replication cycle, as it produces the DNA copy of the viral genome that is then integrated into the host cell genome.

17
Q

Retrotransposons (HOW they encode reverse transcriptase that creates and inserts cDNAs into the genome) STEPS

A

The steps by which retrotransposons encode reverse transcriptase and create and insert cDNAs into the genome are as follows:
- Reverse Transcriptase: making the cDNA
- Transposon: doing the transposition
- We get a new copy of the retrotransposon every time

  1. Transcription: The retrotransposon DNA is transcribed into RNA by the host cell RNA polymerase.
  2. Translation: The RNA is then translated into a polyprotein by the host cell ribosomes.
  3. Reverse transcription of RNA into DNA: The polyprotein is processed by the retrotransposon protease into individual proteins, including reverse transcriptase. The reverse transcriptase enzyme then uses the RNA as a template to synthesize a complementary DNA (cDNA) strand, using dNTPs (deoxynucleoside triphosphates) as substrates.
  4. Synthesis of the second DNA strand: The reverse transcriptase then switches to the 3’ end of the cDNA and synthesizes a complementary DNA strand using the original cDNA as a template, in a process known as second-strand synthesis.
  5. Insertion of retrotransposon DNA: The resulting double-stranded DNA is then inserted into a new location in the host cell genome by a mechanism called integration.
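
The copy-and-paste cycle in the steps above can be sketched in Python. This is a minimal string model under loud assumptions: the element is written as a single template strand, the translation and second-strand steps are left out, and the function names and sequences are hypothetical.

```python
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")
RNA_TO_DNA = str.maketrans("ACGU", "TGCA")


def transcribe(template_dna):
    """Host RNA polymerase step: RNA complementary to the DNA template."""
    return template_dna.translate(DNA_TO_RNA)[::-1]


def reverse_transcribe(rna):
    """Reverse transcriptase step: cDNA complementary to the RNA."""
    return rna.translate(RNA_TO_DNA)[::-1]


def retrotranspose(genome, element, insert_at):
    """Insert a new DNA copy of the element; the original copy stays put."""
    rna = transcribe(element)
    cdna = reverse_transcribe(rna)
    return genome[:insert_at] + cdna + genome[insert_at:]


element = "ATGCAT"                      # hypothetical retrotransposon
genome = "GG" + element + "CCTT"        # hypothetical genome
new_genome = retrotranspose(genome, element, len(genome))
print(new_genome.count(element))        # copy number after one cycle
```

Because the original is never excised, every pass through the cycle adds one more copy, which is the source of the genome growth these cards call bloating.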
18
Q

Metagenomics

A

Sequence all DNA from an environment to find out which species or genes are present.

Metagenomics is the study of genetic material recovered directly from environmental samples, such as soil, water, or microbial communities within the human body or other organisms. Unlike traditional genomics, which involves sequencing the DNA of a single organism, metagenomics aims to understand the genetic diversity and functional potential of entire communities of organisms within a given environment.
- Lots of sequences from microbiota
- Looking at soil biome, water samples, microbiome from gut, etc

19
Q

Why do metagenomics?

A
  • Most microorganisms are unculturable in a lab setting but still may be crucial to the understanding of an ecosystem
  • Metagenomics allows the sequencing of all the genomes present in a sample
  • Assembly to determine the species that are present, which allows us to determine the biodiversity
  • Assembly to determine the genes that are present, which allows us to determine the function
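
How sequenced reads might be used to work out which species are present can be illustrated with a toy Python sketch. It assumes a tiny made-up reference set and assigns each read to the reference that shares the most k-mers with it; real metagenomic tools use large databases and more careful statistics, and every name and sequence here is hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads, references, k=4):
    """Count reads per reference, using shared k-mers as the match score."""
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    counts = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_shared = "unclassified", 0
        for name, ref_set in ref_kmers.items():
            shared = len(read_kmers & ref_set)
            if shared > best_shared:
                best_name, best_shared = name, shared
        counts[best_name] += 1
    return counts


references = {
    "species_A": "ATGGCGTACGTTAGC",
    "species_B": "TTTTCCCCGGGGAAAA",
}
reads = ["GGCGTACG", "CCCCGGGG", "CGTACGTT", "ACACACAC"]
abundance = classify_reads(reads, references)
print(abundance)
```

The read that matches no reference stays unclassified, which mirrors the point that many environmental organisms have no cultured, sequenced relative.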
20
Q

How is Metagenomics done?

A
  1. Collect
  2. Filter
  3. DNA Extraction
  4. Clone (don’t need to do this with new tech)
  5. Sequence
  6. Assembly
21
Q

How is Metagenomics done (by cloning and expression)?

A
  1. Sample collection: As with other metagenomic approaches, the first step in functional metagenomics is to collect a sample from the environment of interest.
  2. DNA or RNA extraction: The genetic material is extracted from the sample using standard methods.
  3. Fragmentation: The DNA is fragmented into small pieces, typically by sonication or enzymatic digestion.
  4. Cloning: The fragments are cloned into a suitable vector, such as a plasmid or bacteriophage, that can be introduced into a host cell.
  5. Transformation: The vector is introduced into a host cell, such as E. coli, to create a library of clones representing the genetic diversity of the environmental sample.
  6. Screening: The library is screened for clones that express functional genes or enzymes of interest, such as those involved in the degradation of a specific substrate or resistance to an antibiotic.
  7. Sequencing and analysis: The clones of interest are sequenced to identify the functional gene or enzyme, and the sequences are analyzed to gain insights into their function and evolution.
22
Q

Genome-wide Association Mapping

A

Association mapping involves detecting statistical associations between SNP markers and phenotypes using a large sample of unrelated individuals.
- Allows us to look at natural variation that’s created by natural mutations, instead of lab-induced mutations

In association mapping, a large sample of unrelated individuals is genotyped at a set of SNP markers across the genome, and their phenotypic data is collected. The genotype and phenotype data are then analyzed using statistical methods to identify SNP markers that are significantly associated with the phenotype of interest.

The use of a large, unrelated population is important because it increases the likelihood of detecting true associations and reduces the potential for confounding factors such as population structure, relatedness, or environmental effects.

23
Q

Define and describe SNP Markers

A

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation that occurs in the DNA sequence of organisms. SNPs are single-base pair differences in DNA sequence that exist between individuals in a population.
- SNP markers are places in the genome, just one base pair long, where sequencing one individual and another individual shows a difference (i.e., a polymorphism: a difference or character state change at a specific location between different individuals)

SNP markers are specific locations in the genome where an SNP occurs, and they can be used as genetic markers to distinguish between individuals or populations.
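
Finding SNPs between two individuals can be sketched in Python as a position-by-position comparison. This assumes the two sequences are already aligned and of equal length; the function name and the sequences are invented for illustration.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every single-base difference.

    Positions are 0-based. Both sequences must already be aligned.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


individual_1 = "ATGCCGTA"
individual_2 = "ATGTCGTA"
snps = find_snps(individual_1, individual_2)
print(snps)
```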

24
Q

Describe and Define Polymorphism

A

Polymorphism refers to the presence of genetic variation within a population, resulting in differences in DNA sequence, gene expression, or phenotype among individuals.

Polymorphisms can occur at different levels of genetic organization. At the DNA level, single nucleotide polymorphisms (SNPs) are the most common form of polymorphism and involve a single base pair variation in DNA sequence. Other types of DNA polymorphisms include insertions, deletions, and repeat length variations.

25
Q

What are we doing when we are doing “genome-wide association mapping?”

A

We are looking for a genotype-to-phenotype association, specifically a statistically significant one.

26
Q

What is meant by “mapped SNPs?”

A

We know where in the genome the SNPs are.

27
Q

Describe the Cancer example of genome-wide association mapping

A

Example: collect a large dataset of mapped SNPs (say 500k SNP markers; this is the GENOTYPE) in:

  1. 1000 people with a particular type of cancer (PHENOTYPE)
  2. 1000 people without cancer (PHENOTYPE)

Ask: are there SNP alleles that most of the people with cancer have but few of the people without cancer have? Is the difference statistically significant?

28
Q

Why use the “Genome-wide Association Mapping” method?

A
  • Some species cannot be mutagenized and studied in the lab, for example, humans
  • Some species cannot be crossed and studied in the lab, for example, whales
  • Some phenotypes involve many genes, for example, human height or other quantitative traits; knockouts don’t work well here, because each gene contributes only a little and some genes are involved in many traits, so they don’t give good clear phenotypes when knocked out
  • Many scientists are interested in natural variation/mutations and how they evolve (instead of lab-generated mutations), i.e., natural mutations are the source of all the natural variation that evolution acts on
29
Q

Genome-wide Association Mapping Flower Example:

A

We want to know what genes are responsible for the phenotypic variation that we are able to observe.
- Catalogue the phenotypes (colours of the flowers)
- Catalogue the genotypes (the genotypes of the flowers at the SNP locus); each flower is homozygous or heterozygous for C or T (the alleles)
- Do a chi-square test on the counts of C and T alleles present in each of the flower colours
- The p-value (0.013) is < 0.05, so the association is statistically significant, which suggests the SNP is in or near the gene that causes this phenotype
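
The chi-square step can be sketched in Python for the simplest case of two flower colours and two alleles (a 2x2 table, 1 degree of freedom). The allele counts below are hypothetical and are not the data behind the p-value quoted on this card; no continuity correction is applied, and the p-value uses the identity that the 1-df chi-square tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts.

    Rows are phenotypes (flower colours), columns are alleles (C, T).
    Returns the chi-square statistic and its p-value for 1 df.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: (C, T) for each flower colour
allele_counts = ((30, 10), (15, 25))
chi2, p_value = chi_square_2x2(allele_counts)
print(round(chi2, 3), p_value < 0.05)
```

In a real genome-wide scan this test is repeated for every SNP, which is why the significance threshold has to be far stricter than 0.05.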

30
Q

Manhattan Plot

A

The Manhattan plot is a graphical representation of the results of a genome-wide association study (GWAS). It shows the statistical significance of the association between genetic markers and a particular trait or disease on the y-axis, plotted against their physical location on each chromosome on the x-axis.
- Shows the genotype-phenotype associations for all SNPs in the dataset
- Each dot is an SNP, and we lay them out in the order of the chromosomes; the first dot is the very first SNP on chromosome #1. These are all mapped SNPs
- We only look at the autosomes, not the sex chromosomes
- The y-axis is the -log10 of the p-value, so you’re inverting it so that on the graph, the highest point is the SNP most significantly associated with the phenotype

The plot is called a Manhattan plot because the distribution of data points resembles the skyline of Manhattan, with clusters of points rising above the horizontal axis. The most significant associations are represented by the highest points on the plot, while non-significant associations appear near the horizontal axis.
- In the images, we have 5 loci that are associated with the phenotype (anything above the significance threshold line is significant)
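
The y-axis transformation and the threshold line can be sketched in Python. The p-values are invented, and the threshold here is a Bonferroni correction (0.05 divided by the number of SNPs tested), which is an assumption for illustration; the cards do not say how the threshold in the images was chosen.

```python
import math


def manhattan_points(p_values, alpha=0.05):
    """Convert p-values to -log10 scores and flag SNPs above the threshold.

    `p_values` maps SNP name to p-value, in chromosome order.
    The threshold is Bonferroni-corrected: alpha / number of SNPs.
    """
    threshold = -math.log10(alpha / len(p_values))
    points = [(snp, -math.log10(p)) for snp, p in p_values.items()]
    hits = [snp for snp, score in points if score > threshold]
    return points, threshold, hits


# Hypothetical results for four mapped SNPs
p_values = {"snp1": 0.5, "snp2": 1e-9, "snp3": 0.01, "snp4": 1e-4}
points, threshold, hits = manhattan_points(p_values)
print(hits)
```

Smaller p-values give taller points, so the strongest associations stand out as the peaks of the skyline.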