GEN 3: Defining the Genome II - DNA Flashcards

1
Q

List some methods used to analyse DNA and when they were first developed

A
  • Sanger Sequencing: 1970s
  • Restriction enzymes: 1970s
  • DNA cloning with Restriction Enzymes: 1972
  • Southern Blotting: 1975
  • Polymerase Chain Reaction: 1985
  • Next generation sequencing (NGS): 2006
2
Q

Describe how Sanger sequencing works

A
  • Sanger sequencing requires:
  • the target DNA
  • an oligonucleotide primer of ~20nt complementary to part of that DNA
  • a DNA polymerase
  • the polymerase extends the primer, using the target DNA as template, until a ddNTP is incorporated and terminates extension
  • a mixture of deoxyribonucleotide triphosphates (dNTPs) and dideoxynucleotide triphosphates (ddNTPs)
  • the ddNTPs lack the 3’-OH group required for nucleotide chain extension
  • fragments are separated by polyacrylamide gel electrophoresis or capillary electrophoresis, which distinguishes fragments differing in length by only one nucleotide
  • labelling ddATP, ddCTP, ddGTP and ddTTP with different fluorophores produces coloured peaks that provide a direct read of nucleotide sequences up to ~1000 nucleotides long
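The chain-termination logic above can be sketched in a few lines of Python. This is a toy model, not a simulation of the real chemistry: the function name `sanger_read`, the example template, and the choice to enumerate one terminated fragment per position (rather than sampling random termination events) are all assumptions made for illustration.

```python
# Toy model of Sanger chain termination (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_read(template_3_to_5: str) -> str:
    """Return the sequence read from the ladder of terminated fragments.

    The template is written 3'->5', so the new strand is built 5'->3'
    by complementing each template base in order.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    # With many primed templates, termination occurs at every position,
    # giving one fragment per length; each ends in a labelled ddNTP.
    fragments = [new_strand[:length] for length in range(1, len(new_strand) + 1)]
    # Electrophoresis orders fragments by length; the labelled terminal
    # base of each successive fragment spells out the sequence.
    fragments.sort(key=len)
    return "".join(fragment[-1] for fragment in fragments)

print(sanger_read("TACGGT"))  # ATGCCA
```

Reading the last base of each fragment in order of increasing length mirrors how the coloured peaks are read off the electrophoresis trace.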
3
Q

Describe how restriction enzymes work to analyse DNA

A
  • restriction enzymes are endonuclease enzymes that cut double-stranded DNA at specific sequences
  • they cleave phosphodiester bonds to leave free terminal 3’-OH and 5’-phosphate groups
  • they are able to cleave internal bonds and circular DNA, unlike exonucleases which only cleave bonds at DNA ends
  • these enzymes usually recognise short target sequences of 4 to 8 base pairs
  • they can cut DNA into smaller fragments by targeting their specific restriction sites
  • there are two types of restriction digestion
  • restriction enzymes come from bacteria, and each is named after the species of bacteria from which it derives
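As a minimal sketch of site-specific cutting, the Python below splits one strand of a linear DNA sequence at every occurrence of a recognition site. It uses EcoRI as the example (recognition site GAATTC, cut between G and A); the function name `digest`, its parameters and the test sequence are hypothetical and chosen only for illustration.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut one strand of linear DNA at every occurrence of `site`.

    `cut_offset` is the position within the site where the backbone is
    cleaved (EcoRI cuts G^AATTC, so the offset is 1).
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])  # fragment up to the cut point
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments

print(digest("AAGAATTCTTGAATTCGG"))  # ['AAG', 'AATTCTTG', 'AATTCGG']
```

Two sites in a linear molecule give three fragments; a circular molecule with the same two sites would give only two, which this sketch does not model.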
4
Q

Describe how DNA cloning with restriction enzymes works

A
  • DNA is inserted into a plasmid
  • to do this, both the DNA and plasmid are digested with the same restriction enzymes
  • this produces complementary sticky ends
  • DNA ligase then ligates the two together
  • the resulting recombinant plasmid is then introduced into bacteria to generate a single colony (or clone) of bacterial cells, each carrying the same recombinant plasmid
  • this is called recombinant DNA cloning or molecular cloning
5
Q

What was DNA cloning with restriction enzymes used to create?

A
  • it was used to create a genomic DNA library
  • genomic DNA was fragmented into millions of small pieces
  • these were ligated into a plasmid vector and introduced into bacteria, so that each individual clone carried a different genomic DNA fragment
  • Sanger sequencing of each member was then used to assemble the sequence of the whole genome
6
Q

Describe how Southern blotting is used to analyse DNA

A
  • mixtures of DNA fragments are separated by electrophoresis through an agarose gel and blotted onto a nylon membrane
  • a specific sequence in the mixture can then be detected using a DNA probe that is radioactively or fluorescently labelled
7
Q

Using sickle cell disease, describe how Southern blotting is used to help determine the status of the beta-globin gene

A
8
Q

Describe how PCR works as a way to analyse DNA

A
  • a pair of oligonucleotide primers is designed that flank the region to be amplified and are complementary to opposite strands
  • reactions contain template DNA, the chosen primer pair, dNTPs and a thermostable DNA polymerase (Taq polymerase)
  • using a programmable temperature block, the PCR reaction is taken through multiple cycles of temperature incubations
  • see image for why PCR is used
9
Q

What’s the amplification factor in PCR?

A
  • in principle, the target sequence is duplicated during each PCR cycle
  • if the PCR runs for n cycles, this results in an amplification factor of 2^n
  • in practice each cycle is less than 100% efficient, so the yield is lower than this, but the amplification is still powerful
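A short worked calculation shows how quickly doubling adds up. The sketch below assumes the common simplified model in which each cycle multiplies the copy number by (1 + efficiency), with efficiency 1.0 meaning perfect doubling; the function name `pcr_copies` and the 90% figure are illustrative assumptions, not values from the card.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles

ideal = pcr_copies(1, 30)          # 2^30 copies from a single template
realistic = pcr_copies(1, 30, 0.9) # assumed 90% efficiency per cycle

print(ideal)              # 1073741824.0
print(realistic < ideal)  # True
```

Even with imperfect efficiency, 30 cycles turn a single template molecule into a very large number of copies, which is why PCR remains powerful in practice.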
10
Q

What is Next Generation Sequencing (NGS)?

A
  • NGS methods can sequence millions of short DNA fragments simultaneously
  • known as massively parallel sequencing
  • without the need for individual fragment isolation
11
Q

Describe how Illumina, an NGS method, works

A
  • the DNA is first broken into short (<250nt) fragments that are tagged and hybridised onto oligonucleotides attached to a solid support called a flow cell
  • the bound fragments are PCR amplified in situ (bridge amplification)
  • generating millions of distinct clusters, each derived from a single fragment
  • clusters are then sequenced in parallel (at the same time) by primer-extension, one nucleotide at a time, using dNTPs that are reversibly modified with 3’-end blocks and fluorescence tags
  • after the addition of the first nucleotide, the flow cell is laser-scanned to measure the position and colour of each cluster
  • the information is then stored digitally
  • the flow cell is then treated to remove the fluorescent tags and 3’-end blocking groups on the newly extended primers
  • this process is then repeated enough times to generate sequence reads for each cluster
  • bioinformatics software is then used to compare sequence reads, to identify any overlaps and so assemble the sequence of the starting DNA
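The final assembly step can be illustrated with a deliberately simple sketch. It assumes error-free reads from one strand and uses greedy merging by longest suffix-prefix overlap, which is a simplification chosen for clarity and not a description of how production assemblers work; the function names and the three example reads are hypothetical.

```python
def overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0

def assemble(reads: list[str]) -> str:
    """Greedily merge the best-overlapping pair until one contig remains."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, 0, 1)  # (overlap size, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        merged = contigs[i] + contigs[j][size:]  # join, dropping the shared part
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs[0]

print(assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ATGGCGTACCTGA
```

Each pair of neighbouring reads here shares four nucleotides, so the overlaps alone are enough to recover the starting sequence.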
12
Q

What are some applications of NGS?

A
  • whole genome sequencing (WGS):
  • allows sequence variation between individuals to be compared
  • transcriptome sequencing (RNA-seq):
  • sequencing of DNA reverse transcribed from RNA transcripts
  • the most highly expressed genes give the greatest number of sequence ‘reads’
  • targeted sequencing:
  • a small region of the genome is sequenced in samples where there may be many variants
  • e.g. exome sequencing may reveal protein-coding variations
  • ChIPseq:
  • antibody to a protein of interest is used to purify chromatin containing that protein, prior to WGS
  • reveals protein-genome interactions
13
Q

Referring to the illustration for Sanger sequencing, why does the incorporation of the first ddNTP (ddGTP) not prevent any further primer extension?

A
  • the reaction contains many primed templates as well as a mixture of dNTPs and ddNTPs (the former being in excess)
  • at each position in the sequence, therefore, only a small proportion of the primers have their 3’ ends blocked
  • the majority will have unblocked 3’ ends and so will be extended
14
Q

It is important to optimise the temperature at step 2 of PCR (annealing the primer)

Can you predict the consequences of step 2 temperatures that are

a) too high
b) too low ?

A

a) too high:
- primer hybridisation would be impaired so no PCR products would be obtained
b) too low:
- primers would bind with less specificity and may give rise to spurious PCR products

15
Q

At each sequencing step during Illumina NGS, what must happen in between laser scanning of the flow cell and addition of the next nucleotide?

A
  • the 3’-blocks and fluorescent tags must be removed
16
Q

How many nucleotides is the human haploid genome composed of?

A
  • 3 billion nucleotides
17
Q

What percentage of the human haploid genome are genes and gene-related DNA?

How many protein-coding genes are there?

What are some gene-related DNA examples?

A
  • 37.5%
  • there are about 21,000 protein-coding genes
  • most of their DNA is non-coding
  • introns, UTRs
  • non-protein-coding-genes make RNAs with known and unknown functions
  • rRNA, tRNA, miRNA, some lncRNAs
  • other non-coding DNA is known to be gene-related but lacking function
  • pseudogenes
  • gene fragments
18
Q

How much of the human genome is made up of highly repeated DNA?

What is it made up of?

A
  • 54%
  • 1740Mb
  • it is made up of dispersed transposable elements and tandemly repeated DNA
19
Q

Observe this diagram for the breakdown of the human genome constituents

A
20
Q

Why is most of the DNA in a protein-coding gene non-coding?

A
  • there are non-coding regions such as:
  • 5’ and 3’ untranslated regions (UTRs)
  • enhancer sequences
  • promoter sequences
  • long introns
21
Q

What is alternative splicing?

A
  • alternative splicing allows a single gene to generate multiple mRNA isoforms
  • and therefore, multiple protein isoforms
22
Q

Describe the gene density across humans, prokaryotes, simple eukaryotes

A
  • human genomes have very low gene density compared to genomes of prokaryotes and simpler eukaryotes
  • genes are tightly packed in bacteria and yeast (with only the occasional intron in yeast) while genes in higher eukaryotes are less tightly packed and routinely interrupted with introns
23
Q

Describe the average lengths of a human protein-coding gene, exons and intron numbers and lengths

A
  • Values range widely, but the average human protein-coding gene is 67 Kb long
  • with 11 exons and 10 introns with mean sizes of 163 bp and 6.4 Kb, respectively
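As a rough consistency check on these averages, the arithmetic can be written out. The sketch assumes 6.4 Kb means 6,400 bp and simply multiplies counts by mean sizes; because these are means over very variable genes, the total is only expected to land near the quoted 67 Kb, not to match it exactly.

```python
EXONS, MEAN_EXON_BP = 11, 163
INTRONS, MEAN_INTRON_BP = 10, 6400  # 6.4 Kb taken as 6400 bp (assumption)

exon_bp = EXONS * MEAN_EXON_BP        # 1793 bp of exon
intron_bp = INTRONS * MEAN_INTRON_BP  # 64000 bp of intron
total_bp = exon_bp + intron_bp        # 65793 bp in total

print(total_bp / 1000)                     # 65.793 (Kb)
print(round(100 * exon_bp / total_bp, 1))  # 2.7 (% of the gene that is exon)
```

On these figures introns account for nearly all of the gene's length, which fits the point that most of the DNA in a protein-coding gene is non-coding.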
24
Q

Describe the average gene density in the human genome

A
  • The average gene density in the human genome also varies between chromosomes
  • with about 25% of the genome consisting of gene deserts - regions of over a Mb that are devoid of genes.
25
Q

How much of the genome is transcribed into RNA?

How much of this is transcribed into non-coding RNA (ncRNA)?

A
  • about 75% of the genome
  • a third of this represents non-coding RNA (ncRNA)
26
Q

What are non-coding RNAs (ncRNA)?

List their classes

A
  • ncRNAs are RNAs that are not translated into protein
  • some of this may reflect background transcription of no functional significance
  • but much is transcribed from genes with important functions
  • 5 classes:
  • ribosomal RNA genes
  • transfer RNA genes
  • small nuclear/nucleolar RNA genes
  • micro RNA genes
  • long non-coding RNA genes
27
Q

Describe ribosomal RNA genes

  • what do they code for
  • transcribed by what
A
  • ribosomal RNA is essential during protein translation by ribosomes
  • rRNA genes are the best characterised of the ncRNAs

There are 4 rRNAs:

  • 18S
  • 5.8S
  • 28S
  • 5S
  • The genes for these exist in multiple copies to ensure there is sufficient rRNA for translation
  • 18S, 5.8S and 28S rRNA are derived from a 41S precursor transcribed by RNA polymerase I from ~ 300 copies of a gene that is tandemly repeated in 5 clusters on different chromosomes
  • Similar numbers of 5S rRNA genes, also clustered as tandem repeats, are located elsewhere in the genome and transcribed by RNA polymerase III.
28
Q

Describe what transfer RNA genes do

Transcribed by what?

A
  • they code for transfer RNA (tRNA)
  • Transfer RNA (tRNA) also functions during translation by delivering amino acids to the ribosome.
  • tRNAs are small; 76-90 nucleotides, with a folded structure.
  • They can base pair with the codons of an mRNA strand, and this process delivers the attached amino acid to the growing peptide chain
  • Genes for the 49 different tRNAs also exist as multiple copies at various chromosome sites and are also transcribed by RNA polymerase III.
29
Q

What do small nuclear / nucleolar RNA genes do?

What are they transcribed by?

Where in the genome are they?

A
  • Genes for small nuclear and small nucleolar RNAs (snRNAs and snoRNAs) are dispersed throughout the genome
  • transcribed by RNA Polymerases II or III
  • Each gene makes a distinct RNA with a distinct function in the processing of mRNA, tRNA or rRNA into their mature forms.
  • For example, many are essential components of the RNA splicing machinery (spliceosome).
30
Q

Describe micro RNA genes

What are miRNAs?

A
  • MicroRNAs (miRNAs) are small (~22nt) RNAs that control gene expression by RNA interference
  • they physically block translation by binding to mRNAs to prevent ribosomal access
  • Many different miRNAs, each targeting specific mRNAs, are transcribed from multiple genes by RNA polymerases II or III.
  • Piwi interacting RNAs (piRNAs) also work by RNA interference but are expressed only in the germ line where they silence transposons.
  • The number of genes assigned to these categories is growing.
31
Q

Describe long non-coding RNA (lncRNA) genes

What are they transcribed by?

Size

Examples

A
  • Genes for long non-coding RNA (lncRNA) are also being identified in increasing numbers.
  • In common with protein-coding genes, most are transcribed by RNA polymerase II.
  • lncRNA is between 200 and 17,000 nucleotides in length and can regulate mRNA expression by various mechanisms, including RNA interference, although relatively few have fully defined roles.
  • A famous example is X-inactivation by Xist
32
Q

What are pseudogenes?

How many categories of them are there?

A
  • these are mutated genes that are no longer functional
  • thought to be useless byproducts of genome evolution
  • there are >20,000 pseudogenes in the human genome
  • there are three categories
33
Q

What are the three categories of pseudogenes?

A
  • non-processed pseudogenes:
  • arise by duplication of a functional gene followed by mutational inactivation
  • processed pseudogenes:
  • they are intronless and arise by reverse transcription of a spliced transcript followed by chromosomal integration and mutational inactivation
  • gene fragments:
  • these are non-functional remnants of genes resulting from genomic rearrangements
34
Q

Look and learn the example of non-processed pseudogene at the human beta-globin gene on chromosome 11

A
35
Q

What are the two types of highly repeated DNA?

A
  • dispersed (or interspersed) repeats
  • tandem repeats
36
Q

Where are large amounts of highly repetitive DNA found?

A
  • in higher eukaryotes, including humans
37
Q

What are the functions of highly repetitive DNA?

A
  • they are clearly major determinants of genome DNA sequence organisation
  • They have also been identified as important sites of genetic variation.
  • Furthermore, some are known to influence gene expression and it is thought that they have key roles in 3D folding of the genome.
  • Nonetheless, the full extent of their functional significance remains unclear.
38
Q

What are transposable elements (transposons)?

A
  • a type of dispersed repetitive DNA
  • by far the most abundant dispersed repeat sequences
  • transposons are DNA sequences capable of changing their location within the genome
  • there are two basic categories of transposable elements and these use different transposition mechanisms:
  • RNA transposons (retrotransposons)
  • DNA transposons
39
Q

Describe the difference between RNA transposons and DNA transposons

Describe their mechanisms

A
  • RNA transposons (retrotransposons):
  • transcribe their DNA into RNA and use a reverse transcriptase to convert this back into DNA that inserts into a new site in the genome
  • Remarkably, RNA transposons account for about 40% of the human genome!
  • DNA transposons:
  • make up about 3% of the human genome
  • use a transposase to excise their DNA from one site and insert it elsewhere in the genome
  • image caption: DNA transposons leave their original site to integrate elsewhere (‘cut and paste’), whereas RNA transposons remain at their original site but make copies that insert elsewhere (‘copy and paste’).
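The difference between the two mechanisms can be shown with a toy model in which the genome is a list of labelled segments. The function names, the segment labels and the use of list indices as genomic positions are all illustrative assumptions.

```python
def cut_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """DNA transposon: the element is excised and reinserted elsewhere.

    `target` is the index in the genome after the element has been removed.
    """
    result = list(genome)
    element = result.pop(source)
    result.insert(target, element)
    return result

def copy_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """RNA transposon: the original stays put and a copy inserts elsewhere."""
    result = list(genome)
    result.insert(target, result[source])
    return result

genome = ["geneA", "TE", "geneB", "geneC"]
print(cut_and_paste(genome, 1, 3))   # ['geneA', 'geneB', 'geneC', 'TE']
print(copy_and_paste(genome, 1, 3))  # ['geneA', 'TE', 'geneB', 'TE', 'geneC']
```

Only the copy-and-paste route increases the number of elements, which is consistent with RNA transposons making up a much larger share of the genome than DNA transposons.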
40
Q

What are the three main types of RNA transposons?

A
  • LTR retrotransposons
  • LINEs
  • SINEs
41
Q

Use this diagram to describe how DNA transposons work

A
  • DNA transposons consist of a transposase gene flanked by inverted repeat (IR) sequences
  • the transposase protein recognises and cleaves the IR sequences, releasing the transposon DNA
  • the transposase also catalyses the integration of the DNA at new, non-random sites in the host cell genome
42
Q

Describe LTR retrotransposons

A
  • long terminal repeat (LTR) retro-transposons are also called endogenous retroviruses (ERVs)
  • their DNA consists of a coding region, flanked by a pair of LTR sequences (100 bp - 5 Kb in length)
  • LTR retrotransposition involves an integration mechanism related to that used by retroviruses, which also have LTR sequences
  • the genomes of LTR retro-transposons include two genes:
  • gag: encodes a protein needed to make cytoplasmic virus-like particles
  • pol: codes for reverse transcriptase
  • unlike retroviruses, LTR retro-transposons are non-infective, as they lack the env gene that retroviruses need to leave their host cells and infect other cells
43
Q

Describe LINEs RNA transposons

  • length
  • abundance
  • structure
A
  • long interspersed nuclear elements (LINEs) are around 6 Kb in length
  • their mRNA codes for a reverse transcriptase that can copy the RNA back into DNA which can then integrate into a new site in the genome
  • reverse transcription is often incomplete leading to a truncated, non-functional product being inserted back into the genome
  • among the different families of human LINEs, one called LINE-1 or L1 is very abundant
  • it accounts for 17% of the human genome!
  • structure:
  • 5’ UTR: includes a strong promoter driving transcriptional initiation by RNA polymerase II
  • Open Reading Frame 1: encodes an RNA binding protein required for transposition
  • Open Reading Frame 2: encodes a protein with endonuclease (EN) and reverse transcriptase (RT) activity required for transposition
44
Q

Describe SINEs RNA transposons

A
  • they are short interspersed nuclear elements
  • 100-400 bp
  • they are non-autonomous transposons, as they do not code for any proteins
  • they depend on proteins (e.g. those encoded by other transposons) for transposition
  • among the different families of human SINEs, one called Alu is very abundant
  • it accounts for 11% of the human genome
45
Q

What are the effects of transpositions of transposons?

A
  • Even though the vast majority of transposons in the human genome have lost their ability to transpose, they have clearly had an enormous influence on genome evolution.
  • Furthermore, they continue to modify the genome, either by rare transposition events or by recombining with each other to cause chromosomal rearrangements.
  • Such changes may be oncogenic in somatic cells or, if they occur during gametogenesis, may cause genetic disease or contribute to genome evolution
46
Q

What are tandem repeats?

A
  • regions of DNA where an array of identical, or highly similar, repeat motifs (also called repeat units) is consecutively repeated
  • less abundant than transposons
47
Q

Why are tandem repeats sometimes called VNTR (variable number of tandem repeats) sequences?

A
  • there is variability between individuals
  • there is interest in these because of this
48
Q

How are tandem repeats classified?

A
  • according to their abundance, size of their motifs and arrays, they are classified as:
  • macro-satellites
  • mini-satellites
  • micro-satellites
49
Q

Describe macro-satellites in eukaryotes

  • motif length
  • array size
  • locations
A
  • they have motifs up to ~220 nt long
  • form large arrays (~ 20,000 to 5,000,000 nt long) at 1-100 locations in the genome
  • sequencing such large arrays is an ongoing challenge and is why the full sequence of the human genome is still incomplete
  • located in heterochromatin and at or near centromeres and telomeres
  • the major human centromeric macro-satellite, alpha-satellite DNA (repeat motif of 171 nt), is implicated in centromere function and chromosome segregation
  • see diagram
50
Q

Describe mini-satellites in eukaryotes

  • motif size
  • array size
  • location
  • uses
A
  • motifs of 10-150 nt
  • forming arrays of ~20 - 2000 nt at more than 1000 locations in the genome
  • mostly in euchromatin at telomeres and centromeres
  • these regions have no clear function
  • but in 1984 Alec Jeffreys famously identified that the number of repeat units in mini-satellites varies between individuals.
  • This discovery led to DNA fingerprinting.
51
Q

Describe micro-satellites in eukaryotes

A
  • also called Short Tandem Repeats (STR) or Simple Sequence Repeats (SSRs)
  • repeat units (motifs) of 1-10 nt
  • forming arrays of up to 1400 bp at 1000 - 1,000,000 genomic locations
  • there are more than half a million micro-satellite arrays in the human genome with di-, tri-, tetra- and penta-nucleotide sequence motifs
  • the number of repeats in any particular array is highly variable between individuals, making them useful for forensic, linkage and population studies
  • arrays of a 6 bp satellite repeat unit (TTAGGG) are found at the very end of all telomeres
  • many micro-satellites are located close to or within genes and some may even be part of the protein-coding region
  • e.g. the tri-nucleotide repeat (CAG)n array found in the gene HTT
  • this is implicated in Huntington’s disease
52
Q

Describe how Huntington’s disease is caused

A
  • Many micro-satellites are located close to or within genes and some may even be part of the protein-coding region.
  • A famous example is the tri-nucleotide repeat (CAG)n array found in the gene HTT.
  • This codes for the protein Huntingtin, implicated in Huntington’s disease.
  • Arrays in healthy individuals contain between 6 and 35 CAG repeats.
  • Individuals affected with Huntington’s disease, however, have >35 repeats, which cause the protein product to have harmful properties.
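Counting repeat units in a (CAG)n array is easy to sketch in code. The example below finds the longest uninterrupted run of CAG in a sequence and applies the >35 threshold from the card; the function names and the two synthetic test sequences are hypothetical, and the sketch is an illustration only, not a diagnostic method.

```python
import re

def longest_cag_run(sequence: str) -> int:
    """Number of repeat units in the longest uninterrupted (CAG)n array."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def is_expanded(repeats: int, threshold: int = 35) -> bool:
    """True if the array exceeds the threshold stated on the card (>35)."""
    return repeats > threshold

# Synthetic sequences: an arbitrary flank, the repeat array, another flank.
typical = "ATG" + "CAG" * 20 + "CCG"
expanded = "ATG" + "CAG" * 42 + "CCG"

print(longest_cag_run(typical), is_expanded(longest_cag_run(typical)))    # 20 False
print(longest_cag_run(expanded), is_expanded(longest_cag_run(expanded)))  # 42 True
```

Since CAG codes for glutamine, a longer array means a longer run of glutamines in the Huntingtin protein.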