Lectures 6, 7, 8 Flashcards
How large are DNA sequencing reads?
- only 750 bp
- helpful for tracking down causative mutations for genetic diseases
What are the two approaches to genome mapping?
- Genetic mapping/linkage analysis - relies on the observation of inheritance patterns during genetic crosses
- Physical mapping - using molecular biology techniques to directly identify features in the genome
Describe crossing over/recombination.
- In prophase I of meiosis, homologous chromosomes pair up to form a bivalent
- Within the bivalent, chromatids from the two homologs physically break and exchange segments of DNA
- After recombination, maternal chromosome 1 carries some genetic information from paternal chromosome 1
What do semicolons indicate in dihybrid crosses?
- A semicolon indicates that the genes are on different chromosomes
- No semicolon indicates linked genes (on the same chromosome)
What did Thomas Hunt Morgan discover in reference to recombination?
Genes on the same chromosome are linked to each other, and their alleles can be uncoupled by recombination
What did Alfred Sturtevant discover?
1) Recombination is a random event - there is an equal chance that crossover occurs at any position between a pair of chromatids
2) Recombination frequency is therefore a measure of the distance between two genes
How does the distance between genes impact the frequency of recombination?
- Further apart - high frequency of recombination
- Closer together - low frequency of recombination
What does it mean when two genes are linked?
They are located close together on the same chromosome, so they tend to be inherited together
How is recombination measured?
- Percent recombination frequency is expressed in map units or centiMorgans (cM)
- 1% recombination = 1 map unit = 1 cM
What are some limitations to genetic mapping?
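- Recombination hotspots: regions more likely to undergo crossover, so map distances do not scale evenly with physical distance
- Chromatids can undergo multiple crossovers; a double crossover between two genes restores the parental combination, so the distance is underestimated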
How do you calculate recombination frequency?
Recombination frequency (%) = (number of recombinant progeny / total progeny) × 100
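A minimal Python sketch of this calculation, using made-up progeny counts (180 recombinants out of 1,000) purely for illustration. Because 1% recombination = 1 map unit, the result can be read directly as a distance in cM.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Percent recombination frequency = recombinant progeny / total progeny x 100."""
    if total <= 0:
        raise ValueError("total progeny must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant count must be between 0 and total")
    return 100.0 * recombinant / total


# Hypothetical test-cross result: 180 recombinant offspring out of 1,000
rf = recombination_frequency(180, 1000)
print(f"{rf:.1f}% recombination = {rf:.1f} map units (cM)")
```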
Name 3 genetic markers.
1) Restriction Fragment Length Polymorphism (RFLPs)
2) Simple Sequence Length Polymorphisms (SSLPs)
3) Single Nucleotide Polymorphisms (SNPs)
What are RFLPs, SSLPs, and SNPs known as?
genetic markers
What is a restriction fragment length polymorphism (RFLP)?
- A restriction site is present on one allele but not the other (known as a polymorphic restriction site)
- When the alleles are cut with the restriction enzyme, they give different numbers and sizes of DNA fragments
- The difference is detected by Southern blotting
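A minimal sketch of why an RFLP changes the fragment pattern. The two alleles are hypothetical, and EcoRI (site GAATTC, cut after the G) is used only as an example enzyme; the code predicts fragment sizes and does not model the Southern blot itself.

```python
def digest(seq: str, site: str, cut_offset: int = 1) -> list[int]:
    """Fragment lengths from cutting a linear sequence at every occurrence of `site`.

    cut_offset is where the enzyme cuts within its site; EcoRI cuts G^AATTC, so 1.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"
# Hypothetical alleles: in allele_b a single-base change (GAATTC -> GAGTTC) destroys the second site
allele_a = "A" * 10 + ECORI_SITE + "T" * 20 + ECORI_SITE + "C" * 10
allele_b = "A" * 10 + ECORI_SITE + "T" * 20 + "GAGTTC" + "C" * 10

for name, allele in (("allele a", allele_a), ("allele b", allele_b)):
    print(name, digest(allele, ECORI_SITE))
```

The polymorphic site turns two fragments of allele a into one larger fragment in allele b, which is what a probe would reveal as a band-size difference.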
What is southern blotting?
- Cut genomic DNA with restriction enzymes
- Run the fragments on a gel, which gives a smear
- Transfer (blot) the DNA from the gel onto a nylon membrane
- Incubate the membrane with a labeled DNA probe, which binds to the complementary DNA sequence
- Take an autoradiograph to locate the hybridizing bands (the DNA sequence of interest)
What are SSLPs?
- Simple sequence length polymorphisms
- Variants (alleles) of an SSLP differ in the number of repeats of the same sequence
- Microsatellites - repeat units of 13 bp or less
- Minisatellites - repeat units up to 25 bp in length
How are SSLPs identified?
Use PCR with primers flanking the repeat; the length of the PCR product shows how many base pairs/repeats the allele carries
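A minimal sketch of reading SSLP alleles from PCR product length. The primer-binding sites, the (CA)n repeat, and both alleles are hypothetical, and the primer sites sit directly next to the repeat to keep the arithmetic simple.

```python
# Hypothetical primer-binding sites flanking a (CA)n microsatellite
FWD_SITE = "ACGTTGCA"  # forward primer sequence
REV_SITE = "TTGACCGT"  # top-strand sequence where the reverse primer binds (on the bottom strand)
REPEAT_UNIT = "CA"


def pcr_product(allele: str) -> str:
    """Idealized PCR: the stretch from the forward primer site through the reverse primer site."""
    start = allele.find(FWD_SITE)
    if start == -1:
        return ""
    end = allele.find(REV_SITE, start + len(FWD_SITE))
    if end == -1:
        return ""
    return allele[start:end + len(REV_SITE)]


def repeat_count(product: str) -> int:
    """Infer the number of repeats from product length minus the two primer sites."""
    flank = len(FWD_SITE) + len(REV_SITE)
    return (len(product) - flank) // len(REPEAT_UNIT)


allele_1 = "GGG" + FWD_SITE + REPEAT_UNIT * 5 + REV_SITE + "GGG"
allele_2 = "GGG" + FWD_SITE + REPEAT_UNIT * 8 + REV_SITE + "GGG"
for allele in (allele_1, allele_2):
    product = pcr_product(allele)
    print(len(product), "bp ->", repeat_count(product), "repeats")
```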
What are SNPs?
- single nucleotide polymorphisms
- Two alleles differ at a single base pair
- About one SNP for every 1,000 bp - natural variation in the genome
How are SNPs identified?
through hybridization
What are three strategies for hybridization?
A) Hybridization with an oligonucleotide with a terminal mismatch
- No SNP: completely base-paired hybrid
- SNP: hybrid with a non-base-paired tail
B) Oligonucleotide ligation assay
- No mismatch: ligation occurs
- Mismatch: the DNA fragments are not ligated
C) The ARMS test (amplification refractory mutation system)
- No mismatch: PCR product is synthesized
- Mismatch: no PCR product
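The rule behind the ARMS test (a primer is extended only if its 3'-terminal base pairs with the template) can be sketched as a string check. This is a simplified model that ignores hybridization thermodynamics; the alleles and primers are hypothetical.

```python
def arms_extends(template: str, primer: str) -> bool:
    """Simplified ARMS logic.

    `primer` is written 5'->3' in the same sense as `template`, so its 3'-terminal
    base is primer[-1]. The primer body must match the template, and extension
    (hence a PCR product) happens only if the 3'-terminal base also matches.
    """
    body, last = primer[:-1], primer[-1]
    pos = template.find(body)
    if pos == -1 or pos + len(body) >= len(template):
        return False  # primer does not anneal here
    return template[pos + len(body)] == last  # 3' mismatch -> no extension


# Hypothetical alleles differing by one SNP (G vs A at the same position)
allele_g = "TTGACCATGGCAGTC"
allele_a = "TTGACCATGACAGTC"
primer_g = "GACCATGG"  # allele-specific primer ending on the G variant
primer_a = "GACCATGA"  # allele-specific primer ending on the A variant

for name, allele in (("G allele", allele_g), ("A allele", allele_a)):
    print(name, arms_extends(allele, primer_g), arms_extends(allele, primer_a))
```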
What is the purpose of a DNA chip?
allows you to look at 300,000 SNPs from all across the genome
How are genetic markers helpful?
- for forensic analysis
- Allows you to test whether DNA from a crime scene and from a suspect match, without full sequencing
- Could PCR-amplify certain fragments and cut them with a restriction enzyme (RFLPs)
- Could PCR-amplify SSLPs
- SNP analysis
What is a test cross?
Crossing an individual of unknown genotype with a homozygous recessive individual; the phenotypes of the offspring allow the unknown genotype to be deduced
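The logic of a test cross can be sketched by enumerating gametes from the unknown parent and pairing each with the all-recessive gamete from the tester. This assumes the genes assort independently (unlinked); genotypes are written as allele pairs such as "AaBb".

```python
from collections import Counter
from itertools import product


def gametes(genotype: str) -> list[str]:
    """All gametes from a genotype written as allele pairs, e.g. 'AaBb' -> AB, Ab, aB, ab.

    Assumes the genes assort independently (not linked).
    """
    loci = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(combo) for combo in product(*loci)]


def run_test_cross(unknown: str) -> Counter:
    """Offspring genotype counts from crossing `unknown` with a homozygous recessive tester."""
    tester_gamete = "".join(allele.lower() for allele in unknown[::2])  # e.g. 'ab'
    offspring = Counter()
    for g in gametes(unknown):
        # pair each allele from the unknown parent with the tester's recessive allele
        offspring["".join(a + r for a, r in zip(g, tester_gamete))] += 1
    return offspring


print(run_test_cross("AaBb"))  # heterozygous at both loci: four classes in equal numbers
print(run_test_cross("AABb"))  # homozygous dominant at A: only two classes
```

Because the tester contributes only recessive alleles, each offspring's phenotype directly reflects the gamete from the unknown parent.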
What is FISH?
- Fluorescence in situ hybridization
- One way to physically map gene sequences onto the genome of interest
- Isolate chromosomes from cells, fix them onto a microscope slide, denature them, and add a fluorescently labeled probe (a short DNA sequence complementary to the gene of interest); the fluorescent signal shows where the gene lies on the chromosome
What is restriction mapping?
Cut the DNA sequence of interest with different restriction enzymes, singly and in combination, and compare fragment sizes to locate the cut sites - look at figure 3.28
- Very useful for short fragments of DNA
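A minimal sketch of the reasoning behind restriction mapping, using a hypothetical 10 kb linear molecule: EcoRI alone is assumed to cut at 3 kb, BamHI alone gives 6 kb + 4 kb fragments (so its site could be 4 kb or 6 kb from the left end), and the double digest is assumed to give 6 + 3 + 1 kb. Comparing predicted double-digest fragments with the observed ones picks out the consistent map.

```python
def fragments(length: int, cuts: list[int]) -> list[int]:
    """Fragment sizes (largest first) from cutting a linear molecule at the given positions."""
    bounds = [0] + sorted(cuts) + [length]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


LENGTH = 10_000                          # hypothetical 10 kb linear DNA molecule
ECORI_SITE = 3_000                       # EcoRI alone: 7 kb + 3 kb
OBSERVED_DOUBLE = [6_000, 3_000, 1_000]  # hypothetical EcoRI + BamHI double digest

# BamHI alone gives 6 kb + 4 kb, so its site is 4 kb or 6 kb from the left end.
for bamhi_site in (4_000, 6_000):
    predicted = fragments(LENGTH, [ECORI_SITE, bamhi_site])
    verdict = "consistent" if predicted == OBSERVED_DOUBLE else "ruled out"
    print(f"BamHI at {bamhi_site} bp: {predicted} -> {verdict}")
```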
What is optical mapping?
- Overcomes the problem that restriction maps of large DNA molecules cannot be worked out, because there are too many fragments to identify
- Put chromosomal DNA in molten agarose containing restriction enzymes
- As the agarose solidifies, the DNA becomes stretched
(The restriction enzymes cut in different places, and the DNA fragments stay stretched in the gel)
- Fluorescence microscopy - the cut restriction sites are visible as gaps in the DNA molecules
What is a clone library and how is one created?
- A clone library is a collection of vectors carrying DNA fragments of several kilobases
- Insert DNA fragments of different sizes into plasmids and introduce them into bacteria
- Each colony contains multiple copies of one recombinant DNA molecule
How do you use a clone library to identify whether two markers are closely linked?
- PCR reaction to detect markers
- If closely linked many clones will be positive for both markers
- If far apart a clone would rarely test positive for both markers
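Scoring a clone library can be sketched as counting how often two markers turn up in the same clone. The PCR results below are hypothetical, and the score (clones with both markers divided by clones with either) is one simple choice made for illustration.

```python
def co_occurrence(clones: list[set[str]], m1: str, m2: str) -> float:
    """Fraction of clones carrying either marker that carry both markers."""
    either = [c for c in clones if m1 in c or m2 in c]
    if not either:
        return 0.0
    both = sum(1 for c in either if m1 in c and m2 in c)
    return both / len(either)


# Hypothetical PCR results: the set of markers detected in each clone
clones = [{"A", "B"}, {"A", "B"}, {"A"}, {"B", "C"}, {"C"}, {"A", "B"}]
print(co_occurrence(clones, "A", "B"))  # markers often found together -> likely close
print(co_occurrence(clones, "A", "C"))  # markers never found together -> likely far apart
```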
What was Frederick Sanger’s first discovery?
- Determined the amino acid sequence of insulin by degrading the protein into fragments
- Tried to do this for DNA, but degrading the molecule didn't allow the sequence to be solved
What does traditional Sanger Sequencing involve?
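- DNA synthesis stops when a dideoxynucleotide (ddNTP) is incorporated (chain termination)
- Run four separate reactions, each with a different ddNTP (ddATP, ddCTP, ddGTP, ddTTP)
- Run the reactions side by side on a gel to deduce the sizes of the products; reading the bands from shortest to longest gives the sequence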
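An idealized simulation of the four-lane chain-termination gel. It assumes each reaction terminates at every possible position, measures lengths from the primer, and uses a hypothetical template written 3'->5' so the new strand reads left to right 5'->3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def lanes(template_3to5: str) -> dict[str, list[int]]:
    """Idealized chain termination.

    `template_3to5` is the template strand written 3'->5', so the new strand is its
    complement read left to right (5'->3'). Each reaction contains one ddNTP; a product
    ending at position i (counted from the primer) appears in the lane of the base there.
    """
    new_strand = template_3to5.translate(COMPLEMENT)
    result = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        result[base].append(i + 1)  # terminated by dd<base>TP after i + 1 nucleotides
    return result


def read_gel(lane_bands: dict[str, list[int]]) -> str:
    """Read the bands from shortest to longest across all four lanes."""
    bands = sorted((length, base) for base, lengths in lane_bands.items() for length in lengths)
    return "".join(base for _, base in bands)


gel = lanes("TACGGATC")  # hypothetical template
print(gel)
print(read_gel(gel))     # sequence of the newly synthesized strand
```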
What does more modern Sanger sequencing involve?
- Label ddNTPs with fluorescent labels
- Separate labelled fragments by size in a capillary and use a detector, which notes which fluorescent bands move past the detector and in which order
What is a ddNTP?
Dideoxynucleotide - a nucleotide whose sugar has an -H in place of the -OH at the 3' carbon, so no further nucleotide can be added after it
What are some features of the polymerase used in Sanger sequencing?
1) High processivity (the length of polynucleotide synthesized before the polymerase dissociates) - the polymerase goes for a while before it falls off
2) No 5'-3' exonuclease activity
3) No 3'-5' exonuclease activity - otherwise it could remove the ddNTPs at the ends of the products
What polymerase and primers are used in Sanger sequencing?
- Taq polymerase - processivity of 750bp
- Forward, reverse and internal primers may be used
- Universal primer - the vector plasmid has a primer-binding site just upstream of where the DNA is inserted, so one primer works for any insert
How do you find where the genes are located in the genome?
- To find which genes encode proteins, start with mRNA and reverse transcribe it into DNA
- RNA -> cDNA (complementary DNA)
- Use reverse transcriptase, a polymerase that uses RNA as a template to make DNA
- Can use cDNA sequence as a probe, and use a FISH experiment to find where this sequence is located in the genome
What is a radiation hybrid?
- Radiation hybrid is a technique used to map the human genome
- Expose cells carrying the chromosome to be mapped to X-rays, which break the chromosome into fragments
- Fuse the irradiated cells with cells of a different species (e.g., hamster); the hybrid cells retain some of the fragments
- Test the hybrid cells for markers: the more often two markers are retained together, the closer they are likely to be, because they tend to lie on the same fragment
What was the main challenge with sequencing the human genome?
- the human genome is 3 billion base pairs
- a sequencing read is only 750 bp - very short fragment
- must have multiple reads of any one segment because DNA polymerase sometimes makes mistakes
How is the amount of overlap described when sequencing a genome?
- 6x coverage = on average, every nucleotide is present in 6 different reads
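Coverage can be sketched numerically with the figures from the cards above (a 3 billion bp genome and 750 bp reads): reads needed = coverage × genome size / read length. The second function uses the Lander-Waterman (Poisson) approximation, an assumption not stated in these notes, to estimate what fraction of bases would still be missed, which is why 6x is an average rather than a guarantee for every base.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Reads required so that total sequenced bases = coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases in no read, assuming reads start at random positions
    (Lander-Waterman / Poisson approximation)."""
    return math.exp(-coverage)


GENOME_BP = 3_000_000_000  # human genome, about 3 billion bp
READ_BP = 750              # Sanger read length
for cov in (1, 6):
    print(f"{cov}x: {reads_needed(GENOME_BP, READ_BP, cov):,} reads, "
          f"~{fraction_uncovered(cov):.2%} of bases uncovered")
```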
Who was the human genome project a race between?
- Craig Venter - private company Celera, shotgun sequencing
- Francis Collins - NIH, hierarchical shotgun sequencing w/ genome map
Who and on what was shotgun sequencing first tested?
Craig Venter first tested this process on the bacterium Haemophilus influenzae (1.8 million bp)
What is the process of shotgun sequencing?
- Extract DNA from bacterial cells
- Fragment it by sonication
- Separate the fragments by gel electrophoresis
- Purify fragments 1.6-2 kb in size
- Insert the fragments into plasmids to create a clone library
- Sequence the cloned fragments
- Assemble the short reads into contigs
- Attempt to close gaps between contigs by designing primers for the contig ends and running PCR to see if a product is produced
- Generate a second clone library with a different vector
- Probe the second clone library with pairs of oligonucleotides from the ends of two contigs - if both hybridize to the same clone, that clone spans the gap
- Sequence the product
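Assembling reads into contigs can be sketched with a simple greedy overlap merge. This is an illustrative stand-in, not the assembler Venter's group actually used: it ignores sequencing errors and reads from the opposite strand, and the reads are hypothetical. A read that overlaps nothing stays as a separate contig, leaving a gap to close, and repeats longer than the overlaps would confuse it, which is why repetitive DNA is a problem for shotgun sequencing.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap; return the contigs."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no remaining overlaps: gaps separate the contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


# Hypothetical reads from a short stretch of DNA
print(greedy_assemble(["ATGGCGTA", "GCGTACCTT", "CCTTAGGA"]))
```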
What confirmed that shotgun sequencing works?
In 1995, Craig Venter's group published the full genome of Haemophilus influenzae
What are the pros and cons of shotgun sequencing?
- Pro - doesn't require a genome map
- Con - hard to sequence through repetitive DNA sequences
What is sonicating?
Breaking DNA into fragments of various sizes by exposing it to high-frequency sound waves, which shear it
What are contigs?
Contiguous stretches of sequence assembled from overlapping reads in shotgun sequencing; separate contigs are non-overlapping portions of the genome with gaps between them
How do you close a gap between contigs in shotgun sequencing?
Design primers for the ends of two contigs and see if PCR produces a product from the cloned DNA fragment
What is the process of hierarchical shotgun sequencing?
- First prepare a clone library of large DNA fragments (up to 300 kb) cloned into BACs; each BAC is later shotgun sequenced using 1.6-2 kb fragments
- Next probe the clones for a particular sequence and identify all the clones that contain it
- Use chromosome walking: probe for the next overlapping clone in the sequence to build up clone contigs
What are overlapping, large fragments of DNA called?
clone contigs
When was the first publication of the human genome?
February 15, 2001 (Nature) and February 16, 2001 (Science)
What did we learn from the human genome project?
Complexity doesn't come from the number of genes but from when genes are turned on and how they're spliced
What percent of our genome doesnt encode for genes?
98%
What strange genes were identified through the human genome project?
- Remnants of ancient viral genomes were retained
- Pseudogenes - genes that were once functional but became mutated or inactivated
What is an Alu sequence?
A ~300 bp sequence that is repeated over a million times in our genome
What percent of our genome is similar to chimpanzees?
96%
What are some differences between chain termination sequencing and next-generation sequencing (NGS)?
1) Don't need to create a clone library - create a DNA library instead
2) All DNA fragments are sequenced at the same time (“massively parallel”)
3) Reads are shorter - about 300 bp
What are the steps of NGS?
- Genomic DNA is sonicated into small fragments
- Adaptors are ligated onto DNA fragments
- DNA is denatured into single strands and attached to a chip
- A PCR reaction is carried out on the entire chip, producing clusters of identical copies (glass slide method)
- Reversible terminator nucleotides are used to sequence the PCR products
- Fluorescent nucleotides are incorporated one per cycle: they glow, a camera takes an image, and then the label and terminator are removed
- This last step is repeated over and over again and millions of clusters may be sequenced at once
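The imaging loop can be sketched as base calling from per-cycle images: each cycle adds one base to every cluster's read. The color-to-base mapping and the cluster IDs are hypothetical; real instruments differ in their dye chemistry and image processing.

```python
# Hypothetical dye colors for the four reversible-terminator nucleotides
COLOR_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}


def call_bases(images: list[dict[str, str]]) -> dict[str, str]:
    """images[k] maps cluster ID -> color seen in cycle k; returns cluster ID -> read."""
    reads: dict[str, str] = {}
    for cycle_image in images:  # one image per cycle, after one nucleotide is incorporated
        for cluster, color in cycle_image.items():
            reads[cluster] = reads.get(cluster, "") + COLOR_TO_BASE[color]
    return reads


# Three cycles imaged for two hypothetical clusters
images = [
    {"c1": "red", "c2": "blue"},
    {"c1": "green", "c2": "blue"},
    {"c1": "yellow", "c2": "red"},
]
print(call_bases(images))
```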
What can be used instead of a chip in NGS?
- A bead could be used instead of a chip
- To do this, attach the DNA fragments to beads by streptavidin-biotin linkages
- Create an oil emulsion in which each droplet holds one bead with one DNA molecule attached to it
What is Illumina sequencing?
- A form of NGS
- DNA sequencing reads are short, about 300 bp in length
- 2,000 Mb of DNA sequence
What is type 1 sequencing?
- A form of NGS
- Nucleotides are added sequentially, one type at a time, and any that are not incorporated are degraded
- A camera watches and captures the chemiluminescence (light) released when a nucleotide is incorporated
- Reads up to 1000 bp
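The description above matches pyrosequencing, where the light signal in each nucleotide addition (flow) is proportional to how many identical bases are incorporated in a row. A minimal sketch of the resulting signal pattern (flowgram) follows, assuming a repeating flow order of T, A, C, G; the read is hypothetical.

```python
def flowgram(read: str, flow_order: str = "TACG", n_cycles: int = 3) -> list[tuple[str, int]]:
    """Signal for each nucleotide flow = number of bases incorporated in that flow.

    A run of identical bases (homopolymer) is incorporated in a single flow, giving a
    proportionally brighter light signal.
    """
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for base in flow_order:
            n = 0
            while pos < len(read) and read[pos] == base:
                n += 1
                pos += 1
            signals.append((base, n))
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the read from the flowgram."""
    return "".join(base * n for base, n in signals)


flows = flowgram("TTAGC")  # hypothetical newly synthesized sequence
print(flows)
print(decode(flows))
```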
What is SOLiD sequencing?
- A form of NGS
- Relies on hybridization rather than DNA synthesis
- 1,024 (4^5) 5-mer oligonucleotides are used
- Can't individually label all of them differently, so the process must be repeated
- Very accurate, but reads are short
What is a newer strategy of NGS (technological advance)?
- camera that keeps up with the speed of replication
- watches fluorescently tagged nucleotides be incorporated
What is a strategy of NGS that allows you find composition of a DNA strand without replicating it?
- DNA passes through a nanopore, fed by a helicase above it; each base affects how ions flow through the nanopore
- Allows you to read the sequence of DNA directly without replicating it
What are histones?
Positively charged proteins around which DNA is wound; their positive charge binds the negatively charged DNA
Which histones make up the nucleosome core?
An octamer of 8 histone proteins: 2 H2A, 2 H2B, 2 H3, and 2 H4
What is a nucleosome?
A histone octamer plus the DNA wound around it
Describe how histones and DNA make up the nucleosome?
- 140-150bp of DNA is wound around histones- makes up the nucleosome
- 50-70 bp is linker DNA
What do nucleosomes form?
30 nm chromatin fiber
- described by the solenoid or helical ribbon model
What are some components of a chromosome?
Chromatids, arms, telomeres, and a centromere
What is a karyogram?
a depiction of all of the chromosomes of a particular species
How are chromosomes numbered?
Roughly by size (1 largest - 22 smallest)
What are macro and microchromosomes?
Macrochromosomes (like ours) are longer than 50 Mb; microchromosomes are shorter than 20 Mb
What is a holocentric chromosome?
A chromosome with multiple structures along its length that act like centromeres
What are some staining techniques used to produce chromosome banding patterns?
G-banding, R-banding, Q-banding, C-banding
What was used to analyze chromosomes before sequencing?
staining
What are 3 strategies used to count genes?
Bioinformatics, homology, and transcript mapping
What is bioinformatics?
-computational analysis of a genome sequence
Describe the process of bioinformatics.
- Search for open reading frames (ORFs)
What is an open reading frame? How can you filter for open reading frames? What is a challenge when searching in Eukaryotes?
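- DNA sequences that likely encode a protein: start at ATG and end at a termination codon (TAA, TAG, or TGA)
- There are 6 possible reading frames when looking at a sequence: 3 forward, 3 reverse
- Can use length to filter: short ORFs often occur by chance, so long ORFs are more likely to be genuine genes
- Splicing in eukaryotes complicates ORF scanning (introns interrupt the ORF)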
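A minimal six-frame ORF scan in Python, assuming a simple rule (first ATG to the next in-frame stop codon) and a length cutoff; the default of 100 codons is an illustrative choice, not a fixed standard. It does not model introns, which is exactly the eukaryotic complication noted above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Scan all six reading frames for ATG ... stop ORFs.

    Returns (strand, frame, start, end): 0-based coordinates on the scanned strand,
    end exclusive and including the stop codon. Length is counted in codons,
    stop codon included. Introns are not modeled.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


# Hypothetical short sequence; a tiny cutoff is used so the toy ORF passes
print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```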
What is a consensus sequence in ORF scanning?
A sequence that shows the most frequent nucleotide at each position - allows known/likely splice sites to be predicted
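Building a consensus from aligned sequences can be sketched by taking the most common base in each column. The four 8 bp sequences are hypothetical examples resembling exon-intron boundaries.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most frequent nucleotide at each position of equal-length aligned sequences.

    Ties go to the base seen first in that column (Counter.most_common ordering).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


# Hypothetical aligned sequences around exon-intron boundaries
sites = ["AGGTAAGT", "CAGTAAGT", "AGGTGAGT", "AGGTAAGA"]
print(consensus(sites))
```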
What are some strategies for ORF scanning in Eukaryotes?
- Look for Exon-Intron boundaries
- Codon bias - look at popular codon choices
- Upstream regulatory sequences (CpG islands - regions with high GC content upstream of ORFs)
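One way to flag possible CpG islands is to check GC content and the observed/expected CpG ratio in a window. The thresholds used here (at least 200 bp, GC above 50%, observed/expected CpG of at least 0.6) are commonly cited criteria and are an assumption, not part of these notes.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    gc_fraction = (c + g) / len(seq)
    expected_cpg = c * g / len(seq)  # CpGs expected if C and G were placed at random
    observed_cpg = seq.count("CG")
    ratio = observed_cpg / expected_cpg if expected_cpg else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply assumed, commonly cited CpG-island thresholds to one window."""
    if len(seq) < min_len:
        return False
    gc_fraction, ratio = cpg_stats(seq)
    return gc_fraction > min_gc and ratio >= min_ratio


print(looks_like_cpg_island("CG" * 100))  # GC-rich with many CpGs
print(looks_like_cpg_island("CA" * 100))  # no CpGs
```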
What is an alternative strategy for ORF scanning?
Can also scan for non-coding RNA genes, which may help elucidate the structure of some of these RNAs
How does homology allow us to count genes?
- all living organisms descend from a common ancestor
- Homology search in other species
- “zoo blot” -run genomes of many species on a gel
- use a radioactive DNA probe to reveal whether or not the sequence of interest is present in other species
What genes are conserved between species?
- Exons (coding part of the genome) are highly conserved
- Introns and intergenic regions (between genes) are less conserved, more variable between species
What is transcript mapping? What techniques does it rely on?
- If an ORF encodes a protein, we should be able to isolate complementary mRNA
- A Northern blot is used to detect the presence of mRNA
Describe a Northern blot. What is it used for?
- Used to search for mRNA in transcript mapping
- The DNA sequence of interest is made into a radioactively labeled probe
- RNA is extracted from cells and separated by denaturing agarose gel electrophoresis
- The RNA is blotted onto a membrane and hybridized with the DNA probe (northern hybridization)
- The probe hybridizes to the complementary RNA transcript, revealing its presence and size
What are bioinformatics, homology, and transcript mapping used for?
Used to confirm that a gene sequence has a protein product
What is RACE?
Rapid Amplification of cDNA Ends
- Used to locate the end of a coding sequence of a gene
Describe the process of RACE.
- A gene-specific primer is annealed to the mRNA
- cDNA synthesis occurs with reverse transcriptase
- Denature and separate the cDNA strand from the RNA template
- Add As to the 3' end of the cDNA with terminal transferase
- Anneal an anchor primer (with complementary Ts)
- Second-strand synthesis with Taq polymerase
- Continue with standard PCR
- Sequence the PCR product - it will now include the 5' end of the gene
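The steps of 5' RACE can be traced with string operations on an idealized mRNA (written in DNA letters, T for U). The mRNA, primer, and tail length are hypothetical, the enzymes are assumed to work perfectly, and PCR amplification is not modeled. The final product starts with the T run from the anchor primer, followed by the mRNA's 5' end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_race(mrna: str, gsp: str, tail_len: int = 15) -> str:
    """Idealized 5' RACE.

    mrna: transcript written 5'->3' in DNA letters (T for U).
    gsp: gene-specific primer, 5'->3', complementary to an internal stretch of the mRNA.
    Returns the sense strand (5'->3') of the double-stranded product.
    """
    site = reverse_complement(gsp)  # mRNA stretch the primer anneals to
    pos = mrna.find(site)
    if pos == -1:
        raise ValueError("primer does not anneal to this mRNA")
    cdna = reverse_complement(mrna[:pos + len(site)])  # RT copies back to the 5' end
    tailed = cdna + "A" * tail_len                     # terminal transferase adds As
    return reverse_complement(tailed)                  # second strand from oligo(dT) anchor


mrna = "GCCACCATGGCTAGCAAGGAGTAA"  # hypothetical transcript
product = five_prime_race(mrna, gsp="CTCCTTGC", tail_len=5)
print(product)  # T run from the anchor primer, then the mRNA 5' end
print(product[5:] == mrna[:len(product) - 5])
```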
How dense is the human genome compared to other species?
Very few genes per Mb (low gene density) compared with other species
Are genes evenly distributed over chromosomes?
No. Gene deserts exist: sequences of DNA with a low density of genes
- Chromosome 13 has a gene density of 3 genes/Mb
- Chromosome 19 has a gene density of 22 genes/Mb
- not many genes are present in the region close to the centromere
What are some other components that make up chromosomes other than coding DNA?
- LTRs - long terminal repeats
- SINEs - short interspersed nuclear elements
- LINEs - long interspersed nuclear elements
- Transposons - filler DNA not encoding genes
- Exons - protein coding - take up very little space
Describe the trends in gene organization and variation.
- Overall, genome size goes up as organisms go from simpler to more complex; however, there is also a great deal of variation between organisms
How is splicing different in humans?
Alternative splicing occurs in our cells more frequently than in other organisms
It involves joining exons in different ways; 75% of human genes are alternatively spliced
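Exon skipping, one form of alternative splicing, can be sketched by enumerating which middle exons are kept. The exon names are hypothetical, and the sketch assumes the first and last exons are always included; with n middle exons this gives 2^n isoforms.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[tuple[str, ...]]:
    """All isoforms that keep the first and last exons plus any subset of the middle
    exons, in their original order (exon skipping only)."""
    if len(exons) < 2:
        return [tuple(exons)]
    first, *middle, last = exons
    isoforms = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):  # combinations preserve exon order
            isoforms.append((first, *kept, last))
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):  # hypothetical gene
    print("-".join(isoform))
```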
What are gene families?
groups of genes of identical or similar sequences
What is an example of a gene family?
- the globin family - hemoglobin
What is a pseudogene?
a sequence of DNA that resembles a genuine gene but does not encode a functional RNA or protein
- may be duplicated - one or several members of the family are non-functional
- may be unitary - no family members to compensate
What are examples of tandemly repeated DNA?
minisatellites, microsatellites
This DNA is repeated immediately in a row
What are genome browsers and what are some examples?
Genome browsers are software packages that display annotation of genomes
- GenBank - by NIH
- Ensembl - Sanger Institute
- UCSC Genome Browser - UC Santa Cruz
What are examples of interspersed repeats?
SINEs, LINEs, LTRs, and transposons
- interspersed throughout the chromosomes
What are two types of repeats in chromosomes?
tandemly repeated DNA - like a tandem bicycle (back to back)
interspersed repeats - throughout chromosomes