Genomic Sequencing Flashcards
Why Sequence genomes?
- To understand genetic variation with respect to phenotypic variation
- Inheritance
- Comparative genomics (ancestry/evolution)
- Forensics
- Understand genetics of extinct species
- Gives insight into normal functions of genes
- Pharmacogenomics: drug treatments tailored to an individual’s genome
To sequence the human genome:
Whole Genome Shotgun Approach
-Mass cloning of fragments into cloning vectors.
Whole Genome Shotgun Approach Step 1
Extract DNA from cells
Whole Genome Shotgun Approach Step 2
Cut DNA into small, overlapping fragments with restriction enzymes
- Reaction is run under suboptimal conditions, so the enzymes do not cut every site (a partial digest)
- This is why fragments overlap
- Fragments are called “contigs,” short for contiguous sequence (strictly, a contig is the stretch rebuilt from overlapping fragments)
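A minimal sketch of why a partial digest gives overlapping fragments. It assumes EcoRI (site GAATTC, cut after the G) as the enzyme and a made-up toy sequence; the function names and the cut probability are illustrative only and do not come from the cards. Each copy of the DNA is cut at a different subset of sites, so fragments from different copies overlap.

```python
import random

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts between G and AATTC

def find_sites(dna, site=ECORI_SITE):
    """Return the cut position (index just after the G) for every site in the DNA."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + 1)
        start = dna.find(site, start + 1)
    return positions

def partial_digest(dna, cut_probability, rng):
    """Cut one copy of the DNA, using each site only with the given probability."""
    cuts = [p for p in find_sites(dna) if rng.random() < cut_probability]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy sequence (hypothetical) with three EcoRI sites.
genome = "AAAAGAATTCCCCCGAATTCTTTTGAATTCGGGG"
rng = random.Random(0)
for copy_number in range(3):
    # Each copy is cut at a different random subset of the sites.
    print(partial_digest(genome, cut_probability=0.5, rng=rng))
```

Setting cut_probability to 1.0 models a complete digest, where every copy gives the same non-overlapping pieces and there is nothing to puzzle together.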
Whole Genome Shotgun Approach Step 3
Clone contigs into a cloning vector to make a genomic library.
Whole Genome Shotgun Approach Step 4
Sequence each clone using the Sanger sequencing technique
Whole Genome Shotgun Approach Step 5
Use computers to reassemble sequences of the contigs by puzzling together the overlapping sequences
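A minimal sketch of the "puzzling together" idea, assuming error-free reads from one strand and a simple greedy strategy (merge the pair with the longest overlap, repeat). Real assemblers are far more involved; the function names and the min_overlap value are illustrative assumptions.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the two reads that share the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps; leave the remaining pieces separate
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Three toy reads (hypothetical) that overlap end to end.
print(greedy_assemble(["TACCTGA", "ATGGCGT", "GCGTACC"]))
```

A greedy merge like this is easily fooled by repeated sequences, which is one reason long repeats are hard to reassemble.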
Whole Genome Shotgun Approach Step 6
Deposit sequence information into NCBI GenBank Database
- The public can use this because it’s paid for by tax dollars.
- AKA “dideoxy sequencing” or “chain-termination sequencing”
- Based on DNA replication/PCR of a DNA template (what you want to sequence)
- Can be circular or linear
- Polymerase adds nucleotides starting from a primer based on complementary sequences
Sanger method of sequencing
If you don’t know the sequence, how can you design a primer?
Use a universal primer.
1) Can’t design a primer against an unknown sequence
2) Can have a universal primer that can be used for all clones.
Deoxynucleotide vs. Dideoxynucleotide
- A deoxynucleotide has an OH group on the 3’ C, so it can form a phosphodiester bond with the next nucleotide
- A dideoxynucleotide has an H on the 3’ C, so it cannot form a phosphodiester bond with the next nucleotide
- Incorporation of ddNTP causes synthesis of that new strand to stop
What’s happening in the PCR tube?
There are:
- polymerase, plasmid, primer, dNTPs, ddNTPs
- fluorescent molecules that tag the end of each product
After the reaction is complete
The array of products with fluorescent molecules attached is separated by size, using a process called capillary gel electrophoresis
Gel-filled capillary
- when a charge is applied, smaller products move through the gel faster, so larger products lag near the top and smaller products reach the bottom first
- Smaller products come off the bottom first, which is where the fluorescent molecules are detected.
Capillary Gel Electrophoresis
Reading a capillary gel electrophoresis trace
- each peak color represents a different base
- read the sequence by the order of the colored peaks
- peaks can overlap somewhat
- read left to right
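A minimal sketch of how chain termination plus size separation yields a sequence. It assumes an idealized tube in which extension stops once at every position, ignores the primer, and treats the last base of each product as its fluorescent label. All names here are illustrative, not from the cards.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template):
    """The new strand the polymerase builds, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def terminated_fragments(template):
    """One product per position: extension stopped there by a labeled ddNTP."""
    new_strand = synthesized_strand(template)
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]

def read_by_size(fragments):
    """Shortest product reaches the detector first; its last base is the color seen."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

template = "ATGCCGTA"  # toy template (hypothetical), written 5' to 3'
tube = list(reversed(terminated_fragments(template)))  # products are mixed in the tube
print(read_by_size(tube))
```

Sorting by length plays the role of the capillary gel, and reading the final base of each product in that order plays the role of reading the colored peaks left to right.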
Final Step: reassembling the sequence
Repeat Sanger sequencing for each clone in the library and then reassemble the contigs using overlapping sequences.
Things we have learned:
- The sequences of “simpler” organisms like yeast, bacteria, flies, and mice
- The human genome has 3.2 billion base pairs
- About 20,000 protein-coding genes
- About 5,000 genes do not code for protein
  (they code for microRNA, exRNA, tRNA, rRNA, etc.)
- Introns are large (can be >100kb)
- Genome is only 2% genes (but the other 98% isn’t junk!)
- Average gene is 3,000 bp (largest is dystrophin at 2.4 million bp)
- Genes are clustered together on chromosomes
- People have 99.9% of their sequence in common.
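A back-of-the-envelope check of how the numbers on this card fit together. It uses only the figures listed above and treats them as round estimates; the variable names are illustrative.

```python
GENOME_BP = 3_200_000_000       # 3.2 billion base pairs
PROTEIN_CODING_GENES = 20_000   # about 20,000 protein-coding genes
AVERAGE_GENE_BP = 3_000         # average gene length from the card
SHARED_FRACTION = 0.999         # people share 99.9% of their sequence

# Total bases in genes, and the share of the genome they make up.
gene_bp = PROTEIN_CODING_GENES * AVERAGE_GENE_BP
gene_fraction = gene_bp / GENOME_BP

# Bases expected to differ between two people.
differing_bp = GENOME_BP * (1 - SHARED_FRACTION)

print(f"Bases in genes: {gene_bp:,}")
print(f"Fraction of genome in genes: {gene_fraction:.1%}")
print(f"Bases that differ between two people: {round(differing_bp):,}")
```

Under these figures, genes come to roughly 2% of the genome, and a 0.1% difference between people works out to about one differing base per 1,000 bp.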
What we haven’t learned:
- Long stretches of repeated DNA sequences that were hard to reassemble
- Which sequences are genes vs. pseudogenes vs. dubious ORFs
- What a gene product actually does
Can find out by:
- comparing it to a known gene product
- mutating the gene and studying the altered product
Looks like a gene but doesn’t make a gene product.
Dubious ORFs
Mutated so much that it can no longer make anything.
Pseudogene
How do we find protein-coding genes (versus all the other sequences in the genome)?
1. Compare the cDNA library to the genomic library
2. Use computer algorithms to look for consensus sequences.
Use computer algorithms to predict Open Reading Frames (ORFs)
- Looks for a TATA box, a start codon, a stop codon, and a certain percentage of GC (genes tend to have more GC than noncoding regions)
Use of Computers to annotate genes
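A minimal sketch of computer-based ORF prediction, assuming only the simplest signals: an ATG start codon followed in the same reading frame by a stop codon, scanned in the three forward frames. It does not search for TATA boxes, the reverse strand, or introns, and the min_codons cutoff is an arbitrary illustrative value. A small GC-content helper is included because GC percentage is one of the clues listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop stretch in the forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

def gc_fraction(dna):
    """Fraction of bases that are G or C."""
    return (dna.count("G") + dna.count("C")) / len(dna)

sequence = "CCATGAAATTTGGGTAACC"  # toy sequence (hypothetical)
print(find_orfs(sequence))
print(gc_fraction(sequence))
```

An ORF found this way is only a candidate. Deciding whether it is a real gene, a pseudogene, or a dubious ORF still needs other evidence, such as a match in the cDNA library.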
Identification and description of genes and their important sequences
Goal: assign functions to all of the genes of an organism
- Understand variation within and among organisms
-Identify where traits come from
Annotation
Alternatives to shotgun sequencing
- Next generation sequencing
- Exome sequencing
- Analyze genetic markers throughout the genome (SNPs)
- Fast and Cheap sequencing method
- Pyrosequencing
Next Generation Sequencing
General steps for Next Gen. Sequencing
- Extract DNA
- Cut into overlapping contigs
- Affix DNA to a solid support
- Wash one known dNTP at a time across the DNA
- If that known dNTP is incorporated, then light is emitted
- Reassemble by overlapping sequences
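A minimal sketch of the wash-and-detect steps above, in the style of pyrosequencing. It assumes a fixed wash order (A, C, G, T), an error-free polymerase, and light output proportional to the number of bases incorporated in one wash. The function name and the max_cycles guard are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pyrosequence(template, flow_order="ACGT", max_cycles=100):
    """Simulate dNTP washes over a template.

    `template` lists the bases in the order the polymerase meets them.
    Returns the (dNTP, light units) seen per wash and the new strand read from them.
    """
    position = 0
    flows = []
    read = []
    for _ in range(max_cycles):
        for dntp in flow_order:
            if position >= len(template):
                return flows, "".join(read)
            count = 0
            # The washed dNTP is incorporated as long as it pairs with the next template base.
            while position < len(template) and COMPLEMENT[template[position]] == dntp:
                count += 1
                position += 1
            flows.append((dntp, count))  # count 0 means no light for this wash
            read.append(dntp * count)
    return flows, "".join(read)

flows, read = pyrosequence("TTGCA")  # toy template (hypothetical)
print(flows)
print(read)
```

Because the identity of each washed dNTP is known, the pattern of light and no light is enough to write down the new strand.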
A specific region of DNA that varies among individuals
e.g., SNPs are present about once in every 1,000 bp of DNA
- used to create a detailed map of the individual’s genome.
DNA Markers
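A minimal sketch of spotting SNPs as single-base differences between two sequences. It assumes the sequences are already aligned and of equal length, and the toy sequences are made up. Real SNP calling works from many reads and a population, not one pair of strings.

```python
def find_snps(reference, sample):
    """Return (position, reference base, sample base) wherever the two sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

reference = "ATGCATGC"   # toy reference (hypothetical)
individual = "ATGTATGC"  # same region from one individual (hypothetical)
print(find_snps(reference, individual))
```

Positions are counted from 0 here, so the single difference in the toy example is reported at index 3.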
Set of SNPs that are close together on a chromosome
Haplotype
Within a family, haplotypes are rarely scrambled by:
genetic recombination
Group of individuals that share a common ancestor because they all have similar haplotypes
Haplogroup
SNP used to represent an entire haplotype
AKA diagnostic SNP
Tag SNP
A way to look for many SNPs at once in a genome.
SNP Chips/Arrays
- More to do with a population than with individuals
- Is a collection of all the combinations of haplotypes present in a population
- Used to study inheritance of complex traits
- Used to study evolutionary relatedness
Haplotype map (hapmap)
Ethical concerns?
- Misconceptions about genetics by the layperson?
- Oversight of personal genotyping services?
- Insurance regulations?
- Patenting of genes?
- Are some people “better-suited” for certain careers based upon their DNA?
- Should certain people be discouraged from having children?