New Lectures for the Final Flashcards
What are restriction enzymes?
Enzymes that recognize a specific sequence of bases anywhere within the genome and cut the sugar-phosphate backbones of both strands
What are restriction sites?
- sequences recognized by restriction enzymes
- usually 4 – 8 bp of double-strand DNA
- Often palindromic – base sequences of each strand are identical when read 5’-to-3’
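A minimal sketch of the palindrome test, assuming the site is given as a top-strand string. The function names are illustrative, and the EcoRI site GAATTC is brought in as a familiar example rather than taken from these notes.

```python
# Watson-Crick pairing table for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def is_palindromic(site: str) -> bool:
    """True if both strands read the same when each is read 5'-to-3'."""
    return site.upper() == reverse_complement(site)

print(is_palindromic("GAATTC"))  # EcoRI recognition site -> True
print(is_palindromic("GATTCA"))  # arbitrary sequence for contrast -> False
```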
What are blunt ends vs. sticky ends?
• Blunt ends – cuts are straight through both DNA strands at the line of symmetry
• Sticky ends – cuts are displaced equally on either side of the line of symmetry
  – Ends have either 5' overhangs or 3' overhangs
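The difference can be sketched in code, assuming a palindromic site and a cut position counted in bases from the 5' end of the site on the top strand. The helper names and the test sequence are hypothetical; the EcoRI cut (G^AATTC) is an outside example.

```python
def end_type(site_length: int, cut_offset: int) -> str:
    """Classify the ends left when a palindromic site is cut cut_offset bases in."""
    if 2 * cut_offset == site_length:
        return "blunt"  # cut falls exactly on the line of symmetry
    # Cutting before the midpoint leaves a 5' overhang, after it a 3' overhang.
    return "5' overhang" if 2 * cut_offset < site_length else "3' overhang"

def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Top-strand fragments after cutting at every occurrence of site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

print(end_type(6, 1))  # EcoRI, G^AATTC -> 5' overhang
print(digest("AAGAATTCTTGAATTCGG", "GAATTC", 1))
# -> ['AAG', 'AATTCTTG', 'AATTCGG']
```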
How is the average length of fragments generated by restriction enzymes calculated?
Average fragment length is 4^n bp, where n is the number of bases in the recognition site (assumes the four bases occur at equal frequency and in random order)
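Worked out for common site lengths, as a small sketch. It assumes each base occurs with probability 1/4 at every position, so a given n-base site is expected once every 4^n bp; the function name is illustrative.

```python
def expected_fragment_length(site_length: int) -> int:
    """Average spacing in bp between sites, assuming equal base frequencies."""
    return 4 ** site_length

for n in (4, 6, 8):
    print(f"{n}-bp site: one cut every {expected_fragment_length(n)} bp on average")
# 4-bp site: 256 bp, 6-bp site: 4096 bp, 8-bp site: 65536 bp
```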
What are some mechanical forces that can create DNA fragments?
– Passing DNA through a thin needle at high pressure
– Sonication (ultrasound energy)
What information does DNA Gel Electrophoresis provide?
- relative size of DNA fragments
What are the three main features of plasmid cloning vectors?
– Origin of replication
– A selectable marker gene (for example antibiotic resistance)
– A synthetic polylinker, a DNA sequence containing multiple restriction enzyme sites
How is DNA inserted into plasmid vectors?
Digesting the vector and human genomic DNA with the same restriction enzyme produces complementary sticky ends, which are joined by DNA ligase
What is Sanger Sequencing?
A DNA sequencing method in which DNA polymerase copies a template until a ddNTP is incorporated and halts polymerization
What materials are needed for Sanger sequencing?
- single-stranded DNA fragments
- hybridized templates and primers
- DNA polymerase
- dNTPs
- ddNTPs (unique fluorescent tags attached)
What is different about ddNTPs from dNTPs?
- ddNTPs lack a 3’ -OH
- halts polymerization
What are the results of Sanger Sequencing?
- DNA fragments separated by gel electrophoresis
- Each DNA fragment is tagged at the 3’ end with a ddNTP attached to a unique fluorochrome
- Gel is read by lasers and a computer
- Computer assembles the DNA sequence from the fluorophores on fragments of different lengths
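The read-out step can be sketched as follows, assuming each terminated fragment has already been reduced to a (length, terminal base) pair. That data format and the function name are hypothetical, chosen only for illustration.

```python
def read_sanger(fragments):
    """Reconstruct the newly synthesized strand from terminated fragments.

    fragments: iterable of (length, terminal_base) pairs. Shorter fragments
    migrate farther in the gel, so ordering by length and reading each
    terminal ddNTP label gives the sequence 5'-to-3'.
    """
    return "".join(base for _, base in sorted(fragments))

# Fragments listed in arbitrary order, as they might come off the detector.
print(read_sanger([(3, "G"), (1, "A"), (4, "T"), (2, "C")]))  # -> ACGT
```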
What is a BAC?
- Bacterial artificial chromosomes
- alternative cloning vector
- carries large inserts
What is the Shotgun strategy?
The shotgun strategy breaks DNA into fragments that are used to construct a BAC library. The fragments are then sequenced, and a computer assembles the entire genome from the overlapping sequences (contigs)
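A minimal sketch of the overlap step, assuming error-free reads from the same strand. Real assemblers are far more involved; the function names, the minimum overlap of 3, and the toy reads are assumptions for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def merge(a: str, b: str, min_len: int = 3):
    """Join two reads into one contig if they overlap, else return None."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None

print(merge("ATGCGTAC", "GTACCTGA"))  # overlap GTAC -> ATGCGTACCTGA
```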
Why does a BAC clone give you two sequence reads rather than one? (Paired-end sequencing)
BAC inserts are too long to sequence in a single read, so ~1000 bp are read from each end of the insert, starting from a primer on either side. This also tells you that the two reads lie ~200-300 kb apart. Sequencing can then continue inward from the ends of the reads already obtained until the reads from both ends overlap.
Why do repeat sequences prevent correct assembly of single shotgun sequence reads?
The computer puts DNA fragments together like pieces of a puzzle. Repeat regions make it impossible for the computer to tell certain puzzle pieces (DNA fragments) apart, so the sequence may be assembled incorrectly
How are cDNA libraries made?
- mRNA is taken from red blood cell precursors
- Add oligo-dT primer, which hybridizes to the polyA tail
- treat with reverse transcriptase in the presence of the four dNTPs
- denature mRNA/cDNA hybrid and digest mRNA with RNase
- 3’ end of cDNA folds back on itself to act as a primer
- The first cDNA strand acts as a template for the synthesis of the second DNA strand with DNA polymerase
- results in cDNA double helix that can be inserted into a vector
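The two synthesis steps can be sketched as strand complements, assuming a toy mRNA with a short polyA tail. The sequence and function names are made up for illustration.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """DNA strand made by reverse transcriptase, written 5'-to-3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

def second_strand_cdna(first_strand: str) -> str:
    """DNA strand copied from the first cDNA strand, written 5'-to-3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))

mrna = "AUGGCCAAAA"                  # toy mRNA ending in a polyA tail
first = first_strand_cdna(mrna)      # begins with T's paired to the polyA tail
second = second_strand_cdna(first)   # same sequence as the mRNA, with T for U
print(first, second)  # -> TTTTGGCCAT ATGGCCAAAA
```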
How are cDNA libraries different from genomic libraries?
cDNA libraries contain only exon sequences from genes expressed in the source tissue, while genomic libraries contain the entire genome sequence, including introns and intergenic DNA
Why are cDNA levels different in different parts of the body?
Different genes are expressed at different levels in different cells (e.g. brain cells express different mRNAs than liver cells)
What is the largest DNA fragment that a plasmid can accommodate?
About 20 kb
What is an open reading frame (ORF)?
a reading-frame uninterrupted by stop codons
How long does a stretch of DNA with no stop codon need to be to indicate there is likely an open reading frame there?
4 bases and 3 bases per codon –> 4^3 = 64 different codons
3 of the 64 codons are stop codons –> in random sequence a stop codon appears about once every 64/3 ≈ 21 codons
If a frame runs well beyond 21 codons (amino acids) with no stop codon, it is indicative of an open reading frame
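A rough sketch of the arithmetic, assuming random sequence in which all 64 codons are equally likely and 3 of them are stop codons (the standard genetic code). The function names are illustrative.

```python
TOTAL_CODONS = 4 ** 3   # 64 possible codons
STOP_CODONS = 3         # stop codons in the standard genetic code

def expected_codons_between_stops() -> float:
    """Average spacing between stop codons in random sequence."""
    return TOTAL_CODONS / STOP_CODONS

def prob_no_stop(n_codons: int) -> float:
    """Chance that n consecutive random codons contain no stop codon."""
    return ((TOTAL_CODONS - STOP_CODONS) / TOTAL_CODONS) ** n_codons

print(round(expected_codons_between_stops(), 1))  # about 21
for n in (21, 100, 300):
    print(n, prob_no_stop(n))
```

The probability falls off quickly with length, which is why a long stop-free stretch is unlikely to arise by chance.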
How much of the genome is conserved between species in protein coding regions compared to non-coding regions?
Protein coding regions have a much higher percentage of conservation between species than the entire genome at large
What are examples of non-coding RNAs (ncRNAs)?
rRNAs, tRNAs, and snRNAs (small nuclear RNAs)
How are mRNAs sorted from other RNAs in eukaryotic cells?
PolyA tails can be hybridized to oligo-dT (single-strand DNA fragments of 20 nucleotides made of dT only) that can be tagged for identification
Why are cDNAs sometimes misleading when it comes to determining the sequence of the gene in the genome?
- Alternative splicing means a single gene can produce different proteins, complicating the prediction of the proteome (all proteins made in an organism)
- Important to sequence many individual cDNA clones from libraries made using mRNAs from different tissues
What is an exome?
The part of the genome corresponding to exons
How much of the genome is made up of the exome?
• Exome = 1.5−2%
• Remainder is introns, centromeres, telomeres, transposable elements, etc.
• Variation in genome size mostly due to changes in noncoding DNA rather than gene number or size
What are the two types of repetitive DNA?
- multicopy tandem repeats
- transposable elements
What is junk DNA?
Repetitive DNA with no known function
What are gene-rich regions?
• Chromosomal regions that have many more genes than expected from average gene density over entire genome
• Example in human genome – class III region of major histocompatibility complex
What are gene deserts?
- Regions that have no identifiable genes
- Largest is 5.1 Mb on chromosome 5 with no identified genes
- Biological significance of gene-rich regions and gene deserts is not known
What is the most gene rich region of the human genome?
- Class III region of the human major histocompatibility (MHC) complex
- MHC complex contains 60 genes within a 700 kb region
What are domain architectures?
- different numbers and kinds of protein domains in unique orders
- Shuffling, addition, or deletion of exons during evolution can create new domain architectures
What is a homeodomain consensus sequence?
The function of a new protein can be deduced if it contains a domain known to play a role in other proteins
What is exon shuffling?
Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure.
What are gene families?
Gene families are groups of genes closely related in sequence and function
How do duplications and divergence of genes create gene families?
Duplicated DNA sequences start out identical, then diverge as they accumulate mutations, eventually yielding new genes with closely related functions
What are orthologous genes?
Genes in different species that arose from the same gene in the common ancestor; usually retain the same function
What are paralogous genes?
Genes that arise by duplication; the term often refers to members of a gene family
What is homology?
blanket term for all evolutionarily related sequences
What are Pseudogenes?
sequences that look like, but do not function as, genes
• Rapidly accumulate mutations
What are de novo genes?
genes without homologs
• Young genes that evolved recently from ancestral intergenic sequences
What are Syntenic blocks?
homologous blocks of chromosomal sequence
• Mouse and human genomes diverged 85 million years ago, but can be compared via chromosomes to visualize similarities
How does combinatorial amplification result in greater complexity from fewer genes?
- Example – human T-cell receptor family
- DNA rearrangement combines V, D, and J segments into a gene
- Result is about 1000 different combinations
- 45 V × 2 D × 11 J = 990; 990 × 2 C = 1980 combinations from 60 elements
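The count can be checked in a few lines, using the segment numbers from this card. The dictionary layout is just one convenient way to hold them.

```python
from math import prod

# Number of gene segments of each type, from the card above.
segments = {"V": 45, "D": 2, "J": 11, "C": 2}

vdj = segments["V"] * segments["D"] * segments["J"]  # V-D-J combinations
total = prod(segments.values())                      # including the C choice
elements = sum(segments.values())                    # segments needed

print(vdj, total, elements)  # -> 990 1980 60
```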
How many bp can nanopore sequencing read in one continuous read?
Nanopore technology allows sequence reads of 1,000,000 bp with modest accuracy
What are DNA polymorphisms?
sequence differences
Why is there no wild-type human genome?
- Too much variation
- The genome sequences of only three people reveal over 5 million DNA polymorphisms
What are the 4 categories of genetic variation?
- Single nucleotide polymorphism (SNP)
- Insertion/Deletion (DIP or InDel)
- Simple sequence repeat (SSR)
- Copy number variant (CNV)