Module 12 Flashcards
Shotgun sequencing
next-gen and 3rd gen sequencing techniques used
- shear DNA into short sequences
- sequence by next gen
- assembler software looks for sequence overlaps between fragments to assemble them into larger fragments (contigs)
- now the preferred way of sequencing genomes, but has problems with repetitive DNA sequences
- long-read sequences (e.g., nanopore) often used to overcome this problem
- help with assembly and alignment of short reads
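Sketch of the overlap idea behind assembly (toy example only: the function names, the reads, and the 3-base minimum overlap are made up for illustration; real assemblers use far more sophisticated graph methods and handle sequencing errors):

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap into a contig."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

Repetitive DNA breaks this approach because a repeat gives many equally good overlaps, which is why long reads help.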
What is ‘read’ or ‘sequencing’ depth?
- the # of times a particular base is represented within all the reads from a sequencing run
- greater read depth gives more confidence a base is accurately read - ‘base calling’
- genome sequenced for the 1st time: read depth is usually several hundred to thousands
- after further sequencing (resequencing), much lower read depths are ok
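Sketch of how read depth can be tallied (assumes each aligned read is given as a hypothetical (start, length) pair on a reference of known length; real pipelines read this from alignment files):

```python
def depth_per_base(ref_len, alignments):
    """Count how many reads cover each reference position (0-based)."""
    depth = [0] * ref_len
    for start, length in alignments:
        for pos in range(start, min(start + length, ref_len)):
            depth[pos] += 1
    return depth


# Three 5-base reads aligned at positions 0, 2 and 4 of a 10-base reference
depth = depth_per_base(10, [(0, 5), (2, 5), (4, 5)])
mean_depth = sum(depth) / len(depth)
print(depth, mean_depth)
```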
How is next gen sequencing used for transcriptomics/ gene expression analysis?
- isolate mRNA
- convert to cDNA
- shear cDNA
- sequence by next-gen
- bioinformatics software sorts sequences into different genes (the ‘transcriptome’)
- number of times each gene appears in sequence data = measure of degree to which that gene was being expressed in the individual/ tissue being studied
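Sketch of the counting step (assumes each read has already been assigned to a gene; the gene names are placeholders, and scaling to counts per million is one common normalization, not necessarily the one used in the course):

```python
from collections import Counter


def counts_per_million(read_genes):
    """Turn a list of per-read gene assignments into counts per million reads."""
    counts = Counter(read_genes)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


cpm = counts_per_million(["actin", "actin", "actin", "gapdh"])
print(cpm)
```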
How is the Sanger technique of DNA sequencing used in identifying species?
mitochondrial DNA (mtDNA) COI gene = most commonly used gene/DNA for identifying ANIMAL species
How is next-gen sequencing used for studying microbiomes?
- isolate DNA from an environmental sample
- amplify microbial sequences using primers that target the 16S rRNA gene (16S rDNA)
- sequence using next gen (e.g., Illumina)
- run data through databases to see what species are present, and in what relative abundance
How is environmental DNA (eDNA) detected?
- using next gen and qPCR
- isolate DNA from environmental samples (e.g., filtered water)
- using appropriate primers, informative DNA sequences can be amplified and then sequenced => species present can be identified using species-DNA databases
- particular species can also be targeted using taxon-specific primers, followed by qPCR
- can also detect eDNA in air samples
Why do we study genetic variation at the molecular level?
- to determine the genetic basis of inherited diseases or phenotypic traits
- to study the relatedness of individuals or populations, and degree of intermixing of populations (population genetics)
- to identify individuals (wildlife ecology)
- parentage analysis or inferring pedigrees
- to identify criminals (forensics)
What are minisatellites, and what were they used for?
- used for DNA fingerprinting
- consist of 10-100 bp sequences that are repeated many times in tandem arrays
- minisatellite arrays (‘loci’) have extremely high allelic variation, due to frequent mutations involving slippage errors and/or unequal crossing over
DNA profiling/fingerprinting used to be done with minisatellites; what is used now?
microsatellites; also known as short tandem repeats (STRs) and simple sequence repeats (SSRs)
Microsatellites
- like minisatellites, but shorter sequence repeats (2-5 bp)
- arrays show a lot of allelic variation, due to slippage mutations
- arrays can be amplified using PCR
Microsatellite genotyping
(STR, SSR)
- PCR primers designed for flanking sequences
- primers are fluorescently labeled
- amplify products of different sizes
- separate products by electrophoresis
- genotypes identified by size of products
co-dominant:
- heterozygotes produce 2 bands, meaning both alleles are detected
- usually use same capillary electrophoresis machines used for dideoxy sequencing
- multiple microsatellites amplified at once, using primers labeled in different fluorescent colours (= multiplex analysis)
- 13 ‘standard’ microsatellite loci are used in criminal forensics => detect enough variability to distinguish all human individuals (except identical twins)
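Sketch of calling a genotype from product sizes (the flank length, repeat-unit length, and peak sizes are invented numbers; real genotyping software sizes peaks against a size standard or allelic ladder):

```python
def call_genotype(peak_sizes, flank_bp, unit_bp):
    """Convert PCR product sizes (bp) into repeat counts for the two alleles."""
    alleles = sorted((size - flank_bp) // unit_bp for size in peak_sizes)
    if len(alleles) == 1:
        alleles = alleles * 2  # one peak: homozygote, both alleles the same size
    return tuple(alleles)


# Tetranucleotide repeat with 100 bp of flanking sequence in the product
heterozygote = call_genotype([132, 140], flank_bp=100, unit_bp=4)
homozygote = call_genotype([132], flank_bp=100, unit_bp=4)
print(heterozygote, homozygote)
```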
Use of microsatellite “DNA fingerprinting” in criminal forensics
- PCR-based microsatellite genotyping requires only tiny amounts of DNA: ideal for forensics
- DNA-based methods have helped convict criminals and exonerate many more innocent suspects
- methods are so sensitive, though, that contamination can be a problem
How can microsatellites and mitochondrial DNA be used to establish identities in forensic analysis?
- human remains after disasters or crimes are sometimes badly damaged => need genetic tools to distinguish identity
- DNA can be obtained from bones or teeth
- microsatellites (and other nuclear DNA markers, e.g., SNPs) can enable identification via kinship analysis to relatives
- maternally inherited mtDNA can also be used to establish close relationship via maternal linkages
How can microsatellites cause genetic disorders?
- usually have no effect on health => are selectively neutral
- occur outside of exons (in introns or (mostly) between genes)
- a few cause disease; in all such cases, the loci involve trinucleotide repeats within genes or other important DNA sequences
- we all have these microsatellite loci, but healthy people have versions (alleles) with a small number of repeats
- people with these genetic disorders have alleles with too many repeats => cause production of abnormal proteins
- ex: Huntington’s, myotonic dystrophy, fragile X syndrome
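Sketch of counting repeats at a trinucleotide locus (the sequence and the CAG motif are illustrative; no disease threshold is assumed here):

```python
import re


def longest_repeat_run(seq, motif="CAG"):
    """Number of repeat units in the longest uninterrupted tandem run of motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


allele = "AT" + "CAG" * 5 + "TT" + "CAG" * 2
print(longest_repeat_run(allele))
```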
How are restriction enzymes used to detect DNA polymorphisms?
(restriction fragment length polymorphism [RFLP] analysis)
- mutations either create or destroy restriction endonuclease sites
- gain or loss (restriction site polymorphisms) can be detected using gel electrophoresis
- restriction site polymorphisms most commonly caused by single nucleotide polymorphisms (SNPs)
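Sketch of an RFLP in silico (uses the EcoRI site GAATTC, cut after the first G; the two allele sequences are invented and differ by one SNP inside the site):

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments


allele_1 = "A" * 10 + "GAATTC" + "T" * 14  # site present
allele_2 = "A" * 10 + "GAGTTC" + "T" * 14  # SNP (A => G) destroys the site
print(digest(allele_1), digest(allele_2))
```

On a gel, allele 1 would give two bands and allele 2 a single larger band.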
Single nucleotide polymorphisms (SNPs)
- caused by single base mutations, most common genetic variations in genome
- occur about every 800-1000 bp in human DNA
- any 2 randomly chosen humans will have different SNP alleles at several million SNP loci
- usually di-allelic (e.g., an ‘A’ or ‘G’ at a particular position)
- SNPs close to each other on chromosome are usually inherited together (because of limited recombination) forming ‘haplotypes’
- haplotype is an arbitrarily long stretch of DNA characterized by particular alleles at the SNP positions in that sequence
- current technologies allow many SNPs to be genotyped simultaneously
SNP chips
- used to genotype large numbers of SNPs
- aka microarrays; designed to allow many SNPs to be genotyped at once
- use DNA hybridization-based assay to determine genotypes at known SNPs
- have become the general method of choice for rapidly screening thousands to millions of SNPs (loci) at once
Genetic basis of a trait: simple vs complex
simple: entirely or mostly determined by one gene; ex: ear wax type
complex: most!; influenced by many genes interacting with environment
- genome-wide association (GWAS) used to find genetic links (predictors) to diseases (or traits)
- look for SNPs that have alleles correlated with presence of disease/trait
- need to survey many SNPs and many individuals
- basically look at a particular allele in people with a disease and compare frequency to people without disease
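Sketch of the case vs control comparison for a single SNP (allele counts are invented; a 2x2 chi-square test and odds ratio are one simple way to test association, and real GWAS apply much stricter significance thresholds because so many SNPs are tested):

```python
def allele_association(case_a, case_b, ctrl_a, ctrl_b):
    """Odds ratio and chi-square statistic (1 d.f.) for a 2x2 allele count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    return odds_ratio, chi2


# 'A' allele seen 60 times in cases and 40 times in controls (out of 100 each)
odds_ratio, chi2 = allele_association(60, 40, 40, 60)
print(odds_ratio, chi2)
```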
What does CRISPR-Cas do?
it functions as a bacterial defence against foreign (mainly viral) DNA; it targets specific DNA molecules, much like our adaptive immune system targets specific pathogens
What is a palindrome?
a sequence of DNA that reads the same from 5’ => 3’ on both strands
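Sketch of the definition as a check (a DNA palindrome equals its own reverse complement; function names are illustrative):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' => 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if both strands read the same 5' => 3'."""
    return seq == reverse_complement(seq)


print(is_palindrome("GAATTC"), is_palindrome("GATTC"))
```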
How does CRISPR immunity work?
- spacer acquisition (‘adaptation’): a bacteriophage injects DNA which is converted to a spacer that is put in between palindromes of the CRISPR array (inserted upstream)
- expression of crRNAs: RNA transcript is made from CRISPR array (= pre-crRNA) => cut into crRNAs that each contain one spacer from a foreign organism (one palindrome and one phage RNA together); Cas9 gene is expressed to generate the Cas9 enzyme
- interference: 3 components (Cas9 enzyme, crRNA, tracrRNA) form the effector complex; the crRNA is loaded into the Cas9 enzyme and binds the tracrRNA; the complex matches up with the invading phage genome, the DNA opens up, and Cas9 cuts it
PAM site
- protospacer adjacent motif
- NGG for Cas9 (N = any nucleotide); usually 3 bases long, located immediately next to the target (protospacer) sequence
- tells Cas system to open DNA to see if there is a matching sequence (to crRNA)
- not found in CRISPR DNA array
- simple and common elsewhere
What was the key innovation in CRISPR technology?
- substitution of a chimeric gRNA in place of the natural crRNA and tracrRNA
- (chimeric = combo of different things)
- the 20-ish bases at the 5’ end of the gRNA are specific to a target sequence in the genome to be edited
Genome editing with CRISPR-Cas
- (s)gRNA designed to target a specific sequence in genome
- sgRNA assembles with Cas9 protein to form effector complex
- effector complex first finds a PAM, then Cas9 unwinds the DNA immediately upstream of the PAM => if the target sequence is present, the 20-base 5’ end of the sgRNA pairs with it
- Cas9 makes ds-cut in genome
- cellular DNA repair mechanisms engaged with 2 possibilities: 1) broken ends can be rejoined without any template [non-homologous end joining, NHEJ], 2) broken ends can be rejoined using a template [homology directed repair, HDR]
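Sketch of finding candidate target sites (assumes an NGG PAM and a 20-base target; searches only the strand given, and the example sequence is invented; real guide design tools also check the other strand and off-target matches):

```python
import re


def find_targets(genome, spacer_len=20):
    """Return (position, target, PAM) for every spacer_len-base site followed by NGG."""
    pattern = rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))"
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, genome)]


genome = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
for position, target, pam in find_targets(genome):
    print(position, target, pam)
```

The 20-base target found this way is what the 5’ end of the sgRNA would be designed to match.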