Lecture 10 Flashcards
genome construction
“shotgun” approach
3 steps: fragmentation, sequencing, and assembly
fragmentation
cut the genome into small pieces
physical shearing and enzymatic methods
physical shearing
nebulization (DNA is forced through a size-adjustable hole, which shears it; most commonly used method)
sonication (sound waves fracture the backbone of the DNA; does not lose any DNA)
Enzymatic methods
a chemical method
nuclease mixes (Fragmentase)
cuts DNA
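Fragmentation can be pictured as random cuts along a sequence. The sketch below is only an illustration: the function name `fragment`, the `mean_size` and `seed` parameters, and the example sequence are assumptions made for this note, not part of any real shearing protocol.

```python
import random

def fragment(genome, mean_size=5, seed=0):
    """Cut a sequence into consecutive pieces of random length."""
    rng = random.Random(seed)  # seeded so the cuts are repeatable
    pieces, pos = [], 0
    while pos < len(genome):
        size = rng.randint(1, 2 * mean_size - 1)  # piece length, averaging mean_size
        pieces.append(genome[pos:pos + size])
        pos += size
    return pieces

genome = "ATGGCGTACCTGATTACAGGC"  # hypothetical example sequence
copy_1 = fragment(genome, seed=1)
copy_2 = fragment(genome, seed=2)
```

Each call stands in for shearing one copy of the genome. Because different copies break at different places, the pieces from different copies can overlap, which is what assembly later relies on.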
DNA sequencing
determining nucleotide composition of DNA
first, second, and third generations
first generation sequencing
developed by Dr. Fred Sanger in 1977
PCR with dideoxynucleotides (the 3' OH is replaced with H, which terminates DNA replication)
manual: four separate reactions one for each dideoxynucleotide & gel electrophoresis
automated: single reaction, uses fluorescent markers, and laser detection of sequence
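The manual readout (four reactions, then gel electrophoresis) can be sketched as follows. This is a toy model, not a lab protocol: the names `termination_fragments` and `read_gel` and the example strand are assumptions made for this note.

```python
def termination_fragments(new_strand, dd_base):
    """Lengths of the fragments that stop where dd_base was incorporated."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]

def read_gel(new_strand):
    """Run one reaction per dideoxynucleotide, then read bands shortest to longest."""
    lanes = {b: termination_fragments(new_strand, b) for b in "ACGT"}
    # Each fragment length appears in exactly one lane; that lane names the base
    length_to_base = {n: b for b, lengths in lanes.items() for n in lengths}
    return "".join(length_to_base[n] for n in sorted(length_to_base))

synthesized = "GATTACA"  # hypothetical newly synthesized strand
a_lane = termination_fragments(synthesized, "A")
sequence = read_gel(synthesized)
```

The point of the sketch is that the sequence is recovered from fragment lengths alone: sorting the bands by size and noting which lane each one sits in spells out the strand.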
second generation sequencing
Adv: massively parallel (different sequences run together) and high throughput (100 times faster & 33 times cheaper)
(Still) requires amplification of DNA samples
Methods: emulsion PCR and bridge PCR
bridge PCR
still used today
a flat piece of glass with bits of DNA attached to it (called a flow cell)
Illumina platform sequencing (a fluorescent signal is emitted where each nucleotide is added and is read by computer; rapidly advancing technology)
third generation sequencing
adv: massively parallel (many sequences at the same time), extremely high throughput (“real-time” results), and does NOT require amplification of DNA samples (can sequence DNA immediately)
methods: SMRT technology (DNA polymerase) and nanopore technology (transmembrane proteins)
Assembly
reconstructing genomes
combine short, overlapping DNA sequences
computer-aided sorting
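Combining short, overlapping sequences can be sketched with a greedy merge. This is one simple strategy chosen for illustration, not necessarily what a real assembler does; the names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example reads are assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining reads stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical short reads
contigs = greedy_assemble(reads)
```

If the reads do not all overlap, the function returns more than one sequence, which corresponds to an assembly that still has gaps.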
Types of assembly
reference alignment
- comparison to known genome
- must be a closely related organism
De Novo assembly
- novel genome construction
- no close relative required
completion
- how much of the genome is sequenced
- closed (complete sequence, no gaps) vs. draft (incomplete sequence, gaps)
coverage
- how many times genome is sequenced (# of reads per nucleotide)
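Coverage can be worked out with simple arithmetic. The sketch below assumes reads of equal length; the function names `mean_coverage` and `per_base_depth` and all the numbers are made up for illustration.

```python
def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each nucleotide."""
    return num_reads * read_length / genome_size

def per_base_depth(genome_size, read_starts, read_length):
    """Count how many reads cover each position of the genome."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth

# 100,000 reads of 150 bp over a 5,000,000 bp genome
average = mean_coverage(100_000, 150, 5_000_000)

# Tiny example: three 4 bp reads on a 10 bp genome
depth = per_base_depth(10, [0, 2, 4], 4)
gaps = [pos for pos, d in enumerate(depth) if d == 0]
```

Positions with a depth of zero are gaps, so a genome with any such positions is still a draft rather than closed.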
bioinformatics
analyzing and storing DNA/protein sequences
- utilizes powerful computational tools
comparative analyses
- genome size, content, and organization
bottleneck
Small genomes
140,000 to 1,000,000 bp
170 to 1,000 ORFs
- endosymbionts
- parasites
Large genomes
5,000,000 to 13,000,000 bp
5,000 to 9,000 ORFs
free-living organisms
- unpredictable environment
larger genomes
- like in eukaryotes
more genes - little “junk” DNA
- DNA replication
- translation (protein synthesis)
gene content
annotation
- predicting functional genes from DNA sequence data
- based on comparative analysis
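The first step of predicting genes from sequence data, finding ORFs, can be sketched as a scan for a start codon followed in the same frame by a stop codon. This is a simplification: it reads only the forward strand and does none of the comparative analysis; the name `find_orfs`, the `min_codons` parameter, and the example sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the ORF and keep scanning
    return orfs

example = "CCATGAAATTTTAGGG"  # hypothetical sequence
orfs = find_orfs(example)
```

Each ORF found this way is only a candidate gene; assigning it a role would still depend on comparison with known genes.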
mystery genes
- 30% ORFs unknown roles
- E. coli
genomics
all genetic information in the cell
metagenomics
all genetic information in an environment
- pooled DNA from an environmental sample
- includes genes from many different organisms
transcriptomics
expressed genetic information
- study of total gene expression (genomes = list of parts)
- Microarrays (gene chips)
applications: study of pathogenic bacteria and human cancer cells
proteomics
translated genetic information
study of total protein production (includes protein structure, function, and regulation)
techniques: in vitro (separate and ID proteins) and in silico (predict proteins from DNA)
Microarrays
silica chips containing different genes
sample mRNA hybridizes + fluoresces
- fluorescence indicates expression of each gene
- track expression levels of individual genes
genome evolution
changes in genome content over time
through gene duplication and deletion
horizontal gene transfer
gene duplication
segment of DNA copied in the genome
- main mechanism of new gene evolution
- one copy remains unchanged + functional
- one copy changes (mutates) to a new function
gene deletion
loss of a segment of DNA in the genome
- common in endosymbionts and parasites
- dependence on hosts results in "useless" genes
- no selection pressure to retain these genes over time
Pathogenicity islands
clustered virulence genes transmitted horizontally
pathogenic bacterial strains
core genome
essential genes
in all strains of a species
pan genome
core genome plus non-essential (accessory) genes
accessory genes are in only some strains of a species
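The split between genes found in all strains and genes found in only some strains can be sketched with set operations. The strain names and gene names below are invented for illustration and do not describe real strains.

```python
# Hypothetical gene content of three strains of one species
strains = {
    "strain_A": {"dnaA", "rpoB", "gyrA", "toxX"},
    "strain_B": {"dnaA", "rpoB", "gyrA", "resY"},
    "strain_C": {"dnaA", "rpoB", "gyrA"},
}

# Genes present in every strain
core = set.intersection(*strains.values())

# Every gene seen in at least one strain
all_genes = set.union(*strains.values())

# Genes present in some strains but not all
some_strains_only = all_genes - core
```

With more strains sampled, the set shared by all strains tends to shrink while the total set of genes seen tends to grow.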