microbial genomics Flashcards
what is the Human Genome Project?
1990 - goal to sequence all 3*10^9 base pairs of human DNA
- completed in 2003 - thought it would take 15 years.
who did HGP?
public: US govt, Francis Collins
private: Craig Venter (The Institute for Genomic Research, TIGR; later Celera Genomics)
first completed genome sequence was?
Haemophilus influenzae (first free-living organism sequenced, 1995)
- respiratory disease. took 8 years through private funding
Venter style?
- break DNA into pieces, sequence all at the same time (shotgun sequencing). computer finds where reads overlap + joins them at the overlaps.
= contigs: contiguous sequences built from overlapping reads
= less accurate because of gaps between regions + repetitive sequences.
but cheap + fast
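A minimal sketch of the overlap-and-join idea, assuming error-free reads from one strand. The function names, the `min_len` cutoff and the example reads are made up for illustration; real assemblers handle sequencing errors, both strands and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing overlaps any more: what is left are separate contigs (gaps)
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: the first three overlap, the fourth does not.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA", "CCCCGGGG"]))
```

The read with no overlap stays as its own contig, which is the "gaps between regions" problem noted above.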
Collins style?
simplistic approach
- sequence a piece of DNA. find the end of that piece. create a probe that extends from the end of the piece to the start of the next piece
slow, systematic. had to create a new probe at each step.
sequencing DNA
sequence thousands of clones at once. 7-10 times coverage of entire genome
- assembles contigs using computer algorithms
- fill in gaps using targeted methods - chromosomal walking
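Coverage can be worked out as (number of reads x read length) / genome size. A rough sketch, where the read length and genome size are illustrative assumptions and not values from these notes:

```python
import math


def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_size


def reads_needed(target_coverage: float, read_len: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_len)


# Assumed numbers: a 1.8 Mb bacterial genome and 500 bp reads.
n = reads_needed(8, 500, 1_800_000)
print(n, coverage(n, 500, 1_800_000))
```

Average coverage says nothing about evenness, so some regions can still end up as gaps that need targeted methods.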
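what’s Sanger sequencing?
DNA polymerase in solution copies the template, starting from a primer.
dideoxynucleotides (no 3'-OH) added to solution. DNA pol can’t extend the strand after one is incorporated because there is no 3'-OH.
separate fragments based on size. the dideoxynucleotide at the end of each fragment tells what the nucleotide at that position is.
shorter fragments = positions nearer the beginning of the sequence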
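A toy sketch of reading a Sanger result: every chain-terminated fragment is represented by its length and its terminal dideoxynucleotide, and sorting by size gives the sequence. The primer is left out and the template is a made-up example written 3'->5'; both are simplifying assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """All possible terminated products as (fragment length, terminal ddNTP)."""
    new_strand = [COMPLEMENT[base] for base in template]
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments shortest to longest and read off the terminal bases."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments("TACGGA")   # hypothetical template, 3'->5'
print(read_gel(list(reversed(fragments))))   # input order does not matter, size does
```

The read is the newly made strand (5'->3'); the original template base at each position is its complement.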
next gen sequencing
- increase in amount of genome/metagenome sequence. (many sequences at a time)
- non-specialist labs able to use (signal when nucleotides incorporated; quick, cheap)
- useful for resequencing (comparative - looking for disease)
- shotgun sequencing + computer power.
structure of ORF
open reading frame
- start codon, then approx 300 bp (about 100 codons) before stop codon.
ORF =/= gene. may be, but doesn’t need to be. if transcribed + translated = gene.
using ORF
- computer finds possible start codons
- computer finds possible stop codons
- computer counts codons between start + stop.
- computer finds possible RBS (ribosome binding site)
- computer calculates codon bias in ORF
- computer decides if likely to be genuine
- computer gives list of probable ORFs
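The start/stop/count steps above can be sketched roughly as follows. This is a simplified illustration: it scans the forward strand only, treats ATG as the only start codon, and skips the RBS and codon bias checks. `find_orfs` and `min_codons` are names chosen for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) positions of candidate ORFs; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):                       # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # first possible start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:   # count codons between start + stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Tiny made-up sequence, so the length cutoff is lowered to 5 codons.
seq = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(seq, min_codons=5))
```

With the default cutoff of 100 codons (about 300 bp), short chance ORFs like this one are filtered out.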
ORF content to genome size
greater genome size = more ORF content
lifestyles of bacteria
endosymbiotic: live inside another cell. use host DNA, but have their own
parasitic: may grow inside cell. can’t grow without host DNA
free-living: independent. don’t need other organisms. usually larger genome than the others. fitness cost
gene annotation
compare ORF to known genes; if sequence is similar, annotate as similar function.
- gene annotations help reconstruct metabolic pathways + determine gene complement
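A very reduced sketch of annotation by similarity, assuming the sequences are already aligned and of equal length. The protein fragments, function names and the 70% cutoff are hypothetical; real pipelines use alignment tools and statistical scores.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def annotate(query: str, known: dict[str, str], min_identity: float = 70.0) -> str:
    """Give the query the function of its most similar known sequence, if similar enough."""
    best_function, best_identity = "unknown function", 0.0
    for function, seq in known.items():
        if len(seq) != len(query):
            continue  # this sketch only compares equal-length sequences
        identity = percent_identity(query, seq)
        if identity >= min_identity and identity > best_identity:
            best_function, best_identity = function, identity
    return best_function


known = {"sugar transporter": "MKTLLVAGAV", "flagellin": "MAQVINTNSL"}  # made-up fragments
print(annotate("MKTLLVAGSV", known))   # 9 of 10 positions match the first entry
print(annotate("GGGGGGGGGG", known))   # nothing similar enough
```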
problem with gene annotation?
2 similar sequences may have diff functions
- if 2 proteins don’t have similar sequence, they may still have the same function
pathogens + growth factors
growth factors come from the host cell.
no genes for amino acid biosynthesis
what is URF
ORF with unknown function
Re-constructed Genome map
ORFs in opposite directions.
analyze map of bacteria + determine what kind of bacteria 1
- sugar transport
- peptide transport
- flagellum
- NH4+
Zinc, Fe.
glycolysis, PPP
fermenter. lactate, no O2
analyze map of bacteria + determine what kind of bacteria 2
TCA cycle. has ETC, flagella, pili. has NADH oxidizer
analyze map of bacteria + determine what kind of bacteria 3
calvin cycle, carbon fixation.
citric acid cycle
photosynthetic -
NH3 + O2 at 1st ETC
ammonia-oxidizing bacterium.
chemolithoautotroph
aerobe
ORFs in bacterial genome : genome vs function
DNA replication: same amount needed regardless of genome size, so proportion decreases as genome gets bigger. same with translation
transcription + signal transduction: more regulation, more pathways when genome is bigger.
energy: about stable - bigger genome doesn’t change energy much
what are homologous genes
from same ancestor
diff btw paralog and ortholog
paralog: from gene duplication; both copies stay within same species or organism (e.g. α- and β-globin of hemoglobin)
- 1 copy retains function, other may diverge or become a pseudogene.
- ortholog: copies of same gene separated by speciation; same function in 2 diff species.
define synteny
similar gene order
- more closely related = more syntenic
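One simple way to put a number on "similar gene order" is the fraction of neighbouring gene pairs that two genomes share. This is an illustrative measure chosen for the sketch, not a standard metric, and the gene names are made up.

```python
def adjacent_pairs(order: list[str]) -> set[frozenset]:
    """Unordered pairs of genes that sit next to each other."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def synteny_score(a: list[str], b: list[str]) -> float:
    """Fraction of adjacent gene pairs conserved between two gene orders (0 to 1)."""
    pairs_a, pairs_b = adjacent_pairs(a), adjacent_pairs(b)
    if not pairs_a or not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / min(len(pairs_a), len(pairs_b))


genome_1 = ["geneA", "geneB", "geneC", "geneD", "geneE"]
genome_2 = ["geneA", "geneB", "geneC", "geneE", "geneD"]  # last two genes swapped
print(synteny_score(genome_1, genome_2))
```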
core genome vs ancillary genome vs pan genome
core: shared by all strains of same species (specific to group of interest)
ancillary: found in some strains of same species (ones not common to all)
pan: all genes found in group
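The three definitions map directly onto set operations. A small sketch, with hypothetical strain and gene names:

```python
def genome_partitions(strains: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Return (core, ancillary, pan) gene sets for a group of strains."""
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)            # every gene found in the group
    core = set.intersection(*gene_sets)      # genes shared by all strains
    ancillary = pan - core                   # genes found in only some strains
    return core, ancillary, pan


strains = {  # made-up gene content
    "strain_1": {"dnaA", "rpoB", "toxA"},
    "strain_2": {"dnaA", "rpoB", "flgX"},
    "strain_3": {"dnaA", "rpoB"},
}
core, ancillary, pan = genome_partitions(strains)
print(sorted(core), sorted(ancillary), sorted(pan))
```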
looking for HGT - look for?
gene islands: GC content diff than rest of chromosome, diff codon usage bias. flanked by inverted repeats. transferred by HGT
transposons: move around, inverted repeats.
pathogenicity islands: blocks of chromosomal genes that encode virulence factors; can turn a normal strain into a virulent one
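The GC-content clue can be sketched as a window scan that flags stretches whose GC fraction differs from the genome-wide value. The window size and threshold below are arbitrary assumptions, and a flagged window is only a candidate; codon usage and flanking repeats would still need checking.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome: str, window: int = 1000, threshold: float = 0.10) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for windows whose GC differs from the genome average by more than threshold."""
    overall = gc_content(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) > threshold:
            hits.append((start, round(gc, 3)))
    return hits


# Made-up genome: AT-rich background with a GC-rich block in the middle.
genome = "AT" * 50 + "GC" * 10 + "AT" * 50
print(atypical_windows(genome, window=20, threshold=0.2))
```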
what is metagenomics
previously, focus was on individual genes.
- all genes in environment. extract DNA and reconstruct community
ocean metagenomic analysis
look at cloned DNA that carries an rRNA gene: rRNA gene IDs the organism, linked genes show what it might be doing
SR: abundant, couldn’t be cultured. found rRNA gene on a clone + sequenced it - found a gene related to rhodopsin.
- uses light energy to pump Cl- out.
when that gene was put into E. coli - could pump H+ across membrane using light. photoheterotrophs use it to gain more energy from the food they have. little nutrient in environment, so light gives extra energy
human microbiome size
10-100 trillion bacteria mostly in gut.
connection btw human health + illness
Bacteroidetes and obesity
Bacteroidetes + Firmicutes = break down food.
obese: fewer Bacteroidetes + more Firmicutes than slim ppl
Bacteroidetes increase as obese ppl lose weight
diff btw Bacteroidetes + Firmicutes?
Firmicutes get more energy out of food
sequence analysis: bioinformatics
computational analysis of genome data = in silico analysis
- genome content, structure, arrangement
- predict protein structure + function
- produce annotation to determine the location of genes (databases contain gene info)
reverse transcriptase
enzyme from retroviruses, used to produce DNA copies (cDNA) of their RNA genome to integrate into host genome
comparative genomics
look at how groups of organisms relate to each other.
- ID the overlaps in genome. find core + ancillary genomes
mutations in E. coli - evolutionary pressures
150000 generations + ID’d similarities. ID’d mutations in relation to ancestral condition
Total mRNA
extracted from control + experimental cells to determine changes in gene expression
creation of genomic microarray + creation of labeled probe representing all mRNA in cell at that time
genomic DNA -> PCR -> probe hybridization -> signal detection -> data analysis
probe hybridization: make cDNA probe and label it
Microarrays: detecting differential gene expression
differently labeled cDNA from each culture mixed + hybridized to array.
- labeled probe finds complementary sequence. hybridizes by normal DNA base-pairing rules
- fluorescent signal detected where probe is hybridized.
= requires several replications and verification by other methods
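A common way to summarize a two-label array is the log2 ratio of the two fluorescence signals per spot. A minimal sketch, where the intensities, gene names and pseudocount are invented for illustration:

```python
import math


def log2_ratios(control: dict[str, float], experimental: dict[str, float], pseudo: float = 1.0) -> dict[str, float]:
    """log2(experimental / control) per gene; the pseudocount avoids dividing by zero."""
    return {
        gene: math.log2((experimental[gene] + pseudo) / (control[gene] + pseudo))
        for gene in control
    }


# Made-up spot intensities for the two differently labeled cDNA samples.
control = {"geneA": 99, "geneB": 399, "geneC": 199}
experimental = {"geneA": 399, "geneB": 99, "geneC": 199}
for gene, ratio in log2_ratios(control, experimental).items():
    print(gene, round(ratio, 2))
```

Positive values mean more signal in the experimental culture, negative values mean less, and 0 means no change.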
Transcriptome of P.aeruginosa
- assemblage of microarray data
- global gene expression from cells grown with high vs low levels of Ca2+
RNA sequencing (RNA-seq)
next gen - avoids microarrays
- sequences transcriptome as reverse-transcribed cDNA
- quantitatively determine copy numbers of transcripts - tell what’s up/down regulated in experimental relative to control
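A rough sketch of how read counts become up/down calls: scale each sample to counts per million so sequencing depth does not matter, then compare experimental to control. The 2-fold cutoff, pseudocount, gene names and counts are all assumptions for the example; real analyses use replicates and statistical tests.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def call_regulation(control: dict[str, int], experimental: dict[str, int],
                    fold: float = 2.0, pseudo: float = 1.0) -> dict[str, str]:
    """Label each gene 'up', 'down' or 'unchanged' in experimental relative to control."""
    ctrl, expt = counts_per_million(control), counts_per_million(experimental)
    calls = {}
    for gene in ctrl:
        ratio = (expt[gene] + pseudo) / (ctrl[gene] + pseudo)
        if ratio >= fold:
            calls[gene] = "up"
        elif ratio <= 1 / fold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


control = {"geneA": 100, "geneB": 100, "geneC": 800}        # made-up read counts
experimental = {"geneA": 1200, "geneB": 200, "geneC": 600}  # sequenced twice as deep
print(call_regulation(control, experimental))
```

geneB has twice the raw count in the experimental sample but the same share of reads, so it comes out unchanged.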
what is proteomics?
determining differences in translation
= what proteins are being made?
proteome?
proteins present in cell, tissue, or organism at any one point in time
mRNA copies in glycolysis vs sporulation
glycolysis, some genes on, some off.
sporulation - all increase
diff regulation to diff levels
proteomics interactome
single cell genomics
-> all proteins being produced in community.
-> looking at genome only within one cell