M3 L19: Genomics and sequencing Flashcards
what is a genome
full haploid seq of DNA in a species
what can genomes tell us
inform understanding of gene function, inform understanding of evolution, inform understanding of microbial ecology for unculturable microbes
what are the 2 original methods for seq a genome
clone by clone
shotgun sequencing
what’s the clone by clone approach? pros and cons?
break a genome into large frags via partial restriction digest –> insert in a large vector (BAC) and clone –> break large frags into small frags –> subclone small frags in plasmids and sequence –> assemble chromosome
pro: reliable
cons: slow, cost ineffective
what’s the whole genome shotgun approach? pros and cons?
break genomic DNA into small fragments –> seq everything at high coverage –> assemble overlapping sequences
pros: cheap
con: less accurate / assembly is harder bc genomes contain repetitive (redundant) sequences
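A minimal sketch of the "assemble overlapping sequences" step, assuming error-free reads from one strand. The function names, the `min_len` cutoff, and the toy reads are made up for illustration; real assemblers use much more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len (hypothetical cutoff).
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap.

    Stops when no pair overlaps by at least min_len and returns the
    remaining contigs.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Toy reads (invented for illustration) that tile one short sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

If the same stretch of sequence occurs in several places, more than one merge looks equally good, which is why repeats make shotgun assembly harder.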
what are the modern genome sequencing techniques
1) illumina
2) pac-bio
3) oxford nanopore
when to use illumina? pros and cons?
pros: cheap per bp, low error rate 1%
cons: short reads 150-250 bp (hard to assemble a genome de novo from short reads alone)
when to use pac-bio sequencing? pros and cons?
can seq genome de novo
pros: longer reads 15 kb+
cons: more expensive and higher error rate 10% but errors are random –> take the consensus of all seqs to figure out the correct base
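A rough sketch of why random errors can be fixed by taking the consensus: a majority vote at each position. It assumes the reads are already aligned, equal length, and differ only by substitutions; a real consensus caller also has to handle insertions and deletions. The function name and reads are invented for illustration.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Majority base at each column of equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )


# Five toy reads of the same region, each with at most one random error.
reads = ["ACGTAC", "ACGTTC", "AGGTAC", "ACGTAC", "ACCTAC"]
print(consensus(reads))
```

Because each error lands at a different position, every column still has a clear majority. An error that hits the same position in most reads would survive the vote.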
when to use oxford nanopore? pros and cons?
can seq genome de novo
pros: very long reads 30 kb+
cons: also very expensive and high error rate 10%; errors are systematic, not random, so can’t correct by sequencing more reads
best method for sequencing genome de novo?
mix of short and long read techniques
long –> assembly
short –> accuracy
what is GWAS? do you have to assemble the whole genome?
genome wide association study
don’t have to assemble genomes, just sequence all genomes and compare diseased and non diseased to reference
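One simple way to picture the diseased vs non-diseased comparison at a single variant is an allele odds ratio. This is only a sketch: the counts are hypothetical, and a real GWAS tests many variants with proper statistics and multiple-testing correction.

```python
def allele_odds_ratio(case_alt: int, case_ref: int,
                      control_alt: int, control_ref: int) -> float:
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


# Hypothetical allele counts at one variant (not real data).
print(allele_odds_ratio(case_alt=60, case_ref=40, control_alt=30, control_ref=70))
```

An odds ratio well above 1 means the alternate allele is more common in the diseased group, which flags the variant as possibly associated.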
what is an example of the evolve and resequence technique? what’s a chemostat?
evolve two species of yeast in sulfate limited conditions –> sequence their genomes before and after –> mutations that recur frequently are likely adaptive
a chemostat is an apparatus that continuously adds new and removes old growing media
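A small sketch of the "mutations that recur are likely adaptive" logic: count how many independently evolved populations carry a mutation in each gene. The population names, gene names, and the `min_populations` threshold are placeholders, not data from the actual experiment.

```python
from collections import Counter


def recurrent_genes(mutations_by_population: dict[str, set[str]],
                    min_populations: int = 2) -> list[str]:
    """Genes mutated in at least min_populations independent populations."""
    counts = Counter(
        gene
        for genes in mutations_by_population.values()
        for gene in genes
    )
    return sorted(g for g, n in counts.items() if n >= min_populations)


# Hypothetical mutated genes found after evolution (placeholder names).
populations = {
    "pop1": {"geneA", "geneB"},
    "pop2": {"geneA", "geneC"},
    "pop3": {"geneA", "geneB"},
}
print(recurrent_genes(populations, min_populations=3))
```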
what other species were sequenced as part of the human genome project
Drosophila melanogaster
Mus musculus
C. elegans
Saccharomyces cerevisiae
E. coli
what is metagenomics? what’s it used for?
sequencing all genomes in an environmental sample to determine what species are present (especially for unculturable bacteria)
what is reverse ecology
sequence environmental sample and identify most optimized genes –> can infer those are the ones that are most important for survival
what is the great plate count anomaly
when you culture an environmental sample, there are far fewer colonies than actual microbes in the sample
what is the baas becking hypothesis
natural selection (NS) is the strongest force in determining microbial ecology
everything could live anywhere but the environment selects
support for and against the baas becking hypothesis for fish in a lake
for: if different fish in the same lake that eat different things have different microbiomes (dif microbiome bc dif food = dif enviro)
against: if fish in the same lake that eat different things have the same microbiome (would mean microbes can’t physically disperse)
what is gene annotation? how do you do it? con to this approach?
determining if a sequence is a gene
look for open reading frames (ORFs) that code for more than 50 AAs bc that is uncommon for random seqs
con: some functional seqs that code for proteins are less than 50 AAs
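A minimal sketch of the ORF-length rule, assuming an ORF runs from an ATG to the next in-frame stop codon and scanning only the three frames of the given strand. The function name and test sequence are invented; the 50-codon cutoff comes from the card above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) positions of ATG...stop ORFs on the given strand.

    Only ORFs coding for more than min_codons amino acids are kept;
    end is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # amino acids, stop excluded
                if n_codons > min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Made-up sequences: one ORF of 61 codons, one of 11 codons.
long_orf = "ATG" + "GCT" * 60 + "TAA"
short_orf = "ATG" + "GCT" * 10 + "TAA"
print(find_orfs(long_orf), find_orfs(short_orf))
```

The short sequence is skipped even though it is a valid ORF, which is the con on the card: real genes coding for short proteins get missed by a length cutoff.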
2 types of annotation?
structural: locate genes
functional: locate genes and determine their function
how many reading frames does each sequence have
6
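Where the 6 comes from: 3 codon offsets on the forward strand plus 3 on the reverse complement. A short sketch that splits a sequence into codons for each frame; the function names and the "+1".."-3" labels are just a convention chosen here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a sequence into codons for all six reading frames."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


# Toy sequence for illustration.
for name, codons in six_frames("ATGAAACCC").items():
    print(name, codons)
```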
how to determine gene function from sequence?
genes with similar sequences usually have similar functions and belong to the same “gene family”
how do new gene families arise? what are the 2 possible consequences?
duplication
homologs from duplication –> paralogs
homologs from speciation –> orthologs
what is exon shuffling? what is it trying to explain? why is it maybe inaccurate?
exons are inserted into dif protein seqs –> give protein that function
possible way to get new genes
would mean that proteins are highly modular but this is probably unlikely bc inserting a different exon would change the protein folding and function, likely in a LOF way