M3 L19: Genomics and sequencing Flashcards

1
Q

what is a genome

A

full haploid seq of DNA in a species

2
Q

what can genomes tell us

A

inform understanding of gene function, inform understanding of evolution, inform understanding of microbial ecology for unculturable microbes

3
Q

what are the 2 original methods for seq a genome

A

clone by clone

shotgun sequencing

4
Q

what’s the clone by clone approach? pros and cons?

A

break a genome into large frags via partial restriction digest –> insert in a large vector (BAC) and clone –> break large frags into small frags –> subclone small frags in plasmids and sequence –> assemble chromosome

pro: reliable

cons: slow, cost ineffective

5
Q

what’s the whole genome shotgun approach? pros and cons?

A

break genomic DNA into small fragments –> seq everything at high coverage –> assemble overlapping sequences

pros: cheap

con: less accurate / assembly is harder bc genomes are redundant
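
"High coverage" can be made concrete with the standard Lander-Waterman estimate: mean coverage c = N x L / G, and under a Poisson approximation the fraction of bases hit by zero reads is about e^(-c). A minimal sketch; the genome size and read numbers are illustrative assumptions, not values from the lecture.

```python
import math

def mean_coverage(n_reads, read_len, genome_len):
    """Average number of reads covering each base: c = N * L / G."""
    return n_reads * read_len / genome_len

def expected_uncovered_fraction(coverage):
    """Poisson approximation: chance that a given base is hit by zero reads."""
    return math.exp(-coverage)

# Illustrative numbers only: a 5 Mb genome, 150 bp reads, 1 million reads.
c = mean_coverage(1_000_000, 150, 5_000_000)
print(f"coverage: {c:.0f}x")
print(f"expected uncovered fraction: {expected_uncovered_fraction(c):.2e}")
```

This ignores repeats, which is exactly where shotgun assembly struggles: high coverage helps with gaps but does not resolve redundant sequence.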

6
Q

what are the modern genome sequencing techniques

A

1) Illumina
2) PacBio
3) Oxford Nanopore

7
Q

when to use illumina? pros and cons?

A

pros: cheap per bp, low error rate 1%

cons: short reads 150-250 bp (hard to assemble a genome de novo from short reads alone)

8
Q

when to use pac-bio sequencing? pros and cons?

A

can seq genome de novo

pros: longer reads 15 kb+

cons: more expensive and higher error rate 10% but errors are random –> take the consensus of all seqs to figure out the correct base
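
Why random errors can be corrected by consensus: if each read is wrong at a different position, a majority vote per position recovers the true base. A toy sketch, assuming the reads are already aligned, equal length, and contain substitution errors only (real long-read errors include many indels, so real consensus callers work on alignments).

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority vote at each column of equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )

# Made-up reads of the same region; four of them carry one error each,
# and each error is at a different position.
reads = [
    "ACGTTCGT",
    "ACGTACCT",
    "AGGTACGT",
    "ACGTACGT",
    "ACGAACGT",
]
print(consensus(reads))
```

If the errors were systematic (the same wrong base at the same position in most reads), the vote would converge on the wrong base no matter how many reads were added.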

9
Q

when to use oxford nanopore? pros and cons?

A

can seq genome de novo

pros: very long reads 30 kb+

cons: also very expensive and high error rate 10%; errors are systematic, not random, so can’t correct by sequencing more reads

10
Q

best method for sequencing genome de novo?

A

mix of short and long read techniques

long –> assembly

short –> accuracy

11
Q

what is GWAS? do you have to assemble the whole genome?

A

genome wide association study

don’t have to assemble genomes, just sequence all genomes and compare diseased and non diseased to reference
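
The diseased vs non-diseased comparison at a single variant can be sketched as a 2x2 allele-count test. This is a simplified illustration with made-up counts; a real GWAS repeats the test at every variant and uses a much stricter genome-wide significance threshold.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical allele counts at one SNP (made-up numbers):
#            allele A   allele G
# cases         60         40
# controls      40         60
stat = chi_square_2x2(60, 40, 40, 60)
# 3.84 is the chi-square cutoff for p < 0.05 at 1 degree of freedom.
print(f"chi-square = {stat:.1f}")
```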

12
Q

what is an example of the evolve and resequence technique? what’s a chemostat?

A

evolve two species of yeast in sulfate limited conditions –> sequence their genomes before and after –> mutations that reoccur frequently are likely adaptive

a chemostat is an apparatus that continuously adds new and removes old growing media
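
The "mutations that recur are likely adaptive" step amounts to counting how many independently evolved populations hit the same gene. A minimal sketch; the gene names and the four populations are placeholders, not data from the lecture.

```python
from collections import Counter

def recurrent_genes(populations, min_populations=2):
    """Genes mutated in at least `min_populations` independent populations."""
    counts = Counter(gene for genes in populations for gene in set(genes))
    return {gene: n for gene, n in counts.items() if n >= min_populations}

# Hypothetical results: genes carrying new mutations in each evolved population.
evolved = [
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneA", "geneD"},
    {"geneB", "geneE"},
]
print(recurrent_genes(evolved))
```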

13
Q

what other species were sequenced as part of the human genome project

A

Drosophila melanogaster

Mus musculus

C. elegans

Saccharomyces cerevisiae

E. coli

14
Q

what is metagenomics? what’s it used for?

A

sequencing all genomes in an environmental sample to determine what species are present (especially for unculturable bacteria)

15
Q

what is reverse ecology

A

sequence environmental sample and identify most optimized genes –> can infer those are the ones that are most important for survival

16
Q

what is the great plate count anomaly

A

when you culture an environmental sample, there are far fewer colonies than actual microbes in the sample

17
Q

what is the baas becking hypothesis

A

NS (natural selection) is the strongest force in determining microbial ecology

everything could live anywhere but the environment selects

18
Q

support for and against the baas becking hypothesis for fish in a lake

A

for: if different fish in the same lake that eat different things have different microbiomes (dif microbiome bc dif food = dif enviro)

against: if fish in the same lake that eat different things have the same microbiome (would mean microbes can’t physically disperse)

19
Q

what is gene annotation? how do you do it? con to this approach?

A

determining if a sequence is a gene

look for open reading frames (ORFs) that code for more than 50 AAs bc ORFs that long are uncommon in random seqs

con: some functional seqs that code for proteins are less than 50AAs

20
Q

2 types of annotation?

A

structural: locate genes

functional: locate genes and determine their function

21
Q

how many reading frames does each sequence have

A

6
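
The six frames are three offsets on each strand. A rough sketch of the "ORF longer than 50 AAs" annotation rule scanned over all six; it assumes uppercase A/C/G/T input and the standard stop codons, and it reports reverse-strand coordinates relative to the reverse complement. The function names are made up for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_aa=50):
    """Return (strand, frame, start, n_codons) for ATG...stop ORFs longer than min_aa."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3  # excludes the stop codon
                    if n_codons > min_aa:
                        orfs.append((strand, frame, start, n_codons))
                    start = None
    return orfs

# Toy sequence: ATG, 60 alanine codons, then a stop.
long_orf = "ATG" + "GCT" * 60 + "TAA"
print(find_orfs(long_orf))
```

A real short gene (under 50 AAs) would be missed by this rule, which is the con noted in the annotation card.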

22
Q

how to determine gene function from sequence?

A

genes with similar sequences usually have similar functions and belong to the same “gene family”

23
Q

how do new gene families arise? what are the 2 possible consequences?

A

duplication

homologs from duplication –> paralogs

homologs from speciation –> orthologs

24
Q

what is exon shuffling? what is it trying to explain? why is it maybe inaccurate?

A

exons are inserted into dif protein seqs –> give protein that function

possible way to get new genes

would mean that proteins are highly modular but this is probably unlikely bc inserting a different exon would change the protein folding and function, likely in a LOF way

25
Q

what are pan and core genomes

A

pangenome: any gene in any member of that species

core genome: set of genes in all members of that species
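
In set terms, the pangenome is the union of every strain's gene set and the core genome is the intersection. A minimal sketch with hypothetical strains and gene lists (assumes at least one strain is given):

```python
def pan_and_core(strain_genes):
    """Return (pangenome, core genome) from a dict of strain -> gene names."""
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)          # any gene in any strain
    core = set.intersection(*gene_sets)    # genes present in every strain
    return pan, core

# Hypothetical gene content for three strains of one species.
strains = {
    "strain1": {"gyrA", "recA", "lacZ"},
    "strain2": {"gyrA", "recA", "stx1"},
    "strain3": {"gyrA", "recA"},
}
pan, core = pan_and_core(strains)
print(sorted(pan))
print(sorted(core))
```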

26
Q

what is the c-value paradox? is it really a paradox?

A

complexity does not correlate with genome size

not actually a paradox bc genome size does not indicate number of genes; it mostly reflects the amount of noncoding DNA such as transposable elements

onions have 5x genome size of humans

27
Q

3 points to remember

A

1) genome evo can be rapid bc of transposons but phenotypic evo can still be slow

2) fwd genetic screens only identify 1/2 of the genes in model organisms

3) gene numbers are lower than originally thought

28
Q

2 drawbacks to relying on phys chars for phylogeny

A

1) can’t observe microbes

2) may lead to inaccurate conclusions bc of convergent evolution (for same fxn not due to comm ancestor)

29
Q

why can we infer phylogeny from sequencing

A

seq divergence is approximately linear with time (molecular clock)
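
If divergence grows roughly linearly with time, a pairwise distance can be turned into a time estimate. A simplified sketch: it uses the raw fraction of differing sites (no correction for multiple hits, no gap handling), and the substitution rate is a hypothetical value.

```python
def p_distance(seq1, seq2):
    """Fraction of aligned sites that differ between two sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def divergence_time(distance, rate_per_site_per_year):
    """Molecular clock: both lineages accumulate changes, hence the factor 2."""
    return distance / (2 * rate_per_site_per_year)

# Toy aligned sequences that differ at 2 of 10 sites.
d = p_distance("ACGTACGTAC", "ACGTTCGTAA")
# Hypothetical rate of 1e-9 substitutions per site per year.
print(d, divergence_time(d, 1e-9))
```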

30
Q

2 ways to find functional noncoding sequences (regulatory or RNA gene regions)

A

1) phylogenetic footprinting: sequence distantly related species and look for highly conserved ones

2) phylogenetic shadowing: sequence closely related species and look for ones that are conserved in all
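
Both approaches come down to scanning a multiple alignment for stretches that are identical in every species. A minimal sketch, assuming the sequences are already aligned to equal length; the alignment and the `min_len` cutoff are made up for illustration.

```python
def conserved_runs(alignment, min_len=5):
    """Return (start, end) half-open column ranges identical in every sequence."""
    runs, start = [], None
    n_cols = len(alignment[0])
    for i in range(n_cols + 1):
        # Past the last column, `same` is False so an open run gets closed.
        same = i < n_cols and len({seq[i] for seq in alignment}) == 1
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

# Toy alignment of one noncoding region from three species.
alignment = [
    "ACGTTTGACAGGA",
    "TCGATTGACATGC",
    "GGCTTTGACACGT",
]
print(conserved_runs(alignment))
```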

31
Q

who articulated how gene duplication could drive evo innovation (especially big transitions like invertebrates to vertebrates)

A

Susumu Ohno

32
Q

why does duplication allow for evolutionary innovation

A

NS doesn’t tolerate mutations in functional genes but duplication means one copy can be mutated

33
Q

3 main consequences of gene duplications

A

1) pseudogenization: one copy gets mut that inactivates it

2) neofunctionalization: one copy gets a mutation that gives it an additional function

3) subfunctionalization: one gene has 2 functions and each paralog specializes for one function; loses other

34
Q

what is functional genomics? what categories are contained

A

perform experiments on a genome wide scale

transcriptomics (sequence all mRNAs in cell)

proteomics (identify and quantify all proteins in cell)

phenotypic screens on knockout/deletion collections

35
Q

examples of transcriptomics

A

1) hybridize fluorescent cDNA to microarrays of known DNA seqs –> lots of fluorescence = lots of mRNAs

2) directly sequence cDNA –> most abundant cDNA = most abundant mRNA

36
Q

how can deletion collections help to infer gene function?

A

examine growth in many different conditions –> 80% won’t have any change in phenotype in rich media but 97% will have some defect in specific conditions

37
Q

Design an evolve and resequence study for antibiotic resistance genes in a species of pathogenic bacteria. What would you look for in the sequencing data to determine if a gene was causal?

A

Grow the bacteria in a chemostat with growth medium and the antibiotic. Sequence the genomes before and after the experiment and compare them to find new mutations. If the same gene is mutated in many of the bacteria that survived the antibiotic (ideally across independent replicate populations), it is likely causal.

38
Q

What are the two ways we can perform transcriptomics? What does transcriptomics tell us?

A

One method is microarrays: obtain oligonucleotides for every gene in a cell. Reverse transcribe fluorescent cDNA from mRNAs in the cell → cDNA libraries for before and after changing the conditions. Hybridize the cDNAs to the complementary DNA/oligonucleotides and observe where there is a lot of fluorescence → indicates which mRNAs are present in the most copies

Another method is RNA/cDNA sequencing: reverse transcribe mRNA to make cDNA. See which sequences have the most reads

Transcriptomics tells us the relative abundance of each mRNA in a cell/if a cell responds to changing conditions by up-regulating transcription.
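
For the sequencing route, "most reads = most abundant mRNA" is usually expressed as normalized counts, and a response to changed conditions as a log2 fold change. A rough sketch with made-up read counts; the pseudocount is an assumption added to avoid dividing by zero, and both samples must contain the same genes.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so samples with different depths are comparable."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

def log2_fold_change(before, after, pseudocount=1.0):
    """log2(after / before) per gene, computed on counts per million."""
    cpm_b, cpm_a = counts_per_million(before), counts_per_million(after)
    return {
        g: math.log2((cpm_a[g] + pseudocount) / (cpm_b[g] + pseudocount))
        for g in before
    }

# Hypothetical read counts before and after changing the growth conditions.
before = {"geneA": 100, "geneB": 100, "geneC": 800}
after = {"geneA": 400, "geneB": 100, "geneC": 500}
lfc = log2_fold_change(before, after)
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

A positive value suggests up-regulation after the change and a negative value suggests down-regulation, always relative to the other mRNAs in the sample.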

39
Q

Suppose I perform a transcriptomics and a proteomics study on a certain genotype of yeast cells growing in rich media. I note that on average protein levels correlate with mRNA levels, but the correlation is not perfect. What do I infer from the outliers? Does this have implications for other transcriptomics studies that do not perform proteomics? Explain.

A

The outliers might have lower translation/protein levels than expected given the amount of transcription/mRNAs present in the cell. This might be because of RNA interference. For example, miRNAs or siRNAs can bind to RISC, which then binds to the mRNA after transcription and either destroys the mRNA or prevents translation.

This has implications for transcriptomics studies that don’t include proteomics because they might lead to inaccurate inferences. You may assume a gene codes for an important protein because there is a lot of the mRNA present, when in reality, the mRNA doesn’t get translated in high quantities.

40
Q

Explain how the yeast deletion collection can be used to identify groups of genes involved in similar pathways.

A

Yeast deletion collection: collection of thousands of yeast strains that each have a deletion of one gene. Expose them to some experimental condition and observe which strains have a change in abundance. The ones that change have deletions in genes that are in similar pathways and are important for survival in the experimental condition.
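
One way to sketch the grouping step: score each deletion strain's fitness across several conditions and pair up strains whose profiles correlate strongly. The strain names, fitness values, and the 0.9 cutoff are hypothetical, and the sketch assumes every profile varies across conditions (a flat profile would divide by zero).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similar_pairs(profiles, threshold=0.9):
    """Pairs of deletion strains whose fitness profiles correlate strongly."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    ]

# Hypothetical relative fitness of three deletion strains in four conditions.
fitness = {
    "geneA_del": [1.0, 0.2, 1.0, 0.3],
    "geneB_del": [0.9, 0.1, 1.0, 0.2],
    "geneC_del": [0.3, 1.0, 0.2, 1.0],
}
print(similar_pairs(fitness))
```

Strains that fail in the same conditions are candidates for genes acting in the same pathway.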