Molecular Genetics (Gary Barker 16-18) Flashcards

Question 1

Q

How long did it take to first sequence the human genome?

Answer

A

13 Years (1990-2003) by capillary Sanger sequencing.

Question 2

Q

How many human genomes can an Illumina sequencer read in one day?

Answer

A

20!

1,000 billion bases per run.

Question 3

Q

Traditionally, how does shotgun sequencing work?

Answer

A

Shotgun sequencing decodes a genome by breaking it into smaller fragments, which can be ligated into BACs and sequenced individually.
A random BAC is selected and sequenced; the sequences are then ordered based on the overlaps between their DNA sequences.
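The overlap-based ordering described above can be sketched in a few lines. This is a minimal greedy illustration, not the method used by any real assembler: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the toy reads are all assumptions made for the example, and the reads are assumed to be error-free and from the same strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        # Compare every ordered pair of reads (this is the costly step).
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical toy reads taken from one short sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)  # ['ATGGCGTACCTGA']
```

Because every read is compared with every other read, the work grows roughly with the square of the number of reads, which is why this approach becomes impractical for very large genomes.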

Question 4

Q

Explain how next generation (Illumina) sequencing works.

Answer

A

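The genome is fragmented and the fragments are sequenced in parallel by synthesis: each base is tagged with a differently coloured fluorescent label, which is read as it is incorporated.
No prior knowledge of the sequence is required.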

Question 5

Q

Why is it better to have a higher coverage?

Answer

A

Coverage is the number of reads that include a given nucleotide position.
Higher coverage leaves fewer gaps in the genome (10-fold sequencing greatly reduces the chance of a base having 0 coverage).
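The effect of coverage on gaps can be estimated with a simple model. The sketch below assumes reads land uniformly at random, so the number of reads covering a base follows a Poisson distribution and the chance of a base being missed is e^(-coverage). That model, the function names, and the 3.2 billion base genome size are assumptions for illustration and are not taken from the card.

```python
import math


def prob_uncovered(coverage: float) -> float:
    """Poisson probability that a given base is covered by zero reads."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size: float, coverage: float) -> float:
    """Expected number of bases with no read at all."""
    return genome_size * prob_uncovered(coverage)


GENOME_SIZE = 3.2e9  # assumed approximate human genome size, in bases

for c in (1, 5, 10):
    print(f"{c}x: P(uncovered) = {prob_uncovered(c):.2e}, "
          f"expected missing bases = {expected_uncovered_bases(GENOME_SIZE, c):,.0f}")
```

Under these assumptions roughly 37% of bases are missed at 1-fold coverage, while at 10-fold the chance falls to about 5 in 100,000 per base.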

Question 6

Q

Overlap finding is not practical for large genomes (e.g. human). What is an alternative?

Answer

A

K-mer based assembly:

Sequences are broken down further into K-mers (subsequences of length K).
Every K-mer overlaps with the one next to it, offset by a single base.
A computer can handle this better, as it needs to find fewer overlaps.
Use the smallest K-mer that produces the best N50 value.
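To make the two ideas on this card concrete, here is a minimal sketch of splitting a read into K-mers and of computing an N50 from a set of contig lengths. The function names and the example values are hypothetical; N50 is taken here in its usual sense, the contig length at which the running total of contigs, sorted longest first, reaches half of the assembly.

```python
def kmers(seq: str, k: int):
    """All substrings of length k, each offset by one base from the last."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def n50(contig_lengths):
    """Length of the contig at which the cumulative total reaches 50%."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(kmers("ATGGCGT", 4))            # ['ATGG', 'TGGC', 'GGCG', 'GCGT']
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

In practice one would assemble with several values of K and compare the N50 of each result, as the card suggests.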

Question 7

Q

Repeats in eukaryotic genomes can cause problems in shotgun sequencing as they can get ‘lost’. What are two solutions to this problem?

Answer

A

1) Use Illumina mate-pair libraries, where only the sequences at either end of the repeat need to be read.
2) Use an Oxford Nanopore sequencer, which can cover gaps because it reads long sequences.
- It holds the record for sequencing 1 megabase in a single read.

Question 8

Q

Why are prokaryotic genomes easier to sequence than eukaryotic ones?

Answer

A

Prokaryotes have few repeats; eukaryotes have lots of repeats.
Prokaryotic genomes can be assembled directly from Illumina paired-end reads; eukaryotes need mate-pair reads or nanopore reads longer than the repeats.
Prokaryotes are cheaper to sequence, as eukaryotes require many more Illumina reads.

Question 9

Q

What are the advantages and disadvantages of the Illumina HiSeq and the Oxford Nanopore?

Answer

A

Illumina is highly accurate (>99.9%), but nanopores have a lower accuracy (90%).
Nanopores can sequence long reads and are quicker, whereas Illumina can only sequence short reads.
Nanopores can span gaps/repeats and are portable.
Illumina is better for whole genome sequencing.

Question 10

Q

Give 4 reasons as to why we bother sequencing genomes.

Answer

A

1) To characterise all genes and regulatory elements.
2) To identify pathways and co-located genes.
3) To compare genomes (e.g. mutant vs wild type).
4) To identify candidate markers (SNPs associated with phenotypes).

Question 11

Q

What are the advantages of an Exome Capture Array?

Answer

A

Time-saving and cost-effective compared to PCR-based methods.
Lets you concentrate on just the exons in genomes that have not been sequenced yet or are too expensive to sequence in full (e.g. looking only at grain size in wheat for bread).

Question 12

Q

How does exome-capture work?

Answer

A

Probes called capture baits are designed based on a cDNA copy of the mRNA.
The oligonucleotide bait sequences are tiled out on an array.
Target genomic DNA is extracted and sonicated, producing fragments with coding regions, non-coding regions, or both.
Much of the non-coding DNA does not hybridise to the baits and so is not captured, leaving the coding regions to study.

Question 13

Q

Locating ORFs can be difficult, especially where introns break up the coding region. How can we locate genuine ORFs?

Answer

A

The genuine ORF is usually the longest one.
Look at codon usage: there are many ways to code for an amino acid, but some codons are preferred to others. Real ORFs will show a codon bias (e.g. AGC for serine), but non-coding ORFs will use codons roughly equally.
Look for intron boundaries: the 5' exon-intron junction usually reads AG|GTAAGT (the intron itself starts with GT), and introns end with 6 pyrimidines followed by any base and then CAG (YYYYYYNCAG).
observe expression levels via RNAseq or a microarray
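The first two checks above (longest ORF, then codon usage) can be sketched as follows. This is a simplified illustration under stated assumptions: it scans only the three forward reading frames, ignores introns entirely, and uses a made-up toy sequence. The names `longest_orf` and `codon_usage` are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Longest ATG...stop stretch across the three forward reading frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close it and look for the next ATG
    return best


def codon_usage(orf: str) -> Counter:
    """Count how often each codon appears in an in-frame ORF."""
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


toy = "CCATGAGCAGCTCATAAGG"  # hypothetical sequence for illustration
orf = longest_orf(toy)
print(orf)               # ATGAGCAGCTCATAA
print(codon_usage(orf))  # AGC appears twice, TCA once (both encode serine)
```

Comparing the counts for synonymous codons, such as AGC and TCA for serine here, against the bias expected for the organism is one way to judge whether a candidate ORF looks like real coding sequence.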

Question 14

Q

Compare RNAseq and microarrays for analysing gene expression.

Answer

A

RNAseq covers all expressed genes, but microarrays involve making fluorescent probes for already known genes.
RNAseq works for non-model species but microarrays are only useful for previously characterised species.
RNAseq has no setup cost but costs £1000 per sample; microarrays are expensive to make but can then run multiple samples at £250 each.
RNAseq may be dominated by a few highly expressed genes; in microarrays, any single gene can fluoresce without affecting the others.
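The cost trade-off above comes down to a break-even point. The sketch below uses the per-sample prices from the card, but the card gives no figure for the microarray setup cost, so `setup_cost` is a hypothetical parameter and the £6000 used in the example is invented purely for illustration.

```python
import math

RNASEQ_PER_SAMPLE = 1000      # £ per sample, from the card
MICROARRAY_PER_SAMPLE = 250   # £ per sample, from the card


def rnaseq_cost(n_samples: int) -> int:
    """Total RNAseq cost: no setup cost, flat price per sample."""
    return RNASEQ_PER_SAMPLE * n_samples


def microarray_cost(n_samples: int, setup_cost: int) -> int:
    """Total microarray cost: one-off array design plus price per sample."""
    return setup_cost + MICROARRAY_PER_SAMPLE * n_samples


def break_even_samples(setup_cost: int) -> int:
    """Smallest sample count at which microarrays are no dearer than RNAseq."""
    saving_per_sample = RNASEQ_PER_SAMPLE - MICROARRAY_PER_SAMPLE
    return math.ceil(setup_cost / saving_per_sample)


ASSUMED_SETUP = 6000  # hypothetical array design cost in £, not from the card
n = break_even_samples(ASSUMED_SETUP)
print(n, rnaseq_cost(n), microarray_cost(n, ASSUMED_SETUP))  # 8 8000 8000
```

Below the break-even number of samples RNAseq is the cheaper option; above it, the one-off cost of the array is recovered.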

Question 15

Q

If a BLAST search comes back as inconclusive, what four questions can you investigate to determine a gene’s function?

Answer

A

What happens to expression under various stresses?
What happens when you knock the gene out?
What happens when you overexpress the gene?
What genes have similar responses to stresses?

Question 16

Q

Give 2 examples where the 16S rRNA gene has been useful in metagenomic analysis.

Answer

A

(It is universal in bacteria, so flanking primers can be designed to amplify a variable region.)

1) Sequencing of two regions of the 16S rRNA gene from the gut of IBS patients concluded that children with IBS have more Gammaproteobacteria (Dorea & Haemophilus).
2) 16S rRNA sequencing of 27 body sites in 7 adults, at 4 points in time, showed that microbes cluster by habitat (rather than by individual or time).