Genome Analysis Flashcards

1
Q

What are the two basic approaches to sequencing a genome with next-generation sequencing (NGS)?

A

Random sequencing: sequence random fragments and stitch them together with a computer. Better for small genomes such as bacterial genomes.

Targeted sequencing of an overlapping set of large fragments: obligatory for plant genomes because they are so big.

2
Q

Why does random sequencing not work for large genomes?

A

Large genomes contain many repeats. Sequence reads that lie entirely inside a multi-copy sequence dispersed across the genome cannot be used to join two sequence contigs.
The repeat copies cannot be lined up because they are all identical.
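
The repeat problem can be seen in a toy example. The sketch below is illustrative only: the genome string, the repeat and both reads are made up, and matching is exact, which real read mappers do not require. A read that spans unique sequence has one possible location, while a read lying entirely inside the repeat has several.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every start position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy genome: unique regions separated by two copies of one repeat.
REPEAT = "ACGTACGTAC"
genome = "TTGACCA" + REPEAT + "GGCTTAG" + REPEAT + "CATCGGA"

unique_read = "CCAACGT"  # starts in unique sequence and runs into the repeat
repeat_read = "CGTACGT"  # lies entirely inside the repeat

print(find_all(genome, unique_read))  # one candidate position
print(find_all(genome, repeat_read))  # one candidate position per repeat copy
```

Only the first read can anchor a join between contigs, because the second one fits equally well in every copy of the repeat.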

3
Q

How do you analyse larger plant genomes?

A

Sequence only the expressed genes. Two basic ways:

  • NGS sequencing of cDNA = “RNA-seq”. No need to clone the cDNA, just do NGS.
  • Exome capture NGS.
4
Q

How does exome capture NGS work?

A

Regions of interest can be harvested by synthesising oligonucleotides that are complementary to part of them and fixing these onto an array.
Alternatively, attach the probes to beads: the probes hybridise with the target DNA, and only the fragments bound to probes are sequenced.
Some sequence information is needed in the first place to design the probes.
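
The capture step can be pictured as a filter over DNA fragments. This is a minimal sketch under strong assumptions: hybridisation is modelled as exact complementarity, fragments are single-stranded strings, and the probe and fragment sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(fragments: list[str], probes: list[str]) -> list[str]:
    """Keep only fragments that at least one probe can hybridise to.

    A probe is assumed to bind wherever the fragment contains the probe's
    exact reverse complement (a simplification of real hybridisation).
    """
    targets = [reverse_complement(p) for p in probes]
    return [f for f in fragments if any(t in f for t in targets)]


# Hypothetical probe designed against a known exon sequence.
probes = ["TTGCAC"]
fragments = ["AAGTGCAATT", "CCCCGGGGAA", "TTTTAAAACC"]

captured = capture(fragments, probes)
print(captured)  # only these fragments would go on to be sequenced
```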

5
Q

What are genetic maps?

What are physical maps?

A

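Genetic maps are measured in genetic map distances (recombination frequency, in centimorgans). Positions are mapped using genes or genetic markers (that give characteristics).
Physical maps are an ordered set of clones or inserts with markers on them (these can be the same as the genetic markers).
The two kinds of map can be linked together through shared markers.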

6
Q

How can you sequence large plant genomes?

A

Targeted sequencing of an overlapping set of large fragments.

7
Q

What is required for sequencing large plant genomes?

A

A high density genetic map with loads of mapped markers.
BAC library(s) – i.e. the entire genome in cloned form.
A “minimum tiling path” set of BAC clones (can be stored in vectors).
Next generation sequencing – LOTS of it!
Bioinformatics- powerful computation resources.
Need backup.
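
Choosing a minimum tiling path can be sketched as a greedy interval-cover problem. This assumes each BAC clone already has start and end coordinates on the physical map; the clone names and positions (in kb) are hypothetical, and real pipelines work from fingerprint or marker data rather than known coordinates.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Pick a small set of overlapping clones covering [region_start, region_end].

    clones: list of (name, start, end) tuples with positions on the physical map.
    Greedy rule: among clones that overlap the covered point, take the one
    that extends furthest to the right.
    """
    path = []
    covered_to = region_start
    while covered_to < region_end:
        candidates = [c for c in clones if c[1] <= covered_to < c[2]]
        if not candidates:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        best = max(candidates, key=lambda c: c[2])
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical BAC clones with map positions in kb.
clones = [
    ("BAC_A", 0, 120),
    ("BAC_B", 40, 150),
    ("BAC_C", 100, 230),
    ("BAC_D", 140, 260),
    ("BAC_E", 220, 350),
]

tiling = minimum_tiling_path(clones, 0, 350)
print(tiling)
```

Five clones cover the region, but three are enough, so only those three would need to be sequenced.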

8
Q

What are BAC libraries?

A

BAC = Bacterial Artificial Chromosome.
Effectively a plasmid, with the hardware to survive as a pseudochromosome in bacteria. Big and stable. The yeast analogues (YACs) that were used before kept recombining. You do not get many copies per cell. These are our bite-sized pieces.
Used to clone large DNA fragments in E. coli.
Insert size typically 100-150 kb.
Can be grown like a bacterial plasmid and screened for gene content by hybridization or PCR.

9
Q

What are the applications of BAC libraries?

A

Whole genome sequencing.
Fluorescent in situ hybridisation (the BAC DNA can be fluorescently labelled and hybridised to chromosomes to see where the BAC comes from).
Construction of integrated genetic and physical maps.
Positional cloning.
Fine mapping of interesting genes.
Chromosome walking and contig assembly.

10
Q

What is fluorescent in situ hybridisation (FISH)?

A

Visual mapping of DNA clones on the chromosome.
Chromosome assignment.
Make a DNA probe fluorescent, hybridise it to a chromosome spread and see where the probe binds.
Can be used to work out the relationship between the genetic and physical distances between genes.

11
Q

How can genetic and cytogenetic maps be compared?

A

The scale of the genetic map can change a lot relative to the physical map.
A middle section of the physical map can disappear from the genetic map because there is no recombination in that region.
Genetic maps only work with recombination: without recombination there is no measurable distance.
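
The dependence on recombination can be made concrete with a small calculation. This sketch uses the simple approximation that map distance in centimorgans equals the percentage of recombinant offspring, which only holds for short distances; the counts and the two marker pairs are hypothetical.

```python
def map_distance_cm(recombinants: int, total_offspring: int) -> float:
    """Approximate genetic distance in centimorgans as % recombinant offspring."""
    return 100.0 * recombinants / total_offspring


# Two hypothetical marker pairs, each 5 Mb apart on the physical map.
arm_pair = map_distance_cm(recombinants=50, total_offspring=1000)
centromere_pair = map_distance_cm(recombinants=0, total_offspring=1000)

print(f"chromosome arm:    {arm_pair} cM over 5 Mb")
print(f"near centromere:   {centromere_pair} cM over 5 Mb")
```

The same physical distance gives very different genetic distances, and a region with no recombination collapses to a single point on the genetic map.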

12
Q

How does BAC fingerprinting work?

A

If two clones overlap, they will have restriction fragments in common.

You could instead sequence the ends of all the clones.
But in a big genome with lots of repeats, an end sequence could match many places rather than just one.
So take a restriction fingerprint of each clone and see which other clones share fragments with it.
Minimum tiling path: the smallest set of clones that covers the region, with overlaps kept as small as possible.
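
Fingerprint comparison can be sketched as counting restriction fragments of matching size. The fragment sizes below are invented, and the 2 bp tolerance is an arbitrary assumption standing in for gel measurement error; real fingerprinting software also scores how likely a match is to occur by chance.

```python
def shared_bands(fp_a: list[int], fp_b: list[int], tolerance: int = 2) -> int:
    """Count fragments in fp_a that have a size match in fp_b.

    Sizes match if they differ by at most `tolerance` bp. Each fragment
    in fp_b can be matched only once.
    """
    unused = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for i, other in enumerate(unused):
            if abs(size - other) <= tolerance:
                shared += 1
                del unused[i]
                break
    return shared


# Hypothetical fingerprints: restriction fragment sizes in bp for three clones.
clone_1 = [510, 1200, 2350, 3100, 4800]
clone_2 = [2351, 3099, 4800, 6100, 7200]
clone_3 = [450, 900, 1750, 5200]

print(shared_bands(clone_1, clone_2))  # several shared fragments: likely overlap
print(shared_bands(clone_1, clone_3))  # none shared: no evidence of overlap
```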

13
Q

What is Illumina?

A

Illumina NGS is the dominant NGS method.

Short reads (76-150bp).
300M reads per lane.
200Gb+ per run.

Each lane costs about £1500.
Run would be a weekend.
Gb- equivalent to about 7 human genomes.
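
The figures on this card can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above (300M reads per lane, 76-150 bp reads, about £1500 per lane, 200 Gb per run) and assumes each read is counted once; the number of lanes in a run is not stated on the card, so it is derived rather than assumed.

```python
import math

READS_PER_LANE = 300_000_000  # from the card: 300M reads per lane
LANE_COST_GBP = 1500          # from the card: about £1500 per lane


def lane_yield_gb(read_length_bp: int, reads: int = READS_PER_LANE) -> float:
    """Sequence yield of one lane in gigabases."""
    return reads * read_length_bp / 1e9


def lanes_needed(target_gb: float, read_length_bp: int) -> int:
    """Whole lanes required to reach a target yield."""
    return math.ceil(target_gb / lane_yield_gb(read_length_bp))


for read_length in (76, 150):
    gb = lane_yield_gb(read_length)
    print(
        f"{read_length} bp reads: {gb:.1f} Gb per lane, "
        f"about £{LANE_COST_GBP / gb:.0f} per Gb, "
        f"{lanes_needed(200, read_length)} lanes for 200 Gb"
    )
```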

Start with DNA, chop it up, and fill in the ragged ends with an enzyme. Add adaptors: these provide the recognition point for sequencing and a hook to attach the fragment to the machine. The same adaptor is added at both ends. Amplify each fragment up to a few thousand copies, then use fluorescence-based sequencing to sequence the resulting clone.

14
Q

How does Illumina work on a molecular level?

A

(Colours refer to the adaptor ends in the accompanying diagram.)
Polymerase copies the first strand. The original (blue) molecule is melted away.
The copy is bolted to the machine.
The purple end is sticky to blue: the strand can loop round and join with a short blue oligo on the surface.
Polymerase can then copy round from the short blue oligo.
Unstick the strands. Repeating this produces clusters of identical sequences, all attached to the machine and amplified enough to produce a signal.
Anneal a primer to the purple end of the clones, pointing down, and then add bases. Light flashes in different colours, indicating which base has been added.
The signal is the same for all copies in a cluster since they are clones.

15
Q

How can small gaps in the genome be bridged?

A

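Small gaps can be bridged by paired-end NGS (like standard Illumina sequencing, but each fragment is sequenced from both ends). Because the fragment length is roughly known, a pair with one read in each contig links the two contigs and indicates the size of the gap between them.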
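
Gap size estimation from read pairs comes down to a subtraction. This is a rough sketch assuming the fragment (insert) length is known and fixed at 500 bp, which is a made-up value; in practice insert sizes vary, so the estimate is taken over many pairs. All positions are hypothetical.

```python
from statistics import median


def estimate_gap(insert_size: int, bases_in_contig_a: int, bases_in_contig_b: int) -> int:
    """Estimate the gap between two contigs from one read pair.

    insert_size:       assumed total length of the sequenced fragment
    bases_in_contig_a: fragment bases lying in contig A (read 1 start to contig end)
    bases_in_contig_b: fragment bases lying in contig B (contig start to read 2 end)
    """
    return insert_size - bases_in_contig_a - bases_in_contig_b


# Hypothetical read pairs that each span the same gap.
pairs = [(180, 220), (150, 260), (200, 190)]
estimates = [estimate_gap(500, a, b) for a, b in pairs]

print(estimates)
print(median(estimates))  # combined estimate of the gap size in bp
```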

16
Q

How can large gaps in the genome be bridged?

A

Construct mate-pair NGS libraries.

Fragments are end-repaired using biotinylated nucleotides.

The DNA is circularized by DNA ligation.

The DNA circles are fragmented, and biotinylated fragments are purified by affinity capture. Sequencing adapters are ligated to the ends of the captured fragments.
(The captured pieces span the ligation junction, so the two original ends sit side by side in an inverted arrangement.)

The DNA is sequenced at both ends using standard Illumina paired-end NGS.

17
Q

How can we bridge huge gaps?

A

Pacific Biosciences NGS.

DNA polymerase is immobilised at the bottom of wells and nucleotides diffuse in. Each of the four bases is labelled with a different fluorescent dye. A base held by the polymerase prior to incorporation emits an extended signal that identifies the base being incorporated.

Individual reads tend to be inaccurate, but the same molecule can be sequenced multiple times and the reads aligned to get an accurate consensus.
The slide has thousands of little wells. DNA polymerase is tethered to the bottom of each well, DNA is added, and the polymerase starts copying it.
As the polymerase adds bases, it has to pause for each one, which allows the colour corresponding to that base to be measured.
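
Building an accurate sequence from several inaccurate passes can be sketched as a per-position majority vote. This assumes the reads are already aligned, of equal length, and contain only substitution errors; real long-read errors include insertions and deletions, so actual consensus tools are more involved. The reads are invented.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Return the most common base at each position of pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Hypothetical passes over the same molecule, each with one error.
passes = [
    "ACGTTGCA",
    "ACCTTGCA",
    "ACGTTGGA",
    "ATGTTGCA",
    "ACGTAGCA",
]

print(consensus(passes))
```

Each pass has an error at a different position, so every error is outvoted by the other reads.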

18
Q

What is nanopore?

A
Nanopore genome sequencer.
Third-generation sequencing method.
Claimed to sequence a genome in 15 minutes.
Billed as the next revolution in sequencing.
Uses enzymes tethered to a pore. The enzyme controls the passage of the DNA strand through the pore.

Measures the flow of current through the pore.
If something is in the way, the current changes. Ion flow is disrupted by whatever is going through.
Each base produces a different current change as the DNA passes through the pore.

19
Q

What is needed to assemble the contiguous DNA sequence?

A

The amounts of sequence data are huge – so automated assembly of contiguous DNA sequence from overlapping sequence reads is essential.

Getting useable DNA sequence:

  • Base identification.
  • Quality trimming of sequence ends (take the reads and get rid of junk such as adapters and low-quality, error-prone bases).
  • Vector/linker removal.
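
The clean-up steps in this list can be sketched for a single read. This is a simplified illustration, not how any particular trimming tool works: the adapter is matched exactly, the quality threshold of 20 is an arbitrary assumption, and the read and its quality scores are invented.

```python
def trim_read(seq: str, quals: list[int], adapter: str, min_quality: int = 20):
    """Remove adapter sequence, then trim low-quality bases from the 3' end.

    seq:   base calls for one read
    quals: one quality score per base (higher is better)
    """
    # Adapter removal: cut at the first exact adapter match, if any.
    idx = seq.find(adapter)
    if idx != -1:
        seq, quals = seq[:idx], quals[:idx]
    # Quality trimming: drop bases from the end until one passes the threshold.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read: 9 bases of insert followed by adapter read-through.
adapter = "AGATCGGAAGAGC"
read = "ACGTACGTT" + adapter
scores = [38, 38, 37, 36, 35, 30, 25, 12, 8] + [5] * len(adapter)

trimmed_seq, trimmed_quals = trim_read(read, scores, adapter)
print(trimmed_seq, trimmed_quals)
```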
Assembly software:
- Depends upon the sequence type being assembled.
- Sequencer providers supply software, and third-party assemblers are also used.
e.g. Roche-454 = Newbler.
Solexa-Illumina = Velvet (a third-party assembler for short reads).

Assemblers tend not to work with the output of sequencers they were not designed for.

20
Q

How are genomes assembled?

A

Fragment DNA and sequence.
Find overlaps between reads.
Assemble overlaps into contigs (contiguous stretches of DNA).
Assemble contigs into scaffolds (sets of contigs that are known to be linked together, with gaps between them).
The result is large pieces of DNA, which then need to be interpreted.
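
The overlap and contig steps can be sketched with a toy greedy assembler. This is an illustration of the idea only, assuming error-free reads from one strand and a minimum overlap of 3 bases chosen arbitrarily; it is not how production assemblers such as Velvet work internally. The reads are invented.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical error-free reads from a 14 bp sequence.
reads = ["ATGGCGTG", "GCGTGCAA", "TGCAATCC"]
contigs = greedy_assemble(reads)
print(contigs)
```

If two groups of reads share no overlap, the function returns more than one contig, which is where paired-end or mate-pair links are needed to build scaffolds.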

21
Q

What is Tablet?

A

An alignment viewer for NGS data.
A viewer for looking at NGS reads: each line is one read, with the reads stacked vertically.
Shows the protein sequence as well as the genes, and which codons code for each amino acid.

22
Q

What is Strudel?

A

A tool for comparative genome analysis.