Week 22: (B) Putting the Genome Together Flashcards by Hannah Maclean

What are repetitive sequences?

Short, noncoding sequences that are repeated hundreds of times in tandem,
particularly in the centromere

What are transposons?

jumping genes
ancestral viral bits

Mobile genetic elements – sequences of a few kb that can move about the genome. Thousands of copies in eukaryotes

What part of the sequence creates a problem when we try to put our genomes together?

Repetitive sequences

Short reads make it hard to span repeat regions
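
A minimal sketch of why this happens, using a made-up toy genome (the sequences and the helper name `placements` are illustrative assumptions, not from the lecture). A read that lies entirely inside a repeat matches in several places, while a read long enough to reach unique sequence on both sides matches only once.

```python
def placements(genome, read):
    """Start positions where `read` matches `genome` exactly (overlaps allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Toy genome: the same 8-base repeat appears twice, separated by unique sequence.
repeat = "CACACACA"
genome = "GGTT" + repeat + "AGCT" + repeat + "TTGA"

short_read = "CACACA"             # lies entirely inside the repeat
long_read = "TT" + repeat + "AG"  # spans the repeat plus unique flanks

print(len(placements(genome, short_read)))  # 4 possible placements
print(placements(genome, long_read))        # [2], a single placement
```

The short read cannot be placed unambiguously, which is what breaks an assembly at repeats.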

What is a contig?

A ‘contiguous’ (continuous) consensus sequence from an assembly

What is a scaffold?

A series of contigs where we have additional information to place them together in the right order and orientation, but the sequence between the contigs is not complete

What is an assembly? (genome assembly)

The set of scaffolds for one genome.

What is an N50?

The largest contig/scaffold size for which at least 50% of the assembled data is in contigs/scaffolds of that size or larger.
A length-weighted median contig length, where the median is measured in terms of total assembled bases rather than number of contigs.
Can be used to describe how contiguous (and, loosely, how complete) an assembly is
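
A short sketch of the calculation, assuming contig lengths are already available as a plain list (the function name `n50` and the example lengths are made up for illustration).

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that takes us to at least half the total.
        if 2 * running >= total:
            return length


contigs = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
print(n50(contigs))  # 80 + 70 = 150 is half of 300, so N50 is 70
```

Note that N50 depends only on the lengths of the pieces, so it says nothing about whether the pieces are correct.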

What is coverage?

number of reads covering any one position on average

What is read length?

The number of bases in a single read

What is overlap?

number of bases overlapping

Number of bases used to join one read to another
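
A rough sketch of finding and using an overlap, assuming exact matches only (real assemblers tolerate sequencing errors). The function names, the `min_overlap` parameter and the two reads are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    min_overlap = max(min_overlap, 1)  # an overlap of 0 bases is not an overlap
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the best overlap wins.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def join_reads(left, right, min_overlap=3):
    """Merge two reads through their overlap, or return None if there is none."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


read_a = "ATGCGTAC"
read_b = "GTACCTTA"
print(overlap_length(read_a, read_b))  # 4 (the shared bases are GTAC)
print(join_reads(read_a, read_b))      # ATGCGTACCTTA
```

Requiring a minimum overlap reduces the chance of joining two reads that only match by coincidence.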

How do you calculate coverage?

Total bp of reads sequenced (number of reads × read length) divided by the total genome length
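
A worked example of that arithmetic, assuming every read has the same length. The function names and the genome size, read length and read count are made-up illustration values.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Average coverage: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest whole number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


# Hypothetical 5 Mb bacterial genome sequenced with 150 bp reads.
print(coverage(1_000_000, 150, 5_000_000))  # 30.0, i.e. 30X
print(reads_needed(10, 150, 5_000_000))     # 333334 reads for 10X
```

This is an average, so some positions will be covered by more reads and some by fewer.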

What is a read?

an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment.
one sequence

How do we overcome repetitive regions? Solution 1

Need longer reads to span repeat regions. Illumina was good at this, with reads up to 300 bp; Sanger reads were up to ~1500 bp

How long can repeats span?

10 bases to tens of thousands

How can we reduce the number of repeats we have to deal with?

Sequencing smaller chunks, e.g. a BAC (bacterial artificial chromosome) clone

The repeat may only occur once in the BAC but many times in the genome

How do we overcome repetitive regions? Solution 2

Getting the sequence from both ends of long fragments,
even though we don’t know what’s in the middle

> If we know how long that fragment is, we know how far apart those 2 sequences are

> paired-end reads
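
A minimal sketch of that distance calculation, assuming the fragment length is known exactly and both reads have the same length. The function names and numbers are hypothetical, and `can_bridge` is a deliberately conservative simplification: it only says yes when the whole repeat fits in the unsequenced middle.

```python
def inner_distance(fragment_length, read_length):
    """Unsequenced bases between the two reads of a pair."""
    gap = fragment_length - 2 * read_length
    return max(gap, 0)  # reads from very short fragments overlap: no gap


def can_bridge(repeat_length, fragment_length, read_length):
    """True if a repeat fits in the gap, so both reads can sit in unique flanks."""
    return repeat_length <= inner_distance(fragment_length, read_length)


# Hypothetical 3 kb fragment with 150 bp read from each end.
print(inner_distance(3000, 150))    # 2700
print(can_bridge(2000, 3000, 150))  # True
print(can_bridge(7000, 3000, 150))  # False: needs a longer fragment
```

In practice fragment lengths vary around a mean, so the distance is an estimate rather than an exact value.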

How do we sequence the ends of a fragment?

Sequence each end with a different primer

What is paired-end sequencing or mate-pair sequencing?

When we sequence each end of a long fragment

What areas are in our genome?

Protein-coding regions
Repetitive regions
tRNA (many)
rRNA (many)
Transposons

What are examples of repetitive regions?

Microsatellites, telomeres, intron sequences

How would you describe protein coding regions?

Generally not repetitive, but there are some exceptions, e.g. filaggrin and high copy number genes

Why does the size of the fragments matter?

The fragment size sets the gap between paired-end reads (mate pairs), and so how large a repeat the pair can bridge.

The gap can range from 500 bp to 20 kb

How long are the longest repeats?

~7kb

What graph shows the distance between 2 paired-end reads?

assemblygram
arcs represent the known number of bases between known sequences of bases
used in Illumina method

What does a coverage of 10X mean?

A coverage of 10X means that each base is on average found in 10 reads. The deeper the coverage, the more clearly any sequence or structural changes can be discerned from sequencing error

What is ploidy?

The number of copies of the genome in the organism.
Bacteria = 1; Human = 2; Potato = 4; Strawberry = 8
The higher the ploidy, the harder it is to accurately assemble.

What happens as coverage gets deeper?

The more clearly any sequence or structural changes can be discerned (distinguished) from sequencing error. It becomes more reliable to tell a genuine variant from a base that was not sequenced correctly

What is an example of variation between genomes (e.g. between the 2 genome copies in humans)?

SNV (single nucleotide variant)

How do you know a variant is genuine?

Take a reference sequence (e.g. from a bacterium) and sequence at a high read depth. If there is a consistent SNV at the same position across the reads, then it is a genuine variant
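
A simplified sketch of that idea, assuming the reads are already aligned to the reference with no insertions or deletions and are given as (start position, sequence) pairs. The function name `call_snvs`, the thresholds `min_depth` and `min_fraction`, and the toy data are all illustrative assumptions, not a real variant caller.

```python
from collections import Counter


def call_snvs(reference, aligned_reads, min_depth=5, min_fraction=0.8):
    """Report positions where most reads consistently disagree with the reference."""
    # One base counter per reference position (a very simple pileup).
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust a call here
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            variants.append((pos, reference[pos], base, count, depth))
    return variants


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGT"),    # T at position 4 in every read covering it
    (1, "CGTTCGTA"),
    (2, "GTTCGTAC"),
    (0, "ACGTTCGTAC"),
    (3, "TTCGTAC"),
    (2, "GTTCGAAC"),    # lone A at position 7: likely a sequencing error
]
print(call_snvs(reference, reads))  # [(4, 'A', 'T', 6, 6)]
```

The single mismatch at position 7 is ignored because only 1 of the 6 reads supports it, while the change at position 4 is seen in all 6.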

What are the challenges of short reads and re-sequencing?

> Is it a sequencing error or is the gene really missing (deleted)? Hard to tell when genes are small
> Duplication: would you see the duplication with short reads?
> Inversions and translocations (structural variants): as we are mapping against a reference, we do not expect that a sequence in the other genome may have come from somewhere else

What are the structural variants?

Deletion, duplication, inversion, translocation

What is phasing?

being able to assign different alleles to specific chromosomes (haplotypes)

What happens as ploidy increases when sequencing genomes?

The harder it is to analyse structural and sequence variants; more data and longer reads are needed

What is an example of a state-of-the-art sequencing technique?

PacBio

What type of sequencing does PacBio do?

Single-molecule real-time (SMRT) sequencing: long reads (10 kb+), high error rate (5-15%)

How does nanopore sequencing work?

The membrane is impermeable to the current, but the pore can pass it, so current (ions) can flow only through that particular pore. If we take DNA and a particular accessory protein, the protein will feed the DNA through that pore

What is the outcome when different-sized bases block the pore?

They change the current that can flow through the pore

What does the hairpin strand do?

It lets you go round and sequence one strand, then go round and sequence the other, to check the strands and see if you have the same sequence

What 2 strategies are there for a Nanopore MinION?

Single-strand sequencing | hairpin adapter

What is the graph produced by Nanopore MinION?

Flow electropherogram

What are you measuring in Nanopore MinION?

the raw sequence

How can a base be modified?

By an epigenetic marker, e.g. methylcytosine

What is unique about the Oxford Nanopore MinION?

It can detect epigenetic markers directly from the raw DNA signal, which most other sequencing machinery cannot do

What is the issue with sequencing long DNA sequences?

Snapping them in half: long DNA is sticky and easy to break

How do you get round all the errors?

Do many reads to overcome errors, reaching around 95-98% accuracy
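
A toy sketch of why many reads help, assuming the reads cover the same stretch, are the same length, and contain substitution errors only (insertions and deletions are ignored here, which is a big simplification for long-read data). The function names and sequences are made up for illustration.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus of equal-length reads of the same sequence."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch only handles equal-length reads")
    # zip(*reads) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def accuracy(sequence, truth):
    """Fraction of positions where `sequence` agrees with `truth`."""
    matches = sum(a == b for a, b in zip(sequence, truth))
    return matches / len(truth)


truth = "ACGTACGTAC"
noisy_reads = [
    "ACGAACGTAC",  # error at position 3
    "ACGTACCTAC",  # error at position 6
    "TCGTACGTAC",  # error at position 0
    "ACGTACGTAG",  # error at position 9
    "ACGTTCGTAC",  # error at position 4
]
print([accuracy(read, truth) for read in noisy_reads])  # each read is 0.9
print(accuracy(consensus(noisy_reads), truth))          # 1.0
```

Because the errors fall at different positions in different reads, the majority base at each position is correct even though no single read is.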

(45 cards)