Week 22: (B) Putting the Genome Together Flashcards

1
Q

What are the repetitive sequences?

A

Short, noncoding sequences that are repeated hundreds of times in tandem
Particularly common in the centromere

2
Q

What are transposons?

A

jumping genes
ancestral viral bits

Mobile genetic elements – sequences of a few kb that can move about the genome. Thousands of copies in eukaryotes

3
Q

What part of the sequence creates a problem when we try and put our genomes together?

A

repetitive sequences

short reads make it hard to overcome repeat regions

4
Q

What is a contig?

A

A ‘contiguous’ (continuous) consensus sequence from an assembly

5
Q

What is a scaffold?

A

A series of contigs where we have additional information to place them together in the right order and orientation, but the sequence between the contigs is not complete

6
Q

What is an assembly? (genome assembly)

A

The set of scaffolds for one genome.

7
Q

What is an N50?

A

The length of the largest contig/scaffold such that 50% of the assembled bases sit in contigs/scaffolds of that length or longer.
A median contig length, where the median is measured in terms of the total assembled genome rather than the number of contigs.
Can be used to describe how complete an assembly is
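
A minimal Python sketch of this definition may help. The contig lengths are made up for illustration, and the function name `n50` is a hypothetical choice, not something from the course material.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, adding up bases as we go.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that takes us to at least half the total.
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Illustrative lengths (in kb): total 290, so half is 145.
# 80 + 70 = 150 reaches the halfway point, so the N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))
```

Sorting longest-first is what makes this a length-weighted median: a few long contigs can account for half the assembly even when most contigs are short.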

8
Q

What is coverage?

A

number of reads covering any one position on average

9
Q

What is read length?

A

The number of bases in a single read

10
Q

What is overlap?

A

number of bases overlapping

Number of bases used to join one read to another
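
As a rough illustration, the overlap between two reads can be found by looking for the longest suffix of one read that matches a prefix of the next. This is a simplified sketch that assumes error-free reads on the same strand; the reads, the function name `overlap_length` and the `min_overlap` threshold are all made up for the example.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    Assumes min_overlap >= 1 and exact (error-free) matching.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


read_a = "ATGCGTACGT"
read_b = "TACGTTTGCA"
shared = overlap_length(read_a, read_b)

# Join the reads by keeping the overlapping bases only once.
merged = read_a + read_b[shared:]
print(shared, merged)
```

Real assemblers tolerate mismatches and check both strands, so this only shows the basic idea of joining reads by their shared bases.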

11
Q

How do you calculate coverage?

A

The total number of bp of reads you sequence, divided by the total genome length
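
The calculation can be sketched in Python as follows. The read counts and genome size are invented numbers, and both function names are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length_bp: int, genome_length_bp: int) -> float:
    """Average coverage = total sequenced bases / genome length."""
    return num_reads * read_length_bp / genome_length_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_length_bp: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_length_bp / read_length_bp)


# 1,000,000 reads of 150 bp over a 5 Mb genome gives 30X on average.
print(mean_coverage(1_000_000, 150, 5_000_000))

# Reads of 150 bp needed for 10X over the same genome.
print(reads_needed(10, 150, 5_000_000))
```

This is an average only: individual positions will be covered by more or fewer reads than the figure suggests.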

12
Q

What is a read?

A

an inferred sequence of base pairs (or base pair probabilities) corresponding to all or part of a single DNA fragment.
one sequence

13
Q

How do we overcome repetitive regions? solution 1

A

Need longer reads that span the repeat regions. Illumina reads go up to ~300 bp; Sanger reads went up to ~1500 bp

14
Q

How long can repeats span?

A

10 bases to tens of thousands

15
Q

How can we reduce the number of repeats we have to deal with?

A

sequencing smaller chunks

The repeat may occur only once in the BAC but many times in the genome

16
Q

How do we overcome repetitive regions? solution 2

A

getting the sequence from the end of long fragments
even though we don’t know what’s in the middle

> If we know how long that fragment is, we know how far apart those 2 sequences are

> paired-end reads
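
The arithmetic behind this can be sketched in Python. The sketch assumes both ends are read to the same length and that the fragment length is known exactly (in practice it is an estimate); the function name `inner_gap` and the numbers are hypothetical.

```python
def inner_gap(fragment_length_bp: int, read_length_bp: int) -> int:
    """Unsequenced bases between the two end reads of one fragment.

    A negative result means the two reads overlap in the middle.
    """
    return fragment_length_bp - 2 * read_length_bp


print(inner_gap(500, 150))     # short fragment
print(inner_gap(20_000, 150))  # long fragment
```

If the unknown middle is longer than a repeat, the two end reads can anchor unique sequence on either side of that repeat.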

17
Q

How do we sequence the end of a fragment?

A

Sequence each end with a different primer

18
Q

What is paired-end sequencing (or mate-pair sequencing)?

A

When we sequence each end of a long fragment

19
Q

What areas are there in our genome?

A

Protein coding regions

repetitive regions
tRNA (many)
rRNA (many)
Transposons

20
Q

What are examples of repetitive regions?

A

Microsatellites, telomeres, intron sequences

21
Q

How would you describe protein coding regions?

A

Generally not repetitive, but there are some exceptions, e.g. filaggrin and high copy number genes

22
Q

Why does the size of the fragments matter?

A

It sets the gap between paired-end reads (mate pairs), which can range from 500 bp to 20 kb

23
Q

How long are the longest repeats?

A

~7kb

24
Q

What graph shows the distance between 2 paired-end reads?

A

assemblygram
arcs represent the known number of bases between known sequences of bases
used in Illumina method

25
Q

What does a coverage of 10X mean?

A

A coverage of 10X means that each base is on average found in 10 reads. The deeper the coverage, the more clearly any sequence or structure changes can be discerned from sequence error

26
Q

What is ploidy?

A

The number of copies of the genome in the organism.

• Bacteria = 1; Human = 2; Potato = 4; Strawberry = 8
The higher the ploidy, the harder it is to accurately assemble.

27
Q

What happens the deeper the coverage?

A

The more clearly any sequence or structural changes can be discerned (distinguished) from sequencing error
More reliable: a genuine variant shows up consistently across reads, whereas a base that was not sequenced correctly does not

28
Q

What is an example of variation between genomes?

e.g. humans 2 genomes

A

SNV (single nucleotide variant)

29
Q

How do you know a variant is genuine?

A

take a reference sequence from a bacterium

sequence at a high read-depth
If there is a consistent SNV at the same position, then it is a genuine variant
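
A toy Python sketch of this check: look at every base that the reads report at one position and call an SNV only when a non-reference base appears consistently. It assumes a haploid genome such as a bacterium, and the function name `call_snv` and both thresholds are hypothetical values chosen for illustration, not recommended settings.

```python
from collections import Counter


def call_snv(reference_base, pileup_bases, min_depth=10, min_fraction=0.8):
    """Return the variant base at one position, or None if no SNV is called.

    `pileup_bases` holds the base each read reports at this position.
    The thresholds are illustrative assumptions.
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        return None  # too few reads to tell a variant from an error
    base, count = Counter(pileup_bases).most_common(1)[0]
    if base != reference_base and count / depth >= min_fraction:
        return base  # consistent non-reference base: genuine variant
    return None


# 19 of 20 reads say G where the reference says A: called as a variant.
print(call_snv("A", "G" * 19 + "A"))

# Two scattered mismatches among 20 reads: treated as sequencing error.
print(call_snv("A", "A" * 18 + "GT"))
```

At low read depth the function refuses to call anything, which mirrors the point that deeper coverage is what separates genuine variants from errors.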

30
Q

What are the challenges of short read and re-sequencing?

A

Is it a sequencing error, or is the gene really missing?
Hard to tell when genes are small

> Duplications: would you see the duplication with short reads?

> Inversions and translocations (structural variants)

Because we are mapping against a reference, we do not consider that sequence in the other genome may have come from somewhere else

31
Q

What are the structural variants?

A

deletion
duplication
inversion
translocation

32
Q

What is phasing?

A

being able to assign different alleles to specific chromosomes (haplotypes)

33
Q

What happens as ploidy increases when sequencing genomes?

A

the harder it is to analyse structural and sequence variants

need more data and longer reads

34
Q

What is an example of a state-of-the-art sequencing technique?

A

PacBio

35
Q

What type of sequencing does PacBio do?

A

single molecule real time sequencing
long reads (10kb+)
high error rate (5-15%)

36
Q

How does nanopore sequencing work?

A

The membrane is impermeable to the current, but the pore can pass it through

So current can flow only through that particular pore

If we take DNA and a particular accessory protein

The protein will feed the DNA through that pore

37
Q

What is the outcome when different-sized bases block the pore?

A

They change the current that can flow through the pore

38
Q

What does the hairpin adapter do?

A

Lets you sequence one strand, go round the hairpin and sequence the other strand, then check the two strands against each other to see if you have the same sequence

39
Q

What 2 strategies are there for a Nanopore MinION?

A

single sequencing

hairpin adapter

40
Q

What is the graph produced by Nanopore MinION?

A

Flow electropherogram

41
Q

What are you measuring in Nanopore MinION?

A

the raw sequence

42
Q

How can a base be modified?

A

By an epigenetic marker, e.g. methylcytosine

43
Q

What is unique about the Oxford Nanopore MinION?

A

It can detect epigenetic markers from the raw DNA sequence; no other sequencing machinery can do this

44
Q

What is the issue of sequencing long DNA sequences?

A

snapping them in half

sticky and easy to break

45
Q

How do you get round all the errors?

A

do many reads to overcome errors

around 95-98% accuracy
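
One simple way to picture how many reads overcome errors is a majority vote at each position. The Python sketch below assumes the reads are already aligned, are the same length, and contain only substitution errors; real long-read errors also include insertions and deletions, so this is a simplification. The reads and the function name `consensus` are made up.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority base at each position across equal-length, aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )


# Five noisy reads of the same region; each error sits at a different position.
reads = [
    "ACGTACGT",
    "ACGAACGT",
    "TCGTACGT",
    "ACGTACCT",
    "ACGTACGT",
]
print(consensus(reads))
```

Because random errors rarely hit the same position in most reads, the vote recovers the underlying sequence even though individual reads are wrong.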