Next Generation Sequencing (NGS) - Euskirchen Flashcards

1
Q

list the steps in Sanger sequencing

A
  1. Reaction mixture - contains template DNA, primer, DNA polymerase, normal dNTPs and fluorescently labelled terminators (ddNTPs)
  2. Primer elongation and chain termination -
    a. the polymerase elongates the primer with dNTPs until a fluorescent ddNTP is incorporated; the ddNTP is always the last base of the fragment, because nothing can be added after it
    b. over many template copies this happens at every position, giving fragments of every length, each ending in the labelled ddNTP that corresponds to its last base
  3. Capillary gel electrophoresis - the fluorescent strands are pulled through a gel-filled glass capillary, which separates the DNA fragments by size (shortest first)
  4. Laser detection of fluorescence and computational analysis - an optical laser detector reads out the fluorescence –> results in peaks in different colours which correspond to the DNA seq
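
The chain-termination idea can be sketched as a toy model. This is only an illustration under simplifying assumptions: strand orientation is ignored, every position produces exactly one terminated fragment, and the function names are made up for this example.

```python
# Toy model of dye-terminator Sanger sequencing (illustrative only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template):
    """Return one terminated fragment per position as (length, last_base).

    Each fragment stands for a copy of the newly synthesised strand that
    stopped growing when a labelled ddNTP was incorporated.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_by_size(fragments):
    """Mimic electrophoresis: the shortest fragment reaches the detector first."""
    return "".join(base for _, base in sorted(fragments))


fragments = sanger_fragments("TACGGT")
print(read_by_size(fragments))
```

Sorting the fragments by length and reading the colour of each last base recovers the synthesised strand, which is what the peaks in the trace represent.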
2
Q

In Sanger sequencing it is not possible to read the entire DNA sequence in one reaction using differently labelled terminators, so one has to incorporate 1 labelled ddNTP at a time
T/F

A

FALSE!!
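In Sanger sequencing, all four differently labelled ddNTPs are present in the same reaction. Each growing strand stops at the position where a ddNTP happens to be incorporated, so the ddNTP is always the last base. Across many template copies this gives fragments ending at every position, and together they cover the whole sequence.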

3
Q

what are the steps of Illumina sequencing?

A

A. library preparation -
  1. fragmentation - preparation of the DNA by breaking the chromosomes into small fragments (75-100 bp)
  2. adapter ligation - adapter sequences are added to the ends so that all DNA fragments look the same, i.e. the first few bases are identical
  3. PCR amplification - the adapter-ligated library is amplified; when the library is then poured over the glass slide (flow cell), the adapters bind to complementary primers immobilised on it

B. cluster generation -
  4. bridge amplification - adapters bound to primers –> polymerase makes a double strand from the single-stranded DNA. The free end can only bend over to a neighbouring primer, so repeated rounds form a local cluster of copies of the same fragment.

C. sequencing by synthesis -
  5. priming - a sequencing primer binds to the adapter, and the complementary strand is read from there in every strand of the cluster
  6. synthesis with fluorescence - incorporation of one fluorescently labelled nucleotide per cycle, starting with the first base.
    the fluorescently labelled nucleotide is always the last base on the complementary strand –> identification of the seq
  7. imaging - the slide is imaged from above, and each cluster shows the colour of the base just added
  8. cleavage of terminators - the dye and the terminator are cleaved –> termination is reversible and the next cycle can start
4
Q

Illumina sequencing…

a. is reversible and thus quite revolutionary
b. is called short read sequencing
c. is time effective
d. allows for generation of large datasets in a single experiment
e. all of the above

A

e. all of the above

5
Q

which enzyme is used for bridge amplification in Illumina sequencing?

A

DNA polymerase

6
Q

what is the principle of nanopore sequencing?

A
  • a protein forms a pore which is inserted into a membrane
  • the pore is only big enough for a single DNA strand
  • the pore is equipped with a motor protein which translocates the DNA through the pore
  • a voltage is applied across the membrane –> ion flow through the pore
  • how much current can flow depends on the molecule occupying the pore
  • measure the current across the pore –> it is modulated by the identity of the base at the sensing region
  • the ion current changes over time as the DNA is translocated, every time a different base reaches the sensing region
7
Q

what is the coverage track? how is it calculated?

A

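The coverage track shows, for every position of the reference, how many aligned reads overlap that position. Coverage itself is the % of a given genome covered by sequenced data, i.e. the % of reference bases that are overlapped by at least one read.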

8
Q

how could deletion be detected in WGS?

A
  1. genome viewer - software that shows the raw reads and their alignment to a ref genome at any chosen coordinates.
    the individual reads are stacked in the column above each coordinate –> if there is a ‘gap’ of reads at certain coordinates –> coverage is low in this area –> indicates a deletion
  2. structural evidence - each read is fitted to the reference within the expected range –> a read that spans the deletion does not fit in one piece; instead, part of it lands somewhere else, on the other side of the deletion –> split read (half of the sequence fits at position a and the other half at position b)
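
The coverage-gap idea from point 1 can be sketched on a per-position depth list. This is a minimal sketch: the function name, the threshold and the minimum run length are assumptions chosen for the example, not values from the lecture.

```python
def low_coverage_regions(depth, threshold=1, min_length=3):
    """Return half-open (start, end) runs where depth stays below threshold.

    Runs shorter than min_length are ignored, so a single uncovered
    position is not reported as a candidate deletion.
    """
    regions = []
    start = None
    for pos, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = pos          # a low-coverage run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None             # the run has ended
    # Handle a run that extends to the end of the reference.
    if start is not None and len(depth) - start >= min_length:
        regions.append((start, len(depth)))
    return regions


depth = [9, 10, 11, 0, 0, 0, 0, 10, 9, 0, 12]
print(low_coverage_regions(depth))
```

In this toy track only positions 3 to 6 are reported; the single uncovered position 9 is too short to count.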
9
Q

a dip is NOT…

a. a deletion of a wide range of the genome
b. a focal deletion
c. a point mutation
d. a potential cause of a disease

A

a. a deletion of a wide range of the genome

10
Q

in poly A RNA seq…

a. we use mRNA sequences
b. we use cDNA reverse-transcribed from mRNA
c. we use tRNA
d. we use DNA regions that contain a start codon

A

b. we use cDNA reverse-transcribed from mRNA

11
Q

why do we use cDNA in RNA seq and not the normal DNA?

A

Because in RNA sequencing the main goal is to measure gene expression. Genomic DNA is the same whether or not a gene is active, and most of it is non-coding, so gene expression cannot be studied from it.
The mRNA from which the cDNA is made contains only the exons of transcribed genes, and can therefore be used for this purpose. It is reverse-transcribed into cDNA because the sequencer reads DNA.

12
Q

how can one extract gene expression levels using RNA sequencing?

A

mRNA contains only the exons of transcribed genes.
Therefore, the coverage track from RNA-seq shows ‘drops’, and the covered regions correspond only to these exonic areas. Instead of the full coverage seen in WGS, the number of reads over each individual gene correlates with its expression (the abundance of its transcript).
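
As a rough sketch of turning read counts into expression levels, the example below uses TPM (transcripts per million). TPM is one common normalisation and is not mentioned on this card; the gene names, counts and lengths are made up.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene read counts to TPM.

    Counts are first divided by gene length in kilobases, because longer
    genes collect more reads at the same expression level, and then
    scaled so that all values sum to one million.
    """
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads were assigned to any gene")
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical counts: geneB has three times the reads but is three times longer.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
print(tpm(counts, lengths))
```

After length normalisation geneA and geneB come out as equally expressed, which the raw read counts alone would hide.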

13
Q

Multi-dimensional data is a problem of RNA sequencing. How can one overcome this problem?

a. dimension reduction methods such as t-SNE / PCA
b. two way ANOVA test
c. calculation of each variable separately
d. all of the above

A

a. dimension reduction methods such as t-SNE / PCA

14
Q

Why is t-SNE more informative than PCA in RNA-seq analysis?

A

Both PCA and t-SNE are methods for reducing the dimensionality of data.
PCA is a linear dimensionality reduction technique, and as such it tries to preserve the GLOBAL structure of the data and maps the data as a whole.

t-SNE, on the other hand, is a non-linear dimensionality reduction technique: it gives up much of the GLOBAL structure of the data and instead preserves its LOCAL structure, so data points (e.g. cells) with similar expression profiles get clustered together.

Therefore, when looking for differences between cells, t-SNE is often more useful, as local differences are not lost, and the differing features of cell types can be mapped and visualised more intuitively.

15
Q

If you are interested in structure of chromosomes, which technique would you use?

a. RNA seq
b. WGS
c. ATAC seq
d. variant profiling

A

c. ATAC seq

16
Q

what is a library?

A

The collection of DNA fragments prepared from a sample (fragmented, with adapters attached) so that they are compatible with sequencing

17
Q

what is the difference between coverage and depth?

A

Coverage is the % of a given genome covered by sequenced data, i.e. the % of bases of a known reference sequence that are overlapped by at least one read.

Depth is the ratio between the total number of sequenced bases (number of reads × read length) and the size of the genome –> it tells how many reads, on average, lie over each position of the genome
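
The two quantities can be compared on toy data. In this sketch reads are assumed to be half-open (start, end) intervals on the reference, and depth is taken as sequenced bases per reference position; the function names are made up for the example.

```python
def per_base_depth(reads, genome_length):
    """Count how many reads overlap each reference position."""
    depth = [0] * genome_length
    for start, end in reads:
        # Clip each read to the reference so out-of-range ends are ignored.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def breadth_of_coverage(depth):
    """Fraction of reference positions covered by at least one read."""
    return sum(1 for d in depth if d > 0) / len(depth)


def mean_depth(depth):
    """Total aligned bases divided by the reference length."""
    return sum(depth) / len(depth)


reads = [(0, 5), (3, 8), (6, 10)]
depth = per_base_depth(reads, genome_length=12)
print(depth)
print(breadth_of_coverage(depth), mean_depth(depth))
```

Here 10 of 12 positions are covered (coverage of about 83%), while the 14 sequenced bases spread over 12 positions give a mean depth of about 1.17x.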

18
Q

what is a contig?

A

A set of overlapping DNA segments (reads) that together represent a consensus region of the DNA, i.e. one contiguous stretch of reconstructed sequence. The number of contigs tells how many separate pieces the reconstructed genome is in.

All contigs taken together form an assembly.
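
How overlapping reads merge into contigs can be sketched with a greedy merge. This is a deliberately naive illustration (error-free reads, exact overlaps, one strand only), not how production assemblers work; the function names and `min_len` are assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_contigs(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing overlaps any more
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGCGT", "CGTTAC", "GGGAAA"]
print(greedy_contigs(reads))
```

The first two reads share "CGT" and merge into one contig; the third read overlaps nothing and stays a separate contig, so this toy assembly consists of two contigs.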

19
Q

what is trimming?

A

Cutting the ends of the reads to remove the adapter sequence (low-quality bases at the ends are often removed in the same step)
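
A minimal sketch of adapter trimming at the 3' end of a read. It assumes exact matches only and uses an example adapter string; real trimming tools also tolerate mismatches and use base qualities, which is not shown here.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove the adapter and everything after it from the end of a read."""
    # Case 1: the whole adapter is present somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends part-way through the adapter.
    for n in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]
    # No adapter found: return the read unchanged.
    return read


ADAPTER = "AGATCGGAAG"  # example adapter sequence for this sketch
print(trim_adapter("ACGTACGTAGATCGGAAGTT", ADAPTER))
print(trim_adapter("ACGTACGTAGATC", ADAPTER))
```

Both example reads are cut back to the same insert, once for a full adapter and once for a partial adapter at the read end.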

20
Q

what is barcoding?

A

Adding a short, known sequence to every fragment of a library –> it serves as a barcode/identifier for the sample the analysed sequence came from

21
Q

what is (De-)Multiplexing?

A

Multiplexing: pooling the libraries of several samples and sequencing them together. Demultiplexing: using the barcode info to know which sequence came from which sample
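
Sorting pooled reads back to their samples can be sketched as below. The sketch assumes the barcode is simply the first few bases of each read and must match exactly; real pipelines often read the index separately and allow mismatches. The barcodes and sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]  # the read with its barcode removed
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "sample1", "TTAG": "sample2"}
pooled = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTTTTAAA", "CCCCGGGGAA"]
print(demultiplex(pooled, barcodes, barcode_length=4))
```

Reads whose barcode is not in the table end up in an "undetermined" group instead of being assigned to the wrong sample.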

22
Q

what is alignment?

A

Alignment is finding the exact differences between 2 sequences (sample and ref).
It is the process of searching for the location of a given sequence on the genome –> the reads are mapped to the reference genome.
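
Both parts of this definition (finding the location, then the differences) can be sketched with a brute-force search. The sketch assumes substitutions only, with no insertions or deletions, and tries every position; real aligners use indexes and gapped alignment. The function name is made up.

```python
def best_alignment(read, reference):
    """Return (position, mismatch_positions) of the best ungapped placement.

    The position is where the read fits the reference with the fewest
    mismatches; mismatch positions are given in reference coordinates.
    """
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_pos, best_mismatches = None, None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = [i for i, (r, w) in enumerate(zip(read, window)) if r != w]
        # Strict comparison keeps the leftmost position when there is a tie.
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_pos, best_mismatches = pos, mismatches
    return best_pos, [best_pos + i for i in best_mismatches]


reference = "TTGACCGATAGCTA"
print(best_alignment("CGTTAG", reference))
```

The read maps to position 5 of the reference, and the one base that differs from the reference sits at reference position 7.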

23
Q

what is a consensus?

A

“polishing raw alignment” –> the calculated order of the most frequent residues found at each position in a sequence alignment.
It represents the result of a multiple sequence alignment in which related sequences are compared and the residue they most often share is taken at each position
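
Taking the most frequent residue at each position can be sketched in a few lines. The sketch assumes the sequences are already aligned and of equal length; ties go to the residue seen first in the column.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most frequent residue at each column of an alignment."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*...) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_sequences)
    )


reads = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
print(consensus(reads))
```

Each read carries at most one deviating base, and the majority vote at every column removes all of them.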