Core lecture Flashcards

1
Q

Which format usually stores sequencing data?

A

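FASTA (stores sequences only; raw sequencing reads with per-base quality scores are usually stored in FASTQ)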

2
Q

What does “>” mean in a FASTA file?

A

It marks a header line, which starts a new sequence entry; the sequence identifier (and an optional description) follows the “>”.
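
A minimal Python sketch of how the “>” header is used when reading a FASTA file. The function name `parse_fasta` is hypothetical, and the sketch assumes the first word after “>” is the record ID:

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record ID -> sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: '>' starts a new entry; the first word is the ID
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines are concatenated until the next '>'
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


example = """>seq1 first example
ACGT
TTGA
>seq2
GGCC
"""
print(parse_fasta(example))  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```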

3
Q

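What do the ASCII characters and Phred scores in a FASTQ file represent?

A

They encode the quality of each base call. Each ASCII character stands for a Phred score, which expresses the estimated probability that the base was called incorrectly, so it indicates how reliable each nucleotide sequenced in a run is.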
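
A minimal Python sketch of the ASCII-to-Phred conversion, assuming the Phred+33 encoding used by current Illumina FASTQ files (older files may use a different offset). The helper names are hypothetical:

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def decode_quality(qual_string, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def encode_quality(scores, offset=PHRED_OFFSET):
    """Convert Phred scores back into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)


print(decode_quality("II5#"))  # [40, 40, 20, 2]
```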

4
Q

What does Q20 mean?

A

Q20 means a quality value of 20. The error probability is P = 10^(-Q/10) = 10^(-20/10) = 10^(-2) = 0.01, i.e., a 1 in 100 chance that the base call is wrong (99% accuracy).
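
The same formula as a small Python sketch (hypothetical helper names), which also converts in the other direction with Q = -10 × log10(P):

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.1%}")
```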

5
Q

What is the typical purpose of base calling in NGS analysis?

A

To convert the raw signal from the sequencer (e.g., fluorescence images or electrical current) into a nucleotide sequence.

6
Q

What are paired end reads?

A

Sequences read from both ends of a DNA fragment

7
Q

What is the significance of multiplexing?

A

It allows multiple samples to be sequenced together

8
Q

What is the main advantage of storing quality scores as ASCII characters?

A

It keeps the file size small: each score is stored as a single character (1 byte), so the quality line is exactly as long as the sequence line.

9
Q

What is the implication of having a “mate-pair”?

A

The two reads of the pair come from the ends of a long DNA fragment, so they lie far apart in the genome and face away from each other.

10
Q

What is the “Seed and Extend algorithm”?

A

The seed-and-extend algorithm is used to align sequenced DNA reads to a reference genome.

Seed: the algorithm first finds a short piece of the read (the seed) that matches the reference genome. This lookup is usually done quickly using a hash table.

Extend: once a seed match is found, the algorithm extends the match in both directions to align the rest of the read to the genome. Extension continues until too many mismatches are found.
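
A simplified Python sketch of seed-and-extend, assuming exact-match seeds looked up in a k-mer hash table and ungapped extension (mismatches only, no insertions or deletions). Real aligners use more sophisticated indexes and gapped alignment; the function names and the `max_mismatches` parameter are illustrative:

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return (reference_start, mismatches) of the best ungapped hit, or None."""
    best = None
    tried = set()
    # Seed: look up non-overlapping k-mers of the read in the hash table
    for seed_start in range(0, len(read) - k + 1, k):
        seed = read[seed_start:seed_start + k]
        for hit in index.get(seed, []):
            ref_start = hit - seed_start  # where the whole read would begin
            if ref_start < 0 or ref_start + len(read) > len(reference):
                continue  # read would run off the reference
            if ref_start in tried:
                continue  # placement already extended from another seed
            tried.add(ref_start)
            # Extend: walk left from the seed, then right, counting mismatches
            positions = list(range(seed_start - 1, -1, -1))
            positions += list(range(seed_start + k, len(read)))
            mismatches = 0
            for i in positions:
                if read[i] != reference[ref_start + i]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many mismatches: give up on this hit
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (ref_start, mismatches)
    return best


reference = "TTGACCGTAGGCTAACGTTAGC"
index = build_index(reference, k=4)
print(seed_and_extend("CCGTAGGCTTAC", reference, index, k=4))  # (4, 1)
```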

11
Q

What does NGS stand for?

A

Next-generation sequencing

12
Q

Which method is commonly used for sequencing DNA in NGS?

A

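Sequencing by synthesis (e.g., Illumina); PCR amplification is used beforehand to prepare and amplify the library, not to read the sequence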

13
Q

What is a primary application of NGS technology?

A

Genetic disorder diagnosis

14
Q

How does NGS differ from traditional Sanger sequencing?

A

NGS sequences millions of fragments in parallel, whereas Sanger sequencing reads one fragment per reaction

15
Q

What is a significant advantage of NGS over previous sequencing technologies?

A

Higher throughput

16
Q

In NGS, what is the purpose of using barcodes in sequencing libraries?

A

To track the sample origin
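
A minimal Python sketch of demultiplexing by barcode. For simplicity it assumes the barcode is the first bases of each read and allows one mismatch; on Illumina instruments the index is usually read separately, so this is only an illustration. The sample names and barcodes are hypothetical:

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Sort reads into per-sample bins by the barcode at the start of each read.

    barcodes: dict of sample name -> barcode (all barcodes the same length).
    Barcodes are removed from assigned reads; reads matching no barcode, or
    more than one, go to 'undetermined'.
    """
    bc_len = len(next(iter(barcodes.values())))
    bins = {name: [] for name in barcodes}
    bins["undetermined"] = []
    for read in reads:
        prefix = read[:bc_len]
        hits = []
        if len(prefix) == bc_len:
            hits = [name for name, bc in barcodes.items()
                    if hamming(prefix, bc) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(read[bc_len:])
        else:
            bins["undetermined"].append(read)
    return bins


barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}  # hypothetical samples
reads = ["ACGTTTTT", "ACGATTTT", "TGCAGGGG", "CCCCAAAA"]
print(demultiplex(reads, barcodes))
```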

17
Q

What type of biological molecules can be sequenced using NGS?

A

DNA and RNA

18
Q

How has NGS impacted the field of personalized medicine?

A

It has enabled tailored treatments based on genetic makeup

19
Q

What is one of the challenges in handling NGS data?

A

The high volume of data

20
Q

What does the term “read length” refer to in NGS?

A

The number of bases read from a DNA fragment in a single read (the fragment itself can be longer)

21
Q

How has NGS technology influenced cancer research?

A

By identifying genetic mutations associated with cancers

22
Q

What is the role of bioinformatics in NGS?

A

To analyze and interpret the vast amount of sequencing data

23
Q

What does de novo sequencing mean in the context of NGS?

A

Sequencing and assembling a genome without using a reference genome

24
Q

How does NGS contribute to the study of rare genetic disorders?

A

By enabling the identification of the genetic mutations responsible for them

25
Q

In NGS, what does the term “coverage” refer to?

A

The number of times a nucleotide position is sequenced, i.e., the number of reads covering it (often reported as an average, such as 30×)
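
A minimal Python sketch of two common ways to express coverage: the expected average C = N × L / G (number of reads × read length / genome size), and per-position depth from aligned read intervals. The function names and the example run are hypothetical:

```python
def mean_coverage(num_reads, read_length, genome_size):
    """Average coverage: C = N * L / G (total sequenced bases / genome size)."""
    return num_reads * read_length / genome_size


def per_base_depth(alignments, genome_size):
    """Depth at every position from (start, end) alignments; end is exclusive."""
    depth = [0] * genome_size
    for start, end in alignments:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


# Hypothetical run: 600 million 150-bp reads on a ~3.1 Gb genome
print(f"{mean_coverage(600_000_000, 150, 3_100_000_000):.1f}x")  # 29.0x
print(per_base_depth([(0, 3), (2, 5)], 6))  # [1, 1, 2, 1, 1, 0]
```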

26
Q

What is the significance of multiplexing in NGS?

A

Allows sequencing of multiple samples simultaneously

27
Q

Which is an important consideration in NGS data analysis?

A

The accuracy and integrity of the data

28
Q

How does NGS facilitate the study of microbial communities?

A

Through metagenomics, sequencing DNA from environmental samples

29
Q

What role does NGS play in agricultural genetics?

A

Assisting in the development of genetically modified crops

30
Q

What is one of the future directions or potentials of NGS technology?

A

Enhanced understanding and application in clinical settings

31
Q

What is the main technique used in 1st generation sequencing?

A

Sanger sequencing

32
Q

Which generation of sequencing is known for introducing massively parallel sequencing?

A

2nd generation

33
Q

What is a key characteristic of 3rd generation sequencing technologies?

A

Single-molecule sequencing

34
Q

Which technology is typically associated with 2nd generation sequencing?

A

Illumina HiSeq

35
Q

What was a major limitation of 1st generation sequencing technologies?

A

Low throughput

36
Q

Which is a benefit of 3rd generation sequencing?

A

Real-time sequencing

37
Q

2nd generation sequencing is also known as:

A

Next-Generation Sequencing

38
Q

Which generation of sequencing first introduced the concept of ‘reads’?

A

1st generation; the concept was further developed in 2nd generation sequencing

39
Q

In which generation does sequencing occur without the need to stop and start the process?

A

3rd generation (3rd generation technologies like PacBio and Oxford Nanopore offer real-time sequencing without the need to stop and start)

40
Q

Which sequencing technology typically generates the longest read lengths?

A

Oxford Nanopore

41
Q

What is the first step in assessing NGS data quality?

A

Generating FastQC reports
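
FastQC is a separate tool; as a rough illustration of one of its checks (per-base sequence quality), this hypothetical Python sketch computes the mean Phred score at each read position, assuming well-formed 4-line FASTQ records and Phred+33 encoding:

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_quality_per_position(records, offset=33):
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    totals, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


fastq = """@read1
ACGT
+
II5#
@read2
ACGT
+
I5##
""".splitlines()
print(mean_quality_per_position(read_fastq(fastq)))  # [40.0, 30.0, 11.0, 2.0]
```

A drop in these means toward the start or end of the reads is the kind of pattern that suggests trimming.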

42
Q

Why is trimming performed on NGS data?

A

To remove low-quality bases from the ends of reads

43
Q

When is k-mer correction typically performed in the NGS data preprocessing pipeline?

A

Before de novo assembly
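
A minimal Python sketch of the idea behind k-mer based error correction: count every k-mer across the reads, then flag k-mers seen only rarely, since they are likely to contain sequencing errors. Only the counting and flagging steps are shown (a corrector would then replace weak k-mers with similar frequent ones); the threshold `min_count=2` and the function names are assumptions for illustration:

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count=2):
    """k-mers seen fewer than min_count times are likely to contain errors."""
    return {kmer for kmer, c in counts.items() if c < min_count}


reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACTTAC"]  # last read has a G->T error
print(sorted(weak_kmers(count_kmers(reads, k=3))))  # ['ACT', 'CTT', 'TTA']
```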

44
Q

What does a sliding window in quality control processing do?

A

Identifies and trims low-quality regions in reads
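
A simplified Python sketch of sliding-window trimming: scan the read from the 5' end and cut at the first window whose mean quality falls below a threshold. The window size (4) and threshold (Q20) are assumed example values, and real trimmers differ in details:

```python
def sliding_window_trim(seq, qual_scores, window=4, threshold=20):
    """Cut the read at the first window whose mean quality drops below threshold.

    qual_scores: list of Phred scores, same length as seq.
    Returns the trimmed (seq, qual_scores); reads shorter than the window
    are returned unchanged.
    """
    for start in range(0, len(seq) - window + 1):
        window_scores = qual_scores[start:start + window]
        if sum(window_scores) / window < threshold:
            return seq[:start], qual_scores[:start]
    return seq, qual_scores


quals = [38, 38, 37, 36, 35, 30, 20, 12, 10, 8]
print(sliding_window_trim("ACGTACGTAC", quals))  # ('ACGTA', [38, 38, 37, 36, 35])
```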

45
Q

Why is it important to remove adapters in NGS data preprocessing?

A

They interfere with alignment and de novo assembly
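
A minimal Python sketch of 3' adapter removal using exact matching only (real adapter trimmers also tolerate mismatches). The sequence `AGATCGGAAGAGC` is commonly used as an Illumina adapter prefix and serves here only as an example; the function name and `min_overlap` value are assumptions:

```python
ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix (commonly used for Illumina)


def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove a 3' adapter using exact matching only (no mismatches).

    Handles a full adapter inside the read and a partial adapter (at least
    min_overlap bases of its start) running off the 3' end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # drop the adapter and everything after it
    # Partial adapter at the very end: try the longest possible overlap first
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


print(trim_adapter("ACGTTGCAAGATCGGAAGAGCTTT"))  # ACGTTGCA (full adapter)
print(trim_adapter("ACGTTGCAAGATC"))  # ACGTTGCA (partial adapter at the end)
```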

46
Q

In the context of NGS, why might sequences be overrepresented?

A

For several reasons: library preparation bias (e.g., PCR duplicates), sequencing bias, biological factors (e.g., highly expressed genes), and contamination (e.g., adapter sequences)

47
Q

Which file format is often used for storing compressed NGS data?

A

gzip (e.g., FASTQ files compressed as .fastq.gz)
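
A minimal Python sketch of working with gzip-compressed FASTQ using the standard-library `gzip` module, streaming the file instead of decompressing it to disk. The helper names and demo records are hypothetical:

```python
import gzip
import os
import tempfile


def write_fastq_gz(path, records):
    """Write (read_id, sequence, quality) records to a gzip-compressed FASTQ file."""
    with gzip.open(path, "wt") as handle:
        for read_id, seq, qual in records:
            handle.write(f"@{read_id}\n{seq}\n+\n{qual}\n")


def count_fastq_reads(path):
    """Count reads by streaming the compressed file; nothing is unpacked to disk."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4  # each FASTQ record occupies 4 lines


# Demo with a temporary file (hypothetical records)
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
write_fastq_gz(tmp_path, [("r1", "ACGT", "IIII"), ("r2", "GGCC", "II5#")])
print(count_fastq_reads(tmp_path))  # 2
```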

48
Q

What is an outcome of merging paired-end reads in NGS data preprocessing?

A

Error correction for overlapping regions
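
A simplified Python sketch of how merging can correct errors: read 2 is reverse-complemented, the longest overlap with few mismatches is found, and where the reads disagree the base with the higher quality is kept. It assumes the fragment is longer than each read, and the function names and thresholds are illustrative:

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, min_overlap=5, max_mismatches=1):
    """Merge overlapping paired-end reads into one sequence.

    q1/q2 are lists of Phred scores. r2 is given as sequenced, so it is
    reverse-complemented (and its qualities reversed) first. Assumes the end of
    r1 overlaps the start of reverse-complemented r2.
    Returns (merged_sequence, merged_qualities) or None if no overlap is found.
    """
    r2rc, q2rc = reverse_complement(r2), q2[::-1]
    for overlap in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        start = len(r1) - overlap
        mismatches = sum(r1[start + i] != r2rc[i] for i in range(overlap))
        if mismatches > max_mismatches:
            continue  # try a shorter overlap
        seq, qual = list(r1[:start]), list(q1[:start])
        for i in range(overlap):
            b1, s1, b2, s2 = r1[start + i], q1[start + i], r2rc[i], q2rc[i]
            if b1 == b2:
                seq.append(b1)
                qual.append(max(s1, s2))
            elif s1 >= s2:  # disagreement: keep the higher-quality base
                seq.append(b1)
                qual.append(s1)
            else:
                seq.append(b2)
                qual.append(s2)
        seq.extend(r2rc[overlap:])
        qual.extend(q2rc[overlap:])
        return "".join(seq), qual
    return None


# r1 carries a low-quality error (G instead of T at position 9)
r1, q1 = "ACGTACGGTGCA", [30] * 9 + [5, 30, 30]
r2, q2 = "TGCCTGAACCGT", [30] * 12
print(merge_pair(r1, q1, r2, q2))
```

In this example the merged read is `ACGTACGGTTCAGGCA`: the low-quality G in read 1 is replaced by the high-quality T from read 2.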

49
Q

What does it mean if the FastQC report indicates poor quality at the start of reads?

A

The first few bases may need to be trimmed

50
Q

What is a common method to handle large amounts of NGS data?

A

Keep data compressed whenever possible and use workflow managers like Snakemake

51
Q

What is a Single Nucleotide Variation (SNV)?

A

A change in a single nucleotide

52
Q

What does calling a variant a polymorphism (SNP) imply?

A

That the variant is present in more than 1% of the population

53
Q

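What is the significance of the transition/transversion (Ti/Tv) ratio in human genetics?

A

It summarizes the mutation types: transitions (A↔G, C↔T) versus transversions (purine↔pyrimidine). Transitions are more common, so human whole-genome variant sets typically show a Ti/Tv ratio of about 2.0 to 2.1 (about 3.0 in exomes); a much lower ratio suggests many false-positive variant calls.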
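
A minimal Python sketch of computing Ti/Tv from a list of SNVs given as (reference, alternative) base pairs; the input format and function names are assumptions:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Transition: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Ti/Tv ratio for (ref, alt) single-nucleotide variants with ref != alt."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ti_tv_ratio(snvs))  # 1.5
```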

54
Q

What are the consequences of coding mutations?

A

Can change amino acid sequences

55
Q

What differentiates germline mutations from somatic mutations?

A

Germline mutations can be passed to offspring

56
Q

What is the role of untranslated regions (UTRs) in gene expression?

A

Regulating gene expression

57
Q

How do pathogenic mutations affect organisms?

A

Can lead to diseases or disorders

57
Q

What is a consequence of non-coding mutations?

A

Can lead to changes in gene expression

58
Q

What is a polygenic risk score used for?

A

Predicting the risk of complex diseases
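
A polygenic risk score is commonly computed as a weighted sum: for each variant, the per-allele effect size (e.g., from a GWAS) times the number of risk alleles the person carries. A minimal Python sketch with hypothetical variant IDs and weights:

```python
def polygenic_risk_score(dosages, effect_sizes):
    """PRS = sum over variants of (effect size x number of risk alleles carried).

    dosages: variant ID -> copies of the risk allele (0, 1 or 2)
    effect_sizes: variant ID -> per-allele effect, e.g. a log odds ratio
    Variants without a genotype for this individual are skipped.
    """
    return sum(beta * dosages[vid]
               for vid, beta in effect_sizes.items() if vid in dosages)


effect_sizes = {"var1": 0.2, "var2": -0.1, "var3": 0.05}  # hypothetical weights
dosages = {"var1": 2, "var2": 1, "var3": 0}
print(round(polygenic_risk_score(dosages, effect_sizes), 3))  # 0.3
```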

58
Q

What is the focus of personalized medicine?

A

Tailoring medical treatment to individual genetic profiles

59
Q

What is the primary goal of variant calling in NGS data analysis?

A

Identifying differences from a reference genome

60
Q

Which format is commonly used to store variant calling data?

A

VCF

61
Q

What does ‘hard filtering’ in variant calling refer to?

A

Applying fixed thresholds to variant annotations (e.g., quality by depth, strand bias, mapping quality) to remove likely false-positive calls
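
A minimal Python sketch of hard filtering on a variant's annotations. The thresholds (QD < 2.0, FS > 60.0, MQ < 40.0) are modeled on commonly cited GATK SNP recommendations but should be treated as assumptions; the function name is hypothetical:

```python
# Assumed thresholds, modeled on commonly cited GATK SNP hard-filter values
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,   # low variant quality normalized by depth
    "FS": lambda v: v > 60.0,  # strong strand bias (Fisher's exact test)
    "MQ": lambda v: v < 40.0,  # low mapping quality
}


def hard_filter(annotations, filters=SNP_FILTERS):
    """Return the names of failed filters; an empty list means the call passes."""
    return [name for name, fails in filters.items()
            if name in annotations and fails(annotations[name])]


print(hard_filter({"QD": 15.3, "FS": 1.2, "MQ": 60.0}))  # []
print(hard_filter({"QD": 1.1, "FS": 75.0, "MQ": 59.0}))  # ['QD', 'FS']
```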

62
Q

What is a ‘genotype’ in the context of variant calling?

A

The genetic makeup at a particular locus

63
Q

How does ‘base quality score recalibration’ enhance variant calling?

A

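By correcting systematic errors in the base quality scores reported by the sequencer, so that they better reflect the true probability of a base call error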

64
Q

In VCF files, what does the ‘INFO’ field provide?

A

Metadata about the variants
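
A minimal Python sketch of reading the INFO column (the 8th tab-separated field of a VCF data line), where entries are separated by “;” and are either key=value pairs or flags. The example line is made up, and the function name is hypothetical:

```python
def parse_info(info_field):
    """Parse a VCF INFO column (e.g. 'DP=35;AF=0.5;DB') into a dict.

    Key=value pairs keep their values as strings; flags without a value
    (like DB) become True.
    """
    info = {}
    if info_field == ".":
        return info  # '.' means no INFO annotations
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value
        else:
            info[entry] = True
    return info


# One made-up VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO
line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=35;AF=0.5;DB"
fields = line.split("\t")
print(parse_info(fields[7]))  # {'DP': '35', 'AF': '0.5', 'DB': True}
```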

65
Q

What role do known polymorphic sites play in variant filtration?

A

They serve as a set of trusted variants (e.g., from dbSNP), helping to distinguish true variants from sequencing artifacts

66
Q

What is indicated by a high Phred quality score in variant calling?

A

High confidence in the variant call

67
Q

Why is ‘depth of coverage’ important in variant calling?

A

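It indicates how many reads support each position; higher depth gives more confidence that a variant is real rather than a sequencing error and helps distinguish heterozygous from homozygous genotypes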

68
Q

What is the significance of ‘allele frequency’ in variant analysis?

A

It indicates the rarity of the allele in the population