Genes and Genomes Flashcards

1
Q

Definition

a method of DNA sequencing first commercialized by Applied Biosystems, based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication

A

Sanger sequencing

2
Q

Definition

the material of which the chromosomes of organisms other than bacteria (i.e. eukaryotes) are composed, consisting of protein, RNA, and DNA

A

Chromatin

3
Q

Define

Methyl-cytosine

A

the normal cytosine nucleotide in DNA that has been modified by the addition of a methyl group to its 5th carbon

4
Q

Definition

non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates

A

SINES

5
Q

What is considered the fifth base in DNA?

A

Methyl-cytosine

6
Q

Definition

a unit made up of linked genes which is thought to regulate other genes responsible for protein synthesis

A

Operon

7
Q

Mobile genetic elements are not usually found in gene exons/introns. Examples are retrotransposons which move via a DNA/RNA intermediate

A

Mobile genetic elements are not usually found in gene exons. Examples are retrotransposons, which move via an RNA intermediate

8
Q

Where are CpG islands usually found?

A

Mainly at the 5’ end of genes

9
Q

How many bases does the human genome contain?

A

3162 million bases

10
Q

Whole genome shotgun (WGS)

A

entails sequencing many overlapping DNA fragments in parallel and then using a computer to assemble the small fragments into larger contigs and, eventually, chromosomes

11
Q

Definition

a functional RNA molecule that is transcribed from DNA but not translated into proteins

A

non-coding RNA/ncRNA

12
Q

What is the Whole Genome Shotgun Method?

A

Genomic DNA is sheared randomly into fragments before being read. This is repeated many times to ensure at least 30x read depth coverage. The reads are then reassembled into the genome sequence
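
The reassembly step can be illustrated with a toy greedy merge of overlapping reads. This is only a sketch under strong assumptions (error-free reads, one strand, no repeats); the function names and the example reads are hypothetical, and real assemblers use far more robust graph-based methods.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_length)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_length, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            return contigs  # whatever is left are the contigs
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three hypothetical overlapping reads from the made-up sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

With these three reads the sketch ends with a single contig; reads that do not overlap would remain as separate contigs.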

13
Q

Definition

an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences

A

BLAST search

14
Q

What makes up junk DNA?

A

Pseudogenes

Mobile genetic elements (i.e. LINES, SINES, incomplete retroviral-like elements and transposon remnants)

15
Q

Definition

Describing a type of messenger RNA that can encode more than one polypeptide separately within the same RNA molecule

A

Multicistronic

16
Q

What is used to sort out the contigs given in de novo assembly?

A

PacBio

17
Q

What is a hypothetical protein?

A

A predicted protein that is not similar to any characterised protein

18
Q

What BLAST program is used for a protein query search in the protein database?

A

BLASTp

19
Q

What are the major characteristics of SINES?

A

They do not encode reverse transcriptase, endonuclease or integrase

20
Q

Definition

a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes

A

FASTA
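
To make the format concrete, here is a minimal sketch of reading FASTA text, assuming the usual layout of a header line starting with ">" followed by one or more sequence lines. The record names and sequences are made up for illustration, and `parse_fasta` is a hypothetical helper, not part of any library.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record identifier (first word after '>') to its joined sequence."""
    records: dict[str, list[str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 example nucleotide record
ATGGCC
TTAA
>seq2 example protein record
MKV
"""
print(parse_fasta(example))
```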

21
Q

Define

non-coding RNA/ncRNA

A

a functional RNA molecule that is transcribed from DNA but not translated into proteins

22
Q

Define

Draft genome sequence

A

Sequence of genomic DNA having lower accuracy than finished sequence; some segments are missing or in the wrong order or orientation

23
Q

Definition

Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome

A

Retroviral-like elements

24
Q

Definition

a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores

A

FASTQ

25
Q

Define

Genome annotation

A

the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do

26
Q

Define

Retroviral-like elements

A

Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome

27
Q

Define

Paralogue

A

Either of a pair of genes that derives from the same ancestral gene

28
Q

Define

SINES

A

non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates

29
Q

The ENCODE project is an editing/annotation approach that has built a map of functional elements within the human genome, suggesting that over 50%/70% is biologically active

A

The ENCODE project is an annotation approach that has built a map of functional elements within the human genome, suggesting that over 70% is biologically active

30
Q

What is the genome data problem?

A

The ever increasing analysis gap that is occurring because our ability to analyse is not keeping up with the data available

31
Q

Definition

a project that seeks to interpret the sequence of DNA that makes up the human genome

A

ENCODE project

32
Q

What were the strategies used by HGP and Celera to sequence the human genome?

A

HGP used an ordered or directed strategy

Celera used a shotgun strategy

33
Q

Define

Pseudogenes

A

a section of a chromosome that is an imperfect copy of a functional gene

34
Q

What were the key findings of the ENCODE project?

A

Around 80% of the human genome is associated with at least one biochemical event

35
Q

___________ arise by gene duplication followed by gene inactivation - contain introns

____________ are formed by integration of DNA copies of mRNA - do not contain introns

A

Classical pseudogenes arise by gene duplication followed by gene inactivation - contain introns

Processed pseudogenes are formed by integration of DNA copies of mRNA - do not contain introns

36
Q

Definition

DNA that does not code for a protein, usually occurs in repetitive sequences of nucleotides, and does not seem to serve any useful purpose

A

Junk DNA

37
Q

True or False:

The HGP sequence tells us nothing about the genetic variation between individuals

A

True

38
Q

What BLAST program is used for a nucleotide query search in the protein database?

A

BLASTx
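
As a memory aid, the choice of program by query type and database type can be written as a small lookup. This is a sketch, not an interface to BLAST itself; `choose_blast_program` is a hypothetical helper, and tblastx (translated nucleotide query against a translated nucleotide database) is left out because it shares its key with blastn.

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the program name for a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("protein", "protein"))
print(choose_blast_program("nucleotide", "protein"))
```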

39
Q

Definition

Either of a pair of genes that derives from the same ancestral gene

A

Paralogue

40
Q

Why does the sequence CpG occur at a lower than expected frequency in vertebrates?

A

During DNA damage, deamination of unmethylated C gives rise to U, which is recognised as a fault by DNA repair machinery. Deamination of methylated C gives rise to T, which is not recognised as an error by DNA repair machinery. Over evolutionary time, methylated Cs have been mutated to T, so CpG is under-represented in vertebrate DNA

41
Q

Define

CpG island

A

stretches of DNA 500–1500 bp long with a CG:GC ratio of more than 0.6; they are normally found at promoters and contain the 5′ end of the transcript
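
A rough sketch of how a stretch of DNA might be scored against such a threshold. It assumes the 0.6 figure refers to the observed-to-expected CpG ratio, (number of CpG x length) / (number of C x number of G), which is one commonly used criterion; that reading, the function names and the test sequences are assumptions for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # CpG dinucleotides on this strand
    expected = c_count * g_count / len(seq)
    return observed / expected


print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich toy sequence
print(cpg_observed_expected("CCCCGGGG"))  # same base composition, CpG-poor
```

The two toy sequences have identical GC content but very different ratios, which is why CpG depletion is measured relative to base composition rather than by GC content alone.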

42
Q

How do SINES move?

A

Using enzymes produced by other mobile elements e.g. LINES

43
Q

Definition

a set of overlapping DNA segments that together represent a consensus region of DNA

A

Contig

44
Q
A

Zero

45
Q

What is an unbroken consensus sequence called?

A

Contig

46
Q

True or False:

The sequence data found in the HGP is inaccessible by regular people

A

False

It is publicly available

47
Q

Definition

entails sequencing many overlapping DNA fragments in parallel and then using a computer to assemble the small fragments into larger contigs and, eventually, chromosomes

A

Whole genome shotgun (WGS)

48
Q

What are the two types of Illumina sequencing? Which is faster?

A

HiSeq (3 days; 2000 GigaBases)

MiSeq (56 hrs; 20 GigaBases)

49
Q

Define

De novo

A

starting from the beginning

50
Q

True or False:

Only transposon remnants are evident in the human genome

A

True

51
Q

Definition

a section of a chromosome that is an imperfect copy of a functional gene

A

Pseudogenes

52
Q

What are the similarities between a draft and a closed genome sequence?

A
  • Both have all the genes
  • Both predict the encoded proteins
    • Predict function by similarity to characterised proteins
    • Overview of the organism’s genetic capability
53
Q

In reality, how many contigs do we get per chromosome? Why?

A

You expect only one, but in reality you get many, although the whole genome sequence will still be there. This is because several copies of the same sequence (repeats) occur in the genome

54
Q

What symbols indicate Bad and Excellent Phred quality scores?

A

Bad = !"#$%

Excellent = EFGHIJK
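
The symbols map to numeric scores as sketched below, assuming the Phred+33 encoding (score = ASCII code minus 33) and the standard relation between a Phred score Q and the error probability, P = 10^(-Q/10). The function names are illustrative.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Convert FASTQ quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


print(phred_scores('!"#$%'))    # low scores: unreliable base calls
print(phred_scores("EFGHIJK"))  # high scores: very reliable base calls
```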

55
Q

Define

BLAST search

A

an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences

56
Q

Define

LINES

A

a group of non-LTR (long terminal repeat) retrotransposons which are widespread in the genome of many eukaryotes

57
Q

Definition

a transposon whose sequence shows homology with that of a retrovirus

A

Retrotransposons

58
Q

What are the major characteristics of non-retroviral retrotransposons (LINES)?

A

They have a promoter and encode a protein with combined endonuclease and reverse transcriptase activity

59
Q

Definition

a group of non-LTR (long terminal repeat) retrotransposons which are widespread in the genome of many eukaryotes

A

LINES

60
Q

What form is each entry in the GenBank database in?

A

A text file containing DNA sequence data and any associated information (annotation)

61
Q

Define

Chromatin

A

the material of which the chromosomes of organisms other than bacteria (i.e. eukaryotes) are composed, consisting of protein, RNA, and DNA

62
Q

Definition

the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do

A

Genome annotation

63
Q

Which nucleotides can be methylated?

A

Cytosine but only when next to a guanine

64
Q

What were the aims of the Human Genome Project?

A

To determine the entire nucleotide sequence of human DNA

To identify all the genes within the human genome

65
Q

Why are Illumina and PacBio often used together?

A

Illumina provides good quality reads whereas PacBio provides good read length

66
Q

On completion of the human genome project it was evident that over 50%/70%/90% of the genome does not encode protein/microRNA/tRNA, consistent with the idea of waste/garbage/junk DNA

A

On completion of the human genome project it was evident that over 90% of the genome does not encode protein, consistent with the idea of junk DNA

67
Q

What is the read depth coverage equation?

A

Depth = N x L / G

N = number of reads

L = length of each read

G = estimated genome size
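
A worked example of the equation, using made-up numbers (a hypothetical 5,000,000-base genome sequenced with 150-base reads). The function names are illustrative; `reads_needed` simply rearranges the same equation to N = Depth x G / L.

```python
import math


def read_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average read depth: Depth = N x L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target depth."""
    return math.ceil(target_depth * genome_size / read_length)


print(read_depth(1_000_000, 150, 5_000_000))  # 30.0
print(reads_needed(30, 150, 5_000_000))       # 1000000
```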

68
Q

Retrotransposons move from one point to another in the genome via what?

A

RNA intermediates

69
Q

Define

Processed pseudogenes

A

a type of pseudogene that is copied from messenger RNA and incorporated into the chromosome

70
Q

Define

Junk DNA

A

DNA that does not code for a protein, usually occurs in repetitive sequences of nucleotides, and does not seem to serve any useful purpose

71
Q

Define

Mobile genetic elements

A

DNA sequences that can move around the genome, changing their number of copies or simply changing their location, often affecting the activity of nearby genes

72
Q

Definition

a fluorescent chemical compound that can re-emit light upon light excitation

A

Fluorophores

73
Q

Define

FASTA

A

a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes

74
Q

Define

Reference genome

A

a digital nucleic acid sequence database, assembled by scientists as a representative example of a species’ set of genes

75
Q

Define

Retrotransposons

A

a transposon whose sequence shows homology with that of a retrovirus

76
Q

Define

Fluorophores

A

a fluorescent chemical compound that can re-emit light upon light excitation

77
Q

What are the two classes of retrotransposons?

A

Retroviral-like

Non-retroviral-like

78
Q

Define

Multicistronic

A

Describing a type of messenger RNA that can encode more than one polypeptide separately within the same RNA molecule

79
Q

Definition

the normal cytosine nucleotide in DNA that has been modified by the addition of a methyl group to its 5th carbon

A

Methyl-cytosine

80
Q

Define

Contig

A

a set of overlapping DNA segments that together represent a consensus region of DNA

81
Q

Definition

one of two or more homologous gene sequences found in different species

A

Orthologue

82
Q

How many genes are in the human genome?

A

Between 20000 and 25000

83
Q

Define

Orthologue

A

one of two or more homologous gene sequences found in different species

84
Q

Definition

Sequence of genomic DNA having lower accuracy than finished sequence; some segments are missing or in the wrong order or orientation

A

Draft genome sequence

85
Q

What percentage of the genome encodes proteins?

A

2%

86
Q

What does DNA methylation do?

A

Helps turn genes off by altering chromatin structure

87
Q

Definition

a digital nucleic acid sequence database, assembled by scientists as a representative example of a species’ set of genes

A

Reference genome

88
Q

Define

Amplicon

A

a piece of DNA or RNA that is the source and/or product of amplification or replication events

89
Q

Definition

stretches of DNA 500–1500 bp long with a CG:GC ratio of more than 0.6; they are normally found at promoters and contain the 5′ end of the transcript

A

CpG island

90
Q

Definition

a piece of DNA or RNA that is the source and/or product of amplification or replication events

A

Amplicon

91
Q

Definition

a chromosomal segment that can undergo transposition, especially a segment of bacterial DNA that can be translocated as a whole between chromosomal, phage, and plasmid DNA in the absence of a complementary sequence in the host DNA

A

Transposon

92
Q

What is the name of the modified nucleotides used in Sanger sequencing?

A

Dideoxy nucleotides

93
Q

Definition

starting from the beginning

A

De novo

94
Q

Define

ENCODE project

A

a project that seeks to interpret the sequence of DNA that makes up the human genome

95
Q

True or False:

Retroviral-like retrotransposons do not encode coat proteins

A

True

96
Q

A query protein is 26% identical to a guide protein. What can we say about these two proteins?

A

They might have similar functions

97
Q

Define

Transposon

A

a chromosomal segment that can undergo transposition, especially a segment of bacterial DNA that can be translocated as a whole between chromosomal, phage, and plasmid DNA in the absence of a complementary sequence in the host DNA

98
Q

Define

Read coverage depth

A

the number of unique reads that include a given nucleotide in the reconstructed sequence

99
Q

Definition

a type of pseudogene that is copied from messenger RNA and incorporated into the chromosome

A

Processed pseudogenes

100
Q

Definition

the number of unique reads that include a given nucleotide in the reconstructed sequence

A

Read coverage depth

101
Q

Define

Sanger sequencing

A

a method of DNA sequencing first commercialized by Applied Biosystems, based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication

102
Q

Definition

DNA sequences that can move around the genome, changing their number of copies or simply changing their location, often affecting the activity of nearby genes

A

Mobile genetic elements

103
Q

What are the components of Illumina sequencing?

A

‘Blocked’ nucleotides

Oligonucleotide primer

ssDNA template

DNA polymerase

104
Q

What proportion of nucleotides are identical in all people?

A

99%

105
Q

What can you say about proteins that are over 35% identical to a guide protein?

A

They probably have a related function
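
Percent identity can be computed from a pairwise alignment roughly as follows. This sketch assumes the two sequences are already aligned (gaps written as "-") and divides by the number of ungapped columns; other denominators are also in use, so treat that choice, the helper name and the toy sequences as assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


identity = percent_identity("MKV-LA", "MKIALA")
print(identity)
print(identity > 35)  # above the 35% rule of thumb used in this card
```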

106
Q

The human genome comprises 3 million/billion/trillion base pairs encoding approximately 10,000/20,000/50,000 genes. The number, position and order of introns/exons/genes are identical between individuals/proteins/tRNA

A

The human genome comprises 3 billion base pairs encoding approximately 20,000 genes. The number, position and order of genes are identical between individuals

107
Q
A
108
Q

Define

Operon

A

a unit made up of linked genes which is thought to regulate other genes responsible for protein synthesis

109
Q

Define

FASTQ

A

a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores

110
Q

Long interspersed nuclear element 1 (LINE 1) mobile genetic elements…

Select one:

a. are derived from viruses
b. encode enzymes essential for their replication
c. are found only in A / T rich regions of the genome
d. emerged from the genomes of ancient parasitic bacteria

A

Long interspersed nuclear element 1 (LINE 1) mobile genetic elements…

Select one:

a. are derived from viruses

b. encode enzymes essential for their replication

c. are found only in A / T rich regions of the genome
d. emerged from the genomes of ancient parasitic bacteria

111
Q

Gene annotation is the process of…

Select one:

a. manually sequencing “difficult” regions of the human genome
b. depositing new nucleotide sequence data in a public database
c. adding information on biological function to a nucleotide sequence file
d. deleting redundant data files

A

Gene annotation is the process of…

Select one:

a. manually sequencing “difficult” regions of the human genome
b. depositing new nucleotide sequence data in a public database

c. adding information on biological function to a nucleotide sequence file

d. deleting redundant data files

112
Q

When completed in 2003, the Human Genome Project lacked information on the…

Select one:

a. order of genes in the human genome
b. approximate number of genes in the human genome
c. approximate number of alleles in the human genome
d. percentage of protein-encoding genes in the human genome

A

When completed in 2003, the Human Genome Project lacked information on the…

Select one:

a. order of genes in the human genome
b. approximate number of genes in the human genome

c. approximate number of alleles in the human genome

d. percentage of protein-encoding genes in the human genome

113
Q

What is the current thinking about junk DNA?

Select one:

a. It serves no useful purpose
b. It consists of non-functional ancestral genes
c. It makes up less than 10% of the human genome
d. It is largely made up of mobile genetic elements

A

What is the current thinking about junk DNA?

Select one:

a. It serves no useful purpose
b. It consists of non-functional ancestral genes
c. It makes up less than 10% of the human genome

d. It is largely made up of mobile genetic elements

114
Q

The figure below represents a visual overview of a part of the DNA sequence of a bacterial genome (approximate base range 2400 to 8000). The overview is produced using the Artemis software and shows reading frame (RF) one through to six. The short black vertical lines indicate stop codons.

The sequence for each stop codon in the sequence displayed can be…

Select one:

a. GGG only
b. any of ATG CTG GTG
c. any of TAG, TAA, TGA
d. any of UAG, UAA, UGA

A

The sequence for each stop codon in the sequence displayed can be…

Select one:

a. GGG only
b. any of ATG CTG GTG

c. any of TAG, TAA, TGA

d. any of UAG, UAA, UGA
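
The idea behind such a six-frame overview can be sketched as a scan for the DNA stop codons TAG, TAA and TGA in three frames on each strand. The frame numbering (1 to 3 forward, 4 to 6 on the reverse complement) and the 0-based positions are conventions chosen for this sketch and may not match how Artemis labels its frames.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stop_positions(seq: str) -> dict[int, list[int]]:
    """Map each reading frame (1-6) to the start offsets of its stop codons.

    Offsets are 0-based and measured along the strand being read.
    """
    strands = [seq.upper(), reverse_complement(seq)]
    result: dict[int, list[int]] = {}
    for strand_index, strand in enumerate(strands):
        for offset in range(3):
            frame = strand_index * 3 + offset + 1
            result[frame] = [
                i
                for i in range(offset, len(strand) - 2, 3)
                if strand[i:i + 3] in STOP_CODONS
            ]
    return result


print(stop_positions("ATGTAAGGG"))
```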

115
Q

BLAST search…

Select one:

a. predicts protein function from the predicted 3D structure of the query protein sequence
b. is widely used to map millions of short DNA sequences onto a reference genome
c. finds sequences similar to the query sequence in the subject database
d. is a basic global alignment search tool

A

BLAST search…

Select one:

a. predicts protein function from the predicted 3D structure of the query protein sequence
b. is widely used to map millions of short DNA sequences onto a reference genome

c. finds sequences similar to the query sequence in the subject database

d. is a basic global alignment search tool

116
Q

The Whole Genome Shotgun (WGS) method for genome sequencing…

Select one:

a. uses long read sequencing technology to produce a single read that spans the whole bacterial chromosome
b. is likely to work best when the total number of sequenced bases is the same as the predicted number of bases in the bacterial chromosome
c. is rarely used for bacterial genome sequencing
d. is an approach based on the sequencing of randomly selected fragments of the genomic DNA, that collectively cover the whole genome

A

The Whole Genome Shotgun (WGS) method for genome sequencing…

Select one:

a. uses long read sequencing technology to produce a single read that spans the whole bacterial chromosome
b. is likely to work best when the total number of sequenced bases is the same as the predicted number of bases in the bacterial chromosome
c. is rarely used for bacterial genome sequencing

d. is an approach based on the sequencing of randomly selected fragments of the genomic DNA, that collectively cover the whole genome

117
Q

For the final three questions, consider the following information:

The genome sequence of the Reference strain was determined using a combination of long-read and short-read sequencing technologies (Assembled genome sequence: one circular chromosome and no plasmids). The genome of the Mutant strain was sequenced using short-read sequencing (Illumina, paired-end, 150-base reads). Table 1 shows all sequence differences between the Reference and Mutant strains.

Table 1. Sequence differences between strains

The phenotypic difference is that the Reference strain has a flagellum and the Mutant strain does not.

The initiation codon for pwpS is located:

Select one:

a. within 100 bases of position 4,684,444
b. between 100 and 300 bases from position 4,684,444
c. between 301 and 999 bases from position 4,684,444
d. more than 1,000 bases from position 4,684,444

A

The initiation codon for pwpS is located:

Select one:

a. within 100 bases of position 4,684,444

b. between 100 and 300 bases from position 4,684,444
c. between 301 and 999 bases from position 4,684,444
d. more than 1,000 bases from position 4,684,444

118
Q

The genome sequence of the Reference strain was determined using a combination of long-read and short-read sequencing technologies (Assembled genome sequence: one circular chromosome and no plasmids). The genome of the Mutant strain was sequenced using short-read sequencing (Illumina, paired-end, 150-base reads). Table 1 shows all sequence differences between the Reference and Mutant strains.

Table 1. Sequence differences between strains.

The phenotypic difference is that the Reference strain has a flagellum and the Mutant strain does not.

The phenotypic difference is likely to be caused by:

Select one:

a. any one of the differences observed in protein coding regions
b. all three differences observed in protein coding regions
c. the intergenic difference
d. the intergenic difference, the difference in the pwpS gene, or both

A

The phenotypic difference is likely to be caused by:

Select one:

a. any one of the differences observed in protein coding regions
b. all three differences observed in protein coding regions
c. the intergenic difference

d. the intergenic difference, the difference in the pwpS gene, or both