Molecular genetics 13-18 Flashcards

1
Q

What is a gene?

A

A DNA sequence (or RNA in some viruses) that is transcribed into RNA, together with all the sequences that control its expression

2
Q

Features of prokaryotic genes

A
  • No nucleus
  • Usually circular dsDNA
  • Genes in operons (several open reading frames encoded on one mRNA)
  • Simple regulation
3
Q

What is an example of regulation by inhibition in prokaryotes? How many proteins are involved?

A

Trp operon for tryptophan biosynthesis - linear pathway with 5 different proteins (polypeptides) that form three enzymes

4
Q

What happens in the linear pathway producing tryptophan when there is a lot of tryptophan present?

A

Tryptophan inhibits the first enzyme of the pathway, slowing its own production

5
Q

What is an example of feedback inhibition in the tryptophan biosynthesis pathway?

A

Accumulation of tryptophan slows down the rate of catalysis of the first enzyme complex (trp D/E), so reduces the overall rate of production

6
Q

What is transcriptional regulation of the trp operon?

A

A high tryptophan concentration reduces transcription of the operon

7
Q

Why is it important that tryptophan concentrations are not allowed to get too high?

A

Tryptophan is toxic at high concentrations

8
Q

When tryptophan is low/absent:

A
  • TrpR (trp repressor protein) is inactive as there is no tryptophan to bind
  • TrpR can’t bind to the operator of the operon
  • transcription not blocked
  • trp operon is expressed
9
Q

When tryptophan is present:

A
  • trpR is constitutively expressed
  • trpR protein binds to tryptophan (co-repressor) and forms an active repressor
  • Active repressor blocks transcription of trp operon
  • The pathway enzymes are not expressed
10
Q

What is an example of an inducible operon in prokaryotes?

A

The lac operon - contains genes that code for enzymes used in the hydrolysis and metabolism of lactose

11
Q

Is the lac repressor active or inactive by itself?

A

Active

12
Q

What inactivates the lac repressor?

A

A molecule called an inducer (allolactose)

13
Q

What is lacY responsible for producing?

A

Lactose permease

14
Q

What is lacZ responsible for producing?

A

β-galactosidase

15
Q

What is lacA responsible for producing?

A

Galactoside acetyltransferase

16
Q

When lactose is absent:

A

The lac repressor is active and switches the lac operon off

17
Q

When lactose is present:

A

The repressor is inactive as it forms a complex with allolactose (inducer), preventing repression and allowing expression of the genes

18
Q

Does the lac repressor usually repress the operon completely?

A

No, often there is not enough lacI for complete repression, so there is leaky expression

19
Q

What is lacIq?

A

A mutation in the lacI promoter region causing increased transcription and so higher levels of lacI protein, so the lacZ/Y/A promoter is more strongly repressed

20
Q

What is lacI?

A

The regulatory gene responsible for producing the protein that represses the lac operon from being transcribed

21
Q

What is the other condition needed for the breakdown of lactose, other than lactose being present?

A

Only occurs if glucose is absent

22
Q

What further regulation is needed so the lac operon is only transcribed if glucose is absent?

A

Carbon catabolite repression

23
Q

What is the link between cAMP levels and glucose levels?

A

Cyclic AMP is present in low levels if glucose concentrations are high.
Cyclic AMP is present in high levels if glucose concentrations are low

24
Q

What does CRP stand for?

A

Cyclic AMP Receptor Protein

25
Q

How does CRP affect the transcription of the lac operon?

A

When glucose is low, cAMP accumulates, then binds to and activates the CRP protein. Active cAMP-CRP binds near the lac promoter and helps RNA polymerase bind to the promoter, increasing transcription of the operon.

26
Q

What are the ideal conditions for the lac operon to be transcribed?

A

Lactose present

Glucose absent
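As a rough illustration, the two conditions can be written as Boolean logic. This is a minimal sketch, not a quantitative model: the function name and the three output levels ("off", "low", "high") are assumptions chosen for illustration.

```python
def lac_operon_transcription(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative lac operon output for the two conditions (illustrative levels)."""
    # Lactose gives allolactose, which inactivates the LacI repressor;
    # without lactose the repressor stays bound to the operator.
    repressor_bound = not lactose_present
    # Low glucose -> high cAMP -> active cAMP-CRP helps RNA polymerase bind.
    crp_active = not glucose_present

    if repressor_bound:
        return "off"  # only leaky expression
    return "high" if crp_active else "low"


for lactose in (False, True):
    for glucose in (False, True):
        level = lac_operon_transcription(lactose, glucose)
        print(f"lactose={lactose}, glucose={glucose} -> {level}")
```

Only "lactose present, glucose absent" gives the high level, matching the ideal conditions on the card.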

27
Q

What are sigma factors?

A

Subunits of bacterial RNA polymerase that enable specific binding of RNA polymerase to gene promoters

28
Q

Why is the lac operon repressed by default?

A

Lactose may only rarely be present, and is a second-choice carbon source

29
Q

What do sigma factors do?

A

Help RNA polymerase to bind to promoters
Determine the transcription start site
Activate/amplify transcription
Housekeeping sigma factor RpoD (σ70) is used for constitutive genes

30
Q

What is a housekeeping gene?

A

A constitutive gene that is transcribed at a relatively constant level, required for the maintenance of basic cellular function and expressed in all cells of an organism under normal conditions

31
Q

What is a constitutive gene?

A

A gene that is transcribed continually as opposed to a facultative gene, which is only transcribed when needed

32
Q

What are pathway-specific sigma factors?

A

Alternative sigma factors that activate families of genes, allowing their effective expression when needed

33
Q

Which genes are activated by specific sigma factors, e.g. when a bacterium is given a heat shock?

A
rpoH: heat-shock genes
fecI: iron uptake
rpoS: starvation/stationary phase
rpoN: nitrogen starvation
rpoF: flagellar genes for motility
34
Q

What is Quorum sensing?

A

A way of coordinating gene expression between individuals - bacteria communicate using chemical signals whose concentration reflects population density

35
Q

What is AHL?

A

N-acyl homoserine lactone, a quorum-sensing signal molecule

36
Q

How does Quorum sensing work?

A
  • Bacteria constitutively produce AHL
  • When its concentration is high (at high cell density), a receptor protein is activated and switches on transcription of target genes, such as virulence genes
  • This plays a major role in disease
37
Q

What are LuxR-type R proteins?

A

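AHL-binding transcription activators, named after LuxR of Vibrio fischeri, which switches on the lux genes for bioluminescence - easy to measure as the light is visible to scientists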

38
Q

When does translation start in bacteria?

A

Before transcription has finished

39
Q

What is polarity in bacterial expression?

A

In operons, more protein is usually made from the first ORF than from later ORFs, because translation starts before transcription has finished

40
Q

What eukaryotic gene properties are not shared with bacteria?

A
  • Chromatin
  • mRNA processing: introns, 5’ cap, 3’ poly A tail
  • Transport of mRNA out of nucleus
  • Uncoupled transcription and translation
  • miRNA/silencing
41
Q

What factors change the rate of overall expression?

A

1) Rate of transcription
2) Rate of mRNA degradation
3) Rate of translation
4) Rate of protein degradation
5) Chromatin accessibility

42
Q

What features affect the rate of transcription?

A
  • Each ORF has its own promoter
  • Genes usually not clustered by function/pathway
  • Often in different chromosomes
  • Eukaryotic genes can have enhancers (upstream, downstream or within the coding region)
  • The polyA site dictates where the 3’ end of the mRNA is cleaved and processed
  • Promoter (control) elements can be immediately next to the gene or several thousand bases away from it
  • Repressors can bind at various places along the DNA, blocking transcription
  • Activators act when repressors are not present: they bind to the enhancer region, cause DNA bending and recruit general transcription factors and the Mediator complex
43
Q

What is the polyA tail also known as?

A

The ‘terminator region’

44
Q

Is the polyA tail encoded in the genome?

A

No, it is added enzymatically

45
Q

What are general transcription factors?

A

Essential for the transcription of all protein-coding genes

46
Q

What are specific transcription factors?

A

High levels of transcription of particular genes depend on control elements interacting with specific transcription factors

47
Q

What are wide-domain areas that specific transcription factors can affect?

A

Carbon, nitrogen, pH

48
Q

What are narrow-domain areas that specific transcription factors can affect?

A

Specific metabolic pathways

49
Q

What prevents mRNA from degradation?

A

5’ cap and 3’ polyA tail stabilise the mRNA, reducing degradation
3’ polyA tail helps transport the mRNA out of the nucleus
5’ UnTranslated Region (UTR) and 3’ UTR help define stability

50
Q

Why is it important to have stable RNA?

A

More stable RNA gets translated more

51
Q

What is miRNA?

A

Micro RNA
Short, non-coding RNA molecules
Bind to specific mRNAs (by complementary base pairing)
Recruit RNA endonuclease enzymes
Digest specific mRNA (or several related mRNAs)
Part of the gene silencing pathway using Dicer and RISC

52
Q

How is the rate of translation altered through initiation of translation?

A

Different mRNAs have different 5’ UnTranslated Regions (UTRs) before the coding sequence. These UTRs have different affinities for ribosome binding, leading to either high or low levels of translation

53
Q

What is a Kozak sequence?

A

A consensus sequence around the start codon that plays a major role in the initiation of translation. Certain bases are more likely to appear in the sequence and lead to a higher rate of translation, such as A or G at the -3 position and C at the -1 position
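A minimal sketch of checking the positions named on the card (A/G at -3, C at -1), plus G at +4 from the wider Kozak consensus gccRccAUGG. The one-point-per-position score and the name `kozak_score` are hypothetical; real tools weight the positions differently.

```python
def kozak_score(mrna, aug_index):
    """Score 0-3 for the Kozak context of the AUG starting at aug_index (0-based)."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError(f"no AUG at index {aug_index}")

    def base_at(position):
        # Kozak numbering: A of AUG is +1 and the base before it is -1 (no 0)
        i = aug_index + position if position < 0 else aug_index + position - 1
        return seq[i] if 0 <= i < len(seq) else ""

    score = 0
    score += base_at(-3) in ("A", "G")  # purine at -3
    score += base_at(-1) == "C"         # C at -1
    score += base_at(4) == "G"          # G at +4 (from the wider consensus)
    return score


print(kozak_score("GCCACCAUGG", 6))  # strong context
print(kozak_score("UUUUUUAUGU", 6))  # weak context
```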

54
Q

What does translation rate depend on?

A

Ribosome binding
Translation enhancers
Codon usage

55
Q

What is codon usage?

A

The use of genetic redundancy to allow the control of translation.
Different codons for the same amino acid are used in different frequencies. For optimal expression use the common codons and avoid the rare ones
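A minimal sketch of measuring synonymous codon usage in an in-frame coding sequence. Leucine's six codons are used as an illustrative family; the function names are hypothetical, and a real analysis would compare against a codon-usage table for the host organism.

```python
from collections import Counter

# Illustrative family: the six synonymous codons for leucine (DNA alphabet)
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def synonymous_usage(cds, family=LEUCINE_CODONS):
    """Fraction of the family's codons in `cds` accounted for by each synonymous codon."""
    counts = Counter(c for c in split_codons(cds) if c in family)
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0) for codon in family}


print(synonymous_usage("ATGCTGCTGCTGTTATAA"))
```

In this toy sequence CTG accounts for three of the four leucine codons; a gene rich in codons that are rare in the host would be a candidate for codon optimisation.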

56
Q

How are proteins targeted for destruction?

A

They are linked to ubiquitin (a ubiquitous protein)
Ubiquitin is covalently attached to the target protein - a chain of several ubiquitins is needed
This directs the protein to the proteasome (a protein complex that hydrolyses proteins)
Triggers its digestion
Ubiquitin gets recycled

57
Q

What can make DNA inaccessible for transcription?

A

Condensed chromatin

58
Q

What is histone acetylation?

A

Acetyl groups are attached to an amino acid in a histone tail. This appears to open up the chromatin structure, thereby promoting the initiation of transcription

59
Q

What is histone methylation?

A

Adding methyl groups to amino acids (e.g. lysines) in histone tails. This can condense chromatin and reduce transcription

60
Q

How does acetylation of lysine residues affect transcription?

A

Causes chromatin to be looser, better transcription

61
Q

How does methylation of histones affect transcription?

A

DNA more condensed, less transcription

62
Q

What group of enzymes acetylates lysine amino acids?

A

Histone acetyltransferases (HATs)

63
Q

What class of enzymes removes acetyl groups from lysine?

A

Histone deacetylases (HDACs)

64
Q

What group of enzymes methylates histones?

A

Histone methyltransferases (HMTs)

65
Q

What group of enzymes removes methyl groups from histones?

A

Histone demethylases

66
Q

What can happen when DNA methylation goes wrong?

A

Cancer

67
Q

What is epigenetics?

A

Heritable changes in gene expression, such as inactivation of genes, that do not alter the DNA sequence

68
Q

What does SAHA stand for and what does it do?

A

suberoylanilide hydroxamic acid (SAHA)

Inhibits HDAC - chromatin stays acetylated longer so maintains expression

69
Q

What does 5AC stand for and what does it do?

A

5 azacytidine

Inhibits DNA methyltransferase, so the DNA stays less methylated and less condensed, maintaining expression

70
Q

How can you prevent RNA from being degraded by RNases as you work?

A
  • RNase-free solutions and disposables
  • Work clean, fast (if exposed to enzymes, there is no time for degradation) and cold (enzymes are not at their optimum temperature)
  • RNase-free DNase to remove any remaining DNA
  • Chaotropic salts - disrupt protein structure so RNase enzymes are not active
  • Wear gloves
71
Q

What does TOTAL RNA include?

A

mRNA, rRNA, tRNA, snRNA, miRNA etc.

72
Q

What is a Southern Blot?

A

Run DNA on a gel, blot it onto a membrane and probe for a specific sequence; shows the size of the fragment

73
Q

What is a Northern Blot?

A

Run RNA on a gel, blot and probe for the gene of interest (probing is needed because total RNA gives a smear on the gel, not discrete bands). Shows the size and abundance of the transcript, although only one known gene can be probed at a time

74
Q

How to separate the mRNA from all the other RNAs?

A

It has a polyA tail (AAAA)
Add beads coated with oligo(dT) (a TTTTT sequence) to the RNA mixture: mRNA sticks to the beads by base pairing, whereas the other RNAs do not. The mRNA can then be eluted by changing the salt concentration, which changes its affinity for the beads

75
Q

What is the name of the process which converts mRNA to cDNA?

A

Reverse transcription

76
Q

Where do you find reverse transcriptase activity?

A

In retroviruses

77
Q

How do you convert mRNA to cDNA (complementary DNA)?

A

Prime the mRNA with oligo(dT) which will bind with polyA tail and the entire population of mRNA will undergo reverse transcription

78
Q

How is the cDNA cloned after it is produced?

A

Using adapters

79
Q

What is an example of how the cDNA is modified after it has been cloned?

A

Using site-directed mutagenesis

80
Q

How can you amplify RNA?

A

You can’t use PCR to directly amplify RNA
First the mRNA population must undergo reverse transcription into cDNA, then PCR can amplify the DNA of interest using gene-specific primers

81
Q

What is qPCR?

A

Quantitative PCR

Measures the amount of product per cycle

82
Q

What dye is usually used for qPCR?

A

SYBR Green - fluoresces when bound to dsDNA

Fluorescence measured after each cycle of amplification

83
Q

What is the qPCR expressed as?

A

2^(DeltaCT)
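Because the amount of product roughly doubles each cycle, a difference of n cycles in Ct corresponds to about a 2^n-fold difference in starting template. A minimal sketch of one common way to apply this, the 2^(-ΔΔCt) method, which normalises the gene of interest to a reference gene and then to a control sample. The method choice, the assumed ~100% amplification efficiency and the example Ct values are assumptions, not from the card.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^(-ddCt) method, assuming ~100% PCR efficiency."""
    # Normalise the gene of interest to a reference (housekeeping) gene
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Each cycle earlier means roughly twice as much starting template
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: relative to the reference gene, the target
# crosses the threshold 3 cycles earlier in the sample than in the control
print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

Here the target is about 8-fold (2^3) more abundant in the sample than in the control.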

84
Q

What is a more effective version of the Northern blot, testing all genes at once (RNA quantification by hybridisation to an array)?

A

Put bits of every gene on a support and probe with labelled RNA
Chips are hybridised to the labelled transcripts
Signal from each spot is measured
Shows transcript abundance for every gene on the array

85
Q

What is the sequencing-based method that has superseded RNA quantification by hybridisation?

A

Prepare cDNA from chosen condition
Sequence lots of individual molecules
Assess what is expressed and in what abundance

86
Q

cDNA cloning - the old fashioned method

A
Clone cDNA into plasmids, sequence it clone by clone
Called ESTs (Expressed Sequence Tags) which represent portions of expressed genes
87
Q

What type of sequencing is used to sequence RNA directly?

A

Next-generation sequencing

88
Q

Features of GFP

A
  • From jellyfish Aequorea victoria
  • Simple barrel shaped protein
  • Excited by 395 or 475 nm
  • Emits at 509nm
  • Needs no other substrate except oxygen
89
Q

What is a reporter gene?

A

A gene that researchers attach to a regulatory sequence of another gene, to determine its rate of expression

90
Q

Examples of commonly used reporter genes

A

β-galactosidase (lacZ)
β-glucuronidase (gusA)
Luciferase (luc)
Green fluorescent protein (GFP)

91
Q

What is promoter bashing?

A

Analysis of possible control elements by deletion

92
Q

How to determine where a protein will go?

A

Sequence tags within peptides target proteins to specific organelles
“Leader sequence” directs protein - N terminus first 15-30 amino acids direct protein to secretion machinery
To keep a protein in the ER there is a KDEL/HDEL retention sequence at the C terminus
To get a protein into the nucleus there is a nuclear localisation signal of about 5 positively charged (basic) residues
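A very rough sketch of scanning a protein sequence for two of the signals above. The rules are simplified assumptions (an exact C-terminal KDEL/HDEL, and any run of five K/R residues standing in for a nuclear localisation signal), `targeting_signals` is a hypothetical name, and the N-terminal leader sequence is not checked; real predictors use much richer models.

```python
import re


def targeting_signals(protein):
    """Flag two simplified sorting signals in a one-letter protein sequence."""
    p = protein.upper()
    signals = []
    # ER retention: KDEL or HDEL at the very C terminus
    if p.endswith(("KDEL", "HDEL")):
        signals.append("ER retention (C-terminal KDEL/HDEL)")
    # Nuclear localisation: assumed here to be a run of 5 basic K/R residues
    if re.search(r"[KR]{5}", p):
        signals.append("nuclear localisation (5 basic residues)")
    return signals


print(targeting_signals("MPKKKRKVEDP"))  # contains the SV40-like NLS PKKKRKV
```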

93
Q

What is a Western Blot?

A

Using a specific antibody to quantify a specific PROTEIN

94
Q

What is SDS?

A

Sodium dodecyl sulphate
A strong detergent that denatures proteins so they are linear and coats them with negative charge
They can then undergo electrophoresis

95
Q

What is Poly Acrylamide Gel Electrophoresis?

A

Separates proteins by molecular weight

Different mobility if modified by glycosylation, phosphorylation and acetylation

96
Q

What are the limitations of Poly Acrylamide Gel Electrophoresis?

A

Only suitable for soluble proteins
Cysteine-cysteine bonds may require reducing
Limited by antibody availability

97
Q

What is another way of separating proteins by size that is not Western Blotting? (1)

A

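SDS-PAGE gel combined with isoelectric focusing (2D gel electrophoresis)
First, isoelectric focusing separates proteins in a pH gradient with an electric field across it: each protein moves until it reaches the pH at which it has no net charge (its isoelectric point), which depends on charge rather than size
SDS-PAGE then separates the focused proteins by size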

98
Q

What is another way of separating proteins by size that is not Western Blotting? (2)

A

Protein Mass Spectrometry
Separate proteins by size or hydrophobicity through gel electrophoresis
Feed into mass spectrometer
Find accurate mass of each protein (to 5 dp)
Determine its sequence identity from its mass

99
Q

Why would you add 6 histidine to the start or end of a protein?

A

To use them as a hook to purify specifically this protein
The 6 histidine tag binds to immobilised metal ions such as Ni²⁺ (or Co²⁺)
It can also act as an EPITOPE for anti-His antibodies

100
Q

What is an epitope?

A

The part of an antigen molecule to which an antibody attaches itself

101
Q

What are the benefits of sequencing a genome?

A

- Co-located genes may form pathways
- Compare genomes and see different mutations
- Identify candidate genes close to a genetic marker associated with a trait

102
Q

When was the first genome sequenced and by who? How many bases did it have?

A

In 1977 by Sanger and his colleagues.

It consisted of 5375 nucleotides

103
Q

What organism was the first to have its genome sequenced?

A

Phage phi X 174 (bacteriophage)

104
Q

How many genes could be sequenced per year by one person in the early 1990s?

A

10 genes

One person could only sequence 1500 bases per day

105
Q

When was automated Sanger sequencing developed and how many bases could then be sequenced per day?

A

Late 1990s (automated machines; the Sanger method itself dates from 1977)

240,000 bases per day

106
Q

Features of the Human Genome Sequencing Project

A

1990-2003
~3 billion bases
Cost more than $3 billion
Factory-scale sequencing

107
Q

What type of sequencing was developed in 2006 and how many bases can it sequence per run?

A

Next generation sequencing

Can sequence 1000 billion bases per run

108
Q

What is the other name for next generation sequencing?

A

Illumina sequencing

109
Q

Outline the traditional genome sequencing approach (What is its other name?)

A

c. 2001
Called hierarchical shotgun sequencing
1. Genome DNA cut into large fragments, producing a BAC library (each ~300 kb fragment is like a small extra genome)
2. Radioactive hybridisation of these clones shows which fragments overlap with each other, so they can be organised into large clone contigs
3. Select the BAC to be sequenced
4. Break up selected BAC into smaller pieces - a ‘shotgun clone’
5. Reassemble sub-fragments back into order by working out the sequence of each shotgun clone and which other sequences it overlaps with

110
Q

BAC meaning?

A

Bacterial Artificial Chromosome

111
Q

What is a contig?

A

A set of overlapping DNA segments that together represent a consensus region of DNA

112
Q

What is the downside of hierarchical shotgun sequencing (traditional sequencing approach)

A

Highly labour intensive

113
Q

Outline next generation genome sequencing

A
  1. Fragment genome - sonicate into random overlapping fragments
  2. Sequence fragments and assemble
  3. There may be gaps with low coverage, but ~99.9% of the genome has high coverage
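To see why a few low-coverage gaps remain, a common back-of-envelope model (Lander-Waterman, not from the card) treats reads as landing at random: mean coverage c = N x L / G for N reads of length L on a genome of size G, and the expected fraction of bases with no reads is about e^-c. A sketch under that uniform-random assumption, which real data never fully meets; the function names and numbers are hypothetical.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average number of reads covering each base: c = N * L / G."""
    return n_reads * read_length / genome_size


def expected_uncovered_fraction(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


# Hypothetical run: 350,000 reads of 100 bases on a 5 Mb genome
c = mean_coverage(350_000, 100, 5_000_000)
print(c, expected_uncovered_fraction(c))
```

At about 7x coverage the expected uncovered fraction is roughly 0.1%, in line with the ~99.9% figure; sequencing biases make real gaps somewhat larger.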
114
Q

What is K-mer based assembly?

A

All the reads created by next generation sequencing must be compared to see where they overlap. Illumina sequences 100 bases from either end of each fragment. Comparing all 100 bases from the end of one read with those of every other read would take too long, so the computer breaks the reads into shorter overlapping fragments of length k (k-mers). It stores each k-mer at a particular memory address and finds overlaps between these short fragments all the way along the 100 bases. E.g. with k = 25, it looks for overlaps of k - 1 = 24 bases.
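A toy sketch of the idea: every read is broken into k-mers, each k-mer is filed in a hash table (a Python dict) under its first k - 1 bases, and contigs are built by chaining k-mers that overlap by k - 1 bases. It assumes error-free reads from one strand and no repeated (k - 1)-mers; real assemblers build a full de Bruijn graph and handle errors, both strands and repeats. `assemble` is a hypothetical name, and k = 5 is used instead of 25 to keep the example short.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def assemble(reads, k):
    """Toy assembly: chain k-mers that overlap by k - 1 bases."""
    all_kmers = {km for read in reads for km in kmers(read, k)}
    # Hash table: (k-1)-base prefix -> possible next bases
    successors = defaultdict(set)
    suffixes = set()
    for km in all_kmers:
        successors[km[:-1]].add(km[-1])
        suffixes.add(km[1:])
    # Start at prefixes that no k-mer leads into
    starts = sorted(node for node in successors if node not in suffixes)
    contigs = []
    for node in starts:
        contig, seen = node, {node}
        # Extend while the path is unambiguous (exactly one next base)
        while len(successors.get(node, ())) == 1:
            node = node[1:] + next(iter(successors[node]))
            if node in seen:  # guard against cycles
                break
            seen.add(node)
            contig += node[-1]
        contigs.append(contig)
    return contigs


reads = ["ATGGCGTA", "GCGTACCTT", "ACCTTAGC"]
print(assemble(reads, k=5))
```

The three overlapping reads are rebuilt into the single contig ATGGCGTACCTTAGC. A repeat longer than k - 1 would give one (k - 1)-mer two successors, and the walk would stop there, which is the repeat problem on the next card.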

115
Q

The problem of repeats in shotgun sequencing in eukaryotes

A

This applies particularly to next generation sequencing, as there are no large BAC clones to help order the sequences. Repeats between genes can be longer than the reads, so the computer cannot tell which sequence comes after the repeat

116
Q

Solution to the problem of repeats in next generation sequencing (1)

A

Illumina mate-pair libraries
Fragments several kilobases long, so they can span repeats; only the ends are sequenced.
Difficult to make mate-pairs

117
Q

Solution to the problem of repeats in next generation sequencing (2)

A

Oxford Nanopore

Reads are typically several kb long, so a single read can span an entire repeat

118
Q

What size can eukaryotic genomes range to?

A

16 Gb

119
Q

How do we find the open reading frame within a DNA sequence?

A

Feed the sequence into a computer, which translates it in all six possible reading frames (3 forward and 3 reverse). A candidate gene runs from a methionine start codon (ATG) to a STOP codon.
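A minimal six-frame ORF scan. The 30-codon minimum is an assumed filter (short ATG...stop stretches occur often by chance), `find_orfs` is a hypothetical name, and coordinates on the reverse strand refer to the reverse-complemented sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=30):
    """ATG...stop ORFs in all six frames as (strand, frame, start, end, sequence)."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):  # three frames per strand = six in total
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first Met opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, seq[start:i + 3]))
                    start = None  # look for the next ATG
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=1))
```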

120
Q

What type of DNA does ORF finding work well for and why?

A

Prokaryotic DNA, as it has no introns and few repeats

121
Q

What type of DNA does ORF finding work less well for and why?

A

Eukaryotic DNA, because genes are interspersed by non-transcribed gaps, repeats and introns. Introns break up coding sequences

122
Q

How can codon usage help identify the real open reading frames?

A

Some codons are more commonly used than others to encode a specific amino acid in a gene. Reading frames with the more commonly used codons for a particular amino acid are more likely to be within a coding region, whereas non-coding DNA uses all codons roughly equally

123
Q

What other feature of eukaryotic DNA allows identification of genes?

A

Eukaryotic genes have conserved splice sites.

Eukaryotic introns tend to start with GTAAGT (the exon usually ends in AG, giving AG|GTAAGT across the boundary) and end with YYYYYYNCAG (Y = pyrimidine C/T, N = any base)
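A deliberately simplified scan using the consensus on the card. It assumes an exact GTAAGT start and a YYYYYYNCAG end, which real introns often deviate from; gene-prediction tools use probabilistic splice-site models instead. `candidate_introns` and the example sequence are hypothetical.

```python
import re

# Simplified consensus: intron starts GTAAGT and ends YYYYYYNCAG
# (Y = C or T, N = any base); the middle is matched lazily.
INTRON_PATTERN = re.compile(r"GTAAGT[ACGT]*?[CT]{6}[ACGT]CAG")


def candidate_introns(dna):
    """(start, end) positions of stretches matching the simplified intron consensus."""
    return [(m.start(), m.end()) for m in INTRON_PATTERN.finditer(dna.upper())]


exon1, intron, exon2 = "ATGGCAG", "GTAAGTACGATTCTCCTCAG", "GTTCCA"
print(candidate_introns(exon1 + intron + exon2))
```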

124
Q

How to confirm ORF expression after finding the ORFs

A

Use RNAseq

  1. Extract nucleic acids from sample
  2. Use oligo dT to extract mRNA
  3. Use Illumina to sequence mRNA
  4. Gene expression profiling: use computer to map RNA reads back onto the genome
  5. Align RNA to a reference and count expression levels
125
Q

What does BLAST stand for?

A

Basic Local Alignment Search Tool

126
Q

What is BLAST?

A

A tool that searches a query sequence against databases containing essentially every known sequenced gene

127
Q

How can you use BLAST to find out more about the gene you have identified?

A

The sequence you have found can be compared to the database and the gene name/function can be worked out through similarities to other known genes

128
Q

What is BLASTN?

A

A specific type of BLAST tool for comparing DNA sequences with other DNA sequences
Query: nucleotide
Database: nucleotide

129
Q

What is BLASTX?

A

A specific type of BLAST tool for comparing a translated nucleotide to known proteins
Query: translated nucleotide
Database: protein

130
Q

How to build BLAST hits:

A
  1. Start with one word match (a word is 11 nucleotides by default or 3 amino acids) - a ‘seed’
  2. If possible, extend the alignment either side of the word match. If there are no matches, find another seed and start again
  3. If there are enough hits to pass the threshold value, return an alignment to the user
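A simplified sketch of the seed-and-extend idea: find exact 11-base word matches, then extend each seed without gaps until the running score drops too far below its best (an "X-drop" rule). The scoring values (match +1, mismatch -2, X-drop 5) and the function names are illustrative assumptions; real BLAST also performs gapped extension and computes statistics.

```python
def find_seeds(query, subject, word_size=11):
    """Exact word matches (seeds) as (query_pos, subject_pos) pairs."""
    index = {}
    for s in range(len(subject) - word_size + 1):
        index.setdefault(subject[s:s + word_size], []).append(s)
    return [(q, s)
            for q in range(len(query) - word_size + 1)
            for s in index.get(query[q:q + word_size], [])]


def _extend(pairs, match, mismatch, x_drop):
    """Best score and length when extending along aligned base pairs."""
    best, run, best_len = 0, 0, 0
    for length, (a, b) in enumerate(pairs, start=1):
        run += match if a == b else mismatch
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:  # score fell too far: stop extending
            break
    return best, best_len


def extend_seed(query, subject, q, s, word_size=11,
                match=1, mismatch=-2, x_drop=5):
    """Ungapped extension of one seed; returns (score, query_start, query_end)."""
    right, r_len = _extend(zip(query[q + word_size:], subject[s + word_size:]),
                           match, mismatch, x_drop)
    left, l_len = _extend(zip(reversed(query[:q]), reversed(subject[:s])),
                          match, mismatch, x_drop)
    return word_size * match + left + right, q - l_len, q + word_size + r_len


seq = "ACGTACGTTAGCCATG"
seeds = find_seeds(seq, seq)
print(extend_seed(seq, seq, *seeds[0]))
```

A hit would then be reported only if its extended score passes the threshold.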
131
Q

When interpreting BLAST results, what is an E-value?

A

The number of matches with a score as good as or better than this one that would be expected by chance - shorter sequences are likely to have a larger E-value
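The length effect follows from the Karlin-Altschul statistics that BLAST uses (not stated on the card). For an alignment with raw score S between a query of length m and a database of total length n, the expected number of chance hits is approximately as below, where K and λ are constants that depend on the scoring system. Because E falls exponentially as S rises, short queries, which can only reach low scores, tend to have larger E-values.

$$E = K\,m\,n\,e^{-\lambda S}$$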

132
Q

What must you beware when analysing BLAST results?

A

The default E-value cut-off is 10, so many chance matches are reported
E-values greater than 0.00001 are not considered reliable
Sequence similarity doesn’t prove functional homology

133
Q

Why is BLASTX so useful?

A
  1. Finding a protein match helps to confirm that the DNA you have sequenced is expressed
  2. Matches to proteins can show up possible introns in genomic sequence
  3. Protein sequences are more likely to have useful annotation than DNA sequences
134
Q

Why is it beneficial to search within a certain subset of organisms when using BLAST to compare your gene?

A

Speeds up the search as there are fewer sequences to search through
Increases sensitivity - reduces the likelihood that the same pattern is found by chance

135
Q

What is genomics?

A

The study of genomes

136
Q

What is metagenomics?

A

The study of multiple genomes in complex (usually environmental) samples

137
Q

What is the problem with growing microbes in labs?

A

<1% will grow in culture or they grow really slowly over years
Bacteria are also so small that it can be near impossible to determine the species under a microscope

138
Q

In metagenomics, how does one determine what species are present?

A

Sequence a marker gene

139
Q

What gene is usually sequenced to determine what bacteria and archaea are present and how are the variable regions amplified?

A

Partial 16S ribosomal RNA gene (16S = 16 Svedberg units)
Variable regions (e.g. V1, V2 and V3) are not conserved, so they differ between species
There are conserved regions between the variable regions, so you can synthesise primers complementary to the conserved regions and amplify the variable regions between them

140
Q

What is a 16S rRNA pipeline?

A
  1. Extract DNA from sample e.g. blood, faeces, soil, slime, dust etc.
  2. Target a region of the 16S gene that can categorise which bacteria are present, using forward and reverse primers
  3. PCR amplifies the variable regions of all the 16S genes present
  4. Each sample contains multiple genomes in a complex mixture - amplicon pool
141
Q

How is Illumina sequencing made cost-effective when sequencing an amplicon pool?

A

Barcoding and multiplexing

  1. Add a unique barcode to sample 1 amplicon primers
  2. Amplify 16s rRNA from each sample and include a sample specific barcode in the forward primer
  3. Up to 96 barcoded samples can be sequenced at once
  4. Output: 400 million sequences (4 million sequences per sample) which gives a good snapshot of microbial diversity in that sample
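A minimal sketch of how barcoded reads could be split back into their samples after a multiplexed run. It assumes each barcode sits, exactly matched, at the start of the read; the 4-base barcodes and `demultiplex` are hypothetical, and real runs often read the index separately and tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the sample barcode at their start, trimming the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        for sample, code in barcodes.items():
            if read.startswith(code):
                by_sample[sample].append(read[len(code):])
                break
        else:  # no barcode matched exactly
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"sample1": "ACGT", "sample2": "TGCA"}  # hypothetical 4-base barcodes
reads = ["ACGTGGATTA", "TGCAGGATTC", "CCCCGGATTA"]
print(demultiplex(reads, barcodes))
```

With 400 million reads shared across 96 samples, this gives roughly 4.2 million reads per sample, consistent with the figure on the card.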
142
Q

How to process 16s rRNA data

A
  • Start with millions of sequences
  • Cluster similar sequences together to make OTUs (Operational Taxonomic Units)
  • Assign OTUs to: domain/class/order/genus/species using a BLAST search
  • Different taxonomic levels come back, as not all sequences can be assigned to specific species
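A sketch of greedy OTU clustering, assuming reads are already aligned and of equal length. The 97% identity threshold is a commonly used cut-off but is an assumption here, `greedy_otus` is a hypothetical name, and real pipelines use dedicated clustering tools.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two pre-aligned reads."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def greedy_otus(reads, threshold=0.97):
    """Cluster reads into OTUs; returns (centroid, read_count) pairs."""
    centroids, sizes = [], []
    # Most abundant sequences first, so they become the OTU centroids
    for seq, n in Counter(reads).most_common():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                sizes[i] += n
                break
        else:
            centroids.append(seq)
            sizes.append(n)
    return list(zip(centroids, sizes))


reads = ["A" * 100] * 5 + ["A" * 99 + "G"] * 2 + ["C" * 100] * 3
print([size for _, size in greedy_otus(reads)])
```

The single-mismatch variant (99% identical) joins the abundant OTU, while the unrelated sequence forms its own. Each OTU centroid would then be assigned a taxonomy with a BLAST search.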
143
Q

How to discover what the species in the sample are doing?

A

Sequence genomic DNA using shotgun sequencing

144
Q

Outline the process of shotgun sequencing to sequence genomic DNA from sample

A
  1. Extract DNA from sample. Each sample contains multiple genomes in a complex mixture
  2. Fragment DNA into 500bp fragments and sequence with Illumina
  3. The fragmented pieces are assembled into genes (contigs) and the number of sequences in each contig is noted
  4. Identify the contigs: BLAST search them to a database of known proteins
145
Q

Why can only 4 genomic DNA samples be multiplexed at once, as opposed to 96 16S amplicon samples?

A

DNA genomes are much larger, and more data is needed to cover multiple whole genomes

146
Q

What is 16S rRNA sequencing useful for?

A

Taxonomic composition of samples
Overall diversity
Differences between samples due to factors of interest