Lecture 2 Flashcards

1
Q

How can regions of low GC content have been acquired?

A

By horizontal transfer
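
As a rough illustration of how atypical base composition might be spotted, the sketch below computes GC content in fixed windows and flags windows below a cut-off. The function names, the toy sequence, the window size and the threshold are all assumptions made for this example; real HGT detection combines several signals and is not this simple.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_windows(seq: str, window: int, threshold: float) -> list[int]:
    """Start positions of non-overlapping windows with GC fraction below threshold."""
    starts = []
    for start in range(0, len(seq) - window + 1, window):
        if gc_content(seq[start:start + window]) < threshold:
            starts.append(start)
    return starts


# Toy sequence (hypothetical): GC-rich, then an AT-rich stretch, then GC-rich again.
toy = "GCGCGCGCGC" + "ATATATTAAT" + "GGCCGCGGCC"
print(low_gc_windows(toy, window=10, threshold=0.3))  # prints [10]
```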

2
Q

What did analysis of the E. coli K-12 genome conclude about HGT?

A

It concluded that 755 of the 4288 genes were likely derived from HGT. These were acquired in at least 234 separate events.

3
Q

What is the O157:H7 strain of E. coli?

A

It is an emergent human pathogen which was first identified in 1982. It’s an enterohaemorrhagic E. coli which produces Shiga toxin and is associated with haemorrhagic colitis and haemolytic uraemic syndrome (which can lead to kidney failure)

4
Q

What’s the size of the E. coli O157:H7 genome?

A

The genome is 5.5 Mb, about 1 Mb bigger than K-12
It’s colinear with K-12
It was the second E. coli genome to be sequenced

5
Q

How was the extra DNA in the O157:H7 strain organised?

A

It was clustered into genomic islands (O-islands).

There were also some K-islands, which are regions unique to E. coli K-12.

The O and K islands were located at the same positions in the genome. The genome has a patchwork structure with a shared co-linear backbone interrupted by strain-specific islands

6
Q

What are genomic islands?

A

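An extension of the previously used term “pathogenicity islands”. It covers any cluster of strain-specific genes acquired by horizontal transfer, whether or not the genes are involved in virulence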

7
Q

What’s the CFT073 strain of E. coli?

A

It’s a strain of uropathogenic E. coli (UPEC) and was the third E. coli genome to be sequenced (in 2002). It’s an example of extraintestinal pathogenic E. coli (ExPEC) and is associated with UTIs

8
Q

What are ExPEC and UPEC?

A

ExPEC strains can be harmless in the intestine but become pathogens when they invade the urinary tract, blood or CSF.

UPEC strains are responsible for 70-90% of the 7 million cases of acute cystitis and 250,000 cases of pyelonephritis reported annually in the United States

9
Q

What’s the CFT073 genome like?

A

It is 5.2 Mb, so similar in size to the O157:H7 genome, but its extra sequences relative to K-12 are not the same as those in O157:H7.

10
Q

What did the 3-way analysis of the 3 E. coli strains find?

A

Of the total non-redundant set of proteins encoded by any of the 3 genomes, only 2996 are encoded by all 3 genomes.

The total gene set across all 3 strains is 7638. Only 2996 are found in all 3, so less than 40% (about 39%) is conserved
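
The "less than 40%" figure can be checked with a line of arithmetic. This is only a worked calculation using the two numbers on the card; the variable names are made up for the example.

```python
shared_by_all_three = 2996   # proteins encoded by all 3 genomes (from the card)
total_non_redundant = 7638   # non-redundant gene set across the 3 strains (from the card)

conserved_fraction = shared_by_all_three / total_non_redundant
print(f"{conserved_fraction:.1%}")  # prints 39.2%
```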

11
Q

What does core genome mean?

A

Genes conserved across all strains of a species

12
Q

What does dispensable/accessory genome mean?

A

Genes in a genome which are absent from at least one other member of the species (i.e. not conserved across all strains)

13
Q

What does pan genome mean?

A

The total set of (non-redundant) genes present in any strain of the species
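
The core, accessory and pan genome definitions map naturally onto set operations. The sketch below assumes each strain is represented as a set of gene identifiers; the strain and gene names are hypothetical toy data, not real annotations.

```python
# Toy gene content per strain (hypothetical names).
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6"},
}

# Core genome: genes present in every strain.
core = set.intersection(*strains.values())

# Pan genome: non-redundant genes present in any strain.
pan = set.union(*strains.values())

# Accessory genome: genes missing from at least one strain.
accessory = pan - core

print(sorted(core))       # prints ['g1', 'g2']
print(len(pan))           # prints 6
print(sorted(accessory))  # prints ['g3', 'g4', 'g5', 'g6']
```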

14
Q

How big was the S. agalactiae core genome?

A

Estimated at about 1800 genes, representing about 80% of each individual genome

15
Q

How big is the E. coli core genome and how do we estimate it?

A

Around 2200 genes

Estimate the size of the core genome by randomising the order of the genomes and looking at how the size of the core genome reduces as additional genomes are added. This is repeated many times and the median size of the core genome is calculated at each step
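
The randomise-and-take-the-median procedure described above can be sketched as follows. This is a minimal illustration on toy data, assuming each genome is a set of gene identifiers; `core_size_curve`, its parameters and the gene names are hypothetical, and real analyses also need a step to decide which genes count as the same gene.

```python
import random
from statistics import median


def core_size_curve(genomes, n_permutations=100, seed=0):
    """Median core genome size after 1, 2, ..., N genomes over random orderings."""
    rng = random.Random(seed)
    sizes_at_step = [[] for _ in genomes]
    for _ in range(n_permutations):
        order = list(genomes)
        rng.shuffle(order)      # randomise the order in which genomes are added
        core = set(order[0])    # copy, so the input sets are never modified
        for step, genome in enumerate(order):
            core &= genome      # keep only genes present in every genome so far
            sizes_at_step[step].append(len(core))
    return [median(sizes) for sizes in sizes_at_step]


toy_genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g5"},
    {"g1", "g2", "g3", "g6"},
]
curve = core_size_curve(toy_genomes)
print(curve)  # the last value is the core size for all genomes together
```

The curve can only stay level or fall as genomes are added, and the value it levels off at is the core genome estimate.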

16
Q

How big is the E. coli pangenome and how do we estimate it?

A

Effectively infinite

Estimate in a similar way as the core genome. However, the trend line does not plateau. It approaches a straight line sloped upward because E. coli has an open pangenome

17
Q

What is an open pangenome?

A

One that is effectively infinite in size, because new genes keep being found as more genomes are sequenced

18
Q

Which species has a closed pangenome?

A

Yersinia pestis

19
Q

How do you estimate the open or closed nature of a pangenome?

A

Estimate how many new genes are discovered with each additional genome sequenced. For E. coli this plateaus at a non-zero value of around 300 genes, so you can continue to sequence even large numbers of E. coli genomes and will keep on identifying new genes indefinitely (an open pangenome). In a closed pangenome the number of new genes per genome falls towards zero
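
Counting the new genes contributed by each added genome can be sketched like this. It is a toy example assuming genomes are sets of gene identifiers added in one fixed order; in practice the order would be randomised and averaged, as for the core genome, and `new_genes_per_genome` is a hypothetical helper name.

```python
def new_genes_per_genome(genomes):
    """Number of previously unseen genes contributed by each genome, in order."""
    seen = set()
    counts = []
    for genome in genomes:
        new = genome - seen     # genes not found in any earlier genome
        counts.append(len(new))
        seen |= new
    return counts


toy_genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g5"},
    {"g1", "g2", "g3", "g6"},
]
print(new_genes_per_genome(toy_genomes))  # prints [4, 1, 1]
```

If the counts settle at a value above zero as more genomes are added, the pangenome looks open; if they fall to zero, it looks closed.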

20
Q

What does Illumina sequencing involve?

A

Similar to Sanger sequencing, but rather than sequencing a single molecule at a time, Illumina can sequence millions of molecules simultaneously (massively parallel sequencing)

Uses fluorescently labelled nucleotides. We need to amplify each individual fragment of the genome by a PCR-like reaction known as bridge amplification, which produces a cluster of identical copies

21
Q

How do we determine the sequence of each cluster in Illumina sequencing?

A

By synthesising a complementary strand using fluorescently labelled nucleotides and recording which base is added at each cycle (sequencing by synthesis)

22
Q

What are the outcomes of Illumina sequencing?

A

It generates short sequence reads. These can be assembled into contigs, but the assembly still requires finishing.

23
Q

What does Illumina sequencing not involve?

A

A cloning step, so gaps are not caused by regions that fail to clone. Instead, most assembly gaps are due to repetitive regions where the assembly is ambiguous

24
Q

What is a disadvantage of Illumina sequencing and how is it overcome?

A

Finishing is much more expensive than generating a draft so most genomes are left at the draft stage.

Contigs can be placed into order by comparison with a closely related complete genome

25
Q

What’s the average protein-coding content of a bacterial genome?

A

88% for the 2671 finished genomes in GenBank.

26
Q

What are the largest and smallest bacterial genomes?

A

The largest is Sorangium cellulosum at 14,782,125 bp. The smallest is a Candidatus Nasuia deltocephalinicola strain, which encodes 137 proteins and is 112,091 bp in length

27
Q

What do genomes of bacteria from complex environmental habitats tend to have?

A

A larger size and a greater GC content than host-associated bacteria. Most of these bacteria are mesophiles but there are growing numbers of extremophiles, such as thermotolerant bacteria

28
Q

What is a disadvantage of Sanger sequencing?

A

Finishing draft genomes was more labour intensive and required a separate production line to be efficient

29
Q

What did the increase in high-throughput “next generation sequencing” allow?

A

The cost of producing raw sequence data declined to the point that it currently costs less than $1 to generate a draft bacterial genome. This made sequencing bacterial genomes cost-effective and obligatory for any research team

30
Q

What did NGS produce?

A

Shorter reads than Sanger sequencing. This changed the cost ratio between a draft and a complete sequence, so finishing a genome was no longer as cost-effective

31
Q

What does single molecule sequencing produce?

A

Platforms such as PacBio and MinION produce longer reads than NGS. They generate more sequence for less money and may eventually eliminate the concept of draft microbial genomes

32
Q

What could the size range of E. coli genomes be due to?

A

The large number of sequenced strains available. However, less frequently sequenced species can also vary by more than a megabase, such as Haemophilus influenzae strains HK1212 and F3047

33
Q

What do all bacterial genomes have at least one copy of?

A

23S, 16S and 5S rRNA genes. These typically exist as an operon with a conserved structure of the 16S gene followed by one or more transfer RNA genes, then the 23S and 5S genes

34
Q

What size range do transposable elements span?

A

1-52 kb. They include several families of insertion sequences and integrative and conjugative elements

35
Q

What is the CRISPR-Cas system?

A

Bacterial defence systems include general stress responses, systems that provide a type of immunity, and those that are pathogenic to the host. CRISPR-Cas is one that provides a type of immunity. 40% of bacteria have a CRISPR-Cas system that allows them to fend off viral attacks

36
Q

What does HGT play a role in (regarding defence islands)?

A

It plays a role in the maintenance and evolution of these defence islands (on average 5.7 genes)

37
Q

What’s the first step in metagenomics?

A

The collection and processing of environmental samples such as water and soil.