VIROMICS Flashcards

1
Q

define virome

A

a virome is the total viral genome content (DNA and RNA) of a sampled environment, such as an entire ocean or the human body

2
Q

define viromics

A

the study of viromes, i.e. how viral genomes interact with their environment

3
Q

difference between the genomes of prokaryotes and eukaryotes

A

prokaryotes carry the 16S rRNA gene, whereas eukaryotes carry the 18S rRNA gene

4
Q

what is the most successful organism in the biosphere

A

the phage

5
Q

what are viruses responsible for in seawater

A
  1. high rate of mortality
  2. changes in geochemical cycles
  3. contribution to carbon due to killing of 80% of single celled organisms
  4. large contribution to global carbon cycle.
6
Q

how many viruses exist in the ocean

A

4x10^30

7
Q

2 methods used to identify marine viruses

A
  1. by counting of viral particles using:
    - flow cytometry
    - transmission electron microscopy
    - epifluorescence microscopy
  2. by metagenomic analysis:
    - metagenomics of fractionated water
    - total DNA sequencing of seawater
8
Q

eukaryotic viruses in seawater

A
  • many stressors were found to trigger the production of herpes-like viruses in the coral Porites compressa
    - eukaryotic viruses threaten more than a third of coral species
9
Q

what are the stressors that can cause herpes-like viruses in coral

A

eutrophication, decreasing pH and thermal stress

10
Q

method used to identify eukaryotic viruses in seawater

A
  1. purify virus-sized particles
  2. extract the DNA
  3. do PCR for 16S and 18S rDNA to make sure there is no cellular DNA present
  4. amplify the DNA
  5. send the DNA for pyrosequencing
11
Q

what did case study 1 study

A

viruses in the Indian flying fox bat

12
Q

what was the rationale for case study 1

A
  1. most emerging infectious diseases that infect humans are viral and of zoonotic origin
  2. the rate of emerging infectious diseases has increased, which points to a large unknown virodiversity in wildlife still waiting to be discovered
  3. measuring virodiversity is difficult
13
Q

why is measuring virodiversity difficult

A
  1. high number of host species
  2. their large geographical distribution
  3. their often remote habitats
  4. sampling, collecting, sequencing is expensive
    - so no surprise that complete virodiversity has not been discovered for even a single species
14
Q

the method for case study 1

A
  1. they took urine, faecal and saliva swabs from Indian flying fox bats
  2. immediately stored the samples in lysis buffer at -80 °C
  3. then extracted the nucleic acid and did cDNA synthesis
  4. did PCR assays on the cDNA targeting 9 specific viral families
  5. cloned the PCR products into a PCR cloning vector
  6. then sequenced 12 white colonies per PCR product using standard M13 primers
  7. trace sequences were then analysed and edited, and a phylogenetic tree was generated
  8. sequence identity could then be determined using MEGA5
15
Q

Results from case study 1

A

they found 55 viruses from the 9 viral families in the samples
- interesting to note that 4 of the viruses they found were coronaviruses

16
Q

conclusions from case study 1

A
  • they had previously estimated that the viral carrying capacity of the Indian flying fox was 58 viruses, but only found 55 here, so the other 3 viruses must be extremely rare ones
17
Q

what was the purpose of case study 2

A

to identify the novel coronavirus (SARS-CoV-2) causing pneumonia in humans

18
Q

rationale for case study 2

A
  • coronaviruses infect both mammals and birds
  • of the 6 coronaviruses previously known to infect humans, all are zoonotic and mainly from bats
    - in 2003 SARS-CoV caused an outbreak, and since 2012 MERS-CoV has caused hundreds of deaths
    - this raised public concern about the potential emergence of a novel zoonotic CoV strain
19
Q

the method for case study 2: part A

A
  1. took 200 µl of BAL (bronchoalveolar lavage) fluid from 5 patients
  2. then extracted the nucleic acid in BSL-3 labs and tested it by RT-PCR
  3. DNA was fragmented using transpososome
  4. adapters were added to the fragments so that they can bind to the flow cell for Illumina sequencing
  5. after Illumina sequencing, all host sequences and all ribosomal reads are removed
  6. the clean reads then undergo taxonomic assignment against archaea, bacteria, viruses, fungi etc. using Kraken 2
  7. this identified a virus as the cause of the pneumonia; the reads were compared to viral databases to determine which virus
  8. confirmed using Sanger sequencing and PCR
  9. throughout this experiment a negative control sample from a healthy person was also processed as a contamination control; it was run in parallel with the patient samples
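The host-removal and taxonomic-assignment steps above can be pictured with a toy k-mer classifier. This is only a sketch of the basic idea of matching read k-mers against labelled references: Kraken 2 itself uses minimizers and lowest-common-ancestor mapping, and the reference sequences, reads and function names below are all made up for illustration.

```python
from collections import Counter

def kmers(sequence, k):
    """All distinct substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def build_index(references, k):
    """Map each k-mer to the set of labels whose reference contains it."""
    index = {}
    for label, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(label)
    return index

def classify(read, index, k):
    """Assign the label that shares the most unambiguous k-mers with the read."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, set())
        if len(labels) == 1:  # ignore k-mers shared between references
            votes.update(labels)
    return votes.most_common(1)[0][0] if votes else "unclassified"

# Hypothetical toy references and reads, not real genomes.
references = {"host": "ACGTACGTTTGACCA", "virus": "GGCATTCAGGCTAAC"}
index = build_index(references, k=5)
reads = ["CGTACGTTTG", "CATTCAGGCT", "TTTTTTTTTT"]

# Step 5 on the card: drop host reads before looking for the pathogen.
clean_reads = [read for read in reads if classify(read, index, 5) != "host"]
print([classify(read, index, 5) for read in reads])
print(clean_reads)
```

Reads that match nothing stay "unclassified", which is the pool a novel virus would initially fall into.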
20
Q

the method for case study 2: part B

A

next they had to do the virology step, i.e. which coronavirus are we dealing with
  1. they took the BAL samples and inoculated them onto Vero cell lines
  2. then studied the CPE (cytopathic effect) over 7 days
  3. took the supernatant of the cells that did show CPE, mixed it with paraformaldehyde, then dried and stained it on grids
  4. studied the morphology of the cells
  5. then stained slides with serum from healthy and convalescent individuals and added a fluorescent secondary antibody (goat anti-human IgG) to see which cells were infected

21
Q

What did case study 3 investigate

A

studied cervical DNA using NGS to identify more HPV types not identified using commercial kits

22
Q

rationale for case study 3

A
  1. they have the genomes of 102 HPV types
  2. commercial kits only detect 37 types
  3. HPV types 16 and 18 are the most important oncogenic HPV types
  4. high regional variation in the cervical cancer types caused by HPV
  5. HPV/HIV co-infections are highest in SA women, and women with HIV are more inclined to develop HPV
23
Q

why use NGS over commercial kits

A
  1. commercial kits are PCR based and rely on prior knowledge
  2. so they don't pick up novel or rare HPV types
24
method of case study 3 overview
  1. they took cervical swabs from 109 women at an ARV clinic using the Digene cervical sampler
  2. they stored these in Digene transport medium
  3. they then extracted the DNA and ran the Roche Linear Array HPV genotyping test
  4. this test only detects 37 types
  5. found 97/109 women were positive and one sample was positive for 12 HPV types
  6. they took this sample and did Illumina sequencing on it to find any more types
25
method for case study 3
  1. the extracted DNA is enriched using RCA (rolling circle amplification)
  2. in this process random hexamers bind the circular DNA, introducing multiple replication forks
  3. phi29 polymerase then displaces the non-template strand so it can be bound by more primers
  4. the products of the RCA are double-stranded tandem copies of the starting circle
  5. the products are then fragmented, purified and amplified using Illumina cluster generation kit V2
  6. the amplified DNA then undergoes single-read sequencing using kit V3
  7. the reads can now be aligned and searched
  8. any HPV types in the reads that were not detected by the Roche Linear Array test are PCR amplified to see their prevalence
26
results for case study 3
  1. the new reads were HPV types 30, 74, 86 and 90, with prevalences between 8 and 15%
  2. the high prevalence of types 30 and 74 should warrant their inclusion in future genotyping tests
27
conclusions for case study 3: bypassing the problems associated with PCR
  1. found that using RCA and NGS bypasses the problems associated with PCR:
    - needing prior knowledge
    - false positives due to cross-reactivity
    - false negatives due to low viral load
28
what was the purpose of case study 4
to investigate the origin of SARS-CoV-2, which was still unknown
29
rationale for case study 4
  1. as of late 2021 the origin of SARS-CoV-2 was still unknown
  2. essential to ascertain the coronavirus diversity in bats living in SE Asia
  3. the known bat SARS-CoV-2-like viruses poorly infect human cells and cannot bind hACE2 to enter them
  4. none of them have the furin cleavage site associated with increased pathogenicity in humans
  5. needed to find a bat virus with an RBD genetically close to that of SARS-CoV-2 that can bind hACE2 successfully
30
overview of the methods for case study 4
  1. they screened 645 bats from 6 different families and took faecal, urine, saliva and blood samples
  2. looked at the faecal samples first as they have the highest viral load
  3. the experiment had to be done in a way that prevented potential exposure of the public, caused little harm to the bats and used appropriate PPE
31
method step 1: case study 4
  1. first they took the 539 faecal samples, extracted the RNA, generated cDNA and did a pan-coronavirus nested RT-PCR on the cDNA; this RT-PCR targeted the RdRp gene found in RNA viruses and used degenerate and non-degenerate consensus primers
  2. this makes it highly sensitive for coronaviruses
  3. the RT-PCR products were then sent for Sanger sequencing
  4. the sequences were checked with blastn to confirm they were coronaviruses
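A degenerate primer, as used in the pan-coronavirus RT-PCR above, is shorthand for a pool of related primers: each IUPAC ambiguity code stands for several possible bases. A minimal Python sketch, using a made-up primer (not one from the study) and a hypothetical helper name, expands such a primer into the concrete sequences it represents:

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer):
    """Return every concrete sequence a degenerate primer stands for."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical primer for illustration only: R and Y give 2 options each, N gives 4.
variants = expand_degenerate("ACGTRYN")
print(len(variants))  # 2 * 2 * 4 combinations
print(variants[:3])
```

The more ambiguity codes a primer contains, the more viral variants it can anneal to, which is why the card describes the assay as highly sensitive.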
32
method step 2: case study 4
  1. next, knowing they were looking for a subgenus of betacoronavirus, they did betacoronavirus enrichment
  2. they downloaded 2000 betacoronavirus genomes and chose a representative 185 sequences for further analysis
  3. because betacoronavirus has many subgenera, they separately aligned each subgenus and generated 13-mer spiked primers for each cluster, giving a total of 416 spiked primers
33
method step 3: case study 4
  1. they could now go back to the faecal samples and do RT again, this time using the spiked primers, to find which subgenera are present
  2. RT generates cDNA, which is fragmented, and the fragments are loaded onto a flow cell for Illumina sequencing
  3. the reads undergo taxonomic assignment, which identified sarbecoviruses
34
method step 4: case study 4
  1. then they could look at recombination events occurring through the evolutionary history of sarbecoviruses, and they chose 36 sequences that were phylogenetically closest to SARS-CoV-2
  2. used these sequences to generate an S-pseudotyped lentivirus
  3. inoculated HEK293T human cells with this lentivirus and collected pseudotyped particles 48 hrs later
  4. then took HEK293T cells stably expressing hACE2, infected them with the S-pseudotyped lentivirus and measured luminescence to see if the cells were successfully entered
35
method step 5: case study 4
1. did neutralisation assay to see if serum from healthy people could neutralise the bat virus
36
method step 6: case study 4
  1. did virus isolation
  2. they inoculated the rectal swabs onto a Vero cell line and studied them for CPE
  3. collected supernatant from the cells showing CPE
  4. took the RNA from the supernatant and did RT-PCR targeting a conserved sequence in the E gene
37
case study 4: results
  1. they found a sarbecovirus genetically close to SARS-CoV-2 whose RBD differs by only half a contact residue
  2. it successfully binds hACE2
  3. even though it doesn't have the furin cleavage site, it is likely that it contributed to the origin of SARS-CoV-2
  4. from the pan-coronavirus nested RT-PCR they found 24 bats were positive for coronaviruses
  5. the blastn search covered all types of coronaviruses
  6. the 7 sarbecovirus samples all came from Rhinolophus bats living in the same district in Laos
  7. the S1 domain of the spike protein showed low conservation in several bat coronaviruses, suggesting that this domain reflects a degree of adaptation of the virus to its host
38
conclusions: case study 4
  1. they had found a virus with the same potential for infecting humans as early strains of SARS-CoV-2, even though it had no furin cleavage site
  2. people living near caves are most at risk
39
what did case study 5 examine
profiled the vaginal microbiome of HIV positive women using massively parallel semiconductor sequencing
40
why is MPS better than NGS
  1. greater sequence read length
  2. faster turnaround time from sample prep to sequence generation
  3. lower cost
41
method CS5
  1. they took 20 of the cervical swabs from case study 3 and extracted the DNA
  2. then did the Roche Linear Array test to find the 37 HPV types
  3. enriched the DNA using RCA, but this time with primers more specific for HPV than the random hexamer primers
  4. then fragmented the DNA for sequencing with Ion Proton technology
  5. the reads were then mapped against the human reference genome and 143 known HPV types
  6. any reads not mapping to either underwent de novo analysis
  7. ran a blastx search to find the proteins of these new HPV types, then added the sequences of the new types to the database
  8. found the abundance of each HPV type by measuring the median read coverage over the HPV references
  9. then used SNP information to find the variance between sequences: for 2 sequences of the same HPV type to be considered variants they needed a distance of 30 SNPs
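The last two steps, median coverage as an abundance measure and a SNP distance cut-off for calling variants, can be sketched in a few lines of Python. This is a simplified illustration, not the study's pipeline: the function names and toy data are invented, alignments are reduced to (start, end) intervals, and "a distance of 30 SNPs" is assumed to mean at least 30 mismatches between aligned sequences.

```python
from statistics import median

def median_coverage(ref_length, alignments):
    """Median per-base read depth over a reference.

    alignments: list of (start, end) read positions, 0-based, end exclusive.
    """
    depth = [0] * ref_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return median(depth)

def snp_distance(seq_a, seq_b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def are_variants(seq_a, seq_b, min_snps=30):
    """Assumed reading of the card's rule: at least min_snps differences."""
    return snp_distance(seq_a, seq_b) >= min_snps

# Toy data, not from the study.
reads = [(0, 6), (2, 8), (4, 10)]
print(median_coverage(10, reads))
print(snp_distance("ACGTACGT", "ACGAACGA"))
```

Using the median rather than the mean keeps a few very deeply covered positions from inflating the abundance estimate.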
42
case study 5: finding co-infections
  1. they then had to find all non-HPV infections by doing a de novo assembly of all reads not mapping to any HPV type or to the human reference genome
  2. some sequences didn't map to any reference sequences available
43
case study 5: results
  1. they found 2 novel HPV types
  2. found co-infections of torque teno virus, JC virus and SEN-V
  3. found bacterial co-infections
  4. found 40 HPV types in the 20 women
    - the fact that 2 novel HPVs were found in only 20 women indicates the diversity of HPVs yet to be discovered
44
how did they find disease causing genes in the pre genomic era
used pedigree diagrams. these study familial disease inheritance. need to have greater than 3 generations
45
how did they find disease causing genes in the post genomic era
they used GWAS, i.e. they screened 1000s of individuals for genetic variants and calculated statistical estimates of the level of increased risk associated with each variant
46
what insights did they gain from the 1st human genome
  1. there are 3 billion base pairs in the haploid genome and 6 billion in the diploid genome
  2. 18 million SNPs overall and 3.3 million in each individual
  3. 25000 genes overall
  4. the first genome came from 13 individuals so it had lots of gaps, but it did indicate the high degree of allelic variation amongst members of the same species
47
what are the 4 types of genetic variants and how often do they occur
  1. SNPs: every 1 kb, 18 million
  2. indels: every 10 kb, 200000
  3. SSRs: every 30 kb, 100000
  4. CNVs: every 1 Mb, 8600
48
More about SNPs
  1. single nucleotide substitution
    - more than 1% of the population must have the alternate nucleotide for it to be considered a SNP
    - you can be heterozygous or homozygous for a SNP; take the top 2 nucleotides
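The 1% rule can be checked directly from genotype data. The sketch below is an illustration with an invented population and hypothetical function names; it takes the two most common alleles at a site (the "top 2 nucleotides") and tests whether the rarer one exceeds 1%.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes such as 'AG' or 'AA'."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_snp(genotypes, threshold=0.01):
    """True if the second most common allele is above the 1% threshold."""
    freqs = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return len(freqs) > 1 and freqs[1] > threshold

def zygosity(genotype):
    """Classify one person's genotype at the site."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"

# Invented population of 100 people (200 alleles) at one site.
population = ["AA"] * 95 + ["AG"] * 4 + ["GG"] * 1
print(allele_frequencies(population))
print(is_snp(population))
print(zygosity("AG"), zygosity("GG"))
```

Here G is carried on 6 of 200 chromosomes (3%), so the site counts as a SNP under the card's definition.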
49
how do you determine which SNP is the wild type SNP
compare the genotype to that of a chimp
  - the chimp has the wild type SNP and all other SNPs must have arisen after the 2 species diverged
50
why does genomics fail to represent diversity
  1. because most databases only consist of Europeans
  2. even though Africans have the highest diversity, they are the least represented in databases
51
how does one go about genotyping SNPs
  - use a microarray
  - this method has SNP loci on the array
  - you generate oligonucleotides from your genome of interest and hybridize them to the array
  - then fragment your DNA and wash it over the array
  - the fragments that are complementary to your oligos will hybridize to the array
  - DNA polymerase then adds a fluorescently labelled nucleotide complementary to the SNP, because the oligo ends just before the SNP on the DNA fragment
  - this happens simultaneously at multiple loci
  - remember you have 2 copies of each chromosome, so in this way you can determine whether you are homozygous or heterozygous at that locus
52
what disease are SSRs linked to
Huntington's disease
  - this happens in the coding region of the HD gene
  - here there is a triplet repeat of CAG
  - more than 42 repeats gives a disease allele
  - fewer than 34 repeats gives a normal allele
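Counting the repeat is simple to sketch in Python. The example below uses synthetic sequences and hypothetical function names, and applies only the two thresholds given on the card; repeat counts from 34 to 42 are not described there, so the sketch labels them as such rather than guessing.

```python
import re

def longest_cag_run(sequence):
    """Number of repeats in the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_hd_allele(repeats):
    """Apply the card's thresholds: >42 disease, <34 normal."""
    if repeats > 42:
        return "disease allele"
    if repeats < 34:
        return "normal allele"
    return "not covered by the card"

# Synthetic sequences for illustration only.
normal = "ATG" + "CAG" * 20 + "CCGCCA"
expanded = "ATG" + "CAG" * 45 + "CCGCCA"

for name, seq in [("normal", normal), ("expanded", expanded)]:
    n = longest_cag_run(seq)
    print(name, n, classify_hd_allele(n))
```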
53
incomplete penetrance
this is when not all individuals with the disease genotype will express the disease phenotype
54
genetic heterogeneity
this is when different disease genotypes are responsible for the same disease phenotype in different families
55
polygenic determination
this is when mutations at more than one locus are responsible for the disease expression in 1 person
56
example of complex inheritance
breast cancer:
  - this is the inheritance of 2 unlinked disease genes (BRCA1 and BRCA2) that predispose women to breast cancer
  - all women have these 2 genes but only some have mutations in them
  - mutations in these genes show incomplete penetrance: only 66% of women with mutated BRCA1 will have breast cancer
  - the reason for this is that these 2 genes encode tumour suppressor proteins
57
what were the aims of the Kirov paper on SCZ
  1. to identify novel CNVs associated with SCZ
  2. to compare novel CNVs to inherited CNVs
  3. to compare SCZ CNVs to CNVs in other disorders
  4. to understand the pathophysiology of SCZ
58
what cohorts did they use in the Kirov study on SCZ
  1. parent-proband trios from Bulgaria: cases only
  2. parent-proband trios from Iceland: controls only; this group had a controlled gene pool because they are geographically isolated
  3. data from an autism case-control study
  4. data from publicly available databases
59
what is gene ontology
the study of gene function
  - done in this study to find the pathways in which the genes associated with SCZ are implicated
60
what 3 aspects does gene ontology look at in gene function
  1. molecular function, i.e. at protein level
  2. biological process
  3. cellular component
61
how does GSEA work
  - you start with the case and control groups
  - then you generate a list of the genes that have different genetic variants between the 2 groups
  - then you introduce candidate pathways and see which pathways are enriched
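"Enriched" here means that a pathway contains more of the listed genes than chance would predict. One simple way to put a number on that is a hypergeometric over-representation test, sketched below in Python. Note the assumptions: full GSEA works on a ranked gene list with a running-sum statistic, so this is a simpler stand-in for the idea on the card, and the function name and gene counts are invented for illustration.

```python
from math import comb

def enrichment_p_value(total_genes, pathway_genes, hit_genes, hits_in_pathway):
    """P(overlap >= hits_in_pathway) under the hypergeometric distribution.

    total_genes:     genes in the background
    pathway_genes:   genes annotated to the candidate pathway
    hit_genes:       genes on the case-vs-control list
    hits_in_pathway: listed genes that fall inside the pathway
    """
    upper = min(pathway_genes, hit_genes)
    numerator = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, hit_genes - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return numerator / comb(total_genes, hit_genes)

# Hypothetical numbers: 20000 background genes, a 664-gene pathway,
# 100 genes on the list, 10 of which sit in the pathway.
p = enrichment_p_value(20000, 664, 100, 10)
print(f"{p:.2e}")
```

With these made-up counts about 3 overlapping genes would be expected by chance, so 10 gives a small p-value and the pathway would be flagged as enriched.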
62
example of enriched pathway in SCZ
  - found 19 CNVs in the 664 genes of the PSD (post synaptic density) pathway; this means that some genes in this pathway were dysregulated, which leads to enrichment of this pathway and thus SCZ
63
define de novo
this is a genetic alteration occurring for the first time in 1 family member due to a mutation in a germ cell from either parent
64
results from the Kirov paper on SCZ
  1. found more rare or de novo CNVs in SCZ than in the control group
  2. found some SCZ CNVs were implicated in the pathogenicity of other diseases too
  3. found 34 de novo CNVs, i.e. CNVs that were not inherited
  4. of these 34, 8 were found at known SCZ loci and the rest at new SCZ loci
  5. found CNVs in genes implicated in the post synaptic density pathway in SCZ patients
  6. also found many genes involved in glutamatergic transmission were implicated in SCZ
65
what is a GWAS
  1. microarray based
  2. SNP loci on the array
  3. identifies SNPs differing between case and control groups
  4. needs 1000s of individuals' genomes
  5. studies 1000s of SNPs simultaneously
  6. does so with calculated statistical estimates of the increased risk associated with each SNP
  7. the GWAS looks for tag SNPs instead of looking at each SNP individually
    - because a small number of tag SNPs may have significant association with a trait
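For a single SNP, the "statistical estimate" in point 6 can be as simple as comparing allele counts between cases and controls. The sketch below is one common way to do that (a 2x2 chi-square test plus an odds ratio), written with only the standard library. The allele counts and function name are invented, and real GWAS software adds quality control and covariate adjustment on top.

```python
from math import erfc, sqrt

def allelic_test(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) and odds ratio for one SNP."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio

# Invented allele counts: 1000 cases and 1000 controls (2000 alleles each).
chi2, p, odds = allelic_test(600, 1400, 500, 1500)
print(f"chi2={chi2:.2f} p={p:.1e} odds ratio={odds:.2f}")
```

The odds ratio is the "increased risk" part: a value above 1 means the alternate allele is more common in cases than in controls.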
66
what is a haplotype
  1. a group of SNPs located on the same chromatid that are associated statistically, i.e. inherited together
  2. every person has a specific haplotype at each locus and this determines their traits
67
what is a tag SNP
these are single representative SNPs in a genomic region with high linkage disequilibrium
  - they represent a haplotype, i.e. a group of linked SNPs
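Linkage disequilibrium between two SNPs is commonly summarised as r², which is 1 when one SNP perfectly predicts the other and 0 when they are independent. A minimal Python sketch, with invented haplotype data and a hypothetical function name, shows why one SNP in a high-r² block can act as the tag for the rest:

```python
def r_squared(haplotypes):
    """LD r^2 between two biallelic SNPs.

    haplotypes: list of 2-character strings, one per chromosome, where
    '1' marks the alternate allele and '0' the reference allele.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n
    p_b = sum(h[1] == "1" for h in haplotypes) / n
    p_ab = sum(h == "11" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independence
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denominator

# Invented haplotypes: always inherited together vs. fully shuffled.
linked = ["11"] * 5 + ["00"] * 5
unlinked = ["11", "10", "01", "00"]
print(r_squared(linked), r_squared(unlinked))
```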
68
what are Manhattan plots
these plots show the chromosomes along the x axis and, for each SNP, the P value for its association with a trait, plotted as -log10(P) on the y axis. each dot is a SNP: the lower it lies, the higher its P value, which means it has little to no association with the trait being observed. the higher it sits, the lower its P value and the stronger the evidence that it is associated with the trait at hand.
  - peaks are seen at specific loci
  - the SNPs in a peak are associated with each other and show the same signal, i.e. a haplotype at that locus
  - this group of SNPs has a very high association with the trait, i.e. that genomic region has a high association with the trait
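The height of each dot is just a transformed P value. The sketch below computes the plot coordinates and picks out the SNPs above a significance line; the results are invented, the function names are hypothetical, and the 5e-8 genome-wide threshold is a widely used convention assumed here rather than something stated on the cards.

```python
from math import log10

GENOME_WIDE = 5e-8  # conventional threshold; an assumption, not from the cards

def manhattan_points(results):
    """Turn (chromosome, position, p_value) rows into plot coordinates."""
    return [(chrom, pos, -log10(p)) for chrom, pos, p in results]

def significant_hits(results, threshold=GENOME_WIDE):
    """SNPs whose P value falls below the threshold (dots above the line)."""
    return [(chrom, pos) for chrom, pos, p in results if p < threshold]

# Invented results for illustration.
results = [
    (1, 1_200_000, 0.3),
    (6, 31_500_000, 2e-12),
    (6, 31_520_000, 4e-9),
    (11, 500_000, 1e-3),
]
for chrom, pos, height in manhattan_points(results):
    print(chrom, pos, round(height, 2))
print(significant_hits(results))
```

The two neighbouring hits on chromosome 6 are the kind of cluster that shows up as a peak.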
69
how to deal with multiple testing
  1. make use of statistical correction
  2. make sure replication is successful
  3. make note of the false discovery rate
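Two standard corrections illustrate points 1 and 3: Bonferroni, which divides the significance level by the number of tests, and the Benjamini-Hochberg procedure, which controls the false discovery rate. The cards do not say which correction was meant, so these are examples rather than the course's method; the P values below are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only P values below alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]

def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        # Largest rank whose P value is under its own sliding threshold.
        if p_values[index] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for index in order[:max_rank]:
        rejected[index] = True
    return rejected

# Invented P values for five tests.
p_values = [0.001, 0.012, 0.02, 0.035, 0.6]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Bonferroni keeps only the first test here while Benjamini-Hochberg keeps the first four, which shows the trade-off between strictness and power.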
70
how to assess a GWAS publication
  1. quality control
  2. sample size needs to be large
  3. no confounders, i.e. other traits differing between the case and control groups
  4. biology, i.e. the data must support a functional hypothesis
  5. replication: the results need to be replicated in independent studies
71
what are QQ plots
these plots show expected P values on the x axis against observed P values on the y axis
  - if the SNPs skew away from the general trend line then some genomic regions have a stronger association with the trait than expected, so the genetic influence on the trait is higher than expected
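The expected values come from assuming no association at all, in which case P values are spread evenly between 0 and 1. A short Python sketch of how the points could be computed is below. It assumes the usual -log10 scale and the rank/(n+1) convention for expected quantiles, neither of which is stated on the card, and the P values and function name are invented.

```python
from math import log10

def qq_points(p_values):
    """Pair expected and observed -log10(P) values, smallest P first."""
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = rank / (n + 1)  # expected quantile if P values are uniform
        points.append((-log10(expected), -log10(p)))
    return points

# Invented P values: mostly unremarkable, with one strong signal.
for x, y in qq_points([1e-8, 0.2, 0.45, 0.7, 0.9]):
    print(round(x, 2), round(y, 2))
```

Points near the diagonal (x close to y) behave as expected by chance; the strong signal sits far above it.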