SAQ Flashcards

1
Q

when thinking about sequencing platforms what is normally the trade off between the different generations

A

trade off between producing lots of reads (short) or long reads but not many
read depth vs length of reads
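
The trade-off can be made concrete with the standard expected-coverage estimate (total sequenced bases divided by genome size). This is only a sketch: the read counts and read lengths below are illustrative assumptions, not platform specifications.

```python
def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


GENOME = 4_600_000  # approximate E. coli K12 genome size in bp

# Assumed runs with the same total yield: many short reads vs few long reads
short = expected_coverage(num_reads=1_000_000, read_length=150, genome_size=GENOME)
long_ = expected_coverage(num_reads=15_000, read_length=10_000, genome_size=GENOME)

print(f"short reads: {short:.1f}x from 1,000,000 reads")
print(f"long reads:  {long_:.1f}x from 15,000 reads")
```

Both runs give the same average depth, but the long-read run gets there with far fewer reads, which is the depth vs length trade-off in the answer.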

2
Q

what is a flow cell cluster in illumina

A

each cluster corresponds to a separate read.

has been amplified by bridge amplification

3
Q

what can be used to increase cluster density and how

A

patterned flow cells
flowcells with nanowells in a distinct pattern. Each nanowell contains DNA probes to capture DNA strands for amplification but the regions between wells do not contain probes and thus are free of clusters.
+ it reduces the problem of adjacent clusters overlapping
+ allows you to control the sizing of clusters
+ position of well is known so cluster can be easily identified
+ packed very densely so can get out more sequence data
- generates duplicated sequences

4
Q
Why does the quality of a read decrease over its length in illumina?
A

PHASING

  • illumina relies on a sequencing by synthesis approach in which errors can occur
  • in each cycle all 4 dNTPs (each carrying a fluorescent label and a reversible terminator) are washed over; one is incorporated and its terminator blocks further extension. The terminator is then removed so another dNTP can be incorporated in the next cycle
  • phasing occurs when this terminator is not successfully removed. The next nucleotide cannot bind, so from now on this DNA strand will be one base behind the rest of the cluster
  • Over time these errors accumulate and pollute the fluorescence signal
    • also prephasing - the terminator cap is defective so a strand can run ahead and incorporate 2 nucleotides in one cycle
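
The accumulation described above can be sketched with a deliberately simple model: each strand in a cluster has a small chance per cycle of lagging (phasing) or running ahead (prephasing), and strands that leave phase are assumed never to return. The per-cycle probabilities are made-up values for illustration, not measured Illumina rates.

```python
def in_phase_fraction(cycles: int, p_phase: float, p_prephase: float) -> float:
    """Fraction of strands in a cluster still reporting the correct base.

    Simplifying assumption: a strand that falls out of phase stays out of phase.
    """
    return (1.0 - p_phase - p_prephase) ** cycles


# Hypothetical per-cycle failure rates
P_PHASE = 0.002
P_PREPHASE = 0.001

for n in (1, 50, 100, 150, 300):
    frac = in_phase_fraction(n, P_PHASE, P_PREPHASE)
    print(f"cycle {n:3d}: {frac:.3f} of the cluster signal is in phase")
```

The in-phase fraction only ever falls, so the signal from later cycles is increasingly mixed with light from neighbouring bases.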
5
Q

2 colour illumina sequencing

A

also known as 2 channel sequencing by synthesis
generates data faster than 4 colour whilst maintaining quality and accuracy
- only 2 images per cycle are required
- CONS:
incorrect base calls because of phasing lead to a rising pollution of the light signals over time, making it more difficult to differentiate the bases and to interpret the base quality
No colour (the signal used to call G) could also mean that no base has been incorporated

6
Q

list the major properties of the E.coli K12 genome

A

most frequent strain in labs

  • E.coli is a commensal organism that survives in the lower intestine
  • K12 is lab adapted: able to survive in culture only under very specific conditions
  • K12 is unable to survive at all in gut
  • 4288 protein coding genes
  • regions of low GC content
7
Q

describe whole genome shotgun sequencing and how gaps in sequence can be informative of regions with potential biotechnological applications

A
  • shotgun sequencing requires shearing of DNA, selection of fragments of a specific size and insertion into a vector that is placed in bacteria. inserted fragments can then be sequenced using sanger with primers that anneal to the vector backbone. reads are then mapped onto a reference genome and assembled into contigs
  • Gaps in the genome following sequencing must mean that insertion of these sequences into a plasmid in bacteria leads to their death (toxic)
  • can identify these gaps - sequences/genes that are toxic to e.coli by interacting with the replication initiator DnaA
8
Q

what was found when the EHEC O157:H7 genome was compared to K12

A
  • genome was 1Mb bigger than K12
  • O islands - only present in O157 eg type III secretion system/ shiga toxin
  • K - islands - regions only present in K12
9
Q

what kind of e.coli is CFT073

A

UPEC - uropathogenic e.coli
associated with UTIs
harmless in intestines but become pathogens when they invade urinary tract, blood or CSF
genome similar size to O157 but the extra sequences relative to K12 are not the same as O157

10
Q

Define the terms “pangenome” and “core genome” and how they can be estimated

A

Core genome represents all the genes present in all the strains of a species. Typically estimated by comparing WGS of multiple genomes. As more genomes are compared this number decreases. Rasko paper estimated the E.coli core genome to be around 2200 genes which are mainly involved in metabolic processes
pangenome is the entire gene set of all the strains of a species including the core genome and the variable/accessory genome
estimated from a broad sample of the diverse pathogens that comprise this species. the pangenome size doesn't come to a plateau as more genomes are added, showing that the e.coli pangenome is effectively infinite (open) - suggests it must still be evolving
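
The two definitions map directly onto set operations: the core genome is the intersection of the gene sets and the pangenome is their union. A minimal sketch, assuming each genome has already been reduced to a set of gene family names; the strain names and gene names are hypothetical placeholders, not real annotations.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core genome, pangenome) for a collection of gene sets."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # genes present in every strain
    pan = set.union(*gene_sets)          # genes present in at least one strain
    return core, pan


# Hypothetical gene content, for illustration only
genomes = {
    "strain_A": {"core1", "core2", "core3", "a_only"},
    "strain_B": {"core1", "core2", "core3", "b_island1", "b_island2"},
    "strain_C": {"core1", "core2", "core3", "b_island1", "c_only"},
}

core, pan = core_and_pan(genomes)
print("core:", sorted(core))
print("pan: ", sorted(pan))

# Adding genomes one at a time: the core can only shrink, the pangenome can only grow
names = list(genomes)
for n in range(1, len(names) + 1):
    subset = {name: genomes[name] for name in names[:n]}
    c, p = core_and_pan(subset)
    print(f"{n} genome(s): core={len(c)} pan={len(p)}")
```

Plotting the pangenome size against the number of genomes added is how an open (no plateau) pangenome is recognised.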

11
Q

what was the estimate of the number of unique genes per e.coli genome sequenced

A

300 genes

12
Q

since advent of illumina more draft genome sequences have become available why arent they finished

A

bc finishing and annotation remains a laborious process

13
Q

what can you do when you have lots of genomes for a bacterial species

A

bacterial genome wide association study
eg study looking at campylobacter. Sequenced genomes of campylobacter from a whole range of different hosts (chickens, cows, birds)
looked at phylogeny between strains
not one evolutionary lineage is associated with one host
GWAS found genes from vitamin B5 synthesis pathway (PanBCD) are present in bacterial strains that infect cows but not chickens
- (isolates from cattle grew better, on average, in a low vitamin B5 environment than isolates from chickens)

** gene Cj0299 which encodes an enzyme giving resistance to beta lactam antibiotics found at highest frequency in cattle and was rarest in bird isolates **

14
Q

Explain how multiple genome sequences can provide an insight into genome evolution and horizontal gene transfer

A

By comparing multiple genomes of the same species a core genome can be derived. Different strains' accessory genomes can then be identified that give rise to their overall phenotype. Some of this accessory genome may have been acquired recently by horizontal gene transfer, in which case those regions of the genome would have abnormal GC content because they haven't gone through amelioration yet.
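
One simple way to look for regions with abnormal GC content is a sliding window compared against the genome-wide average. This is a toy sketch: the window size, step and tolerance are arbitrary assumptions, and the sequence is synthetic.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def windows_with_atypical_gc(seq: str, window: int, step: int, tolerance: float):
    """Return (start, GC) for windows whose GC differs from the whole-sequence GC."""
    genome_gc = gc_content(seq)
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if abs(gc - genome_gc) > tolerance:
            flagged.append((start, round(gc, 2)))
    return flagged


# Synthetic "genome": GC-rich background with an AT-rich insert in the middle
toy = "GCGCATGCGC" * 5 + "ATATATATAT" * 2 + "GCGCATGCGC" * 5

print(windows_with_atypical_gc(toy, window=20, step=10, tolerance=0.3))
```

Only the window sitting entirely inside the AT-rich insert is flagged; windows that straddle its edges are diluted by the background.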

15
Q

Describe how the development of new technologies has influenced our understanding of E.coli/Shigella diversity

A

before the advent of molecular biology: serotyping: based on the immune recognition of cell surface antigens - bacteria of the same serotype cross react to the same antibodies (doesnt correlate v well with similarity on genetic level)

hybridisation of different strains to see how similar they are at a molecular level - found that E.coli and shigella are comparable - shigella tended to be more diverse

MLEE - multilocus enzyme electrophoresis: characterises organisms by the electrophoretic mobility of their enzymes. - allowed construction of ECOR collection

comparison of gene sequences found that shigella is within the evolutionary diversity of e.coli and has arisen on multiple occasions from e.coli - some properties shared were examples of convergent evolution

MLST - multi locus sequence typing- sequence multiple genes and compare (usually housekeeping 400bp chunks)
- found lineages of e.coli have acquired the same virulence factors in parallel including a pathogenicity island involved in intestinal adhesion and phage-encoded Shiga toxins.
- Sequence 8 HK genes in 46 shigella strains representing each of the 4 serotypes
Shigella strains are well distributed within the diversity of E. coli
presence of three major clusters and five forms not closely related to any other suggests that the Shigella phenotype has arisen eight times

16
Q

what can be used to compare the diversity across species

A

16S rRNA sequencing

17
Q

how can 16S rRNA profiling be used to investigate microbial diversity

A

16s rRNA can be used to investigate microbial phylogeny due to the fact that all microbes have it bc of its importance in translation. evolves slowly due to its fundamental function in cell. Has variable regions that vary more and can be compared to build a picture of the phylogeny

  • design primers to conserved regions that span these variable regions, amplify up and sequence
  • can be used to determine the evolutionary relationships between strains/species of bacteria
18
Q

what are the caveats of 16s rRNA profiling

A
  1. primers may not be truly universal
  2. contamination may be an issue (link to paper about contamination in DNA extraction kits)
  3. sequencing errors can result in overestimation of diversity of organisms present
  4. some organisms have multiple copies of the 16s rRNA gene which vary in sequence (overestimation of diversity)
  5. PCR bias may result in incorrect quantification of species
19
Q

what was the first major application of the use of 16s rRNA to study the diversity of organisms

A

Carl woese 3 domain tree of life

20
Q

what can the microbial dark matter problem also be called

A

great plate count anomaly - observation that most of the microbes seen in the microscope cannot currently be grown

21
Q

what is metagenomics

A

study of genetic material recovered directly from environmental samples
Tells you what genes are encoded and by what bacteria in your sample

22
Q

what did craig venter do to double the size of GenBank

A

sample from sea, genome extracted, fragmented and sequenced using sanger. Revealed some “dark matter”

23
Q

how can metagenomic techniques be used to study the human microbiome

A

swabs can be taken from different individuals and sequenced for different areas of the body
Revealed link between health and bacteria in body
Obesity: reduced ratio of Bacteroidetes to firmicutes

24
Q

what is an example of when metagenomic sequencing results can be misinterpreted

A

went round and collected swabs from New York and sequenced using shotgun sequencing. claimed plague was present when in fact there were no reads mapped to the ymt gene (toxin) - in the first paper they claimed it was present but that was a mistake; they were actually looking for related plasmids
when sequencing one strain - they found the most closely related in the database was anthrax so concluded anthrax was present - it wasn't (no evidence of the plcR SNP - a defining feature of anthrax)

25
Q

what is single cell genomics (exploring unculturable microorganisms)

A

the amplification and sequencing of DNA from single cells obtained directly from environmental samples
single cells isolated: FACS, laser microdissection, optical tweezers, micropipetting
PCR amplification and sequencing
• Amplification is challenging and the assembled genomes will often have patchy coverage

26
Q

what is iChip

A

a method of culturing previously unculturable bacteria
environmental sample eg soil is diluted in molten agar and nutrients until there is one cell per well
the chip plate is then placed back in the soil to access nutrients unavailable in the lab
50-60% of species are able to survive

Teixobactin antibiotic discovered using iChip in 2015
Anticancer agents, anti-inflammatories and immunosuppressives also discovered

27
Q

the main points of the 2011 E.coli outbreak

A

mainly young women affected
Haemolytic uraemic syndrome
found to be caused by the unusual serotype O104:H4
spanish cucumbers wrongly identified as the cause - problem with a public case/sharing
BGI sequencing showed that the strain had EAEC properties and closely resembled the 55989 strain found in a HIV patient in africa
- all sequence data published online (crowdsourcing) to construct phylogeny - outbreak closer to EAEC
- Recognition of outbreak strains was hampered by the inappropriate use of diagnostic tests focused on O157:H7

After outbreak
PacBio sequencing showed that the strain had evolved from EAEC and acquired EHEC like properties
- PCR confirmed shiga toxin
- plasmid bearing beta lactamase gene
confirmed to be caused by beansprouts grown in lower saxony from egyptian fenugreek seeds
- separate smaller outbreak in france where these seed were also used

28
Q

what did O104:H4 not have showing it isnt a EHEC

A

Type III secretion system

29
Q

main points from salmonella outbreak

A

isolates from 16 patients sequenced on MiSeq
- found they were all part of same outbreak strain
- availability of definitive typing data so early on enabled identification of transmission between hospital wards and action to be taken
- can be serotyped in 40min and determined to be part of outbreak in 2h
outbreak strain found on door seal of a food trolley

30
Q

what was used to sequence ebola in africa

A

MinION

31
Q

main points from ebola case study

A

142 ebola virus samples sequenced in real time in Guinea using MinION
combined with 603 sequences from other studies to create phylogenetic tree
- allowed people to be quarantined
- found transmission across border of guinea by integrating data with another team in sierra leone
- Within 48 hours the new sequence could be added to the phylogenetic tree. Could isolate individuals and prevent further spread of the outbreak, infer where an individual had got the virus - narrow down the transmission chain.

32
Q

what are the challenges of sequencing in the field

A

Power supply, head torches, communication was a large problem.
Had to use uninterruptible power supplies (UPS)
Internet was a challenge- responding to emails impossible and the upload of reads for bioinformatic analysis in the UK a daily challenge.

33
Q

coronavirus main points

A

one month after the first case the full genome was published: 82 per cent similar to SARS but also 90 per cent similar to a bat coronavirus

  • helps with the design of diagnostic kits
  • very little genetic variation between the first 10 patient samples sequenced (even though RNA viruses have a v high mutation rate) - a sign the virus recently jumped from animals to humans. suggests one transmission - questionnaires point towards a meat market
  • clusters near bat virus suggesting it originated in bats and was transmitted to humans
34
Q

listeria outbreak key points

A

19 people infected in 9 states whole genome sequenced
Showed they were genetically related ie 1 common source
Epidemiologic and laboratory evidence indicated that packaged salads were the cause

35
Q

what signatures do active enhancers usually have

A

nucleosome free and the regions flanking them have characteristic post translational modifications

36
Q

how can enhancers in the genome be identified

A

DNase hypersensitivity assay

cant chop DNA where there are histones – lots of cutting in nucleosome depleted regions

37
Q

main points of encode paper

A

used biochemical definition of function
analysed different cell lines (majority ES or cancer)
looked for 5 signatures:
RNA expression (RNA-seq), DNA/protein interactions (TF ChIP-Seq), chromatin accessibility (DNase hypersensitivity), 3D structure, methylation (RRBS)
- 3 different tiers depending on which assays they did
- found 80% to perform some reproducible biochemical function and defined this as functional

38
Q

3 definitions of function

A

causal role - a sequence has a function if that sequence causes F to happen (heart adding weight to body)
Selected effect - a sequence has a function if the sequence exists because of the function (heart pumping blood)
genetic function: a sequence has a function if the sequence is required for the function and deleting the sequence affects the function

39
Q

what are the problems that you come across when working out how much of the genome is under selection

A
  1. how do you know what is expected (and therefore what goes against this)
  2. if a distant species is used you only find function conserved across time (best estimate is to use indels)
  3. if species that are too close are used there are not enough mutations to find conserved regions and areas susceptible to mutation
40
Q

by using indels and comparing the percentage of the genome that has fewer indels than expected how much of the human and mouse genome are conserved (under purifying selection)

A

3%

41
Q

by using indels and comparing the percentage of the genome that has fewer indels than expected how much of the human genome is under purifying selection

A

7.1-9.2%

42
Q

how do you look for positive selection in the coding sequence

A

ratio of non synonymous to synonymous changes (dN/dS) - if over 1 then there must be some positive selection
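
The ratio can be written out as a small calculation. This sketch assumes the substitution and site counts have already been obtained from an alignment and applies no correction for multiple hits; the numbers in the example are invented.

```python
def dn_ds(nonsyn_subs: float, nonsyn_sites: float,
          syn_subs: float, syn_sites: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates (uncorrected)."""
    dn = nonsyn_subs / nonsyn_sites  # non-synonymous changes per non-synonymous site
    ds = syn_subs / syn_sites        # synonymous changes per synonymous site
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


def interpret(omega: float) -> str:
    """Conventional reading of the ratio."""
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral"


# Invented counts for a hypothetical gene
omega = dn_ds(nonsyn_subs=30, nonsyn_sites=700, syn_subs=10, syn_sites=300)
print(f"dN/dS = {omega:.2f} -> {interpret(omega)}")
```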

43
Q

how do you look for positive selection in the non coding sequence

A

intraspecies diversity compared to inter species diversity
use 1000 genomes 4%
using GnomAD expect a new estimate based on a larger population of humans

44
Q

what is compensatory evolution

A

when a mutation results in loss of fitness a new mutation can arise that raises the fitness back to one and thus compensates for the first mutation

45
Q

give an example of compensatory evolution

A

4 species of yeast with same amount of TF binding. Over evolutionary time you get more change in sequence than in binding energy - conservation of function without conservation of sequence

46
Q

what did the encode project conclude about the conservation of the encode elements

A

17-90% are under no constraint

47
Q

give some examples of function without conservation

A

genes for eye colour in northern europe
genes for alzheimers - happens after reproduction (no selection)
genes for rolling your tongue

48
Q

main points of compensatory evolution in the animal livers study

A

mostly where theres a promoter in one species there is in the other species - location is conserved
enhancer location is not conserved between species

49
Q

what does a transcription factor binding site NOT imply

A
transcription factor binding
enhancer state
contact with promoter
regulation of a promoter
phenotypic consequence for the cell
phenotypic consequence for the organism
51
Q

what main points did dan graur make

A

used the wrong definition of function inconsistently
adopt a strong version of the causal role definition
DNA segment may display a property without manifesting a function
100% of the genome replicates - 100% functional
they imply that a function can be maintained without selection which implies that no deleterious mutations can occur??
just because something is transcribed doesn't mean it's functional - pseudogenes/transcriptional noise/introns/mobile elements
-biased sensitivity over specificity
- any random DNA sequence of sufficient length will contain transcription factor binding sites
- number of CpGs is higher than number of protein coding genes - how can these all be functional - used some cancer cells where methylation is completely different to normal cells
C value paradox - if majority of genome is functional and this extends to other species - onions are 5x more complex

52
Q

what is dan graurs other point against encode that he made in a paper after encode in 2017

A

for population to maintain size an increase in fertility must compensate for reduction in fitness caused by deleterious mutations. the increase in fertility required depends on the amount of sites that are functional, the mutation rate and fraction of deleterious mutations.
find how much of genome is functional by taking the average amount of children per couple as 1.8
- 25 percent is the upper limit of functionality, with the actual number likely closer to 10-15 percent.

53
Q

what are the disadvantages of the biochemical definition of function

A

Very sensitive approach and can lead to overestimation of functionality
presence of a biochemical signature does not necessarily mean that the RNA is functional - additional tests need to be taken

54
Q

what are the disadvantages of the genetic definition of function

A

often misses effects that only occur in rare cells or specific environmental contexts
widespread redundancy means that double or triple knockouts may be needed to detect functionality
percentage of functional DNA is always lower than the actual percentage

55
Q

what are the disadvantages of the selected definition of function

A

sequence comparisons depend heavily on the method used to align sequences in different species
non conservation does not imply non functionality - orphan gene may be fully functional even though it occurs in only one species

56
Q

what is proteomics

A

the qualitative and quantitative comparison of proteomes under different conditions to further unravel biological processes

57
Q

what is a proteome

A

the sum of all the proteins in an organism, tissue, cell or simply the sample being studied

58
Q

why don’t mRNA sequences always correlate with the final protein sequence

A

post translational modifications - permanent and temporary

  • protein splicing
  • additions deletions eg glycosylation/phosphorylation
59
Q

list some proteomic analysis methods

A
1D PAGE
2D PAGE
LC (can be coupled to mass spec)
mass spec
protein microarrays
60
Q

3 steps to mass spec

A

ionisation
separation
detection

61
Q

what are the 2 main ways of generating ionised samples for mass spec

A

matrix assisted laser desorption/ionisation: analyte is mixed with matrix and fixed to a surface. an individual spot is heated with a laser beam. As energy is put into the system the analyte can gain more energy and become ionised. the potential difference across the chamber causes the ionised species to fly to the detector. Usually used coupled to time of flight and in line with 2D page

Electrospray ionisation
analyte is dissolved and forced through a narrow needle at high voltage. A fine spray of droplets enters chamber
in evaporation chamber droplets lose their solvent
eventually droplets become so small that they reach the rayleigh limit. charge repulsion then breaks the droplets apart and causes the ions to enter the gas phase
readily integrated with LC

62
Q

what are the main ways of separation in a mass spec (list)

A
time of flight
quadrupole
ion trap
fourier-transform ion cyclotron resonance
orbitrap
63
Q

explain how each of the separators work in MS

A

time of flight - exploits the fact that heavier ions will take longer to travel than lighter ones

quadrupole - set of 4 parallel metal rods that are electrically connected so a voltage is applied across them. when used in scanning mode can select ions with a specific m/z to reach detector by applying different RF

ion trap - has 2 sets of electrodes (ring electrode and 2 end cap electrodes) the voltage applied to the ring electrode determines which ions are trapped. ions above the threshold remain in the trap whilst others are ejected towards the detector

fourier transform ion cyclotron resonance (FT-ICR) - principle that ions in a magnetic field will orbit at a frequency that is related to their mass, charge and the strength of the magnetic field. As ions cycle between 2 electrodes, electrons are attracted to one plate and then the other with the same frequency. the detector can then measure the movement of electrons

Orbitrap - ions enter an electric field and rotate around the central electrode because their attraction to it is balanced by centrifugal force. they also undergo harmonic oscillations along the electrode whose frequency is inversely proportional to the square root of the m/z
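
The time of flight principle follows from energy conservation: an ion of charge z accelerated through a voltage V gains kinetic energy zeV = ½mv², so the flight time over a drift length L is t = L·sqrt(m / (2zeV)). A minimal sketch of that relationship; the accelerating voltage and tube length are assumed values, not those of any particular instrument.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
DALTON = 1.66053906660e-27  # atomic mass unit in kg


def flight_time(mass_da: float, charge: int,
                accel_voltage: float, tube_length_m: float) -> float:
    """Flight time in seconds for an ion in an idealised linear TOF analyser."""
    mass_kg = mass_da * DALTON
    # z*e*V = 1/2 * m * v^2  ->  v = sqrt(2*z*e*V / m)
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage / mass_kg)
    return tube_length_m / velocity


# Assumed instrument settings: 20 kV acceleration, 1 m flight tube
for mass in (1000.0, 2000.0, 4000.0):
    t = flight_time(mass, charge=1, accel_voltage=20_000, tube_length_m=1.0)
    print(f"{mass:6.0f} Da, 1+: {t * 1e6:.2f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion four times heavier at the same charge takes twice as long to reach the detector.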

64
Q

what are the major steps to analysing a protein by MS/MS

A
  1. digest the protein into peptides (trypsin or other)
  2. separate peptides eg LC
  3. ionise molecules
  4. measure mass of peptides in MS1
  5. fragment peptides by collision induced dissociation (CID)
  6. measure mass of fragments from peptides
  7. use fragment mass data to determine the sequence of the peptide
65
Q

what are the major problems with protein identification from databases (6)

A
  1. post translational modifications might mislead identification (during analysis look for shifts away from the value you're expecting to see)
  2. protein degradation - a fragment created in handling may have a chance match to a known protein
  3. incomplete databases
  4. relative concentrations from signal intensities can be misleading if a particular ion is suppressed
  5. protein may exist as more than one polymorphic variant and only one may be in the database
  6. non specific cleavage
66
Q

main points of why fragment proteins for mass spec

A

aids mass identification
multiple matches give confidence to assignments - algorithms can be used to carry out virtual digests of every protein in the database and these can be matched to the experimental values
one caveat could be that you need to know the protein youre working on before but its not a problem as the human genome has been sequenced
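
A virtual digest can be sketched with the commonly used trypsin rule: cleave after K or R unless the next residue is P. The protein sequence below is a made-up example, and missed cleavages are ignored.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """In silico trypsin digest: cut C-terminal to K or R, but not before P."""
    # Zero-width split: the position follows K/R and is not followed by P
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


# Hypothetical sequence, for illustration only
peptides = tryptic_digest("AAKRPGGRCC")
print(peptides)
```

In a database search the same digest would be run over every protein entry, and the predicted peptide masses compared with the measured ones.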

67
Q

what are the 3 variations of MS/MS

A

quadrupole - collision cell - quadrupole
quadrupole - collision cell - TOF
quadrupole - collision cell - linear ion trap

68
Q

what are the simple stages to MS/MS

A

parent protein digested with trypsin
samples are ionised to generate a mixture of ions
ions of a specific m/z are selected
these ions undergo collision induced dissociation (CID) to generate product ions that are detected

69
Q

MS/MS can be used to derive protein sequences de novo, how?

A

CID favours breakage at the peptide bond - produces a series of fragments known as b and y ions. b ions extend from the amino terminus and y ions extend from the carboxyl terminus. by observing the changes in mass between successive b ions you can begin to work out the amino acid sequence
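
Reading a sequence from the b ion ladder can be sketched as follows: each step between consecutive b ions equals the residue mass of one amino acid. The sketch assumes a complete, noise-free series of singly charged b ions and uses only a small subset of monoisotopic residue masses; real spectra have gaps, and leucine and isoleucine cannot be told apart by mass.

```python
# Monoisotopic residue masses (Da) for a small subset of amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "V": 99.06841, "L": 113.08406, "K": 128.09496,
}
PROTON = 1.00728  # mass added to give a singly charged b ion


def b_ions(peptide: str) -> list[float]:
    """Theoretical singly charged b ion m/z values for a peptide."""
    masses, total = [], PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def read_ladder(b_series: list[float], tol: float = 0.01) -> str:
    """Recover the sequence from mass differences between consecutive b ions."""
    sequence, previous = [], PROTON
    for mz in b_series:
        delta = mz - previous
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        sequence.append(matches[0] if matches else "?")
        previous = mz
    return "".join(sequence)


ladder = b_ions("GASVLK")
print([round(m, 3) for m in ladder])
print(read_ladder(ladder))
```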

70
Q

what is the benefit of using ms/ms when analysing a phosphorylated peptide

A

can do de novo sequencing to determine where the phosphorylation occurs. without this the peptides phosphorylated at different sites would be indistinguishable

71
Q

summarise label free quantitation methods

A

spectral counting - number of recorded spectra corresponding to a peptide correlated with the abundance of that peptide

precursor ion signal intensity - isolating m/z values representing analytes of interest from a chromatogram

72
Q

what are examples of post digestion in vitro labelling

A

ICPL
iTRAQ
TMT

73
Q

what are examples of pre digestion in vitro labelling

A

ICAT

74
Q

what are the principles behind ICAT and what are its drawbacks

A

reagent has 3 parts - cysteine reactive group, linker group of different masses and biotin tag
proteins are denatured and labelled with light or heavy form for different samples
samples are combined and digested with trypsin
can isolate using streptavidin column

  • around 10% of proteins dont contain cysteine and are excluded from subsequent analysis
75
Q

what are the principles of ICPL

A
  • samples are digested with trypsin
  • different samples labelled with different versions of the reagent which differ in mass
  • reagent labels lysine (exposed by trypsin digestion)
76
Q

what are the principles of iTRAQ

A

reagent has a reporter group, balancer group and peptide reactive group
following peptide cleavage peptides are labelled with iTRAQ reagent. different samples labelled with reagent with different mass reporter group (reporter group and balance group = 145 in 4 plex)
- labelled samples mixed together and undergo LC-MS/MS
- in collision cell reporter groups are fragmented off and can be analysed to give information about quantification
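
Quantification from the reporter ions comes down to comparing their intensities within one MS/MS spectrum. A minimal sketch for 4-plex reagents (reporter ions at m/z 114 to 117); the intensities are invented and no isotope impurity correction or normalisation is applied.

```python
def reporter_ratios(intensities: dict[int, float], reference: int = 114) -> dict[int, float]:
    """Express each reporter ion intensity relative to a reference channel."""
    ref = intensities[reference]
    if ref == 0:
        raise ValueError("reference channel has zero intensity")
    return {mz: value / ref for mz, value in intensities.items()}


# Invented reporter intensities for one peptide spectrum (m/z -> intensity)
spectrum = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}

ratios = reporter_ratios(spectrum)
for mz, ratio in ratios.items():
    print(f"sample labelled with reporter {mz}: {ratio:.2f}x relative to 114")
```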

77
Q

what are the drawbacks of iTRAQ/TMT

A

reagents are extremely costly and also extremely sensitive to contamination from salts.
sophisticated software is required for analyzing
variability arising due to inefficient enzymatic digestion

78
Q

what are the basic steps to the in vivo metabolic labelling strategy SILAC

A

2 populations of cells are grown in different culture media
after a number of cell divisions the proteins in the cells will contain either light or heavy amino acids depending on which media they have been grown in
cells harvested and proteins extracted
proteins from different samples mixed
undergo tryptic digestion
LC-MS/MS analysis

79
Q

what are some applications of SILAC

A
  1. expression proteomics
  2. dynamic changes in PTM (can enrich for phosphopeptides)
  3. protein protein interaction studies: protein complexes are immuno-precipitated from the mixture of SILAC-labeled cell lysates, so specifically interacting proteins can be efficiently distinguished from nonspecific background proteins.
80
Q

advantages of SILAC

A

Differentially treated samples can be combined at the level of intact cells or protein, namely at the very first step of the experimental workflow, and can be processed together to minimize experimental error or bias

81
Q

disadvantages of SILAC

A

only appropriate for cell samples, which requires a long time due to cell culture

82
Q

what can be used to analyse the proteins associated with a target protein

A

antibody targeting a specific protein and pulling down this protein and the proteins associated with it

add simple epitope to protein via recombinant expression. use antibody to pull this and its binding partners down

double tags with cleavable linker on bait protein for highly efficient purification without overexpression

83
Q

how do microarrays work

A

antibodies for specific proteins in wells, pour over soup of protein and see what binds using fluorescent label or surface plasmon resonance

84
Q

what are the 3 different type of microarrays you can do

A

analytical protein microarray - aimed at a set of proteins
functional protein microarrays - investigate protein protein, protein ligand and protein cell interactions
reversed phase microarray - arrays of complex mixtures such as cell lysates or serum samples, probed with labelled affinity reagents to examine analytes from different sources eg. specific protein level in different cell samples

85
Q

what different types of targeted arrays are there

A

forward phase array - capture antibody attached to slide and probed with the protein sample

sandwich array - detection requires the binding of 2 distinct antibodies (a capture antibody and a reporter antibody, each binding to a unique epitope), confers greater specificity and lower background signal

reverse phase microarray - sample proteins attached to chip and probed with a primary antibody and a labelled secondary reporter antibody

86
Q

list the main proteomics case studies

A
  1. microarrays - ErbB proteins
  2. biomarker discovery - enrichment for glycosylation using lectin column
  3. SILAC haploid vs diploid yeast - pheromone signalling enrichment in haploid
  4. pulsed SILAC of HeLa cells
  5. iTRAQ for investigation of subcellular locations - APEX adds biotin to nearby proteins
87
Q

main points of ErbB protein study

A

dimerisation of ErbB receptors leads to phosphorylation of their tyrosine residues, which lets the SH2 or PTB domains of downstream proteins bind

  • cloned and purified SH2 and PTB domains and spotted them out on microtitre plates
  • synthesised peptides of the ErbB receptors with a fluorescent tag
  • probed the array with 8 concentrations of each peptide to calculate Kd
  • defined specific binding as a signal at least 2 times higher than for the unphosphorylated peptide
  • found a range of values of interaction strengths
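
The Kd step above can be sketched as a curve fit. This is a minimal illustration, assuming a simple one-site binding model (signal = Fmax x conc / (Kd + conc)) and a brute-force search over candidate Kd values. The model, the function name `fit_kd`, the concentrations and the signals are all assumptions for illustration, not details taken from the study.

```python
def fit_kd(concs, signals, kd_grid):
    """Return (kd, fmax) minimising squared error for F = fmax * c / (kd + c)."""
    best = None
    for kd in kd_grid:
        # Fraction of domain bound at each peptide concentration for this Kd.
        x = [c / (kd + c) for c in concs]
        # For a fixed Kd the best Fmax is the least-squares slope through the origin.
        fmax = sum(xi * yi for xi, yi in zip(x, signals)) / sum(xi * xi for xi in x)
        sse = sum((fmax * xi - yi) ** 2 for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]


# Hypothetical data: 8 peptide concentrations (uM) and noise-free signals.
concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 5.0, 10.0]
true_kd, true_fmax = 0.5, 1000.0
signals = [true_fmax * c / (true_kd + c) for c in concs]

kd_grid = [i / 100 for i in range(1, 1001)]  # candidate Kd values, 0.01 to 10.00 uM
kd, fmax = fit_kd(concs, signals, kd_grid)
print(kd, fmax)
```

A lower fitted Kd means a tighter interaction, so repeating the fit for every domain-peptide pair would give the range of interaction strengths.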
88
Q

main points of the biomarker study

A
  • huge dynamic range of protein concentrations in blood plasma - interested in the low concentration proteins (signalling and diagnostic)
  • proteins shed by cells are N-glycosylated and can be detected in blood
  • glycoproteins can be enriched for using lectin column
  • release from column using PNGase F
  • identify by MS/MS
  • establish a set of biomarkers present in disease compared to control
  • can now do a targeted approach to quantify key peptides
89
Q

key points of the diploid vs haploid yeast study

A
  • SILAC used - grow cells under different conditions
  • mix protein extract together
  • look for small mass changes to find changes in quantity of proteins in haploid and diploid
  • work out heavy:light ratio for peptides
  • top 10 haploid proteins were targets of pheromone signalling
  • correlation between protein and mRNA is poor - it is better after filtering out low level mRNA
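
The heavy:light ratio step can be sketched as below. This is a minimal illustration, assuming peptide intensities are already paired by protein and that a protein-level value is the median of its peptide log2 ratios. The protein names, the intensities and the choice of median are assumptions for illustration, and which culture carries the heavy label is not stated in these notes.

```python
import math
from statistics import median


def protein_log2_ratios(peptides):
    """peptides: list of (protein, heavy_intensity, light_intensity) tuples."""
    per_protein = {}
    for protein, heavy, light in peptides:
        if heavy > 0 and light > 0:  # skip peptides missing one channel
            per_protein.setdefault(protein, []).append(math.log2(heavy / light))
    # Median is less sensitive to a single outlier peptide than the mean.
    return {protein: median(values) for protein, values in per_protein.items()}


# Hypothetical peptide measurements.
peptides = [
    ("protA", 800.0, 100.0),
    ("protA", 400.0, 100.0),
    ("protA", 1600.0, 100.0),
    ("protB", 100.0, 100.0),
    ("protB", 0.0, 50.0),  # no heavy signal, ignored
]
ratios = protein_log2_ratios(peptides)
print(ratios)
```

A log2 ratio of 0 means equal abundance in both cultures; large positive or negative values flag proteins enriched in one cell type.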
90
Q

main points to the HeLa study

A
  • pulsed SILAC
  • cells placed in light and medium media. The cells in medium media are then switched to heavy media
  • extract proteins at different times combine, separate out by location using centrifugation (cytoplasm, nucleoplasm, nucleoli)
  • M:L decrease = protein degradation
  • H:L increase = synthesis rate
  • H:M = turnover rate
  • 5% most abundant proteins involved in RNA processing and cell cycle regulation
  • not measuring dynamics
  • protein subunits have faster turnover in compartment where complex assembles
  • poor correlation with mRNA levels
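
The three ratios above can be tabulated per time point as in this minimal sketch. The function name `pulsed_silac_ratios`, the time points and the intensities are hypothetical values for one protein, not data from the study.

```python
def pulsed_silac_ratios(timecourse):
    """timecourse: list of (hours, light, medium, heavy) intensities for one protein."""
    rows = []
    for hours, light, medium, heavy in timecourse:
        rows.append({
            "hours": hours,
            "M:L": medium / light,  # falls as pre-existing protein is degraded
            "H:L": heavy / light,   # rises as new protein is synthesised
            "H:M": heavy / medium,  # new vs old protein, i.e. turnover
        })
    return rows


# Hypothetical intensities at three time points after the switch to heavy media.
timecourse = [
    (1, 100.0, 90.0, 10.0),
    (4, 100.0, 60.0, 40.0),
    (12, 100.0, 25.0, 75.0),
]
rows = pulsed_silac_ratios(timecourse)
for row in rows:
    print(row)
```

The light channel acts as the constant reference, which is why both M and H are expressed relative to L.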
91
Q

key points of the study using iTRAQ to identify the proteome of a subcellular location (the mitochondrial matrix)

A
  • target APEX to mitochondria so that it can add a biotin tag to nearby proteins
  • these proteins can then be pulled out using a streptavidin column
  • iTRAQ labelling: 114 (control), 115 (gal4 control), 116 (mitoAPEX replicate A), 117 (mitoAPEX replicate B)
  • calculate iTRAQ ratios to find proteins 10X more likely to be in the matrix
  • 389 genes passed the threshold
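
The ratio and threshold steps can be sketched as below. This is a minimal illustration, assuming each mitoAPEX replicate is compared with one control channel (116 vs 114 and 117 vs 115) and that a protein must pass the cutoff in both replicates. The pairing, the cutoff value, the protein names and the intensities are all assumptions for illustration, not details from the study.

```python
import math


def enriched_proteins(reporters, cutoff_log2):
    """reporters: {protein: {114: x, 115: x, 116: x, 117: x}} reporter-ion intensities."""
    hits = []
    for protein, ch in reporters.items():
        # Assumed pairing: each mitoAPEX replicate against one control channel.
        rep_a = math.log2(ch[116] / ch[114])
        rep_b = math.log2(ch[117] / ch[115])
        # Keep only proteins enriched in both replicates.
        if rep_a >= cutoff_log2 and rep_b >= cutoff_log2:
            hits.append(protein)
    return sorted(hits)


# Hypothetical reporter-ion intensities.
reporters = {
    "matrix_protein": {114: 100.0, 115: 120.0, 116: 1600.0, 117: 1920.0},
    "background_protein": {114: 500.0, 115: 450.0, 116: 520.0, 117: 430.0},
    "one_replicate_only": {114: 100.0, 115: 100.0, 116: 900.0, 117: 150.0},
}
hits = enriched_proteins(reporters, cutoff_log2=2.0)
print(hits)
```

Requiring agreement between replicates is one simple way to cut down on nonspecific background binders to the streptavidin column.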
92
Q

what does the human proteome project aim to do

A

publish all protein sequencing data onto one database. can be useful in research and in health and disease.

93
Q

summarise the Pandey et al (Kim et al) study of the human proteome that was published in 2014

A

examined 30 normal tissue types.

  • Samples from 3 people per tissue type were processed through several steps, and then the protein fragments (peptides) were analyzed on high-resolution Fourier-transform mass spectrometers
  • The resulting draft human proteome map includes proteins encoded by more than 17,000 genes—about 84% of the total known protein-coding genes
  • includes 193 novel proteins from regions previously thought to be non-coding.
  • detected proteins encoded by housekeeping genes that make up 75% of the proteome mass
  • nearly 200 genes begin at locations other than those predicted based on genetic sequence.
94
Q

main points of the Wilhelm 2014 study (ProteomicsDB)

A
  • peptide based MS/MS analysis of samples from tissues/organs or cell lines combined with data from literature
  • found some proteins to be resistant to trypsin cleavage eg keratin so used chymotrypsin
  • coverage was evenly spread out across all chromosomes except 21 and Y
  • GPCRs underrepresented but many have become pseudogenes
  • found housekeeping genes differentially expressed in different tissues
  • PCA showed that cell lines retain protein expression characteristics from their primary tissues
  • covered 92% of protein-coding sequences
95
Q

main points of tissue based map of human proteome

A
  • 24000 antibodies used to probe 44 tissues (precise info about what's present) to create 13 million immunohistochemistry images
  • RNA seq on 32 of these tissues - measure cDNA fragments
  • based on integrated omics approaches (proteomics and transcriptomics)
  • immunohistochemistry has the advantage of adding spatial resolution and information on expression pattern in certain cells or subcellular structures.
  • able to classify various subproteomes
  • drug targets: 30% of approved drug target proteins are expressed in all analysed tissues (off target effects)
  • 60% of genes implicated in cancer are expressed in all tissues (expected as they're involved in cell cycle regulation)
  • broad spectrum of tissues analyzed in the Human Protein Atlas allows for searches of proteins expressed in certain tissues or groups of tissues, to generate gene lists with potential biomarker candidates
  • many tissue enriched genes are downregulated or completely turned off in cell lines (cell lines are dedifferentiated) - contrasts with the Wilhelm study
96
Q

main points from subcellular map of proteome study

A
  • wanted to understand the human proteome at a very detailed level
  • integrated approach: transcriptomics and antibody based immunofluorescence with validation by MS
  • defined the proteomes of 13 major organelles and revealed multilocalising proteins
  • Smaller organelles such as the midbody and nucleoli showed a larger diversity than previously recognised
  • 15% of proteins showed single cell variation eg ZNF554
  • half of proteins were localised to multiple compartments (shared pool of proteins)