Big picture concepts Flashcards

1
Q

In a sentence, describe the central dogma of molecular biology

A

• DNA is transcribed into mRNA, which is translated into proteins that, after post-translational modification, carry out cellular functions
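As a rough illustration of the flow described above, the sketch below transcribes a short coding-strand DNA sequence into mRNA and translates it into a peptide. The example sequence and the reduced codon table are illustrative assumptions (only the codons needed here are listed), and post-translational modification is not modelled.

```python
# Toy model of the central dogma: DNA -> mRNA -> protein.
# CODON_TABLE is deliberately partial (an assumption made for brevity).
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "AAA": "K", "GAA": "E",
    "UGG": "W", "UAA": "*",  # "*" marks a stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical 18-nucleotide open reading frame.
mrna = transcribe("ATGGCTAAAGAATGGTAA")
protein = translate(mrna)
```

Each later -omics layer (transcriptomics, proteomics, metabolomics) measures the output of one of these steps.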

2
Q

What -omic is used to analyse genes and how?

A

o Genes are analysed through genomics, involving DNA sequencing

3
Q

What -omic is used to analyse mRNA and how?

A

o mRNA is analysed through transcriptomics, involving microarrays and next-gen sequencing

4
Q

What -omic is used to analyse proteins and how?

A

o Proteins are analysed through proteomics, involving electrophoresis, chromatography and mass spectrometry

5
Q

What -omic is used to analyse function and how?

A

o Function is analysed through metabolomics and lipidomics, involving mass spectrometry

6
Q

What does transcriptional regulation involve?

A

o Transcriptional regulation involves alternative splicing, cell type specific expression…

7
Q

What does translational regulation involve?

A

o Translational regulation involves masking, mRNA stability…

8
Q

What does post-translation regulation involve?

A

o Post-translational regulation involves modification by O-GlcNAc, phosphate, ubiquitin…

9
Q

What percentage of the human genome is protein coding?

A

• Around 1% of the human genome is protein coding

10
Q

Describe the first estimate of the number of genes in the human genome and how it compared to reality

A

• The first draft (2001) of the human genome estimated 30,000-40,000 genes, but by 2007 the estimate had been revised to about 20,500 genes

11
Q

How are we similar to E. coli and yeast?

A

o Metabolically, we are similar to E. coli and yeast

12
Q

Are the genes in the human genome unique?

A

o No: most of our genes are shared with close relatives, and some with distant relatives

13
Q

How many more genes do we have than unicellular organisms?

A

o We have 4-5x more genes than unicellular organisms

14
Q

How many genes do dogs have? Do they have more or fewer genes than humans?

A

o We have more genes than dogs (19,000)

15
Q

How many genes does the worm have? Does it have more or fewer genes than humans?

A

o We have fewer genes than the worm (25,000)

16
Q

How many genes does Arabidopsis have? Does it have more or fewer genes than humans?

A

o We have fewer genes than Arabidopsis (28,000)

17
Q

How many genes does rice have? Does it have more or fewer genes than humans?

A

o We have fewer genes than rice (75,000)

18
Q

What percentage of our DNA is identical to that of chimps?

A

o We are 96% identical to chimps

19
Q

Do humans know the function of all their genes?

A

• Almost half the genes have an unknown function

20
Q

Which is more complex, the genome or the proteome? Why?

A

• Complexity resides in the proteome
o Whilst the genome is static, the proteome can exhibit temporal and spatial differences
• The proteome is constantly changing as cells respond to environmental conditions
o DNA is chemically homogeneous whilst proteins are heterogeneous
• The proteome may be as complex as a whole organism, a tissue or a single cell type
o Proteins are cellular effectors

21
Q

What is the proteome?

A

• Proteome- the proteins expressed by the genome at any one time

22
Q

What is the functional proteome?

A

o Functional proteome- the part of the proteome that is expressed at a given point in time

23
Q

What is the theoretical proteome?

A

o Theoretical proteome- the genetic basis of the proteome

24
Q

What is proteomics?

A

• Proteomics is the study of the proteome

25
Q

What are metabolites?

A

• Metabolites- small molecules that are chemically transformed during metabolism and that, as such, provide a functional readout of cellular states.

26
Q

Why are metabolites easier to correlate with phenotype compared to genes and proteins?

A

o Unlike genes and proteins, the functions of which are subject to epigenetic regulation and post-translational modifications, respectively, metabolites serve as direct signatures of biochemical activity and are therefore easier to correlate with phenotype

27
Q

What is metabolic targeting?

A

• Metabolic targeting- quantification of a specific metabolite

28
Q

What is profiling?

A

• Profiling- quantification of a group of related compounds or those found in a single biochemical pathway

29
Q

What are the definitions of systems biology and why are there so many?

A

o Systems biology- study of living systems/ecosystems (e.g. gut microflora)
o Systems biology- using a global systematic approach studying a living system

• Systems biology is defined by Leroy Hood as:
o Hypothesis-driven
o Requires global/big data acquisition
o Need to integrate different types of data
o Need to delineate biological network dynamics
▪ The network has spatial and temporal aspects that need to be understood
o Know how every single element in the network influences all other elements-allows for deeper understanding of the system
o Formulate models that are predictive and actionable- hypothesis generating

• But there is no concise definition of systems biology that all system biologists agree upon

30
Q

What are the two main philosophies towards systems biology?

A
  • The reductionist approach towards systems biology
  • The expansionist approach towards systems biology

31
Q

What is the reductionist approach towards systems biology?

A

• The reductionist approach towards systems biology
o Systems biology is molecular biology, which is a continuation of mechanistic Darwinism, at a larger scale
o Reductionism-the practice of analysing and describing a complex phenomenon in terms of its simple or fundamental constituents, especially when this is said to provide a sufficient explanation.

32
Q

What is the expansionist approach towards systems biology?

A

• The expansionist approach towards systems biology
o Emergence- complex systems have emergent properties which can’t be deduced from a reductionist approach
▪ Individual components in a living system interact with each other
o If components have to interact with each other, there cannot be an understanding of the living system by only looking at individual parts

33
Q

What are Koch’s postulates?

A

o Koch’s postulates
▪ The microorganism must be found in abundance in all organisms suffering from the disease, but should not be found in healthy organisms
▪ The microorganism must be isolated from a diseased organism and grown in pure culture
▪ The cultured microorganism should cause disease when introduced into a healthy organism
▪ The microorganism must be reisolated from the inoculated, diseased experimental host and identified as being identical to the original specific causative agent

34
Q

Describe Falkow’s (1988) molecular Koch’s postulates

A
  • The phenotype (sign or symptom of disease) should be associated only with pathogenic strains of a species
  • Inactivation of the suspected gene(s) associated with pathogenicity should result in a measurable loss of pathogenicity
  • Reversion of the inactive gene should restore the disease phenotype
35
Q

Are Koch’s molecular postulates reductionist or expansionist?

A

▪ Inherently reductionist- relies on a single gene being responsible for a complete phenotype
▪ Does not (until recently) consider the off-target effects of knocking out the single gene
▪ Many genes have multiple protein functions- no elucidation of the specific protein function affecting the disease

36
Q

What -omics are primarily used in systems biology?

A
  • Genomics
  • Transcriptomics
  • Proteomics
  • Metabolomics
37
Q

What is the difference between genomics and genetics?

A

o Genomics works at the scale of the whole organism rather than the single gene (genetics)
o Genetics and molecular biology are reductionist
o Genomics is expansionist (how all parts work together)

38
Q

What are genome wide association studies, their procedure and their purpose?

A

o Large-scale SNP and mutation analyses (e.g. GWAS) provide associations
▪ Genome-wide association studies (GWAS)
• Aim to identify the genetic component of multifactorial diseases
• Hypothesis-free (unbiased) testing of the genome for association with disease or observable traits
• Use DNA samples from many people
o Disease cases vs matched controls
▪ Matched controls- people of the same ethnic background
• Rapid scanning of genetic markers (SNPs)
o Across DNA subsets or whole genomes
o DNA microarrays
o Next-generation DNA sequencing
• Search for variation associated with disease
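The case-versus-control comparison above can be sketched for a single SNP as a chi-square test on a 2x2 table of allele counts. The counts below are invented for illustration, and this is only one simple way to test association; a real GWAS would repeat such a test across many markers and correct for multiple testing.

```python
def allele_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were independent of disease status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical allele counts: 1000 case alleles and 1000 control alleles.
chi2 = allele_chi_square(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800)
# 3.84 is the 5% critical value for 1 degree of freedom (single test only).
associated = chi2 > 3.84
```

A larger statistic means the allele frequency differs more between cases and controls than chance alone would suggest.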

39
Q

What is genomics enabled by?

A

o Enabled by high-throughput sequencing technology

40
Q

What was the 1000 Genomes Project, what did it aim to do, what did it achieve and what did it cost?

A

▪ Launched in 2008, published in 2012
▪ Spent about $30-50 million on 1092 genomes (roughly $27 000-46 000 per genome)
▪ Aimed to identify >98% of genetic variants that have a frequency of >1%
▪ Achieved by light sequencing of the whole genome and heavy (high-replicate) sequencing of the exome
▪ Aims:
• To characterise the geographic and functional spectrum of genetic variation and to understand genetic contributions to disease by comparing these genomes to each other
• Can tell us about evolution and sequence diversity

41
Q

What was the 10,000 genome project, what did it do and what did it find?

A

▪ Over 10,000 genomes with 30x-40x exome coverage
▪ Presented the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome
▪ Each newly sequenced genome contributed an average of 8579 novel variants
▪ Found that single nucleotide variants (SNVs) are generally rare in transcription factors (due to their essentiality) and occur more frequently in non-protein coding regions and outside of transmembrane receptors

42
Q

What is Miller’s syndrome?

A

▪ Miller’s syndrome is a rare inherited disease that causes facial and limb abnormalities

43
Q

How was genomics essential for elucidating genetic associations in Miller’s syndrome? Give an example

A

▪ Genome sequencing has been essential for elucidating genetic associations in Miller’s syndrome
▪ Roach et al. 2010
• Sequenced 4 genomes (both parents and two affected offspring) at 99.999% accuracy
o Removes noise, as nucleotide variants are accounted for by familial relationships
• 3.6M single nucleotide polymorphisms within the group
• Clustered the single nucleotide polymorphisms to identify 4 candidate genes that may be responsible for Miller’s syndrome
o Those genes code for proteins
o The major gene associated with Miller’s syndrome is:
▪ Dihydroorotate dehydrogenase (DHODH)

44
Q

What is dihydroorotate dehydrogenase (DHODH) and what is its purpose?

A

• DHODH is a major enzyme in the de novo pyrimidine biosynthesis pathway
• DHODH is essential for pyrimidine synthesis
o De novo pyrimidine biosynthesis is the major mechanism by which the cell generates pyrimidine nucleotides

45
Q

Describe how dihydroorotate dehydrogenase dysfunction affects Miller’s syndrome

A
  • DHODH is essential for pyrimidine synthesis
  • However, there is also a salvage pathway (via catabolic products) that recycles old nucleotide products
  • In Miller’s syndrome, mitochondrial DHODH is dysfunctional (mutations may lead to incorrect localization, incorrect folding, degradation, inefficient catalysis etc.)
  • Survival is based on the ‘severity’ of the mutation, salvage and use of orotate from other sources
  • This is highly inefficient from development onwards
46
Q

When faced with a disease that is genetically inherited and influenced, what should be done to determine the cause of the disease and what are the benefits of this approach?

A

▪ Genetic mutations cause disease because of their functional consequences on the encoded proteins
• Hence it is important to design experiments based on genome sequences to find a genetic association with disease, and then to look at every step of the central dogma (genome, transcriptome, proteome and metabolome) to thoroughly understand the cause
• This can lead to exploitation of the disease cause to treat other conditions

47
Q

What is a disease example as to why it is important to employ an integrated -omics approach when looking at genetically influenced diseases?

A

o CFTR in cystic fibrosis is an example of how multiple genetic mutations lead to different effects at the structure/function level and can affect the severity of the disease
▪ The most common mutation is ΔPhe508, which results in defective ER trafficking/processing, incorrect folding and proteolytic degradation

48
Q

What is an example of how elucidating the biological cause of a genetically influenced disease can lead to exploitation of the disease cause to treat other conditions?

A

o For example, inhibition of DHODH (which is responsible for Miller’s syndrome) can be used to treat myeloid malignancies (cancer therapy)

49
Q

What is transcriptomics and why is it useful?

A

o DNA encodes RNA
o If the genome sequence is known, gene expression can be monitored
o Transcriptomics is the systematic analysis of gene expression
o Can be studied at a cellular, tissue or whole organism level
o The coding RNA molecules need to be translated

50
Q

What is the aim of proteomics and why is it useful?

A

o Allows us to see the global consequences of changes in protein abundance, including changes to other proteins, compensatory effects and spatial effects
o Proteins are synthesised from RNA by translation
o Aim to identify and quantify all the proteins
o Protein abundance is controlled by many factors
o The proteome is spatially and temporally dynamic

51
Q

Why is metabolomics useful?

A

• Metabolomics
o Allows us to see the functional consequences of changes in protein abundance, including depletion of the product and accumulation of the substrate

52
Q

What kind of data is handled in systems biology?

A
  • Resequencing projects
  • De novo genome sequencing
  • SNPs, GWAS
  • Transcriptome sequencing/profiling
  • Metagenomics
  • ChIP-SEQ and RNA-SEQ
53
Q

What is a crucial tenet of systems biology, especially when applying systems biology to look at disease causes and treatments?

A

• Integrating large-scale data is extremely important
o Cannot rely on a single step of the central dogma for evidence- need a global view
▪ For example, simply performing genomics is not enough
o It is crucial to determine how such mutations can cause disease
• Genomics, proteomics, metabolomics and functional assays should be used together to understand disease and provide strategies for interventions
o Miller syndrome is such an example

54
Q

What does personalised medicine mean?

A

• Medical treatment could be tailored to the individual

o Therapies should be tailored to an individual

55
Q

Why is personalised medicine needed? Give examples

A

o One size fits all no longer works in medicine
o Alignment of health and disease is inherently personal
▪ There is a need to understand health before an understanding of disease can be reached
o The characterisation of tumours could facilitate the selection of appropriate therapies and increase the success rate within the population
o The characterisation of an individual’s genome could lead to prophylactic (preventative) therapies
▪ Better prediction of disease offers hope for earlier intervention
o Understanding health requires determination of the baseline molecular profile of an individual
▪ Currently, we set arbitrary values for diagnostics to provide a yes/no answer- these are based on the population mean and ignore the individual baseline
▪ E.g. prediction of prostate cancer based on prostate-specific antigen (PSA) (proteome as a diagnostic tool) is largely inaccurate as the threshold is fixed at the population level- instead, the baseline PSA of an individual should be known before setting the risk threshold
• Can lead to harmful effects- prostate biopsy is an invasive procedure that many do not need
o Molecular profiling would assist in providing the right therapy, giving higher success rates than when many patients are given a generic therapy that may not work for them
▪ Based on predictors
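A minimal sketch of the baseline idea above, comparing a fixed population cut-off with a threshold derived from an individual's own history. All values, the 4.0 cut-off and the 3-standard-deviation rule are illustrative assumptions, not clinical guidance.

```python
from statistics import mean, stdev

POPULATION_CUTOFF = 4.0  # hypothetical fixed threshold (arbitrary units)


def flag_population(value, cutoff=POPULATION_CUTOFF):
    """Yes/no answer from a single population-wide threshold."""
    return value > cutoff


def flag_personal(history, value, n_sd=3.0):
    """Flag a value that deviates from the individual's own baseline."""
    baseline, spread = mean(history), stdev(history)
    return abs(value - baseline) > n_sd * spread


# Two hypothetical individuals with different healthy baselines.
low_baseline = [0.9, 1.0, 1.1, 1.0, 0.9, 1.1]
high_baseline = [4.4, 4.6, 4.5, 4.3, 4.7, 4.5]

# A rise to 3.0 is missed by the fixed cut-off but is a large personal change.
missed = (flag_population(3.0), flag_personal(low_baseline, 3.0))
# A stable 4.5 exceeds the fixed cut-off but is normal for this individual.
false_alarm = (flag_population(4.5), flag_personal(high_baseline, 4.5))
```

The two tuples show the two failure modes of a population threshold: a missed change in one person and an unnecessary alarm in another.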

56
Q

Compare current medicine to future/personalised medicine in terms of:

  • Focus
  • Action type
  • Measurements
  • Frequency
  • Target group
A

Current medicine-
▪ Focused on illness and disease without reference to baseline health
▪ Reactive
▪ Measures very few things and often with poor accuracy
• High error margins
• Thresholds are insensitive and don’t take into account your baseline -omics
▪ Infrequent
▪ Population-based
• Mean based on population

Personalised/future medicine-
▪ Focused on health and deviations from health
▪ Predictive of disease based on baseline and current -omics data
• Allows for early treatment and hence a higher chance of treatment success
▪ Measures many things with high accuracy
▪ Frequent/consistent
• Profile the -omes of a person over time by providing a sample every time a doctor is visited
▪ Individual-based
• Mean based on the individual

57
Q

What health data points can currently be obtained on an individual?

A
▪ TeleHealth
▪ Phenome
▪ Social media
▪ Epigenome
▪ Transcriptome
▪ iPS cells
▪ Single cell
▪ Transactional
▪ Proteome
▪ Genome
58
Q

What is the benefit of wearables in generating personalized day-to-day data and what can this data be used for?

A

• Generation of personalized data
o There are many levels at which personalized data are currently generated
o Common wearable technologies (such as iPhones, Fitbits…) collect personalised data that can be used in many ways, including monitoring health
▪ These technologies form the basis for research-grade and clinical-grade wearables
o All the data, if collected appropriately, can be used as a predictive tool regarding an individual’s health
▪ If an individual’s baseline is known, then perturbations can immediately be seen

59
Q

Where can wearables monitoring health be used?

A
o Wearables are already being used to assist disease diagnosis
▪ Locations:
• In hospitals
• In-home
• In remote/rural areas
• In low-resource areas
60
Q

What sources can wearables monitoring health send data to?

A

▪ Wearables can send data over the internet/other sources to
• Healthcare practitioners
• Telehealth
• Artificial intelligence

61
Q

What personalized data can wearables monitoring health collect?

A

▪ Data can relate to
• Metabolic, cardiovascular and gastrointestinal health
• Sleep, neurology, mental health, and movement disorders
• Maternal, pre- and neonatal care
• Pulmonary health and environmental exposures

62
Q

What challenges and limitations do wearables monitoring personalised health face?

A
▪ Challenges and limitations include
• Accuracy
• Privacy and security
• Oversight
• Scientific peer-reviewed evidence for safety and efficacy in healthcare
• Accessibility
• Cost
• Compatibility
• Acceptability
• Interpretation
• Technological form factor
• Lack of standards
63
Q

What wearable device sensors are used for Lyme disease and what does each measure?

A
  • Resting heart rate (photoplethysmography)
  • Skin temperature (thermopile)
  • SpO2 (pulse oximeter)
64
Q

What ML algorithm is used for Lyme disease to analyse the output of wearable device sensors?

A

Peak detection; logistic regression

65
Q

What wearable device sensors are used for respiratory viral infection and what does each measure?

A
  • Resting heart rate (photoplethysmography)
  • Skin temperature (thermopile)

66
Q

What ML algorithm is used for respiratory viral infection to analyse the output of wearable device sensors?

A

Sliding window peak detection; logistic regression
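One way the two steps named above could fit together is sketched below: a sliding window flags resting heart rate readings that stand out from the preceding days, and a logistic function turns a summary feature into a probability. The window length, the threshold and the logistic coefficients are hypothetical; in practice the coefficients would be fitted to labelled data, and this sketch is not the published algorithm.

```python
import math
from statistics import mean, stdev


def sliding_window_peaks(values, window=5, n_sd=2.0):
    """Return indices whose value exceeds mean + n_sd * sd of the preceding window."""
    peaks = []
    for i in range(window, len(values)):
        prior = values[i - window:i]
        if values[i] > mean(prior) + n_sd * stdev(prior):
            peaks.append(i)
    return peaks


def logistic(x, intercept=-3.0, slope=1.5):
    """Logistic model; intercept and slope are made-up, unfitted coefficients."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Hypothetical daily resting heart rate (beats per minute).
resting_hr = [60, 61, 59, 60, 62, 61, 60, 72, 74, 61]
peaks = sliding_window_peaks(resting_hr)
# Use the number of flagged days as the single input feature.
probability = logistic(len(peaks))
```

Because each reading is compared only with the individual's own recent values, the detector adapts to a personal baseline rather than a population threshold.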

67
Q

What wearable device sensors are used for insulin resistance and what does each measure?

A
  • Diurnal heart rate difference (photoplethysmography)
  • Physical activity (accelerometer)

68
Q

What ML algorithm is used for insulin resistance to analyse the output of wearable device sensors?

A

Multiple regression

69
Q

What wearable device sensors are used for atrial fibrillation and what does each measure?

A
  • Heart rate (AliveCor ECG)
  • Heart rate (photoplethysmography)

70
Q

What ML algorithm is used for atrial fibrillation to analyse the output of wearable device sensors?

A

Deep neural network

71
Q

Which molecular approaches and -omics contribute to integrative personal omics profiles? Describe what sample each -ome comes from.

A
o Cells/tissues
▪ Genome
▪ Epigenome
▪ Transcriptome
▪ Proteome
▪ Metabolome
o Body fluids (saliva, serum, plasma, urine)
▪ Proteome
▪ Metabolome
▪ Autoantibodyome
▪ Microbiome
▪ Envirome/exposome
o Body surface and waste (nasal cavity, skin, feces)
▪ Microbiome
▪ Envirome/exposome
72
Q

Describe the impact of the microbiome on human health

A
  • The microbiome can alter human health almost completely without reference to the genome
  • Diet influences the microbiome- if an individual’s diet puts a selective pressure on certain bacteria, and hence on the waste products they produce, it can push the microbiome in a direction that predisposes that individual to certain diseases
73
Q

What is the envirome/exposome and how is it measured?

A

• Use of environmental detectors (wearables)

o Detects the air and your exposure to the environment

74
Q

What categories are relevant exposures for inclusion in exposome studies? Describe them

A

External
• Meteorology- climate change, temperature, humidity, wind, atmospheric pressure
• Outdoor exposures- NO2, SO2, CO, O3, volatile organic compounds, particulate matter, radiation, UV, traffic, pollen
• Built environment- population density, building density, facilities, green space, walkability, neighbourhood safety, accessibility to resources, noise
• Home environment- volatile organic compounds, particulate matter, NO2, CO, aldehydes, metals, plasticizers, dust, pets, pests, allergens, mold, fungi, microbes, endotoxin
• Personal behaviour- diet, physical activity, tobacco smoke, alcohol, drugs, sleep, sex, cosmetics
• Socioeconomic factors- social factors, education, economy, psychological and mental stress
• Food and water contaminants- fertilizers, metals, pesticides, plasticizers, water disinfection by-products, polychlorinated biphenyls, flame retardants
• Medications- medicines, surgeries
• Occupational exposures- chemicals, dust, metals, viruses, animal proteins, plants, heat/cold stress

Internal
• Primary external exposures and associated metabolites, epigenetic changes (e.g. methylation, histone modifications), microbiome/metabolome/proteome/transcriptome/genome changes etc.

75
Q

Is an individual’s integrative personal omics profile spatially and temporally static? Why/why not? Give examples

A

• The integrative personal omics profile has both spatial and temporal aspects
o Spatial- each different component/level contains different data
o Temporal- each -omics profile can evolve over time. Health and disease are temporal. We need to know what people look like in health to be able to define what they look like in disease
▪ But the genome is static and predictive
▪ The transcriptome, proteome and metabolome are dynamic and reflective

76
Q

Are health and disease states static or temporal?

A

Health and disease are temporal. We need to know what people look like in health to be able to define what they look like in disease

77
Q

Why do -omics profiles rarely correlate with each other? What is the importance of this?

A

• Important to test across time (in health and when symptomatic) to predict disease earlier
▪ -Omics profiles rarely correlate when a snapshot is taken (one point measurement) due to the time displacement between each step of -omics generation (e.g. the time between gene expression (transcriptomics) and production of a protein profile (proteomics) in the body)
• -Omics are also affected temporally. EXTREMELY IMPORTANT TO CONSIDER -OMICS PROFILES AS TEMPORALLY DEPENDENT
▪ If tests are performed when symptoms are apparent, it is often too late. However, if tests are performed across time, susceptibilities can often be found and treated with higher success
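The effect of time displacement can be illustrated with a toy series in which protein abundance repeats mRNA abundance two time points later. The data and the two-step lag are invented for illustration; the sketch only shows that a same-time snapshot comparison can correlate poorly while a lag-adjusted comparison correlates well.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(upstream, downstream, lag):
    """Correlate upstream[t] with downstream[t + lag]."""
    if lag == 0:
        return pearson(upstream, downstream)
    return pearson(upstream[:-lag], downstream[lag:])


# Hypothetical mRNA levels with two bursts of expression;
# protein levels repeat the same pattern two time points later.
mrna = [1, 1, 8, 1, 1, 1, 1, 8, 1, 1]
protein = [1, 1] + mrna[:-2]

same_time = lagged_pearson(mrna, protein, lag=0)
shifted = lagged_pearson(mrna, protein, lag=2)
```

Here `same_time` is weak and slightly negative while `shifted` is essentially 1, which is why repeated sampling over time is more informative than a single snapshot.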

78
Q

What is the Snyderome and how was it achieved? What was found from this Snyderome and what does this demonstrate?

A

o The Snyderome (Professor Snyder)
▪ Monitored himself over 726 days with a variety of standard and -omics-based assays
• Took urine and blood every day
▪ Took serum and peripheral blood mononuclear cells (for DNA sequencing and RNA transcript analysis), whose analyses together produced an integrated personal omics profile
▪ Undertook a variety of phenotypic assays and lifestyle changes (increased exercise…)
▪ Found a high number of loss-of-function SNPs in his genome, which are very important
• Found that he had genetic variants that predisposed him to hyperglycaemia and diabetes (and was later diagnosed as a type II diabetic)
o His risk factors were temporally affected- his disease risk increased during the two years
• Could look at his transcriptome and know that he would get a cold BEFORE he displayed symptoms
• Important to study the -omics over time as they are temporally affected

79
Q

Describe what -ome techniques can be performed from peripheral blood mononuclear cells (from the Snyderome)

A

o Whole genome sequencing
o Whole transcriptome sequencing (mRNA and miRNA)
o Proteome profiling

80
Q

Describe what -ome techniques can be performed from serum (from the Snyderome)

A
o	Untargeted proteome profiling
o	Targeted proteome profiling (cytokines)
o	Metabolome profiling
o	Autoantibodyome profiling
o	Medical/lab tests
81
Q

What data can be pulled from whole genome sequencing?

A

▪ Variant calling/phasing

82
Q

What data can be pulled from whole transcriptome sequencing?

A

▪ Variant calling/phasing
▪ Heteroallelic and variant expression
▪ RNA editing
▪ Quantitative differential expression and dynamics
▪ Variant confirmation in RNA and protein

83
Q

What data can be pulled from proteome profiling?

A

▪ Quantitative differential expression and dynamics
▪ Variant confirmation in RNA and protein

84
Q

What data can be pulled from untargeted proteome profiling?

A

▪ Quantitative differential expression and dynamics

85
Q

What data can be pulled from proteome profiling (cytokines)?

A

▪ Quantitative expression

86
Q

What data can be pulled from metabolome profiling?

A

▪ Dynamics

87
Q

What data can be pulled from autoantibodyome profiling?

A

▪ Differential reactivity

88
Q

What data can be pulled from medical/lab tests?

A

▪ Glucose, HbA1c, CRP, telomere length

89
Q

Are all proteins produced by genomic ORFs of known function? Are they constant within a species?

A

• There are many coding regions (ORFs) in genomes that produce proteins of unknown function (50%+)
o Some genes are specific to particular types within a species (e.g. in bacteria)

90
Q

What are the two main potential issues when discovering new genetic ORFs that code for a protein?

A

o Genome sequencing reveals new proteins of unknown function with no homologs in other species
▪ Genome sequencing also reveals more homologs of proteins with unknown function (conservation of proteins with no known function suggests importance)
▪ DUFs- Domains of Unknown Function
o Genome sequencing reveals proteins homologous to known proteins in other species but without any context
▪ Orphan enzymes- enzymes that are usually part of a larger pathway but are found in isolation
• Indicates that the pathway these enzymes are part of is carried out differently in different organisms

91
Q

What do orphan genes in genomics indicate?

A

• Standard classical biochemical cycles are not necessarily the same in each organism
o Some organisms miss enzymes at crucial steps within supposed universal pathways
o Functions have evolved independently in many cases- as classical biochemical pathways are not conserved

92
Q

What do protein sequences dictate and what can be inferred from this information?

A
  • Protein sequences can tell us varying amounts of information about protein function- very few entries in the databases have been functionally proven
  • Protein sequences dictate tertiary structure and protein functions
  • The sequence of an unknown protein can be used to interrogate databases for similar proteins of known function (obtained via experimental, not computational, means)
  • Sequence information does not always imply structural equality
93
Q

How is it decided which proteins of unknown function should be prioritized for further study?

A

• Can apply:
o Expression tools (transcriptomics, proteomics)
▪ Interactions of the protein with other known proteins/levels of the central dogma
o Computational tools (predictions regarding the unknown protein’s function)
o Structural tools (predictions regarding the unknown protein’s tertiary structure)
o Phenotypic and molecular tools
o Genomic evidence: the gene of the unknown protein
▪ Gene clustering
▪ Gene fusion
▪ Shared regulatory sites
▪ Phylogenetic occurrence
o Post-genomic evidence: the transcript and protein of the unknown protein
▪ Co-expression
▪ Protein-protein interactions
▪ Organelle proteomes
▪ Essentiality and other phenome data
▪ Structures

94
Q

What are computational approaches/databases to search in order to elucidate more information about an unknown protein?

A

o BLAST searches
o PROSITE-Finding shared motifs
o Structural predictions/genomics
o GRAVY (Grand Average Hydropathy)
o Kyte-Doolittle Plots (protein topology and hydrophobicity)
o PSORT-Protein Subcellular Location Predictor

95
Q

Describe why structural predictions/genomics is a useful analysis to perform on an unknown protein

A

▪ Many of the predicted ORFs have unknown functions
▪ Structure can give insight into function
▪ Structural genomics aims to systematically solve the structure of every protein in a genome

96
Q

What is the use of modelling protein structures?

A

 The use of modelling protein structures
• Use protein sequence to predict relationships to structures already present within the database
• Predict function for the many thousands of FUN proteins within genome sequence databases
• Clue to structure if impossible to obtain experimentally
• Predict effects of point mutations on known structures
• Model drug/protein interactions for many thousands of compounds- virtual screening
o Drug docking experiments

97
Q

What is the GRAVY score and what information does it correlate with?

A

o GRAVY (Grand Average Hydropathy)- provides a single value (the mean hydropathy of all residues) that reflects the overall solubility of a predicted protein sequence
 Can combine with transmembrane regions (TMR) analysis
• Plot GRAVY values (ProtParam) and predicted TMR (TmPred) for predicted protein sequences:
o Hydrophobicity (GRAVY; X-axis) tends to rise with increased TMR
o Majority are hydrophilic as they are intracellular
o Hydrophobic (insoluble) proteins pose analytical challenges
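
A minimal sketch of how a GRAVY value can be calculated, assuming the standard Kyte-Doolittle hydropathy scale and the usual definition (sum of residue hydropathy values divided by sequence length). The function name `gravy` is hypothetical and this is not ProtParam's own code.

```python
# Kyte-Doolittle hydropathy values, one per standard amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean hydropathy of a protein sequence (hypothetical helper).

    Positive values suggest a more hydrophobic protein,
    negative values a more hydrophilic one.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Non-standard residue codes raise KeyError rather than being guessed.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    print(gravy("LLLLVVVV"))   # hydrophobic stretch, positive value
    print(gravy("KKDDEERR"))   # charged stretch, negative value
```

Plotting this value against the number of predicted transmembrane regions for each sequence gives the kind of comparison described above.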

98
Q

What does BLAST stand for?

A

o BLAST- Basic Local Alignment Search Tool

99
Q

What is a BLAST search, how does it function and what is its limitation?

A

o Searches unknown amino acid sequences (generally translated from DNA) against protein sequence databases (many entries)
 Takes query sequence and matches it against an entry in the database
o Hits are scored and ranked based on the number and order of identical and similar residues
o Be aware that although there might be an identity match, the function of the protein still needs to be determined EXPERIMENTALLY (it is also quite likely that the matched protein in the database has not had its function determined either)

100
Q

What protein parameters are used to compare protein sequences to each other?

A
• Protein parameters for searches (shared physicochemical properties of amino acids)
o Hydrophobicity
o Positively charged
o Negatively charged
o Polar
o Charged
o Small
o Tiny
o Aromatic
o Aliphatic
o Van der Waals volume
101
Q

Define the term identity when used to compare amino acid sequences. Give an example of how it would be written

A

• Identity between amino acid sequences- a calculation based on the number or amount of conserved (i.e. identical) residues shared between two (or more) amino acid sequences
o Often expressed as protein X and protein Y share 62.5% sequence identity
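
A minimal sketch of the identity calculation, assuming the two sequences are already aligned to the same length with `-` marking gaps. The function name is hypothetical, and the choice of denominator (every aligned column) is an assumption, since tools differ on this.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences (hypothetical helper)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as identical only if both residues are the same non-gap letter.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # 5 identical positions out of 8 aligned columns
    print(percent_identity("MKTAYIAK", "MKSAYLGK"))  # 62.5
```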

102
Q

Define the term similarity when used to compare amino acid sequences.

A

• Similarity- a calculation based on the number of not only identical amino acids, but also similar amino acids (e.g. small, polar, non-polar substitutions)
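
A minimal sketch of a similarity calculation, assuming pre-aligned sequences of equal length. The residue groupings in `SIMILAR_GROUPS` are a simplified, illustrative assumption; real tools typically score similarity with a substitution matrix such as BLOSUM62.

```python
# Illustrative groups of residues treated as "similar" (an assumption, not a standard).
SIMILAR_GROUPS = (
    "ILVM",  # aliphatic / hydrophobic
    "FYW",   # aromatic
    "KRH",   # positively charged
    "DE",    # negatively charged
    "STNQ",  # polar, uncharged
    "AG",    # small
)


def is_similar(x: str, y: str) -> bool:
    """True if the residues are identical or fall in the same group."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)


def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns that are identical or similar (hypothetical helper)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    hits = sum(1 for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-" and is_similar(x, y))
    return 100.0 * hits / len(aligned_a)


if __name__ == "__main__":
    # 5 identical + 2 similar (T/S, I/L) out of 8 columns; A/D is neither
    print(percent_similarity("MKTAYIAK", "MKSAYLDK"))  # 87.5
```

Similarity is always at least as high as identity for the same alignment, because every identical position also counts as similar.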

103
Q

Define the term homology when used to compare amino acid sequences.

A

• Homology- at the sequence level, 2 proteins with a high degree of sequence similarity/identity are said to be highly homologous- strictly, homologous proteins descend from a common ancestor

104
Q

What is a motif?

A

o A motif is a short (5-80 amino acid residues) stretch of sequence that is characteristic of a particular function or signal

105
Q

What information can a motif give about a protein?

A

• Domains and motifs give us clues about broad functional capability
o A motif might signify several sites, which might indicate function
 A protease recognition site
 An enzyme activity
 A binding site for a substrate or ligand
 A site for post-translational modification
o Likely discoveries of sequence motifs
 A clue to structure
 A clue to function
• Enzyme catalytic sites
• Family relationships
• Prosthetic group attachment sites (e.g. biotin, heme, etc.)
• Binding of metal ions (e.g. Ca2+)
• Sites for disulfide bond formation (cysteine)
• Sites for protein-protein, protein-nucleic acid or protein-ligand binding
o Computational analysis of potential motifs must always be verified experimentally, as many motifs have exceptions

106
Q

What is the PROSITE database and what information does it contain?

A

o Most widely used prediction tool for motifs
o Release August 2020 contains:
 1860 documentation entries
 1311 different conserved patterns/rules
o Motifs of lengths 4-80 are present, most common between 10-14 amino acids in length
o SWISS-PROT/UniProtKB: currently contains 563,082 manually annotated proteins and more than 188 million unreviewed entries

107
Q
Identify PROSITE syntax, including:
-
character
X
{...}
[...]
(n)
(n,m)
A

 Each position in the motif is separated by a hyphen
 One character denotes a specified residue that is required at that position
 X is wild card- match any amino acid
 {…} denotes a set of disallowed residues (these residues break the rule)
 […] denotes a set of allowed residues
 (n) denotes a repeat of n
 (n,m) denotes a repeat of between n and m inclusive
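
One way to see how these syntax rules work is to translate a PROSITE pattern into a regular expression. The sketch below covers only the elements listed above (anchors such as `<` and `>` are not handled), and `prosite_to_regex` is a hypothetical helper, not part of any PROSITE tool.

```python
import re

# One pattern element: a residue, x, [allowed] or {disallowed},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(
    r"(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<rep>\d+(?:,\d+)?)\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core = m["core"]
        if core in ("x", "X"):
            core = "."                          # wild card: any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # disallowed residues
        # [..] and single residues are already valid regex as written
        rep = m["rep"]
        parts.append(core + ("{" + rep + "}" if rep else ""))
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}")   # N-glycosylation site pattern
    print(regex)                                  # N[^P][ST][^P]
    print(re.search(regex, "AANGSAA").start())    # 2
```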

108
Q

What are limitations of using motifs for analysis of proteins of unknown function?

A

o Limitations and additional constraints that apply to a motif match
 Some motifs may not be viable in particular positions due to folding or other constraints (e.g. the glycosaminoglycan attachment site (S-G-x-G) also requires at least two acidic amino acids (Glu or Asp) at positions -2 to -4 relative to the serine)
 High-frequency (especially short) motifs have low confidence- they need to be characterised functionally through experiments
 Motif components may not be order dependent
 Proof is never absolute until the protein has been functionally characterised
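
The glycosaminoglycan example shows why a bare pattern match is not enough. The sketch below assumes the additional rule is "at least two Glu/Asp residues at positions -2 to -4 relative to the serine"; `gag_candidate_sites` is a hypothetical helper, and any site it reports is only a candidate until tested experimentally.

```python
import re


def gag_candidate_sites(sequence: str) -> list[int]:
    """Return 0-based indices of serines in S-G-x-G matches that also
    satisfy the assumed upstream acidic-residue rule."""
    sites = []
    # Lookahead so overlapping matches are all considered.
    for match in re.finditer(r"(?=SG.G)", sequence):
        serine = match.start()
        # Residues at positions -4, -3 and -2 relative to the serine
        # (fewer if the motif sits near the N-terminus).
        upstream = sequence[max(0, serine - 4):max(0, serine - 1)]
        if sum(aa in "DE" for aa in upstream) >= 2:
            sites.append(serine)
    return sites


if __name__ == "__main__":
    print(gag_candidate_sites("AEDASGAGK"))  # [4]: pattern plus two acidic residues upstream
    print(gag_candidate_sites("AAAASGAGK"))  # []: pattern matches but the extra rule fails
```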

109
Q

How can a motif be manually and computationally recognised?

A

o Pattern extraction- recognising motif
 Recognise identical amino acids shared at the same positions across the aligned sequences- these are most likely the motif
• Short sequences will be too general, but long sequences may be too specific
 Large-scale data analysis enables pattern extraction
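
A minimal sketch of pattern extraction, assuming the sequences are already aligned, ungapped and of equal length. The helper name and the `max_alternatives` threshold are hypothetical choices for illustration; the output uses PROSITE-style notation.

```python
def extract_pattern(aligned: list[str], max_alternatives: int = 3) -> str:
    """Derive a PROSITE-style pattern from aligned sequences (hypothetical helper)."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")

    elements = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])                  # fully conserved position
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")  # a few allowed residues
        else:
            elements.append("x")                          # too variable: wild card

    # Collapse runs of wild cards into x(n).
    collapsed, run = [], 0
    for element in elements + [None]:
        if element == "x":
            run += 1
            continue
        if run:
            collapsed.append("x" if run == 1 else f"x({run})")
            run = 0
        if element is not None:
            collapsed.append(element)
    return "-".join(collapsed)


if __name__ == "__main__":
    print(extract_pattern(["NGSA", "NATK", "NCSL", "NDTV"]))  # N-x-[ST]-x
```

With only a few input sequences the extracted pattern tends to be too specific, and with a very short window too general, which is why large-scale data helps.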