QBIO2001 Flashcards
Big data
What are biomarkers?
Measurable data signatures that are diagnostic of a person’s biological state, such as the presence of a disease
What is GWAS?
Genome Wide Association Studies
A genome-wide association study is an approach that involves rapidly scanning markers across the complete sets of DNA, or genomes, of many people to find genetic variations associated with a particular disease
Why are some diseases not detectable by SNPs?
• Some diseases cannot be detected through SNPs because they result from an interaction between genes and the environment
What is something in gene analysis that should be done in the future?
• Gene analysis currently has no temporal component; in future, genes, diseases and the environment need to be studied comprehensively and dynamically
• Studying the system when it is perturbed is useful
o Weaknesses of the system are exposed
o Connections between system and environment are figured out
Why are humans unreliable test subjects? What is the solution to this?
• Humans can lie and may not follow the treatment properly
• This is why mice and other animals are used
What is an example of an experiment that had to be done with mice, and not humans due to the flaws of human experiments? What were problems with this study?
• For example, diet studies
o Mice on high fat western diet had:
Increased anxiety, short-term memory problems and laziness
Weak bone structure
High blood glucose
o Mice with calorie restriction
Lived longer
• However, the diet study is unreliable as all mice are genetically identical and belong to the same mouse strain
o Doesn’t take genetic diversity into account
o Calorie restriction can be good but can also be bad depending on genetics, which is why the previous experiment doesn’t work
o Fat, glucose and insulin response also depends on genes
What are the three major sub-species of laboratory mice?
o Laboratory mice are derived from three major sub-species
Mus musculus domesticus
Mus musculus musculus
Mus musculus castaneus
How is genetic diversity introduced in lab mice and why?
o The original populations of mice are largely genetically different
o A collection of strains is obtained by crossbreeding the populations
o This is useful for introducing genetic diversity into environmental experiments
o Lets you see the gene + environment result
What is the difference in what has to be studied in monogenic vs polygenic diseases?
Monogenic vs polygenic diseases
• Monogenic- can look at a SNP and say with high probability that there will be the disease
• Polygenic- have to look at both genes and environment
What are the different types of networks and a bit about them?
• Cell signaling networks
o Phosphorylation
o Kinase has to recognize substrate and bind to substrate
• Transcriptional networks
o Which gene is regulating expression of which gene
o Genes regulate each other and themselves
o All different transcripts change expression of other transcripts
o Transcription factors have to interact with each other to form protein complexes
o Gene regulatory networks
o Gene regulatory circuitry
• Protein-protein interactions networks
o Proteins interact together to function
• Metabolic networks
o Looking at metabolites
• And more
Talk about the insulin cell signaling network
• Cell signaling network- insulin
o Insulin receptor
Recognize and allow insulin to bind, which triggers signaling cascade
o The insulin receptor phosphorylates IRS-1, which relays the signal to downstream kinases (MEK1, MEK2, ERK)
o The kinases phosphorylate the substrates by recognizing them by motifs
o Kinases eventually control expression of the genes
o GLUT-4: glucose transporter stored in vesicles that translocate to the cell membrane, where it brings glucose from the surface into the cell
If that pathway is broken your cell is insulin insensitive
What are transcription factors and what can they do?
o Transcription factors are proteins that recognize and bind to specific DNA sequences called motifs
Allow the cell to differentiate
What could transcription factors be used for in medicine?
Embryonic stem cells can differentiate into different cell types- all kinds of them
• Could be used to generate tissues
• Can be studied by using ChIP sequencing
How are protein-protein interaction networks looked at?
o Physical interaction networks
Multiple proteins come together and physically attach to each other
o Cross-link different proteins
o Measured by using mass spectrometry
How are metabolic networks organised?
o Organized by concept of functions of cells and the metabolites that contribute to the function
What is DNA sequencing?
Determining the order of nucleotides in the DNA of an organism (e.g. an animal or plant)
What is transcriptome/RNA sequencing?
Read the letters (bases) of the mRNA to learn which mRNAs are present in the cell and how many copies there are; the copy number indicates how highly each gene is expressed
o Can measure transcriptomes- transcription level of different genes
o What combinations of genes are expressed in different cell types
o Genes can be expressed at different levels (cell-specific genes) or at similar levels (housekeeping genes- genes that are conserved in all cell types) across cell types
What is ChIP sequencing and how is it done?
o Measure the DNA sequence that a transcription factor binds to
o Know where transcription factor is binding- what motifs of DNA it binds to
o The transcription factor will regulate the gene to which it binds
Transcription factors are proteins
o If the sequences are re-aligned to the genome, you can see exactly where the transcription factor binds
o There will be background noise in sequencing experiments- signal is coming from very sharp peaks
o There is
Histogram of gene expression
Accessible Chromatin
Histone modifications
Transcription factor binding
What is a pathway database?
o Pathway Database: Computerize current knowledge of molecular and cellular biology in terms of the pathway of interacting molecules or genes
What is a genes database?
o Genes Database: Maintain gene catalogs of all sequenced organisms and link each gene product to a pathway component
What is a ligand database?
o Ligand Database: Organize a database of all chemical compounds in living cells and link each compound to a pathway component
What are pathway tools?
o Pathway Tools: Developed new bioinformatics technologies for functional genomics, such as pathway comparison, pathway reconstruction and pathway design
What is clustering a typical procedure for?
Creating regulatory networks
How can you study protein phosphorylation?
Have control cells and test cells
Mix lysates 1:1
Enzymatic digestion
• Break proteins up so they can be passed through the mass spectrometer
Enrichment of phosphorylated peptides
nanoLC-MS/MS analysis
Lots of computation
Signalling dynamics graph
Do clustering
Once all phosphorylation sites have been measured, partition them into different patterns
k-means clustering is used to partition the phosphorylation sites into clusters that have different patterns
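A minimal sketch of the clustering step, assuming each phosphorylation site has already been reduced to a short time course of log ratios. The profiles below are made up for illustration, and the starting centroids are passed in by hand (real k-means usually picks them at random) so the result is reproducible.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical log-ratio time courses for four phosphorylation sites.
profiles = [
    [0.1, 1.0, 2.0],    # rising
    [0.0, 1.2, 2.1],    # rising
    [0.0, -1.0, -2.0],  # falling
    [0.2, -0.9, -2.2],  # falling
]

labels, centroids = kmeans(profiles, centroids=[profiles[0], profiles[2]])
print(labels)  # [0, 0, 1, 1]
```

Sites with the same label share a temporal pattern; the number of clusters (here 2) has to be chosen by the analyst.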
What does clustering do and what is it often used for?
• Partitions samples that have similar patterns into same groups- grouping technique
• Methods of grouping samples (x) that are similar- according to some pre-defined criteria: how do you measure similarity?
• A form of unsupervised learning- no label information (y) is used to tell the algorithm which observations should be grouped together
o The algorithm itself decides which observations go into which cluster
• It is often used for exploratory data analysis- a way of looking for patterns or structure in the data that are of interest
So what is the aim of clustering?
To group observations that are similar based on predefined criteria
What are the issues of clustering?
o Data types- counts, ratio, ordinal, categorical and continuous
Similarity depends on data types
o Missing data
Replace with appropriate number or remove instances
o Scaling
o (Dis)similarity metric
Pearson correlation, Spearman correlation, Euclidean, Manhattan…
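A small sketch of two of the pre-processing issues above: missing data and scaling. It assumes missing values are stored as None and uses mean imputation and z-scores, which are only one possible choice; the numbers are made up.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def z_score(values):
    """Scale to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

raw = [2.0, None, 4.0, 6.0]   # hypothetical measurements, one missing
filled = impute_mean(raw)     # [2.0, 4.0, 4.0, 6.0]
scaled = z_score(filled)

print(filled)
print(scaled)
```

Scaling matters because distance metrics are otherwise dominated by whichever variable has the largest numeric range.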
What is a metric?
A metric is a measure of the similarity or dissimilarity between two data objects and it’s used to form data points into clusters
What are the two main classes of distance?
o Correlation coefficients (compares shape of expression curves)
o Distance metrics
Manhattan distance
Euclidean distance
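To make the two classes concrete, here is a short sketch of Euclidean distance, Manhattan distance and the Pearson correlation coefficient. The two expression profiles are hypothetical: they have the same shape but differ tenfold in magnitude.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences, coordinate by coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation: compares the shape of two profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical profiles: gene_b follows gene_a at ten times the level.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]

print(euclidean(gene_a, gene_b))  # large: the magnitudes differ
print(manhattan(gene_a, gene_b))  # 90.0
print(pearson(gene_a, gene_b))    # 1.0: the shapes are identical
```

The distance metrics call these two genes far apart, while the correlation coefficient calls them identical, which is why the choice of metric changes the clusters you get.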
What is genetics useful for?
• Genetics provides an unbiased tool for discovering factors that trigger or modify disease, without any prior knowledge of the nature or mechanism
If a disease is developed in less than a year, what are the most prominent causes of death? Is it genetic or environmental?
• If developed in less than a year, most probably genetic cause of death even if there is environmental influence
o Perinatal and congenital conditions are the most prominent causes
If a disease is developed from ages 1-44, what are the most prominent causes of death? Is it genetic or environmental?
• If developed from ages 1-44, most probably environmental cause of death even if there is genetic influence
o Suicide is the most prominent cause
If a disease is developed from 45-95+, what are the most prominent causes of death? Is it genetic or environmental?
• If developed from ages 45-95+, both environmental and genetic causes of death
o Heart failure and disease for males
o Dementia and Alzheimer’s disease for females
o Cancer
What are the 3 classes of diseases?
Monogenic
- High penetrance
- Rare
- Entirely genetic
- Population genetics are used (GWAS)
Monogenic variable
- Low penetrance
- Variable expression
- Rare frequency and large effect size
- Mostly genetic with some environment
- Targeted population genetic studies (GWAS and/or DNA sequencing) are used
Polygenic
- Common
- Mostly environment with some genetic
- Population genetics are used (GWAS)
How do we perform classical mendelian linkage analysis in complex diseases?
- Identify the inheritance and chromosome- in 80’s, 90’s
a. Autosomal, X-linked, dominant, recessive, mitochondrial
- Use polymorphic markers (SNP, restriction map, microsatellites) to narrow down the region
a. Restriction map- take DNA, cut with enzymes
b. Microsatellites, analyze with PCR
- Confirm with Sanger sequencing
How does Sanger sequencing work?
a. Synthesise DNA in vitro
b. Chain termination method: a ddNTP lacks the 3’OH group and therefore can’t form a phosphodiester bond with the next nucleotide, preventing DNA polymerase from continuing. Only add one type of ddNTP to each reaction. Then heat, denature and separate on gel (polyacrylamide)
i. Chain termination for each nucleotide
c. In gels, bigger is slower and smallest is fastest
d. Super accurate but slow
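A toy simulation of the chain termination idea, assuming four separate reactions (one per ddNTP) and a gel that resolves fragments to single-base differences in length. The strand "GATTACA" is just a made-up example.

```python
def sanger_fragments(synthesised):
    """For each ddNTP reaction, list the fragment lengths where chains stop."""
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesised, start=1):
        # A chain that incorporates the dideoxy version of `base` at this
        # position cannot be extended, so it ends with this length.
        reactions[base].append(position)
    return reactions

def read_gel(reactions):
    """Read bands across all four lanes from the shortest fragment to the longest."""
    bands = [(length, base)
             for base, lengths in reactions.items()
             for length in lengths]
    return "".join(base for _, base in sorted(bands))

strand = "GATTACA"  # hypothetical newly synthesised strand
lanes = sanger_fragments(strand)

print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

Each lane only says where one base occurs; ordering all the bands by fragment length across the four lanes recovers the sequence.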
Talk about two mutations in which mapping and sequencing helped identify their causes
• Achondroplasia (Dwarfism)
o Mapping and sequencing
o Mutation in FGFR3
o Find people of different families and see if they have the same mutation
Older fathers pass on more new mutations
o 80% of cases are sporadic (new) mutations, linked to the age of the father
• Pain- mutations in Nav1.7 causing congenital insensitivity to pain have led to the development of new painkillers targeting this channel
o Dominant- erythromelalgia
Burning in hands and feet
o Recessive- congenital insensitivity to pain
Feel no pain
What are examples of important life factors/diseases with a genetic basis of disease?
• Human height
o 70-80% genetic
o Only about 20% of the genetics that control height have been identified in people without a Mendelian disease
• Obesity
o 50% genetics, 50% environment
o Tissue expression of genes at GWAS loci compared to random gene sets
o Obesity/BMI genes are expressed in the brain
o Variant of KSR2- carriers produce more insulin after they eat and eat more
Affects metabolism
• Mental illness
• Lifespan
What is a polygenic disease and what are polygenic contributions to disease?
o Relatively common genetic changes: the common variant-common disease model
o Rare genetic changes: the multiple rare variants-common disease model
o Clearly more common than monogenic obesity (that is, there is a genetic predisposition to obesity that has a complex genetic architecture)
o Gene combinations, or “epistatic” interactions?
o Need to think about gene-environment interactions
What are the aims of the HapMap project?
o Provide insight into patterns of genetic variation in the human population
o Guide design and analysis of medical genetic studies
o Increase power and efficiency of association studies to medical traits
What is the HapMap project?
- Public resource
- Catalogue of common genetic variants that occur in humans
- Genetic data from 4 populations (n=270) with African, Asian and European ancestry
- Millions of SNPs identified
How many alleles do SNPs have?
2
What is a haplotype?
o Haplotype: a set of SNPs along a chromosome that tend to be inherited together
What does association mean?
o Any relationship between two measured quantities that renders them statistically dependent
What is an Odds Ratio?
o An odds ratio is a measure of association between an exposure and an outcome. The odds ratio represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
o Odds ratios are most commonly used in case-control studies; however, they can also be used in cross-sectional and cohort study designs
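A worked example of the definition, using a made-up 2x2 table in which the "exposure" is carrying a risk allele. The counts are hypothetical.

```python
def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Odds of the outcome with the exposure divided by the odds without it."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed

# Hypothetical counts:
#                 disease   no disease
# carriers           30         70
# non-carriers       10         90
or_value = odds_ratio(30, 70, 10, 90)
print(round(or_value, 2))  # 3.86
```

An odds ratio of 1 means no association; here the odds of disease are almost four times higher in carriers than in non-carriers.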
How can you find and order your GWAS results?
o Large case/control population
o Blood samples
o Genotype common variants via SNP chip (nearly 15 million SNPs)
o Population analysis
P values of 10^-8 or below reach genome-wide significance
o Whole genome disease loci (Manhattan plot)
Can tell us how obesity works
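A rough sketch of the population analysis step, assuming a simple chi-square test on allele counts for each SNP (real GWAS pipelines usually use regression with covariates). The SNP names and counts are invented, and the threshold is taken as 1e-8 following the card above.

```python
import math

THRESHOLD = 1e-8  # genome-wide significance level, as quoted in the card above

def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    return numerator / denominator

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical allele counts per SNP: (case A, case B, control A, control B).
snps = {
    "snp_demo_1": (1200, 800, 1000, 1000),
    "snp_demo_2": (1020, 980, 1000, 1000),
}

hits = []
for name, counts in snps.items():
    p = p_value_1df(allelic_chi_square(*counts))
    # A Manhattan plot shows -log10(p) on the y-axis, one point per SNP.
    print(name, p, -math.log10(p))
    if p < THRESHOLD:
        hits.append(name)

print(hits)  # ['snp_demo_1']
```

The threshold is so strict because millions of SNPs are tested at once, so a conventional 0.05 cut-off would produce a huge number of false positives.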
How does whole genome/exome sequencing work?
• Genomic library
• Construct shotgun library
• Break out DNA in fragments
• Hybridization
• Pulldown
o Wash those that don’t code for protein
• Captured DNA
• DNA sequencing
• Mapping, alignment, variant calling
• For genome
o Don’t hybridise, go straight for DNA sequencing