12. Tools for complex traits Flashcards
Explain what complex traits are
Complex traits - traits controlled by multiple genes and the environment
Can have:
- binary phenotype - disease / no disease
- continuous phenotype - height
What are the causes of a phenotype?
P = G + E
The trait phenotype results from the interaction between the genetic and environmental determinants of the trait
What is the genetic architecture of a trait?
Trait genetic architecture - the number of causal variants for a trait, their effects and how they act together
Are all traits P=G+E at 50/50?
No, different phenotypes have different divisions between G and E, e.g.:
- Huntington’s - 100% G
- Drug addiction - roughly 50/50 G and E
- Malaria - complicated interaction, so the split can't easily be determined
What are the genetic markers used for finding trait genes in genomes?
Single nucleotide polymorphisms (SNPs) - variation in a single nucleotide at a position in the genome; easy to detect and analyse
How are SNP alleles inherited from parents?
At every autosomal locus an individual has two alleles, one on each homologous chromosome (one inherited from each parent), so at a SNP they can be homozygous or heterozygous
Explain what types of loci there are in terms of their alleles
Loci for a specific population can be:
- monomorphic - one allele at the locus in the population
- polymorphic - 2+ alleles segregating at the locus
What could be the sequence options for a polymorphic marker?
A polymorphic marker could be AA, AB or BB, e.g. a marker polymorphic for G and T -> GG, GT, TT (there is no separate TG because a genotype is an unordered pair: GT and TG describe the same heterozygote)
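Because a genotype is an unordered pair of alleles, the possible genotypes at a marker are the combinations with replacement of its alleles. A minimal Python sketch (the function name `possible_genotypes` is hypothetical, chosen for illustration):

```python
from itertools import combinations_with_replacement

def possible_genotypes(alleles):
    # a genotype is an unordered pair of alleles, so GT and TG count once
    ordered = sorted(alleles)
    return ["".join(pair) for pair in combinations_with_replacement(ordered, 2)]

print(possible_genotypes(["G", "T"]))  # ['GG', 'GT', 'TT']
```

With k alleles this gives k(k+1)/2 genotypes, e.g. 6 for a marker with three alleles.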
What could be the types of traits?
Types of traits:
- recessive / dominant
- autosomal / sex-linked
What are the trait aspects investigated if the trait is considered to be a Mendelian trait?
- observed pattern of inheritance
- recessive / dominant
- autosomal / sex-linked
What is the inheritance pattern of autosomal recessive, autosomal dominant, and sex-linked recessive genes?
Define heritability
Heritability - proportion of phenotypic variation that can be attributed to differences in genetic factors
How is heritability estimated?
Heritability is traditionally estimated from relatives: measure the phenotypic similarity of individuals with known relationships (and therefore a known expected genetic similarity), then estimate how much of the phenotypic similarity is explained by genetic similarity
It ranges from 0 to 1:
- if 0 - completely environmentally determined phenotype
- if 1 - completely genetically determined phenotype
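The notes don't say how relatives' data become a number; one classic method, assumed here from standard quantitative genetics, is parent-offspring regression. Under purely additive genetic effects and no shared environment, the slope of offspring phenotype on mid-parent phenotype estimates h², and the slope on a single parent estimates h²/2. A minimal Python sketch with hypothetical toy data and function names:

```python
from statistics import mean

def regression_slope(x, y):
    # ordinary least-squares slope of y on x: cov(x, y) / var(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def h2_from_single_parent(parent, offspring):
    # a parent passes on half of its additive genetic value, so the slope ~ h^2 / 2
    return 2 * regression_slope(parent, offspring)

def h2_from_midparent(midparent, offspring):
    # averaging both parents recovers the full additive value, so the slope ~ h^2
    return regression_slope(midparent, offspring)

# hypothetical toy data (e.g. heights in cm), not real measurements
parent = [160, 165, 170, 175, 180]
offspring = [166, 168, 170, 172, 174]
print(h2_from_single_parent(parent, offspring))  # 0.8
```

In real families a shared environment also makes relatives alike, which would inflate these estimates.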
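What can be used to check whether theoretical (pedigree-based) heritability estimates are correct?
Genomics - SNP markers can be used to measure true heritability for specific traits, because they show how much of the genome relatives actually share rather than the expected share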
Parent-offspring pairs and full sibs both have an expected relatedness of 0.5; for which pair are marker-based estimates expected to vary more?
Full sibs vary more than parent-offspring pairs: a child inherits exactly one allele at every locus from each parent, so it always shares half of each parent's alleles, whereas full sibs share 0, 1 or 2 alleles at a locus depending on which parental alleles each inherited, so their realised sharing varies around 0.5
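A quick simulation shows why full sibs vary more than parent-offspring pairs. It is a sketch under simplifying assumptions: every parental allele gets a unique label so sharing is counted identical-by-descent (real SNP data only show identity by state), and loci are treated as unlinked, which understates the spread seen between real sibs. Function names are hypothetical:

```python
import random
from statistics import mean, pstdev

def make_child(mother, father, rng):
    # at every locus the child receives one of the mother's two alleles and one of the father's
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]

def shared_fraction(a, b):
    # per locus count the alleles shared (0, 1 or 2), then divide by the 2 alleles per locus
    shared = sum(len(set(x) & set(y)) for x, y in zip(a, b))
    return shared / (2 * len(a))

def simulate(n_loci=1000, n_families=200, seed=1):
    rng = random.Random(seed)
    # label every parental allele uniquely so sharing means identity by descent
    mother = [("M1", "M2")] * n_loci
    father = [("F1", "F2")] * n_loci
    parent_offspring, full_sibs = [], []
    for _ in range(n_families):
        child1 = make_child(mother, father, rng)
        child2 = make_child(mother, father, rng)
        parent_offspring.append(shared_fraction(mother, child1))
        full_sibs.append(shared_fraction(child1, child2))
    return parent_offspring, full_sibs

po, sibs = simulate()
print(f"parent-offspring: mean {mean(po):.3f}, sd {pstdev(po):.3f}")
print(f"full sibs:        mean {mean(sibs):.3f}, sd {pstdev(sibs):.3f}")
```

Parent-offspring sharing comes out at exactly 0.5 in every family, while sib sharing scatters around 0.5.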
How are markers chosen in case-control association studies?
Markers must be chosen in case-control association studies because it is too expensive / time-consuming to look at the whole genome: haplotype blocks are used instead of single markers, comparing LD / diversity within each block between the diseased and healthy groups
Haplotype blocks are usually represented by tagging SNPs on chosen SNP panels, which identify genes around a causal variant
How can genomic data be used to predict relationships?
Genomics can be used to predict relationships - based on SNP similarity scores
- no longer need pedigrees
- still need to account for other causes of phenotypic variation, e.g. age, sex
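The notes only say relationships come from "SNP similarity scores". One widely used score, assumed here for illustration, is VanRaden's genomic relationship matrix: genotypes coded as 0/1/2 allele counts are centred on 2p (p = allele frequency) and cross-products are scaled by 2Σp(1-p), so in a large sample unrelated people score near 0 and first-degree relatives (parent-offspring, full sibs) near 0.5. A pure-Python sketch; the function name and genotypes are hypothetical:

```python
def genomic_relationship_matrix(genotypes):
    """genotypes: one list per individual of 0/1/2 counts of the same allele at each SNP."""
    n, m = len(genotypes), len(genotypes[0])
    # allele frequency at each SNP, estimated from the sample itself
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # centre each genotype on its expectation 2p
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    # scale so that values are comparable to pedigree relatedness
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic in this sample")
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]

# hypothetical genotypes for 3 people at 4 SNPs (people 0 and 1 are identical)
genotypes = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1]]
for row in genomic_relationship_matrix(genotypes):
    print([round(x, 2) for x in row])
```

With only three people the values are not meaningful as relationships; real analyses use thousands of individuals and hundreds of thousands of SNPs.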
Summary 1
What is a case-control association study?
Case-control association study - used when we don’t know which genes/variants are involved: see which markers segregate with the disease, which requires observable markers (SNPs) genotyped in two groups:
- diseased
- healthy
If allele frequencies differ between the groups, the locus (or a locus in LD with it) has a large enough effect on the disease to be detected
Ex: a disease caused by a recessive mutant D allele, with H the healthy allele: the diseased group shows a skewed allele frequency, with D enriched compared with the healthy group
How are markers in case-control association studies chosen?
Markers must be chosen in case-control association studies because it is too expensive / time-consuming to look at the whole genome: haplotype blocks are used instead of single markers, comparing their LD between the diseased and healthy groups
What is linkage disequilibrium?
Linkage disequilibrium (LD) - combinations of alleles at two loci occur more often than expected from the individual allele frequencies, i.e. non-random association of alleles at two or more loci
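LD between two biallelic loci is commonly quantified by D (observed minus expected haplotype frequency) and r² (D rescaled to lie between 0 and 1). A minimal sketch, assuming phased haplotype counts are available; the function name is hypothetical:

```python
def linkage_disequilibrium(hap_counts):
    """hap_counts: counts of phased two-locus haplotypes with keys 'AB', 'Ab', 'aB', 'ab'
    (A/a alleles at the first locus, B/b at the second). Returns (D, r^2)."""
    n = {h: hap_counts.get(h, 0) for h in ("AB", "Ab", "aB", "ab")}
    total = sum(n.values())
    f_AB = n["AB"] / total
    f_A = (n["AB"] + n["Ab"]) / total  # frequency of allele A
    f_B = (n["AB"] + n["aB"]) / total  # frequency of allele B
    # D: observed AB frequency minus the frequency expected if the loci were independent
    d = f_AB - f_A * f_B
    # r^2 rescales D by the allele frequencies so it lies between 0 and 1
    r2 = d ** 2 / (f_A * (1 - f_A) * f_B * (1 - f_B))
    return d, r2

print(linkage_disequilibrium({"AB": 25, "Ab": 25, "aB": 25, "ab": 25}))  # (0.0, 0.0)
print(linkage_disequilibrium({"AB": 50, "ab": 50}))  # (0.25, 1.0)
```

The first case is linkage equilibrium (all four haplotypes at expected frequency); in the second, only two haplotypes exist, so each allele at one locus perfectly predicts the other. Both loci must be polymorphic, otherwise r² divides by zero.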
What are tagging SNPs?
Tagging SNPs - a subset of SNPs that captures the majority of the variation, because each is in high LD with the other SNPs in its region
Distributed throughout the genome; each represents a haplotype block
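Choosing tagging SNPs can be sketched as a greedy set-cover problem: repeatedly pick the SNP that captures (r² at or above a threshold) the most SNPs not yet tagged. This is a simplified illustration, not the algorithm of any particular tool; the r² values, the 0.8 threshold and the function name are hypothetical:

```python
def greedy_tag_snps(r2, threshold=0.8):
    """r2: symmetric matrix (list of lists) of pairwise r^2 between SNPs, 1.0 on the diagonal."""
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # choose the SNP that captures the most still-untagged SNPs at r^2 >= threshold
        best = max(range(n), key=lambda i: sum(1 for j in untagged if r2[i][j] >= threshold))
        covered = {j for j in untagged if r2[best][j] >= threshold}
        tags.append(best)
        untagged -= covered
    return sorted(tags)

# hypothetical r^2 for 5 SNPs forming two haplotype blocks: {0, 1, 2} and {3, 4}
r2 = [
    [1.00, 0.90, 0.85, 0.10, 0.10],
    [0.90, 1.00, 0.90, 0.10, 0.10],
    [0.85, 0.90, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.95],
    [0.10, 0.10, 0.10, 0.95, 1.00],
]
print(greedy_tag_snps(r2))  # [0, 3]
```

One SNP per block is enough here; raising the threshold forces more tags, up to every SNP tagging only itself.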
What are SNP panels?
SNP panels (chips) - pre-determined sets of SNP variants (e.g. about 700,000 across the human genome), selected as tagging SNPs
Why are surrounding genes informative when identifying a disease-causing allele?
When a mutation occurs, the variant segregates together with its neighbouring genes; over generations recombination shrinks the shared haplotype piece, but the closest surrounding genes are likely to stay with it, so haplotype diversity is low around the causal variant
What aspects of the samples in case-control association studies could influence the results?
It is important to choose diseased cases and healthy controls with backgrounds as similar as possible to reduce background variation, e.g. age, sex, geographic location, ethnicity; otherwise additional variation is introduced that will be detected but does not contribute to the disease
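The shrinking haplotype can be put in numbers with a standard population-genetics result: if a marker and the causal variant recombine with probability c per generation, the chance that they are still together on the ancestral haplotype after g generations is (1 - c)^g, and LD between them decays the same way (D_g = D_0(1 - c)^g). A sketch assuming one meiosis per generation and ignoring drift and new mutation; the function name is hypothetical:

```python
def fraction_still_linked(c, generations):
    """Probability that a marker at recombination fraction c from the causal variant is
    still on the ancestral haplotype after the given number of generations
    (no recombination between them in any generation)."""
    return (1 - c) ** generations

for c in (0.001, 0.01, 0.1):
    print(c, [round(fraction_still_linked(c, g), 3) for g in (10, 100, 1000)])
```

Markers very close to the variant (small c) stay associated for hundreds of generations, while distant ones lose the association, which is why the low-diversity region around an old causal variant is short.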
How is the significance of identified association assessed?
Statistical analysis - a test for independence between genotype and disease status; logistic regression also allows other explanatory variables to be fitted; choose the best candidate
Disease status is complex, so it is hard to find a single causal allele; instead, build models of risk
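The simplest test for independence is an allelic chi-square test on a 2×2 table of allele counts (cases vs controls). A pure-Python sketch with hypothetical counts; it does not cover the logistic-regression route, which additionally fits covariates:

```python
import math

def allelic_chi_square(case_alleles, control_alleles):
    """Each argument is (count of allele D, count of allele H) in that group.

    Returns the Pearson chi-square statistic for the 2x2 table and its p-value
    with 1 degree of freedom."""
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # expected count if allele and disease status were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# hypothetical allele counts from 100 cases and 100 controls (200 alleles each)
chi2, p = allelic_chi_square((60, 140), (30, 170))
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

Every allele must appear in at least one group, otherwise an expected count is zero.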
Explain what GWAS is
Genome-wide association studies (GWAS) - an approach to compare genomes from different people to find genetic markers associated with a particular phenotype / disease
How is the effect of an allele on the disease represented in GWAS?
Known B to be disease-causing
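For a binary disease, GWAS commonly reports the effect of an allele as an odds ratio (standard practice rather than something stated in the notes). A sketch assuming B is the risk allele and A the other allele, with hypothetical counts and a Woolf (log-scale) 95% confidence interval:

```python
import math

def allelic_odds_ratio(case_alleles, control_alleles, z=1.96):
    """Each argument is (count of risk allele B, count of other allele A).

    Returns the odds ratio for B and an approximate 95% confidence interval
    using the standard error of log(OR) (Woolf's method)."""
    b_case, a_case = case_alleles
    b_ctrl, a_ctrl = control_alleles
    odds_ratio = (b_case * a_ctrl) / (a_case * b_ctrl)
    se_log = math.sqrt(1 / b_case + 1 / a_case + 1 / b_ctrl + 1 / a_ctrl)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)

# hypothetical allele counts: B is carried on 60/200 case and 30/200 control chromosomes
or_b, (lo, hi) = allelic_odds_ratio((60, 140), (30, 170))
print(f"OR = {or_b:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An odds ratio above 1 means B raises the odds of disease; all four counts must be non-zero.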
What are the important aspects to consider when performing GWAS?
- Stringent significance thresholds to prevent false positives (identifying an association when there is none)
- Carefully choosing controls
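The usual rationale for the stringent genome-wide threshold is a Bonferroni correction, assuming roughly one million independent common-variant tests across the genome (an approximation not stated in the notes):

```latex
\alpha_{\text{per SNP}} = \frac{\alpha_{\text{overall}}}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```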
How are GWAS loci results presented?
Manhattan plot - the higher the peak at a particular locus, the more significant its association with the phenotype; low points are loci not significantly associated with this trait
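The peak height on a Manhattan plot is -log10(p), plotted in genome order. A sketch of the data preparation only (plotting is left out), assuming the conventional genome-wide threshold p < 5e-8; the function name and results are hypothetical:

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold

def manhattan_points(results):
    """results: list of (chromosome, position, p_value) tuples.

    Returns points ordered along the genome as
    (chromosome, position, -log10(p), passes_threshold)."""
    points = []
    for chrom, pos, p in sorted(results):
        # smaller p-values give taller peaks on the plot
        points.append((chrom, pos, -math.log10(p), p < GENOME_WIDE_P))
    return points

# hypothetical GWAS results
results = [(2, 1_500_000, 0.03), (1, 250_000, 4e-12), (1, 900_000, 0.6)]
for point in manhattan_points(results):
    print(point)
```

Only the chromosome 1 SNP at p = 4e-12 (height about 11.4) clears the threshold line at about 7.3.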
Summary 2
What are the limitations of GWAS?
GWAS limitations:
- identified SNPs are usually not causal themselves, only in LD with the causal variant, and LD regions can contain many genes
- associated SNPs are often in non-coding regions (e.g. introns): they could be regulatory, or in LD with a coding SNP at a distance
- identified variants usually indicate risk rather than causation, because diseases have a complex genetic architecture with many genes of small effect
What are polygenic traits?
Polygenic traits - phenotypes influenced by several genes
The contribution of each chromosome to the disease depends on its length: the longer the chromosome, the more genes it can accommodate
What greatly influences the significance of GWAS results?
Sample size - the more individuals included, the more significant the results, the stronger the associations found and the higher the support for polygenic traits
Is GWAS sufficient on its own to predict disease?
No - after identifying loci / genes in GWAS, study their function and build functional models (which pathways / mechanisms they act in) to see whether they could be causing the disease
Give an example disease which was uncovered using GWAS and subsequent functional studies
Urate level regulation in gout (urate accumulation) - two theories:
- level regulation (poor excretion): export into urine or re-uptake into blood -> GWAS found association signals (in LD) at 8 transporter genes
- excess synthesis: no association found in GWAS => regulation plays a greater role than synthesis
Summary 3
Give an example of a recent application of GWAS for personalised medicine
Some people experience severe COVID-19; the phenotype is partly genetically determined, and the loci involved suggest drug targets
For GWAS, severe COVID patients were sequenced and compared with controls from UK Biobank sampled before COVID exposure:
- 4 regions in LD identified -> the genes in them were investigated: involved in viral defence and as mediators of inflammatory organ damage
- associated with severe disease: low expression of IFNAR2, high expression of TYK2 and CCR2
- a drug with the same target had already been developed: baricitinib (for arthritis) downregulates TYK2 => reduces severe COVID mortality by 20%
Explain what precision / personalised medicine is
Precision / personalised medicine - uses genetic information in combination with lifestyle and environment to offer the best treatment approach
Genome-guided treatment - pharmacogenomics
P = G + E
What is essential in effective precision medicine?
Effective genetic diagnostic tests and risk prediction from results
What kind of genetic diagnostic tests are used in personalised medicine?
Genetic diagnostic tests:
- single-gene disorders, e.g. cystic fibrosis: caused by a single autosomal recessive gene, CFTR, BUT there are 2000 CFTR mutation variants that can cause it, so tests look for the 20 most common; because there are so many disease variants, the degree of severity varies between patients
- complex disorders
What kind of tests can be performed to foresee disease?
- Test whether parents are carriers
- Prenatal testing
- Test newborn
Explain the benefit of genome sequencing in dealing with breast cancer
BRCA1 and BRCA2 risk variants are highly heritable: with specific variants, women have a 50-80% chance of breast cancer, and the likelihood increases with age
The environment also has an influence, but the general population's chance of breast cancer is 12%
-> routine screening through mammograms for patients with high-risk BRCA1 / BRCA2 profiles
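Taking the 12% general-population figure as the baseline, the quoted 50-80% risk for carriers of specific BRCA1/BRCA2 variants corresponds to roughly a 4- to 7-fold higher lifetime risk (a rough comparison, since the population figure itself includes carriers):

```latex
\frac{0.50}{0.12} \approx 4.2, \qquad \frac{0.80}{0.12} \approx 6.7
```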
When a genome is sequenced, what is used to determine a patient's risk of a particular disease?
Polygenic risk score - sums the effects of the alleles carried by the specific patient to predict the genetic component of the phenotype; requires estimating each SNP's effect and choosing which SNPs to include
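A polygenic risk score is a weighted sum: for each included SNP, the patient's count of effect alleles (0, 1 or 2) times that allele's estimated effect. A minimal sketch with hypothetical SNP ids, effects and genotypes:

```python
def polygenic_risk_score(dosages, effects):
    """dosages: SNP id -> number of effect alleles the patient carries (0, 1 or 2).
    effects: SNP id -> per-allele effect estimated in a GWAS (e.g. a log odds ratio).

    Returns the sum of effect x dosage over the SNPs included in the score."""
    missing = [snp for snp in effects if snp not in dosages]
    if missing:
        raise KeyError(f"no genotype for: {missing}")
    return sum(effects[snp] * dosages[snp] for snp in effects)

# hypothetical SNP ids, effects and genotypes
effects = {"snp1": 0.20, "snp2": -0.10, "snp3": 0.05}
patient = {"snp1": 2, "snp2": 1, "snp3": 0}
print(round(polygenic_risk_score(patient, effects), 3))  # 0.3
```

The score on its own is only meaningful relative to the distribution of scores in a reference population.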
How are SNPs from GWAS chosen to be included in polygenic scores?
Only SNPs with significant peaks are included in polygenic scores, using a stringent significance threshold; you can set your own p-value threshold to include more or fewer SNPs
What is the risk of choosing too many SNPs for calculating a polygenic risk score?
Additional SNPs included in polygenic risk scores could add noise; the best threshold depends on the genetic architecture of the specific disease for which the polygenic score is calculated
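Choosing SNPs by p-value threshold can be sketched as a simple filter; looser thresholds pull in more SNPs (and more noise). In practice, correlated SNPs are usually pruned by LD first and the threshold is tuned by how well the score predicts the trait in an independent sample; neither step is shown. The GWAS summary statistics and function name are hypothetical:

```python
def select_snps(gwas, p_threshold):
    """gwas: SNP id -> (effect, p_value). Keep only SNPs whose p-value is below the threshold."""
    return {snp: effect for snp, (effect, p) in gwas.items() if p < p_threshold}

# hypothetical GWAS summary statistics
gwas = {
    "snp1": (0.20, 1e-9),
    "snp2": (-0.10, 1e-5),
    "snp3": (0.05, 0.03),
    "snp4": (0.01, 0.6),
}
for threshold in (5e-8, 1e-4, 0.05, 1.0):
    print(threshold, sorted(select_snps(gwas, threshold)))
```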
How do the polygenic risks scores of breast cancer change with age?
What are the particular tumour markers?
What are the important aspects of genome-guided treatment?
How is cystic fibrosis treated after considering the genomics behind each patient?
Personalised drug chosen for the specific carried mutation - not a general approach
How are cancers treated after considering the genomics behind each patient?
Specific targeted therapies for genetically caused cancers
Explain how genetics influences warfarin dosing
Warfarin - anticoagulant - used to slow down blood clotting - two genes known to be involved with sensitivity to warfarin:
- CYP2C9 - enzyme responsible for warfarin metabolism
- VKORC1 - warfarin drug target
Summary 4