Identifying Disease Genes Flashcards
What is heritability
This is the proportion of phenotypic variation that is attributable to genetic variation
Classically, in twin studies, heritability of a trait is twice the difference between the MZ and DZ twin-pair correlations
h² = 2(r(MZ) - r(DZ))
r = correlation coefficient
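A minimal sketch of the twin-based estimate above (often called Falconer's formula). The function name and the example correlations are illustrative assumptions, not values from a real study.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Estimate heritability as h2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)


# Hypothetical twin-pair correlations for some trait
h2 = falconer_heritability(r_mz=0.8, r_dz=0.5)
print(round(h2, 2))
```

If the MZ and DZ correlations are equal, the estimate is 0, i.e. no evidence of a genetic contribution under this model.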
Examples of complex and non-complex diseases
Complex - schizophrenia, stroke, type 2 diabetes, hypertension, depression, asthma
Purely genetic - DMD, phenylketonuria, Down's syndrome
Purely environmental - poisoning, mesothelioma (cancer of the pleura caused by asbestos exposure), car accident
Critique the use of twin studies
Twin studies compare the frequency of a phenotype in identical twins with non-identical twins (concordance rate)
Higher concordance in identical twins = genetic contribution
Problems:
Identical twins are not 100% genetically identical
MZ twins may have more similar environment than DZ
Twins may not be representative of the population
Twin registries may have recruitment bias
Critique the use of adoption studies
If adopted children have the same phenotype as their biological parents and siblings, there is a genetic effect
If adopted children have the same phenotype as their adoptive parents and unrelated siblings, there is an environmental effect
Problems:
Recruitment bias, representation
Critique the use of segregation studies
Involve recruitment of large families where a trait is present in some members but not all
Studying inheritance patterns may identify classic Mendelian patterns, e.g. AR, but this does not work for complex disease
Problems
Phenocopies - same phenotype, different causes
Segregation analysis in diseases with no clear inheritance pattern is possible but it involves mathematics and programming
Critique the use of migration studies
If immigrants continue to have a disease regardless of where they live this suggests a genetic effect
If immigrants develop a new disease common in their new home this suggests an environmental effect
Problems
Immigrants often keep their cultural practices which influence their environment
Define and calculate population and sibling relative risk
In a population, we know neither the genetic risk nor the environmental risk. All we know is the frequency of disease (prevalence)
Population (n) with x people affected - thus prevalence is x/n
Sibling relative risk - The ratio of the frequency of disease in the sibling of an affected person compared to the rate in the general population, represented as λS
This is a measure of both genetic and environmental effects
λS = patient's sibling risk of disease / population prevalence
E.g. 1/20 siblings have disease, prevalence = 1/200 thus λS = 10
High values = monogenic disease
Lower values = complex, more common diseases
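The worked example above can be checked with a short sketch. The function names are illustrative assumptions; exact fractions are used so the ratio comes out as a whole number.

```python
from fractions import Fraction


def prevalence(affected: int, population: int) -> Fraction:
    """Population prevalence = x / n."""
    return Fraction(affected, population)


def sibling_relative_risk(sibling_risk: Fraction, population_prevalence: Fraction) -> Fraction:
    """Lambda_S = risk to siblings of affected people / population prevalence."""
    return sibling_risk / population_prevalence


# Values from the example: 1/20 siblings affected, prevalence 1/200
lambda_s = sibling_relative_risk(Fraction(1, 20), prevalence(1, 200))
print(lambda_s)
```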
Explain what linkage analysis is
A form of mapping that identifies markers (SNPs/microsatellites) physically linked with a disease region - used for smaller mutations, NOT chromosomal abnormalities
A LOD score >3 is evidence of linkage
Used in AD/X-linked disorders and AR disease
Used alongside autozygosity mapping in consanguineous/endogamous AR disease
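As a rough illustration of where a LOD score comes from, the sketch below handles the simplest case only: phase-known, fully informative meioses, where the score is log10 of the likelihood at recombination fraction theta divided by the likelihood at theta = 0.5 (no linkage). The function name and the counts are assumptions for illustration; real pedigrees need dedicated linkage software.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    non_recombinants = n_meioses - n_recombinants
    likelihood_linked = (theta ** n_recombinants) * ((1 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** n_meioses
    if likelihood_linked == 0:
        # e.g. theta = 0 but recombinants were observed
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical family data: 10 informative meioses, no recombinants
lod = lod_score(n_meioses=10, n_recombinants=0, theta=0.0)
print(round(lod, 2), lod > 3)
```

With no recombinants, each informative meiosis adds log10(2), about 0.3, so roughly ten are needed to pass the threshold of 3.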
Explain what autozygosity mapping is
Autozygosity = homozygosity in which the two alleles are identical by descent
Find regions of high homozygosity in affected individuals not found in unaffected individuals
Performed when attempting to identify disease-causing mutations in families with single gene disorders
Potentially harmful recessive alleles can be hidden from selection in the heterozygous individuals
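The idea of looking for homozygous regions present in affected but not unaffected individuals can be sketched as below. The genotype encoding (two-letter strings such as "AA" or "AB" in marker order), the function names and the minimum run length are all assumptions for illustration, not the method of any particular tool.

```python
def homozygous_runs(genotypes, min_len=3):
    """Return (start, end) inclusive index pairs of runs of homozygous markers."""
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g[0] == g[1]:  # homozygous marker
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes) - 1))
    return runs


def shared_autozygous_markers(affected, unaffected, min_len=3):
    """Markers inside a homozygous run in every affected person (same genotype),
    excluding markers where an unaffected person has the same homozygous run."""
    if not affected:
        return []

    def covered(genos):
        return {i for s, e in homozygous_runs(genos, min_len) for i in range(s, e + 1)}

    shared = set.intersection(*(covered(g) for g in affected))
    shared = {i for i in shared if len({g[i] for g in affected}) == 1}
    for g in unaffected:
        cov = covered(g)
        shared = {i for i in shared if not (i in cov and g[i] == affected[0][i])}
    return sorted(shared)


# Hypothetical genotypes at six ordered markers
affected = [["AB", "AA", "BB", "AA", "AA", "AB"],
            ["AA", "AA", "BB", "AA", "AB", "AB"]]
unaffected = [["AB", "AA", "AB", "AA", "AA", "AB"]]
shared = shared_autozygous_markers(affected, unaffected)
print(shared)
```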
Compare and contrast SNP and microsatellites as genetic markers
Compare:
Polymorphic
Randomly distributed across genes and genome
Contrast Microsatellites
Multiallelic / highly polymorphic (high heterozygosity)
PCR based identification with fluorescent primers - manual identification, labour intensive
Contrast SNP
Biallelic
More markers, closer together
Microarray based - red/green/yellow, automated
Define genotype and haplotype
Genotype = the set of alleles an individual carries at a given locus, region or across the genome
Haplotype = a set of alleles at adjacent loci on the same chromosome that are inherited together
Discuss the importance of recombination for genetic variation and how its related to linkage
Recombination allows for genetic variation, as it creates new haplotypes
Occurs in meiosis I
For any pair of loci there are 4 possible allele combinations - 50% chance of a new (recombinant) haplotype if the loci are unlinked
Syntenic (same chromosome) loci recombination at distant loci = no linkage = 0.5 recombination frequency
Syntenic loci recombination at close loci = linkage = <0.5 recombination frequency
Define linkage disequilibrium
Describes the relationship between alleles of two loci
The closer they are, the more likely they are to be in linkage disequilibrium
The degree of association between the alleles exceeds that expected by chance
Segments of DNA that segregate together are said to be linked, so the presence of one can be used to predict the other
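One common way to put a number on "association beyond chance" is the coefficient D = p(AB) - p(A)p(B), with the normalised forms D' and r². The sketch below assumes haplotype and allele frequencies are already known; the function name and the frequencies are illustrative assumptions.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for alleles A and B at two loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies.
    """
    d = p_ab - p_a * p_b  # 0 when the alleles are independent
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d ** 2 / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: A and B always travel together
complete_ld = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
# Hypothetical frequencies: A and B are independent
equilibrium = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)
print(complete_ld, equilibrium)
```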
Define identical-by-state and identical-by-descent
IBS = identical alleles, regardless of where they came from
IBD = identical alleles inherited from the same common ancestor; the surrounding SNPs would also be homozygous
Why is autozygosity mapping effective in consanguineous/endogamous populations with recessive disorders
Autozygosity = homozygosity in which the two alleles are IBD
All markers in the linked region would be homozygous in affected individuals - if the homozygous run is broken up in a relative and the disease is absent, this shows the disease gene lies within that region
Explain how linkage analysis can be performed in Merlin, and what information is needed for this to be performed
TBC
Explain how regions of homozygosity can be identified in Excel and tools such as homozygosity mapper
TBC
Problems with linkage analysis
Deviation from Mendelian laws of inheritance
Reduced penetrance
Phenocopy rate
Locus heterogeneity
Variable expressivity
Describe PCR and fragment analysis
Amplifies regions of DNA using flanking primers
Fragment analysis - PCR followed by electrophoresis
Describe Sanger sequencing
Cycle sequencing, with fluorescent/coloured chain terminating tags
Used to identify SNPs in single gene tests
Describe NGS
Sequencing of a ‘library’ of DNA sequences, to form a consensus sequence of the short reads
Used in disease panels - enriching disease regions and then sequencing only these areas
Describe process of WES
DNA library construction
Shear DNA (mechanically, enzymatically or physically/sonication)
End repair - fix sticky ends with polymerase, add an A-tail, then ligate T-overhang adapters that act as primer binding sites and anchors to the flow cell
Cluster generation via bridge PCR
Sequencing by synthesis - add one fluorescent nucleotide at a time, capture the image, remove the colour, continue
What are the 4 steps in processing NGS data
Primary - process raw data into a FASTQ file
Secondary - align reads into a BAM file
Tertiary - call variants into a VCF file
Quaternary - interpret, e.g. via ANNOVAR annotation software
Describe exome sequencing step 1 - target enrichment
Capture target regions of interest with baits
Potential to capture several Mb of genomic regions (30-60 Mb)
DNA library added to biotinylated RNA baits complementary to area of interest
Incubated together alongside buffer
Streptavidin-coated magnetic beads attach to the baited areas, which can then be pulled out with a magnet
What are ways of identifying rare mutations
NGS + linkage analysis
NGS + autozygosity mapping to find homozygous regions
NGS in recessive disease - look for homozygous or compound heterozygous variants in affected individuals
De novo - find mutations in child but not in parents
Overlap - look at multiple families, and find shared variants
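The de novo and overlap strategies are essentially set operations on variant lists. A minimal sketch, assuming each variant is represented as a (chromosome, position, ref, alt) tuple; the function names and the variants are made up for illustration.

```python
def de_novo_candidates(child, mother, father):
    """Variants seen in the child but in neither parent."""
    return child - (mother | father)


def shared_across_families(family_variant_sets):
    """Variants found in every family (overlap strategy)."""
    if not family_variant_sets:
        return set()
    return set.intersection(*family_variant_sets)


# Hypothetical variant calls
child = {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("7", 7000, "G", "A")}
mother = {("1", 1000, "A", "G")}
father = {("2", 2000, "C", "T")}
de_novo = de_novo_candidates(child, mother, father)

family_1 = {("7", 7000, "G", "A"), ("3", 3000, "T", "C")}
family_2 = {("7", 7000, "G", "A"), ("5", 5000, "A", "T")}
overlap = shared_across_families([family_1, family_2])
print(de_novo, overlap)
```

In practice both lists would still need filtering for sequencing artefacts and common polymorphisms before any variant is treated as a candidate.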
What is a case control study
Genotype matched cases and controls and look for a particular marker
Statistical analysis is performed to determine which genetic loci correlate with disease
If a variant appears at higher frequency in cases, it implies that the variant is associated with disease
MORE INFO:
Definition of the trait must be applied in a rigorous and consistent way
Controls must be as well-matched as possible e.g. age, sex, ethnicity, location etc.
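The statistical comparison can be sketched as a chi-squared test on a 2x2 table of allele counts (cases vs controls, risk allele vs other allele). This is one simple choice of test, assumed here for illustration; the function name and counts are hypothetical. For 1 degree of freedom the p-value can be computed with the standard library as erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_chi_squared(case_risk, case_other, control_risk, control_other):
    """Pearson chi-squared (1 df, no continuity correction) on a 2x2 allele table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must contain at least one allele")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-squared, 1 df
    return chi2, p_value


# Hypothetical allele counts
chi2, p_value = allelic_chi_squared(case_risk=60, case_other=40,
                                    control_risk=40, control_other=60)
print(chi2, p_value)
```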
Describe the use of SNP microarrays
The microarray has oligonucleotides that hybridise adjacent to SNP sites, so that the next base added is the one at the SNP site
The bases are coloured allowing you to see which SNP is present
Explain the concept of haplotypes and linkage disequilibrium (LD)
High LD = haplotype remains same, less recombination for two given alleles
HapMap and Haploview allow you to select SNPs in high LD
Critique the use of GWAS
GWAS contribution to the genetic component of disease has been estimated to be low (<5%)
Possible reasons
GWAS detects common SNPs of small effect
We don't look at rare SNPs, CNVs or epigenetic variation
Estimation of the genetic component can be incorrect
Cost - GWAS is expensive
Explain why GWAS threshold is lower than the standard
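A standard significance threshold of p<0.05 must be corrected for multiple testing (Bonferroni): with ~1M independent SNPs tested, 0.05/1,000,000 gives p<5×10^-8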
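The correction is a single division. The sketch below assumes a Bonferroni correction for roughly one million independent tests, which is the usual justification for the 5×10^-8 genome-wide threshold; the function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumption: about one million independent common SNPs across the genome
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(threshold)
```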
What is GWAS
Recruit large numbers of cases and controls, whose SNPs are genotyped
The significance is visualised in a Manhattan plot - association against genome location
X-axis = position of SNP on chromosome, each chromosome a different colour to adjacent ones
Y-axis is -log10(p-value from chi squared)
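A small sketch of how association results become Manhattan plot coordinates, without drawing the plot itself. The input format, the function name and the three results are assumptions for illustration; 5e-8 is used as the genome-wide significance line.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level


def manhattan_points(results):
    """Convert (chromosome, position, p_value) tuples into plot-ready points.

    p-values must be greater than 0 so that the logarithm is defined.
    """
    points = []
    for chrom, pos, p in results:
        points.append({
            "chrom": chrom,            # decides the colour band on the x-axis
            "pos": pos,                # position within the chromosome
            "y": -math.log10(p),       # taller point = smaller p-value
            "significant": p < GENOME_WIDE_THRESHOLD,
        })
    return points


# Hypothetical association results
results = [(1, 1_000_000, 0.03), (2, 2_000_000, 1e-10), (4, 4_000_000, 4e-9)]
points = manhattan_points(results)
print([round(pt["y"], 1) for pt in points])
```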
Give a published example of GWAS
COVID-19 study with Italian and Spanish populations
1500+ patients, genotyped
Increased expression of SLC6A20, which codes for the protein that interacts with the cell surface receptor that enables entry of SARS-CoV-2
Found ABO locus link
Limitations
Population, age and gender were not taken into account (men are at greater risk), underlying risk factors were not accounted for, the study period was short, and recruitment was biased towards symptomatic patients
What is a replication study
TBC
Describe the involvement of ApoE in Alzheimer’s
E4 allele variant increases risk greatly
E3 is a neutral allele
E2 protects from Alzheimer’s disease
What is a SNP odds ratio
The OR represents the odds that severe symptoms will occur when you have that SNP, compared to the odds of the symptoms occurring in the absence of that SNP (effect size)
Above 1 = increased disease risk, below 1 = protective
1.33 per risk SNP (median given) = 33% increase in odds
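A minimal sketch of the odds ratio calculation from a 2x2 table. The function names and the counts are hypothetical.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """OR = (odds of carrying the SNP in cases) / (odds of carrying it in controls)."""
    return (cases_with / cases_without) / (controls_with / controls_without)


def percent_change_in_odds(or_value: float) -> float:
    """Express an odds ratio as a percentage change in odds."""
    return (or_value - 1) * 100


# Hypothetical counts of SNP carriers and non-carriers
snp_or = odds_ratio(cases_with=60, cases_without=40,
                    controls_with=40, controls_without=60)
print(snp_or, round(percent_change_in_odds(1.33)))
```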
What is 95% CI in relation to odds ratio
Confidence interval gives the 95% range for the real value, while odds ratio is the effect size
If the odds ratio is >1, ideally the lower end of the CI is also >1, as this shows the real value is likely to be above 1
If the CI extends below 1 (i.e. it includes 1), there is a chance that the real value is that of no effect, so the result is not significant
An ideal confidence interval is one that has a narrow interval
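One common way to get the interval is the log (Woolf) method: take the natural log of the OR, add and subtract 1.96 standard errors, then exponentiate. This is an assumed choice of method for illustration; the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the log method.

    a, b: cases with / without the SNP; c, d: controls with / without the SNP.
    All four counts must be greater than 0.
    """
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Hypothetical counts
or_value, lower, upper = odds_ratio_ci(60, 40, 40, 60)
print(round(or_value, 2), round(lower, 2), round(upper, 2))
# A much smaller sample with the same proportions gives a wider interval
_, small_lower, small_upper = odds_ratio_ci(6, 4, 4, 6)
```

The second call illustrates why a narrow interval is preferred: with the same OR but ten times fewer people, the interval widens and includes 1.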
Define copy number variation
Insertions or deletions of a particular DNA sequence, typically greater than or equal to 1 kb in length, altering the normal number of copies of that sequence and consequently chromosome structure
Associated conditions are called genomic disorders
Explain how CNVs arise
There are two ways - homologous and non-homologous
Homologous - via low copy repeats
LCRs are highly homologous/identical repeat sequences found in multiple locations in the genome
Non-allelic homologous recombination
Holliday junction forms - LCRs misalign and cross over
Describe non-allelic homologous recombination
Non-allelic homologous recombination
Holliday junction forms - LCRs misalign and cross over
Explain identification of CNVs via FISH
FISH enables fluorescent visualisation
Probe targets region of interest
Denatured probe and target DNA mixed, allowing probe to bind = fluorescence
More/less fluorescence shows CNV
Explain identification of CNVs via Array-CGH
Patient DNA = Green
Control = Red
Yellow = normal
Green = CNV insert
Red = CNV del
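The colour reading above corresponds to the ratio of patient to control signal at each probe, usually expressed as a log2 ratio. A minimal sketch, keeping the colour scheme used here (patient green, control red); the ±0.3 cut-off, the function name and the signal values are assumptions for illustration, not a standard.

```python
import math


def classify_acgh_probe(patient_signal: float, control_signal: float, threshold: float = 0.3):
    """Return (log2 ratio, colour, call) for one array probe.

    Both signals must be greater than 0.
    """
    log2_ratio = math.log2(patient_signal / control_signal)
    if log2_ratio > threshold:
        return log2_ratio, "green", "gain"      # more patient DNA bound
    if log2_ratio < -threshold:
        return log2_ratio, "red", "deletion"    # less patient DNA bound
    return log2_ratio, "yellow", "normal"       # equal amounts


# Hypothetical probe intensities
gain = classify_acgh_probe(1500, 1000)
loss = classify_acgh_probe(500, 1000)
normal = classify_acgh_probe(1000, 1000)
print(gain, loss, normal)
```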
Explain identification of CNVs via MLPA
Variation of PCR with a single primer pair
An oligonucleotide probe pair is used
The two half-probes hybridise to adjacent target sequences and are ligated to form one long probe
The ligated probe carries the binding sites that the PCR primers use for amplification
Only the ligated MLPA probes are amplified, as a proxy for the target DNA
Signal strength compared to reference DNA
Describe NGS-based detection of CNVs
After data is sequenced, it is aligned and reads for each exon are counted
A map is made along the sequence depicting the number of reads - increased/decreased reads in a region compared to the rest suggests CNV presence
WES cannot find translocations or inversions - WGS can
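The read-counting approach can be sketched as a comparison of per-exon read counts between a sample and a reference, after normalising for total sequencing depth. The function name, the 0.7 and 1.3 cut-offs and the counts are assumptions for illustration; real callers use more careful normalisation and statistics.

```python
def call_exon_cnvs(sample_counts, reference_counts, loss_below=0.7, gain_above=1.3):
    """Label each exon as 'loss', 'gain' or 'normal' from its read-depth ratio.

    Counts are per exon, in the same order for both lists.
    Reference counts must be greater than 0.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    calls = []
    for sample, reference in zip(sample_counts, reference_counts):
        # Normalise by total reads so deeper sequencing does not look like a gain
        ratio = (sample / sample_total) / (reference / reference_total)
        if ratio < loss_below:
            calls.append("loss")
        elif ratio > gain_above:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


# Hypothetical read counts for five exons
calls = call_exon_cnvs(sample_counts=[100, 100, 50, 100, 150],
                       reference_counts=[100, 100, 100, 100, 100])
print(calls)
```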
Give examples of diseases involving CNVs
Alzheimer's - APP duplication on Chr21 (linked to Down's)
Parkinson's - SNCA triplication, which is neurotoxic
GLS and cerebellar ataxia - movement disorder due to GLS duplication leading to truncation
Leukoencephalopathy - vanishing white matter, AARS2 deletion + mutation
17q21.31 microdeletion - developmental delay; the H2 haplotype is an inversion of H1 and is rich in LCRs = high risk of deletion
22q11 deletion syndrome (DiGeorge syndrome) - affects pharyngeal arch thus the heart, face, thymus, parathyroids