Identifying Disease Genes Flashcards

1
Q

What is heritability

A

This is the proportion of phenotypic variation that is attributable to genetic variation

Classically, in twins, heritability of a trait is twice the difference between the MZ and DZ twin correlations (Falconer's formula)

h² = 2(r(MZ) - r(DZ))

r = correlation coefficient
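
As a quick check of the formula above, here is a minimal Python sketch. The function name and the twin correlations are hypothetical values chosen only for illustration.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate: h2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)


# Hypothetical twin correlations, not taken from any real study.
h2 = falconer_heritability(r_mz=0.8, r_dz=0.5)
print(round(h2, 2))
```

A larger gap between the MZ and DZ correlations gives a higher heritability estimate.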

2
Q

Examples of complex and non-complex diseases

A

Complex - schizophrenia, stroke, type 2 diabetes, hypertension, depression, asthma

Purely genetic - DMD, phenylketonuria, Down's syndrome

Purely environmental - poisoning, mesothelioma (cancer of the lung lining, usually due to asbestos), car accident

3
Q

Critique the use of twin studies

A

Twin studies compare the concordance rate (how often both twins share a phenotype) in identical (MZ) twins with that in non-identical (DZ) twins

Higher concordance in identical twins = genetic contribution

Problems:
Identical twins are not 100% genetically identical
MZ twins may have more similar environment than DZ
Twins may not be representative of the population
Twin registries may have recruitment bias

4
Q

Critique the use of adoption studies

A

If adopted children have the same phenotype as biological parents and siblings, there’s a genetic effect

If adopted children have the same phenotype as adoptive parents and unrelated siblings there’s an environmental effect

Problems:
Recruitment bias, representation

5
Q

Critique the use of segregation studies

A

Involve recruitment of large families where a trait is present in some members but not all

Studying inheritance patterns may identify classic Mendelian patterns (e.g. AR), but this does not work for complex disease

Problems:
Phenocopies - same phenotype, different causes
Segregation analysis in diseases with no clear inheritance pattern is possible, but it requires statistical modelling and programming

6
Q

Critique the use of migration studies

A

If immigrants continue to have a disease regardless of where they live this suggests a genetic effect

If immigrants develop a new disease common in their new home this suggests an environmental effect

Problems:
Immigrants often keep their cultural practices which influence their environment

7
Q

Define and calculate population and sibling relative risk

A

In a population, we know neither the genetic risk nor the environmental risk. All we know is the frequency of disease (prevalence)

Population (n) with x people affected - thus prevalence is x/n

Sibling relative risk - The ratio of the frequency of disease in the sibling of an affected person compared to the rate in the general population, represented as λS

This is a measure of both genetic and environmental effects

λS = risk of disease in a patient's sibling / population prevalence

E.g. 1/20 siblings have disease, prevalence = 1/200, thus λS = (1/20) / (1/200) = 10

High values = monogenic disease

Lower values = complex, more common diseases
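
The worked example above can be reproduced with a short Python sketch. The function name is hypothetical, and exact fractions are used so the ratio comes out without rounding error.

```python
from fractions import Fraction


def sibling_relative_risk(sibling_risk: Fraction, prevalence: Fraction) -> Fraction:
    """lambda_S = risk in siblings of affected people / population prevalence."""
    return sibling_risk / prevalence


# Figures from the card: 1/20 siblings affected, prevalence 1/200.
lambda_s = sibling_relative_risk(Fraction(1, 20), Fraction(1, 200))
print(lambda_s)  # 10
```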

8
Q

Explain what linkage analysis is

A

A form of mapping that identifies markers (SNPs/microsatellites) physically linked to a disease region - used for smaller mutations, NOT chromosomal abnormalities.

LOD score >3 evidences linkage

Used in AD/X-linked disorders and AR disease
Used alongside autozygosity mapping in consanguineous/endogamous AR disease
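
To show where a LOD score above 3 can come from, here is a minimal sketch. It assumes the simplest textbook case (phase-known, fully informative meioses), which is an assumption added here and not how pedigree software computes it in general. The function name and counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10(L(theta) / L(0.5)) for phase-known, fully informative meioses."""
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0:
        # The data are impossible at this theta (e.g. recombinants seen at theta = 0).
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical family: 10 informative meioses, no recombinants, tested at theta = 0.
print(round(lod_score(recombinants=0, meioses=10, theta=0.0), 2))  # 3.01
```

Under these assumptions, ten meioses with no recombinants is about the minimum needed to reach the LOD > 3 threshold.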

9
Q

Explain what autozygosity mapping is

A

Autozygosity = homozygosity in which the two alleles are identical by descent
Find regions of high homozygosity in affected individuals not found in unaffected individuals

Used when attempting to identify disease-causing mutations in families with single gene disorders
Potentially harmful recessive alleles can be hidden from selection in the heterozygous individuals
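
The idea of finding runs of homozygous markers present in affected but not unaffected individuals can be sketched as below. This is an illustrative toy, not the method used by any particular tool. The function name, the genotype strings and the minimum run length are all hypothetical.

```python
def homozygous_runs(genotypes, min_length=3):
    """Return (start, end) index pairs of runs of consecutive homozygous calls.

    genotypes: two-letter calls such as "AA" or "AG", ordered by position.
    """
    runs, start = [], None
    for i, call in enumerate(genotypes):
        if call[0] == call[1]:          # homozygous marker
            if start is None:
                start = i
        else:                           # heterozygous marker ends any current run
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs


# Made-up genotypes at 7 ordered markers.
affected = ["AG", "AA", "CC", "TT", "GG", "AG", "CC"]
unaffected = ["AG", "AA", "CT", "TT", "AG", "AG", "CC"]
print(homozygous_runs(affected))    # [(1, 4)]
print(homozygous_runs(unaffected))  # []
```

A run seen only in the affected individual, as at markers 1 to 4 here, would be a candidate autozygous region.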

10
Q

Compare and contrast SNPs and microsatellites as genetic markers

A

Compare:
Polymorphic
Randomly distributed across genes and genome

Contrast Microsatellites
Multiallelic, highly polymorphic (individuals are often heterozygous)
PCR based identification with fluorescent primers - manual identification, labour intensive

Contrast SNPs
Biallelic
More markers, closer together
Microarray based - red/green/yellow signal, automated

11
Q

Define genotype and haplotype

A

Genotype = the set of alleles an individual carries at a given locus, region or across the genome

Haplotype = a set of alleles at adjacent loci on the same chromosome that are inherited together

12
Q

Discuss the importance of recombination for genetic variation and how it is related to linkage

A

Recombination allows for genetic variation, as it creates new haplotypes

Occurs in meiosis I

For any pair of loci there are 4 possible allele combinations in the gametes - if the loci are unlinked there is a 50% chance of a new (recombinant) haplotype

Syntenic (same chromosome) loci that are far apart = no linkage = recombination frequency of 0.5

Syntenic loci that are close together = linkage = recombination frequency < 0.5

13
Q

Define linkage disequilibrium

A

Describes the non-random association between alleles at two loci

The closer the loci are, the more likely they are to be in linkage disequilibrium

The degree of association exceeds that expected by chance

Segments of DNA that segregate together are said to be linked, so the presence of one allele can be used to predict the other
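
One common way to quantify this association uses the coefficient D and the squared correlation r². The sketch below is a minimal illustration. These measures are standard but are not named in the card, and the haplotype and allele frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r2) for alleles A and B at two loci.

    p_ab: frequency of the A-B haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # deviation from independence (D = 0 at equilibrium)
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical frequencies: the A-B haplotype is more common than chance (0.25).
d, r2 = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(round(d, 3), round(r2, 3))
```

If the alleles were independent, the A-B haplotype frequency would equal p_a * p_b and both D and r² would be 0.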

14
Q

Define identical-by-state and identical-by-descent

A

IBS = identical alleles (same sequence), regardless of where they came from

IBD = identical alleles that were inherited from the same common ancestor; surrounding SNPs would also be homozygous

15
Q

Why is autozygosity mapping effective in consanguineous/endogamous populations with recessive disorders

A

Autozygosity = homozygosity in which the two alleles are IBD

All markers in the linkage region would be homozygous - if this is broken up and the disease is lost it shows the broken region causes the disease

16
Q

Explain how linkage analysis can be performed in Merlin, and what information is needed for this to be performed

A

TBC

17
Q

Explain how regions of homozygosity can be identified in Excel and tools such as homozygosity mapper

A

TBC

18
Q

Problems with linkage analysis

A

Deviation from Mendelian laws of inheritance

Reduced penetrance
Phenocopy rate
Locus heterogeneity
Variable expressivity

19
Q

Describe PCR and fragment analysis

A

Amplifies regions of DNA using flanking primers

Fragment analysis - PCR followed by electrophoresis

20
Q

Describe Sanger sequencing

A

Cycle sequencing with fluorescently labelled chain-terminating nucleotides

Used to identify SNPs in single gene tests

21
Q

Describe NGS

A

Sequencing of a ‘library’ of DNA sequences, to form a consensus sequence of the short reads

Used in disease panels - enriching disease regions and then sequencing only these areas

22
Q

Describe process of WES

A

DNA library construction

Shear DNA (mechanically, enzymatically or physically/sonication)

End repair - fix sticky ends with polymerase, add an A-tail, then ligate adapters with a T overhang; the adapters act as primer binding sites and as anchors to the flow cell

Cluster generation via bridge PCR

Sequencing by synthesis - add one labelled nucleotide at a time, capture the colour, remove the colour, continue

23
Q

What are the 4 steps in processing NGS data

A

Primary - process raw data into a fastq file

Secondary - align into bam file

Tertiary - identify variants into vcf file

Quaternary - interpret, done via Annovar annotation software

24
Q

Describe exome sequencing step 1 - target enrichment

A

Capture target regions of interest with baits

Potential to capture several Mb of genomic regions (30-60 Mb)

DNA library added to biotinylated RNA baits complementary to area of interest

Incubated together alongside buffer

Streptavidin coated magnetic beads attach to the baited fragments, which can then be pulled out with a magnet

25
Q

What are ways of identifying rare mutations

A

NGS + linkage analysis

NGS + autozygosity mapping to find homozygous regions

NGS in recessive disease - look for homozygous or compound heterozygous regions in affected individuals

De novo - find mutations in child but not in parents

Overlap - look at multiple families, and find shared variants

26
Q

What is a case control study

A

Genotype matched cases and controls and look for a particular marker

Statistical analysis performed to determine which genetic loci correlate with disease
- If it appears in higher frequency in cases, it implies that the variant is associated with disease

MORE INFO:
Definition of the trait must be applied in a rigorous and consistent way
Controls must be as well-matched as possible e.g. age, sex, ethnicity, location etc.

27
Q

Describe the use of SNP microarrays

A

Microarray has oligonucleotides that hybridise adjacent to SNP sites, so that the next base added falls at the SNP site

The added bases are fluorescently coloured, allowing you to see which SNP allele is present

28
Q

Explain the concept of haplotypes and linkage disequilibrium (LD)

A

High LD = haplotype remains the same, with less recombination between the two given alleles

HapMap and Haploview allow you to select SNPs in high LD

29
Q

Critique the use of GWAS

A

GWAS contribution to the genetic component of disease has been estimated to be low (<5%)

Possible reasons
Common SNPs of small effect
GWAS does not look at rare SNPs, CNVs or epigenetic variation
Estimation of genetic components can be incorrect

Expensive (£££)

30
Q

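Explain why the GWAS significance threshold is lower than the standard

A

Multiple testing correction - if we start with a significance level of p<0.05 and test about 1M independent SNPs, the threshold becomes 0.05 / 1,000,000 = p<5*10^-8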
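
The correction behind the genome-wide threshold can be sketched as a Bonferroni division. The figure of roughly one million independent tests is the usual assumption behind the conventional 5*10^-8 threshold and is stated here as an assumption. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(threshold)
```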

31
Q

What is GWAS

A

Recruit large numbers of cases and controls, whose SNPs are genotyped

The significance is visualised in a Manhattan plot - association against genome location

X-axis = position of SNP on chromosome, each chromosome a different colour to adjacent ones

Y-axis is -log10(p-value from chi squared)
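
A minimal sketch of how one point on the y-axis could be computed from allele counts is shown below. It assumes a simple 2x2 allelic chi-squared test with 1 degree of freedom and uses only the standard library. The function names and counts are hypothetical.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared statistic for a 2x2 table:
    a, b = risk / other allele counts in cases
    c, d = risk / other allele counts in controls
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def manhattan_y(p_value: float) -> float:
    """Height of the SNP on a Manhattan plot."""
    return -math.log10(p_value)


# Made-up allele counts for one SNP.
chi2 = chi_squared_2x2(600, 400, 500, 500)
y = manhattan_y(p_value_1df(chi2))
print(round(chi2, 2), round(y, 2))
```

Smaller p-values give taller points, which is why strongly associated loci stand out as peaks.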

32
Q

Give a published example of GWAS

A

COVID-19 study in Italian and Spanish populations

1500+ patients, genotyped

Increased expression of SLC6A20, which encodes a protein that interacts with the cell surface receptor that enables entry of SARS-CoV-2

Also found a link to the ABO locus

Limitations
Population, age and gender were not taken into account (men are at greater risk), underlying risk factors were not accounted for, short time frame, bias towards symptomatic recruitment

33
Q

What is a replication study

A

TBC

34
Q

Describe the involvement of ApoE in Alzheimer’s

A

E4 allele variant increases risk greatly
E3 is a neutral allele
E2 protects from Alzheimer’s disease

35
Q

What is a SNP odds ratio

A

The OR represents the odds that severe symptoms will occur when you have that SNP, compared to the odds of the symptoms occurring in the absence of that SNP (effect size)

Above 1 = increased disease risk, below 1 = protective

OR of 1.33 per risk SNP (median given) = 33% increase in odds

36
Q

What is 95% CI in relation to odds ratio

A

The confidence interval gives the range within which we are 95% confident the real value lies, while the odds ratio is the effect size

If the odds ratio is >1, ideally the lower end of the CI is also >1, as the whole interval shows where the real value may be

If the interval extends below 1 then there is a chance that the real value is that of no effect/no significance

An ideal confidence interval is a narrow one
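
The odds ratio and its 95% CI can be sketched from a 2x2 table as below. This assumes the common log (Woolf) approximation for the standard error, which is an assumption added here. The function name and counts are hypothetical.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI.

    a = cases with the SNP,    b = cases without it
    c = controls with the SNP, d = controls without it
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts: 400/1000 cases and 300/1000 controls carry the SNP.
or_value, lower, upper = odds_ratio_with_ci(400, 600, 300, 700)
print(round(or_value, 2), round(lower, 2), round(upper, 2))
```

Here the whole interval sits above 1, so the data are consistent with a real increase in risk. Larger samples shrink the standard error and narrow the interval.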

37
Q

Define copy number variation

A

Insertions or deletions of a particular DNA sequence, typically greater than or equal to 1kb in length, altering the natural number of copies of that sequence and consequently chromosome structure

Associated conditions are called genomic disorders

38
Q

Explain how CNVs arise

A

There are two ways - homologous and non-homologous

Homologous - via low copy repeats
LCRs are highly homologous/identical repeat sequences found in multiple locations in the genome

Non-allelic homologous recombination
Holliday junction forms - LCRs misalign and cross over

39
Q

Describe non-allelic homologous recombination

A

Non-allelic homologous recombination

Holliday junction forms - LCRs misalign and cross over

40
Q

Explain identification of CNVs via FISH

A

FISH enables fluorescent visualisation

Probe targets region of interest

Denatured probe and target DNA mixed, allowing probe to bind = fluorescence

More/less fluorescence shows CNV

41
Q

Explain identification of CNVs via Array-CGH

A

Patient DNA = Green
Control = Red

Yellow = normal (equal patient and control signal)
Green = CNV gain (duplication/insertion)
Red = CNV loss (deletion)
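
The colour readout can be thought of as a log2 ratio of patient to control signal at each probe. The sketch below is illustrative only. The threshold of 0.3 and the function name are hypothetical and not taken from any specific platform.

```python
import math


def classify_probe(patient_signal: float, control_signal: float,
                   threshold: float = 0.3) -> str:
    """Classify one array probe from its patient (green) / control (red) ratio."""
    log_ratio = math.log2(patient_signal / control_signal)
    if log_ratio > threshold:
        return "gain (green)"
    if log_ratio < -threshold:
        return "loss (red)"
    return "normal (yellow)"


# Made-up intensities: 3 copies vs 2, 1 copy vs 2, and equal copies.
print(classify_probe(1.5, 1.0))  # gain (green)
print(classify_probe(0.5, 1.0))  # loss (red)
print(classify_probe(1.0, 1.0))  # normal (yellow)
```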

42
Q

Explain identification of CNVs via MLPA

A

Variation of multiplex PCR that uses a single primer pair

An oligonucleotide probe pair is used for each target

The two half-probes hybridise to adjacent sequences on the target DNA and are ligated, thus forming one long probe

Each half carries a primer binding sequence, so only ligated probes can be amplified by the PCR primers

Only the MLPA oligonucleotides are amplified, as a proxy for the DNA

Signal strength compared to reference DNA

43
Q

Describe NGS based detection of CNVs

A

After data is sequenced, it is aligned and reads for each exon are counted

A map is made along the sequence depicting the number of reads - increased/decreased reads in a region compared to the rest suggests CNV presence

WES cannot find translocations or inversions - WGS can
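
The read-count comparison described above can be sketched as follows. This is a toy normalisation against a reference sample, not the algorithm of any particular CNV caller. The function name, exon names, counts and the 0.7/1.3 cut-offs are all hypothetical.

```python
from statistics import median


def flag_cnv_exons(sample_counts, reference_counts, low=0.7, high=1.3):
    """Flag exons whose normalised read-depth ratio suggests a deletion or duplication."""
    sample_median = median(sample_counts.values())
    reference_median = median(reference_counts.values())
    calls = {}
    for exon, count in sample_counts.items():
        # Normalise each sample by its own median depth before comparing.
        ratio = (count / sample_median) / (reference_counts[exon] / reference_median)
        if ratio < low:
            calls[exon] = "possible deletion"
        elif ratio > high:
            calls[exon] = "possible duplication"
    return calls


# Made-up per-exon read counts.
sample = {"exon1": 100, "exon2": 50, "exon3": 100, "exon4": 150, "exon5": 100}
reference = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100, "exon5": 100}
print(flag_cnv_exons(sample, reference))
# {'exon2': 'possible deletion', 'exon4': 'possible duplication'}
```

Dividing by each sample's median depth first means a sample that was simply sequenced more deeply is not mistaken for a genome-wide gain.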

44
Q

Give examples of diseases involving CNVs

A

Alzheimer’s - APP duplication on Chr21 (linked to Down’s)

Parkinson’s - SNCA triplication, which is neurotoxic

GLS and cerebellar ataxia - movement disorder due to GLS duplication leading to truncation

Leukoencephalopathy - vanishing white matter, AARS2 deletion + mutation

17q21.31 microdeletion - developmental delay; H2 is an inversion of H1 and filled with LCRs = high risk of deletion

22q11 deletion syndrome (DiGeorge syndrome) - affects the pharyngeal arches and thus the heart, face, thymus, parathyroids