Disease Gene Discovery (Rare Disease) Flashcards

1
Q

What is Mendel’s first law?

A

Law of Segregation of genes

During gamete formation, the alleles for each gene segregate from each other so that each gamete carries only one allele for each gene.

2
Q

What is Mendel’s second law?

A

Law of Independent Assortment.

Genes for different traits can segregate independently during the formation of gametes.

3
Q

What is Mendel’s third law?

A

Law of Dominance

Some alleles are dominant while others are recessive; an organism with at least one dominant allele will display the effect of the dominant allele.

4
Q

What is Recombination frequency?

A

Recombination frequency is a measure of genetic linkage and is used in the creation of a genetic linkage map.

Recombination frequency (θ) is the frequency with which a single chromosomal crossover will take place between two genes during meiosis.

e.g. if 100 meioses (offspring) are examined and 1 has a crossover event (recombinant) then θ=0.01.
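
The worked example above can be checked with a few lines of Python. This is only a minimal sketch; the function name `recombination_fraction` is made up for illustration.

```python
def recombination_fraction(recombinants, meioses):
    """Return theta, the fraction of scored meioses that are recombinant."""
    if meioses <= 0:
        raise ValueError("at least one meiosis must be scored")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    return recombinants / meioses


# 1 recombinant among 100 scored meioses
theta = recombination_fraction(1, 100)
print(theta)  # 0.01
```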

5
Q

What is the definition of genetic linkage?

A
  • Genetic linkage is the tendency of alleles that are located close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction.
  • Genes whose loci are nearer to each other are less likely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be genetically linked.
6
Q

What is a centiMorgan and how does this relate to the recombination fraction?

A

A centimorgan (cM) is a unit that describes a recombination frequency of 1%.

In this way we can measure the genetic distance between two loci, based upon their recombination frequency.

7
Q

Name a limitation of using centiMorgans to measure genetic distance.

A
  1. A double crossover between two loci restores the original allele combination, so it is scored as no recombination.
  2. It therefore appears that no crossover has taken place (suggesting the genes are close when in fact they are far apart).
  3. If the loci being analysed are very close (less than 7 cM), a double crossover is very unlikely.
  4. As distances increase, the likelihood of a double crossover increases.
  5. Therefore, as the likelihood of a double crossover increases, we systematically underestimate the genetic distance between two loci.
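
The underestimation can be illustrated numerically. The sketch below uses Haldane's map function, θ = (1 − e^(−2d))/2 with d in Morgans. This function is not part of these cards and assumes no interference, so treat it as one possible model rather than the method the cards describe.

```python
import math


def haldane_theta(distance_cm):
    """Expected recombination fraction for a map distance in cM.

    Assumption: Haldane's map function (crossovers occur independently).
    """
    d = distance_cm / 100.0  # convert centiMorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


for cm in (1, 5, 20, 50, 100):
    theta = haldane_theta(cm)
    # Reading theta directly as a percentage gives the naive distance estimate.
    print(f"true {cm:>3} cM -> theta {theta:.3f} -> naive estimate {theta * 100:.1f} cM")
```

Under this model the naive estimate is close to the true distance for short intervals and falls increasingly short as the distance grows, never reaching 50 cM.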
8
Q

What is the maximum value for θ and why?

A
  1. The value of θ will never exceed 0.5 because it would violate Mendel’s Second Law of independent assortment of genes
  2. Take 2 genes/loci e.g. leaf colour (Red/Green) and leaf shape (Round/Spikey)
  3. If these genes are on different chromosomes their alleles (Red/Green and Round/Spikey) will assort into gametes independently.
  4. On average, 50% of the gametes contain a combination of alleles that produces offspring with the parental phenotype (non-recombinant).
  5. The other 50% contain a combination of alleles that produces a different phenotype (recombinant).
  6. 50:50 expressed as a fraction is 0.5. If θ = 0.5, the loci are not linked.
9
Q

Why are double crossovers very unlikely if the loci we’re analysing are very close (less than 7 cM)?

A
  1. A second crossover event in the immediate vicinity is unlikely as the first event creates a phenomenon called interference, which restricts further crossing over.
  2. i.e. individual crossover events are often not independent. The interaction between crossover events is called interference.
  3. Thus, the distribution of recombinants along a chromosome is non-random and sex-specific.
10
Q

What is a region of Autozygosity?

A

When the two alleles at a locus originate from a common ancestor by way of nonrandom mating (inbreeding), the genotype is said to be autozygous.

This is also known as being “identical by descent”, or IBD

11
Q

What is Autozygosity Mapping?

A
  1. Autozygosity mapping is a form of linkage analysis used in consanguineous families.
  2. Autozygosity occurs when individuals are homozygous at a particular locus because the alleles are IBD.
  3. In affected individuals, the size of the autozygous segment is reduced by recombination events during meiosis over successive generations.
  4. Unaffected individuals will be heterozygous, or homozygous for another allele, around the disease locus.
  5. In autozygosity mapping, shared blocks of homozygous markers that segregate with the disease of interest can be analysed to determine the location of the disease gene.
12
Q

How is autozygosity mapping performed?

A
  1. Identify a consanguineous family with a recessive phenotype and multiple affected probands.
  2. Genotype a large number of SNPs or microsatellites spread throughout the genome.
  3. Identify regions of homozygosity shared by all affected probands
  4. Computer programs, e.g. IBDfinder, analyse the data from autozygosity studies
  5. Fine map (narrow) the candidate region using more polymorphic markers
  6. Identify candidate genes within the homozygous region on the basis of known gene function, and sequence them.
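
Steps 2 and 3 above can be sketched in Python as a search for runs of consecutive markers at which every affected individual is homozygous for the same allele. This is a deliberately simplified illustration: the function name, the genotype encoding ("AA", "AB", "BB") and the example data are all hypothetical, and it is not a description of how IBDfinder or any other real program works.

```python
def shared_homozygous_runs(genotypes, min_markers=3):
    """Find runs of markers homozygous for the same allele in all individuals.

    genotypes: dict mapping individual -> list of genotype strings,
               one per marker, in chromosomal order (hypothetical encoding).
    Returns a list of (first_index, last_index) tuples.
    """
    individuals = list(genotypes)
    n_markers = len(genotypes[individuals[0]])
    runs, start = [], None
    for i in range(n_markers):
        calls = {genotypes[ind][i] for ind in individuals}
        # Shared homozygosity: one genotype call in total, made of one allele.
        shared = len(calls) == 1 and len(set(next(iter(calls)))) == 1
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs


affected = {
    "sib1": ["AB", "AA", "BB", "AA", "AA", "AB", "BB"],
    "sib2": ["AA", "AA", "BB", "AA", "AA", "BB", "AB"],
}
print(shared_homozygous_runs(affected))  # [(1, 4)]
```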
13
Q

What factors can affect autozygosity mapping?

A
  1. Number of informative affected and unaffected individuals
  2. The frequency of the allele in the population: the rarer the allele, the greater the likelihood that homozygosity represents autozygosity (IBD).
  3. Degree of relatedness: the more remotely related the individuals, the smaller the proportion of the genome that is shared from the common ancestor (due to recombinations).
  4. Unexpected allelic heterogeneity, identification of a homozygous IBD region unrelated to the disease locus, and the potential for inflation of LOD scores due to underestimation of the extent of inbreeding (Miano et al. 2000).
14
Q

What is linkage analysis?

A

Linkage analysis is a method used to identify the gene responsible for a given phenotype.

Linkage studies usually involve looking at large families where the disease affects individuals in several generations.

The key is to identify a genetic marker that is always inherited by those family members with the disease but not by those who do not have the disease.

15
Q

What type of markers are used in linkage studies?

A
  • Linkage studies usually start by identifying genetic markers, commonly SNPs or STRs, on a section of a chromosome and then narrowing the region down until the gene or gene variant of interest is identified
  • SNPs have the disadvantage of being bi-allelic and are thus not as highly polymorphic as STRs.
  • However, they are the most frequent type of polymorphism, and genotyping chips can detect hundreds of single-nucleotide polymorphisms at a time.
16
Q

What are the ideal qualities of the markers used for linkage analysis? What are the most common types of markers used for linkage analysis?

A
  1. Be easy and cheap enough to score, e.g. markers found in blood/saliva.
  2. Be highly polymorphic to increase the chance of being informative.
  3. Microsatellite markers and SNPs have largely superseded all other markers.
17
Q

How is linkage analysis usually performed?

A
  1. Collect families in which the character of interest segregates (i.e. there is Mendelian inheritance of the trait).
  2. Genotype a series of markers in all members of the family.
  3. Score each meiosis as recombinant or non-recombinant for each genotyped marker.
  4. Find a genetic marker that segregates along with the trait of interest most of the time.
18
Q

At what point do the results of linkage analysis become credible?

A
  1. All results must be replicated to be credible.
  2. Failure to replicate linkage does not necessarily disprove the hypothesis as linkages will often involve weak effects.
  3. Replication studies should always state their power to detect the proposed effect with the given sample size.
  4. Negative results are only meaningful if the power is high.
19
Q

What is Parametric linkage analysis?

A

Parametric linkage analysis is called thus because a series of parameters need to be specified before analysis can begin.

Parametric linkage analysis is the traditional approach, whereby the probability that a gene important for a disease is linked to a genetic marker is studied through the LOD score.

20
Q

What parameters need to be defined in parametric linkage analysis?

A
  1. Mode of inheritance (dominant/recessive/X-linked etc)
  2. Gene frequencies
  3. Penetrance – the most difficult to specify
21
Q

What must you be careful of when setting parameters for parametric linkage analysis?

A
  1. Parametric linkage analysis is not suitable for complex diseases such as diabetes or schizophrenia, where the gene frequencies and penetrance of any susceptibility alleles, and sometimes the mode of inheritance, are unknown.
  2. Ensure that the trait being assessed does not have phenocopies (i.e. other non-genetic causes of the same trait).
22
Q

What is a LOD score?

A

‘LOD’ stands for ‘Logarithm of Odds’ and is denoted by the letter Z.

The LOD score is a statistical test used in linkage analysis. It compares the likelihood of obtaining the test data if the two loci are indeed linked with the likelihood of observing the same data purely by chance.

23
Q

Why is the odds ratio logged to produce a LOD score?

A

Odds ratios can have large ranges anywhere from 1 - 1 million therefore by logging the odds ratio the numbers are brought down to more manageable ranges.

Z = log10(10) = 1 (10:1 odds)

Z = log10(100) = 2 (100:1 odds)

Z = log10(1000) = 3 (1000:1 odds)

24
Q

Briefly describe the method of parametric linkage analysis?

A
  1. Arbitrarily select a recombination fraction (e.g. θ = 0.2)
  2. For θ = 0.2, calculate the probability of observing the given birth sequence if the marker is indeed linked to the disease gene.
  3. Calculate the probability of observing the given birth sequence if the marker is NOT linked to the disease gene.
  4. Odds of linkage = result of step 2 / result of step 3
  5. Take log10 of the result of step 4 to obtain the LOD score.
  6. Arbitrarily select a new recombination fraction (e.g. θ = 0.3) and repeat.
25
Q

How is the odds score calculated in parametric linkage analysis?

A
  1. Calculate the probability of observing the given birth sequence if the marker is indeed linked to the disease gene: (1-θ)^NR * θ^R, where NR is the number of non-recombinant meioses and R is the number of recombinant meioses.
  2. Calculate the probability of observing the given birth sequence if the marker is NOT linked to the disease gene: (0.5)^(NR+R)
  3. Odds ratio (OR) = point 1 / point 2
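
The calculation above can be sketched in Python. This is a minimal illustration that assumes every meiosis can be scored unambiguously as recombinant or non-recombinant; the function name `lod_score` and the example counts (1 recombinant, 9 non-recombinants) are made up.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """LOD score Z at a given theta from counts of scored meioses."""
    # Likelihood of the data if the loci are linked at recombination fraction theta.
    linked = (1 - theta) ** non_recombinants * theta ** recombinants
    # Likelihood of the same data if the loci are unlinked (theta = 0.5).
    unlinked = 0.5 ** (non_recombinants + recombinants)
    odds = linked / unlinked
    # An odds ratio of 0 (e.g. theta = 0 with recombinants observed) has no finite log.
    return math.log10(odds) if odds > 0 else float("-inf")


print(round(lod_score(1, 9, 0.1), 2))  # 1.6
```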
26
Q

How are LOD scores interpreted when performing parametric linkage analysis?

A

After trialling multiple values of θ, the most likely recombination fraction for a marker and disease gene is the one at which the LOD score is highest.

A score of Z = 3 corresponds to 1000:1 odds that the marker is linked with the disease gene, which is deemed sufficiently strong evidence in favour of linkage.

Linkage can be rejected at Z ≤ -2; values between -2 and +3 are inconclusive.
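
As a rough sketch of this interpretation, the code below scans θ from 0 to 0.5 in steps of 0.01, picks the value with the highest LOD score and applies the thresholds above. The grid spacing, the function names and the example counts are assumptions chosen for illustration, and the meioses are assumed to be unambiguously scored.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """LOD score Z at a given theta from counts of scored meioses."""
    linked = (1 - theta) ** non_recombinants * theta ** recombinants
    unlinked = 0.5 ** (non_recombinants + recombinants)
    odds = linked / unlinked
    return math.log10(odds) if odds > 0 else float("-inf")


def peak_lod(recombinants, non_recombinants):
    """Scan theta = 0.00, 0.01, ..., 0.50 and return (best_theta, max_lod)."""
    thetas = [i / 100 for i in range(51)]
    best = max(thetas, key=lambda t: lod_score(recombinants, non_recombinants, t))
    return best, lod_score(recombinants, non_recombinants, best)


def interpret(z):
    """Apply the conventional thresholds: Z >= 3, Z <= -2, otherwise inconclusive."""
    if z >= 3:
        return "evidence for linkage"
    if z <= -2:
        return "linkage rejected"
    return "inconclusive"


for r, nr in ((1, 9), (0, 10)):
    theta, z = peak_lod(r, nr)
    print(f"R={r}, NR={nr}: best theta {theta:.2f}, Z {z:.2f}, {interpret(z)}")
```

With these example counts, ten meioses with no recombinants are just enough to reach Z = 3, whereas a single recombinant among ten leaves the result inconclusive.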

27
Q

What is multipoint mapping in parametric linkage analysis?

A
  1. Linkage analysis can be more efficient if data from more than two loci are analysed simultaneously (multipoint mapping).
  2. Multipoint mapping also helps overcome the problem of uninformative single markers. Families are normally genotyped for hundreds of thousands of markers in genome wide searches.
28
Q

Discuss some limitations of linkage analysis.

A
  1. Multi-generational families with multiple affected members are hard to find, especially if the disease is late-onset.
  2. Vulnerable to errors: switched samples, non-paternity, misassignment of disease status, wrong marker order.
  3. Locus heterogeneity
  4. Dependent on the number of meioses analysed
  5. Requires the parameters to be specified correctly
  6. Some diseases are not amenable to linkage:
    1. Complex disorders/traits
    2. Common phenocopies
29
Q

Define Identity by descent (IBD)

A

Alleles shared by affected relatives are both copies of one specific allele that was present in a recent ancestor. This must be demonstrable.

30
Q

Define Identity by state (IBS)

A

Alleles shared by affected relatives appear identical, but a second independent copy of the same allele has entered the family at some stage.

e.g. common recessive alleles for disorders such as CF can easily be brought in via marriage. Whilst the affected child’s alleles are identical, more specifically they’re IBS not IBD.

31
Q

What is _Non-_parametric linkage analysis?

A

A method of mapping disease genes that does not require an inheritance model and makes no assumptions about other genes involved in disease risk

(as opposed to parametric linkage analysis which requires a tightly specified genetic model).

32
Q

What is the principle behind Non parametric linkage analysis?

A
  1. The region of the genome within which disease-causing gene is situated will be co-inherited from a common ancestor by affected members of the family more frequently than would be expected by chance.
  2. Shared segment analysis can be performed using any set of affected relatives but affected sib pairs most often used (easy to collect).
  3. To perform non-parametric linkage analysis it is important to distinguish chromosomal segments that are IBD from those IBS.
33
Q

Briefly describe the method of Non-parametric linkage analysis.

A

See video: Part 06: Non-Parametric Linkage Analysis

34
Q

How is information regarding IBD and IBS factored in to non-parametric linkage analysis?

A
  1. IBD alleles are treated in terms of the Mendelian probability of inheritance from the defined common ancestor.
  2. IBS alleles need to have the population frequency of the allele taken into account.
  3. e.g. for very rare alleles, two independent origins are unlikely, so IBS generally implies IBD; with common alleles (e.g. delF508), no such inference can be made.
  4. Nonparametric analysis can be performed with either IBS or IBD data, provided that the appropriate analysis is used.
  5. IBD is more powerful than IBS, but it requires samples from relatives to prove the alleles are IBD and to narrow down the segment range.
35
Q

What are the merits of non-parametric linkage analysis vs parametric linkage analysis?

A
  • Non-parametric methods are widely assumed to be more robust than parametric methods.
  • However, complications arise regarding how statistics should be calculated and assessed when multiple related individuals are affected, particularly when combining evidence from families of different sizes (methods have been described to deal with this).
36
Q

What is Affected Sib Pair analysis?

A
  1. Model-free method used to identify alleles shared by affected siblings (regions identical by descent).
  2. Random inheritance predicts that siblings share 0, 1 or 2 alleles with frequencies 1/4, 1/2 and 1/4.
  3. If affected siblings share an allele more or less often than expected by chance, this indicates that the allele or its locus may be involved in the disease.
  4. i.e. look for chromosomal regions where sharing is above the random 1:2:1 ratio (e.g. sharing of one allele is >½ and sharing of 2 alleles is >¼).
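
One simple way to quantify a departure from the 1:2:1 ratio is a chi-square goodness-of-fit statistic together with the mean proportion of alleles shared. The sketch below is an illustration only: the function name and the sib-pair counts are invented, and real sib-pair programs may use different statistics.

```python
EXPECTED = (0.25, 0.5, 0.25)  # sharing 0, 1, 2 alleles IBD if there is no linkage


def sib_pair_summary(counts):
    """Summarise IBD sharing in affected sib pairs.

    counts: numbers of sib pairs sharing 0, 1 and 2 alleles IBD at a locus.
    Returns (chi_square, mean_sharing), where mean_sharing is the average
    proportion of alleles shared (0.5 expected by chance).
    """
    total = sum(counts)
    chi_square = sum(
        (observed - total * p) ** 2 / (total * p)
        for observed, p in zip(counts, EXPECTED)
    )
    mean_sharing = (counts[1] + 2 * counts[2]) / (2 * total)
    return chi_square, mean_sharing


# Hypothetical data: 100 affected sib pairs at one marker
chi_square, mean_sharing = sib_pair_summary((10, 45, 45))
print(chi_square, mean_sharing)
```

A mean sharing proportion well above 0.5, as in this made-up example, is the pattern expected near a disease locus.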
37
Q

Describe a successful example of non-parametric Linkage Analysis.

A
  • Hugot et al 1996: a non-parametric two-point sibling-pair linkage method identified the IBD1 (inflammatory bowel disease 1) locus on chromosome 16. The proportions of siblings sharing 0, 1 and 2 alleles IBD were 0.19, 0.4 and 0.42, respectively, suggesting recessive inheritance.
  • Ogura et al 2001: reported a frameshift mutation in the NOD2 gene at the IBD1 locus. NOD2 was the first susceptibility gene to be identified in Crohn’s Disease.
38
Q

What is ‘positional cloning’?

A
  • Method of identifying a gene solely on its approximate chromosome location.
  • Initially a candidate region is located via techniques such as linkage analysis or autozygosity mapping, or analysis of chromosome rearrangements.
  • Positional cloning is then used to narrow down the region.
  • Therefore prior knowledge of the gene or protein is not required
  • Region size is important. A small region reduces potential for errors and decreases the amount of work needed
39
Q

How are the clones produced for positional cloning?

A

Chromosome walking

  • Starts with a known marker at the end of the candidate region.
  • DNA fragment with this marker used as a probe to screen a genomic library to identify other clones containing the marker and adjacent sequences.
  • Repeat step with new clones to identify further overlapping clones.
  • Starting from both ends of the candidate region should produce a set of clones that meet in the middle.
  • Very slow method. Presence of repeated sequences can be a problem as this causes non-specific probe binding
40
Q

What alternative to chromosomal walking can be used to generate clones for positional cloning?

A

Chromosome jumping: overcomes issues with chromosome walking

  • Restriction digest of DNA produces fragments, which are separated by pulsed-field gel electrophoresis.
  • Fragments 80-130 kb in length containing a known marker near one end are ligated to form a circle.
  • The marker is now next to DNA that was originally located 80-130 kb away. This junction DNA is cut out and cloned into another vector, and can then be used as a probe for further chromosome jumping or walking.
  • Genes within cloned genomic region can be identified by using zoo blots or screening cDNA libraries.
  • If the DNA hybridises, the cDNA can be isolated and the DNA and protein sequence deduced.
41
Q

Once a library of clones has been created how is the clone containing the disease gene selected for sequencing?

A
  • For each new DNA clone a polymorphism is identified
  • This is tested in the mapping population for its recombination frequency compared to the mutant phenotype.
  • When the DNA clone is at or close to the mutant allele, the recombination frequency with the polymorphism should be close to zero.
  • This clone is then selected for sequencing.
  • If the chromosome walk proceeds through the mutant allele, the new polymorphisms will start to show increase in recombination frequency compared to the mutant phenotype.
42
Q

What is the basis for using WES/WGS for Rare Disease gene mapping rather than traditional methods such as linkage?

A
  • NGS sequencing platforms have become widely available, reducing the cost and time it takes to identify disease-causing genes using this method.
  • WES is replacing traditional approaches, which are labour and resource intensive and can be costly.
  • i.e. Sanger sequencing of candidate genes after positional mapping (linkage analysis etc).
  • WGS/WES can also be used to investigate complex traits and cancers, as all variants in the genome/exome are uncovered.
43
Q

What important genetic disease was identified by positional cloning?

A

Cystic Fibrosis

Gene identified by positional cloning

(Rommens et al 1989)

44
Q

What are the drawbacks to using WES/WGS for disease gene mapping?

A
  • WES typically identifies ~20,000 variants per patient.
  • Incidental findings might be detected for other unrelated conditions that have not yet been diagnosed or are not currently penetrant.
  • Computational demands of narrowing down the massive list of identified variants; this is not an issue with mapping/Sanger strategies.
45
Q

How is WES/WGS data filtered to make it more manageable for disease gene identification?

A

Variants can be filtered using a range of criteria:

  1. Quality: the total number of independent reads showing the variant and the percentage of reads showing the variant
  2. ‘Discrete filtering’: Non-pathogenic variants can be excluded by comparison with known polymorphisms in population databases
  3. ‘Stratifying candidates’ after discrete filtering: On the basis of their predicted functional impact. Variants outside the coding regions and synonymous variants can be filtered out on the basis of the assumption that these will have minimum effect.

These steps can reduce the number of candidate mutations by 90-95%, leaving around 150-500 that can be prioritised as potential pathogenic variants.
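
The three filtering criteria can be sketched as a small Python function. Everything specific here is an assumption made for illustration: the field names (`alt_reads`, `total_reads`, `pop_freq`, `consequence`), the thresholds and the example variants do not come from any particular pipeline or database.

```python
def filter_variants(variants, min_alt_reads=5, min_alt_fraction=0.2, max_pop_freq=0.01):
    """Apply quality, population-frequency and functional-impact filters in turn."""
    kept_consequences = {"missense", "nonsense", "frameshift", "splice_site"}
    kept = []
    for v in variants:
        # 1. Quality: enough independent reads, and a high enough fraction, show the variant.
        if v["alt_reads"] < min_alt_reads:
            continue
        if v["alt_reads"] / v["total_reads"] < min_alt_fraction:
            continue
        # 2. Discrete filtering: drop variants that are common in population databases.
        if v["pop_freq"] > max_pop_freq:
            continue
        # 3. Stratifying candidates: drop non-coding and synonymous variants.
        if v["consequence"] not in kept_consequences:
            continue
        kept.append(v)
    return kept


# Hypothetical variant calls
variants = [
    {"id": "v1", "alt_reads": 20, "total_reads": 40, "pop_freq": 0.0, "consequence": "missense"},
    {"id": "v2", "alt_reads": 2, "total_reads": 40, "pop_freq": 0.0, "consequence": "nonsense"},
    {"id": "v3", "alt_reads": 18, "total_reads": 36, "pop_freq": 0.12, "consequence": "missense"},
    {"id": "v4", "alt_reads": 25, "total_reads": 50, "pop_freq": 0.0, "consequence": "synonymous"},
]
print([v["id"] for v in filter_variants(variants)])  # ['v1']
```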

46
Q

After variant filtering what is the key criterion for the subsequent stages of analysis for Rare Disease gene identification?

A
  • When there is either a familial recurrence of a rare phenotype or the presence of consanguinity, the likelihood of a monogenic rare disease is high.
  • The mode of inheritance influences the selection and number of individuals to sequence, as well as the analytical approach used
  • In addition, the selection of the appropriate gene discovery approach is contingent on whether the mutations are anticipated to be inherited, de novo or mosaic.
47
Q

What WES strategy is utilised when the MOI indicates autosomal recessive disease in a family?

A
  • Affected siblings are sequenced to identify shared variation.
  • Compound heterozygosity is expected in the absence of consanguinity or occurrence in an isolated population.
48
Q

What WES strategy is utilised when the MOI indicates consanguineous autosomal recessive disease in a family?

A
  • Affected siblings are sequenced to identify shared homozygous variants.
  • This may also be observed in non-consanguineous couples from isolated populations.
49
Q

What WES strategy is utilised when the MOI indicates X-linked recessive disease in a family?

A
  • The favoured strategy is to analyse the two most remotely related male family members.
  • Autosomal variants can be disregarded.
50
Q

What WES strategy is utilised when the MOI indicates Autosomal dominant disease in a family?

A
  • The mapping of the gene to a discrete chromosomal region (for example, <2 Mb) may allow gene identification from the analysis of one individual
  • Larger genomic regions, or diseases which have not been mapped, require the analysis of a greater number of individuals.
51
Q

What WES strategy is utilised when the MOI indicates de novo dominant disease in a family?

A
  • Analysis of WES data from unaffected parents-affected child trios generally produces a handful of de novo variants for further analysis
  • Comparison of these variants between as few as two families will generally reduce these to a single candidate gene.
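
The trio strategy can be sketched with set operations: keep the child's variants that are absent from both parents, then look for genes hit in more than one family. This is a simplified illustration with invented function names, gene names and variants; it ignores sequencing error and coverage, which matter in real de novo calling.

```python
def de_novo_variants(child, mother, father):
    """Variants seen in the child but in neither parent."""
    return set(child) - set(mother) - set(father)


def recurrent_genes(families):
    """Genes carrying a de novo variant in every family.

    families: list of (child, mother, father) tuples, each member being
              a set of (gene, variant) tuples (hypothetical representation).
    """
    gene_sets = [
        {gene for gene, _ in de_novo_variants(child, mother, father)}
        for child, mother, father in families
    ]
    return set.intersection(*gene_sets)


# Hypothetical data for two unrelated trios
family1 = (
    {("GENE_A", "c.100A>G"), ("GENE_B", "c.5C>T"), ("GENE_C", "c.77G>A")},
    {("GENE_B", "c.5C>T")},
    set(),
)
family2 = (
    {("GENE_A", "c.250del"), ("GENE_D", "c.9T>C")},
    set(),
    {("GENE_D", "c.9T>C")},
)
print(recurrent_genes([family1, family2]))  # {'GENE_A'}
```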
52
Q

What WES strategy is utilised when the MOI indicates mosaic mutations in a family?

A
  • The comparison of sequence data from a patient’s affected and unaffected tissue is frequently sufficient to identify de novo mosaic disease-causing mutations
  • This strategy is commonly used in oncology to identify the somatically acquired mutations in the tumour tissue
53
Q

Describe two successful examples of projects that utilise WES/WGS to identify new rare disease genes?

A
  1. Genomics England 100,000 Genomes project
  2. Deciphering Developmental Disorders (DDD) project
54
Q

What is the basis for using cytogenetic methods for identifying rare disease genes?

A
  • Before the 2000s it was expensive to conduct large amounts of DNA sequencing and linkage studies.
  • It may not have been possible to do linkage in small families or de novo cases
  • Thus an index case with a cytogenetic abnormality rather than a mutation was highly valuable for indicating the genomic location of the disease causing gene.
  • Where multiple cytogenetic cases are identified the overlapping regions between the cases can be used to identify a minimal common region which must contain the disease gene
  • Small-scale sequencing can then be focused on genes in the minimal region, in cases without a cytogenetic abnormality, to identify the disease gene.
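
Finding the minimal common region amounts to intersecting the intervals affected in each case. The sketch below assumes all abnormalities are on the same chromosome and are given as (start, end) coordinates; the function name and the coordinates are invented for illustration.

```python
def minimal_common_region(intervals):
    """Return the (start, end) region shared by all intervals, or None if there is none."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


# Hypothetical deletions in three patients (coordinates in Mb on one chromosome)
deletions = [(58.0, 62.8), (60.5, 63.0), (59.2, 62.5)]
print(minimal_common_region(deletions))  # (60.5, 62.5)
```

Only genes inside the returned interval would then need to be sequenced in cases without a cytogenetic abnormality.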
55
Q

What types of cytogenetic abnormality may indicate proximity to a disease gene?

A
  • Apparently Balanced Translocation can lead to
    • Sub-microscopic imbalance
    • Fusion of a gene (LOF or GOF)
    • Separation from cis acting regulatory elements
  • Deletions/duplications leading to loss or gain of gene
  • Inversions have a positional effect
56
Q

Give an example of how an apparently balanced translocation (that wasn’t balanced) has led to the identification of a disease gene.

A

CHARGE syndrome

  • One patient had an apparently ‘balanced’ 6;8 translocation
  • Another tested using A-CGH showed a de novo 4.8Mb deletion at 8q12
  • The first patient tested by array showed two microdeletions partially overlapping the 4.8Mb deletion in the second patient with 2Mb minimal region.
  • 17 further patients screened did not have deletions; sequencing of all 9 genes in the overlap region identified mutations in CHD7.
57
Q

Give an example of how an apparently balanced translocation has led to identification of a disease gene by creating a gene fusion leading to LOF.

A

SOTOS Syndrome

  • Patient identified with a translocation 46,XX,t(5;8)(q35;q24.1)
  • Breakpoint disrupted NSD1 (5q35). Mutations identified in cases with no translocation

Duchenne Muscular Dystrophy (1994)

  • In females with an X;autosome translocation the normal X is preferentially inactivated, to avoid an autosomal imbalance
  • If the translocation breakpoint disrupts a disease gene on the X chromosome, the only active copy of that gene loses its function.
  • A female with an X;21 translocation led to identification of the DMD gene.
58
Q

Give an example of how an apparently balanced translocation has led to identification of a disease gene by creating a gene fusion leading to GOF.

A
  • This is not common in rare disease but is frequently seen in tumours / haemato-oncology.
  • Chronic Myeloid Leukaemia: the BCR-ABL fusion gene is created by juxtaposing the ABL1 gene on 9q34 with part of the BCR (breakpoint cluster region) gene on chromosome 22q11.
59
Q

Give an example of how an apparently balanced translocation has led to identification of a disease gene by separation from cis acting regulatory elements.

A

Aniridia (severe hypoplasia of the iris)

  • PAX6 haploinsufficiency at 11p13 is the cause of aniridia.
  • Deletions and LOF point mutations are causative.
  • Translocations with a breakpoint 3’ of PAX6 are also causative.
  • The translocation affects gene expression, resulting in LOF and haploinsufficiency.
60
Q

How can cytogenetic deletions/duplications (CNVs) lead to disease gene identification?

A
  • Genes that are susceptible to haploinsufficiency lead to disease when one allele is lost through a deletion.
  • It is increasingly clear that many microdeletion syndromes are largely or completely due to the phenotypic effects of haploinsufficiency for single genes.
  • Microdeletions and/or microduplications may comprise up to 15% of all mutations underlying monogenic diseases.
  • e.g. Miller-Dieker Syndrome (MDS)
  • The minimal common region between multiple deletion cases narrows down the causative disease gene for the disorder.
61
Q

How can cytogenetic inversions lead to disease gene identification?

A
  • A chromosomal inversion (7q22.1;7q31.1) was reported in autistic siblings.
  • The inversion breakpoints were mapped using FISH; the distal breakpoint fell into a 2 Mb gene desert.
  • The proximal breakpoint may disrupt the coding sequence or regulatory elements of several cytochrome P450 genes.
  • In addition, the distal inversion breakpoint does show significant association with multiple SNPs which could have some positional effect in the regulation of a distant gene.