Linkage Analysis Flashcards

1
Q

What is genetic variation?

A

• Genetic variation refers to differences in the DNA sequence between individuals in a population

2
Q

How can variation arise?

A

• Variation can be inherited or due to environmental factors (e.g. drugs, exposure to radiation)

3
Q

What effects can genetic variants have?

A

• Alteration of the amino acid sequence of the protein encoded by a gene
• Changes in gene regulation (where and when a gene is expressed)
• Changes in the phenotype of an individual (e.g. eye colour, genetic disease risk)
• Silent: no apparent effect

4
Q

Why is genetic variation important?

A
  1. Genetic variation underlies phenotypic differences among different individuals
  2. Genetic variations determine our predisposition to complex diseases and responses to drugs and environmental factors
  3. Genetic variation reveals clues of ancestral human migration history
5
Q

What are the 3 mechanisms of genetic variation?

A

• Mutation/polymorphism: errors in DNA replication. These may affect single nucleotides or larger portions of DNA
  - Germline mutations: occur in gametes and are passed on from parent to offspring
  - Somatic mutations: occur in a single cell of the body and are not transmitted to descendants; depending on the gene affected, they may lead to cancer
  - De novo mutations: new mutations that were not inherited from either parent. They occur spontaneously, either in one of the parental gametes or in the fertilised egg during early embryogenesis, and can subsequently be passed on to the next generation
• Homologous recombination: shuffling of chromosomal segments between partner (homologous) chromosomes of a pair, resulting in new allele combinations. Importantly, this process can be utilised in linkage analysis to track the inheritance of chromosomal segments and determine the likely location of a disease gene
• Gene flow: the movement of genes from one population to another (e.g. by migration) is an important source of genetic variation

6
Q

Compare a mutation with a polymorphism

A
  • A mutation is a rare change in the DNA sequence that is different to the normal (reference) sequence. The ‘normal’ allele is prevalent in the population and the mutation changes this to a rare ‘abnormal’ variant
  • By contrast, a polymorphism is a DNA sequence variant that is common in the population. In this case no single allele is regarded as the ‘normal’ allele. Instead there are two or more equally acceptable alternatives
  • The arbitrary cut-off point between a mutation and a polymorphism is a minor allele frequency (MAF) of 1% (i.e. for a variant to be classed as a polymorphism, the least common (minor) allele must be present in ≥1% of the population)
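
The 1% cut-off can be sketched in a few lines of Python. This is only an illustration for a biallelic variant: the function names and the allele counts are hypothetical, not taken from the course material.

```python
def minor_allele_frequency(ref_count, alt_count):
    """Frequency of the less common allele at a biallelic site."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no alleles observed")
    return min(ref_count, alt_count) / total


def classify_variant(ref_count, alt_count, cutoff=0.01):
    """Apply the arbitrary MAF cut-off: >= 1% is a polymorphism."""
    maf = minor_allele_frequency(ref_count, alt_count)
    return "polymorphism" if maf >= cutoff else "mutation"


# Hypothetical counts from 1,000 people (2,000 chromosomes)
print(classify_variant(1990, 10))   # MAF = 0.5%
print(classify_variant(1800, 200))  # MAF = 10%
```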
7
Q

When does genetic recombination occur?

What is genetic recombination?

A

Genetic recombination occurs during prophase I, when the two homologous chromosomes (i.e. maternal and paternal) line up together.

Homologous chromosomes pair with each other and undergo genetic recombination, in which DNA is cut and then repaired, allowing them to exchange some of their genetic information. A subset of recombination events results in crossing over, which creates physical links known as chiasmata between the homologous chromosomes.
These crossing over events result in the production of recombinant chromosomes, which are highly informative and can be utilised for linkage analysis studies.

8
Q

What is crossing over?

What does it result in?

A
  • Crossing over: reciprocal breaking and re-joining of the homologous chromosomes during meiosis
  • Results in exchange of chromosome segments and new allele combinations
9
Q

Define genotype, phenotype and alleles

A
  • The genotype is the genetic makeup of an individual
  • The phenotype is the physical expression of the genetic makeup
  • Genes are found in alternative versions called alleles
10
Q

Define homozygous, heterozygous, haplotype and locus

A
  • A homozygous genotype has identical alleles
  • A heterozygous genotype has two different alleles
  • A haplotype is a group of alleles that are inherited together from a single parent
  • A locus is a specific position or region in the genome
11
Q

What are the 3 categories for genetic disease?

A
  • For linkage analysis, we will be focused on Mendelian / Monogenic disease. These are most often rare diseases that are highly heritable within families. The term ‘monogenic’ means that the disease is caused by one gene, i.e. a mutation in a single gene is sufficient to cause the disease. ‘Mendelian’ refers to the inheritance patterns observed by Gregor Mendel.
  • By contrast, Non-Mendelian / Polygenic diseases require ‘hits’ in multiple different genes. It is the cumulative effect of these multiple hits that leads to the disease.
  • Multifactorial diseases result from the combination of genetic and environmental factors, e.g. someone with a genetic predisposition to heart disease may be able to counteract this with a good diet, exercise, low alcohol, no smoking, etc., whereas different lifestyle choices (drinking, smoking, poor diet, etc.) would be more likely to cause disease.
12
Q

Describe the penetrance vs variant frequency graph

A

On image

At the other end of the spectrum is Polygenic (many genes) / Common complex disease. In this case, common variants each have low penetrance and it is the cumulative effect of multiple variants in different genes that cause the disease.

13
Q

What is linkage analysis?

What is the main assumption of linkage analysis?

A
  • Linkage analysis is a method used to map the location of a disease gene in the genome
  • The term ‘linkage’ refers to the assumption of two things being physically linked to each other

The major assumption in linkage analysis: genetic markers that are in close proximity to the disease gene will be co-inherited with it.

  • Therefore, ‘linkage’ refers to physical proximity between two loci
14
Q

For linkage analysis, what are the 2 types of maps?

A

For linkage analysis, we use two different types of maps: genetic maps and physical maps

15
Q

What do genetic maps provide?

A

Genetic maps tend to provide information about blocks or regions of a chromosome – this is similar to the zones on a tube map:
• We might say that we live in zone 3, for example – this provides some information on distance relative to another zone
• But the exact position of each station within a zone is not so important – this is the same with genetic maps

A genetic map shows the approximate map distance that separates any two loci and the position of these loci relative to all other mapped loci.

16
Q

What do physical maps provide?

A

By contrast, physical maps provide more precise information on physical distance – this is similar to the tube stations
• In this case, the exact location of a station (relative to any other station on the line) is important
• We can use physical maps to calculate precise distances between two stations, as we know their exact location

Physical maps indicate the precise location of a specific locus (e.g. gene or genetic marker). Positions can be defined to the individual base pair, or more broadly by Megabase (Mb) positions.

17
Q

Why can we use recombination frequencies to produce genetic maps of all the loci along a chromosome?

A

Because the frequency of recombination between two loci is roughly proportional to the chromosomal distance between them, we can use recombination frequencies to produce genetic maps of all the loci along a chromosome and ultimately in the whole genome
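
As a rough illustration of turning recombination counts into map distance, here is a minimal Python sketch. The "1% recombination is about 1 cM" rule only holds for short distances. Haldane's mapping function is an addition that the card does not mention, and it assumes crossovers occur independently (no interference). All counts are hypothetical.

```python
import math


def recombination_fraction(recombinants, non_recombinants):
    """theta = R / (NR + R)."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("no offspring scored")
    return recombinants / total


def approx_map_distance_cm(theta):
    """Short-distance approximation: 1% recombination ~ 1 cM."""
    return 100.0 * theta


def haldane_map_distance_cm(theta):
    """Haldane's mapping function (assumes no crossover interference)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


# Hypothetical data: 5 recombinants among 100 scored offspring
theta = recombination_fraction(5, 95)
print(theta, approx_map_distance_cm(theta), haldane_map_distance_cm(theta))
```

The two estimates agree closely for small θ and diverge as θ approaches 0.5, which is why map distance is only "roughly" proportional to recombination frequency.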

18
Q

What is genetic linkage?

When are alleles likely to be inherited together?

A
  • Genetic linkage is the tendency for alleles at neighbouring loci to segregate together at meiosis
  • Cross-overs are more likely to occur between loci separated by some distance than between loci close together on the chromosome
  • Therefore to be linked, two loci must lie very close together
  • A haplotype defines multiple alleles at linked loci. These chromosomal segments can be tracked through pedigrees and populations

On image

19
Q

What are the steps of genetic linkage analysis?

A

• Genotype multiple genetic markers across the genome
• Genotype multiple family members from families with the genetic trait
• Identify which genetic markers co-segregate with the disease (phenotype), i.e. which haplotypes are the same in all affected family members
• These genetic markers are therefore ‘linked’ to the disease gene
  - This indicates where in the genome the disease gene is likely to be located
• NB: further work is needed to identify the gene and the disease-causing mutation!
- Genetic markers are genotyped across the whole genome, for multiple family members (ideally from many different families)
- Using linkage analysis software, we can identify which genetic markers co-segregate with the disease or phenotype. This will be discussed in more detail in Part 2
- By identifying shared haplotypes in affected family members, we can determine where in the genome to search for the disease gene
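
The co-segregation step can be sketched as a toy Python check. This is a strong simplification and not what real linkage software does: it assumes a fully penetrant dominant trait, looks at one marker at a time rather than phased haplotypes, and uses a hypothetical four-person family.

```python
def cosegregating_markers(genotypes, affected):
    """Return markers with an allele carried by every affected person
    and by no unaffected person (toy dominant, fully penetrant model).

    genotypes: {person: {marker: (allele1, allele2)}}
    affected:  {person: True/False}
    """
    markers = next(iter(genotypes.values())).keys()
    linked = {}
    for marker in markers:
        shared = None            # alleles common to all affected people
        in_unaffected = set()    # alleles seen in any unaffected person
        for person, is_affected in affected.items():
            alleles = set(genotypes[person][marker])
            if is_affected:
                shared = alleles if shared is None else shared & alleles
            else:
                in_unaffected |= alleles
        candidates = (shared or set()) - in_unaffected
        if candidates:
            linked[marker] = sorted(candidates)
    return linked


# Hypothetical family: affected father and child, unaffected mother and child
affected = {"father": True, "mother": False, "child1": True, "child2": False}
genotypes = {
    "father": {"M1": (1, 2), "M2": (1, 2)},
    "mother": {"M1": (3, 4), "M2": (3, 4)},
    "child1": {"M1": (1, 3), "M2": (2, 3)},
    "child2": {"M1": (2, 4), "M2": (2, 4)},
}
print(cosegregating_markers(genotypes, affected))
```

In this toy family only marker M1 tracks with the disease: allele 1 is present in both affected individuals and absent from both unaffected ones.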

20
Q

Compare microsatellites and SNPs

A

On image

21
Q

What is microsatellite genotyping?

A

Microsatellite genotyping is a PCR-based method that is used to amplify highly repetitive regions of the genome. PCR primers are located outside of the repetitive element and are used to amplify the full microsatellite region.

Different numbers of repeat units (e.g. CA or GATA) produce PCR products of different lengths, with adjacent alleles differing by one repeat unit

  • For a CA repeat, PCR fragments will differ by 2 nucleotides
  • So for locus ‘A’ above, there are 4 different alleles across individuals #1 and #2
  • Individual #1 has genotype 2,5 (allele A2 has 2 repeats – CACA, allele A5 has 5 repeats - CACACACACA)
  • Individual #2 has genotype 3,4 (allele A3 has 3 repeats, allele A4 has 4 repeats)
  • For a GATA repeat, PCR fragments will differ by 4 nucleotides
  • The PCR fragments are then electrophoresed through an acrylamide gel, and the different numbers of repeat units are represented by the difference in size of the PCR bands
  • Primers for microsatellite analysis are often fluorescently tagged to allow multiple markers to be electrophoresed at the same time – the different PCR products can then be distinguished by colour. This is discussed in more detail on the next slide.

This process is still used for DNA analysis – e.g. paternity testing, forensics – as testing of 13 polymorphic microsatellite loci is generally sufficient to identify a specific individual
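
The link between repeat number and PCR fragment size can be illustrated with a short Python sketch. The 100 bp of flanking sequence (primers plus non-repetitive DNA) is a made-up value chosen only for the example.

```python
def pcr_fragment_length(repeat_unit, repeat_count, flank_bp=100):
    """Fragment size = flanking sequence + repeat unit length x copies.

    flank_bp is a hypothetical total for both flanks including primers.
    """
    return flank_bp + len(repeat_unit) * repeat_count


# CA repeat: alleles with 2, 3, 4 and 5 copies differ by 2 bp steps
ca_sizes = {n: pcr_fragment_length("CA", n) for n in (2, 3, 4, 5)}
print(ca_sizes)

# GATA repeat: neighbouring alleles differ by 4 bp
gata_step = pcr_fragment_length("GATA", 4) - pcr_fragment_length("GATA", 3)
print(gata_step)
```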

22
Q

What is microsatellite genotyping used for?

A
  • DNA fingerprinting from very small amounts of material
  • The standard test uses 13 core loci, making the likelihood of a chance match about 1 in 3 trillion
  • Paternity testing
  • Linkage analysis for disease gene identification
23
Q

What is fluorescent genotyping?

A

The figure on the left shows an example of microsatellite genotyping using fluorescently-tagged primers for amplification by polymerase chain reaction (PCR).
• The different peaks represent different PCR products, with smaller fragments on the left and larger fragments on the right. Fragment sizes in bp are marked below each peak
• Because they are highly polymorphic (i.e. have many alleles), each microsatellite marker covers a range of fragment sizes (typical range spans 20-40bp)
• Each peak represents one allele: single peaks are homozygous; double peaks are heterozygous for the marker
• By using different coloured fluorescent tags (e.g. blue, green, yellow) and amplifying different sized fragments, PCR products can be pooled for multiplex analysis
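
A minimal Python sketch of the "one peak or two" rule follows. It assumes the peak list has already been cleaned so that it contains only true allele peaks; real traces also contain stutter and background peaks, which this ignores. The fragment sizes are hypothetical.

```python
def call_genotype(peak_sizes_bp):
    """Call a diploid genotype from allele peak sizes (in bp)."""
    alleles = sorted(set(peak_sizes_bp))
    if len(alleles) == 1:
        return (alleles[0], alleles[0]), "homozygous"
    if len(alleles) == 2:
        return (alleles[0], alleles[1]), "heterozygous"
    raise ValueError("expected one or two allele peaks for a diploid sample")


print(call_genotype([152]))       # single peak
print(call_genotype([148, 156]))  # double peak
```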

24
Q

What is SNP genotyping used for?

A
• Linkage analysis in families (affected vs unaffected relatives)
  - Homozygosity mapping (autosomal recessive) and mapping of Mendelian traits
• GWAS in populations (unrelated cases vs matched controls)
  - Non-Mendelian disorders and multifactorial traits
25
Q

Describe the coverage of the human genome

A

As microsatellite-based panels have an average coverage of 9 cM, there are some areas of the genome that are more poorly covered.
If the disease gene lies in one of these poorly covered regions, it may not be detected by linkage analysis.
SNP arrays provide excellent coverage of the genome by comparison to microsatellite-based linkage panels.
However, individual SNPs are less informative as they are biallelic (so less polymorphic than microsatellites). This may lead to more ambiguity when building haplotypes.
Therefore, microsatellites are often used for refinement mapping of the critical linkage interval once the region has been detected by genome-wide analysis.
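
One common way to put a number on "informativeness" is expected heterozygosity, H = 1 - Σ pᵢ², where pᵢ are the allele frequencies at a marker. The card does not name this measure, so treat it as an illustrative addition. The allele frequencies below are hypothetical.

```python
def expected_heterozygosity(allele_frequencies):
    """H = 1 - sum(p_i^2): chance that two random alleles differ."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Best case for a biallelic SNP: two equally common alleles
snp_h = expected_heterozygosity([0.5, 0.5])

# Hypothetical microsatellite with 8 equally common alleles
microsatellite_h = expected_heterozygosity([0.125] * 8)

print(snp_h, microsatellite_h)
```

A biallelic SNP can never exceed H = 0.5, whereas a marker with many alleles can approach 1, which is why a single microsatellite resolves haplotypes better than a single SNP.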

26
Q

Describe Linkage mapping using genetic markers

A
  • Uses an observed locus (genetic marker) to draw inferences about an unobserved locus (disease gene)
  • As you have seen in the previous slide(s), we can use the genotypes of genetic markers (observed loci) to draw inferences about the disease gene, which we have not yet identified (unobserved locus).
  • So if we genotype lots of genetic markers across the genome, we can track chromosomal segments within a family (or multiple families) to test for co-segregation with the disease gene.
  • In the figure above, each marker is represented by M# and the disease gene is located somewhere between markers M3 and M4.
27
Q

How do you build a haplotype?

A

On document

28
Q

What is pedigree analysis?

A

On document

29
Q

How can we analyse linkage statistically?

A

• The probability of linkage can be assessed using a LOD score
• LOD = logarithm of the odds
• It compares the likelihood of obtaining the test data if the two loci are linked with the likelihood of observing the same data purely by chance
  - i.e. it calculates a likelihood ratio of linkage at a given θ vs. no linkage (θ = 0.5)
  - (θ = recombination fraction, NR = number of non-recombinant offspring, R = number of recombinant offspring)
• The recombination fraction is the proportion of recombinant offspring, i.e. R / (NR + R)
• The higher the LOD score, the higher the likelihood of linkage
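
For the simplest textbook case (phase-known, fully informative meioses), the LOD score at a given θ is log10 of (1 - θ)^NR × θ^R divided by 0.5^(NR + R). The Python sketch below implements only that simple case, with hypothetical counts. Real pedigrees need dedicated linkage software.

```python
import math


def _count_times_log10(count, probability):
    """count * log10(probability), treating 0 * log10(0) as 0."""
    if count == 0:
        return 0.0
    if probability == 0:
        return float("-inf")
    return count * math.log10(probability)


def lod_score(recombinants, non_recombinants, theta):
    """LOD = log10[ L(theta) / L(theta = 0.5) ] for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    total = recombinants + non_recombinants
    log_linked = (_count_times_log10(non_recombinants, 1.0 - theta)
                  + _count_times_log10(recombinants, theta))
    log_unlinked = total * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family: 1 recombinant and 9 non-recombinant offspring
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 9, theta), 2))
```

Scanning θ like this shows the usual pattern: the LOD score peaks near the observed recombination fraction (here R / (NR + R) = 0.1) and falls to zero at θ = 0.5.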

30
Q

What do the results of linkage analysis show?

A

• LOD scores can be calculated across the whole genome using genotype data for many genetic markers in multiple members of a family
  - Parametric analysis specifies the pedigree structure and inheritance pattern (model)
  - Non-parametric analysis detects allele sharing between affected individuals
• LOD scores are additive: different families linked to the same disease locus will increase the overall score
• A LOD score ≥ 3 is considered evidence for linkage
  - Equivalent to odds of 1000:1 in favour of linkage
  - Translates to a genome-wide p-value of approximately 0.05
• A LOD score ≤ -2 is considered evidence against linkage

A linkage peak with LOD score ≥3 provides significant evidence for linkage.
- These peaks are dependent on the statistical power within the families being studied
- Larger pedigrees (or many smaller pedigrees) contain more power to detect LOD scores of statistical significance
In the absence of large families, linkage peaks may be smaller but may still provide suggestive evidence for linkage
- LOD scores over zero should be considered as suggestive
LOD scores ≤ -2 provide strong support against linkage. These regions of the genome can be excluded from further analysis as they are highly unlikely to contain the disease gene.
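
The additivity and threshold rules on this card can be sketched in Python as follows. The per-family LOD scores are hypothetical, and summing them is only meaningful when they were calculated at the same map position under the same model.

```python
def combined_lod(family_lods):
    """LOD scores from independent families at one locus are additive."""
    return sum(family_lods)


def odds_for_linkage(lod):
    """Convert a LOD score back to odds, e.g. LOD 3 -> 1000:1."""
    return 10.0 ** lod


def interpret_lod(lod):
    """Apply the thresholds used on this card."""
    if lod >= 3:
        return "significant evidence for linkage"
    if lod <= -2:
        return "linkage excluded"
    if lod > 0:
        return "suggestive"
    return "inconclusive"


# Three small hypothetical families, none significant on its own
family_lods = [1.2, 0.9, 1.1]
total = combined_lod(family_lods)
print([interpret_lod(x) for x in family_lods])
print(round(total, 1), interpret_lod(total))
```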