Quiz 3/13 Flashcards
What are the three main types of distance measures covered?
- Distances for numerical data
- Distances for proportions
- Distances for presence-absence data
When should you use distance measures for presence-absence data?
- When data is organized as a set of features for different individuals/samples/populations.
- Examples include:
  - Whether species are present (1) or absent (0) in different locations.
  - Whether viruses express certain genetic markers.
  - Whether bodies of text contain specific words.
How is presence-absence data summarized?
- In a contingency table with counts:
  - a = Features present in both samples
  - b = Features present in Sample A but absent in Sample B
  - c = Features present in Sample B but absent in Sample A
  - d = Features absent in both samples
  - n = Total features (a + b + c + d)
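A minimal Python sketch of building this table, assuming each sample is stored as an equal-length list of 0/1 values with the features in the same order. The function name `contingency_counts` and the site data are hypothetical.

```python
def contingency_counts(sample_a, sample_b):
    """Return (a, b, c, d, n) for two equal-length presence-absence (0/1) lists."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must record the same features")
    a = b = c = d = 0
    for x, y in zip(sample_a, sample_b):
        if x and y:
            a += 1  # present in both samples
        elif x:
            b += 1  # present in A only
        elif y:
            c += 1  # present in B only
        else:
            d += 1  # absent in both samples
    return a, b, c, d, a + b + c + d


# Hypothetical data: 8 species recorded at two sites (1 = present, 0 = absent)
site_a = [1, 1, 0, 1, 0, 0, 1, 0]
site_b = [1, 0, 0, 1, 1, 0, 0, 0]
print(contingency_counts(site_a, site_b))  # (2, 2, 1, 3, 8)
```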
How can similarity be derived from a distance measure?
- Similarity is often computed as: Similarity = 1 - Distance
- This works when the distance ranges from 0 to 1, as is typical for presence-absence data and proportions.
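For presence-absence data, the Jaccard distance is (b + c) / (a + b + c), and subtracting it from 1 recovers the Jaccard similarity a / (a + b + c). A small sketch with hypothetical counts; the function names are illustrative only.

```python
def jaccard_distance(a, b, c):
    # Share of features present in at least one sample that are NOT shared
    return (b + c) / (a + b + c)


def similarity_from_distance(distance):
    # Only meaningful when the distance is bounded between 0 and 1
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance must lie in [0, 1]")
    return 1.0 - distance


a, b, c = 2, 2, 1                      # hypothetical contingency counts
dist = jaccard_distance(a, b, c)       # 3 / 5
sim = similarity_from_distance(dist)   # 2 / 5, the Jaccard similarity
print(dist, sim)
```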
What does the Mantel Randomization Test do?
- It compares two distance matrices to see if their distances are correlated.
- Tests if one type of distance (e.g., economic indicators) relates to another (e.g., health outcomes).
What are some real-world questions the Mantel Test can answer?
- Do economic indicators impact health outcomes?
- Does geographic distance affect genetic similarity?
- Do news announcements influence stock market fluctuations?
How does the Mantel Test determine significance?
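- Compute the similarity index (usually the correlation between the two matrices' entries) for many random permutations of one matrix, shuffling its rows and columns together; with very few samples, all possible permutations can be enumerated.
- Compare the observed index to this distribution of randomized values.
- If the observed value is higher than nearly all of the randomized values (e.g., p < 0.05), reject the null hypothesis that the two matrices are unrelated.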
What do the Mantel Test results tell us?
- If two distance matrices are significantly correlated, their distances are related.
- If not, the patterns in one matrix do not predict patterns in the other.
What is the formula for the Simple Matching Coefficient (SMC)?
(a + d) / (a + b + c + d)
Counts both presence (a) and absence (d) as agreement.
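A quick check with hypothetical counts shows how strongly shared absences drive this coefficient. The function name `simple_matching` is illustrative.

```python
def simple_matching(a, b, c, d):
    """Simple Matching Coefficient: shared presences and shared absences both count as agreement."""
    return (a + d) / (a + b + c + d)


# Hypothetical contingency counts for two samples
print(simple_matching(2, 2, 1, 3))   # (2 + 3) / 8 = 0.625
# Same samples, but 92 more features absent from both: the score rises sharply
print(simple_matching(2, 2, 1, 95))  # (2 + 95) / 100 = 0.97
```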
What is the formula for Jaccard Similarity?
a/(a+b+c)
Does not count shared absences (d) as similarity.
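A minimal sketch with hypothetical counts. Because d is not even an argument, adding shared absences cannot change the score. The function name `jaccard` is illustrative.

```python
def jaccard(a, b, c):
    """Jaccard similarity: shared presences over features present in at least one sample."""
    if a + b + c == 0:
        raise ValueError("undefined when no feature is present in either sample")
    return a / (a + b + c)


print(jaccard(2, 2, 1))  # 2 / 5 = 0.4
```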
What is the formula for Sørensen-Dice Similarity?
2a / (2a + b + c)
Gives twice the weight to shared presences (a) compared to Jaccard.
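A sketch comparing Sørensen-Dice with Jaccard on the same hypothetical counts. The two are monotonically related (Dice = 2J / (1 + J)), so they rank pairs of samples identically; Dice simply reports higher values. Function and variable names are illustrative.

```python
def sorensen_dice(a, b, c):
    """Sørensen-Dice similarity: shared presences are counted twice."""
    if a + b + c == 0:
        raise ValueError("undefined when no feature is present in either sample")
    return 2 * a / (2 * a + b + c)


a, b, c = 2, 2, 1                # hypothetical contingency counts
dice = sorensen_dice(a, b, c)    # 4 / 7, about 0.571
jac = a / (a + b + c)            # Jaccard: 2 / 5 = 0.4
# Both printed values are about 0.571, confirming Dice = 2J / (1 + J)
print(dice, 2 * jac / (1 + jac))
```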
What is the formula for Ochiai Similarity?
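a / sqrt((a + b) * (a + c))
Like Jaccard, it ignores shared absences (d), but it divides shared presences by the geometric mean of the number of features present in each sample (a + b and a + c), so it accounts for differences in sample size.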
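A minimal sketch with two hypothetical samples of very different size, where every feature in the smaller sample is also found in the larger one. The function name `ochiai` is illustrative.

```python
import math


def ochiai(a, b, c):
    """Ochiai similarity: shared presences over the geometric mean of each sample's presences."""
    present_in_a = a + b
    present_in_b = a + c
    if present_in_a == 0 or present_in_b == 0:
        raise ValueError("undefined when a sample has no features present")
    return a / math.sqrt(present_in_a * present_in_b)


# Hypothetical unequal samples: A has 4 features present (all shared), B has 40
a, b, c = 4, 0, 36
print(ochiai(a, b, c))   # 4 / sqrt(4 * 40), about 0.316
print(a / (a + b + c))   # Jaccard for comparison: 4 / 40 = 0.1
```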
When should I use the Ochiai similarity equation?
When the samples or groups differ greatly in size (number of features present).
When do you want to use Sørensen-Dice?
When a (the number of shared presences) is small.
When do you want to use Jaccard?
When a is not small.
When do you want to use Simple Matching?
When you think shared absences (d) are meaningful.
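To see these rules of thumb in action, the sketch below scores two hypothetical contingency tables with all four measures. The function name `similarities` and the counts are illustrative assumptions.

```python
import math


def similarities(a, b, c, d):
    """Return the four presence-absence similarities for one contingency table."""
    return {
        "SMC": (a + d) / (a + b + c + d),
        "Jaccard": a / (a + b + c),
        "Sorensen-Dice": 2 * a / (2 * a + b + c),
        "Ochiai": a / math.sqrt((a + b) * (a + c)),
    }


# Hypothetical tables: few shared presences with many shared absences,
# versus many shared presences with few shared absences
scenarios = {
    "small a, large d": (2, 5, 5, 88),
    "large a, small d": (40, 5, 5, 2),
}
for label, counts in scenarios.items():
    scores = similarities(*counts)
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

In the first table SMC reaches 0.9 even though only 2 features are shared, because the 88 shared absences count as agreement. Dice (about 0.286) sits well above Jaccard (about 0.167) when a is small, while the gap narrows when a is large (about 0.889 versus 0.8). Here b equals c, so Ochiai matches Dice exactly; the two diverge only when the samples differ in size.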