VL 4 Flashcards
What is a contingency table, and how is it used?
A contingency table is a table of counts. Each cell holds the number of cases that have a given level of one variable, or a given combination of levels of several variables.
Contingency tables are normally built from factors (categorical data). Continuous data can also be tabulated after binning it with cut(), which needs sensible breaks → hint: use quantiles as the break points.
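The binning idea can be sketched in Python. In R this would typically be cut(x, breaks = quantile(x), include.lowest = TRUE) followed by table(). The helper names quartile_bins and contingency_table are hypothetical, and the data are made-up illustrative values. statistics.quantiles with method="inclusive" should interpolate like R's default quantile(), and bisect_left puts a value equal to a break into the lower bin, like cut()'s right-closed intervals.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def quartile_bins(values):
    """Label each value Q1..Q4, using the sample quartiles as break points."""
    # 'inclusive' interpolates between order statistics like R's default quantile()
    cuts = quantiles(values, n=4, method="inclusive")  # three interior cut points
    # bisect_left sends a value equal to a cut point into the lower bin,
    # mirroring cut()'s right-closed intervals (a, b]
    return [f"Q{bisect_left(cuts, v) + 1}" for v in values]


def contingency_table(xs, ys):
    """Cross-tabulate two factors: count every (x level, y level) combination."""
    counts = Counter(zip(xs, ys))
    rows, cols = sorted(set(xs)), sorted(set(ys))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Usage with illustrative data: bin a continuous variable, then tabulate it against a factor
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]
rows, cols, table = contingency_table(quartile_bins(hours), passed)
print(rows, cols, table)
# → ['Q1', 'Q2', 'Q3', 'Q4'] ['no', 'yes'] [[2, 0], [1, 1], [0, 2], [0, 2]]
```

With heavily tied data several quartiles can coincide, leaving some bins empty; R's cut() would instead stop with an error about non-unique breaks.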
How do you access a multidimensional table in R?
Multidimensional tables are accessed like matrices and data frames: with square brackets containing n-1 commas, where n is the number of dimensions (e.g. x[i, j, k] for a three-dimensional table).
Explain the independence table.
It gives the number of observations expected in each cell if there were no dependency between the variables (i.e. if they were independent):
Expected = (row total * column total) / grand total
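The formula can be checked with a short Python sketch (the function name expected_counts is hypothetical, the counts are illustrative). In R, chisq.test(x)$expected should return the same table.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


observed = [[10, 20],
            [30, 40]]  # illustrative counts
print(expected_counts(observed))  # → [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the observed row and column totals; only the distribution inside the table changes.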
What are Pearson residuals?
Pearson residuals are standardized measures of the discrepancy between observed (O) and expected (E) frequencies in a contingency table, computed per cell as (O - E) / sqrt(E). A positive residual means a cell contains more cases than expected under independence, a negative one fewer.
They are used to assess the association or independence between categorical variables: cells with large residuals show where the data depart from independence, and the pattern and signs of the residuals show the direction of the association.
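A small Python sketch of the per-cell formula (hypothetical function name, illustrative counts); R's chisq.test(x)$residuals should give the same values. Zero expected counts (an empty row or column) would cause a division by zero and are not handled here.

```python
import math


def pearson_residuals(observed):
    """Per-cell Pearson residuals (O - E) / sqrt(E) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    residuals = []
    for row, r in zip(observed, row_totals):
        # expected count of each cell under independence
        expected = [r * c / total for c in col_totals]
        residuals.append([(o - e) / math.sqrt(e) for o, e in zip(row, expected)])
    return residuals


for row in pearson_residuals([[10, 20], [30, 40]]):  # illustrative counts
    print([round(x, 3) for x in row])
# → [-0.577, 0.471]
# → [0.378, -0.309]
```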
What is the χ² (chi-square) value?
The χ² (chi-square) value is a test statistic used in contingency table analysis to quantify the discrepancy between the observed frequencies and the frequencies expected if the categorical variables were independent.
It sums the squared deviations over all cells, χ² = Σ (O - E)² / E, which is the sum of the squared Pearson residuals. It therefore measures how far the observed data deviate from what would be expected under independence.
A larger χ² value indicates a greater discrepancy between observed and expected frequencies, i.e. stronger evidence against independence; a smaller value means the data are closer to independence. Because χ² grows with the sample size, it is not by itself a measure of how strong the association is; effect sizes such as Cohen's w are used for that.
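The statistic can be sketched in Python (hypothetical function name, illustrative counts). This version applies no continuity correction; R's chisq.test() uses Yates' correction for 2x2 tables by default, so chisq.test(x, correct = FALSE) is the comparable call. Doubling every count keeps the proportions identical but doubles χ², which shows why χ² alone does not measure the strength of an association.

```python
def chi_square_statistic(observed):
    """Pearson's chi-square statistic: sum over all cells of (O - E)**2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / total  # expected count under independence
            stat += (o - e) ** 2 / e
    return stat


table = [[10, 20], [30, 40]]                       # illustrative counts
doubled = [[2 * o for o in row] for row in table]  # same proportions, twice the sample size
print(chi_square_statistic(table))    # about 0.794
print(chi_square_statistic(doubled))  # about 1.587, twice as large
```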
Which plots would you use for 1D (one-dimensional) data?
- Pie chart
- Bar plot
- Dot chart
Which plots would you use for 2D data?
- Association plot (displays residuals, so the focus is immediately on the Pearson residuals)
- Mosaic plot / bar plot (display absolute counts)
- Fourfold plot
The first two (association plot and mosaic/bar plot) need a contingency table.
What is the Poisson distribution?
The Poisson distribution is a discrete probability distribution that models the number of events occurring within a fixed interval of time or space, given a known average rate of occurrence. It has a single parameter, λ (lambda), the average number of events per interval, and assumes that events occur independently and at a constant average rate throughout the interval. The probability of observing exactly k events is P(X = k) = λ^k e^(-λ) / k!, and both the mean and the variance equal λ.
The Poisson distribution is often used to model rare or random events over a fixed interval, such as the number of phone calls a call center receives in a given hour or the number of accidents at a specific location in a day. It has applications in many fields, including insurance, telecommunications, and epidemiology.
Its lower limit is zero, but there is no upper limit (k = 0, 1, 2, ...).
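The probability mass function can be sketched in Python (the function name is hypothetical; the rate of 4 calls per hour is an illustrative assumption). In R, dpois(k, lambda) evaluates the same formula.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k! for a Poisson(lam) variable, k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0  # negative counts are impossible
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical example: a call center receives 4 calls per hour on average
lam = 4
print(round(poisson_pmf(2, lam), 4))  # exactly 2 calls in an hour: 8 * e**-4, about 0.1465
# The probabilities over k = 0, 1, 2, ... sum to 1, and their mean equals lam
probs = [poisson_pmf(k, lam) for k in range(100)]
print(round(sum(probs), 6), round(sum(k * p for k, p in enumerate(probs)), 6))
```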
What is the Chi-square distribution?
The chi-square distribution is a continuous probability distribution that arises in statistics and is used in various statistical tests and confidence interval calculations. It is typically used when working with categorical or count data. It is associated with the chi-square test statistic and is used to assess the significance of discrepancies between observed and expected frequencies.
The chi-square distribution is characterized by a parameter called degrees of freedom (df), which determines its shape.
For a test of independence in a two-way table: df = (number of levels of var1 - 1) * (number of levels of var2 - 1)
The chi-square distribution is right-skewed and takes only non-negative values. As df increases, it becomes more symmetric and approaches a normal (bell-shaped) curve.
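To connect df to a p-value, here is a Python sketch (hypothetical function names) that evaluates the upper tail P(X > x) of a chi-square distribution via the series expansion of the regularized lower incomplete gamma function. It is meant for moderate x; for very large x (tiny p-values) the subtraction 1 - P loses precision. In R, pchisq(x, df, lower.tail = FALSE) gives this tail probability directly.

```python
import math


def chi2_df(n_levels_var1, n_levels_var2):
    """Degrees of freedom of a chi-square test of independence on a two-way table."""
    return (n_levels_var1 - 1) * (n_levels_var2 - 1)


def chi2_upper_tail(x, df, tol=1e-12, max_terms=10_000):
    """P(X > x) for X ~ chi-square(df), i.e. the p-value of a chi-square statistic x."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    # Series: gamma_lower(s, z) = z**s * exp(-z) * sum_n z**n / (s (s+1) ... (s+n))
    term = 1.0 / s
    total = term
    for n in range(1, max_terms):
        term *= z / (s + n)
        total += term
        if term < tol * total:
            break
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total  # regularized P(s, z)
    return max(0.0, 1.0 - lower)


df = chi2_df(2, 3)  # e.g. a 2x3 table gives 2 degrees of freedom
print(df, chi2_upper_tail(5.991, df))  # 5.991 is close to the 95% quantile for df = 2
```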
What is Fisher's exact test?
Fisher’s exact test assesses the significance of the association between two categorical variables, typically in a 2x2 contingency table. It is used especially when sample sizes are small or when the assumptions of the chi-square test (which relies on large samples) are not met. Instead of an approximation, it computes the exact probability, from the hypergeometric distribution with the row and column totals held fixed, of obtaining the observed table or a more extreme one under the assumption that the variables are independent.
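For 2x2 tables, the confidence interval reported by fisher.test() is for the odds ratio. R reports the conditional maximum-likelihood estimate of the odds ratio, which can differ slightly from the sample odds ratio.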
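A minimal Python sketch of the two-sided exact p-value for a 2x2 table, assuming the convention also used by R's fisher.test(): sum the probabilities of all tables with the same margins that are no more likely than the observed one. The function name is hypothetical, the table is illustrative, and the sketch does not compute the odds-ratio confidence interval.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total_tables = comb(n, col1)

    def prob(x):
        # hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are not lost to rounding
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Illustrative 2x2 table: 3 and 1 in the first row, 1 and 3 in the second
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # → 0.4857
```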
What is the odds ratio?
The odds ratio compares the odds of an event in one group with the odds of the same event in another group. It quantifies the strength and direction of the association between two categorical variables.
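odds = event did occur / event did not occur
odds range from 0 to infinity
probability of 0.5 == odds of 1.0
probability of 1/3 (≈ 0.33) == odds of 0.5
probability of 0.75 == odds of 3
odds = probability / (1 - probability)
odds ratio = odds1 / odds2
(2x2 contingency table with cells a, b in group 1 and c, d in group 2: odds ratio = (a/b) / (c/d) = (a*d) / (b*c); an odds ratio of 1 means no association)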
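The conversions and the 2x2 odds ratio can be sketched in Python (hypothetical function names; rows are assumed to be the two groups and columns "event" / "no event"). This is the sample odds ratio, so it may differ slightly from the conditional estimate printed by fisher.test().

```python
def odds_from_probability(p):
    """odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]] = (a/b) / (c/d) = ad / bc.

    Rows are the two groups, columns are 'event' / 'no event'; a zero in b or c
    raises ZeroDivisionError (the odds ratio is then infinite or undefined)."""
    return (a * d) / (b * c)


for p in (0.5, 1 / 3, 0.75):
    print(round(p, 2), round(odds_from_probability(p), 2))
print(odds_ratio(3, 1, 1, 3))  # odds 3 vs. odds 1/3 → odds ratio 9.0
```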
What are Effect sizes?
Effect sizes in statistics quantify the magnitude of relationships or differences between variables. They indicate the practical significance of an effect beyond its statistical significance, make results easier to interpret, and allow comparisons across studies or contexts.
How do you use effect sizes?
To use effect sizes, consider the following steps:
1) Choose an appropriate effect size measure: Select the measure that fits the research question and the nature of the variables. For example, Cohen’s d is commonly used to measure the difference between means, while a correlation coefficient measures the strength and direction of a linear relationship. For contingency tables:
- Cohen's w (larger contingency tables): w = sqrt(χ² / N)
- Cohen's h (only for 2x2 tables, i.e. comparing two proportions): h = 2·arcsin(√p1) - 2·arcsin(√p2)
- Odds Ratio
- Relative Risk
2) Calculate the effect size using the formula for the chosen measure.
3) Interpret the effect size: Assess its magnitude using established guidelines or benchmarks (e.g. Cohen's conventions for w: 0.1 small, 0.3 medium, 0.5 large).
4) Consider practical significance: Alongside statistical significance, evaluate the practical significance of the effect size. Consider the implications of the effect in real-world terms, such as the clinical relevance, practical impact, or meaningfulness of the findings.
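Steps 1 to 3 for the two contingency-table measures can be sketched in Python; the function names are hypothetical and the inputs are made-up illustrative numbers. For a 2x2 table, w equals the absolute value of the phi coefficient.

```python
import math


def cohens_w(chi2, n):
    """Cohen's w for a chi-square test on a contingency table: sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)


def cohens_h(p1, p2):
    """Cohen's h for two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Illustrative numbers: chi-square of 9 from 100 observations, proportions 0.75 vs 0.25
print(round(cohens_w(9.0, 100), 3))         # 0.3, "medium" by Cohen's conventions for w
print(round(abs(cohens_h(0.75, 0.25)), 3))  # about 1.047 (pi / 3)
```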
Significance level (α) and type I/II errors.
- α is a decision threshold, the significance level
- most commonly used in science: α = 0.05
- but this choice is a convention and completely arbitrary
- lowering α → fewer false positives, more false negatives
- increasing α → fewer false negatives, but more false positives
- rejecting H0 when H0 is true → type I error (false positive)
- not rejecting ("accepting") H0 when H0 is false → type II error (false negative)
- α sets the probability of a type I error (given that H0 is true)
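The claim that α sets the type I error rate can be checked with a small simulation, sketched in Python under assumed conditions: a two-sided z-test whose statistic is standard normal when H0 is true. The function name, seed and number of simulations are arbitrary choices.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha, n_sim=100_000, seed=1):
    """Fraction of simulated tests that reject a true H0 at significance level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Under H0 the z statistic is standard normal, so every rejection is a false positive
    rejections = sum(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(n_sim))
    return rejections / n_sim


for alpha in (0.01, 0.05, 0.10):
    print(alpha, type_one_error_rate(alpha))  # each rate lands close to its alpha
```

Because the same seed produces the same simulated statistics, a lower α can only reject fewer of them, matching the trade-off in the list above.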