Wk 3 - Research Questions for Associations - Contingency Tables and Correlation Flashcards
What is a confidence interval?
A confidence interval defines a range of plausible values for the unknown population parameter that we are interested in making inferences about from our single observed sample statistic.
What data is required to construct a confidence interval?
- The observed sample statistic
- The standard error for the sample statistic
- The critical test statistic (defined by alpha)
What is the level of confidence?
100 x (1 - alpha)%
So with an alpha level of 0.05, the level of confidence will be 95%
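As a minimal sketch of how the three ingredients combine (sample statistic, standard error, critical value), assuming a normal (z) critical value. The function name and the numbers are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist

def confidence_interval(estimate, standard_error, alpha=0.05):
    """Return (lower, upper, level) for estimate +/- critical value * SE.

    Assumes a normal (z) critical value; a t critical value would
    usually replace it for small samples.
    """
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = critical * standard_error
    level = 100 * (1 - alpha)  # level of confidence as a percentage
    return estimate - margin, estimate + margin, level

# Hypothetical sample statistic and standard error.
lower, upper, level = confidence_interval(estimate=50.0, standard_error=2.0)
print(f"{level:.0f}% CI: [{lower:.2f}, {upper:.2f}]")  # 95% CI: [46.08, 53.92]
```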
What does a confidence interval allow us to do?
Make inferences about a population parameter based on a sample statistic.
What does a 95% confidence interval tell us about the corresponding population parameter?
That we can be 95% confident that the population parameter corresponding to the observed sample statistic lies between the lower and upper bound of the CI.
When does the value of p increase in a p-value function plot (when conducting multiple null hypothesis tests)?
The closer the null hypothesised value is to the observed sample statistic (e.g. the sample mean), the larger the p value. The further away it is, the smaller the p value (and the more likely p will be less than 0.05)
Why is a confidence interval more informative than a null hypothesis significance test?
Because it defines a range of plausible values for the population parameter we are interested in, rather than just a single value (as in NHST).
What can’t a p-value from a NHST tell us?
Can’t tell us about the range of values for the corresponding population parameter. The p value is only indirectly relevant when making inferences about the population, while the CI is directly relevant.
Is a confidence interval an expression of probability?
No! Doesn’t tell us that there’s a 95% chance a given population parameter will be between the upper and lower bound of CI.
Only in the long run, over repeated applications will 95% of CIs contain the population parameter.
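The long-run reading can be sketched with a small simulation. It assumes a normally distributed population with a known standard deviation, so a z-based interval for the mean applies. All names and parameter values are hypothetical:

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=100.0, sigma=15.0, n=25, reps=2000, alpha=0.05, seed=1):
    """Proportion of simulated CIs that contain the true population mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5  # standard error of the mean with known sigma
    hits = 0
    for _ in range(reps):
        sample_mean = mean(rng.gauss(true_mean, sigma) for _ in range(n))
        # Each replication builds its own CI around its own sample mean.
        if sample_mean - z * se <= true_mean <= sample_mean + z * se:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95 over many repetitions
```

Any single interval either contains the parameter or it does not; the 95% describes the procedure across repetitions.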
What kind of statistical techniques are used for ASSOCIATIONS?
- For categorical data, a contingency table, chi-squared and odds ratio
- For continuous data, correlation.
What kind of association is investigated using correlation?
Associations between continuous variables
What does an association imply?
A relationship between variables (a systematic co-occurrence between two variables)
What is meant by categorical data?
Nominal or ordinal scales of measurement. Use of numbers to label categories is arbitrary.
What is meant by continuous data?
Values imply some sort of meaningful order. Low scores imply that a person has less of a construct (and vice versa for high scores)
What are the four components of a research question?
- A question mark
- Type of association
- identifying relevant population
- Defining and measuring constructs
What is chi-squared used to measure?
An association between categorical variables.
What is correlation used to measure?
An association between continuous variables.
What is meant by a contingency between two variables?
An association in which there is a dependency between the frequencies of one categorical variable and the frequencies of the other.
What is an independent relationship between variables?
No association.
What is a dependent relationship between variables?
An association in which the frequencies in one category co-occur with frequencies in another category.
What is a Chi Squared test?
A null hypothesis significance test of the association between the frequencies of two categorical variables.
What does the null hypothesis for a chi-squared NHST say?
That there is no relationship between variables. That observed and expected frequencies are the same.
What does the Chi Squared statistic measure?
The difference between observed & expected frequencies.
What is the formula for calculating chi squared?
Chi squared = sum of [(observed frequency - expected frequency)^2 / expected frequency], summed over every cell of the contingency table
How are expected frequencies determined?
(row marginal frequency x column marginal frequency) / total sample size
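A minimal sketch of both calculations, assuming the expected frequency for a cell is its row total times its column total divided by the overall total N. The table of counts and the function names are hypothetical:

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2 x 2 table of observed counts.
table = [[30, 10],
         [20, 40]]
print(expected_frequencies(table))   # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi_squared(table), 2))  # 16.67
```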
What is the effect of large effects and large sample sizes on Tobs?
The observed test statistic will be larger when the effect is bigger and/or the sample size is bigger (pref. both)
What is an Odds Ratio?
The odds of an event occurring in one category of a variable relative to the odds of the same event occurring in a second, different category of that variable.
What are odds?
The probability of some event occurring relative to the probability of it not occurring.
What are the odds of an event occurring?
P/ (1 - P)
What are the 5 properties of odds ratios?
- OR = 1 (equals no association, variables are independent of one another)
- OR can’t be negative (range from 0 - infinity)
- The further away from ‘1’, the stronger the association (i.e. closer to 0, or closer to infinity)
- Odds ratio is undefined if any cell value is zero
- The odds ratio does not depend on which variable defines the rows and which one defines the columns of the contingency table
What is the formula for odds ratio?
(a x d) / (b x c)
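A short sketch of odds and the odds ratio, assuming the 2 x 2 table is laid out with a and b in the top row and c and d in the bottom row. The layout, function names and counts are illustrative assumptions:

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(a, b, c, d):
    """Odds ratio (a x d) / (b x c) for a table laid out as [[a, b], [c, d]]."""
    if 0 in (a, b, c, d):
        # Follows the property listed above: undefined if any cell is zero.
        raise ValueError("odds ratio is undefined when a cell count is zero")
    return (a * d) / (b * c)

# Hypothetical counts: a=30, b=10, c=20, d=40.
print(odds(0.75))                  # 3.0
print(odds_ratio(30, 10, 20, 40))  # 6.0
# Swapping rows and columns (transposing the table) gives the same value.
print(odds_ratio(30, 20, 10, 40))  # 6.0
```

The same value comes from dividing the odds in the top row (30/10 = 3) by the odds in the bottom row (20/40 = 0.5).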
Can odds ratios be formed from contingency tables bigger than 2 x 2?
No
How is an odds ratio larger than 1 interpreted in terms of the variables in the contingency table?
The odds of the event are higher for the category in the ‘leading cell’ (a) than for the other category.
Why is odds ratio useful as a measure of effect size?
Because the OR is both:
- a sample statistic
- an estimate of an (unknown) population parameter
What does a CI that contains 1 tell us?
That a null association is possible at a population level.
What does a CI that does not contain 1 tell us?
That the odds ratio is significantly different from the null hypothesised value of 1.
It also provides a range of plausible values for the association at a population level.
How is an odds ratio that is less than one interpreted?
By inverting the odds ratio
= 1/odds ratio value (when this value is less than one)
How is a confidence interval for an odds ratio inverted?
By dividing 1 by each bound and then swapping the bounds around, e.g. new lower bound = 1/upper bound value; new upper bound = 1/lower bound value.
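A small sketch of the inversion, using a hypothetical odds ratio and interval. The function name is illustrative only:

```python
def invert_odds_ratio(or_value, lower, upper):
    """Invert an odds ratio and its CI; the bounds swap places."""
    # 1/upper becomes the new lower bound, 1/lower the new upper bound.
    return 1 / or_value, 1 / upper, 1 / lower

# Hypothetical OR of 0.25 with a CI of [0.125, 0.5].
inv_or, inv_lower, inv_upper = invert_odds_ratio(0.25, 0.125, 0.5)
print(inv_or, inv_lower, inv_upper)  # 4.0 2.0 8.0
```

Here the odds would be read as 4 times lower for the leading cell, with a plausible range of 2 to 8 times lower.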
How is an inverted odds ratio interpreted?
The odds are lower for the people in the ‘leading cell’ (a) by a factor equal to the inverted odds ratio. The inverted confidence interval gives the range of plausible values for how many times lower the odds are for people in that category (cell a) than for people in the other category.
What cell determines the interpretation of the odds ratio?
The ‘a’ cell in the contingency table.
Is the sample odds ratio a biased or unbiased estimator?
Biased but consistent (it estimates the population odds ratio more accurately as sample size increases).
The confidence intervals are not biased.
What does sample variance allow us to quantify?
The variability in construct scores
What is covariance?
The extent to which scores on measures of two constructs systematically vary together
What formula is used to calculate sample variance?
s^2 is the sum of squared deviation scores divided by the sample size minus 1 (df)
s^2 = sum((X - M)^2) / (n - 1)
What is the formula for covariance?
s_xy = sum((X - Mx) * (Y - My)) / (n - 1)
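Both formulas can be sketched in a few lines. The scores below are hypothetical and the function names are illustrative:

```python
from statistics import mean

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical scores on two measures for five people.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sample_variance(xs))        # 2.5
print(sample_variance(ys))        # 1.5
print(sample_covariance(xs, ys))  # 1.5
```

The covariance of a variable with itself is its variance, which is one way to check the two functions against each other.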
What is a standardised covariance score called?
The Pearson product-moment correlation coefficient.
What is the formula for Pearson’s correlation?
r_xy = sum(Zx * Zy) / (n - 1)
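A minimal sketch of the z-score form, assuming z-scores are computed with the sample standard deviation (n - 1 in the denominator). The data are hypothetical:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r as the sum of z-score cross-products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical scores on two measures for five people.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # 0.7746
```

Because the scores are standardised first, rescaling either variable (for example, multiplying every x by 10) leaves r unchanged.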
What value indicates no correlation?
Zero (there is no linear association between the variables)
What value indicates a perfect correlation?
+1 (perfect positive correlation) or -1 (perfect negative correlation)
What range of values can the correlation coefficient be between?
Between +1 and -1
What does a correlation between 0 and -1 indicate?
A negative correlation (high scores on one variable are associated with low scores on the other)
What does a correlation between 0 and +1 indicate?
A positive correlation (high scores on one variable are associated with high scores on the other, and low scores with low scores)
What is the advantage of using a standardised metric when calculating ‘r’?
One correlation can be compared to another because they are in a standardised metric.
What happens to the strength of association as r approaches +1 or -1?
The association becomes stronger.
How does a negative correlation look on a scatterplot?
Data points go from top left to bottom right.
How does a positive correlation look on a scatterplot?
Data points go from bottom left to top right.
What does the size of the correlation indicate?
Its strength
What does the sign (+ or -) tell us about a correlation?
Its direction
What theoretical probability distribution is associated with the sampling distribution of the correlation coefficient?
It is t-distributed
What happens to the t-distribution as the degrees of freedom increase?
It becomes closer and closer to a normal distribution
Why is ‘r’ a natural measure of effect size?
Because it tells us both the strength and direction of the association.
What does the null hypothesis for a sample correlation state?
That the population correlation, ρ (rho), will equal 0
What are the degrees of freedom for a correlation?
n - 2 (one data point can’t move for each variable)
How is the observed test statistic calculated for a correlation coefficient?
Tobs = (r - 0) / standard error
What will produce a larger Tobs?
- A larger value for r (increases the numerator and reduces standard error)
- Bigger sample size (which reduces standard error)
What effect do r and sample size have on standard error?
Larger ‘r’ and larger sample size both reduce standard error.
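A sketch of the test statistic, assuming the commonly used standard error sqrt((1 - r^2) / (n - 2)). The cards do not state this formula, but it is consistent with a larger r and a larger n both reducing the standard error. The values of r and n are hypothetical:

```python
from math import sqrt

def correlation_t(r, n):
    """Return (standard_error, t_obs, df) for testing rho = 0."""
    df = n - 2
    se = sqrt((1 - r ** 2) / df)  # assumed standard error formula
    t_obs = (r - 0) / se          # null hypothesised value is 0
    return se, t_obs, df

# Same r, larger sample: smaller SE, larger Tobs.
print(correlation_t(0.5, 27))
print(correlation_t(0.5, 102))
# Same n, larger r: smaller SE, larger Tobs.
print(correlation_t(0.8, 27))
```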
When are you likely to get a small p value?
When there is a large t-value (larger sample size, larger ‘r’ and bigger T-obs all lead to smaller p)
What does a confidence interval tell us about a sample correlation coefficient?
It provides a range of possible values between which the corresponding population parameter may be found.
What does a CI that contains a value of zero tell us?
That a correlation of zero is plausible at a population level, so the sample correlation is not significantly different from zero.
What happens to the size of the CI as sample size increases?
The width of the CI decreases, so the estimate becomes more precise.
What are the assumptions of the Pearson correlation coefficient?
That:
- The two variables are linearly related to each other
- Both variables are continuous
- Observations are independent
- Both variables are normally distributed
- Both variables are measured without error
- Both variables are unrestricted in their range
What kind of estimator is the Pearson Correlation Coefficient?
A biased but consistent estimator.
What happens to the Pearson correlation when its assumptions are violated?
It becomes more biased
What happens to the Pearson coefficient when scores on both variables are normally distributed?
Both point and interval estimates are unbiased.
What happens to the Pearson correlation when both variables are non-normally distributed?
The estimator for sample correlation remains unbiased, but the confidence interval becomes biased and inconsistent.