Quiz 3 Flashcards
Quiz 3 for Stats I
Flip side of power ( __%) is ___ (__%)
80%, type II error / beta (20%)
directional hypothesis is to _____ as non-directional is to ____
one tailed, two tailed
What is the mean of the distribution of a z statistic when H0 (null) is true?
mean of zero
When null is false, H1 is ____
true
Power
- The greater the number of subjects in a sample, the more power (generally)
- Though the incremental increase in power becomes smaller as the sample size increases
- Power analysis is a statistical method to calculate the sample size needed to achieve desired power
Influences on Power
- Significance level (alpha), more lenient alpha = more power
- One-tailed vs two tailed test (one tailed has more power)
- As standard error decreases, there is more power
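The influences listed above can be checked numerically. The following is a minimal sketch, assuming a one-sample z-test with a known SD and a positive true effect. The function name `z_test_power` and all numbers are illustrative and not from the course material.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05, one_tailed=False):
    """Approximate power of a one-sample z-test for a positive true effect."""
    std_normal = NormalDist()
    standard_error = sd / n ** 0.5
    shift = effect / standard_error  # mean of the z statistic when H1 is true
    if one_tailed:
        critical = std_normal.inv_cdf(1 - alpha)
        return 1 - std_normal.cdf(critical - shift)
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-tailed: the null can be rejected in either tail
    upper = 1 - std_normal.cdf(critical - shift)
    lower = std_normal.cdf(-critical - shift)
    return upper + lower


baseline = z_test_power(effect=0.5, sd=1.0, n=25)
larger_n = z_test_power(effect=0.5, sd=1.0, n=50)
largest_n = z_test_power(effect=0.5, sd=1.0, n=75)
lenient_alpha = z_test_power(effect=0.5, sd=1.0, n=25, alpha=0.10)
one_tailed_power = z_test_power(effect=0.5, sd=1.0, n=25, one_tailed=True)

print(f"baseline (n=25, alpha=.05, two-tailed): {baseline:.3f}")
print(f"n=50: {larger_n:.3f}, n=75: {largest_n:.3f}")
print(f"alpha=.10: {lenient_alpha:.3f}, one-tailed: {one_tailed_power:.3f}")
```

A larger n shrinks the standard error, which is why it raises power. The gain from n=50 to n=75 is smaller than the gain from n=25 to n=50.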
more lenient alpha = _____ power
more
Which has more power: a one-tailed or a two-tailed t-test?
one tailed
As standard error decreases, power ________
increases
Why are we more likely to reject the null hypothesis when there is less standard error?
there is less overlap between the graphs of the null and alternative distributions
One-tailed test
Tests a directional hypothesis (specific direction of an effect)
E.g. change > 0
Two-tailed test
Tests a non-directional hypothesis (doesn’t specify the direction of an effect)
E.g. Change ≠ 0
Why does a one-tailed test provide us with greater power?
A one-tailed test provides greater power because all of alpha sits in one tail, so its critical value is smaller than the cutoff for a two-tailed test
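A quick way to see the smaller cutoff is to compute both critical values. This sketch assumes a z-test at alpha = .05; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed: all of alpha sits in one tail
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split evenly across both tails
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)

print(round(one_tailed_cutoff, 3), round(two_tailed_cutoff, 3))  # 1.645 1.96
```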
Researchers tend to use a two tailed test in order to ______.
discriminate between a zero effect and a negative effect
How are one-tailed tests often misused?
One-tailed tests are often misused in a way that increases the Type I error rate.
This happens when people reject the null after an apparent effect shows up in the tail opposite to the one the directional hypothesis predicted.
Effect Sizes
- A standardized measure of the size of an effect
- Standardized = comparable across studies
- Not (as) reliant on the sample size
- Allows people to objectively evaluate the size of the observed effect
- Amount that two populations do not overlap
- E.g. Cohen’s d, correlation
Types of effect sizes
The family of effect size measures has been categorized into TWO broad groups:
Measures of mean differences (e.g. Glass’s Delta, Cohen’s d)
Measures of strength of relations (e.g. r, R^2, eta squared)
Two groups of measurement of effect sizes
- Measures of Mean differences
- Measures of strength of relations
Measures of mean differences examples
Glass’s Delta, Cohen’s d, Hedges g
Measures of strength of relation examples
r, R^2, eta squared
Measures of Mean Differences
Largely calculated in the same manner
E.g. d = (mean1 - mean2) / population SD
We usually don’t know the population SD
The measures differ in how they estimate the population SD
Glass’s delta
- uses the SD of the control group because it is “untainted” by the treatment
- Strength of this metric depends on the size of the control group: the larger the control group, the better its SD estimates the population SD
Cohen’s d
- combines the SD of both groups
- Because Cohen’s d pools information from the SDs of both groups, when the SDs across groups are very different, it makes more sense to use Glass’s delta
Hedges g
- Multiplies Cohen’s d by a correction factor (a scalar)
- Cohen’s d tends to overestimate the magnitude of the effect in smaller samples
- Corrects for cohen’s d bias, so better to use in smaller samples (20 or less)
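The three mean-difference measures can be compared side by side. This is a minimal sketch with made-up scores. The correction factor used for Hedges’ g, 1 - 3 / (4·df - 1), is the common approximation and is an assumption here, since the notes only say "some scalar".

```python
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Mean difference divided by the pooled SD of both groups."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * stdev(treatment) ** 2 + (n2 - 1) * stdev(control) ** 2
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


def glass_delta(treatment, control):
    """Mean difference divided by the control group's SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)


def hedges_g(treatment, control):
    """Cohen's d shrunk by a small-sample correction factor (assumed form)."""
    df = len(treatment) + len(control) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(treatment, control) * correction


# Illustrative data: the treatment group is more variable than the control group
treatment = [12, 14, 15, 17, 22]
control = [10, 11, 12, 13, 14]

d = cohens_d(treatment, control)
delta = glass_delta(treatment, control)
g = hedges_g(treatment, control)
print(f"d = {d:.3f}, delta = {delta:.3f}, g = {g:.3f}")
```

Because the group SDs differ a lot in this example, Glass’s delta and Cohen’s d disagree noticeably, which is the situation where the notes recommend Glass’s delta.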
As we have less variability, effect size becomes _____, as SD is in denominator, even with the same degree of mean difference
bigger
Measures of Strength of Relations
- Effect sizes based on variance explained
- These effect sizes estimate the amount of the variance in an outcome variable that is explained or “accounted for” by the model/predictor variables
- Examples include r, the correlation coefficient, and r^2, the coefficient of determination
- r ranges from -1 to +1 (negative to positive correlation)
- Tells us the size/strength of the association and the direction of the association of the two variables
Some useful guidelines for the magnitude of effect sizes
r = .1, d = .2 (small effect):
the effect explains 1% of the total variance
r = .3, d = .5 (medium effect):
the effect accounts for 9% of the total variance
r = .5, d = .8 (large effect):
the effect accounts for 25% of the total variance
Importance of Effect Size
- Estimates of anticipated ES can be used to project the sample size that would be adequate for detecting statistically significant results → power analysis
- They enable researchers to inform judgment about the practical significance of the study
- Because effect sizes are standardized measures of the size of mean differences or strength of relations, they are used to compare the results of different studies with one another and to be used in meta-analysis
- A qualitative approach would be a systematic review
Power Analysis
Estimates of anticipated ES can be used to project the sample size that would be adequate for detecting statistically significant results
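As a rough illustration of projecting sample size from an anticipated effect size, the sketch below uses the normal approximation for a two-group, two-tailed comparison of means. The function name `n_per_group` is hypothetical, and dedicated power software based on the t distribution will give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group to detect a standardized difference d."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = std_normal.inv_cdf(power)          # z for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(label, d, n_per_group(d))
```

Smaller anticipated effects need far larger samples to reach the same 80% power.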
______ enable researchers to inform judgment about the practical significance of the study
effect size
Because _______ are standardized measures of the size of mean differences or strength of relations, they are used to compare the results of different studies with one another and to be used in _______.
effect sizes, meta-analyses
Clinical Significance
Jacobson, Follette, and Revenstorf (1984)
- Clinically significant change conceptualized as return to normal functioning
- Clinical significance → the extent to which therapy moves someone outside the range of dysfunctional population or within the range of functional population
Significance testing does not tell us about _____.
effect size
Three cutoffs for clinical significance
a. 2 SDs above the dysfunctional mean
b. 2 SDs below the functional mean
c. Midpoint between the means of the functional and dysfunctional groups
Cutoff C, the midpoint between the means of the functional and dysfunctional groups is usually considered better criterion for clinical significance, BUT it requires _______.
- normative data from the general population
- In the absence of normative data, we cannot use cutoff b or c
- when there is norm data, but the two distributions do not overlap, it is more appropriate to use cutoff b than c
When there is norm data, but the two distributions do not overlap, it is more appropriate to use cutoff _____ than _____
b (2 SDs below the functional mean) than c (the midpoint between the means of the functional and dysfunctional groups)
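The three cutoffs can be written out directly. This sketch assumes higher scores mean better functioning (flip the signs for a symptom scale) and uses the simple midpoint for cutoff c, as the cards describe it. The function name and all numbers are made up for illustration.

```python
def clinical_cutoffs(dys_mean, dys_sd, func_mean, func_sd):
    """Return cutoffs a, b, c, assuming higher scores = better functioning."""
    return {
        "a": dys_mean + 2 * dys_sd,       # 2 SDs above the dysfunctional mean
        "b": func_mean - 2 * func_sd,     # 2 SDs below the functional mean
        "c": (dys_mean + func_mean) / 2,  # midpoint between the two means
    }


cutoffs = clinical_cutoffs(dys_mean=40, dys_sd=7, func_mean=60, func_sd=8)
print(cutoffs)
```

Cutoffs b and c need the functional (normative) mean, which is why they cannot be used without norm data.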
Reliable Change Index Equation
RCI = (score 2 - score 1) / SEdiff
RCI = (score 2 - score 1) / SEdiff meaning
- What this formula is capturing is whether the degree of change the individual has achieved is greater than what we would expect to see by random chance
Interpreting RCI Scores
RCI ≥ 1.96 : reliable change (not due to a measurement error)
RCI < 1.96: unreliable change (could be due to a measurement error)
Sometimes hard to use as a dichotomous measure, researchers can also use it as a continuous measure
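A small worked example of the RCI formula follows. The cards do not say how SEdiff is obtained. The helper `se_diff` uses the standard Jacobson and Truax formulation (standard error of measurement from the baseline SD and the test-retest reliability), which is an assumption relative to these notes. The scores are illustrative.

```python
from math import sqrt


def se_diff(sd_baseline, reliability):
    """Standard error of the difference between two scores (assumed formula)."""
    se_measurement = sd_baseline * sqrt(1 - reliability)
    return sqrt(2 * se_measurement ** 2)


def reliable_change_index(score_1, score_2, sediff):
    """RCI = (score 2 - score 1) / SEdiff, as on the card."""
    return (score_2 - score_1) / sediff


sediff = se_diff(sd_baseline=7.5, reliability=0.80)
rci = reliable_change_index(score_1=32.5, score_2=47.75, sediff=sediff)
is_reliable = abs(rci) >= 1.96
print(f"SEdiff = {sediff:.2f}, RCI = {rci:.2f}, reliable change: {is_reliable}")
```

The absolute value is used in the check so that a reliable decline is flagged as well as a reliable improvement.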
Cleaning Data
- Organizing / making sure data is entered correctly
- Some say this should take up the majority of our time
- A method to do this can be double entry
- To compare the two entries of data and check the accuracy of entry
- Another method is running descriptives
Missing Data:
- Begins with the question as to why data are missing
- Knowing the source of the missing data is essential to know what to do about it and to achieve good statistical measures
- Missingness that happens at random is less problematic than missingness that follows a pattern
Rubin’s Taxonomy of Missing Data
- MCAR
- MAR
- MNAR
MCAR
- Missing Completely At Random
- The probability that an observation (Yi) is missing is unrelated to the value of Yi or the value of any other variable (e.g. X)
- Missingness is NOT related to the characteristics of the participant/case (e.g. a questionnaire is lost or there is a data entry error)
- The best case-scenario of missingness because it is random, and will not lead to bias in the estimated parameters
- Can analyze and still have valid results
Downside of MCAR
can artificially decrease sample size → loss in statistical power
MAR
- Missing At Random
- The probability that an observation (Yi) is missing is unrelated to the value of Yi but is related to the value of other variables (e.g. X)
- As long as another variable in the data set can explain it → MAR
- E.g. younger respondents are more likely to skip the income question on a survey
MNAR
- Missing Not At Random
- The probability that an observation Yi is missing is related to the value of Yi
- MOST problematic
- When we have data that are MNAR, it is nonignorable
- Most difficult to address
- Run two sets of analyses
MNAR requires a _____ analysis
what if
Two sets of analyses we run for MNAR
- One for primary analyses to answer research question (comparing means, correlations etc)
- The other for modeling possible scenarios of the missingness → what-if analysis
- Then compare and see if results from primary analysis are consistent across different scenarios of MNAR
______% of missing data is considered acceptable for proceeding with analysis of data
5-20%
Power of tests for MCAR depends on ______
sample size
MCAR test in SPSS
If NOT significant → MCAR
But does not tell us exactly what we want to know about the missing data
If significant → MAR or MNAR
Doesn’t tell us what variables have the missingness / violate MCAR
We need to do more work to identify that
MCAR vs. MAR Procedure
Create dummy variables for missingness
Binary variable 1= missing; 0 = not missing
If continuous → use t-test
If categorical → use chi-squared
When there are significant relations → data are likely MAR and not MCAR because other variables in the data set can be used to predict the missingness in the variable
Cannot necessarily rule out MNAR
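The dummy-variable procedure above can be sketched with the standard library alone. The data are invented (income is missing mostly for younger respondents), and the sketch computes only a Welch t statistic. A p-value would need a t distribution from a statistics package, which is left out here.

```python
from statistics import mean, variance

# Hypothetical survey data: None marks a skipped income question
ages = [22, 25, 27, 31, 38, 45, 52, 58, 61, 66]
incomes = [None, None, 31000, None, 42000, 50000, 58000, 61000, 64000, 70000]

# Dummy variable for missingness: 1 = missing, 0 = not missing
missing = [1 if value is None else 0 for value in incomes]

age_missing = [a for a, m in zip(ages, missing) if m == 1]
age_observed = [a for a, m in zip(ages, missing) if m == 0]


def welch_t(group_a, group_b):
    """t statistic for a difference in means without assuming equal variances."""
    se = (variance(group_a) / len(group_a)
          + variance(group_b) / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se


t_stat = welch_t(age_missing, age_observed)
print(f"mean age if income missing: {mean(age_missing):.1f}")
print(f"mean age if income observed: {mean(age_observed):.1f}")
print(f"Welch t = {t_stat:.2f}")
```

A large t in either direction means age predicts missingness on income, which points toward MAR rather than MCAR.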
MAR vs MNAR
- Measure some of the missing data (e.g. follow up on non-respondents)
- Then compare initial respondents and non- respondents on the variables with missing data
- Use scientific knowledge and knowledge of situations where MNAR may occur
- i.e. the more sensitive the nature of the variable you’re assessing, the less likely participants are to provide a response
What do you do if you have data that is MCAR?
Options to delete data or estimate the missing values through single imputation
Deleting Data for MCAR Types
Listwise and Pairwise
Listwise Deletion
- aka casewise deletion
- Subjects who have any missing data points are entirely deleted
- Con → loss of power
Pairwise Deletion
- Subjects with missing data points on the particular analysis you are running are not included in that analysis, but are included for analyses for which they do have data
Con of Listwise Deletion
loss of power
Con for Pairwise Deletion
- parameter estimates will be based on slightly different sets of data
- Degree of inconsistency across different analyses
Advantage of Pairwise Deletion over Listwise Deletion
- Makes use of all available data points
- Maintains largest possible sample size
Appropriate to use deletion methods if the data are MCAR or the percentage of missingness is ________.
5% or less
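The difference between the two deletion methods shows up in how many cases each analysis keeps. A minimal sketch with a made-up four-person data set follows. The helper names are illustrative.

```python
# Hypothetical data set: None marks a missing value
rows = [
    {"age": 25, "income": 40000, "stress": 12},
    {"age": 31, "income": None, "stress": 15},
    {"age": 47, "income": 62000, "stress": None},
    {"age": 52, "income": 58000, "stress": 9},
]


def listwise(rows):
    """Keep only cases with no missing values on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, var_x, var_y):
    """Keep cases that have data on the two variables in this analysis."""
    return [
        (r[var_x], r[var_y])
        for r in rows
        if r[var_x] is not None and r[var_y] is not None
    ]


print("listwise n:", len(listwise(rows)))
print("pairwise n (age, income):", len(pairwise(rows, "age", "income")))
print("pairwise n (age, stress):", len(pairwise(rows, "age", "stress")))
print("pairwise n (income, stress):", len(pairwise(rows, "income", "stress")))
```

Pairwise deletion keeps more cases per analysis, but each analysis ends up based on a slightly different set of people.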
Estimate the missing values using single imputation
- Mean for items on a scale that are not missing
- Mean for the missing variable (using available data)
- Mean for a subsample (e.g. men)
- Multiple regression (with or without random error)
- Use regression estimate to fill in missing value
Single Imputation
Filling in the missing values with some single number
Con of Single Imputation
- While we increase the sample size, we are artificially reducing the variability in the data
- This is important because the data might be missing because they are different from the rest of the data set
- Plugging in the typical values from existing data points may not be the best/most accurate approach
- This can artificially increase the strength of relations between variables and increase bias in hypothesis testing → spurious results
To address limitations of single imputation:
Add random errors intentionally to add variability and avoid data that is too homogeneous
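The loss of variability under single imputation, and the effect of adding random error, can be seen in a toy example. This sketch uses mean substitution rather than a fitted regression to stay short. The noise SD is simply set to the observed SD, which is an illustrative assumption and not a recommended rule.

```python
import random
from statistics import mean, variance

observed = [4.0, 5.0, 6.0, 7.0, 8.0]
n_missing = 3

# Plain mean substitution: every missing value gets the same number
mean_imputed = observed + [mean(observed)] * n_missing

# Mean substitution plus random error (seeded so the run is repeatable)
rng = random.Random(0)
noise_sd = variance(observed) ** 0.5  # assumed noise level for illustration
noisy_imputed = observed + [
    mean(observed) + rng.gauss(0, noise_sd) for _ in range(n_missing)
]

print(f"variance, observed only: {variance(observed):.2f}")
print(f"variance, mean-imputed: {variance(mean_imputed):.2f}")
print(f"variance, mean + random error: {variance(noisy_imputed):.2f}")
```

The mean-imputed data have a larger n but a smaller variance than the observed data, which is the artificial homogeneity the card warns about.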
For regression substitution, we only want to consider doing it with _____________ in order to add variability and avoid spurious results
added random error
For Single Imputations
Mean substitution should be avoided because it is not as good as regression substitution
Two modern “gold-standard” methods for dealing with missing data:
- Maximum likelihood procedures
- Multiple imputation
Univariate Outliers
some data points that have really unusual or unlikely values on one variable
Multivariate Outliers
combination of unusual scores on at least two variables
We can check the presence of univariate outliers using ___________.
a box plot
Checking for univariate outliers using box plot
- 2-stage flag/process for outliers built into SPSS
- Values between 1.5 and 3 box lengths from either end of the box are marked as a circle, meaning a potential outlier
- Values more than 3 box lengths away are in the “red zone” / “red range” and those values would be marked with an asterisk
- Indicates a more extreme outlier
Why use 1.5 box lengths to identify univariate outliers?
- Easy to see when the box plot is superimposed on a normal distribution
- Note: Not the same as SDs
- For normally distributed data, the box plus 1.5 box lengths on either side includes most of the data points (close to 99%), so this is why 1.5 box lengths is the first cutoff for outliers
- So a value outside that range is an unlikely value
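The two-stage flag can be reproduced by hand. In this sketch the quartiles come from Python’s `statistics.quantiles` with the inclusive method. SPSS computes the box edges with its own rule, so borderline cases may be flagged differently, and the scores are invented.

```python
from statistics import quantiles


def flag_outliers(values):
    """Return (value, label) pairs for values beyond 1.5 or 3 box lengths."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    flagged = []
    for v in values:
        distance = max(q1 - v, v - q3, 0)  # distance outside the box
        if distance > 3 * box_length:
            flagged.append((v, "extreme"))    # asterisk in SPSS
        elif distance > 1.5 * box_length:
            flagged.append((v, "potential"))  # circle in SPSS
    return flagged


scores = [10, 12, 13, 14, 15, 16, 17, 18, 26, 40]
flags = flag_outliers(scores)
print(flags)
```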
General approach to outliers
- Run analysis with and without the outliers
- If results do not change much, you can potentially keep the outliers and report results based on the data
- If the results do change → Trimming, Winsorizing, Transformation
Ways to Deal With Univariate Outliers
Trimming, Winsorizing, Transformation
Trimming
- Remove outliers
- Makes sense when outliers occur for random reasons
- Con → reduces power
Con of Trimming
reduces power
Winsorizing
- Bring outliers back towards the mean
- Replacing outlier values with another value
- 90th percentile, 95th percentile, sometimes 3 SD cutoff
- 90th percentile replacement would be described as 10% Winsorized
For Winsorizing, a 90th percentile replacement would be described as _______.
10% Winsorized
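Trimming and Winsorizing can be compared on the same scores. This sketch handles only the upper tail and takes the 90th percentile from `statistics.quantiles` with the inclusive method. Percentile rules differ between packages, so the exact cap is an assumption, and the scores are invented.

```python
from statistics import mean, quantiles

scores = [10, 12, 13, 14, 15, 16, 17, 18, 26, 40]

# 90th percentile = the last of the nine decile cut points
cap = quantiles(scores, n=10, method="inclusive")[-1]

# Trimming: drop values above the cap (sample size shrinks)
trimmed = [s for s in scores if s <= cap]

# Winsorizing: pull values above the cap back to the cap (sample size kept)
winsorized = [min(s, cap) for s in scores]

print(f"cap = {cap}")
print(f"original mean = {mean(scores):.2f}, n = {len(scores)}")
print(f"trimmed mean = {mean(trimmed):.2f}, n = {len(trimmed)}")
print(f"winsorized mean = {mean(winsorized):.2f}, n = {len(winsorized)}")
```

Trimming loses a case, and with it some power, while Winsorizing keeps the case but moves it closer to the rest of the data.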
Transformation
- Can help to pull unusual / high numbers towards other values
- E.g. square root, log transformations
- Can help to make statistical assumptions (e.g. normality) work better
- Can reduce impact of a single point/outlier
- Need to be extra careful with interpretation
For Trimming and Winsorizing, we do NOT want to do that with more than ______ of data.
5%
Note: Less than 5% of data should have little effect on p-value
- In the case of trimming, more than 5% can greatly reduce power
Multivariate Outliers Definition
A case that has an unusual combination of scores on 2 or more variables
Mahalanobis’ Distance
- A statistical procedure to discern if a case is an outlier
- defined as the distance of a case from the centroid
Centroid
point created by the means of all the variables
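For two variables, the distance of each case from the centroid can be computed without any external library. This is a minimal sketch with invented data, where the last case pairs a low x with a high y. Statistical software handles more variables with matrix algebra and compares the squared distance to a chi-square cutoff, which is not shown here.

```python
from statistics import mean


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    cx = mean([x for x, _ in points])  # centroid = the means of the variables
    cy = mean([y for _, y in points])
    sxx = sum((x - cx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - cy) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Quadratic form with the inverse of the 2x2 covariance matrix
        squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(squared ** 0.5)
    return distances


points = [(1, 2), (2, 3), (3, 3), (4, 5), (5, 6), (6, 6), (2, 6)]
distances = mahalanobis_2d(points)
for point, dist in zip(points, distances):
    print(point, round(dist, 2))
```

The case (2, 6) has the largest distance even though neither of its scores is extreme on its own, which is what makes it a multivariate outlier rather than a univariate one.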