Stats Flashcards
Multiple regression assumptions:
OV = Continuous, PV = Continuous/dichotomous.
What are the 2 types of variables?
Qualitative (categorical):
- Data occur when we assign objects into labelled categories.
- Measured on a nominal scale (no natural ordering) or an ordinal scale (ordered categories).
Quantitative (measurement):
- Numerical.
- Measured on an interval/ratio scale (ordinal scales are sometimes treated this way too).
Types of kurtosis and describe their shape:
Leptokurtic - heavy tails, scores clustered in the middle (pointy).
Platykurtic - light tails, scores spread across the distribution (flat).
What are variables?
Measured constructs that vary across entities in the sample.
What are parameters?
These are estimated from the data and are constructs believed to represent some fundamental truth about the relations between variables in the model.
When would you use a point biserial correlation?
Used when one variable is a discrete dichotomy (e.g. gender) and the other is continuous.
Are directional hypotheses possible with chi-square tests?
Yes, but only when you have a 2x2 design. If it's larger than this, the chi-square will be testing a compound hypothesis.
T-test assumptions:
Both between and within:
- Normal distribution
- Interval level data at least.
Between:
- Homogeneity of variance.
- Independence of scores.
Pearson’s correlation assumptions, and what to do when they are violated:
- Continuous data (interval/ratio).
- Independence of scores.
- Linear relationship between variables.
- Observations are from random samples with a normal distribution.
When violated - Bootstrap CI, or use non-parametric alternatives such as Spearman’s rho or Kendall’s tau.
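A minimal sketch of why a rank-based alternative helps when the linearity assumption fails. Spearman's rho is Pearson's r calculated on the ranks of the scores. The data and the helper names (pearson_r, ranks, spearman_rho) are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """Ranks from 1 to n; tied scores share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranked scores."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: perfectly monotonic, but curved rather than linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)       # below 1, because the relationship is not linear
rho = spearman_rho(x, y)  # 1, because the ordering is perfectly preserved
print(round(r, 3), round(rho, 3))
```

Pearson's r understates the relationship here because it only captures the linear part, while the rank-based coefficient only needs the ordering to be consistent.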
Simple Linear regression assumptions:
OV = Continuous, PV = Dichotomous or continuous. Independence of scores. Independence of errors. Normal distribution. Non-zero variance. Linearity. Homoscedasticity.
One-way ANOVA assumptions:
Normal distribution.
Homogeneity of variance.
Independence of scores (for between groups design).
ANCOVA assumptions:
Same as normal ANOVA +
- Independence of the covariate and the IV (can’t be highly correlated).
- Homogeneity of regression slopes (the relationship between the covariate and the outcome is the same in every group, so one regression slope fits the whole data set).
Different measures of error:
Standard deviation. Sum of squares. Deviance. Variance. Standard error.
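A small worked sketch of how these measures build on one another, using made-up scores. It assumes the sample formulas (dividing by n - 1 for the variance); the variable names are illustrative only.

```python
from math import sqrt

scores = [2, 4, 4, 5, 7, 8]  # made-up data
n = len(scores)
mean = sum(scores) / n

# Deviance: distance of each score from the mean.
deviances = [s - mean for s in scores]

# Sum of squares: squared deviances added together (total error).
sum_of_squares = sum(d ** 2 for d in deviances)

# Variance: average squared error, using n - 1 for a sample.
variance = sum_of_squares / (n - 1)

# Standard deviation: variance brought back to the original units.
standard_deviation = sqrt(variance)

# Standard error: how much the sample mean varies from sample to sample.
standard_error = standard_deviation / sqrt(n)

print(sum_of_squares, variance, round(standard_deviation, 3), round(standard_error, 3))
```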
What does a 95% confidence interval tell you?
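A 95% confidence interval is built so that, across repeated samples, 95% of such intervals contain the population parameter.
- If we took 100 samples and calculated a 95% CI for each, about 95 of those intervals would contain the population parameter.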
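The repeated-sampling idea can be checked with a short simulation. This is only a sketch: the population values (mean 100, SD 15), the sample size and the number of samples are arbitrary assumptions, and it uses the simpler z-based interval with a known population SD.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 15.0  # assumed population mean and SD
N = 30                   # scores per sample
N_SAMPLES = 2000         # number of repeated samples

# Half-width of a 95% CI for the mean when the population SD is known.
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * SIGMA / N ** 0.5

hits = 0
for _ in range(N_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = mean(sample)
    # Does this sample's interval contain the true population mean?
    if m - half_width <= MU <= m + half_width:
        hits += 1

coverage = hits / N_SAMPLES
print(coverage)  # expected to land close to 0.95
```

Each individual interval either contains the population mean or it does not; the 95% describes how often the procedure succeeds over many samples.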
What are the different types of hypotheses?
One tailed (directional hypothesis).
- Only focuses on one tail of the distribution (5% in the direction that the hypothesis states).
- Reject the Ho for scores that fall within this 5% region.
Two tailed (non-directional).
- Considers both ends of the distribution (2.5% either end).
- Reject the Ho for extreme scores in both directions (in the 2.5% regions).
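A short sketch of where the 5% and 2.5% regions sit, assuming a z test (standard normal distribution) and alpha = .05. The observed z of 1.80 is a made-up value.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()  # standard normal: mean 0, SD 1

# One-tailed: all 5% sits in the tail the hypothesis points to.
one_tailed_crit = std_normal.inv_cdf(1 - ALPHA)

# Two-tailed: 2.5% in each tail, so the cut-off is further out.
two_tailed_crit = std_normal.inv_cdf(1 - ALPHA / 2)

z = 1.80  # made-up observed test statistic
reject_one_tailed = z > one_tailed_crit
reject_two_tailed = abs(z) > two_tailed_crit

print(round(one_tailed_crit, 3), round(two_tailed_crit, 3))
print(reject_one_tailed, reject_two_tailed)
```

With these numbers the same score rejects the Ho under the one-tailed test but not under the two-tailed test, because the two-tailed cut-off is more extreme.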
Types of error:
Type 1 error - state that there is a significant effect when in reality there isn’t (falsely reject the Ho).
- Acceptable level for this error is alpha = .05.
Type 2 error - state that there is no effect in the population when in fact there is (falsely fail to reject the Ho).
- Acceptable level for this error is beta = .2 (beta is the probability of making a type 2 error).
Effect size
A measure of the magnitude of an observed effect.
- These are standardised measures which are comparable across measures (this is why they are reported in meta analyses).
- Not as reliant on sample size as significance tests are.
- Larger effect size = lower type 2 error rate (more power); the type 1 error rate is set by alpha.
Power:
Ability of a test to successfully find an effect and correctly reject the Ho.
Formula for power:
1 - Beta (beta = .2).
- 1 - probability of making a type 2 error.
1 - .2 = .8 - this would indicate an 80% chance of detecting an effect if it exists.
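To see 1 - beta as a calculation rather than a fixed target, here is a rough sketch for a two-tailed one-sample z test. The test choice, the function name z_test_power and the example numbers (d = 0.5, n = 32) are assumptions for illustration, not part of the card.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z test.

    effect_size is the standardised mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative hypothesis the test statistic is centred here.
    shift = effect_size * sqrt(n)
    # Beta: probability the statistic falls between the two critical values.
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta

power = z_test_power(effect_size=0.5, n=32)
print(round(power, 2))  # roughly 0.8, so beta is roughly 0.2
```

Raising the sample size or the effect size increases the shift, which shrinks beta and raises power.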
Parametric/normal distribution assumptions:
Linearity/additivity.
Normality.
Homogeneity of variance.
Independence of errors.
How to reduce bias when assumptions are not met, before resorting to non-parametric tests:
- Bootstrap confidence intervals.
- Transform data (apply a mathematical function to the scores, e.g. a log transform, to get data in a form that can be modelled).
- Trim data (delete a certain amount of extreme scores).
- Winsorizing (substituting outliers with the highest value that isn’t an outlier).
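A minimal sketch of trimming and winsorizing on made-up scores. It treats the k most extreme scores at each end as the outliers, which is one common convention and an assumption here; the function names are illustrative, and k must be less than half the number of scores.

```python
def trim(scores, k):
    """Delete the k lowest and k highest scores."""
    ordered = sorted(scores)
    return ordered[k:len(ordered) - k]

def winsorize(scores, k):
    """Replace the k lowest/highest scores with the nearest remaining score."""
    ordered = sorted(scores)
    low, high = ordered[k], ordered[-k - 1]
    # Pull every score back inside the [low, high] range.
    return [min(max(s, low), high) for s in scores]

scores = [3, 5, 4, 6, 5, 40]  # made-up data; 40 is an extreme score

trimmed = trim(scores, 1)
winsorized = winsorize(scores, 1)

print(trimmed)     # [4, 5, 5, 6]
print(winsorized)  # [4, 5, 4, 6, 5, 6]
```

Trimming shrinks the sample, whereas winsorizing keeps every case but pulls the extreme scores in to the nearest value that is kept.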