Statistics Flashcards

1
Q

What is descriptive research?

A
  • Aim: to describe characteristics of a sample (what kind, how much, etc.)
  • Used to summarise, organise and simplify sample data
  • Often based on measurement of a single variable (univariate statistics)
  • Relies on measures of central tendency, frequencies, spread, distributional shape, etc.
2
Q

What is inferential research?

A
  • Null hypothesis testing
  • Aim: to infer characteristics of the population
  • Often interested in multiple variables (bivariate, multivariate)
  • Relies on a wide range of different tests (e.g. correlation, regression, t-tests, ANOVA, chi-square, etc.)
  • Allows us to make probability statements about how confident we can be that our sample findings reflect the ”true” state of things
3
Q

Level of measurement: What are the two main types of variables?

A
  • Categorical
    • Binary (2 levels)
    • Nominal (3+ levels)
    • Ordinal (ordered, no equal intervals)
  • Continuous
    • Interval (ordered, equal intervals, no absolute zero)
    • Ratio (ordered, equal intervals, absolute zero)
4
Q

How can you keep error to a minimum?

A
  • By using careful sampling strategies and measures that are valid and reliable
  • Validity + reliability = credibility
5
Q

What are the critical values of z-scores?

A
  • 95% of z-scores lie between -1.96 and 1.96
  • 99% of z-scores lie between -2.58 and 2.58
  • 99.9% of z-scores lie between -3.29 and 3.29
6
Q

What does the z-score represent?

A
  • The distance a particular observation is away from the mean, measured in standard deviations
  • The standard normal distribution has a mean of 0 and a standard deviation of 1
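
A minimal Python sketch of both points, with invented sample values (numpy and scipy assumed to be available):

    import numpy as np
    from scipy import stats

    scores = np.array([12, 15, 9, 14, 11, 17, 13, 10])  # hypothetical sample

    # z-score: distance of an observation from the mean, in standard deviations
    x = 17
    z = (x - scores.mean()) / scores.std(ddof=1)
    print(f"z = {z:.2f}")

    # the standard normal distribution has mean 0 and SD 1;
    # ~95% of z-scores fall between the critical values -1.96 and +1.96
    print(stats.norm.interval(0.95))  # (-1.96..., 1.96...)
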
7
Q

What are the two ways you can carry out inferential hypothesis-based research?

A
  • Correlational research (observing what naturally happens without interfering)
  • Experimental research (manipulating one variable and observing the effect on another variable; can be used to infer cause/effect)
8
Q

What are the two types of experimental designs?

A
  • Independent/between-subjects (different participants in different groups)
  • Dependent/repeated measures (the same participants exposed to all conditions)
9
Q

What is systematic variance?

A
  • Variation due to genuine effect
  • Variance that can be explained by our model
  • Signal/effect: what we want to measure
10
Q

What is unsystematic variance?

A
  • Noise/error
  • Small differences in outcome due to unknown factors
11
Q

What is the most important formula of all? :-)

A
  • outcome = model + error
  • The way that effect and error are measured varies for each type of statistical test
  • But for a test to be ”significant”, the effect should be considerably greater than the error (chance)
12
Q

What is the null hypothesis?

A
  • What we actually test in statistics
  • Assumes that there is no effect; we then try to reject this
  • H0: no effect in the population
13
Q

What is the alternative hypothesis?

A
  • What we’re really interested in, when trying to reject the null hypothesis
  • Can be:
  • Non-directional: H1: There is an effect in the population
  • Directional: H1: There is an effect in a specific direction in the population (e.g. group A scores higher than group B)
14
Q

What are significance tests for?

A
  • For determining whether to reject or fail to reject the null hypothesis
  • For determining with what level of confidence we reject the null hypothesis (typically 95%, 99% or 99.9%)
15
Q

What are the z-distribution, t-distribution, F-distribution etc?

A
  • Test statistics
  • A statistic for which we know how frequently different values occur
  • Theoretical sampling distributions that assume the null hypothesis
  • Test statistic = variance explained by the model (effect) / variance not explained by the model (error); see the sketch below
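
A rough Python sketch of that ratio, using a one-way ANOVA on three invented groups (scipy assumed); the hand-computed effect/error ratio matches scipy's F statistic:

    import numpy as np
    from scipy import stats

    # three hypothetical independent groups
    g1 = np.array([4.0, 5.0, 6.0, 5.5])
    g2 = np.array([6.0, 7.0, 8.0, 7.5])
    g3 = np.array([5.0, 5.5, 6.5, 6.0])
    groups = [g1, g2, g3]

    grand_mean = np.concatenate(groups).mean()
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total sample size

    # variance explained by the model (between-groups mean square)
    ms_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)

    # variance not explained by the model (within-groups mean square)
    ms_error = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)

    f_manual = ms_model / ms_error    # effect / error
    f_scipy, p = stats.f_oneway(g1, g2, g3)
    print(f_manual, f_scipy, p)       # the two F values agree
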
16
Q

What is the confidence level for p<.05, p<.01 and p<.001?

A
  • p < .05 = 95%
  • p < .01 = 99%
  • p < .001 = 99.9%
17
Q

If p is high (p>.05)…

A
  • … the null applies! :-)
18
Q

If p is low (p<.05)…

A
  • … the null must go! :-)
19
Q

What is the relationship between critical values, significance and confidence?

A
  • As the critical value increases (gets further away from the null), confidence increases
  • As confidence increases, p (the probability of making a type I error) decreases
  • Confidence + p = 1.0, or 100%
20
Q

What are critical cut-offs dependent on?

A
  • Type of test (one- vs two-tailed significance)
  • p level
  • Degrees of freedom (calculated differently for different tests)
21
Q

What is a type I error?

A
  • False positive
  • Saying there is an effect when there isn’t
  • ”You’re pregnant” to a man :-)
22
Q

What is a type II error?

A
  • False negative
  • Saying there isn’t an effect when there is
  • ”You’re not pregnant” to a pregnant woman :-)
23
Q

What is NHSTP?

A
  • Null Hypothesis Significance Testing Procedure
  • Encourages black-and-white thinking -> limitations
  • We should take a middle ground, combining NHSTP and effect sizes
24
Q

What is the point of confidence intervals?

A
  • Useful for estimating the range within which the true population mean (or some other parameter) would fall in most samples
  • Typically 95% (p<.05)
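
A minimal Python sketch of a 95% confidence interval for a mean, using the t-distribution (invented data, scipy assumed):

    import numpy as np
    from scipy import stats

    sample = np.array([101, 98, 105, 110, 96, 102, 99, 107])  # hypothetical data

    mean = sample.mean()
    sem = stats.sem(sample)   # standard error of the mean
    df = len(sample) - 1

    # range within which the true population mean would fall in ~95% of samples
    low, high = stats.t.interval(0.95, df, loc=mean, scale=sem)
    print(f"95% CI: [{low:.2f}, {high:.2f}]")
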
25
Q

What is an effect size?

A
  • A standardized measure of the size of an effect
  • Comparable across studies
  • Not as reliant on the sample size
26
Q

What are the effect sizes we’ve learned?

A
  • Pearson’s r
  • Cohen’s d
  • R²
  • Odds ratio
  • Cramer’s V
27
Q

When and how should we test for normality?

A
  • For all parametric tests
  • Using the K-S or Shapiro-Wilk test
28
Q

When and how should we test for homogeneity of variance?

A
  • Independent t-test
  • Independent ANOVA
  • Using Levene’s test
29
Q

When and how should we test for sphericity?

A
  • Dependent ANOVA
  • Using Mauchly’s test
30
Q

What do you usually assume, when assuming normality?

A
  • That the sampling distribution of the parameter (e.g. means, or mean differences for a dependent t-test), or the residuals for regression, is normal in shape
  • Not that the distribution of the sample data must be normal
31
Q

What do the K-S and Shapiro-Wilk tests tell you?

A
  • Significant test at p < .05 = violation of normality
  • Non-significant test at p > .05 = normality is OK
  • We want a non-significant test!
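
A minimal Python sketch of both tests (scipy assumed; the data are randomly generated, and a K-S test with parameters estimated from the sample is, strictly speaking, the Lilliefors variant):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    sample = rng.normal(loc=50, scale=10, size=40)  # roughly normal by construction

    # Shapiro-Wilk: non-significant (p > .05) means normality is OK
    w, p_sw = stats.shapiro(sample)
    print(f"Shapiro-Wilk: W = {w:.3f}, p = {p_sw:.3f}")

    # K-S against a normal distribution with the sample's own mean and SD
    d, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
    print(f"K-S: D = {d:.3f}, p = {p_ks:.3f}")
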
32
Q

What do you do if there’s a difference between K-S and Shapiro-Wilk?

A
  • Use Shapiro-Wilk :-)
33
Q

What is the central limit theorem?

A
  • As sample size increases, the random sampling distribution tends towards a normal distribution regardless of the shape of the sample data
  • The tendency increases as sample size increases
  • Can usually be argued with a sample size of >30
  • For independent tests: at least 30 in each group
  • For dependent tests: at least 30 overall!
  • Can also be invoked (given a large enough sample) even if K-S/Shapiro-Wilk tests show problems with normality
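
The tendency is easy to see in a small simulation. This sketch (numpy/scipy assumed, parameters invented) draws repeated samples of n = 30 from a strongly skewed population; the distribution of the sample means is far closer to normal:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)

    # a strongly skewed "population" (exponential distribution)
    population = rng.exponential(scale=2.0, size=100_000)

    # means of 5,000 random samples of n = 30
    sample_means = np.array(
        [rng.choice(population, size=30).mean() for _ in range(5_000)]
    )

    print(f"population skew:  {skew(population):.2f}")    # ~2, very skewed
    print(f"sample-mean skew: {skew(sample_means):.2f}")  # much closer to 0
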
34
Q

What if there’s a problem with normality? :-(

A
  • If large sample size, argue to meet normality assumptions on the basis of the central limit theorem
  • If not: Use a transformation (consider bootstrapping)
  • Or: use a non-parametric test
35
Q

What is the homogeneity of variance?

A
  • The assumption that the variance in the outcome variable is approximately equal at all levels (groupings) of the independent variable
  • If variance is approximately equal for all groups, there is homogeneity of variance
  • If variance is not equal across groups, there is heterogeneity of variance, and the assumption is violated
36
Q

When is the homogeneity of variance relevant?

A
  • Independent designs
  • For independent t (independent t-tests)
  • For F tests (independent ANOVA)
37
Q

How can we assess homogeneity of variance?

A
  • Using Levene’s test
  • Non-significant Levene’s test at p>.05=homogeneity of variance
  • We want a non-significant Levene’s test!
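
The deck’s output examples come from SPSS; as a rough Python equivalent (invented data, scipy assumed):

    import numpy as np
    from scipy import stats

    # two hypothetical independent groups
    group_a = np.array([23, 25, 27, 22, 26, 24, 28])
    group_b = np.array([30, 29, 31, 33, 28, 32, 30])

    # non-significant Levene's test (p > .05) = homogeneity of variance holds
    stat, p = stats.levene(group_a, group_b)
    print(f"Levene's W = {stat:.3f}, p = {p:.3f}")
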
38
Q

What if we violate the assumption of homogeneity?

A
  • For independent t-tests: if Levene’s test is significant (heterogeneity of variance), report the t-statistic and degrees of freedom from the ”equal variances not assumed” row in the SPSS output
  • For independent ANOVA: if Levene’s test is significant, report corrected F and df values, such as Welch’s F (see the sketch below)
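
Outside SPSS, the ”equal variances not assumed” analysis corresponds to Welch’s t-test. A minimal Python sketch (invented data, scipy assumed):

    from scipy import stats

    group_a = [23, 25, 27, 22, 26, 24, 28]
    group_b = [30, 29, 31, 33, 28, 32, 30]

    # equal_var=False gives Welch's t-test, which does not assume
    # homogeneity of variance
    t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
    print(f"Welch's t = {t:.3f}, p = {p:.4f}")
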
39
Q

What is the assumption of Sphericity?

A
  • Similar to the assumption of homogeneity, but for repeated measures designs
  • The variances of the differences between pairs of conditions are expected to be equal
40
Q

How do you test for sphericity?

A
  • Calculating the differences between each pair of conditions
  • Calculating the variance of these differences
  • Determining if the variances are approximately equal
  • (If variance 1 ≈ variance 2 ≈ variance 3 …, the assumption of sphericity is met; see the sketch below)
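
Mauchly’s test itself is what SPSS reports (scipy does not provide it), but the variance-of-pairwise-differences idea can be sketched directly in Python with invented repeated-measures data:

    import numpy as np
    from itertools import combinations

    # hypothetical data: rows = participants, columns = conditions
    data = np.array([
        [5.0, 6.0, 8.0],
        [4.0, 5.5, 7.0],
        [6.0, 6.5, 9.0],
        [5.5, 6.0, 8.5],
        [4.5, 5.0, 7.5],
    ])

    # variance of the differences for each pair of conditions;
    # sphericity roughly means these variances are all about equal
    for i, j in combinations(range(data.shape[1]), 2):
        diff = data[:, i] - data[:, j]
        print(f"var(cond{i + 1} - cond{j + 1}) = {diff.var(ddof=1):.3f}")
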
41
Q

How can we assess sphericity?

A
  • Using Mauchly’s test
  • Non-significant Mauchly’s test at p > .05 = sphericity holds
  • We want a non-significant Mauchly’s test!
42
Q

What if there’s a violation of sphericity?

A
  • If Mauchly’s test for sphericity is significant at p < .05, report your findings from a corrected row in the SPSS output (e.g. the Greenhouse-Geisser or Huynh-Feldt correction)

43
Q

Why do assumptions matter?

A
  • Many of the most common statistical tests (parametric tests) are only reliable if these assumptions are satisfied or corrections are made
  • If we use uncorrected parametric tests with problematic data, there is a greater risk of drawing incorrect conclusions (type I errors, especially)
44
Q

What has more power? Non-parametric or parametric tests?

A

Parametric tests!

45
Q

What are the key factors in determining which test to use?

A
  • Aim of research
  • Level of measurement of IV and DV (categorical vs continuous)
  • Research design (for group tests - independent vs repeated measures)
  • Normality (e.g. K-S tests)
  • Sample size (to argue the CLT for independent tests: 30 in each group; for dependent tests: at least 30 overall)
  • Homogeneity of variance and Sphericity
  • Post hoc tests (if ANOVA)
46
Q

What is correlation?

A
  • Determining how two continuous variables are related
  • E.g. what relationship, if any, exists between number of hours spent studying for an exam and exam performance?
  • Correlation DOES NOT equal causation :-)
47
Q

What is the most widely used correlation coefficient?

A
  • Pearson’s r
  • Ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship)
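
A minimal Python sketch (invented data, scipy assumed), echoing the hours-studied example from the correlation card:

    import numpy as np
    from scipy import stats

    hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])          # hypothetical study hours
    grade = np.array([52, 55, 61, 60, 68, 70, 75, 74])  # hypothetical exam scores

    # r near +1 indicates a strong positive relationship
    r, p = stats.pearsonr(hours, grade)
    print(f"r = {r:.2f}, p = {p:.4f}")
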
48
Q

What is Cohen’s rule of thumb for Pearson’s r?

A

r ≥ .1 (small effect)

r ≥ .3 (medium effect)

r ≥ .5 (large effect)

49
Q

What is Cohen’s rule of thumb for Cohen’s d?

A

d ≥ 0.2 (small effect)

d ≥ 0.5 (medium effect)

d ≥ 0.8 (large effect)
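
Cohen’s d has no single standard scipy call; a minimal hand-rolled Python sketch of the pooled-SD version, with invented data:

    import numpy as np

    group_a = np.array([23, 25, 27, 22, 26, 24, 28])
    group_b = np.array([30, 29, 31, 33, 28, 32, 30])

    # pooled standard deviation across the two groups
    n1, n2 = len(group_a), len(group_b)
    s1, s2 = group_a.std(ddof=1), group_b.std(ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

    # Cohen's d: mean difference in pooled-SD units
    d = (group_b.mean() - group_a.mean()) / pooled_sd
    print(f"d = {d:.2f}")  # |d| >= 0.8 would count as a large effect
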

50
Q

What is Cohen’s rule of thumb for Odds Ratio?

A

OR > 1.49 (small effect)

OR > 3.49 (medium effect)

OR > 9.0 (large effect)

51
Q

How do you generally report results using APA format?

A

1: State the type of analysis you conducted
2: State the overall finding in plain words (including mean or Mdn, and SD)
3: Report the df and test statistic (F, t, etc.)
4: Report the significance level (p)
5: Report the effect size, including direction (e.g. r, d, OR)
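
As a toy illustration of steps 1-5, a hypothetical Python helper that assembles the pieces into one APA-style sentence (all numbers invented):

    # steps 1-5 in one string: analysis type, plain-words finding with
    # descriptives, df and test statistic, p, and effect size with direction
    def apa_t_report(m1, sd1, m2, sd2, df, t, p, d):
        return (
            f"An independent t-test showed that group A (M = {m1}, SD = {sd1}) "
            f"scored lower than group B (M = {m2}, SD = {sd2}), "
            f"t({df}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}."
        )

    print(apa_t_report(24.5, 2.1, 30.4, 1.7, 12, -5.61, 0.001, 3.0))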