Statistics exam 4 Agresti Flashcards

Regression, non-parametrics, ANOVA

1
Q

What is ANOVA and when is it used?

A

Analysis of variance

Comparing the means of a quantitative response variable across the groups of a categorical explanatory variable

2
Q

What is the difference between a one-, two- and three-way ANOVA?

A

One-way: 1 independent variable (factor) in a between-groups design

Two-way: 2 factors in a factorial design (e.g. 2x2)

Three-way: 3 factors in a factorial design (e.g. 2x3x3)

3
Q

What is the difference between between-groups and within-groups variability?

A

Between: variability of the group means around the overall (grand) mean, i.e. the distance between the centres of the group distributions

Within: variability of the observations around their own group mean, i.e. the spread within a distribution

4
Q

What does var. between > var. within mean?

A

There is evidence of a true difference between the group means: the larger the between variability relative to the within variability, the stronger the evidence

5
Q

What type of distribution is used for ANOVA and what does it look like?

A

F-distribution
- One right tail
- High F = small p value

6
Q

What are the assumptions for an ANOVA test?

A
  • Quantitative variable in more than 2 groups
  • Independent random sampling
  • Equal standard deviations (largest sd < 2x smallest sd)
  • Normally distributed
  • Equal n (for now)
7
Q

What do the hypotheses for ANOVA look like?

A

H0: mu1 = mu2 = … = mu_g
HA: at least 2 population means are different

8
Q

What are the steps for calculating the F statistic in an ANOVA test?

A
  1. Calculate the within-groups variability
  2. Calculate the between-groups variability
  3. Fill these into the F statistic formula: F = between variability / within variability (see the sketch below)
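A minimal Python sketch of these steps, using made-up example data (three small groups, not from the course):

import numpy as np

# hypothetical data: a quantitative response measured in g = 3 groups
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]

g = len(groups)
N = sum(len(grp) for grp in groups)
grand_mean = np.concatenate(groups).mean()

# step 1: within-groups variability (observations around their own group mean)
ss_within = sum(((grp - grp.mean()) ** 2).sum() for grp in groups)
ms_within = ss_within / (N - g)          # df2 = N - g

# step 2: between-groups variability (group means around the grand mean)
ss_between = sum(len(grp) * (grp.mean() - grand_mean) ** 2 for grp in groups)
ms_between = ss_between / (g - 1)        # df1 = g - 1

# step 3: fill in the F statistic formula
F = ms_between / ms_within
print(F, g - 1, N - g)
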
9
Q

How do you calculate the p-value in ANOVA testing?

A

= 1 - F.DIST(F; df1; df2; TRUE)
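The same right-tail probability can be sketched in Python with scipy (the F value and degrees of freedom below are placeholders, not course numbers):

from scipy import stats

F, df1, df2 = 4.2, 2, 27               # hypothetical F statistic and degrees of freedom
p_value = stats.f.sf(F, df1, df2)      # right-tail area, same idea as 1 - F.DIST(...; TRUE)
print(p_value)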

10
Q

What is the conclusion if p < alpha in ANOVA test?

A

At least 2 groups differ, but you don’t know which ones

11
Q

What is MS and SS?

A

MS: mean squares = the within- and between-groups variability estimates (MSe and MSg)

SS: sum of squares = MSg or MSe times the corresponding df1 or df2 (so MS = SS / df)
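For example (made-up numbers): if SSg = 20 with df1 = 2, then MSg = 20 / 2 = 10, and equivalently SSg = MSg x df1 = 10 x 2 = 20.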

12
Q

What is the Fisher method in ANOVA?

A

The confidence-interval follow-up to ANOVA: a confidence interval for each pairwise difference of means. If you have 3 groups, you have 3 intervals

These intervals use the pooled within-groups standard deviation and its df, so they are narrower than the usual two-sample t confidence intervals

13
Q

Why would you use the Fisher method instead of doing three separate t-tests?

A

Doing separate t-tests capitalizes on chance: by doing the test over and over again, the overall chance of a Type I error (alpha) increases

14
Q

What is the Bonferroni method?

A

Adjusted alpha = overall alpha / number of tests (K)

It corrects for capitalization on chance when doing t-tests over and over again
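For example (made-up numbers): with an overall alpha of 0.05 and K = 3 pairwise tests, each individual test is done at alpha = 0.05 / 3 ≈ 0.017.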

15
Q

What is an alternative for the Bonferroni method?

A

Tukey method

16
Q

When do you use non-parametric tests?

A

When the central limit theorem can't be relied on because the groups are too small and there is no normal distribution

17
Q

How do you deal with ties in non-parametric tests?

A

Give each tied observation the average of the ranks the ties would otherwise get
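For example, with the values 5, 7, 7, 9 the two tied 7s would occupy ranks 2 and 3, so each is given the average rank (2 + 3) / 2 = 2.5.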

18
Q

What are the three types of non-parametric tests and when do you use them?

A
  • Wilcoxon (rank sum): non-parametric alternative to the t-test for comparing 2 means
  • Kruskal-Wallis: non-parametric ANOVA test for between-groups/factorial designs
  • Sign test: for paired observations / dependence / paired t-test / pre-posttest design / matched individuals
(a sketch of all three tests follows below)
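A minimal Python sketch of all three tests, using scipy on made-up data (the numbers are placeholders):

import numpy as np
from scipy import stats

treatment = np.array([12, 15, 14, 10, 18])   # hypothetical independent samples
control   = np.array([9, 11, 13, 8, 10])
third     = np.array([7, 9, 6, 11, 8])

# Wilcoxon rank-sum test: non-parametric comparison of two independent groups
print(stats.ranksums(treatment, control))

# Kruskal-Wallis test: non-parametric alternative to one-way ANOVA (3+ groups)
print(stats.kruskal(treatment, control, third))

# sign test for paired data: count the '+' differences and test P(+) = 0.5
pre  = np.array([10, 12, 9, 14, 11, 13])
post = np.array([12, 15, 8, 16, 13, 14])
n_plus = int(np.sum(post > pre))
print(stats.binomtest(n_plus, n=len(pre), p=0.5))
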
19
Q

What are the assumptions for the Wilcoxon test?

A
  • Rank ordered
  • 2 independent samples
  • No assumptions regarding the distribution
20
Q

What do the hypotheses for the Wilcoxon test look like?

A

H0: equal expected values for sample mean ranks and identical population distribution

H1: different expected values for sample mean ranks (two sided)

H1: higher/lower expected values for sample mean ranks (one sided)

21
Q

What distribution can you use for samples larger than 20 in a Wilcoxon test? What do you have to do in other cases?

A

Use the z-distribution if n > 20

In other cases: W = mean rank (treatment) - mean rank (control). Read the p-value from the exact sampling distribution of W under H0

22
Q

What is sample space in the Wilcoxon test? What is thought of these possibilities under H0?

A

All possible rank combinations.
All these possibilities are equally likely under H0

23
Q

What distribution does the Kruskal-Wallis test use?

A

Chi square distribution

24
Q

What are the assumptions for the sign test?

A
  • Small n, not normally distributed
  • Random sampling
  • Unequal values for each pair (no equal pre/posttest values)
25
Q

What do the hypotheses of a sign test look like?

A

H0: P(+) = 0.5
H1: P(+) ≠ 0.5 (two-sided)
H1: P(+) > 0.5 or P(+) < 0.5 (one-sided)

26
Q

What distribution does the sign test use?

A

The normal z-distribution

27
Q

What is the difference between a regression line and a correlation?

A

Regression line: predicts the value of the response variable
Correlation: indicates the strength of the association

28
Q

What is extrapolation?

A

Using the regression line to predict y for x values outside the range of the data

29
Q

In what case is r = b?

A

When the two variables have the same variability (equal standard deviations): since b = r(sy / sx), b equals r when sx = sy

30
Q

What is a residual?

A

The vertical distance between a data point and the regression line (y - yhat)

31
Q

What happens to b and r when the scale changes?

A

b changes; r doesn't change, because it is standardized

32
Q

How do you calculate the correlation in Excel?

A

The CORREL function, selecting both columns as its arguments

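Equivalently sketched in Python (placeholder data, not the Excel function); it also shows the link between r and the slope b from card 29:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical columns of data
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]               # correlation, like CORREL in Excel
b = r * y.std(ddof=1) / x.std(ddof=1)     # regression slope: b = r * sy / sx
print(r, b)
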
33
Q

What is R squared?

A

The proportion of variation in the y values that is accounted for by the linear relationship of x and y

It describes the predictive power (the proportional reduction in error)

34
Q

What is the case for R squared = 0?

A

All estimated values of y are the same (a horizontal line)

35
Q

Are correlation and regression line resistant to outliers?

A

No

36
Q

What is a lurking variable?

A

A variable that influences the association between the variables of primary interest. It has the potential to be confounding

37
Q

What is the Simpson paradox?

A

Misinterpreting an association by not taking into account the separate classes within it: the direction of the association reverses after adjusting for a lurking variable

38
Q

What is regression towards the mean?

A

Extreme values tend to be less extreme over time

Because r < 1, the predicted y is always relatively closer to its mean than x is to its mean: if x is 2 sd away and r = 0.5, the predicted y is 0.5 * 2 = 1 sd away from its mean

39
Q

What is the difference between the residual and the total? How do you summarize this?

A

Residual: distance from a data point to the regression line (y - yhat)
Total: distance from a data point to the mean (y - ybar)

Summarize each by the sum of its squared distances (RSS and TSS) and check whether the regression line predicts the data better than the mean (see the sketch below)

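A small Python sketch (made-up data, numpy only) of how the residual and total sums of squares relate to R squared:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b, a = np.polyfit(x, y, 1)                 # slope and intercept of the regression line
y_hat = a + b * x

rss = np.sum((y - y_hat) ** 2)             # residual sum of squares (around the line)
tss = np.sum((y - y.mean()) ** 2)          # total sum of squares (around the mean)

r_squared = 1 - rss / tss                  # proportional reduction in error
print(rss, tss, r_squared)
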
40
Q

What does it mean when Sum (y - yhat)^2 < Sum (y - ymean)^2, i.e. RSS < TSS? What does this mean for R square?

A

The regression line is a better predictor than the mean, so there is a strong association and R square is large

41
Q

What happens to R square when: RSS = TSS, RSS = 0, 0 < RSS < TSS?

A

RSS = TSS --> R square = 0 (b = 0)
RSS = 0 --> R square = 1 (the best!)
0 < RSS < TSS --> 0 < R square < 1

42
Q

What does R square = 0.5 mean?

A

The error using the regression line yhat to predict y is 50% smaller than the error using ybar to predict y

50% of the total variance is explained: the variance around the regression line is 50% less than the total variance

43
Q

What is ecological fallacy?

A

Using a correlation found at the group (aggregate) level to predict values for a specific individual. This can be very dangerous

44
Q

What are the assumptions for regression analysis?

A
  • The population relationship is linear
  • The data are randomly gathered
  • For each x, y follows a normal distribution
  • The standard deviation of y is the same for all values of x

45
Q

What distribution does regression analysis use?

A

The t-distribution