Statistics exam 4 Agresti Flashcards

Regression, non-parametrics, ANOVA

1
Q

What is ANOVA and when is it used?

A

Analysis of variance

Comparing the means of a quantitative response variable across the groups of a categorical explanatory variable

2
Q

What is the difference between a one-, two- and three-way ANOVA?

A

One-way: 1 independent variable (factor) in a between-groups design

Two-way: 2 factors in a factorial design (e.g. 2x2)

Three-way: 3 factors in a factorial design (e.g. 2x3x3)

3
Q

What is the difference between between-groups and within-groups variability?

A

Between: variability of the group means around the overall (grand) mean, i.e. the distance between the centres of the group distributions

Within: variability of the observations around their own group mean, i.e. the spread within a distribution

4
Q

What does var. between > var. within mean?

A

There is evidence of a true difference between the group means: the larger the between variability relative to the within variability, the stronger the evidence

5
Q

What type of distribution is used for ANOVA and what does it look like?

A

F-distribution
- One right tail
- High F = small p value

6
Q

What are the assumptions for an ANOVA test?

A
  • Quantitative variable in more than 2 groups
  • Independent random sampling
  • Equal standard deviations (largest sd < 2x smallest sd)
  • Normally distributed
  • Equal n (for now)
7
Q

What do the hypotheses for ANOVA look like?

A

H0: mu1 = mu2 = … = mu_g
HA: at least 2 population means are different

8
Q

What are the steps for calculating the F statistic in an ANOVA test?

A
  1. Calculate the within-groups variability
  2. Calculate the between-groups variability
  3. Fill these into the F statistic formula: F = between variability / within variability (see the sketch below)
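A minimal Python sketch of these steps, using made-up example data (three small groups, not from the course):

import numpy as np

# hypothetical data: a quantitative response measured in g = 3 groups
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]

g = len(groups)
N = sum(len(grp) for grp in groups)
grand_mean = np.concatenate(groups).mean()

# step 1: within-groups variability (observations around their own group mean)
ss_within = sum(((grp - grp.mean()) ** 2).sum() for grp in groups)
ms_within = ss_within / (N - g)          # df2 = N - g

# step 2: between-groups variability (group means around the grand mean)
ss_between = sum(len(grp) * (grp.mean() - grand_mean) ** 2 for grp in groups)
ms_between = ss_between / (g - 1)        # df1 = g - 1

# step 3: fill in the F statistic formula
F = ms_between / ms_within
print(F, g - 1, N - g)
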
9
Q

How do you calculate the p-value in ANOVA testing?

A

= 1 - F.DIST(F; df1; df2; TRUE)
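The same right-tail probability can be sketched in Python with scipy (the F value and degrees of freedom below are placeholders, not course numbers):

from scipy import stats

F, df1, df2 = 4.2, 2, 27               # hypothetical F statistic and degrees of freedom
p_value = stats.f.sf(F, df1, df2)      # right-tail area, same idea as 1 - F.DIST(...; TRUE)
print(p_value)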

10
Q

What is the conclusion if p < alpha in ANOVA test?

A

At least 2 groups differ, but you don’t know which ones

11
Q

What is MS and SS?

A

MS: mean squares = the within- and between-groups variability estimates (MSe and MSg)

SS: sum of squares = MSg or MSe times the corresponding df1 or df2 (so MS = SS / df)
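For example (made-up numbers): if SSg = 20 with df1 = 2, then MSg = 20 / 2 = 10, and equivalently SSg = MSg x df1 = 10 x 2 = 20.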

12
Q

What is the Fisher method in ANOVA?

A

The confidence-interval follow-up to ANOVA: a confidence interval for each pairwise difference of means. If you have 3 groups, you have 3 intervals

These intervals use the pooled within-groups standard deviation and its df, so they are narrower than the usual two-sample t confidence intervals

13
Q

Why would you use the Fisher method instead of doing three separate t-tests?

A

Doing separate t-tests capitalizes on chance: by doing the test over and over again, the overall chance of a Type I error (alpha) increases

14
Q

What is the Bonferroni method?

A

Adjusted alpha = overall alpha / number of tests (K)

It corrects for capitalization on chance when doing t-tests over and over again
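For example (made-up numbers): with an overall alpha of 0.05 and K = 3 pairwise tests, each individual test is done at alpha = 0.05 / 3 ≈ 0.017.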

15
Q

What is an alternative for the Bonferroni method?

A

Tukey method

16
Q

When do you use non-parametric tests?

A

When the central limit theorem can't be relied on because the groups are too small and there is no normal distribution

17
Q

How do you deal with ties in non-parametric tests?

A

Give each tied observation the average of the ranks the ties would otherwise get
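For example, with the values 5, 7, 7, 9 the two tied 7s would occupy ranks 2 and 3, so each is given the average rank (2 + 3) / 2 = 2.5.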

18
Q

What are the three types of non-parametric tests and when do you use them?

A
  • Wilcoxon (rank sum): non-parametric alternative to the t-test for comparing 2 means
  • Kruskal-Wallis: non-parametric ANOVA test for between-groups/factorial designs
  • Sign test: for paired observations / dependence / paired t-test / pre-posttest design / matched individuals
(a sketch of all three tests follows below)
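A minimal Python sketch of all three tests, using scipy on made-up data (the numbers are placeholders):

import numpy as np
from scipy import stats

treatment = np.array([12, 15, 14, 10, 18])   # hypothetical independent samples
control   = np.array([9, 11, 13, 8, 10])
third     = np.array([7, 9, 6, 11, 8])

# Wilcoxon rank-sum test: non-parametric comparison of two independent groups
print(stats.ranksums(treatment, control))

# Kruskal-Wallis test: non-parametric alternative to one-way ANOVA (3+ groups)
print(stats.kruskal(treatment, control, third))

# sign test for paired data: count the '+' differences and test P(+) = 0.5
pre  = np.array([10, 12, 9, 14, 11, 13])
post = np.array([12, 15, 8, 16, 13, 14])
n_plus = int(np.sum(post > pre))
print(stats.binomtest(n_plus, n=len(pre), p=0.5))
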
19
Q

What are the assumptions for the Wilcoxon test?

A
  • Rank ordered
  • 2 independent samples
  • No assumptions regarding the distribution
20
Q

What do the hypotheses for the Wilcoxon test look like?

A

H0: equal expected values for sample mean ranks and identical population distribution

H1: different expected values for sample mean ranks (two sided)

H1: higher/lower expected values for sample mean ranks (one sided)

21
Q

What distribution can you use for samples larger than 20 in a Wilcoxon test? What do you have to do in other cases?

A

Use the z-distribution if n > 20

In other cases: W = mean rank (treatment) - mean rank (control). Read the p-value from the exact sampling distribution of W under H0

22
Q

What is sample space in the Wilcoxon test? What is thought of these possibilities under H0?

A

All possible rank combinations.
All these possibilities are equally likely under H0

23
Q

What distribution does the Kruskal-Wallis test use?

A

Chi square distribution

24
Q

What are the assumptions for the sign test?

A
  • Small n, not normally distributed
  • Random sampling
  • Unequal values for each pair (no equal pre/posttest values)
25
Q

What do the hypotheses of a sign test look like?

A

H0: P(+) = 0.5
H1: P(+) ≠ 0.5 (two-sided)
H1: P(+) > 0.5 or P(+) < 0.5 (one-sided)

26
Q

What distribution does the sign test use?

A

The normal z-distribution

27
Q

What is the difference between a regression line and a correlation?

A

Regression line: predicts the value of the response variable
Correlation: indicates the strength of the association

28
Q

What is extrapolation?

A

Using the regression line to predict y for x values outside the range of the data

29
Q

In what case is r = b?

A

When the two variables have the same variability (equal standard deviations): since b = r(sy / sx), b equals r when sx = sy

30
Q

What is a residual?

A

The vertical distance between a data point and the regression line (y - yhat)

31
Q

What happens to b and r when the scale changes?

A

b changes; r doesn't change, because it is standardized

32
Q

How do you calculate the correlation in Excel?

A

The CORREL function, selecting both columns as its arguments

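Equivalently sketched in Python (placeholder data, not the Excel function); it also shows the link between r and the slope b from card 29:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical columns of data
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]               # correlation, like CORREL in Excel
b = r * y.std(ddof=1) / x.std(ddof=1)     # regression slope: b = r * sy / sx
print(r, b)
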
33
Q

What is R squared?

A

The proportion of variation in the y values that is accounted for by the linear relationship of x and y

It describes the predictive power (the proportional reduction in error)

34
Q

What is the case for R squared = 0?

A

All estimated values of y are the same (a horizontal line)

35
Q

Are correlation and regression line resistant to outliers?

A

No

36
Q

What is a lurking variable?

A

A variable that influences the association between the variables of primary interest. It has the potential to be confounding

37
Q

What is the Simpson paradox?

A

Misinterpreting an association by not taking into account the separate classes within it: the direction of the association reverses after adjusting for a lurking variable

38
Q

What is regression towards the mean?

A

Extreme values tend to be less extreme over time

Because r < 1, the predicted y is always relatively closer to its mean than x is to its mean: if x is 2 sd away and r = 0.5, the predicted y is 0.5 * 2 = 1 sd away from its mean

39
Q

What is the difference between the residual and the total? How do you summarize this?

A

Residual: distance from a data point to the regression line (y - yhat)
Total: distance from a data point to the mean (y - ybar)

Summarize each by the sum of its squared distances (RSS and TSS) and check whether the regression line predicts the data better than the mean (see the sketch below)

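A small Python sketch (made-up data, numpy only) of how the residual and total sums of squares relate to R squared:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b, a = np.polyfit(x, y, 1)                 # slope and intercept of the regression line
y_hat = a + b * x

rss = np.sum((y - y_hat) ** 2)             # residual sum of squares (around the line)
tss = np.sum((y - y.mean()) ** 2)          # total sum of squares (around the mean)

r_squared = 1 - rss / tss                  # proportional reduction in error
print(rss, tss, r_squared)
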
40
Q

What does it mean when Sum (y - yhat)^2 < Sum (y - ymean)^2, i.e. RSS < TSS? What does this mean for R square?

A

The regression line is a better predictor than the mean, so there is a strong association and R square is large

41
Q

What happens to R square when: RSS = TSS, RSS = 0, 0 < RSS < TSS?

A

RSS = TSS --> R square = 0 (b = 0)
RSS = 0 --> R square = 1 (the best!)
0 < RSS < TSS --> 0 < R square < 1

42
Q

What does R square = 0.5 mean?

A

The error using the regression line yhat to predict y is 50% smaller than the error using ybar to predict y

50% of the total variance is explained: the variance around the regression line is 50% less than the total variance

43
Q

What is ecological fallacy?

A

Using a correlation found at the group (aggregate) level to predict values for a specific individual. This can be very dangerous

44
Q

What are the assumptions for regression analysis?

A
  • The population relationship is linear
  • The data are randomly gathered
  • For each x, y follows a normal distribution
  • The standard deviation of y is the same for all values of x

45
Q

What distribution does regression analysis use?

A

The t-distribution