Chapters Beyond 9 Flashcards
What do the mean squares represent?
The between-group variability and the within-group variability.
Between is on top (the numerator of the F statistic)
Within is on the bottom (the denominator)
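A minimal Python sketch (made-up numbers) of how the two mean squares form the ANOVA F statistic; scipy.stats.f_oneway is used only as a cross-check:

    import numpy as np
    from scipy import stats

    # Made-up data: three groups of observations
    groups = [np.array([4.0, 5.0, 6.0]),
              np.array([7.0, 8.0, 9.0]),
              np.array([5.0, 6.0, 7.0])]

    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group mean square: the numerator ("on top")
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Within-group mean square: the denominator ("on bottom")
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_within = ss_within / (n - k)

    F = ms_between / ms_within
    print(F, stats.f_oneway(*groups).statistic)      # the two F values should match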
When do we do pairwise comparisons?
When we reject H_o in an ANOVA and want to see which means are different between groups.
Why do we do pairwise comparisons?
To be able to see which groups have different means
What does it mean if two variables measured on the same subject are associated?
Knowing one value of the variable tells us something about the value of the other variable.
What do we use to show the correlation coefficient, and what are its units?
r.
r has no units
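A minimal Python sketch (made-up numbers) showing that r is unitless: rescaling either variable leaves it unchanged.

    import numpy as np

    # Made-up data
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    r = np.corrcoef(x, y)[0, 1]
    r_rescaled = np.corrcoef(x * 100, y / 3)[0, 1]   # change the "units" of both variables
    print(r, r_rescaled)                             # identical: r has no units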
What is the parameter for the correlation coefficient?
Rho (ρ)
r = 1 corresponds to what correlation?
A perfect positive correlation.
r = -1 corresponds to what correlation?
A perfect negative correlation.
r = 0 signifies what?
No correlation. No linear association.
What happens to the scatter plot when there is a strong linear association?
The points are tightly clustered around a line.
What is the explanatory variable?
X
Which is the response variable?
Y
Does r change if we interchange the explanatory and response variables?
No
When there is a strong linear association, what do we know?
That information about one variable helps in predicting the other.
What happens in a weak association?
The points are scattered broadly
What does a low r mean?
It means there is little or no linear association - however, it does not necessarily mean there is no association at all.
Correlation does not imply _______
Causation
What is linear regression used for?
To find a line that summarizes the linear relationship between two variables.
With it we can make predictions about y, the response variable
How do we notate the regression line?
y_i = beta_0 + beta_1*x_i + epsilon_i
beta_0 = intercept, beta_1 = slope, epsilon_i = error term (indicates how far y_i is from the line).
What is beta_0?
Intercept
Represents the average value of y when x is zero.
What is beta_1?
Slope
Represents the change in the average value of y for every one-unit increase in x.
What is epsilon_i?
error term. Indicates how far y_i is from the line.
We estimate the slope and intercept of the regression line from the data to get:
y_hat_i = beta_hat_0 + beta_hat_1 * x_i
You don’t want to use a regression line to estimate values that are…
Outside of the range of data we got in our sample. (This is known as extrapolation)
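A minimal Python sketch (made-up numbers) of estimating the line and predicting y_hat at a new x that stays inside the observed range; scipy.stats.linregress is one way to get the estimates.

    import numpy as np
    from scipy import stats

    # Made-up data
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    fit = stats.linregress(x, y)        # least squares estimates of intercept and slope
    b0, b1 = fit.intercept, fit.slope

    x_new = 3.5                         # inside the observed range of x (1 to 5), so not extrapolation
    y_hat = b0 + b1 * x_new
    print(b0, b1, y_hat)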
What is the least squares regression line?
The line that minimizes the total squared vertical error (i.e. the sum of the squared residuals).
What are residuals?
The vertical deviations between the points and the line: y_i - (beta_0 + beta_1*x_i) = epsilon_i
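A minimal Python sketch (same made-up numbers) of the residuals as vertical gaps, and of the fact that any other line has a larger sum of squared residuals than the least squares line.

    import numpy as np
    from scipy import stats

    # Made-up data
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)      # vertical deviations from the fitted line
    sse_ls = np.sum(residuals ** 2)                      # sum of squared residuals for the LS line

    # Any other line (here, the LS line with its slope nudged) has a larger sum of squared residuals
    sse_other = np.sum((y - (fit.intercept + (fit.slope + 0.3) * x)) ** 2)
    print(sse_ls, sse_other)                             # sse_ls is the smaller of the two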
In the least squares line:
beta_hat_1 can be found by?
r * (sd_y / sd_x)
In the least squares line:
beta_hat_0 can be found by?
y_bar - beta_hat_1 * x_bar
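A minimal Python sketch (same made-up numbers) checking these two formulas against a library fit; np.std with ddof=1 gives the sample standard deviations.

    import numpy as np
    from scipy import stats

    # Made-up data
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    r = np.corrcoef(x, y)[0, 1]
    b1 = r * (np.std(y, ddof=1) / np.std(x, ddof=1))     # slope: r * (sd_y / sd_x)
    b0 = y.mean() - b1 * x.mean()                        # intercept: y_bar - beta_hat_1 * x_bar

    fit = stats.linregress(x, y)
    print(b1, fit.slope)        # should agree
    print(b0, fit.intercept)    # should agree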
In ANOVA, what describes how much the observations vary around the sample mean?
the within group variance
In ANOVA, what describes how much the sample means vary around a total mean?
the between group variance
What does a Chi-squared Goodness of Fit Test measure?
it measures the discrepancy between observed cell counts and cell counts expected under the null hypothesis, to assess whether the hypothesized distribution is plausible
What is H_o for the goodness of fit test?
H_o: p_1 = p_1*, p_2 = p_2*, …, p_k = p_k*
What is H_A for the goodness of fit test?
H_A: p_i != p_i*, for some i.
What are the expected counts in a goodness of fit test?
The hypothesized proportion for each category, multiplied by the total number of observations in our test.
All the expected counts summed should equal our total observations in our test!
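A minimal Python sketch (made-up counts and hypothesized proportions) of the expected counts; note that they sum back to the total sample size.

    import numpy as np

    observed = np.array([18, 55, 27])          # made-up observed cell counts
    p_null = np.array([0.25, 0.50, 0.25])      # hypothesized proportions under H_o
    n = observed.sum()                         # total number of observations

    expected = p_null * n                      # expected count for each category
    print(expected, expected.sum(), n)         # the expected counts sum back to n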
What is the Pearson chi-square test statistic?
chi-squared = sum from i=1 to k of [(x_i - e_i)^2]/e_i
or [(observed-expected)^2]/expected
How is the p-value for a chi-squared goodness of fit test calculated?
We use the chi-square table (or software), and we always use the right-hand (upper) tail.
What are the degrees of freedom for a chi-square goodness of fit test?
k - 1, or the number of categories (groups) minus 1.
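A minimal Python sketch (continuing the made-up counts above) of the Pearson statistic, the k - 1 degrees of freedom, and the right-tail p-value; scipy.stats.chisquare is used as a cross-check.

    import numpy as np
    from scipy import stats

    observed = np.array([18, 55, 27])          # made-up observed counts
    expected = np.array([25.0, 50.0, 25.0])    # expected counts under H_o

    chi2_stat = np.sum((observed - expected) ** 2 / expected)
    df = len(observed) - 1                     # k - 1
    p_value = stats.chi2.sf(chi2_stat, df)     # right-hand (upper) tail area

    print(chi2_stat, p_value)
    print(stats.chisquare(observed, expected)) # should give the same statistic and p-value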
What do we use a chi-square test of independence for?
To determine whether two variables, summarized in a 2-way contingency table, are independent.
What is a two-way contingency table?
a set of frequencies that summarize how a set of objects is simultaneously classified under two different categorizations.
What is H_o in a test for independence?
H_o: the two variables are independent
What is H_A in a test for independence?
H_A: the two variables are not independent.
How do we calculate the expected count in a test for independence? (e_ij)
e_ij = (row_i total * col_j total)/grand total
What is the chi-square test statistic for a test for independence?
chi^2 = sum from i=1 to r, sum from j=1 to c, of (x_ij - e_ij)^2/e_ij
What are the degrees of freedom in a chi-square test for independence involving an r*c table?
(r-1)(c-1)
When does the chi-square test for independence work well?
When all e_ij >= 5 (i.e. every expected cell count is at least 5).
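A minimal Python sketch (a made-up 2x3 table) of the test for independence: expected counts from (row total * column total) / grand total, (r-1)(c-1) degrees of freedom, and scipy.stats.chi2_contingency as a cross-check.

    import numpy as np
    from scipy import stats

    table = np.array([[20, 30, 25],            # made-up 2x3 contingency table
                      [15, 40, 20]])

    row_tot = table.sum(axis=1, keepdims=True)
    col_tot = table.sum(axis=0, keepdims=True)
    grand = table.sum()

    expected = row_tot @ col_tot / grand                 # e_ij = (row_i total * col_j total) / grand total
    chi2_stat = np.sum((table - expected) ** 2 / expected)
    df = (table.shape[0] - 1) * (table.shape[1] - 1)     # (r - 1)(c - 1)
    p_value = stats.chi2.sf(chi2_stat, df)

    chi2_lib, p_lib, df_lib, exp_lib = stats.chi2_contingency(table, correction=False)
    print(chi2_stat, p_value)                            # should match chi2_lib and p_lib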
What two hypothesis tests are there for categorical data?
1) goodness of fit test
2) chi-square test of independence
(both are chi-square tests!)