Midterm 3 Flashcards

1
Q

In OLS regression, total variation or deviation follows the logic of what test?

A

-test of significance called analysis of variance (ANOVA)

2
Q

Total variation in bivariate regression represents what?

A

-the total sum of squares SST

3
Q

What indicates the explained variation in a bivariate regression?

A
  • SSR sum of squares regression
  • the amount of variation in Y accounted for by X
  • amount of total variation that is explained by regression equation
4
Q

What is the SSR also called?

A

-model sum of squares SSM

5
Q

What represents the amount of variance left over in Y that the bivariate regression didn’t account for?

A
  • sum of squared errors (SSE)

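- Sometimes called the residual sum of squares (often abbreviated RSS; not to be confused with the sum of squares regression, which these cards abbreviate SSR)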

6
Q

What is the most important use of SST, SSE and SSR?

A
  • calculation of the coefficient of determination

- AKA square of Pearson’s r (r^2)
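
The decomposition behind cards 2 to 6 can be checked numerically. The sketch below is illustrative only: the data are made up and the helper name `sums_of_squares` is hypothetical. It fits a bivariate OLS line and computes SST, SSR, SSE and r^2 = SSR / SST.

```python
from statistics import mean

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE, r_squared) for a bivariate OLS fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    # OLS slope b and intercept a
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse, ssr / sst

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse, r_squared = sums_of_squares(x, y)
print(sst, ssr, sse, r_squared)
```

Up to floating-point rounding, SST equals SSR + SSE, which is why a larger SSR forces a smaller SSE.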

7
Q

What does r-squared tell us?

A

-the proportion of the total variation in Y that is attributable to X

8
Q

What type of relationship do SSR and SSE hold with each other?

A
  • a reciprocal (trade-off) relationship

- because SST = SSR + SSE, as one sum increases the other decreases

9
Q

If there is a stronger linear relationship between X and Y, what will happen to the explained and unexplained variation?

A
  • greater explained variation

- lesser unexplained variation

10
Q

What would an r-squared value of 1 mean?

A
  • X explains 100% of the variation in Y

- we could predict Y from X without error

11
Q

When X and Y are not linearly related, what happens to the explained variation and r-squared?

A
  • both are zero

- X explains none of the variation in Y

12
Q

What do you need to calculate to say whether a linear relationship is really strong?

A

-r-squared

13
Q

What does it mean if the correlation coefficient is +1?

A

-there is a perfect positive relationship between the two variables

14
Q

What does it mean if the correlation coefficient is -1?

A

-there is a perfect negative relationship between X and Y

15
Q

What does it mean if the correlation coefficient is 0?

A

-no linear relationship between these two variables

16
Q

How would you express a correlation coefficient of 0.65?

A

-A one standard deviation increase in X is associated with a 0.65 standard deviation increase in Y, on average
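
One way to see the standard-deviation reading of r is to convert both variables to z-scores and fit a line: the slope of that line is Pearson's r. This is a minimal sketch with made-up data; the helper names `z_scores` and `ols_slope` are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
zx, zy = z_scores(x), z_scores(y)

# Slope of standardized Y on standardized X
standardized_slope = ols_slope(zx, zy)
# Pearson's r computed directly from the z-scores
r = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)
print(standardized_slope, r)
```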

17
Q

Does the magnitude of a linear slope have anything to do with scatter?

A
  • NO

- it’s possible to have a very steep line with scatter or a very shallow line with no scatter

18
Q

What is the slope coefficient?

A

-b

19
Q

What do r and b have in common?

A
  • the same numerator

- thus, testing the hypothesis that r=0 is the same as testing if b=0
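
The shared numerator is the sum of cross-products of deviations (the covariance term). The sketch below uses the standard textbook formulas, which the cards do not spell out, and a hypothetical helper `slope_and_r`. Only the denominators differ, so b is zero exactly when r is zero.

```python
import math
from statistics import mean

def slope_and_r(x, y):
    """Return (b, r); both use the same numerator, the sum of cross-products."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # shared numerator
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                   # slope: divide by variation in X
    r = sxy / math.sqrt(sxx * syy)  # correlation: divide by variation in X and Y
    return b, r

# Made-up data, for illustration only
b, r = slope_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
b_flat, r_flat = slope_and_r([1, 2, 3], [1, 2, 1])  # no linear relationship
print(b, r, b_flat, r_flat)
```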

20
Q

Why must we test to see if the relationship between the variables exists in the population from which the sample was drawn?

A
  • since the data for a bivariate regression is based on a random sample
  • called testing for significance
21
Q

How do we test for significance?

A

-test Pearson’s r, since testing whether r = 0 is equivalent to testing whether the slope b = 0

22
Q

What assumptions are made to test for significance in a bivariate relationship?

A
  1. Assume that both variables are normal in distribution (bivariate normal distributions)
  2. Assume the relationship between the variables is roughly linear
  3. Homoscedastic relationship
23
Q

What is a homoscedastic relationship?

A

-The Y scores are evenly spread above and below the regression line for the entire length of the line

24
Q

How do you determine if it is appropriate to proceed with the assumptions around the test of significance?

A

-look for homoscedasticity

25
Q

What are bivariate normal distributions?

A

-both variables are normally distributed

26
Q

In hypothesis testing, what does it mean if you fail to reject the null?

A
  • the Pearson’s r could have occurred by chance alone

- we cannot conclude that the two variables are related in the population

27
Q

What is hypothesis testing based on?

A

-sampling distribution of means

28
Q

What is the sampling distribution of means?

A
  • describes the variation in the values of the mean over a series of samples
  • based on the central limit theorem
29
Q

How large do samples have to be to reach a normal distribution?

A

-greater than or equal to 30 (a rule of thumb for the sampling distribution of means to be approximately normal)

30
Q

What happens with a larger sample size in hypothesis testing?

A

-better approximation to the normal distribution and a more effective estimation of the population mean

31
Q

What can be understood about b in hypothesis testing?

A
  • it can be interpreted as a mean

- thus, across repeated samples, the b values should center on the population regression slope

32
Q

What does b produce?

A
  • an estimate of beta (the population slope)

- not always an exact one, though

33
Q

What is critical about b for hypothesis testing?

A

-that b is normally distributed is critical for hypothesis testing of OLS regression

34
Q

Why can we use z to determine b and beta?

A

-since b is normally distributed in the population of samples

35
Q

Why can we drop the beta in the formula for t?

A

-since beta is presumed to equal 0 under the null hypothesis, t = (b - beta)/Sb reduces to t = b/Sb
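
A small sketch of the resulting test statistic, t = b/Sb, for a bivariate regression. It assumes the standard textbook standard error of the slope, Sb = sqrt(SSE / (n - 2)) / sqrt(sum of squared deviations of X), which the cards do not state; the helper name `slope_t_statistic` and the data are hypothetical.

```python
import math
from statistics import mean

def slope_t_statistic(x, y):
    """Return (b, s_b, t) for a bivariate OLS fit, with t = b / s_b (null: beta = 0)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual (unexplained) variation around the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope (assumed textbook formula, n - 2 degrees of freedom)
    s_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b, s_b, b / s_b

# Made-up data, for illustration only
b, s_b, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b, s_b, t)
```

The resulting t would then be compared against a critical value with n - 2 degrees of freedom.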

36
Q

What do the residuals indicate?

A
  • how far the predicted value based on b is from each actual case
  • suggest other factors besides X are influencing Y
37
Q

What does a larger standard deviation of X do to the standard deviation of b?

A
  • smaller SD of b

- we can better estimate the slope when the predictor takes a wide spread of values
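
To illustrate, the sketch below holds the residual standard deviation fixed and compares Sb for a narrow and a wide spread of X values. It assumes the standard textbook form Sb = (residual SD) / sqrt(sum of squared deviations of X); the helper name `slope_standard_error` and the numbers are hypothetical.

```python
import math
from statistics import mean

def slope_standard_error(x, residual_sd):
    """Standard error of the slope for a given residual SD (assumed textbook formula)."""
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return residual_sd / math.sqrt(sxx)

narrow_x = [1, 2, 3, 4, 5]   # smaller standard deviation of X
wide_x = [2, 4, 6, 8, 10]    # same n, twice the spread

sb_narrow = slope_standard_error(narrow_x, residual_sd=1.0)
sb_wide = slope_standard_error(wide_x, residual_sd=1.0)
print(sb_narrow, sb_wide)  # the wider spread gives the smaller Sb
```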

38
Q

What are the three most commonly used levels of significance in quantitative research?

A
  • p<0.05 *
  • p<0.01 **
  • p<0.001 ***
39
Q

When SPSS produces coefficients of bivariate relationship which values correspond with bX, a and Sb?

A
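  • bX is in the Unstandardized Coefficients B column, in the row for the X variable
  • a is in the Unstandardized Coefficients B column, in the (Constant) row
  • Sb is in the Std. Error column, in the row for the X variable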
40
Q

What are antecedent variables?

A
  • Z affects X independently and Y independently

- no effect between X and Y

41
Q

What are redundant variables?

A
  • Z and X affect each other but only Z affects Y

- Z and X are simultaneous

42
Q

What is the least squares multiple equation for two independent variables?

A

Y = a + b1X1 + b2X2

43
Q

What are b1 and b2?

A

-b1 is the partial slope of the linear relationship between the first independent variable and Y; b2 is the partial slope for the second independent variable

44
Q

What is the purpose of multiple regression?

A
  • to examine the independent relationship between each predictor (IV) and an outcome (DV, Y) in a set of predictors
  • holds all other variables constant
  • statistical control
45
Q

What is statistical control?

A

-we cannot eliminate the effect of other variables on our Y so we use statistics to control

46
Q

Is multiple regression as good as an experiment?

A
  • No
  • assumes that the relationship between variables can be described by a linear equation
  • makes errors as small as possible
47
Q

What is wrong with multiple regression?

A

-we cannot measure every variable that affects our dependent variable

48
Q

What is the purpose of a in a regression equation?

A

-anchor for the regression

49
Q

How realistic is a multiple regression model?

A

-all models are poor depictions of reality

50
Q

What is e in the full multivariate regression equation?

A
  • it indicates all the other influences besides all X’s in the model
  • changes for every case
51
Q

What can b be thought of as in the multivariate regression model/equation?

A
  • each b is a weight
  • expresses how much Y changes with a 1 unit increase in that X
  • each b indicates the independent effect of each X
52
Q

What is covariance?

A

-measure of how two variables vary together

53
Q

What value shows r in a SPSS correlation matrix?

A

-find the two variables you are interested in and look at where they intersect

54
Q

What does it mean to look at the independent effect?

A

-remove the other variables’ effects on it

55
Q

How do we look at the independent effect of two independent variables with correlation?

A

-run both of them in the regression model

56
Q

How do we find the full regression equation in SPSS?

A
  • a is the unstandardized B in the (Constant) row
  • b1 is the unstandardized B in the X1 row
  • b2 is the unstandardized B in the X2 row
57
Q

Describe the regression equation Y=1.897 + 0.339Xage + 0.521Xmemory + e

A
  • a one unit increase in age is related to a 0.339 unit increase in Y, controlling for memory
  • if age and short term memory were both zero, we would predict a reading ability of 1.897
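
As a worked example, the equation on this card can be turned into a prediction function (the error term e is unknown for a new case, so it is left out). The function name `predict_reading` and the input values are hypothetical.

```python
def predict_reading(age, memory):
    """Predicted reading ability from the card's equation: 1.897 + 0.339*age + 0.521*memory."""
    return 1.897 + 0.339 * age + 0.521 * memory

baseline = predict_reading(age=0, memory=0)      # the intercept, a
example = predict_reading(age=10, memory=4)      # hypothetical case
one_year_older = predict_reading(age=11, memory=4)

# Holding memory constant, one more unit of age changes the prediction by b1 = 0.339
print(baseline, example, one_year_older - example)
```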
58
Q

What is the multiple coefficient of determination?

A
  • R^2

- since r^2 doesn’t work for multiple regression because the predictors overlap in the variance they explain

59
Q

What is R^2?

A
  • the squared correlation between observed and predicted values from the multiple regression
  • variance in the dependent variable accounted by the predictors in the regression
60
Q

What would it mean if we had an R-squared value of 0.702?

A

-X1 and X2 together account for 70.2% of the variance in Y
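
Both descriptions of R^2 can be checked on a small example. This is a sketch with made-up data, using the closed-form least squares solution for two predictors; the helper names are hypothetical. It computes R^2 as 1 - SSE/SST and as the squared correlation between observed and predicted Y.

```python
import math
from statistics import mean

def fit_two_predictors(x1, x2, y):
    """Least squares a, b1, b2 for Y = a + b1*X1 + b2*X2 (closed form, two predictors)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero when X1 and X2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def pearson_r(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

# Made-up data, for illustration only
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 11]

a, b1, b2 = fit_two_predictors(x1, x2, y)
y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
sse = sum((w - p) ** 2 for w, p in zip(y, y_hat))
sst = sum((w - mean(y)) ** 2 for w in y)
r_squared = 1 - sse / sst
r_squared_from_corr = pearson_r(y, y_hat) ** 2
print(r_squared, r_squared_from_corr)
```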

61
Q

Why can we not just compare partial slopes?

A

-different units

62
Q

What do we do to convert partial slopes into a comparable form?

A

-look at standardized coefficients

63
Q

What are standardized partial slopes called?

A

-beta weights

64
Q

How to interpret beta-weight values?

A

-the larger the absolute value of the beta weight, the stronger the relationship, regardless of + or -
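
A beta weight can be obtained from a partial slope by rescaling it with the standard deviations of X and Y (the standard conversion, which the cards do not spell out). In this sketch the slopes come from the reading-ability card, but the standard deviations are invented for illustration and the helper name `beta_weight` is hypothetical.

```python
def beta_weight(b, sd_x, sd_y):
    """Standardized partial slope: b rescaled by SD(X) / SD(Y)."""
    return b * sd_x / sd_y

# Partial slopes from the reading-ability equation; SDs below are hypothetical
beta_age = beta_weight(0.339, sd_x=2.0, sd_y=1.5)
beta_memory = beta_weight(0.521, sd_x=1.0, sd_y=1.5)

# Compare absolute values: the larger one marks the relatively stronger predictor
stronger = "age" if abs(beta_age) > abs(beta_memory) else "memory"
print(beta_age, beta_memory, stronger)
```

With these invented standard deviations, age has the larger beta weight even though its unstandardized slope is smaller, which is why raw partial slopes in different units cannot be compared directly.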

65
Q

In bivariate regression what type of strength do we observe with standardized coefficients?

A

-absolute

66
Q

In multiple regression can we use standardized slopes to determine absolute strength?

A
  • No

- Relative strength only

67
Q

Is beta-weight equal to r?

A

-no

68
Q

What does multiple regression do for spurious relationships?

A

-it is used to rule out spurious relationships among variables

69
Q

What are the three types of spurious relationships?

A
  • antecedent
  • redundant
  • suppression
70
Q

What is suppression?

A
  • opposite of redundancy

- when the relationship between two variables gets stronger when you control for a third variable

71
Q

How can we use stepwise regression to show spurious relationships?

A
  • the unstandardized coefficients (B) will change values in each model (go down)
  • or the R square will change value in each model
72
Q

How do you test for significance in multiple regression?

A

-use the t equation, t = b/Sb, for each partial slope

73
Q

What forms does multicollinearity come in?

A

-extreme and near extreme

74
Q

What is extreme multicollinearity?

A
  • at least two of the X variables in a regression equation are perfectly related by a linear function
  • correlation between X1 and X2 is 1 or -1
75
Q

What is near-extreme multicollinearity?

A
  • there are strong, although not perfect, linear relationships among the X’s
  • correlation between X1 and X2 will be close to 1 or -1
76
Q

How do you find near-extreme multicollinearity?

A
  • regress each independent variable on all the other independent variables and look for a high R-square
  • if any of these are above 0.6 this is concerning
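
With only two predictors, regressing X1 on X2 gives an R-square equal to their squared correlation, so the check can be sketched as below. The data are made up, the helper name `auxiliary_r_squared` is hypothetical, and the 0.6 threshold is the one given on this card.

```python
from statistics import mean

def auxiliary_r_squared(x_target, x_other):
    """R-square from regressing one predictor on the other (bivariate case)."""
    mt, mo = mean(x_target), mean(x_other)
    sxy = sum((t - mt) * (o - mo) for t, o in zip(x_target, x_other))
    stt = sum((t - mt) ** 2 for t in x_target)
    soo = sum((o - mo) ** 2 for o in x_other)
    return sxy ** 2 / (stt * soo)

# Made-up predictors, for illustration only
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

aux_r2 = auxiliary_r_squared(x1, x2)
concerning = aux_r2 > 0.6  # threshold taken from the card
print(aux_r2, concerning)
```

With more than two predictors, each auxiliary regression would include all the other predictors rather than just one.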
77
Q

Why is multicollinearity a problem?

A
  • it will result in a larger standard error for its coefficients
  • making it harder to find statistically significant coefficients (t)
78
Q

What differs between the standard error for bivariate and multivariate regressions?

A

-correction factor for the covariance between the two predictors

79
Q

What does greater covariance between two predictors result in?

A

-less reliable estimates because it inflates Sb

80
Q

What is VIF?

A

-variance inflation factor: captures the degree to which an independent variable is collinear with the other independent variables

81
Q

How would you interpret a VIF of 9?

A

-you’re multiplying the standard error for a coefficient by a factor of 3 (the square root of the VIF)
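
The arithmetic behind this card can be sketched as follows, assuming the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. The function names are hypothetical.

```python
import math

def vif(aux_r_squared):
    """Variance inflation factor from the auxiliary R-square of one predictor."""
    return 1.0 / (1.0 - aux_r_squared)

def se_inflation(vif_value):
    """Factor by which the coefficient's standard error is multiplied."""
    return math.sqrt(vif_value)

vif_value = vif(8 / 9)            # an auxiliary R-square of about 0.889
factor = se_inflation(vif_value)  # square root of the VIF
print(vif_value, factor)
```

A VIF of 9 inflates the variance of the coefficient ninefold, so the standard error, being its square root, is tripled.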

82
Q

What variable will have a large VIF?

A

-independent variable that is highly correlated with other predictors in the model

83
Q

What is the cut off for VIF?

A

6