MR Chapter 1 Flashcards

1
Q

Simple (Bivariate) Linear Regression first things to check

A

One influence (IV) and one outcome (DV)

  1. LOOK AT THE VALUES (do any of them seem too high?)
  2. Look at the mean (is it plausible?)
  3. Check descriptives for the national average (e.g. National Math Ach Test)
  4. Check the shape of the data
2
Q

The Regression Analysis - second things to check

A

We regress the outcome on the influence or influences

1. The correlation between the two variables (zero-order correlation)

3
Q

Zero-order correlation

A

The correlation between the two variables - distinguishes it from first-, second-, or multiple-order partial correlations

4
Q

What is in the Model Summary part of the SPSS output?

A
  1. Lists R, which is normally used to designate the multiple correlation coefficient but, with one predictor, is the same as Pearson’s correlation
  2. Lists R squared, which denotes the variance explained in the outcome variable by the predictor variable(s)
  3. As you run this regression you’ll also get extra statistics such as adjusted R squared, but ignore these for now
5
Q

What is R?

A

R is used to designate the multiple correlation coefficient, but with one predictor it is the same as Pearson’s correlation

6
Q

R squared

A

denotes the variance explained in the outcome variable by the predictor variables

7
Q

How do we know if the regression, that is, the R and R2, is statistically significant?

A

- Because it’s simple regression, you can run the correlation in SPSS, which gives a Sig. value.
- Increasingly, we use the ANOVA F test to test the significance of the regression equation.
F = (SS regression / df regression) / (SS residual / df residual)

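As a rough illustration (not part of the SPSS output), here is a minimal Python sketch of this calculation using the sums of squares quoted in these cards (SS regression = 1291.231, SS total = 12610.190) and the N = 100, k = 1 implied by the 98 residual df; scipy is assumed for the p value.

```python
from scipy import stats

# Sums of squares from the HW -> Math Achievement example
ss_regression = 1291.231
ss_total = 12610.190
ss_residual = ss_total - ss_regression     # 11318.959

n, k = 100, 1                              # sample size and number of IVs (assumed from df = 98)
df_regression = k                          # df for the regression
df_residual = n - k - 1                    # df for error/residual (98)

# F = (SS regression / df regression) / (SS residual / df residual)
f = (ss_regression / df_regression) / (ss_residual / df_residual)
p = stats.f.sf(f, df_regression, df_residual)   # right-tail probability of this F

print(f"F = {f:.3f}, p = {p:.3f}")         # F is about 11.179, p is about .001
```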
8
Q

What is Sum of Squares Regression?

A

a measure of the variation in the DV that is explained by the IV(s)

9
Q

What is Sum of squares residual?

A

the variance unexplained by the regression

10
Q

How do you work out DF for regression?

A

Equal to the number of IVs (k)

11
Q

How do you work out DF Error/residual?

A

Equal to the sample size minus the number of IVs in the equation minus 1 (N-k-1)

12
Q

What does significance of the regression mean?

A

If F = 11.179 - What is the probability of obtaining a value of F as large as 11.179, if these variables were in fact unrelated in the population?

According to the table (column labelled Sig.), such an occurrence would happen only 1 time in 1000 (p = .001); so these variables are indeed related.

13
Q

How do you work out R squared?

A

R2 = SS regression / SS total = 1291.231 / 12610.190 = .102

Homework explains .102, or 10.2%, of the variance in Math achievement.
R2 can vary between 0 and 1.

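A quick Python sketch of the same arithmetic (hypothetical, just redoing the division from this card):

```python
ss_regression = 1291.231
ss_total = 12610.190

r_squared = ss_regression / ss_total
print(round(r_squared, 3))   # 0.102 -> homework explains about 10.2% of the variance
```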
14
Q

What is The Regression Equation?

A

Y = a + bX + e

= a person’s score on the DV (Math Achievement) is a combination of a constant (a), plus a coefficient (b) times his or her value on the IV (Math HW), plus error

15
Q

Where do you find variables a and b for the regression equation in the SPSS output?

A

The values of a and b are shown in the “B” column under “Unstandardized Coefficients” in the “Coefficients” table of the output.

16
Q

What is A?

A

A is the constant, or the intercept: the predicted score on the DV for someone with a score of zero on the IV.

17
Q

What is B?

A

The unstandardized regression coefficient. Since we don’t have a direct estimate of error, ignore it for now and use Y’ = a + bX, where Y’ is the predicted value of Y.

Can be thought of as b = rise/run = (My - a)/(Mx - 0) = the predicted increase in Y expected for each unit increase in X.

18
Q

How do you draw a rough conclusion from a regression equation?

A

To predict a student’s math score, multiply his or her time spent on HW by 1.990 and add 47.032. We can expect a 1.99-point increase for every additional hour spent on math HW.

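A minimal Python sketch of this prediction rule, using the a and b values from the example; the function name is just illustrative:

```python
def predict_math_achievement(hw_hours, a=47.032, b=1.990):
    """Predicted Math Achievement: Y' = a + b*X (error term ignored)."""
    return a + b * hw_hours

print(predict_math_achievement(0))    # 47.032 -> the intercept
print(predict_math_achievement(2.2))  # 51.410 -> predicting at the mean of X returns the mean of Y
```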
19
Q

The regression line

A

We draw the line with two points. The first is (X = 0, Y = 47.032), i.e. the intercept. The second is simply the point defined by the mean of X (Mx = 2.200) and the mean of Y (My = 51.410).

b = rise/run = (My - a)/(Mx - 0) = the predicted increase in Y expected for each unit increase in X.
If a is 47.032, it shows that the average Achievement test score for students who do no studying is 47.032, slightly below the national average.

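As a small check (values taken from the example), the slope recovered from these two points matches b = 1.990:

```python
# Two points on the regression line: the intercept and the point of means
x1, y1 = 0.0, 47.032        # (0, a)
x2, y2 = 2.200, 51.410      # (Mx, My)

b = (y2 - y1) / (x2 - x1)   # rise over run = (My - a) / (Mx - 0)
print(round(b, 3))          # 1.99
```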
20
Q

If the R is used to designate the multiple correlation coefficient, what happens when R is larger?

A

If the R was larger, the data points would cluster more closely around the regression line. Higher correlation, nearer to line of best fit.

21
Q

With MR how do we test statistical significance?

A

The t values in the Coefficients table: the value corresponding to the regression coefficient is simply the result of a t test of the statistical significance of the regression coefficient (b).

t = statistic / SE of the statistic

  • in this case: t = b/SEb = 1.990/.595 = 3.345
  • t = 3.344 with N-k-1 df (98). The probability of obtaining such a t by chance is about 1 in 1000 (p = .001).
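A rough Python sketch of this t test (scipy assumed), which also shows that t squared matches the F of 11.179 reported earlier, a point picked up in a later card:

```python
from scipy import stats

b, se_b = 1.990, 0.595
df_residual = 98                         # N - k - 1

t = b / se_b                             # about 3.34
p = 2 * stats.t.sf(abs(t), df_residual)  # two-tailed probability

print(f"t = {t:.3f}, p = {p:.3f}")       # t is about 3.345, p is about .001
print(round(t ** 2, 2))                  # about 11.19, essentially the F of 11.179 (rounding aside)
```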
22
Q

How do we know if a t will be significant?

A

As a general rule of thumb, with a reasonable sample size (e.g. 100+), a t of 2 or greater will be significant with an alpha of .05 and a two-tailed (non-directional) test.

Overall regression may be significant, but the regression coefficients for some of the IVs may not be significant, whereas others are significant.

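To see where the “t of 2” rule of thumb comes from, here is a quick sketch (scipy assumed) of the critical two-tailed t for alpha = .05 with the 98 df from the example:

```python
from scipy import stats

alpha, df = 0.05, 98
t_critical = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value
print(round(t_critical, 3))                  # about 1.984, i.e. roughly 2 for reasonable sample sizes
```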
23
Q

Is F always t squared?

A

In simple regression, t squared equals F (as always, within errors of rounding). In MR this will not be the case.

24
Q

What are Confidence Intervals used for?

A

b is an estimate, but we are interested in the true value of the regression coefficient (or slope, b) in the population. The CI makes this underlying thinking more obvious.

25
Q

If the CI is .809 to 3.171, what does this mean?

A
  • There is a 95% chance that the true (but unknown) regression coefficient is somewhere within the range .809 to 3.171
  • If we conducted the study 100 times and computed a 95% CI each time, about 95 of those 100 intervals would contain the true value of b.
26
Q

How do we know from CI if the regression or b is significant?

A
  • If the range does not include zero, b is statistically significant. If the range did include zero, “we could not say with confidence that the coefficient was different from zero.”
27
Q

Do we always compare b to zero with CIs?

A
  • No, CI can be used to say the regression coefficient is different from zero, or, from another specified value.

Eg suppose a regression coefficient of 3.0 had been reported for Math Achievement and HW among high school students: for each hour of homework, scores increase by 3 points.

We might reasonably ask whether our finding for 8th graders is inconsistent with this; the fact that the 95% CI includes the value 3.0 means that our results are not statistically significantly different from the high school results.

28
Q

Is CI always 95%?

A

No. We can use any level of confidence, e.g. 99% confidence intervals.

29
Q

What are the steps for Confidence Intervals?

A
  1. Pick a level of confidence e.g. 99%
  2. Convert to the probability (.99) and subtract that probability from 1 (1 - .99= .01)
  3. Look up the t table with the error df, two-tailed. This is the value of t associated with the probability of interest (e.g. t = 2.627).
  4. Multiply this t value by the SE of b (e.g. .595 x 2.627 = 1.563) and add and subtract the product from b (e.g. 1.990 ± 1.563 = .427 to 3.553).

This is the confidence interval around the regression coefficient.

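These steps, sketched in Python (scipy assumed), reproduce the 99% interval from the example (b = 1.990, SE of b = .595, 98 df):

```python
from scipy import stats

b, se_b, df = 1.990, 0.595, 98
confidence = 0.99                          # step 1: pick a level of confidence

alpha = 1 - confidence                     # step 2: .01
t_crit = stats.t.ppf(1 - alpha / 2, df)    # step 3: two-tailed t for this df (about 2.627)

margin = t_crit * se_b                     # step 4: about 1.563
print(round(b - margin, 3), round(b + margin, 3))  # about 0.427 to 3.553
```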
30
Q

What is The standardised Regression Coefficient?

A

This is Beta or ß. Beta is interpreted in a similar way to B, but in standard deviation units.

31
Q

What is the unstandardized coefficient ?

A

This is B or b. It is interpreted as the change in the outcome for each unit of change in the influence

32
Q

If ß is .320, for the HW and Math Ach example how is this interpreted?

A
  • ß for the example (.320) means that for each SD increase in HW, Ach will, on average, increase by .320 standard deviations, or about a third of an SD.
    See? Same as B but in SDs.
33
Q

How do z scores relate to ß and B?

A

ß is the same as b with the independent and dependent variables standardised (converted to z scores).

  • ß = b(SDx/SDy) or b = ß(SDy/SDx)
  • ß = 1.990(1.815/11.286) = .320
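A tiny Python sketch of this conversion, using the SDs from the example (SDx = 1.815 for Math HW, SDy = 11.286 for Math Achievement):

```python
b = 1.990
sd_x, sd_y = 1.815, 11.286   # SDs of Math HW (X) and Math Achievement (Y)

beta = b * (sd_x / sd_y)     # standardised coefficient
print(round(beta, 3))        # 0.32
```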
34
Q

Is ß usually the same as correlation?

A

No; in this case the standardised regression coefficient (ß) is the same as the correlation coefficient only because it’s simple regression. This will not be the case with multiple regression, where there are multiple predictors.

35
Q

Is ANOVA like Regression?

A
  • ANOVA is the same, part of the general linear model.
  • Y = μ + ß + e: a person’s score on the DV (Y) is the sum of the overall mean (μ), plus variation due to treatment effects, plus or minus error.
  • The regression equation is Y = a + bX + e: a person’s score on the DV is the sum of a constant that is the same for all individuals (a), plus variation due to the IV (X), plus or minus error.
  • With two groups, F = t2 within errors of rounding. The t test, ANOVA and regression tell you the same thing, i.e. MR subsumes ANOVA.
36
Q

Why is MR better than ANOVA?

A
  1. In MR we can use categorical IVs (e.g. training vs no training on consultant performance), continuous IVs (e.g. motivation), or both. ANOVA requires categorical IVs, so a continuous IV has to be cut into categories, which discards variance in the IV and leads to a weaker statistical test.
  2. In MR, we can use multiple IVs to explain the variation in a variable such as school performance. In ANOVA, lots of IVs would tax the researcher’s interpretative abilities.
  3. ANOVA is useful in experimental research, where there is active manipulation of the IV and random assignment. MR can do this too (though ANOVA is easier), but it can also handle non-experimental research, where the IVs are not assigned at random; motivation etc. are existing IVs that you do not manipulate at all. MR is almost always better for non-experimental research.
37
Q

Do we say predict or explain?

A

Explanation subsumes prediction. If you can explain a phenomenon, you can predict it; prediction alone, though a worthy goal, does not explain anything. We are more interested in explaining than predicting here.

38
Q

Can we make causal inferences with MR?

A

We can and do make causal inferences from correlational or non-experimental data. Under certain conditions this is valid and scientifically responsible; in other cases it is invalid and misleading.

39
Q

What is variance and SD?

A

SD = √V, V = SD2.
It’s often easier to use the SD because it is in the same units as the original variable.
Another way to write the standardised regression coefficient is ß = b(√Vx/√Vy).

40
Q

What is covariance?

A

Variance is the degree to which a variable varies around its mean. Covariance is the degree to which two variables vary together.

  • A correlation coefficient is a standardised covariance.
  • A covariance is an unstandardised correlation coefficient.
  • If we know the Vs or SDs, we can easily convert between covariances and correlations.

rxy = CoVxy / (SDx x SDy)

If we standardise the variables (z scores), each SD is 1, so:

rxy = CoVzxzy / (1 x 1) = CoVzxzy
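A small numpy sketch (with made-up homework and achievement scores, purely to illustrate these relationships): the correlation equals the covariance divided by the product of the SDs, and the covariance of the z scores equals the correlation.

```python
import numpy as np

# Made-up homework hours (x) and achievement scores (y)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([45.0, 48.0, 50.0, 55.0, 53.0, 60.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]          # covariance of x and y
sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)

r = cov_xy / (sd_x * sd_y)                   # rxy = CoVxy / (SDx x SDy)
print(round(r, 3), round(np.corrcoef(x, y)[0, 1], 3))  # same value both ways

# Standardise (z scores): the SDs become 1, so the covariance of zx and zy is the correlation
zx, zy = (x - x.mean()) / sd_x, (y - y.mean()) / sd_y
print(round(np.cov(zx, zy, ddof=1)[0, 1], 3))          # equals r
```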