Regression; Lec 8; Lab 4 Flashcards

1
Q

What is a coefficient?

A

A numerical factor that multiplies a variable in an equation (in regression, the intercept a and the slope b are the coefficients)

2
Q

What does a significant r mean?

A

That the regression coefficient is also significant

3
Q

Which variable goes on X axis and which goes on Y axis?

A

Variable which ‘varies’ (IV) on the X-axis; variable measured (DV) on the Y-axis

4
Q

If you want to predict university score from SAT scores - which is predictor variable and which is criterion variable?

A

SAT score is predictor variable

University score is criterion variable

5
Q

What is a negative correlation?

A

As one variable increases so the other decreases

6
Q

What does it mean when variables are said to covary?

A

That they have either a positive or negative correlation to one another

7
Q

What is the predictor variable?

A

Independent variable - used to predict an outcome (variable that varies)

8
Q

What are three things that are important to remember in regression?

A
  1. Any set of data can have a regression line plotted
  2. The significance of the correlation or regression tells us whether a real relationship exists
  3. The correlation or the standard error of estimate tells us how accurate the regression equation is
9
Q

How should

rxy = cov(x,y)/SxSy

Where

Covxy = Σ(X - Xbar)(Y - Ybar)/(N - 1)

be interpreted?

A
  1. It is an indication of how closely the data points lie along the line of best fit (the regression line)
  2. Like all statistics, it requires a p value to determine whether the relationship is due to chance or is the product of a real relationship
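
A minimal Python sketch of that calculation, using made-up x/y values and the N - 1 formulas above (the statistics module's stdev also divides by N - 1):

  import statistics

  # Hypothetical data points (not the course data set)
  x = [5, 10, 15, 20, 25]
  y = [4, 9, 13, 19, 26]
  n = len(x)

  x_bar = statistics.mean(x)
  y_bar = statistics.mean(y)

  # Covxy = Σ(X - Xbar)(Y - Ybar) / (N - 1)
  cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

  # rxy = cov(x,y) / (Sx * Sy)
  r_xy = cov_xy / (statistics.stdev(x) * statistics.stdev(y))
  print(round(r_xy, 3))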
10
Q

What is r2?

A

The proportion of the variance in the DV that is predictable from the IV.

11
Q

Regression produces eight different analyses:

  1. Descriptive statistics,
  2. correlations,
  3. variables entered/removed,
  4. Model summary,
  5. ANOVA,
  6. Coefficients,
  7. Casewise diagnostics,
  8. Residual Statistics

First we look at ‘casewise diagnostics’, which should we look at second? Why?

A

Model summary

This is related to the correlation

r is the Pearson correlation restated

r2 is the coefficient of determination (a measure of relative variability) and indicates how much of the variation in the DV can be explained by variation in the IV

‘Std. Error of the Estimate’ is the std. error of prediction for all the values - this gives us a direct measure of the accuracy of our predictions using the regression equation. Compare this score to the SD of the criterion variable to get an indication of how useful the regression is (if the standard error from the regression is lower than the SD of the criterion variable, then the regression is useful)
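
A rough Python sketch of the same quantities (hypothetical data and standard least-squares formulas; statistics.covariance needs Python 3.10+), ending with the ‘is the regression useful?’ comparison:

  import statistics

  x = [5, 10, 15, 20, 25]   # predictor (IV)
  y = [4, 9, 13, 19, 26]    # criterion (DV)
  n = len(x)

  # Slope and intercept of the regression line
  b = statistics.covariance(x, y) / statistics.variance(x)
  a = statistics.mean(y) - b * statistics.mean(x)
  y_hat = [a + b * xi for xi in x]

  # Std. Error of the Estimate: sqrt(sum of squared residuals / (N - 2))
  se_est = (sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)) ** 0.5

  # Useful if the std. error of the estimate is lower than the SD of the criterion
  print(se_est, statistics.stdev(y), se_est < statistics.stdev(y))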

12
Q

What is this the formula for?

Note: in this instance the denominator = N - 2 because two values (a and b) are involved in making the prediction

A

Residual/error variance

13
Q

Regression produces eight different analyses:

  1. Descriptive statistics,
  2. correlations,
  3. variables entered/removed,
  4. Model summary,
  5. ANOVA,
  6. Coefficients,
  7. Casewise diagnostics,
  8. Residual Statistics

First we look at ‘casewise diagnostics’, then we look at ‘Model summary’, what should we look at third? Why?

A

Coefficients (the values that make up the regression equation)

  1. Column ‘B’: the value identified as ‘Constant’ is the intercept (a in the regression equation)
  2. The slope/gradient of the line is identified by the name given to the predictor variable (b in the regression equation)
14
Q

H0: rho = ?

A

0

where rho is the population correlation coefficient

15
Q

r2 can give us the % of predictable variance. Using the smoking and CHD example, where r = .713 and r2 = .508, explain.

A

r = .713

r2 = .713 x .713 = .508

Approximately 50% of the variability in the incidence of CHD mortality is associated with variability in smoking - NOTE: you cannot infer a cause and effect relationship

16
Q

This is the formula for the regression line.

How do you calculate b?

A

b = Covxy/Sx2, where Sx2 = the variability (variance) of X

17
Q

What is the standard error of prediction?

A

The standard deviation of the differences between the predicted values and the recorded (actual) values.

18
Q

How do we know if our prediction is better than just using the mean?

A

Total SS - residual SS: the reduction in error gained by predicting from the regression line rather than from the mean.
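
A short Python sketch of that comparison (hypothetical data; the difference, often called the regression SS, is the error removed by using the line instead of the mean):

  import statistics

  x = [5, 10, 15, 20, 25]
  y = [4, 9, 13, 19, 26]

  b = statistics.covariance(x, y) / statistics.variance(x)   # Python 3.10+
  a = statistics.mean(y) - b * statistics.mean(x)
  y_bar = statistics.mean(y)

  ss_total = sum((yi - y_bar) ** 2 for yi in y)                     # error predicting from the mean
  ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # error predicting from the line
  print(ss_total - ss_resid)                                        # improvement over the mean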

19
Q

What is the criterion variable?

A

Dependent variable (variable measured)

20
Q

What can badly affect Pearson’s r?

A
  1. Data with outliers
  2. Data that is not linear (e.g. curvilinear - a smooth curve of any shape)
21
Q

Covxy = ?

A

Covxy = Σ(X - Xbar)(Y - Ybar)/(N - 1)

22
Q

There are two possible ways to predict - what are they?

A
  1. Basic = difference from mean scores (average Y)
    1. Total sum of squares
  2. Regression = difference from the regression line (difference from Yhat)
23
Q

If you want to predict the incidence of CHD in population based on incidence of smoking - which is the predictor variable? Which is the criterion variable?

A
  • Predictor variable X (IV) = average number of cigarettes smoked per head of population
  • Criterion variable Y (DV) = incidence of CHD
24
Q

How do you remove a data point from analysis?

A

Data –> Select cases –> ‘Select Cases’ dialogue box –> ‘If condition is satisfied’ –> ‘If…’ –> move the variable you want to exclude to the box, then put ~ (tilde, meaning ‘should not equal’) and type the value of the outlier you want to remove (e.g. 12.45) –> Continue –> OK

Then you must re-run the analysis you wanted to run.

25
Q

What are each of the values for the regression equation:

Predicted score = b x (predictor score) + a

A

Predicted score = (slope/gradient of the line, from the Coefficients output) x (predictor score) + (the intercept: the value identified as ‘Constant’ in column ‘B’ of the Coefficients output)

26
Q

What is the linear regression equation?

A

Yhat = bX + a, where:

Yhat = predicted value of Y

X = smoking incidence in a country

b = slope of line - change in predicted Y due to one unit change in X

a = the intercept - the value of Yhat when X is at 0
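
A minimal Python sketch putting b, a and the prediction together (hypothetical numbers, standard least-squares formulas; statistics.covariance needs Python 3.10+):

  import statistics

  x = [5, 10, 15, 20, 25]   # predictor, e.g. smoking incidence
  y = [4, 9, 13, 19, 26]    # criterion, e.g. CHD incidence

  b = statistics.covariance(x, y) / statistics.variance(x)   # slope: Covxy / Sx2
  a = statistics.mean(y) - b * statistics.mean(x)            # intercept: Ybar - b x Xbar

  def predict(new_x):
      # Yhat = bX + a
      return b * new_x + a

  print(predict(18))   # predicted Y for a new X value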

27
Q

There are 4 columns in the casewise diagnostics output:

  1. Std. Residual
  2. Case number
  3. Predicted value
  4. Residual

What do each of them mean?

A
  1. Std. residual = identifies how many standard errors of prediction the selected data point is away from the regression line
  2. Case number = Participant number
  3. Predicted value = What SPSS (regression) predicted the value would be
  4. Residual = The gap between the predicted score and the actual score
28
Q

How would you summarise what correlation quantifies?

A

Correlation quantifies the potential linear relationship between two variables; the supposition of linearity must be confirmed by inspection of the scatterplot

29
Q

What

A
30
Q

How would you run a regression analysis in SPSS?

A

Analyze –> Regression –> Linear –> Move predictor variable (IV) to ‘Independent(s)’ box –> Move criterion variable (DV) to ‘Dependent’ box –> Statistics –> Descriptives –> Casewise diagnostics –> Continue –> OK

31
Q

Regression produces eight different analyses:

  1. Descriptive statistics,
  2. correlations,
  3. variables entered/removed,
  4. Model summary,
  5. ANOVA,
  6. Coefficients,
  7. Casewise diagnostics,
  8. Residual Statistics

Which should we look at first?

A

Casewise diagnostics.

  • This will identify the data that can be considered outliers according to the ‘standard error of prediction’.
  • If there are any outliers that are more than 3 std devs beyond the value predicted by the regression line, then they can be considered ‘extreme outliers’ and should be removed
  • Recalculate the regression with the outlier removed
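
A hedged Python sketch of the same check (hypothetical data and a home-made standardisation: each residual divided by the standard error of the estimate, which is only a rough analogue of SPSS’s standardised residuals):

  import statistics

  def extreme_outliers(x, y, cutoff=3.0):
      # Flag cases whose residual lies more than `cutoff` standard errors of
      # prediction away from the regression line
      b = statistics.covariance(x, y) / statistics.variance(x)   # Python 3.10+
      a = statistics.mean(y) - b * statistics.mean(x)
      residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
      se_est = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5
      return [(case, r / se_est)
              for case, r in enumerate(residuals, start=1)
              if abs(r / se_est) > cutoff]

  # Any case returned here would be removed and the regression recalculated
  print(extreme_outliers([5, 10, 15, 20, 25, 30], [4, 9, 13, 19, 26, 60]))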
32
Q

How would you summarise what a scatterplot depicts?

A

A scatterplot depicts the nature of association between two variables in a graphical form

33
Q

What is the difference between correlation and regression?

A

Correlation allows you to establish whether two variables covary, but does not enable prediction (regression does)

34
Q

r = ?

A

degree to which X and Y vary together (covariability of X and Y) / Degree to which X and Y vary separately (Variability of X and Y separately)

35
Q

What is the intercept?

A

a = the value of Yhat when X is zero

36
Q

What correlation should you run if the data are non-parametric (e.g., curvilinear)?

A

Use Spearman’s correlation

37
Q

What should you do before you run Pearson’s r?

A

Produce a scatterplot to see if data is linear and check for outliers. If appropriate then remove outliers.

38
Q

What is this the formula for?

A

Standard error of estimate

It is the SD (the square root of the variance, calculated from the variation of each data point relative to the mean) of the prediction errors, and a common measure of the accuracy of prediction

39
Q

What is the mathematical formula for Pearson’s r?

A

rxy = cov(x,y)/SxSy

40
Q

Standard deviation

A
  1. SD is a measure of spread
  2. A low SD tells us that the data is clustered around the mean, while a high SD tells us that it is dispersed over a wider range of values
  3. Used when the data is normally distributed
  4. Tells us whether a data point is standard/expected, or unusual/unexpected
  5. Represented by sigma

How to calculate:

  1. calculate the mean
  2. subtract the mean from each data point
  3. Square each difference
  4. Calculate the mean of the squared differences
  5. Take the Square root
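
A direct Python translation of those five steps (made-up data; it follows the steps exactly, so it divides by N rather than N - 1):

  data = [2, 4, 4, 4, 5, 5, 7, 9]

  mean = sum(data) / len(data)              # 1. calculate the mean
  diffs = [x - mean for x in data]          # 2. subtract the mean from each data point
  squared = [d ** 2 for d in diffs]         # 3. square each difference
  mean_sq = sum(squared) / len(squared)     # 4. mean of the squared differences
  sd = mean_sq ** 0.5                       # 5. take the square root
  print(sd)                                 # 2.0 for this data set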
41
Q

How would you summarise what regression quantifies?

A

Regression quantifies the degree of impact one variable has on another, thus enabling prediction

42
Q

What is a positive correlation?

A

As one variable increases so the other increases

43
Q

What is error variance in regression called?

A

Residual variance - it is the variability of the residuals (the differences between the actual and predicted values)

44
Q

b = slope of line - change in predicted Y due to one unit change in X

What is this known as?

A

Regression coefficient

45
Q

How do you run Pearson’s in SPSS?

A

Analyze –> Correlate –> Bivariate

46
Q

This is the formula for the regression line.

How do you calculate a?

A

a = Ybar - b(Xbar), i.e. the mean of Y minus the slope multiplied by the mean of X

47
Q

Describe Pearson’s Product-Moment Correlation Coefficient

A

The extent to which a criterion variable (Y) varies in conjunction with the predictor variable (X)