W4/5: Practice Questions for Predictions Flashcards
What kind(s) of analysis immediately come to mind when we talk of association among variables? Give some examples.
Symmetric forms of relationship, in which all variables have the same functional role and form:
Continuous variables - correlation (covariance)
Categorical variables - contingency table
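A minimal sketch in Python of the two symmetric summaries, using hypothetical simulated data (the variable names and values are illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Continuous pair: the Pearson correlation is symmetric, r(x, y) == r(y, x)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)
print(round(np.corrcoef(x, y)[0, 1], 3))

# Categorical pair: a contingency table simply cross-classifies frequencies
smoker = rng.choice(["yes", "no"], size=100)
drinker = rng.choice(["yes", "no"], size=100)
print(pd.crosstab(smoker, drinker))
```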
What kind(s) of analysis immediately come to mind when we talk of prediction of one variable by other variables?
Linear regression, which involves prediction of scores on a continuous dependent variable (DV) by one or more independent variables (IVs), either continuous or categorical.
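A short sketch of this asymmetric case, assuming statsmodels and simulated data (the coefficient values are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical data: a continuous DV predicted by two IVs
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 0.6 * x1 - 0.3 * x2 + rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept column + IVs
fit = sm.OLS(y, X).fit()
print(fit.params)          # intercept and partial regression coefficients
print(fit.predict(X)[:5])  # predicted scores on the DV
```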
If we have a scatterplot between two variables and a regression line is placed on the graph, where will we find the predicted values from the regression of Y on X?
On the regression line.
What is the distance between the observed and predicted Y values called in such a graph (with the regression line), and what symbol is it signified by?
The residual, signified by e_i.
Why are (1) a covariance/correlation and (2) a contingency table said to reflect a symmetric relationship among variables?
Because all variables have the same functional role and form.
(1) Covariance/correlation: its value does not depend on which variable is specified first.
(2) Contingency table: the frequencies do not depend on which variable forms the rows and which forms the columns.
Why is linear regression said to reflect an asymmetric relationship among variables?
Because not all variables have the same functional role and form: the DV is being predicted, while the IVs do the predicting.
What is the residual term in linear regression equal to?
Residual = Observed score on DV - Predicted score on DV
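In symbols (a standard notation, which may differ slightly from the unit notes):

```latex
e_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i = b_0 + b_1 X_{1i} + \dots + b_k X_{ki}
```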
What does the total sum of squares of a dependent variable get decomposed into in a linear regression model?
SStotal = SSreg (Explained/accounted for by the regression model) + SSres (Not explained/accounted for by the regression model)
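A quick numeric check of the decomposition, assuming statsmodels and simulated data; it also shows that R^2 = SSreg / SStotal:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 + 0.8 * x + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
ss_total = np.sum((y - y.mean()) ** 2)
ss_reg = np.sum((fit.fittedvalues - y.mean()) ** 2)
ss_res = np.sum(fit.resid ** 2)

print(np.isclose(ss_total, ss_reg + ss_res))        # True: SStotal = SSreg + SSres
print(np.isclose(fit.rsquared, ss_reg / ss_total))  # True: R^2 = SSreg / SStotal
```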
What is the difference between a simple regression model and a multiple regression model?
Simple: 1 IV
Multiple: >1 IV
List three advantages of using a confidence interval on R^2
1) The CI can indicate whether the data are consistent with no prediction, or with prediction, at the population level.
2) The CI width can indicate the precision of the interval estimate of R^2.
3) A lower bound close to (but not equal to) 0 indicates the regression model may explain only a trivial amount of variation in the DV.
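To make these three points concrete, here is one way to obtain an interval estimate for R^2; the unit may teach an exact method (e.g. one based on the noncentral F distribution), so treat this percentile-bootstrap sketch as an assumption rather than the prescribed procedure:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 150
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 0.4 * x1 + 0.2 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

def r_squared(y, X):
    return sm.OLS(y, sm.add_constant(X)).fit().rsquared

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)        # resample cases with replacement
    boot.append(r_squared(y[idx], X[idx]))
lower, upper = np.percentile(boot, [2.5, 97.5])
print(round(r_squared(y, X), 3), (round(lower, 3), round(upper, 3)))
```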
Why is a regression coefficient in a multiple regression analysis referred to as being a partial regression coefficient?
Because its value indicates the expected change in the DV for a one-unit increase in the focal IV after the overlap of the focal IV with the other IVs (due to their joint correlations) has been partialled out.
How does the interpretation of a standardised partial regression coefficient differ from that of its corresponding unstandardised partial regression coefficient?
Standardised: interpreted in SD units, so coefficients can be compared on a common metric.
Unstandardised: interpreted in raw-score units, so coefficients cannot be directly compared in size because their size depends on the metric of the IV to which each is attached.
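A sketch of the difference, assuming statsmodels and simulated IVs measured on deliberately different raw metrics; it also shows that each standardised coefficient can be recovered as b multiplied by (SD of the IV / SD of the DV):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x1 = rng.normal(scale=10.0, size=200)   # IV on a wide raw metric
x2 = rng.normal(scale=0.5, size=200)    # IV on a narrow raw metric
y = 0.05 * x1 + 2.0 * x2 + rng.normal(size=200)

X = np.column_stack([x1, x2])
b = sm.OLS(y, sm.add_constant(X)).fit().params[1:]       # unstandardised b's
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)         # z-scored IVs
zy = (y - y.mean()) / y.std(ddof=1)                      # z-scored DV
beta = sm.OLS(zy, sm.add_constant(Z)).fit().params[1:]   # standardised betas

print(b)                                          # sizes reflect each IV's raw metric
print(beta)                                       # SD units, so sizes are comparable
print(b * X.std(axis=0, ddof=1) / y.std(ddof=1))  # same values as beta
```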
Why would you expect the partial regression coefficients for a set of IVs in a multiple regression analysis to differ from the value of the regression coefficients obtained when each IV is used separately in a set of simple regression analyses?
Because the IVs correlate with one another as well as with the DV.
A partial regression coefficient estimates the effect of each IV after its overlap with the other IVs in predicting the DV has been partialled out (using the least squares estimator), whereas a simple regression coefficient ignores the other IVs. See the sketch below.
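A small demonstration of why the values differ, assuming statsmodels and simulated IVs that are correlated with each other:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x1 = rng.normal(size=300)
x2 = 0.7 * x1 + rng.normal(size=300)     # IVs correlate with each other
y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=300)

simple_b1 = sm.OLS(y, sm.add_constant(x1)).fit().params[1]
partial_b1 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1]

print(round(simple_b1, 2))   # absorbs x1's overlap with x2, so it is larger here
print(round(partial_b1, 2))  # overlap with x2 partialled out, close to 0.5
```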
Define a 95% confidence interval that is placed around a sample R^2 value in a way that is analogous to the interpretation of the sample R^2 itself
We are 95% confident that the population R^2 value lies between the lower bound and the upper bound of the interval.
Define a 95% confidence interval that is placed around a partial regression coefficient in a way that is analogous to the interpretation of the coefficient itself
We are 95% confident that a one-unit increase in the focal IV results in an expected change in DV scores ranging between the lower bound and the upper bound, holding scores on the other IVs constant.
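In practice such intervals come straight from the fitted model; a minimal sketch assuming statsmodels (conf_int returns one interval for the intercept and one for each partial coefficient):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1 = rng.normal(size=120)
x2 = rng.normal(size=120)
y = 0.6 * x1 - 0.2 * x2 + rng.normal(size=120)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.conf_int(alpha=0.05))   # 95% CI rows: intercept, then each partial coefficient
```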