W4/5: Practice Questions for Predictions Flashcards

1
Q

What kind(s) of analysis immediately come to mind when we talk of association among variables? Give some examples.

A

Symmetric form of relationship: all variables have the same functional role and form.
Continuous variables: correlation
Categorical variables: contingency table

2
Q

What kind(s) of analysis immediately come to mind when we talk of prediction of one variable by other variables?

A

Linear regression, which involves predicting scores on a continuous dependent variable (DV) from one or more independent variables (IVs), each either continuous or categorical.

3
Q

If we have a scatterplot between two variables and a regression line is placed on the graph, where will we find the predicted values from the regression of Y on X?

A

On the regression line.

4
Q

What is the distance between the observed and predicted Y values called in such a graph (a scatterplot with a regression line)? What symbol denotes it?

A

The residual, denoted e_i.

5
Q

Why are (1) a covariance/correlation and (2) a contingency table said to reflect a symmetric relationship among variables?

A

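All variables have the same functional role and form.

(1) Correlation: the value does not depend on which variable is specified as X and which as Y.
(2) Contingency table: the association does not depend on which variable forms the rows and which forms the columns.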

6
Q

Why is linear regression said to reflect an asymmetric relationship among variables?

A

Not all variables have the same functional role and form: the DV is the variable being predicted, and the IVs are the predictors.
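The contrast between a symmetric correlation and an asymmetric regression can be checked numerically. The sketch below uses invented scores (not from the course materials) and hypothetical helper names: swapping X and Y leaves the correlation unchanged, but it changes the regression slope.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    cross = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cross / sqrt(ss_a * ss_b)

def slope(dv, iv):
    """Least-squares slope from regressing dv on iv."""
    mean_dv, mean_iv = mean(dv), mean(iv)
    cross = sum((p - mean_iv) * (q - mean_dv) for p, q in zip(iv, dv))
    return cross / sum((p - mean_iv) ** 2 for p in iv)

# Hypothetical scores, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy = correlation(x, y)   # X and Y play the same role
r_yx = correlation(y, x)   # so swapping them changes nothing
b_y_on_x = slope(y, x)     # Y is the DV, X is the IV
b_x_on_y = slope(x, y)     # roles reversed: a different slope

print(r_xy, r_yx)
print(b_y_on_x, b_x_on_y)
```

With these scores the two correlations are identical, while the slope of Y on X differs from the slope of X on Y.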

7
Q

What is the residual term in linear regression equal to?

A

Residual = Observed score on DV - Predicted score on DV
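As a minimal sketch of this definition, assuming a simple regression with one IV and invented scores, the residual for each case is the observed DV score minus the score predicted by the fitted line. The function name `fit_simple_regression` is hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = cross / ss_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical scores, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # IV
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # observed DV

b0, b1 = fit_simple_regression(x, y)
predicted = [b0 + b1 * xi for xi in x]

# Residual = observed score on DV - predicted score on DV
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for yi, pi, ei in zip(y, predicted, residuals):
    print(f"observed={yi:.1f} predicted={pi:.1f} residual={ei:+.1f}")
```

Least-squares residuals sum to zero when the model includes an intercept, which is a quick sanity check on the calculation.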

8
Q

What does the total sum of squares of a dependent variable get decomposed into in a linear regression model?

A

SStotal = SSreg (Explained/accounted for by the regression model) + SSres (Not explained/accounted for by the regression model)
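The decomposition can be verified on a small example. The sketch below assumes a simple regression fitted by least squares to invented scores; the variable names are illustrative. It also computes R^2 as SSreg / SStotal, the proportion of DV variation accounted for by the model.

```python
# Hypothetical scores, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # IV
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # DV

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for the regression of y on x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)               # total variation in the DV
ss_reg = sum((pi - mean_y) ** 2 for pi in predicted)         # explained by the model
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted)) # not explained by the model

r_squared = ss_reg / ss_total

print(ss_total, ss_reg, ss_res, r_squared)
```

For these scores SStotal is 6.0, which splits into SSreg of 3.6 and SSres of 2.4 (up to floating-point rounding), giving R^2 = 0.6.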

9
Q

What is the difference between a simple regression model and a multiple regression model?

A

Simple: 1 IV
Multiple: >1 IV

10
Q

List three advantages of using a confidence interval on R^2.

A
1) The CI indicates whether the data are consistent with no prediction, or with some prediction, at the population level.
2) The width of the CI indicates the precision of the interval estimate of R^2.
3) A lower bound close to (but not equal to) 0 indicates that the regression model may explain only a trivial amount of variation in the DV.
11
Q

Why is a regression coefficient in a multiple regression analysis referred to as a partial regression coefficient?

A

The partial regression coefficient indicates the expected change in the DV for a one-unit change in the focal IV after the effects of all other IVs, which arise from their correlations with the focal IV, have been partialled out.

12
Q

How does the interpretation of a standardised partial regression coefficient differ from that of its corresponding unstandardised partial regression coefficient?

A

Standardised: expressed in standard deviation (SD) units, so coefficients can be compared on a common metric.
Unstandardised: expressed in raw score units. Coefficients cannot be directly compared in size, because the size of each depends on the metric of the IV to which it is attached.
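The dependence on metric can be illustrated with a small sketch. It assumes a simple regression (one IV) and invented scores, and uses the standard conversion: standardised coefficient = unstandardised coefficient × (SD of IV / SD of DV). Rescaling the IV (here multiplying it by 100, as if converting metres to centimetres) changes the unstandardised slope but leaves the standardised slope unchanged.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def slope(dv, iv):
    """Unstandardised least-squares slope from regressing dv on iv."""
    mean_dv, mean_iv = mean(dv), mean(iv)
    cross = sum((p - mean_iv) * (q - mean_dv) for p, q in zip(iv, dv))
    return cross / sum((p - mean_iv) ** 2 for p in iv)

# Hypothetical scores, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_rescaled = [xi * 100 for xi in x]   # same IV in a different metric

b = slope(y, x)
b_rescaled = slope(y, x_rescaled)

beta = b * sd(x) / sd(y)
beta_rescaled = b_rescaled * sd(x_rescaled) / sd(y)

print(b, b_rescaled)        # raw-unit slopes differ by a factor of 100
print(beta, beta_rescaled)  # SD-unit slopes agree
```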

13
Q

Why would you expect the partial regression coefficients for a set of IVs in a multiple regression analysis to differ from the value of the regression coefficients obtained when each IV is used separately in a set of simple regression analyses?

A

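The IVs correlate with one another as well as with the DV.
A partial regression coefficient reflects the effect of its IV after the overlap with the other IVs in predicting the DV has been partialled out (using the least squares estimator). A simple regression coefficient still includes that overlap.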
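A small numerical sketch makes the difference concrete. It assumes two correlated IVs and a DV constructed, without noise, as 1 + 2·x1 + 3·x2; all scores are invented. The partial coefficients come from the closed-form least-squares solution for two predictors, and the simple coefficients come from regressing the DV on each IV separately.

```python
def centered_cross_product(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))

# Hypothetical scores: x1 and x2 are correlated with each other.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # DV built with no noise

s11 = centered_cross_product(x1, x1)
s22 = centered_cross_product(x2, x2)
s12 = centered_cross_product(x1, x2)
s1y = centered_cross_product(x1, y)
s2y = centered_cross_product(x2, y)

# Partial regression coefficients (least squares with both IVs in the model).
determinant = s11 * s22 - s12 ** 2
partial_b1 = (s22 * s1y - s12 * s2y) / determinant
partial_b2 = (s11 * s2y - s12 * s1y) / determinant

# Simple regression coefficients (each IV used on its own).
simple_b1 = s1y / s11
simple_b2 = s2y / s22

print(partial_b1, partial_b2)   # recovers 2 and 3
print(simple_b1, simple_b2)     # larger, because each absorbs the other IV's overlap
```

Because x1 and x2 are positively correlated here, each simple coefficient picks up part of the other IV's contribution; the partial coefficients do not.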

14
Q

Define a 95% confidence interval placed around a sample R^2 value in a way that is analogous to the interpretation of the sample R^2 itself.

A

We are 95% confident that the POPULATION R^2 value lies between the lower bound and the upper bound.

15
Q

Define a 95% confidence interval placed around a partial regression coefficient in a way that is analogous to the interpretation of the coefficient itself.

A

We are 95% confident that a one-unit increase in the focal IV will result in an expected change in DV scores of between the lower bound and the upper bound, holding scores on the other IVs constant.

16
Q

What would we infer about the plausible population value if the 95% CI on an unstandardised partial regression coefficient does not contain 0? Why? (What are the values in the CI, anyway?)

A

The population coefficient is non-zero: zero is not a plausible population value of the coefficient for the corresponding IV.
- Values within the CI are the set of null hypothesised values that would NOT be rejected if any one of them were specified as the population value in a null hypothesis test.
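The check "does the interval contain 0?" can be sketched as follows. This is a simplified illustration, not a multiple regression: it assumes a simple regression (one IV) with invented scores, and the critical t value for 3 degrees of freedom is hard-coded from a t table rather than computed.

```python
from math import sqrt

# Hypothetical scores, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)

slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
intercept = mean_y - slope * mean_x

# Residual variance with n - 2 degrees of freedom (intercept and slope estimated).
df = n - 2
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
standard_error = sqrt((ss_res / df) / ss_x)

# Assumption: two-tailed 95% critical t value for df = 3, taken from a t table.
T_CRITICAL_DF3 = 3.182

lower = slope - T_CRITICAL_DF3 * standard_error
upper = slope + T_CRITICAL_DF3 * standard_error
zero_is_plausible = lower <= 0.0 <= upper

print(f"slope={slope:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
print("0 is a plausible population value:", zero_is_plausible)
```

With only five invented cases the interval is wide and includes 0, so in this example zero would remain a plausible population value; an interval lying entirely above or below 0 would rule it out.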

17
Q

In a plot of residual versus predicted values from a linear regression analysis, which variable (usually) corresponds to the Y-axis and which one is placed on the X-axis?

A

Y-axis: Studentised deleted residuals (mean = 0; SD = 1)

X-axis: Standardised predicted value on DV

18
Q

What is meant by the term heteroscedasticity?

A

Residual scores from regression model do not have same variance for different predicted values of DV

19
Q

What is the opposite to heteroscedasticity, and what does it mean…apart from being the opposite of heteroscedasticity

A

Homoscedasticity: residual scores have the same variance for different predicted values of the DV.

20
Q

How is heteroscedasticity identified in a linear regression analysis?

A

Scatterplot of studentised deleted residuals (Y) on predicted scores of DV (X).

21
Q

What are two ways that non-normality of residuals can be identified in linear regression?

A
1) Histogram of residuals
2) QQ plot of residuals
3) Scatterplot of residuals versus predicted values in which a large proportion of data points fall either below or above an imagined horizontal line drawn from 0 on the Y-axis
22
Q

In what way can non-linearity be identified in linear regression?

A

Scatterplot of studentised deleted residuals versus standardised predicted values:

Obvious systematic U-shaped pattern (or inverted U-shaped pattern), or any other pattern that displays a systematic change in residual values along the X-axis.

23
Q

What does Cook’s d statistic do? When is the effect more noticeable?

A

Cook’s d statistic:

It identifies cases in the sample data that are aberrant in their values on the DV and IVs relative to the scores of other members of the sample, and that may substantially change the results of the regression model (i.e., R^2 or, more particularly, one or more of the partial regression coefficient values).

Small sample size: More noticeable

24
Q

How many values of Cook’s d statistic will a linear regression model typically produce?

A

One per case: the number of Cook’s d values equals the number of cases in the sample data.
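A minimal sketch of the calculation, assuming a simple regression with invented scores and a hypothetical function name, shows that one Cook’s d value is produced per case. It uses the leverage form of the statistic, d_i = (e_i^2 / (p × MSE)) × h_i / (1 − h_i)^2, where p is the number of estimated parameters (intercept and slope) and h_i is the leverage of case i.

```python
def cooks_distances(x, y):
    """Return one Cook's d value per case for the simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    p = 2  # estimated parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / ss_x
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical scores: the last case is aberrant on both the IV and the DV.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0, 2.0]

d_values = cooks_distances(x, y)
most_influential = max(range(len(d_values)), key=lambda i: d_values[i])

print(len(d_values), "values for", len(x), "cases")
print("most influential case index:", most_influential)
```

In this invented sample the aberrant final case has by far the largest value, which is the pattern Cook’s d is meant to flag.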