Chapter 9: Linear Model (Regression) Flashcards

1
Q

What is the difference between a linear model and correlation?

A

A linear model differs from a correlation only in that it uses an unstandardized measure of the relationship (b1) and includes a parameter (b0) that tells us the value of the outcome when the predictor is 0.

b0 = intercept; represents the value of the outcome when the predictor is 0

2
Q

Any straight line can be defined by two things:

A

1) the slope (b1), which expresses the relationship between the predictor and the outcome in unstandardized units
2) the point at which the line crosses the vertical axis of the graph, known as the intercept (b0)

b1 and b0 are parameters known as regression coefficients

b1 represents the mean change in the outcome for a one-unit change in the predictor
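
A minimal sketch in Python of how the two parameters define the line; the numbers here are made up purely for illustration.

```python
# Hypothetical parameters: y-hat = b0 + b1 * x
b0 = 2.0   # intercept: predicted outcome when the predictor is 0
b1 = 0.5   # slope: mean change in outcome per one-unit change in the predictor

def predict(x):
    """Predicted value of the outcome for a given predictor value."""
    return b0 + b1 * x

print(predict(0))   # 2.0 -> the intercept
print(predict(10))  # 7.0 -> 2.0 + 0.5 * 10
```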

3
Q

Regression is used for two things:

A

1) prediction
2) test theories/explanations

4
Q

A good model…

A

1) fits the data better than a model with no predictors
2) should account for an amount of variance that is judged to be of practical and/or scientific significance
3) individual predictors/regression coefficients are significantly different from 0
4) should not have any outliers biasing the model
5) expected to predict the outcomes well in other samples
6) can be generalized to additional samples (cross validation) because assumptions are met

a model does not prove causation; it tests whether the data are consistent with a causal hypothesis

once a model is estimated, it can be used for forecasting; the forecast for a given case is called a predicted value

5
Q

Same intercept, different slopes

A

b0 (intercept) is the same in each model but b1 (slope) differs. on a graph, this looks like three lines radiating out from the same point on the vertical axis.

6
Q

Same slope, different intercept

A

b1 (slope) is the same in each model but b0 (intercept) differs. on a graph, this looks like three separate parallel lines going in the same direction that never intersect.

a positive b1 indicates a positive relationship with the outcome; a negative b1 indicates a negative relationship with the outcome

the slope (b1) tells us what the model looks like (its shape); the intercept (b0) locates the model in geometric space.

7
Q

What is a regression analysis?
What are the two types?

A

term for fitting a linear model to data and using it to predict values of an outcome (dependent) variable from one or more predictor (independent) variables.

simple regression: one predictor variable in the linear model
multiple regression: several predictors in the linear model

8
Q

What are residuals?

A

differences between what the linear model predicts and the observed data

9
Q

What is the residual sum of squares (SSr)?

A

a gauge of how well a linear model fits the data. it represents the degree of inaccuracy when the best model is fitted to the data. a large value means the model is not representative of the data (a lot of error in prediction); a small value means the line is representative of the data.

SSr is the total amount of error in a model; on its own it does not tell us how good the model is

it grows with more observations, so it is higher with more people in the sample

10
Q

Ordinary least squares (OLS) regression

A

a method of regression in which the parameters of the model are estimated using the method of least squares:
essentially, finding the b values that make the sum of the squared residuals (the error between the observed data and the model) as small as possible
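
A sketch of OLS in Python on simulated data (NumPy assumed): the closed-form estimates below are exactly the b values that minimize the sum of squared residuals.

```python
import numpy as np

# Simulated data with known parameters (b0 = 2, b1 = 0.5)
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

# Closed-form OLS estimates for simple regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(b0, b1)                  # estimates close to 2 and 0.5
print(np.sum(residuals ** 2))  # SSr: no other line gives a smaller value
```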

11
Q

What happens when there are more predictors?

A

expanded models with more predictors account for even more variance. unless the predictors (IVs) are completely independent of one another, the regression coefficients change as additional predictors are added

with two predictors, a regression plane must be used to visualize the model.

with 3 or more predictors, the model can't be visualized.

12
Q

What is goodness-of-fit?

good models exist on a continuum

A

assesses how well the model fits the observed data. we do this because even though the model may be the best one available, it can still be a bad fit. usually based on how well the data predicted by the model corresponds to the data actually collected.
done by comparing the complex model against a baseline model to see whether it improves how well we can predict the outcome, then calculating the error. if the complex model is any good, it should have significantly less error than the simple model

simple model is usually the mean of the outcome

goodness-of-fit is assessed using R^2 and the F-statistic

13
Q

What is the total sum of squares (SSt)?

A

represents how good the mean is as a model of the observed outcome scores.
it is a measure of the total variability within a set of observations (the total squared deviance between each observation and the overall mean of all observations)

14
Q

What is the model sum of squares (SSm)?

A

the improvement in prediction resulting from using the linear model rather than the mean. this difference shows us the reduction in the inaccuracy of the model resulting from fitting the regression model to the data.
if SSm is large, the linear model is very different from using the mean to predict the outcome variable. this implies that the linear model has made a big improvement in predicting the outcome.
if SSm is small, using the linear model is little better than using the mean

SSm = SSt (total sum of squares) - SSr (residual sum of squares)

higher with more predictors

15
Q

What is R^2?

A

represents the amount of variance in the outcome explained by the model (SSm) relative to how much variation there was to explain in the first place (SSt).
proportion of variation in the outcome that can be predicted from the model
indicates whether the model is of scientific and/or practical significance

R^2 = SSm / SSt

we want this number to be high; R^2 tells us the overall fit of the regression model
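
A sketch computing the sums of squares and R^2 by hand on simulated data (NumPy assumed; the OLS estimates use the same closed form as the earlier sketch).

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

# Fit the line by OLS, then form predicted values
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_t = np.sum((y - y.mean()) ** 2)  # total: the mean as a model
ss_r = np.sum((y - y_hat) ** 2)     # residual: error left after the model
ss_m = ss_t - ss_r                  # model: improvement over the mean

print(ss_m / ss_t)  # R^2: proportion of variance explained
```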

16
Q

Mean Squares (MSm and MSr)

A

measures of average variability, based on the number of differences that were added up
MSm = SSm / k (k = the model degrees of freedom, i.e., the number of predictors in the model)
MSr = SSr / (N - k - 1) (N = number of observations; k = number of predictors, i.e., the b values being estimated)

dividing by the degrees of freedom undoes the biasing effect of the number of predictors

17
Q

What is the F-statistic?

A

a measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy of the model.
if a model is good, MSm will be large and MSr will be small
a large F-statistic is one greater than 1 (F = 1 indicates no improvement)

F = MSm / MSr

if the associated p-value is less than .05, there is a significant improvement in prediction over the baseline model with no predictors
as F gets higher, p gets lower (H0 becomes less tenable)
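
A sketch checking F = MSm / MSr against statsmodels (assumed available); `ess` and `ssr` are statsmodels' names for the model and residual sums of squares.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(x)).fit()

N, k = len(y), 1                # one predictor
ms_m = model.ess / k            # SSm / model df
ms_r = model.ssr / (N - k - 1)  # SSr / residual df

print(ms_m / ms_r, model.fvalue)  # hand-computed F matches statsmodels
print(model.f_pvalue)             # p < .05 -> improvement over the baseline
```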

18
Q

What can the F statistic tell us about R^2?

A

F can be used to test the significance of R^2 (how different it is from 0)
F = [(N - k - 1) * R^2] / [k * (1 - R^2)]
N = number of cases/participants, k = number of predictors in the model

if the associated p-value is less than .05, R^2 is significantly different from 0
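
A quick check of this identity on invented numbers (N = 100 cases, k = 3 predictors, R^2 = .25; values are hypothetical).

```python
# Hypothetical values, purely for illustration
N, k, r2 = 100, 3, 0.25
f = (N - k - 1) * r2 / (k * (1 - r2))
print(f)  # about 10.67, tested on (k, N - k - 1) degrees of freedom
```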

19
Q

What is a flat model?

A

a model in which the same predicted value arises from all values of the predictor variables, and which has b values of 0 for the predictors
if a variable significantly predicts an outcome, it should have a b value that is different from 0

a regression coefficient of 0 means a unit change in the predictor results in no change in the predicted value of the outcome, and the linear model is flat

20
Q

What is the t-statistic?

A

tests whether a b-value is significantly different from 0.
if the test is significant, we might interpret this as supporting a hypothesis that the b (regression coefficient) is significantly different (p < .05) from 0 and that the predictor variable contributes significantly to our ability to estimate values of the outcome, after accounting for the other predictors in the model. each b gets its own test.

t = (b_observed - b_expected) / SE_b = b_observed / SE_b

b_expected = the b value we expect to obtain if the null is true, so 0.
b_observed = the b we calculate
SE_b = how much error we estimate is likely in our b

when the SE is small, even a small deviation from 0 can reflect a significant difference, because b is representative of the majority of possible samples
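
A sketch of the per-b t-tests on simulated data (statsmodels assumed): dividing each b by its standard error reproduces the t values statsmodels reports.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 2))
y = 1.0 + 0.8 * X[:, 0] + rng.normal(size=100)  # second predictor is pure noise

model = sm.OLS(y, sm.add_constant(X)).fit()

print(model.params / model.bse)  # t = b_observed / SE_b, one per b
print(model.tvalues)             # statsmodels' t values (identical)
print(model.pvalues)             # p < .05 -> b significantly different from 0
```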

21
Q

What is generalization?

A

the ability of a model to be applied to samples other than the one on which it was based. if it is not generalizable, then we must restrict our conclusions to the sample used.

22
Q

What are outliers?

A

data with extreme values on the outcome variable, Y (large residuals)
a case that differs substantially from the main data trend. it can affect the estimates of the regression coefficients. outliers can be assessed with unstandardized, standardized, and studentized residuals. they are NOT always influential

23
Q

unstandardized residuals

A

raw differences between the predicted and observed values of the outcome variable

24
Q

standardized residuals

A

residuals converted to z scores, i.e., represented in SD units
1) if any have an absolute value greater than 3, they are cause for concern, because a value that high is unlikely to occur by chance
2) if more than 1% of sample cases have residuals with an absolute value greater than 2.5, there is evidence that the level of error in the model is unacceptable
3) if more than 5% of cases have residuals with an absolute value greater than 2, the model may be a poor representation of the data

use these because they are easier to interpret
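
A sketch of these three checks on simulated data (statsmodels assumed); its internally studentized residuals play the role of the standardized residuals here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)

model = sm.OLS(y, sm.add_constant(x)).fit()
z = model.get_influence().resid_studentized_internal

print(np.where(np.abs(z) > 3)[0])  # individual cases of concern
print(np.mean(np.abs(z) > 2.5))    # should not exceed about 1% of cases
print(np.mean(np.abs(z) > 2))      # should not exceed about 5% of cases
```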

25
Q

studentized residuals

A

the unstandardized residual divided by an estimate of its SD that varies point by point
these have the same properties as standardized residuals, but provide a more precise estimate of the error variance for a specific case

use these because they are easier to interpret

26
Q

adjusted predicted value

A

predicted value of the outcome for that case from a model in which the case is excluded
estimate the model parameters excluding a particular case, then use this new model to predict the outcome for the excluded case. if the model is stable, the predicted value for a case should be the same regardless of whether the case was used to estimate the model

27
Q

deleted residual

A

difference between the adjusted predicted value and the original observed value

tells us about the influence of a case on the model's ability to predict that case, but not about how the case influences the model as a whole

28
Q

studentized deleted residual

A

the deleted residual divided by its standard error. the resulting value can be compared across different regression analyses

29
Q

Leverage

A

gauges the influence of the observed value of the outcome over the predicted values.

30
Q

high leverage points

A

data with an extreme value on a predictor variable (X)
these points are extreme on the x-axis. if such a point drags the regression line towards it, it is influential

all leverage values should be close to the average value, (k + 1)/n
cases with 2-3 times the average leverage value are concerning and should be investigated

leverage values range from 0 to 1. a value of 1 indicates that the case has complete influence over prediction
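
A sketch of the leverage checks on simulated data (statsmodels assumed): hat values are compared against the average leverage (k + 1)/n.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
leverage = model.get_influence().hat_matrix_diag

k, n = 2, len(y)
average = (k + 1) / n
print(np.where(leverage > 2 * average)[0])  # cases worth investigating
```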

31
Q

When is a case influential?

A

a case is influential if the model parameter estimates change substantially when the case is deleted and the model re-estimated
a good model should not be so fragile that 1-2 cases change it a lot
cases will be influential if they have some combination of being extreme on X (leverage) and extreme on Y (outliers)
influence is what matters most
if conclusions change based on 1-2 data points, the conclusions are said to be fragile

not all influential points have large residuals

influence can be examined using: Cook's distance, Difference in Beta (DFBeta), Difference in Fit (DFFit).

32
Q

Cook’s distance

A

measure of the overall influence of a case on the model
abs values greater than 1 may be a concern

33
Q

Difference in Beta (DFBeta)

Standardized

A

measure of how much the estimates of the b’s change when a case is deleted
abs values greater than 1 may be a concern

34
Q

Difference in Fit (DFFit)

Standardized

A

measure of the difference in prediction when a case is deleted
abs values greater than 1 may be a concern
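
A sketch pulling Cook's distance, DFBeta, and DFFit from statsmodels' influence API (assumed available) on simulated data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

infl = sm.OLS(y, sm.add_constant(x)).fit().get_influence()

cooks, _ = infl.cooks_distance  # overall influence of each case
dfbetas = infl.dfbetas          # standardized change in each b when a case is deleted
dffits, _ = infl.dffits         # standardized change in prediction when a case is deleted

print(np.where(cooks > 1)[0])            # Cook's distance > 1: possible concern
print(np.where(np.abs(dfbetas) > 1)[0])  # |DFBeta| > 1: possible concern
print(np.where(np.abs(dffits) > 1)[0])   # |DFFit| > 1: possible concern
```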

35
Q

Consider deleting high leverage points or outliers only when:

A

they are influential
first, check that influential points are not coding errors
if not a coding error: does the case change the conclusions? is it possible to get more observations near that value of X?
if the case does change the conclusions: report the results with and without the influential case, or restrict your analysis to values of X where the relationship holds
do not use this to drop cases to create desired results (p-hacking)

36
Q

Assumptions of the linear model

A

1) additivity and linearity
2) independent errors
3) homoscedasticity
4) normally distributed error

37
Q

additivity & linearity

A

outcome variable and predictors are linearly related/can be described by a linear model
do not use a linear model to describe a nonlinear relationship

38
Q

independent errors

A

residual terms should be uncorrelated/independent
assumption necessary for CIs and significance tests to be valid
if violated, use robust methods or a multilevel model

39
Q

homoscedasticity

A

residuals at each level of the predictor should have the same variance
if violated, CIs and significance tests are invalidated. use weighted least squares regression instead.

40
Q

normally distributed errors

A

residuals in the model are random, normally distributed variables with a mean of 0
does not matter with large sample sizes because of the central limit theorem (CLT)
if violated with a small sample size, use bootstrapped CIs

41
Q

other assumptions/considerations of the linear model

A

1) predictors are uncorrelated with external variables
2) variable type: quantitative or dichotomous predictors, and a continuous unbounded criterion
3) no perfect multicollinearity
4) non-zero variance

42
Q

predictors are uncorrelated with external variables

A

should be no external variables that correlate with any of the variables included in the model
regression results can be biased by an omitted (3rd) variable
if violated, conclusions are unreliable

43
Q

Variable type: quantitative or dichotomous predictors, and continuous unbounded criterion

A

all predictor variables must be quantitative or categorical, and the outcome must be quantitative, continuous, and unbounded

44
Q

no perfect multicollinearity

A

if your model has more than one predictor, there should be no perfect linear relationship between 2 or more of the predictors (predictors should not correlate too highly)
if violated, it leads to untrustworthy estimates of the b's, and the standard errors get very big

45
Q

non-zero variance

A

predictors should have some variation in value (not have variance of 0)

46
Q

cross validation of the model

A

assessing the accuracy of a model across different samples (how it generalizes to different samples)
two main methods: adjusted R^2 and data splitting

47
Q

adjusted R^2

A

tells us how much variance in the outcome would be accounted for if the model had been derived from the population from which the sample was taken (it estimates what R^2 would be in a new sample)
when you apply a model to a different sample, R^2 drops from sample 1 to sample 2, causing a loss of predictive power known as SHRINKAGE.
shrinkage occurs because the process of fitting a model capitalizes on chance

more capitalizing on chance, and therefore more shrinkage, occurs when there are more predictors and smaller sample sizes.
with large samples, a model will be well estimated and shrinkage will be minimal

regression models are optimized for the sample they were created from
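
A quick worked example of the usual adjusted R^2 formula, adj R^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1), on invented numbers.

```python
# Hypothetical values: a small sample with several predictors
N, k, r2 = 60, 5, 0.30
adj_r2 = 1 - (1 - r2) * (N - 1) / (N - k - 1)
print(adj_r2)  # about 0.235 -> noticeable shrinkage relative to R^2 = .30
```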

48
Q

data splitting

A

randomly splitting your sample data (e.g., 80/20), estimating the model in both portions of the data, and comparing the resulting models
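
A sketch of an 80/20 split on simulated data (statsmodels and scikit-learn assumed): fit on the larger portion, then check predictive accuracy on the held-out portion.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)

# 80% for estimating the model, 20% held out for validation
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

model = sm.OLS(y_train, sm.add_constant(x_train)).fit()
y_pred = model.predict(sm.add_constant(x_test))

# Squared correlation between observed and predicted held-out outcomes
print(np.corrcoef(y_test, y_pred)[0, 1] ** 2)
```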

49
Q

To assume the b's are normally distributed, we need a large sample. How big does it have to be?

A

situation specific power analysis is the best way to determine sample size needed.
General guidelines:
- if you expect to find a large effect: 80 people or higher
- if you expect to find a medium effect: 100 people or higher if there are 6 predictors or less
- if you expect to find a small effect: don’t bother unless you can get a very large sample

50
Q

Multiple regression: methods of entering predictors into a model

A

1) forced: all predictors are entered into the model at once. useful for testing theory when there are no established predictors.
2) hierarchical: predictors are added in blocks. established predictors are entered earlier in the process; new predictors are assessed as a group last. useful for testing theory or the validity of new predictors.
3) stepwise: an automatic method. predictors are added to the model one by one based on partial correlations, and the process stops when the removal criterion is met (i.e., the regression coefficient is not significant for the added predictor). useful for exploratory analyses when you have no idea what's going on and want to generate hypotheses. it gives models that can't generalize, and is frowned upon.
- this method is sensitive to sampling variation and the results don't generalize

adding control variables into the model doesn't purify the analysis, and their inclusion can result in inappropriate inferences

51
Q

Parsimony

A
  • explaining the data while being as simple as possible
  • accounting for variance in the simplest way
  • more predictors account for more variance
  • R^2 increases as more predictors are added, so we check whether new predictors have value
  • change in R^2 from the simple model to the complex model: if it is significant, then sampling error alone cannot account for the difference.
52
Q

Assessing parsimony

A
  • change in R^2 (significance and magnitude)
  • Akaike Information Criterion (AIC): the lowest AIC indicates the most parsimonious model; it penalizes you for adding predictors

you can compare different models and assess parsimony using the change in R^2 and the AIC.
a significant change in R^2, a meaningful magnitude of change, and a lower AIC are arguments for the complex model

53
Q

What is multicollinearity?

multiple regression

A
  • occurs when the predictor variables themselves are related to each other (highly correlated)
  • for example, trying to predict lawyer salary from age and experience (hard to tease apart)
  • the level of multicollinearity varies along a continuum from none to perfect. if none, all predictor variables are unrelated. if perfect, you can't fit a regression model because there are infinitely many equally good models
  • mild multicollinearity is not a big deal, but high multicollinearity makes it more difficult to estimate the b's
  • high multicollinearity can result in very high standard errors for the coefficients. this means more error in each estimate and a wider sampling distribution (parameters not well estimated)
  • parameter estimates change wildly from sample to sample; CIs will be wide; and it is harder to find significance
54
Q

How to diagnose multicollinearity?

A
  • simplest way: examine the correlations among predictors. correlations higher than .8 or .9 suggest multicollinearity may be an issue
  • examining the correlation matrix will miss more subtle forms of multicollinearity
  • we need statistics specific to detecting multicollinearity. one is the variance inflation factor (VIF) for each predictor: VIF_k = 1 / (1 - R^2_k), where R^2_k comes from regressing predictor k on all the other predictors
  • VIF > 10 is a problem
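
A sketch of the VIF computation with statsmodels (assumed available) on deliberately collinear simulated data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(13)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.3, size=100)  # deliberately collinear with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j in range(1, X.shape[1]):  # skip the constant column
    print(variance_inflation_factor(X, j))  # VIF > 10 flags a problem
```
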
55
Q

how to deal with multicollinearity?

A

1) do nothing
2) get rid of the variables
3) combine the correlated variables
4) use a method that can handle highly correlated variables like partial least squares or principal component analysis

56
Q

how can the unstandardized simple regression equation be written in standard form?

A
  • variables are expressed as z scores
  • Zy = r * Zx
  • to predict Y's z score, the input is not the person's raw score on X but their z score on X
57
Q

how can the unstandardized multiple regression equation be written in standard form for 2 predictors?

A

Zy = Beta1 * Zx1 + Beta2 * Zx2 + ...
- the betas take the place of the correlation coefficient because we have to account for the other predictors
- the standardized multiple regression equation works the same way as the standardized simple regression equation: plug in someone's z scores on the X variables to get their predicted z score on the Y variable
- the standardized betas refer to how many SDs the outcome changes for every one-SD change in the predictor
- can be extended to include more predictors

standardized betas help assess the relative importance of different predictors in SD units
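
A sketch showing that fitting the z-scored variables without an intercept yields the standardized betas (NumPy and statsmodels assumed; data are simulated).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(17)
X = rng.normal(size=(100, 2))
y = 1.0 + 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=100)

# Convert everything to z scores
zX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# No constant needed: the standardized intercept is 0
betas = sm.OLS(zy, zX).fit().params
print(betas)  # SD change in the outcome per one-SD change in each predictor
```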