Linear Regression Flashcards
What is a simple linear regression?
It predicts ONE variable from another
Can a significant value for a coefficient (p < .05) tell us about magnitude of effect?
NO. That tells us whether estimates are significantly different from ZERO, but not about the magnitude of the effect.
What does a p value really tell us in a coefficient table?
It tells us whether the predicted relationship with the outcome variable is significantly different from zero, i.e. more than just by chance. It's just a YES or NO.
It does not tell us about magnitude of this effect though.
In the coefficient output, for a simple linear regression, when we are looking to see what is the value of Y when X is 0, we are looking at the intercept. What coefficient value do we look to?
Unstandardised B next to the “intercept” word in the output.
The intercept is a CONSTANT value - remember this.
And remember, the unstandardised coefficient next to the "variable" beneath the "intercept" word is the slope: that figure tells us the direction of the relationship. The standardised beta (b) would give the magnitude of the effect.
Remember our model has this formula: positive affect (outcome variable) is predicted by, first, the intercept, which in this output is the CONSTANT UNSTANDARDISED B 2.853. That is the value of Y where X is 0. Sometimes it's called b naught (b0).
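The intercept-and-slope reading above can be checked numerically. A minimal sketch of ordinary least squares with made-up data (not the course dataset):

```python
# Minimal sketch: computing the intercept (b0) and slope (b1) of a simple
# linear regression by hand. The data values are made up for illustration.
xs = [1, 2, 3, 4, 5]            # predictor (X)
ys = [3.0, 3.4, 3.9, 4.5, 4.9]  # outcome (Y), e.g. positive affect

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: b1 = cov(X, Y) / var(X); b0 = mean(Y) - b1 * mean(X)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

print(f"intercept b0 (value of Y when X = 0): {b0:.3f}")
print(f"slope b1 (direction of relationship): {b1:.3f}")
```

b0 here plays the role of the "constant" row in the coefficient table, and b1 the row for the predictor variable.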
Why is it better to use the standardised coefficient instead of unstandardised coefficient when looking at effect size?
Because the standardised coefficient shows us, for every 1 SD the predictor variable changes, how many SDs the outcome variable changes, as opposed to how many raw units change (unstandardised). It helps to use SDs when comparing models, as the units will always be SDs rather than arbitrary units that depend on the measures/variables.
So, we can get effect sizes from standardised coefficients. How can we get effect sizes by examining variance?
Through looking at the r squared.
R squared indicates the proportion or percentage of total variance accounted for by the model.
What is R squared?
The squared CORRELATION between the ACTUAL DV scores and the PREDICTED DV scores.
Essentially it is the proportion of variance explained by the model.
What is another word for R?
Correlation
What is another word for R2?
Squared correlation
The variance of an outcome variable is 5. The regression tells us the variance of the residuals is 4. We then subtract residual variance from total variance, which leads us to…?
The variance explained by the model (5 - 4 = 1).
The model-explained variance, divided by the total variance, gives R2, the squared correlation (1/5 = .20).
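The arithmetic on this card can be checked directly:

```python
# Worked arithmetic from the card: total variance 5, residual variance 4.
total_variance = 5.0
residual_variance = 4.0

explained_variance = total_variance - residual_variance  # variance the model explains
r_squared = explained_variance / total_variance          # proportion explained

print(explained_variance)  # 1.0
print(r_squared)           # 0.2
```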
What does 0 in R2 indicate?
What does 1 in R2 indicate?
0 indicates NONE of the variance is explained by the model
1 indicates ALL of the variance is explained by the model
Do people consider a .25 r2 as small?
No.
.04 is considered small (4%)
.09 is medium (9%)
.25 is large (25%)
Are effect sizes with r squared definitive, or are they "t-shirt sizes"?
T-shirt sizes. There are no set rules.
So what is adjusted r square?
As opposed to r2, which looks at the proportion of variance explained by a model derived from data from a specific sample, ADJUSTED r square gives an estimate of the r2 in the population!
Meaning, how much variability would be explained if the MODEL was derived from the population rather than the sample.
It is more conservative.
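A minimal sketch of the standard adjusted-r2 formula; the sample sizes and r2 value below are made up for illustration:

```python
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1),
# where n = sample size and k = number of predictors.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Estimate of the population R2, adjusted for sample size and predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Small sample, several predictors: the estimate shrinks a lot.
print(adjusted_r2(r2=0.25, n=30, k=5))
# Large sample, same model: adjusted R2 stays close to R2.
print(adjusted_r2(r2=0.25, n=300, k=5))
```

This shows why adjusted r2 is more conservative: with a small n and many predictors it pulls the estimate down hard.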
Why would an adjusted r square be important? Why can't you just use the r squared provided as an estimate for the population?
Because the regression model might overfit your particular data set. Therefore it may not work as well with other samples as it does with YOUR data.
Why can r2 be expected to vary?
Because sample correlations vary around the population correlation
True or false: the sample becomes less representative as the sample size decreases
TRUE
What is sampling error?
The discrepancy between sample and population
Why does sampling error increase as sample size decreases AND as the number of predictors increase ?
Regarding predictors, because there is error associated with each predictor.
And because a smaller sample isn't as representative of the population, a small sample size is no good.
The R2 is likely to overestimate the size of the effect because:
A. Sampling error decreases as sample size decreases
B. Sampling error increases as sample size decreases
C. Sampling error increases as the number of predictors increase
D. B and C
D
Regression chooses the ____ therefore it is prone to overfitting the data
Best fit
What is failure to replicate the r2 called?
Shrinkage
How is shrinkage best evaluated?
Cross Validation Study
If there is a large effect size in a regression model, does this mean the same model will show that effect size in a different sample?
NO. Because an effect size from one sample is overestimated due to overfitting issues. Therefore the next model you run on a new sample might have a smaller effect size, or one that is not statistically significant.
What does it mean if another model is run on a new sample, that has a smaller effect size to the previous model?
SHRINKAGE
What does a cross validation study assess?
How generalisable your model is
Estimated shrinkage in your r2 value and adjusted r2 value
Is a small or a large dataset more likely to highlight odd things about the data?
A small dataset
What is the word to describe a large discrepancy between r2 and adjusted r2?
Shrinkage
What does shrinkage indicate?
The regression model does not generalise well to the population
A difference of about 0.5% between r2 and adjusted r2 is probably acceptable. The larger the difference is, the less our model will _____
generalise
More than a few percent (3%+) between r2 and adjusted r2 is considered unacceptable
Are there guidelines about how much shrinkage is too much?
No
Some people argue that shrinkage can be evaluated as ___% being acceptable if r2 is .50. What is the advantage of this?
5%.
More leeway for shrinkage.
What is the most useful effect size to be looking at for multiple regression?
F2! Unique variance for predictors.
How did f2 come about?
It's based on R2. It tells us the unique effect of a variable on the outcome.
The f2 gives an effect size for the proportion of residual variance explained, i.e. for the unique effects of a predictor.
When an overall model explains a lot of variance, would we expect the effect size for the same amount of unique variance to go down or up?
UP.
SO, f2 gives an effect size for the PROPORTION of residual variance explained.
What is the difference between linear regression and multiple regression?
If two or more explanatory variables have a linear relationship with the dependent variable, the regression is called a multiple linear regression.
Simple linear regression is one predictor variable with a dependent variable.
When we run a regression, we hope to be able to _____ the ____ model to the _____?
Generalise the sample model to the population
Why do assumptions need to be met in order to be able to generalise to the population?
Because violating these assumptions can affect how well the regression model fits the data and how well it can be generalised. It calls the validity into question.
Does population mean everybody or just the people you are interested in?
The latter
What are the assumptions in a regression that we have to meet?
L.I.N.E: Linearity, Independence of errors, Normality of errors, Equal variances (homoscedasticity)
What do regression plots show us?
If data is linear or not
What is the partial regression plot (residuals vs. predicted) and why is it important in multiple regression?
It's a plot of fitted values, not about prediction.
If you have multiple predictors in your model, you may want to look at just one.
BUT this plot is NOT the effect of X on Y. It is just a plot of FITTED VALUES.
The fitted values go on the X axis; on the Y axis we have the residuals (regular or standardised), i.e. the error: how far away observed data points are from the line of best fit.
On the residuals vs predicted plot, is the data standardised?
Yes
Why do you NOT want to see the data fitting a line when looking at residuals in the partial regression plot?
We want to see the residual data points randomly dispersed around a horizontal line going through 0 on the Y axis. That means the model is probably OK for linearity.
IF there was a clear pattern in the residuals vs. predicted, what could you do to deal with this?
Try transforming the data with Log etc.
Why is it not usually a great idea to transform data with Log, in terms of what this would mean for the variable in the model?
Because the model would then be about the transformed variable rather than the actual one. And there may not be a significant relationship between the untransformed predictor and the outcome variable. So perhaps look at a different test to linear regression.
The tricky assumptions for regression that involve the error terms are
Independent errors
Normally distributed errors
and Homoscedasticity
How do we check this?
- To check error terms are not correlated (independent errors), you can look at the correlation coefficient, as mentioned in the correlations week.
- For normally distributed errors, look at the partial regression plot (residuals vs. predicted) and see if there is an absence of a pattern. We want residuals to be random and normally distributed with a mean of 0. BUT the predictors do NOT need to be normally distributed; that just improves the chances of this assumption being met.
- Homoscedasticity: for each value or level of the predictor, the variance of the error term is constant (residuals should have the SAME variance). This is looked at visually.
True or false: predictors have to be normally distributed for assumption of normal distribution being met
False - know this
When assessing the error assumptions and checking normally distributed errors visually, why won't the error-term data perfectly fit a straight line?
Because an error term represents what is left over, so it won't be exactly on the line (or the error term would be zero). The closer the better, yes, but each individual data point has its own error.
The linear equation has an error term with subscript i: Y(i) = b0 + b1*X(i) + e(i). Person i's outcome equals the intercept, plus the slope times their predictor score, plus i's residual (error).
Can histograms and QQ plots show us about whether errors are normally distributed?
Yes
What do we expect to see when looking at homoscedasticity of residuals?
That at each level of the predictor, residuals have the same variance. So equal variances. Heteroscedasticity = unequal.
The residuals at value X should be the ____ at each level of the predictor to meet the assumption of homoscedasticity?
SAME
What do the residuals do?
They tell us how good the model is and if it’s modelling what we think it is modelling.
What are predicted values?
The estimated outcome values
What are residuals?
The deviations of observed from predicted values
Why are standardised predicted and residual values preferred?
They allow for easier interpretation: as z-scores they are on a common scale, so rule-of-thumb cut-offs (e.g. standardised residuals beyond about ±3 suggest outliers) apply regardless of the variables' original units.
The most commonly reported statistics for a multiple regression, historically, was:
unstandardised coefficient, the standard error, and the p-value
What are the most commonly reported statistics for multiple regression now?
STANDARDISED coefficient, AND confidence intervals instead of the standard error
What does a smaller confidence interval indicate?
A more precise estimate of the true population value, whereas a wide CI indicates more uncertainty about the true value, usually due to sample size.
What is the common threshold for 95% CI?
1 - .05
What is preferred to be reported in results for multiple regression?
A. unstandardised coefficient
B. standardised coefficient
C. standard error
D. confidence interval and standardised coefficient
D
For categorical predictors, is it best to report standardised or unstandardised beta?
Unstandardised (B), because 1 unit = the difference between male and female, as an example.
Whereas beta (b), which refers to a 1 SD change, makes it more difficult to show the difference between male and female.
A multiple regression is used to predict values of an outcome from ____ predictors
several
Is it always best to report standardised (b) effect sizes?
No. Unstandardised (B) can be useful for categorical predictors, because a 1 unit change gives more helpful information than a 1 SD change.
Why will each X variable’s coefficient in a multiple regression equation be different?
Because in a MR, each coefficient is adjusted for all the other predictors in the model. MR takes into account variance explained by other IVs in the model.
Why, in the equation for multiple regression, is the e(i) not included for prediction?
Because when formulating the prediction equation we are PREDICTING: we cannot predict what someone's error term would be because we don't know it. It does not exist yet; we are just using the regression line to predict the score. We are not observing a score and seeing how far off the model is (we did that with observed data). So this is prediction.
What does Y hat (the ^ symbol above the Y) refer to?
The predicted score. It means it wasn't something observed, but something you think will happen if you use the model. Y(i), in contrast, is the actual observed score.
Why can’t we compare the unstandardised coefficients (B) to see which predictor has most influence on the outcome variable?
SCALE issue.
Because unstandardised coefficients do not have a mean of zero and are not z-score versions, so they cannot be compared directly, as predictors might be measured on different scales. Standardised coefficients (beta values) are the z-score version of B, which means they all have the same mean of zero and an SD of one, so we can compare them directly.
Why can we compare the standardised regression coefficient directly and therefore see which predictor has the biggest effect on the DV?
Because the standardised coefficient values are the z-score version of the B values (the unstandardised values), which all have the same mean (0) and standard deviation (1). So we can compare them directly.
When looking at a coefficient table, which figure tells us which variable is the best predictor of the DV in the model?
The standardised coefficient. It tells us how much the outcome changes for each 1 SD increase IN THE PREDICTOR.
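The B-to-beta conversion behind these cards can be sketched with the standard formula beta = B * (SD of predictor / SD of outcome). The coefficients and SDs below are made up for illustration:

```python
# Minimal sketch: converting unstandardised coefficients (B) into
# standardised ones (beta) so predictors on different scales can be
# compared. All values are made-up illustrations.
def standardised_beta(b: float, sd_x: float, sd_y: float) -> float:
    """beta = B * (SD of predictor / SD of outcome)."""
    return b * sd_x / sd_y

# Two predictors measured in very different raw units:
beta_income = standardised_beta(b=0.0001, sd_x=12000.0, sd_y=2.0)  # income in dollars
beta_age    = standardised_beta(b=0.0500, sd_x=10.0,    sd_y=2.0)  # age in years

# On the raw B scale income looks tiny (0.0001 vs 0.05), but in SD units
# it is the stronger predictor.
print(beta_income, beta_age)
```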
What does multicollinearity show us?
That two predictor variables overlap a lot, i.e. HIGH intercorrelations between predictor variables.
If you have a VIF of more than 10, something is multicollinear.
Tolerance (the reciprocal of VIF) tells the same story: a tolerance below .10 flags the same problem.
What would a high VIF (over 10) suggest?
That perhaps these variables, given their overlap, are measuring something similar conceptually
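The VIF/tolerance relationship can be sketched directly. R2_j below stands for the R2 from regressing predictor j on all the other predictors; the value is made up for illustration:

```python
# Minimal sketch of the VIF / tolerance relationship for one predictor.
r2_j = 0.92  # predictor j is largely explained by the other predictors (made up)

tolerance = 1 - r2_j   # low tolerance (< .10) signals overlap
vif = 1 / tolerance    # VIF over 10 flags multicollinearity

print(round(tolerance, 3), round(vif, 2))
```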
Why might we include multiple indicators or variables into a latent or combined variable?
Multiple ways of measuring the same conceptual thing can give rich data, and help explain a construct with more variance.
What does a squared semi partial correlation (sr2) tell us?
The proportion of variability in the outcome uniquely accounted for by that predictor. So, just like correlations, it is the unique shared variance of that predictor, taking into account the shared variance of the other predictors.
What do we use to calculate f2?
sr2 and r2
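The sr2-and-r2 calculation is Cohen's f2 for one predictor's unique effect: f2 = sr2 / (1 - R2). A minimal sketch with made-up values:

```python
# Minimal sketch of Cohen's f2 for a single predictor's unique effect,
# using the squared semi-partial correlation (sr2) and the model R2.
# The values below are made up for illustration.
def f_squared(sr2: float, r2: float) -> float:
    """f2 = sr2 / (1 - R2): unique variance relative to residual variance."""
    return sr2 / (1 - r2)

# The same unique variance (sr2 = .05) gives a LARGER effect size when
# the overall model already explains a lot of variance:
print(f_squared(sr2=0.05, r2=0.20))
print(f_squared(sr2=0.05, r2=0.60))
```

This matches the earlier card: when the overall model explains more variance, the effect size for the same amount of unique variance goes UP.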
If your sample size is small in a regression model, and there are quite a few predictor variables, why is this problematic?
Each time you add a predictor variable you decrease your degrees of freedom. This means your r2 value will approach 1, and r2 approaching 1 means your model will explain 100% of the variance. But the model will probably be fitting noise. It might just be a model specific to your sample and not generalisable to the population.
Just like Pearson correlations are tested for significance, regression equations should also be tested for signifiance to show if predictions are significantly better than chance. How do we compute this?
By computing an F ratio.
What does a significant F ratio indicate?
That the equation predicts a significant proportion of the variability in the Y scores (i.e more than would be expected by chance alone)
It examines whether total model variance accounted for is significantly greater than 0.
True or false: Total variance is ALWAYS greater than 0 because R2 cannot be less than 0, but greater than 0 does NOT indicate significance
True
The variance accounted for by the regression model DIVIDED by the total variance is also known as…
R squared
When looking at the ANOVA table, what does the F ratio show?
The F ratio compares the variance predicted by the model with the variance that’s left over. The residual variance. Or the variance not predicted by the model.
The F ratio in an ANOVA table, when looking at the output, is calculated using mean squares (MS). How are mean squares calculated?
By taking the sum of squares and dividing by the degrees of freedom.
True or false: we expect our model's sum of squares to be much greater than the residual or error sum of squares
TRUE. Because this means a good model.
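The mean-squares arithmetic behind the ANOVA-table F ratio can be sketched as follows; the SS values, n, and k are made up for illustration:

```python
# Minimal sketch of how the ANOVA-table F ratio is built from sums of
# squares: MS = SS / df, then F = MS(model) / MS(residual).
ss_model = 40.0      # sum of squares predicted by the model (made up)
ss_residual = 60.0   # leftover (error) sum of squares (made up)
n, k = 103, 2        # observations and predictors (made up)

ms_model = ss_model / k                  # model df = number of predictors (k)
ms_residual = ss_residual / (n - k - 1)  # residual df = n - k - 1

f_ratio = ms_model / ms_residual
print(round(f_ratio, 2))
```

A large F means the variance the model predicts dwarfs the leftover residual variance, which is what a "good model" looks like in the ANOVA table.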
When looking at the output for a regression, if we had to find out whether the model overall resulted in a significantly better prediction of the outcome (Y) than simply using the mean, what would we look at?
The ANOVA table - specifically the F ratio and the P value.
What makes up total variance in data of a whole model?
Whatever variance the model we made explains AND the variance in the data that is left over.
What is another word for the variance in the data left over from the model?
residual variance
If, after running a regression model, there is a better prediction than just using the mean, what would we expect the model's sum of squares to be in comparison to the residual or error sum of squares?
Expect model sum of squares to be GREATER than residual sum of squares
If we looked at the ANOVA table output and wanted to see if it was a good model, what would we hope to see when looking at the model sum of squares?
That it is greater than the error sum of squares
Do the outcomes in a linear regression have to be continuous?
Yes
Can predictors in a regression be categorical?
Yes
If all of your predictors are categorical, you should use ANOVA. True or False.
True
Predictor variables in a multiple regression should be:
A. Continuous
B. Dichotomous/Binary (0,1)
C. Recoded if categorical
D. All of the above
D
is gender a nominal or ordinal categorical variable?
Nominal
What should you do with a nominal variable like gender in a regression?
Dummy code them, using 0 and 1. If 1 refers to woman, then the first recoded variable is Woman: YES/NO. Then Man: YES/NO.
If dummy coding for gender, and 1 refers to a woman, what would the first variable recording be?
Woman YES NO
When dummy coding a gender variable with three categories, how many dummy variables do you create?
Two: one category serves as the reference group, and the other two (e.g. Woman and Man) each get a YES/NO dummy. Those two go into the model as predictors.
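The dummy-coding rule (k categories gives k - 1 dummies, with the left-out category as the reference group) can be sketched as follows. The category labels are made up for illustration:

```python
# Minimal sketch of dummy coding a nominal variable: with k categories you
# enter k - 1 indicator (0/1) variables, and the left-out category is the
# reference group. Labels are made-up illustrations.
def dummy_code(value: str, categories: list) -> list:
    """Return k - 1 indicator variables; the FIRST category is the reference."""
    return [1 if value == c else 0 for c in categories[1:]]

categories = ["other", "woman", "man"]  # "other" is the reference group

print(dummy_code("woman", categories))  # [1, 0]
print(dummy_code("man", categories))    # [0, 1]
print(dummy_code("other", categories))  # [0, 0] -> the reference group
```

The reference group is the row of all zeros, so each dummy's B is the difference between that category and the reference.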