Week 3: Regression Flashcards

Question

What does adjusted R squared tell us?

Answer 1

How well R squared generalises to the population

Answer 2

Like R squared, it shows how much of the variance in the outcome variable the predictor variable explains, but it adjusts the statistic for the number of independent variables in the model.

Answer 3

It’s a more conservative statistic for how much variance in the outcome variable the predictor variable explains

Answer 4

adjusted R squared will decrease
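
Answers 2 to 4 can be checked with a few lines of Python. This is a minimal sketch assuming the common formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with n observations and k predictors. The function name `adjusted_r2` and the numbers are hypothetical. Holding R² fixed while adding a predictor shows why adjusted R squared is the more conservative statistic and can decrease.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R squared for n observations and k predictors."""
    df_resid = n - k - 1  # observations minus estimated parameters (incl. intercept)
    if df_resid <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / df_resid


# Hypothetical model: R squared stays at 0.50 when a useless second predictor is added.
one_predictor = adjusted_r2(0.50, n=101, k=1)
two_predictors = adjusted_r2(0.50, n=101, k=2)
print(round(one_predictor, 4), round(two_predictors, 4))  # 0.4949 0.4898
```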

Answer 5

R² = SSM / SST = (SST − SSR) / SST

Answer 6

The ratio of explained variance (SSM) to total variance (SST)
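
A small sketch of Answers 5 and 6, using hypothetical data (five made-up cases) and a least-squares line fitted with the standard library: SST compares the data with the mean, SSR compares the data with the line, and SSM is the difference. The function name `sums_of_squares` is illustrative.

```python
from statistics import mean


def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares and return (SST, SSR, SSM)."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    sst = sum((b - my) ** 2 for b in y)                 # total: data vs. mean
    ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual: data vs. line
    ssm = sst - ssr                                     # model: improvement over the mean
    return sst, ssr, ssm


x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical outcome scores
sst, ssr, ssm = sums_of_squares(x, y)
r2 = ssm / sst
print(round(sst, 2), round(ssr, 2), round(ssm, 2), round(r2, 2))  # 6 2.4 3.6 0.6
```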

Answer 7

overall model (fitted regression line) is a good fit

Answer 8

* Sums of squares (SS) are total values
* They can be expressed as averages (SS divided by its degrees of freedom)
* These averages are called mean squares (MS)

Answer 9

Model mean squares (MSM)

Answer 10

Residual mean squares (MSR)

Answer 11

The ratio of MSM to MSR (F = MSM / MSR)

Answer 12

A good model should account for a large portion of the variance (MSM) compared with what is left over, the residuals (MSR), so F should be greater than 1.

Answer 13

The number of predictors in the model (k)

Answer 14

The number of observations minus the number of estimated parameters (N − k − 1 for k predictors plus the intercept)
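
The mean squares and F-ratio from Answers 8 to 14 can be sketched as follows. The function name `f_ratio` and the sums of squares fed to it are hypothetical; df for the model is k and df for the residuals is n − k − 1.

```python
def f_ratio(ssm: float, ssr: float, n: int, k: int):
    """Return (MSM, MSR, F) for a model with k predictors fitted to n observations."""
    df_model = k           # number of predictors in the model
    df_resid = n - k - 1   # observations minus estimated parameters (incl. intercept)
    msm = ssm / df_model   # model mean squares
    msr = ssr / df_resid   # residual mean squares
    return msm, msr, msm / msr


# Hypothetical sums of squares from a one-predictor model fitted to 5 cases.
msm, msr, f = f_ratio(ssm=3.6, ssr=2.4, n=5, k=1)
print(round(msm, 2), round(msr, 2), round(f, 2))  # 3.6 0.8 4.5
```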

Answer 15

* The line's coefficients are the intercept (b0) and the slope (bi)

Answer 16

* The change in the outcome associated with a one-unit change in the predictor

Answer 17

Indicates how far off you would be, on average, if you were to use the independent variable and the model to predict scores on the dependent variable
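
The quantity described in Answer 17 is usually called the standard error of the estimate. A minimal sketch for one predictor on hypothetical data; it divides SSR by n − 2 because two parameters (b0 and b1) are estimated. The function name is illustrative.

```python
from math import sqrt
from statistics import mean


def standard_error_of_estimate(x, y):
    """Typical prediction error: sqrt(SSR / (n - 2)) for a one-predictor model."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    ssr = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return sqrt(ssr / (len(x) - 2))  # n - 2: two estimated parameters (b0, b1)


see = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(see, 3))  # 0.894
```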

Answer 18

In simple regression, the standardised coefficient (beta) equals the correlation coefficient r.
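
A quick numerical check of Answer 18 on hypothetical data: rescaling the slope by sx / sy (which is what standardising both variables does) gives the same value as Pearson's r. The helper names `slope` and `pearson_r` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def slope(x, y):
    """Unstandardised least-squares slope b1 of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
beta = slope(x, y) * stdev(x) / stdev(y)  # standardised slope
print(round(beta, 4), round(pearson_r(x, y), 4))  # 0.7746 0.7746
```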

Answer 19

t statistic and associated p-value
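
Answer 19's t statistic for the slope is b1 divided by its standard error, SE(b1) = √(MSR / Sxx). A sketch on hypothetical data follows; the p-value would come from a t distribution with n − 2 degrees of freedom (for example via a statistics library) and is left out to keep the example standard-library only. With a single predictor, t² equals the model's F-ratio. The function name `slope_t` is illustrative.

```python
from math import sqrt
from statistics import mean


def slope_t(x, y):
    """t statistic for the slope of a one-predictor model: b1 / SE(b1)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    msr = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2)  # residual mean squares
    se_b1 = sqrt(msr / sxx)
    return b1 / se_b1


t = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(t, 3))  # 2.121
```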

Answer 20

variance that we cannot explain vs variance we can explain with model

Answer 21

1. Variable type: the outcome must be continuous; predictors can be continuous or dichotomous
2. Non-zero variance: predictors must not have zero variance
3. Independence: all values of the outcome should come from different people
4. Linearity: the relationship we model is, in reality, linear
5. Homoscedasticity: for each value of the predictors, the variance of the error term should be constant
6. Independent errors: for any pair of observations, the error terms should be uncorrelated (see the Durbin-Watson test)
7. Normally distributed errors
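
Assumption 6 mentions the Durbin-Watson test. A minimal sketch of the statistic, assuming residuals listed in the order the observations were collected (the residual values are hypothetical): it ranges from 0 to 4, values near 2 suggest uncorrelated errors, values towards 0 suggest positive autocorrelation and values towards 4 negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


scattered = durbin_watson([-0.8, 0.6, 1.0, -0.6, -0.2])  # hypothetical, no clear pattern
drifting = durbin_watson([-2, -1, 0, 1, 2])              # hypothetical, steady upward drift
print(round(scattered, 2), round(drifting, 2))  # 2.02 0.4
```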

Answer 22

Good: data points occupy all four quarters of the plot. Bad: residuals fan out like a cone.

Answer 23

“having the same scatter.”

Answer 24

Heteroscedasticity: points higher on the x-axis have larger variance than lower ones, so points lie at widely varying distances from the regression line.

Answer 25

Bad histogram = positively skewed

Answer 26

Causation, even if the findings make sense

Answer 27

An unknown (third) variable could drive the effect

Answer 28

Consider the relationship between the predictor variable (visits to the pub) and the outcome variable (exam score). There is a correlation between the two variables, but would we really think that more visits to the pub cause better exam performance? Perhaps a third variable explains the link. Maybe a statistics support session was held between 4 and 5 pm in a building next to a pub?

Answer 29

Do poverty levels predict the number of teen births?

Answer 30

x = poverty rate, which is the percent of the state’s population living in households with incomes below the federally defined poverty level. y = year 2002 birth rate per 1000 females 15 to 17 years old

Answer 31

H0: The slope equals 0, i.e. poverty levels do not predict teen birth rate
H1: The slope is different from 0, i.e. poverty levels predict teen birth rate

Answer 32

The slope (b1 = 1.373) indicates that the 15 to 17 year old birth rate increases by 1.373 units, on average, for each one-unit (one percentage point) increase in the poverty rate. The intercept (b0 = 4.267) means that if there were states with a poverty rate of 0, the predicted average 15 to 17 year old birth rate for those states would be 4.267.
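
The coefficients in Answer 32 can be turned into predictions. The poverty rates passed in below (10% and 11%) are hypothetical inputs chosen to show that a one-point increase in x moves the prediction by exactly the slope; the function name is illustrative.

```python
B0 = 4.267  # intercept reported on the card
B1 = 1.373  # slope reported on the card


def predicted_birth_rate(poverty_rate: float) -> float:
    """Predicted 15-17 year old birth rate (per 1000 females) for a poverty rate in %."""
    return B0 + B1 * poverty_rate


# Hypothetical states with 10% and 11% poverty:
low, high = predicted_birth_rate(10), predicted_birth_rate(11)
print(round(low, 3), round(high - low, 3))  # 17.997 1.373
```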

Answer 33

1. Does not imply causation
2. All we can say is that the two variables are related/associated
3. x and y can be swapped
4. One outcome value
5. No regression line on scatterplots!

Answer 34

1. The independent variable influences the dependent (outcome) variable
2. x and y cannot be swapped!
3. Has a model: an equation that allows predictions outside of the current measurements
4. Regression line of the model on a scatterplot
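
Point 3 in Answers 33 and 34 can be seen numerically on hypothetical data: Pearson's r is the same whichever variable is called x, but the regression slope of y on x differs from the slope of x on y. The helper names are illustrative.

```python
from math import sqrt
from statistics import mean


def slope(x, y):
    """Least-squares slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def pearson_r(x, y):
    """Pearson correlation coefficient (symmetric in x and y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4), round(pearson_r(y, x), 4))  # 0.7746 0.7746
print(slope(x, y), slope(y, x))                              # 0.6 1.0
```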

Answer 35

p < 0.001, because p is never exactly 0

Answer 36

Our model is significantly better at predicting the data than the null model (F(1, 118) = 729.43, p < .001) and explains 86% of the variance in our data (R² = .86).
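
The R² in Answer 36 can be recovered from the F-ratio and its degrees of freedom by rearranging F = (R²/df_model) / ((1 − R²)/df_resid). A short check, with a hypothetical helper name:

```python
def r2_from_f(f: float, df_model: int, df_resid: int) -> float:
    """Recover R squared from an F-ratio and its degrees of freedom.

    Rearranges F = (R^2 / df_model) / ((1 - R^2) / df_resid).
    """
    return f * df_model / (f * df_model + df_resid)


r2 = r2_from_f(729.43, df_model=1, df_resid=118)
print(round(r2, 2))  # 0.86
```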

Answer 37

y = 3.19x + 391.67. Our model is significantly better at predicting the data than the null model (F(1, 118) = 729.43, p < .001) and explains 86% of the variance in our data (R² = .86). For every 1-unit increase in alcohol there is a 3.19-unit increase in brake reaction time (B = 3.19, t = 27.01, p < .001).
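
A sketch using the equation from Answer 37 (the alcohol values 2 and 3 are hypothetical inputs; the function name is illustrative). It also checks the reported statistics against each other: with one predictor, t² should equal F, and 27.01² is about 729.54, which matches 729.43 up to rounding.

```python
SLOPE = 3.19        # B reported on the card
INTERCEPT = 391.67  # constant reported on the card


def predicted_reaction_time(alcohol: float) -> float:
    return INTERCEPT + SLOPE * alcohol


# Each extra unit of alcohol adds SLOPE to the prediction (hypothetical inputs 2 and 3).
step = predicted_reaction_time(3) - predicted_reaction_time(2)

# With one predictor, t for the slope squared equals F; the gap here is rounding.
t, f = 27.01, 729.43
print(round(step, 2), round(t ** 2, 2), f)  # 3.19 729.54 729.43
```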

Answer 38

A - Biserial and point-biserial correlation. The Pearson correlation coefficient can be used with binary (two-category) variables.

Answer 39

D - As news exposure increases by 1 standard deviation, depression decreases by 0.224 of a standard deviation.

Answer 40

D - We don’t know the overall variability, only the error. The other options are wrong because we do not know how much variability there is in the depression score. We measure the variability not of the population but only of the observed data, i.e. the sample. If the option had ended with “of the sample” instead of “depression score”, it would have been correct.

Answer 41

B - The proportion of the variation in the outcome variable (Y) that is predictable from the predictor variable (X): a measure of how much variability in one variable can be “explained” by another. R² shows how well the data points fit a model curve or line. An R² value of 0.78 indicates that 78% of the variation in Y is explained by its relationship with X.

Answer 42

A - ±0.1 represents a small effect, ±0.3 a medium effect and ±0.5 a large effect
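
Answer 42's rule of thumb (Cohen's conventional benchmarks for r) as a small helper. The function name and the "negligible" label for |r| below 0.1 are assumptions added for illustration.

```python
def describe_effect(r: float) -> str:
    """Label a correlation using Cohen's conventional benchmarks (0.1, 0.3, 0.5)."""
    size = abs(r)  # the sign gives direction, not size
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumption: label for values below 0.1


print([describe_effect(r) for r in (-0.12, 0.35, 0.62, 0.05)])
# ['small', 'medium', 'large', 'negligible']
```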

Answer 43

A - High scores on one scale tend to go with high scores on the other, and low scores on one correspond with low scores on the other

Answer 44

A, not D, because correlated variables are not independent

Answer 45

* R squared = 0.03
* df = 1, 664 (df of regression, df of residual)
* p-value < 0.001
* Adjusted R squared = 0.02

Answer 46

* The overall proportion of variance in the outcome explained by the predictor is significant
* Used to calculate the significance of the predictor

Answer 47

___ (e.g., 1.373x) increase in y

(85 cards)