Intro to linear models Flashcards
what is a model?
a formal representation of a system - basically, all statistics is about models.
we represent mathematical models as functions, giving them arguments and operations that allow us to make predictions (which can be tested)
what is a good model?
- model is represented as a function
- the function is represented as a line
- the line yields predictions we would expect if the model were true
- when we collect more data to test the predictions, the new data match the model well
what is a linear model?
model of a linear relationship
we use linear models to try and explain variation in an outcome (DV) using one or more predictors (IVs)
what is the intercept?
the point where our model line crosses the y-axis, i.e. the value of y when x = 0
what is the slope?
the gradient of the model line or rate of change
linear model equation:
yi = β0 + β1xi + ϵi
y = intercept + slope × x + residual
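as a quick illustration, this model can be simulated in a few lines of numpy (the parameter values below are made up, just to show the structure of yi = β0 + β1xi + ϵi):

import numpy as np

rng = np.random.default_rng(1)
n = 100
beta0, beta1, sigma = 2.0, 0.5, 1.0       # assumed (illustrative) true parameters
x = rng.uniform(0, 10, n)                 # predictor values
eps = rng.normal(0, sigma, n)             # residuals: mean 0, sd sigma
y = beta0 + beta1 * x + eps               # yi = β0 + β1*xi + ϵi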
what is the residual?
a measure of how well the model fits each data point.
it is the vertical distance (on the y-axis) between each data point and the model line
residuals should be:
- normally distributed
- mean of 0
- an sd of σ (meaning the spread of the errors should be constant)
what is least squares?
the least squares estimates minimise the residual sum of squares
meaning they minimise the (squared) distances between the observed values of y and the model-predicted values (^y)
residual sum of squares (SSresidual)
equation:
SS residual = (observed y value - model estimated y value) squared and then added up for all observations
minimising the SS residual means that our predicted values are as close as they can get to each of our observed values
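a minimal numpy sketch of this calculation (the observed and predicted values are made up for illustration):

import numpy as np

y    = np.array([3.0, 5.0, 4.0, 7.0, 8.0])    # observed y values (made up)
yhat = np.array([3.5, 4.5, 5.0, 6.5, 8.0])    # model-predicted y values (made up)
resid = y - yhat                              # residual for each observation
ss_residual = np.sum(resid ** 2)              # residual sum of squares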
calculating intercept
intercept = mean of y - (slope estimate * mean of x)
(see slope calculations)
calculating slope
β1 = SPxy / SSx
slope = sum of cross products / sum of squared deviations of x
SPxy = sum of cross products = (observed x value - sample mean of x) * (observed y value - sample mean of y) added up for all observations
SSx = sum of squared deviations of x = (observed x value - sample mean of x) squared, then added up for all observations
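putting the slope and intercept formulas together, a small numpy sketch (the x and y values are made up; x could be hours studied, y test score):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # e.g. hours studied (made up)
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])   # e.g. test scores (made up)

sp_xy = np.sum((x - x.mean()) * (y - y.mean()))   # SPxy: sum of cross products
ss_x  = np.sum((x - x.mean()) ** 2)               # SSx: squared deviations of x
b1 = sp_xy / ss_x                                 # slope estimate
b0 = y.mean() - b1 * x.mean()                     # intercept estimate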
example interpretation of a linear model for how hours of study affects test score:
intercept = value of y when x is 0 e.g. expected test score for a student who studied 0 hours
slope = change in y for a unit increase in x = expected increase (or decrease) in test score for every additional hour studied.
estimated sd of the error (residual) equation
^σ = square root of (SSresidual / (n - k - 1))
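continuing the same made-up example (k = 1 predictor), the estimate can be computed as:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x                                # model-predicted values

n, k = len(y), 1
ss_residual = np.sum((y - yhat) ** 2)
sigma_hat = np.sqrt(ss_residual / (n - k - 1))    # estimated sd of the residuals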
what is multiple regression?
when a linear model has multiple predictors, the model finds the optimal prediction of the outcome for the multiple predictors, taking into account their redundancy (correlation) with one another
uses of multiple regression:
- prediction
- theory testing
- covariate control (assessing the effect of one predictor, controlling for the influence of the others)
multiple regression linear model equation:
yi = β0 + β1x1i + β2x2i + ϵi
for each additional predictor (x) we have an additional β coefficient
interpretation:
- β0 = the predicted y value when all Xs are 0
- β1 = partial regression coefficient = change in y for one unit change in x1 when all other Xs are held constant
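a sketch of a two-predictor fit using numpy's least-squares solver (simulated data with assumed coefficients, just to show the setup):

import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)                        # first predictor (simulated)
x2 = 0.3 * x1 + rng.normal(size=n)             # second predictor, correlated with x1
y  = 10 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # design matrix: intercept, x1, x2
betas, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1, b2], least-squares estimates
# b1 is the change in y per unit change in x1, holding x2 constant (and vice versa)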
what is meant by ‘holding constant’?
refers to the effect of the predictor when the values of all other predictors are fixed
3 ways to evaluate our linear model:
- evaluating the significance of individual effects
- evaluate the overall quality of the model
- evaluating model assumptions
evaluating the significance of individual effects
this is basically hypothesis testing. the steps are:
1. good research question
2. create hypothesis from the question (remember we only ever test our null hypothesis)
3. define the null (usually β1 = 0 as that means x has no effect on y)
4. choose significance level
5. calculate test statistic for β coefficient: t = ^β / SE(^β)
6. evaluate the t-statistic against the null (using p-values or critical values)
standard error of the slope - SE(^β) equation
SE(^βj) = square root of [ (SSresidual / (n - k - 1)) / (sum of (xj - mean of xj)^2 * (1 - R^2xj)) ]
where R^2xj is the multiple correlation of predictor j on the other predictors
SE is smaller when:
- residual variance is smaller
- sample size is larger
- there are fewer predictors
- when predictors are not correlated with the other predictors (R^2xj is small)
confidence intervals for β coefficients (slope)
^β1 +/- critical t value * SE(^β1)
to know if our variable is statistically significant, our null hypothesis value (usually 0) must not be contained within the confidence interval
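a sketch of the whole testing procedure for a one-predictor model (made-up data; scipy is assumed to be available for the t-distribution):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])          # made-up predictor
y = np.array([50.0, 54.0, 57.0, 61.0, 62.0, 66.0, 71.0, 73.0])  # made-up outcome

n, k = len(y), 1
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

sigma2_hat = np.sum(resid ** 2) / (n - k - 1)                 # residual variance
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))     # R^2xj = 0 with one predictor

t_stat = b1 / se_b1                                  # test of H0: beta1 = 0
p_val  = 2 * stats.t.sf(abs(t_stat), df=n - k - 1)   # two-sided p-value
t_crit = stats.t.ppf(0.975, df=n - k - 1)            # critical value for a 95% CI
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)      # significant if 0 is outside ci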
evaluating overall model quality
the aim of linear models is to explain y as a function of x - in reality x does not account for all the variance in y, leaving us with residual variance
sums of squares
we can breakdown variation in our data based on sums of squares:
SStotal = SSmodel + SSresidual
this means total variation in y = variation explained by our model + residual variation
coefficient of determination - R^2
R^2 quantifies the amount of variance in the outcome (y) that is accounted for by our predictors. it is presented as a decimal/percentage, and the more variability accounted for, the better
equations:
R^2 = SS model / SS total
R^2 = 1 - SSresidual/SStotal
total sum of squares (SStotal)
SStotal = (observed y value - mean of y) squared and then added up for all observations
SStotal is the sum of the squared distances of each data point from the mean of y (since the mean is our baseline/best guess of what y should be)
model sum of squares (SSmodel)
SSmodel = (model estimated y value - mean of y) squared and then added up for all observations
it measures the distance from the model predicted line to the line for mean of y
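a small numpy sketch tying the three sums of squares and R^2 together (made-up data; the fit is the simple least-squares line):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])          # made-up predictor
y = np.array([50.0, 54.0, 57.0, 61.0, 62.0, 66.0])    # made-up outcome
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

ss_total    = np.sum((y - y.mean()) ** 2)       # total variation in y
ss_model    = np.sum((yhat - y.mean()) ** 2)    # variation explained by the model
ss_residual = np.sum((y - yhat) ** 2)           # leftover (residual) variation

r2_a = ss_model / ss_total                      # R^2 = SSmodel / SStotal
r2_b = 1 - ss_residual / ss_total               # same value via SSresidual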
adjusted R^2
this is used when our linear model has two or more predictors, as it adjusts for n and k. with more predictors there is more chance of random sampling fluctuation, which inflates R^2
equation:
adjusted R^2 = 1 - (1 - R^2) * (n-1)/(n-k-1)
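as a quick sketch (n, k and R^2 below are assumed values, not from the notes):

n, k = 100, 3                                   # assumed sample size and number of predictors
r2 = 0.40                                       # assumed R^2 from the fitted model
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R^2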
what is an F-test?
F-tests test the significance of the model as a whole by testing the significance of the F-ratio
what is the F-ratio?
the F-ratio is the ratio of explained to unexplained variance. the F-ratio tests the null hypothesis that all regression slopes are 0, as this would mean our predictors tell us nothing about the outcome.
if our predictors do explain some variance, our F-ratio will be significant - bigger F-ratios indicate better models. when the null is true, the F-ratio will be close to 1, so we want it to be >1 so there is more model variance than residual variance
equation:
F = MSmodel / MSresidual
what are mean squares?
mean squares are sums of squares calculations divided by the associated degrees of freedom
residual degrees of freedom
= n - k - 1
as these are based on our model, in which we estimate k beta terms (hence -k) and the intercept (hence -1)
total degrees of freedom
= n - 1
once we have estimated the mean of y, all but one value of y is free to vary, hence -1
model degrees of freedom
= k
as it's dependent on our beta estimates, hence k
evaluating F-ratios
an F-ratio is evaluated against an F-distribution with dfModel and dfResidual, using our alpha level and critical values
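a sketch of the F-ratio calculation from the sums of squares and degrees of freedom above (the sums of squares are made up; scipy is assumed for the F-distribution):

from scipy import stats

n, k = 100, 2                               # assumed sample size and number of predictors
ss_model, ss_residual = 240.0, 600.0        # made-up sums of squares

ms_model    = ss_model / k                  # model mean square (df = k)
ms_residual = ss_residual / (n - k - 1)     # residual mean square (df = n - k - 1)
f_ratio = ms_model / ms_residual
p_val = stats.f.sf(f_ratio, k, n - k - 1)   # evaluated against F(dfModel, dfResidual)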
what are degrees of freedom
the maximum number of logically independent values which have freedom to vary in the data sample
standardising β coefficients
standardising allows us to compare the effects of variables measured on arbitrary scales or scales with differing units - but be careful: for regression, unstandardised coefficients are often more useful
equation:
^β*i = ^βi * (Sx/Sy)
standardised β = estimated β * (sd of x / sd of y)
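a minimal sketch of this conversion, using the same made-up x and y as before:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

b1_std = b1 * (np.std(x, ddof=1) / np.std(y, ddof=1))   # standardised slope: b1 * (Sx / Sy)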
z-scoring β coefficients
method of standardising for continuous variables that transforms the IV and DV into z-scores (mean = 0, sd = 1) prior to fitting the model
equation
zx = (xi - mean of x) / Sx
translation: subtract the mean from each value, then divide by the standard deviation (do this for both the x and y values)
interpreting standardised regression coefficients
R^2, F-test and t-test remain the same
our beta coefficients will change
e.g. β0 = 0 when all variables are standardised
β1 = increase in y (in sd units) for every sd increase in x
standardised slope = correlation coefficient (r), when there is a single predictor
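a quick numpy check of these points for a one-predictor model (simulated data with assumed parameters):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10, 2, 100)                   # simulated predictor
y = 5 + 1.5 * x + rng.normal(0, 3, 100)      # simulated outcome

zx = (x - x.mean()) / np.std(x, ddof=1)      # z-score the predictor
zy = (y - y.mean()) / np.std(y, ddof=1)      # z-score the outcome

b1_z = np.sum(zx * zy) / np.sum(zx ** 2)     # slope from the z-scored regression (intercept = 0)
r = np.corrcoef(x, y)[0, 1]                  # equals b1_z in a one-predictor model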
what are categorical variables
variables that can only take discrete values that are mutually exclusive. a binary variable is a type of categorical variable with only 2 levels.
what is dummy coding?
binary variables are coded as 0 and 1 and are often referred to as dummy variables - when we have multiple dummies we use the general procedure of dummy coding.
one level is chosen as a baseline and all other levels are compared to this baseline. for a categorical variable with k levels we create k-1 dummy variables.
interpretation of dummy coding coefficients
β0 = expected value of y when all dummy variables are 0, i.e. the mean of our baseline level
β1 = the predicted difference between the means of the two groups
this interpretation becomes more complicated as we get more predictors
steps in dummy coding:
- choose a baseline level
- assign everyone in the baseline group 0 for all k-1 dummy variables
- assign everyone in the next group a 1 for the first dummy variable and 0 in all others
- repeat step 3 until all k-1 dummy variables have a 0 and 1 assigned
- enter the dummy variables into your regression
dummy coding results interpretation example:
test score based on 3 methods of revising: re-read (baseline), summarise notes, or self-test
β0 = mean of re-reading
β1 = difference between mean of summarise and intercept
β2 = difference between mean of self test and intercept
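a sketch of this example with made-up scores, showing that the coefficients recover the group means:

import numpy as np

# made-up test scores for the three revision methods
reread    = np.array([60.0, 62.0, 58.0, 64.0])   # baseline group
summarise = np.array([66.0, 70.0, 68.0, 72.0])
selftest  = np.array([74.0, 78.0, 76.0, 80.0])

y  = np.concatenate([reread, summarise, selftest])
d1 = np.r_[np.zeros(4), np.ones(4),  np.zeros(4)]   # dummy 1: summarise vs re-read
d2 = np.r_[np.zeros(4), np.zeros(4), np.ones(4)]    # dummy 2: self-test vs re-read

X = np.column_stack([np.ones(len(y)), d1, d2])      # intercept + 2 dummies (k-1 = 2)
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
# b0 = re-read mean; b1 = summarise mean - re-read mean; b2 = self-test mean - re-read mean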