Week 2 SCM (regressions) Flashcards
What general formula is basic statistical modelling based on?
outcome = model + error
What are 2 features that we should aim for with a statistical model?
we should aim for a statistical model that minimises error and can be generalised beyond the dataset
what statistical test should we use when there are two levels of our independent variable?
- a t-test
what statistical test should we use when our independent variable is continuous, with more than two levels?
- a linear regression
what is a general description of the general linear model?
the general linear model is a model for which the DV is composed of a linear combination of independent variables
each independent variable has a weight given by b
this weight determines the relative contribution each variable makes to the overall prediction
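As a rough illustration of "a linear combination of independent variables, each weighted by b" (hypothetical numbers in Python/numpy, not anything from the course materials):

```python
import numpy as np

# hypothetical scores for 3 participants on 2 predictors (e.g. hours studied, hours slept)
X = np.array([[2.0, 7.0],
              [5.0, 6.0],
              [8.0, 8.0]])

b0 = 1.0                  # intercept
b = np.array([0.5, 0.3])  # one weight (b) per predictor

# GLM prediction: the DV is a linear combination of the predictors,
# each weighted by its b, plus the intercept
predicted_dv = b0 + X @ b
print(predicted_dv)       # approx [4.1, 5.3, 7.4]
```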
what can/can't we use the correlation coefficient for?
we can use the coefficient to describe the relationship between two variables. We can then test this relationship for significance
we cannot use the coefficient to make predictions
what can/can't we use the GLM for?
we can use the GLM to describe a relationship, to decide on significance (p) and also to make predictions
what are the differences between correlation and linear regression?
- correlation quantifies the direction and strength of the relationship between two numeric variables (x & y). correlation always lies between -1 & 1.
- simple linear regression relates the two variables x & y to each other through an equation y = a + bx
- therefore, if visualised on a graph, a linear regression appears as a straight line. This line can then be used to make predictions
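A small sketch of this contrast (made-up x and y values, assuming scipy is available): pearsonr only describes the relationship, while linregress returns the line y = a + bx, which can then predict new values.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# correlation: direction and strength only, always between -1 and 1
r, p = stats.pearsonr(x, y)

# regression: an equation y = a + b*x that can be used for prediction
fit = stats.linregress(x, y)
prediction_at_6 = fit.intercept + fit.slope * 6.0

print(r, p)
print(fit.slope, fit.intercept, prediction_at_6)
```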
what is the equation of a simple linear regression, and which terms represent the slope and the intercept of the line?
- if x and y are the variables a linear regression equation is: y= a + bx
- b is the slope of the line and a is the intercept
what is the difference between the value predicted by the line and the observed data value called?
the residuals
what would the best line from a linear regression analysis show?
minimised residuals
what letter/symbol is used to define the slope and intercept of the line in a linear regression analysis
the slope is represented by b1
the intercept is represented by b0
what is the (more complicated) equation of a linear regression analysis?
Yi = (b0 + b1Xi) + Ei
Yi is the outcome that we want to predict
Xi is the ith participant's score on the predictor variable
b1 is the gradient of the regression line
b0 is the intercept of the regression line
Ei is the residual, which represents the difference between the score predicted by the line and the score that the participant actually obtained
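A short numeric illustration of the equation above (invented scores): each residual Ei is the participant's actual score minus the score the fitted line predicts for them.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # predictor scores (Xi)
y = np.array([2.0, 4.5, 5.5, 8.0])   # observed outcomes (Yi)

b1, b0 = np.polyfit(x, y, 1)         # slope and intercept of the fitted line

predicted = b0 + b1 * x              # the (b0 + b1*Xi) part of the equation
residuals = y - predicted            # Ei: observed score minus predicted score

print(residuals)
```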
how can you determine the type of relationship from the gradient of a line?
if the gradient is a positive value there is a positive relationship
if the gradient is a negative value there is a negative relationship
how do we assess the fit of a line?
- if the residuals are smaller, the line is a better fit
- therefore to assess the fit of a line we look at the values of the residuals (the vertical deviations)
- because the residuals can either be positive or negative, we must square them in this analysis
- therefore the line with the smallest sum of squared residuals is the best fitting line
- when conducting a linear regression, the mathematics will give us the line with the smallest sum of squared residuals
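A sketch with hypothetical data showing that the least-squares line really does have a smaller sum of squared residuals than an arbitrary alternative line:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.8, 4.1, 5.2])

def sum_squared_residuals(b0, b1):
    residuals = y - (b0 + b1 * x)    # vertical deviations from the line
    return np.sum(residuals ** 2)    # squared so positive and negative deviations don't cancel

# least-squares fit vs an arbitrary guessed line
b1_ols, b0_ols = np.polyfit(x, y, 1)
print(sum_squared_residuals(b0_ols, b1_ols))  # the smallest achievable sum of squared residuals
print(sum_squared_residuals(0.0, 1.2))        # any other line does at least as badly
```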
what is a simple definition of regression towards the mean
if a variable is extreme the first time you measure it, it will be closer to the average the next time you measure it
this is because an extreme first measurement is likely to have been pushed towards that extreme partly by chance
and the next time you measure it, chance is unlikely to push the value to the same extreme, so it will be closer to the mean
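A tiny simulation sketch of this idea (made-up parameters): people selected for extreme scores at time 1 score closer to the group mean at time 2, with no intervention at all.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = rng.normal(100, 10, size=10_000)        # stable underlying scores
time1 = true_score + rng.normal(0, 10, size=10_000)  # measurement 1 = truth + chance
time2 = true_score + rng.normal(0, 10, size=10_000)  # measurement 2 = truth + new chance

extreme = time1 > 125                                # select people who scored extremely high

print(time1[extreme].mean())   # well above 125, by construction
print(time2[extreme].mean())   # noticeably closer to the overall mean of ~100
```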
how can regression to the mean trick us and how can we counteract this?
it can make it seem like an intervention is working but actually it is just the effect of regression to the mean
we can avoid being tricked by this by adding a control group
what is the difference between Pearson's correlation coefficient and the regression coefficient?
- Pearson's correlation coefficient is the covariance / (SD of x * SD of y)
- the regression coefficient is the covariance / (SD of x * SD of x)
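A sketch (invented data, numpy assumed) computing both coefficients directly from the covariance, to make the shared structure explicit; the only difference is the denominator.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of x and y
sd_x = np.std(x, ddof=1)
sd_y = np.std(y, ddof=1)

r = cov_xy / (sd_x * sd_y)            # Pearson's correlation coefficient
b1 = cov_xy / (sd_x * sd_x)           # regression coefficient (slope)

print(r, b1)
```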
how is the slope of a regression related to the correlation coefficient?
the slope (b1) = R * SDy/SDx
so the slope is equal to R * the ratio of the standard deviation of y to the standard deviation of x
- this is because the covariance is R * SDx * SDy
- and the slope/regression coefficient is the covariance / (SDx * SDx)
so if you substitute the covariance formula into the formula for the slope, it simplifies to the equation above
- this means that if the SD of x is the same as the SD of y, the correlation coefficient is equal to the regression coefficient
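The substitution described above, written out as a short derivation:

```latex
b_1 = \frac{\operatorname{cov}(x, y)}{SD_x \times SD_x}
    = \frac{R \times SD_x \times SD_y}{SD_x \times SD_x}
    = R \times \frac{SD_y}{SD_x}
```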
in what situation would the correlation coefficient (R) be equal to the regression coefficient
if the SD of x is equal to the SD of y
what would the variability of a regression model show and how would we calculate it?
- it would show how much variability in the outcome is not explained by the model
- we would calculate it by looking at the sum of squared errors
- each error is also known as a residual, and is the difference between the measured value and the value predicted by the regression line
- we then square these differences and add them up to get the sum of squared errors
how do you calculate the mean squared error from the sum of squared errors?
- MSE = SSE/df
- the degrees of freedom = N - 2 for a simple regression
how do you calculate the standard error of the model from the mean squared error of a linear regression?
the standard error of the model is the square root of the mean squared error
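A sketch with made-up data tying the last three cards together: the sum of squared errors, then the mean squared error using N - 2 degrees of freedom, then the standard error of the model as its square root.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.9, 4.8, 5.1, 6.9, 7.8])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)    # measured value minus the value on the regression line

sse = np.sum(residuals ** 2)     # sum of squared errors
df = len(x) - 2                  # N - 2 for a simple regression
mse = sse / df                   # mean squared error
se_model = np.sqrt(mse)          # standard error of the model

print(sse, mse, se_model)
```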
what does b0 represent in regression?
the intercept of the regression line