Week 8 Flashcards
What does correlation tell us?
About the degree of association between two variables
What is the regression equation used to do?
Express the relationship between 2 or more variables, therefore allowing us to estimate one variable on the basis of another
What is testing for statistical significance of the regression slope and how is it done?
Tests if regression coefficient is significantly different from 0.
1) find r and b
2) set hypotheses (H0: β=0)
Rest of normal steps, use t distribution
t = (b-0)/s
Where s is standard error of slope estimate
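The steps above can be sketched in Python for a simple regression with one independent variable. This is a minimal illustration, not a prescribed method: the data values and the function name slope_t_test are made up for the example, and the critical value 3.182 is the two-tailed t-table value for α = 0.05 with n - 2 = 3 degrees of freedom.

```python
import math

def slope_t_test(x, y):
    """Return (b, s, t) for the simple regression y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and cross-products of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope estimate
    a = y_bar - b * x_bar      # intercept estimate
    # Sum of squared residuals around the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: residual variance uses n - 2 degrees of freedom
    s = math.sqrt(sse / (n - 2) / sxx)
    t = (b - 0) / s            # test statistic under H0: beta = 0
    return b, s, t

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, s, t = slope_t_test(x, y)

T_CRITICAL = 3.182  # t-table value: two-tailed, alpha = 0.05, df = 3
reject_h0 = abs(t) > T_CRITICAL
print(f"b = {b:.3f}, s = {s:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

With this data the slope is 0.6 but |t| is below the critical value, so H0 is not rejected at the 5% level.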
3 methods to tell us coefficient is significantly different from 0?
Use confidence intervals for coefficient
Statistical test using t-dist.
Use p-value
What is the p-value?
The smallest level of α at which the null hypothesis would be rejected
Same as:
Probability of getting a result at least as extreme as the one observed if H0 is true
What is the decision rule in hypothesis testing when only given p-value and α?
Reject null hypothesis if p-value < α
What does the coefficient of determination measure?
It measures the proportion of total variation in the dependent variable (y) that is explained by the variation in the independent variable (x)
How to calculate R^2 if there is only one independent variable?
R^2 = r^2
How to calculate R^2 if more than one IV?
R^2 = SSR/SStotal = 1 - SSE/SStotal
SStotal = Σ(Y-Ybar)^2
SSR = Σ(Yhat-Ybar)^2
SSE = Σ(Y-Yhat)^2
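As a rough check on these formulas, the sketch below computes the three sums of squares and R^2 both ways. The observed and fitted values are hypothetical (the fitted values come from a least-squares line with an intercept, which is what makes SSR + SSE equal SStotal), and the function name sums_of_squares is made up for the example.

```python
def sums_of_squares(y, y_hat):
    """Return (SStotal, SSR, SSE) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)          # total variation in y
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # variation explained by the regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (residual) variation
    return ss_total, ssr, sse

# Hypothetical observed values and least-squares fitted values
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

ss_total, ssr, sse = sums_of_squares(y, y_hat)
r2_from_ssr = ssr / ss_total
r2_from_sse = 1 - sse / ss_total
print(f"R^2 = {r2_from_ssr:.3f} (SSR/SStotal), {r2_from_sse:.3f} (1 - SSE/SStotal)")
```

Both expressions give R^2 = 0.6 here, meaning 60% of the variation in y is explained by the regression.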
What does the regression output give?
An estimate of the joint significance of all variables.
How to test goodness of fit? (Which test?)
F test
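The cards do not give the F formula, so the sketch below assumes the standard form F = (SSR/k) / (SSE/(n - k - 1)), with k independent variables and n observations. The input numbers are hypothetical, the function name f_statistic is made up for the example, and 10.13 is the F-table value for α = 0.05 with (1, 3) degrees of freedom.

```python
def f_statistic(ssr, sse, n, k):
    """F statistic for the joint significance of k independent variables."""
    mean_square_regression = ssr / k           # explained variation per IV
    mean_square_error = sse / (n - k - 1)      # unexplained variation per residual df
    return mean_square_regression / mean_square_error

# Hypothetical sums of squares from a regression with n = 5, k = 1
f_value = f_statistic(ssr=3.6, sse=2.4, n=5, k=1)

F_CRITICAL = 10.13  # F-table value: alpha = 0.05, df = (1, 3)
model_significant = f_value > F_CRITICAL
print(f"F = {f_value:.2f}, model significant: {model_significant}")
```

With a single independent variable, F equals the square of the slope's t statistic, so the two tests agree.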
3 ways to analyse real data regression output?
Look at (+/-) sign and significance of the coefficients
R^2
F statistic
Different parts of multiple regression equation? (5)
x = independent variable
b = regression coefficient
a = y-intercept
k = number of independent variables
y = dependent variable
Equation used to test for significance of coefficients and DofF?
t = (b-0)/s
DofF = n - (k+1)
n = number sampled
k = number IVs
+1 is to account for constant ‘a’
T-dist. test
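A small sketch of this calculation follows. All numbers are hypothetical (30 observations, 3 independent variables, a coefficient of 1.5 with standard error 0.5), the function names are made up for the example, and 2.056 is the two-tailed t-table value for α = 0.05 with 26 degrees of freedom.

```python
def degrees_of_freedom(n, k):
    """DofF = n - (k + 1): one is lost per IV, plus one for the constant a."""
    return n - (k + 1)

def coefficient_t(b, s):
    """t statistic for H0: coefficient = 0, where s is its standard error."""
    return (b - 0) / s

# Hypothetical regression: n = 30 observations, k = 3 independent variables
df = degrees_of_freedom(n=30, k=3)
t_value = coefficient_t(b=1.5, s=0.5)

T_CRITICAL = 2.056  # t-table value: two-tailed, alpha = 0.05, df = 26
significant = abs(t_value) > T_CRITICAL
print(f"df = {df}, t = {t_value:.2f}, significant: {significant}")
```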
What are control variables and why?
Variables put in for the purpose of excluding possible alternative explanations for significant relationships between y and the variable(s) of interest.
Including CVs in the regression equation lets us analyse the relationship between a variable of interest and y while holding the other variables constant.