Chapter 3 lecture Flashcards
difference between regression and correlation
a regression line predicts the dependent variable y on the basis of the independent variable x. It describes the relationship between x and the estimated values of y at the various levels of x.
a correlation describes the strength of the association between y and x. It indicates to what extent the data points deviate from the regression line.
what happens if r increases (in absolute value)
then the data points will be closer to the regression line
correlation cannot be used to..
describe what the linear relationship looks like (that is the job of the regression line); it only indicates its strength and direction
intercept =
a, value of y when x=0
slope =
b, how much y changes if x increases by 1.
positive b = … association, negative b = … association
positive, negative
you should draw the regression line so that…
there are as many dots above as below the line
residual
difference between the observation and the prediction (which is the regression line)
choose the line with the…
smallest sum of squares
sum of squares
sum of (y − ŷ)²
what is the line with the smallest sum of squares called
least squares line
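A minimal sketch of the least squares idea from the cards above, in plain Python. The data are made up for illustration, and the closed-form slope and intercept formulas are the standard ones rather than something stated on these cards.

```python
x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 4, 5, 4, 5]   # made-up outcome values

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope b and intercept a of the least squares line y^ = a + b*x
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sxy / sxx
a = mean_y - b * mean_x


def sum_of_squares(intercept, slope):
    """Sum of squared residuals (y - y^)^2 for the line y^ = intercept + slope*x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


best = sum_of_squares(a, b)

# A few arbitrary alternative lines: each should give a larger sum of squares
others = [
    sum_of_squares(a + 0.5, b),
    sum_of_squares(a, b + 0.1),
    sum_of_squares(a - 0.3, b - 0.05),
]

print("intercept a:", a, "slope b:", b)
print("sum of squares of the least squares line:", best)
print("sum of squares of other lines:", others)
```

For these numbers the line comes out at roughly ŷ = 2.2 + 0.6x, and every alternative line tried has a larger sum of squared residuals.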
if we do not know x, what is the best guess for y
average y!
so what is the idea behind a regression analysis
look at the difference between the average (without x) and the regression line. how much is the error decreased by adding the predictor?
what is r²
the proportional decrease in the prediction error
what do large/small r² mean?
large r² -> large difference between predicting with the average and predicting with the regression line. this means you have a good prediction line!
small r² -> small difference between predicting with the average and predicting with the regression line. this means you have a poor prediction line, you might as well have used the mean! that gives the same information.
r² formula
(total SS − RSS) / total SS
what is SS
SS = total sum of squares = total variation around the mean:
from each point to the average/mean (straight line) = ȳ
what is RSS
RSS = residual sum of squares
from each point to the regression line = ŷ
RSS -> R, so measured from the REGRESSION prediction line
what is the relation between total SS and RSS
the RSS is part of the total SS
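total sum of squares (SS) = (short formula)
SS Σ(y − ȳ)² = RSS Σ(y − ŷ)² + SSR Σ(ŷ − ȳ)²
drawing of the relation between SS, SSR, RSS
---------------- SS (from the point to the mean/average)
---------- RSS (from the point to the regression line)
------ SSR (the rest: from the regression line to the mean)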
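The decomposition SS = RSS + SSR and the r² formula can be checked numerically. This is only a sketch with made-up data; the variable names are chosen for illustration.

```python
x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 4, 5, 4, 5]   # made-up outcome values

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sxy / sxx
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]   # predictions on the regression line

total_ss = sum((yi - mean_y) ** 2 for yi in y)           # point to mean
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # point to regression line
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)            # regression line to mean

# Proportional decrease in prediction error when x is added as predictor
r2 = (total_ss - rss) / total_ss

print("total SS:", total_ss)
print("RSS:", rss, "SSR:", ssr, "RSS + SSR:", rss + ssr)
print("r2:", r2)
```

With these numbers the total SS is 6, of which about 2.4 is residual and about 3.6 is explained by the regression, so r² is about 0.6.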
good regression model =
small RSS
large r2
regression model does better than simply predicting the mean.
r² definition explained
the proportion of variance in y explained by x
how else can you calculate r²
correlation coefficient r, squared (the same as the standardized regression coefficient ^2)
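As a quick check with made-up data, the sketch below computes r² once from the sums of squares and once by squaring the correlation coefficient r. The correlation formula used is the standard Pearson formula, which these cards do not spell out.

```python
import math

x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 4, 5, 4, 5]   # made-up outcome values

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)   # this is the total SS
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Route 1: r2 from the sums of squares of the least squares line
b = sxy / sxx
a = mean_y - b * mean_x
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r2_from_ss = (syy - rss) / syy

# Route 2: Pearson correlation r, then squared
r = sxy / math.sqrt(sxx * syy)
r2_from_r = r ** 2

print("r2 from sums of squares:", r2_from_ss)
print("r:", r, "r squared:", r2_from_r)
```

Both routes should agree up to floating point rounding.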
the regression line must have the … possible RSS
LOWEST (then the data points lie closest to the prediction line)
look at the summary drawing of RSS etc.
okay, actually do it
what if RSS is approximately = SS
then r² ≈ 0
what if SS > RSS
r² > 0 (good)
what is the predictor for SS
the mean
what is the predictor for RSS
the regression line
how can you interpret the value of r²
“the variance around the regression line is … % (r² × 100!) less than the total variance.”
what if you want to test the regression for the population
𝜇𝑦 =𝛼+𝛽𝑥
assumptions of the hypothesis test for regression
- random sample
- x and y are linearly related
- for every value of x, y is normally distributed with the same standard deviation
what are H0 and Ha when testing the regression line
H0: 𝛽= 0 vs HA: 𝛽≠ 0
(or one-sided)
what are the mean and sd of the sampling distribution (of the slope b)
mean = 𝛽
standard deviation = se
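how to calculate the p-value of the slope
2*t.dist!! (2 × the one-tailed probability beyond |t|, with t = b/se and df = n − 2; in Excel for example 2*T.DIST.RT(ABS(t);df))
how do you calculate 𝑡.025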
t.inv(0,975;df) !!!!!
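The slope test can also be sketched outside a spreadsheet. The example below uses only the Python standard library, so the t distribution is handled by numerically integrating its density (Simpson's rule) and the critical value by bisection; these are stand-ins chosen for this sketch, not the spreadsheet functions themselves. The data are made up, and the formula se = s / √Σ(x − x̄)² with s = √(RSS / (n − 2)) is the standard one for simple regression, not taken from these cards.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps   # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3
    return 1 - central_area

def critical_t(df, alpha=0.05):
    """Value t with P(|T| >= t) = alpha, found by bisection (t.025 for alpha = 0.05)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 4, 5, 4, 5]   # made-up outcome values
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sxy / sxx
a = mean_y - b * mean_x
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

df = n - 2
se = math.sqrt(rss / df) / math.sqrt(sxx)   # standard error of the slope
t_stat = b / se                             # test statistic for H0: beta = 0
p_value = two_sided_p(t_stat, df)
t_crit = critical_t(df)

print("b:", b, "se:", se, "t:", t_stat, "df:", df)
print("two-sided p-value:", p_value)
print("critical t (t.025):", t_crit, "reject H0:", abs(t_stat) > t_crit)
```

In this sketch t grows with a larger slope and shrinks with a larger standard error. Note that `math.gamma` overflows for very large df, so this approach only suits small examples.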
the closer the data points are to the regression line…
the smaller RSS
the larger r2
the larger SSR
explained variance
Explained variance does not actually mean that we have explained anything, at least not in a causal sense. It simply means that we can use one or more variables to predict things more accurately than we could before. It is the proportion by which the variance of the prediction errors shrinks.
regression and correlation difference (short)
regression indicates what a relationship looks like and how you can predict the variable y.
correlation indicates how strong the relationship is.
t becomes (slope and standard error)
larger with larger slope
smaller with larger standard error
r² interpretation (short)
..% of the variance of y can be explained by x
what if the regression line is completely horizontal
r=0
another word for residual
prediction error!
relationship between b and r
correlation r is the standardized version of b
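One way to see this is to standardize x and y to z-scores and fit the regression again: the slope of that standardized regression equals r, which is the same as b × (sd of x / sd of y). A small sketch with made-up data, using sample standard deviations (n − 1 in the denominator) as an assumption:

```python
import math

x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 4, 5, 4, 5]   # made-up outcome values

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                     # unstandardized slope
r = sxy / math.sqrt(sxx * syy)    # Pearson correlation

sd_x = math.sqrt(sxx / (n - 1))   # sample standard deviations
sd_y = math.sqrt(syy / (n - 1))

# Standardize both variables; z-scores have mean 0, so the slope is sum(zx*zy)/sum(zx^2)
zx = [(xi - mean_x) / sd_x for xi in x]
zy = [(yi - mean_y) / sd_y for yi in y]
b_standardized = sum(u * v for u, v in zip(zx, zy)) / sum(u * u for u in zx)

print("b:", b, "r:", r)
print("slope on z-scores:", b_standardized)
print("b * sd_x / sd_y:", b * sd_x / sd_y)
```

The last two printed values should both match r up to rounding.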